Using grep() to subset lines of text

ppaarrkk
I have two vectors, a and b; b contains the lines of a text file. I want to find the lines of b that begin with one of the elements of a. The code below only returns matches for the first value in a, but I want matches for both. Any ideas, please?


a = c(2,3)

b = NULL
b[1] = "aaa 2 aaa"
b[2] = "2 aaa"
b[3] = "3 aaa"
b[4] = "aaa 3 aaa"

grep(paste("^",a, sep=""), b )
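A minimal sketch of what goes wrong, assuming a reasonably recent R: grep() is not vectorised over its pattern argument, so with two patterns it uses only the first one ("^2") and warns about the rest. One workaround, offered here as an illustration rather than as the thread's answer, is to call grep() once per pattern and combine the matching line numbers (the names patterns and hits are illustrative):

```r
a <- c(2, 3)
b <- c("aaa 2 aaa", "2 aaa", "3 aaa", "aaa 3 aaa")

# One anchored pattern per element of a: "^2" "^3"
patterns <- paste0("^", a)

# grep() uses only pattern[1], so run it once per pattern
# and merge the matching line numbers.
hits <- sort(unique(unlist(lapply(patterns, grep, x = b))))
hits       # 2 3
b[hits]    # "2 aaa" "3 aaa"
```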

Gabor Grothendieck
Try this:

> a <- 2:3
> b <- c("aaa 2 aaa", "2 aaa", "3 aaa", "aaa 3 aaa")
>
> re <- paste("^(", paste(a, collapse = "|"), ")", sep = "")
> re
[1] "^(2|3)"
> grep(re, b, value = TRUE)
[1] "2 aaa" "3 aaa"
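Note that "^(2|3)" is only a prefix test, so a line such as "23 aaa" (a hypothetical extra line, not in the original data) would also match. If whole numbers are wanted, which is an assumption about the poster's intent, a word boundary can be appended; this sketch uses perl = TRUE for the \b:

```r
a <- 2:3
# The last element is a made-up line that starts with "23".
b <- c("aaa 2 aaa", "2 aaa", "3 aaa", "aaa 3 aaa", "23 aaa")

re <- paste0("^(", paste(a, collapse = "|"), ")")
grep(re, b, value = TRUE)                     # "2 aaa" "3 aaa" "23 aaa"

# \b requires the number to end right after the match,
# so "23 aaa" is no longer selected.
re_word <- paste0(re, "\\b")
grep(re_word, b, value = TRUE, perl = TRUE)   # "2 aaa" "3 aaa"
```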

On Sat, Nov 29, 2008 at 7:00 AM, ppaarrkk <[hidden email]> wrote:


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

macrakis
Hmm, this brings up an interesting question. What if the string I'm looking
for contains regular-expression metacharacters? For example,
grep(paste("^", "(ab)", sep = ""), c("ab", "(ab)")) => c(1), not c(2).

I couldn't find an equivalent to Emacs's regexp-quote, which would let me
write regexp.quote("(ab)") => "\\(ab\\)".  The syntax of regular expressions
is complicated enough that this is not trivial. Is there perhaps a CRAN
package with regular expression utilities?

            -s
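A regexp-quote equivalent can be sketched with a single gsub() call. The helper name regexp_quote() is hypothetical; it puts a backslash in front of each of . ^ $ | ? * + ( ) [ ] { } and the backslash itself, and the sketch is used with perl = TRUE:

```r
# Hypothetical helper modelled on Emacs's regexp-quote:
# backslash-escape every regular-expression metacharacter in s.
regexp_quote <- function(s) {
  gsub("([\\\\.^$|?*+()\\[\\]{}])", "\\\\\\1", s, perl = TRUE)
}

regexp_quote("(ab)")    # "\\(ab\\)" as printed by R, i.e. \(ab\)
grep(paste0("^", regexp_quote("(ab)")), c("ab", "(ab)"), perl = TRUE)   # 2
```

With perl = TRUE, wrapping the string in \\Q...\\E is another option, as long as the string itself does not contain \\E.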

On Sat, Nov 29, 2008 at 7:12 AM, Gabor Grothendieck <[hidden email]> wrote:


Gabor Grothendieck
grep() has a fixed = TRUE argument if you want the pattern matched as a literal string, with no regular-expression interpretation at all.
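A short sketch of what fixed = TRUE does and does not cover: the pattern is taken literally, but it may match anywhere in the line, and a leading "^" would be taken literally as well. For a literal prefix test, base R's startsWith() (available since R 3.3.0) is one option; the vector x is made up for illustration:

```r
x <- c("ab", "(ab)", "x(ab)")

# Literal match, but anywhere in the string.
grep("(ab)", x, fixed = TRUE)     # 2 3

# With fixed = TRUE there is no "^" anchor, so test the prefix directly.
which(startsWith(x, "(ab)"))      # 2
```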

On Sat, Nov 29, 2008 at 3:55 PM, Stavros Macrakis <[hidden email]> wrote:


macrakis
But I don't want to ignore all regexps: I want to build a regexp that
contains string components passed in as parameters, and only those components should be matched literally.

            -s
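A sketch of building a regexp from literal parameters: quote each parameter first, then splice the quoted pieces into the surrounding regexp, here the "^(...|...)" alternation from earlier in the thread. Both regexp_quote() and starts_with_any() are hypothetical helpers, and the sample lines in txt are invented:

```r
# Hypothetical helper: backslash-escape regexp metacharacters.
regexp_quote <- function(s) {
  gsub("([\\\\.^$|?*+()\\[\\]{}])", "\\\\\\1", s, perl = TRUE)
}

# Hypothetical helper: TRUE where x starts with any of the literal prefixes.
starts_with_any <- function(x, prefixes) {
  re <- paste0("^(", paste(regexp_quote(prefixes), collapse = "|"), ")")
  grepl(re, x, perl = TRUE)
}

txt <- c("(ab) one", "ab two", "a.b three", "axb four")
txt[starts_with_any(txt, c("(ab)", "a.b"))]   # "(ab) one" "a.b three"
```

Because "." is escaped, "axb four" is not matched, as it would be if "a.b" were used as a raw regexp.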

On Sat, Nov 29, 2008 at 6:51 PM, Gabor Grothendieck <[hidden email]> wrote:


Gabor Grothendieck
Try this. It replaces each character x of the string s with "\\x" if x is
punctuation and with "[x]" otherwise:

library(gsubfn)
# s is the string to quote; the formula is applied to each character x
gsubfn('.', ~ if (grepl("[[:punct:]]", x)) paste0('\\', x) else
  paste0('[', x, ']'), s)

See http://gsubfn.googlecode.com
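The same per-character idea can be sketched in base R without gsubfn; quote_chars() is a hypothetical name. One caveat to check against ?regex: in R's default (TRE) engine \< and \> are word-boundary anchors, so escaping "<" or ">" does not make them literal there, whereas with perl = TRUE a backslash before any non-alphanumeric character is literal. The sketch therefore uses perl = TRUE:

```r
# Hypothetical base-R version of the per-character quoting above.
quote_chars <- function(s) {
  chars <- strsplit(s, "")[[1]]
  quoted <- ifelse(grepl("[[:punct:]]", chars),
                   paste0("\\", chars),       # punctuation: backslash-escape
                   paste0("[", chars, "]"))   # anything else: one-char class
  paste(quoted, collapse = "")
}

quote_chars("(ab)")   # "\\([a][b]\\)"
grep(paste0("^", quote_chars("(ab)")), c("ab", "(ab)"), perl = TRUE)   # 2
```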


On Sat, Nov 29, 2008 at 10:09 PM, Stavros Macrakis <[hidden email]> wrote:


Uwe Ligges-3
In reply to this post by ppaarrkk



grep(paste("^", a, collapse = "|", sep = ""), b)

Uwe Ligges
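For reference, the regexp that the one-liner above builds: sep = "" attaches "^" to each element of a, and collapse = "|" joins the results into a single alternation, so grep() receives one pattern. A small sketch that prints the intermediate pattern:

```r
a <- c(2, 3)
b <- c("aaa 2 aaa", "2 aaa", "3 aaa", "aaa 3 aaa")

# Anchor each element, then join them with "|".
re <- paste("^", a, collapse = "|", sep = "")
re                           # "^2|^3"
grep(re, b)                  # 2 3
grep(re, b, value = TRUE)    # "2 aaa" "3 aaa"
```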
