Must be obvious but not to me : problem with regular expression

PtitBleu
Hi,

I have a vector called nfichiers containing 138 file names whose extensions are .P0, .P1, ... up to .P8.
The script must treat a file differently depending on whether its extension is .P0 or one of .P1 to .P8.

Examples of file names:
[128] "Output0.P0"      
[129] "Output0.P1"      
[130] "Output0.P2"      
[131] "Output01102007.P0"
[132] "Output01102007.P1"
[133] "Output01102007.P2"
[134] "Output01102007.P3"
[135] "Output01102007.P4"


To extract the file names with the .P0 extension, I wrote:
nfichiers[grep(".P0", nfichiers)]
For the other extensions:
nfichiers[grep(".P[^0]", nfichiers)]

But for the second call, the result has length 138, which is the length of the initial vector, although 130 of the files have the .P0 extension.

So I tried it "manually" with a small vector:
> s
[1] "aa.P0" "bb.P0" "cc.P1" "dd.P2"
> s[grep(".P[^0]", s)]
[1] "cc.P1" "dd.P2"

It works!

Does anyone have an idea how to solve this small problem?
Thanks in advance,
Ptit Bleu.

Re: Must be obvious but not to me : problem with regular expression

Duncan Murdoch
On 12/17/2007 9:34 AM, Ptit_Bleu wrote:

> Hi,
>
> I have a vector called nfichiers of 138 names of file whose extension is .P0
> or P1 ... to P8.
> The script is not the same when the extension is P0 or P(1 to 8).
>
> Examples of file names :
> [128] "Output0.P0"      
> [129] "Output0.P1"      
> [130] "Output0.P2"      
> [131] "Output01102007.P0"
> [132] "Output01102007.P1"
> [133] "Output01102007.P2"
> [134] "Output01102007.P3"
> [135] "Output01102007.P4"
>
>
> To extract the names of file with .P0 extension I wrote :
> nfichiers[grep(".P0", nfichiers)]
> For the other extensions :
> nfichiers[grep(".P[^0]", nfichiers)]
>
> But for the last, I get a length of 138 that is the length of the initial
> vector although I have 130 files with .P0 extension.

One problem above is that "." is special in regular expressions: unescaped,
it matches any single character.  I'd also suggest adding $ at the end, to
anchor the match at the end of the string.  That is, code as

grep("\\.P0$", nfichiers)

and

grep("\\.P[^0]$", nfichiers)

I don't know what false matches you were seeing, but this should
eliminate some.
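
As a rough illustration of the two suggested patterns, here is a minimal sketch. The vector below is only a small stand-in built from the sample names in the question, not the real 138-element nfichiers, and the last step is just one possible way to look for the false matches.

```r
# Small stand-in for the real vector, using names shown in the question.
nfichiers <- c("Output0.P0", "Output0.P1", "Output0.P2",
               "Output01102007.P0", "Output01102007.P1")

# Escaped dot plus end anchor: only names that really end in ".P0".
p0 <- grep("\\.P0$", nfichiers, value = TRUE)

# Names ending in ".P" followed by exactly one character other than "0".
others <- grep("\\.P[^0]$", nfichiers, value = TRUE)

print(p0)
print(others)

# Diagnostic: names caught by the original loose pattern but not by the
# strict one. On the real vector these would be the false matches.
loose <- grep(".P[^0]", nfichiers, value = TRUE)
print(setdiff(loose, others))
```

Running the last line against the full vector should list whichever names the unescaped pattern picks up by accident.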

Duncan Murdoch


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Must be obvious but not to me : problem with regular expression

Uwe Ligges

Ptit_Bleu wrote:

> Hi,
>
> I have a vector called nfichiers of 138 names of file whose extension is .P0
> or P1 ... to P8.
> The script is not the same when the extension is P0 or P(1 to 8).
>
> Examples of file names :
> [128] "Output0.P0"      
> [129] "Output0.P1"      
> [130] "Output0.P2"      
> [131] "Output01102007.P0"
> [132] "Output01102007.P1"
> [133] "Output01102007.P2"
> [134] "Output01102007.P3"
> [135] "Output01102007.P4"
>
>
> To extract the names of file with .P0 extension I wrote :
> nfichiers[grep(".P0", nfichiers)]
> For the other extensions :
> nfichiers[grep(".P[^0]", nfichiers)]
>
> But for the last, I get a length of 138 that is the length of the initial
> vector although I have 130 files with .P0 extension.
>
> So I tried "manually" with a small vector :
>> s
> [1] "aa.P0" "bb.P0" "cc.P1" "dd.P2"
>> s[grep(".P[^0]", s)]
> [1] "cc.P1" "dd.P2"


I guess you want
     grep("\\.P0$", nfichiers)
Otherwise you get "XP0X" as a positive as well.

And for the others:
   grep("\\.P[^0]$", nfichiers)
With ".P[^0]", you'd get "XPXX" as a positive, for example,
because that pattern matches any string containing a P that is preceded
by any character and followed by any character other than 0.
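
The false positives described above can be checked with a short sketch. The test vector x is made up for illustration, and the tools::file_ext() comparison at the end is an alternative not mentioned in this thread, shown only as one possible option.

```r
# Made-up test vector: two real extensions and the two false positives.
x <- c("aa.P0", "XP0X", "cc.P1", "XPXX")

# Unescaped, unanchored patterns: "." matches any character.
loose_p0    <- grepl(".P0", x)       # TRUE  TRUE  FALSE FALSE
loose_other <- grepl(".P[^0]", x)    # FALSE FALSE TRUE  TRUE

# Escaped dot and end anchor: only genuine extensions match.
strict_p0    <- grepl("\\.P0$", x)     # TRUE  FALSE FALSE FALSE
strict_other <- grepl("\\.P[^0]$", x)  # FALSE FALSE TRUE  FALSE

# Alternative without a hand-written regex: compare the extension itself.
ext <- tools::file_ext(x)              # "P0" "" "P1" ""
is_p0 <- ext == "P0"

print(data.frame(x, loose_p0, strict_p0, loose_other, strict_other, is_p0))
```

Note that comparing extensions with ext != "P0" would also select names with no extension at all, so the regular expression remains the tighter test for the "other extensions" case.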

Uwe Ligges


