regex pattern assistance


Tom Wright-9
Hi,
Can anyone please assist?

Given the string

> x<-"/mnt/AO/AO Data/S01-012/120824/"

I would like to extract "S01-012"

require(stringr)
> str_match(x,"\\/mnt\\/AO\\/AO Data\\/(.+)\\/+")
> str_match(x,"\\/mnt\\/AO\\/AO Data\\/(\\w+)\\/+")

Both nearly work. I expected I would use something like:
> str_match(x,"\\/mnt\\/AO\\/AO Data\\/([\\w -]+)\\/+")

but I don't seem to be able to get the square bracket grouping to work
correctly. Can someone please show me where I am going wrong?

Thanks,
Tom

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: regex pattern assistance

S Ellison-2


> > x<-"/mnt/AO/AO Data/S01-012/120824/"
>
> I would like to extract "S01-012"


> gsub("/mnt/AO/AO Data/(.+)/.+", "\\1", x)

# does it, as does
> gsub("/mnt/AO/AO Data/([\\w-]+)/.+", "\\1", x, perl=TRUE)    # \w inside [...] needs perl RE; the default is POSIX, where the equivalent is:
> gsub("/mnt/AO/AO Data/([[:alnum:]-]+)/.+", "\\1", x)  

# and
> str_match(x,"/mnt/AO/AO Data/(.+)/.+")
> str_match(x,"/mnt/AO/AO Data/([[:alnum:]-]+)/.+")   # again, needs POSIX, not perl RE

You had also, btw, missed the '.' in the closing '.+', meaning that your regex was looking for '/+', that is, one or more instances of '/'. With only the final '/' left to match, the greedy '(.+)' also captures '/120824'.
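
To see the effect of the closing '/+' versus '/.+', here is a small base R sketch (no stringr needed) that runs both endings on the same string. The variable names greedy, fixed and perl_class are only illustrative, and sub() is used instead of gsub() because a single replacement is enough here.

```r
x <- "/mnt/AO/AO Data/S01-012/120824/"

# Ending in "/+": the greedy (.+) runs on to the last slash,
# so the capture includes the next directory as well.
greedy <- sub("/mnt/AO/AO Data/(.+)/+", "\\1", x)

# Ending in "/.+": at least one character must follow the slash,
# so (.+) has to stop at the first directory.
fixed <- sub("/mnt/AO/AO Data/(.+)/.+", "\\1", x)

# \w inside a bracket expression, with Perl-compatible regex switched on.
perl_class <- sub("/mnt/AO/AO Data/([\\w-]+)/.+", "\\1", x, perl = TRUE)

print(greedy)      # "S01-012/120824"
print(fixed)       # "S01-012"
print(perl_class)  # "S01-012"
```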

Steve E



Re: regex pattern assistance

Marc Schwartz-3
In reply to this post by Tom Wright-9
On Aug 15, 2014, at 11:18 AM, Tom Wright <[hidden email]> wrote:

> Hi,
> Can anyone please assist.
>
> given the string
>
>> x<-"/mnt/AO/AO Data/S01-012/120824/"
>
> I would like to extract "S01-012"


Is the desired substring always in the same relative position in the path?

If so:

> strsplit(x, "/")
[[1]]
[1] ""        "mnt"     "AO"      "AO Data" "S01-012" "120824"

> unlist(strsplit(x, "/"))[5]
[1] "S01-012"



Alternatively, again, presuming the same position:

> gsub("/mnt/AO/AO Data/([^/]+)/.+", "\\1", x)
[1] "S01-012"


You don't need all of the double backslashes in your regex above. The '/' character is not a special regex character, whereas '\' is and needs to be escaped.
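
As a rough sketch of how both suggestions behave on more than one path, the snippet below applies the negated character class [^/]+ and the strsplit() approach to a small vector. The second path is made up for illustration, and both approaches assume the ID is always the directory directly under "AO Data".

```r
paths <- c("/mnt/AO/AO Data/S01-012/120824/",
           "/mnt/AO/AO Data/S02-107/130101/")  # second path is hypothetical

# [^/]+ matches everything up to the next slash, so no backtracking
# across directory boundaries is possible.
ids <- sub("^/mnt/AO/AO Data/([^/]+)/.*$", "\\1", paths)

# Same result by position: the 5th piece after splitting on "/"
# (the 1st piece is the empty string before the leading slash).
by_split <- vapply(strsplit(paths, "/", fixed = TRUE), `[`, character(1), 5)

print(ids)
print(by_split)
```

Both calls are vectorised, so the same line works for one path or many.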

Regards,

Marc Schwartz

Re: regex pattern assistance

Rui Barradas
In reply to this post by Tom Wright-9
Hello,

I don't believe you need an extra package for that. Try


sub("\\/mnt\\/AO\\/AO Data\\/([-[:alnum:]]*)\\/.+", "\\1", x)

or, with package stringr,

str_match(x,"\\/mnt\\/AO\\/AO Data\\/(.+)\\/.+")


Hope this helps,

Rui Barradas

Re: regex pattern assistance

Tom Wright-9
In reply to this post by Tom Wright-9
WOW!!!

What can I say? 4 answers in less than 4 minutes. Thank you everyone. If
I can't make it work now I don't deserve to.

btw. the strsplit approach wouldn't work for me as:
a) I wanted to play with regex and
b) the location isn't consistent.

Nice to see email support still works; not everything has moved to
LinkedIn and Stack Overflow.


Thanks again,
Tom


Re: regex pattern assistance

Jeff Newmiller
Must be another lucky streak. :-)
Re: regex pattern assistance

Marc Schwartz-3
In reply to this post by Tom Wright-9

On Aug 15, 2014, at 11:56 AM, Tom Wright <[hidden email]> wrote:

> WOW!!!
>
> What can I say 4 answers in less than 4 minutes. Thank you everyone. If
> I can't make it work now I don't deserve to.
>
> btw. the strsplit approach wouldn't work for me as:
> a) I wanted to play with regex and
> b) the location isn't consistent.


Tom,

If not in the same relative position, is the substring pattern always the same? That is, 3 alphanumeric characters, a hyphen, then 3 more alphanumeric characters? If so, could any other part of the path follow the same pattern, or is it unique?

If the pattern is the same and is unique in the path:

> gsub(".*([[:alnum:]]{3}-[[:alnum:]]{3}).*", "\\1", x)
[1] "S01-012"


is another possible alternative and more flexible:

y <- "/mnt/AO/AO Data/Another Level/Yet Another Level/S01-012/120824/"

> gsub(".*([[:alnum:]]{3}-[[:alnum:]]{3}).*", "\\1", y)
[1] "S01-012"


z <- "/mnt/AO/AO Data/Another Level/Yet Another Level/S01-012/One More Level/120824/"

> gsub(".*([[:alnum:]]{3}-[[:alnum:]]{3}).*", "\\1", z)
[1] "S01-012"
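
One caveat with the gsub() form is that it returns the input unchanged when nothing matches. A possible variant, sketched here under the same assumption about the ID format, uses regexpr() and regmatches() so that non-matching paths come back as NA. The helper name extract_id and the second test path are hypothetical.

```r
# Return the first "3 alphanumerics, hyphen, 3 alphanumerics" run in each
# path, or NA where there is none.
extract_id <- function(path) {
  m <- regexpr("[[:alnum:]]{3}-[[:alnum:]]{3}", path)
  out <- rep(NA_character_, length(path))
  # regmatches() only returns the elements that matched
  out[m != -1] <- regmatches(path, m)
  out
}

z <- "/mnt/AO/AO Data/Another Level/Yet Another Level/S01-012/One More Level/120824/"
res <- extract_id(c(z, "/mnt/AO/no id here/"))
print(res)  # "S01-012" NA
```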


>
> Nice to see email support still works, not everything has moved to
> linkedin and stackoverflow.


Stackoverflow?  ;-)

Regards,

Marc


Re: regex pattern assistance

arun kirshna
In reply to this post by Tom Wright-9


Hi Tom,
You could try:
library(stringr)
str_extract(x, perl("(?<=[A-Za-z]{4}/).*(?=/[0-9])"))
#[1] "S01-012"
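
The same lookbehind/lookahead pattern can also be tried in base R, which may be useful because, as far as I know, the perl() wrapper was deprecated in later stringr releases. This is only a sketch: it assumes the directory before the ID ends in four letters and the directory after it starts with a digit, as in the example string.

```r
x <- "/mnt/AO/AO Data/S01-012/120824/"

# perl = TRUE is required for lookbehind (?<=...) and lookahead (?=...)
m <- regexpr("(?<=[A-Za-z]{4}/).*(?=/[0-9])", x, perl = TRUE)
id <- regmatches(x, m)
print(id)  # "S01-012"
```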
A.K.


