Regex - subsetting parts of a file name.


arnaud Gaboury
A directory is full of cached data.frame files. All these files follow
the same naming pattern:

df.some_name.RData

my.cache.list <- c("df.subject_test.RData", "df.subject_train.RData",
"df.y_test.RData",
"df.y_train.RData")

I want to keep only the part between the two periods. After a lot of
headaches with grep(), trying something like this:

grep('.(.*?).','df.subject_test.RData',value=T)

I couldn't find a clean one-liner and settled on this workaround:

my.cache.list <- gsub('df.','',my.cache.list)
my.cache.list <- gsub('.RData','',my.cache.list)

The two commands above do the trick, but a clean one-liner with a
single regular expression would be more "elegant".

Does anyone have a suggestion?

Thanks for your help.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Regex - subsetting parts of a file name.

arun kirshna
Try:
gsub(".*\\.(.*)\\..*","\\1", my.cache.list)
[1] "subject_test"  "subject_train" "y_test"        "y_train"

#or

library(stringr)
# perl() has been deprecated since stringr 1.0.0; a plain pattern handles lookarounds
str_extract(my.cache.list, '(?<=\\.).*(?=\\.)')
[1] "subject_test"  "subject_train" "y_test"        "y_train" 
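
For readers who would rather not load stringr, the same lookaround idea can be sketched in base R. This is only an illustration, assuming every element contains at least two periods (the variable name `middle` is made up here); `regexpr()` with `perl = TRUE` finds the match and `regmatches()` pulls it out.

```r
my.cache.list <- c("df.subject_test.RData", "df.subject_train.RData",
                   "df.y_test.RData", "df.y_train.RData")

# (?<=\\.) requires a period just before the match,
# (?=\\.) requires a period just after it.
# The greedy .* stretches from the first period to the last one.
hits <- regexpr("(?<=\\.).*(?=\\.)", my.cache.list, perl = TRUE)

# regmatches() extracts the matched substring of each element.
# Elements with no match would be dropped silently, hence the assumption above.
middle <- regmatches(my.cache.list, hits)

print(middle)
```

Because the `.*` is greedy, a name with extra periods in the middle would be returned with those periods included.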

A.K.






Re: Regex - subsetting parts of a file name.

S Ellison-2
In reply to this post by arnaud Gaboury
> I want to keep only the part inside the two points. After lots of headache
> using grep() when trying something like this:
>
> grep('.(.*?).','df.subject_test.RData',value=T)
>
>
> Does anyone have any suggestion ?

gsub("df\\.(.+)\\.RData", "\\1", 'df.subject_test.RData')
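
It may help to see why the grep() attempt could not work even with a correct pattern: grep() only selects elements, while sub() or gsub() rewrites them. A small sketch over the whole vector follows; the `^` and `$` anchors and the names `kept` and `middle` are additions for illustration, not part of the one-liner above.

```r
my.cache.list <- c("df.subject_test.RData", "df.subject_train.RData",
                   "df.y_test.RData", "df.y_train.RData")

# grep(value = TRUE) filters: it returns every matching element unchanged.
kept <- grep("^df\\..+\\.RData$", my.cache.list, value = TRUE)

# sub() rewrites: the whole match is replaced by capture group 1.
# Anchoring with ^ and $ makes the pattern cover the full file name.
middle <- sub("^df\\.(.+)\\.RData$", "\\1", my.cache.list)

print(kept)
print(middle)
```

Note that sub() leaves an element untouched when the pattern does not match, so unexpected file names pass through as-is rather than raising an error.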


Steve E




Re: Regex - subsetting parts of a file name.

Sarah Goslee
In reply to this post by arnaud Gaboury
Hi,

Here are two possibilities:

R> as.vector(sapply(my.cache.list, function(x)strsplit(x, "\\.")[[1]][2]))
[1] "subject_test"  "subject_train" "y_test"        "y_train"


R> gsub("df\\.(.*)\\.RData", "\\1", my.cache.list)
[1] "subject_test"  "subject_train" "y_test"        "y_train"


Note that "." will match any character, while "\\." matches a period.
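
To make that concrete, here is a small sketch of how the unescaped workaround from the original post can go wrong. The file name below is hypothetical, chosen only because it contains "df" followed by another character inside the name.

```r
# Hypothetical file name, not from the original list.
x <- "df.pdf_test.RData"

# Unescaped: "df." means "df" plus ANY character, so it also removes "df_".
loose <- gsub("df.", "", x)

# Escaped and anchored: only the literal prefix and suffix are stripped.
strict <- sub("^df\\.(.*)\\.RData$", "\\1", x)

print(loose)   # "ptest.RData"
print(strict)  # "pdf_test"
```

The four file names in the question happen not to trigger this, which is why the workaround appeared to work.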

Sarah

On Thu, Jul 31, 2014 at 4:27 AM, arnaud gaboury
<[hidden email]> wrote:

> A directory is full of data.frames cache files. All these files have
> the same pattern:
>
> df.some_name.RData
>
> my.cache.list <- c("df.subject_test.RData", "df.subject_train.RData",
> "df.y_test.RData",
> "df.y_train.RData")
>
> I want to keep only the part inside the two points. After lots of
> headache using grep() when trying something like this:
>
> grep('.(.*?).','df.subject_test.RData',value=T)
>
>  I couldn't find a clean one liner and found this workaround:
>
> my.cache.list <- gsub('df.','',my.cache.list)
> my.cache.list <- gsub('.RData','',my.cache.list)
>
> The two above commands do the trick, but a clean one line with some
> regex expression would be a more "elegant" way.
>
> Does anyone have any suggestion ?
>
> TY for help
>


--
Sarah Goslee
http://www.functionaldiversity.org


Re: Regex - subsetting parts of a file name.

arnaud Gaboury
>
> R> as.vector(sapply(my.cache.list, function(x)strsplit(x, "\\.")[[1]][2]))
> [1] "subject_test"  "subject_train" "y_test"        "y_train"
>
>
> R> gsub("df\\.(.*)\\.RData", "\\1", my.cache.list)
> [1] "subject_test"  "subject_train" "y_test"        "y_train"
>
>
> Note that "." will match any character, while "\\." matches a period.



Thank you for your various suggestions.
