how to factor in the ID of the imported subtable to R table?


R help mailing list-2
Dear R community,

I am new to R; so far I have done some online tutorials and exercises in an R playground. I was wondering if I could seek guidance on the following matter.

I have a set of 403 .csv files. Each .csv file has the same layout, and the files are distinguished by the subject ID and date in the file name. The file names look like this:

Sub1-20170305.csv
Sub2-20180214.csv
…
Sub403-20191109.csv

I will use the rbind function to combine the 403 csv files into a single data set (myFile). I will then create two new variables in myFile (subject ID and date) using the mutate function. Is there a way to extract the subject ID (shown as “Sub1, Sub2, ..., Sub403”) and the date from the name of each csv file and then place them in “subject ID” and “date” in myFile?

Any info on the issue itself or where to look for will be appreciated.

Thanks,

CJ






        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: how to factor in the ID of the imported subtable to R table?

David Winsemius-2

On 5/21/20 9:24 AM, YANJUN CHEN via R-help wrote:

> Dear R community,
>
> I am new to R; so far I have done some online tutorials and exercises in an R playground. I was wondering if I could seek guidance on the following matter.
>
> I have a set of 403 .csv files. Each .csv file has the same layout, and the files are distinguished by the subject ID and date in the file name. The file names look like this:
>
> Sub1-20170305.csv
> Sub2-20180214.csv
> …
> Sub403-20191109.csv


Something along the lines of:

?regex ; ?sub

?read.table

?data.frame

?do.call

?rbind

myfiles <- lapply( list.files(your_path, pattern = "[.]csv$"),
                   # each file name will be passed to the anonymous function
                   function(nm) data.frame(
                       subID = sub("-.+", "", nm),                 # remove chars after "-"
                       date = sub("^.+-(.{8})[.]csv$", "\\1", nm), # extract date as capture group
                       # assuming all files have the same number of columns with no headers;
                       # sep = "," because the files are comma-separated
                       read.table( file.path(your_path, nm), sep = "," )
                   )
           )

big_file <- do.call(rbind, myfiles)
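A self-contained sketch of the same approach, which can be run without the real data, might look like the following. The temporary directory, the two small files, and their contents are made up purely for illustration; only the file-name pattern comes from the original post. It also converts the extracted date string to class Date, which is an optional extra step.

```r
# Build a throwaway directory with two files named like those in the post.
# The directory name, file contents, and column count are hypothetical.
your_path <- file.path(tempdir(), "csv_demo")
dir.create(your_path, showWarnings = FALSE)
write.table(data.frame(a = 1:2, b = c(0.5, 1.5)),
            file.path(your_path, "Sub1-20170305.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)
write.table(data.frame(a = 3:5, b = c(2.5, 3.5, 4.5)),
            file.path(your_path, "Sub2-20180214.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)

# Only pick up .csv files; list.files() returns bare file names here.
files <- list.files(your_path, pattern = "[.]csv$")

pieces <- lapply(files, function(nm) {
  dat <- read.table(file.path(your_path, nm), sep = ",")  # no header row assumed
  data.frame(
    subID = sub("-.+", "", nm),                           # text before the first "-"
    date  = as.Date(sub("^.+-(.{8})[.]csv$", "\\1", nm),  # 8 characters before ".csv"
                    format = "%Y%m%d"),
    dat                                                   # the file's own columns (V1, V2, ...)
  )
})

# Stack the per-file data frames into one.
big_file <- do.call(rbind, pieces)
str(big_file)
```

The single subID and date values are recycled by data.frame() to match the number of rows read from each file, so every row carries the ID and date of the file it came from.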

>
> I will use the rbind function to combine the 403 csv files into a single data set (myFile). I will then create two new variables in myFile (subject ID and date) using the mutate function. Is there a way to extract the subject ID (shown as “Sub1, Sub2, ..., Sub403”) and the date from the name of each csv file and then place them in “subject ID” and “date” in myFile?
>
> Any info on the issue itself or where to look for will be appreciated.


If you search StackOverflow or Rseek with topic terms such as "stacking
multiple data files", you should find many worked examples.
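As one small worked example of the file-name step on its own, the sketch below applies sub() to the three file names given in the question, without reading any files. The variable names are only illustrative.

```r
# File names copied from the original post
nms <- c("Sub1-20170305.csv", "Sub2-20180214.csv", "Sub403-20191109.csv")

# Drop everything from the first "-" onward to leave the subject ID
sub_id <- sub("-.+", "", nms)

# Keep only the 8 characters between the "-" and ".csv", then parse as a date
date_chr <- sub("^.+-(.{8})[.]csv$", "\\1", nms)
date <- as.Date(date_chr, format = "%Y%m%d")

data.frame(file = nms, subID = sub_id, date = date)
```

Checking the pattern on a handful of names like this before looping over all 403 files makes it easier to spot a file name that does not follow the expected layout.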

> Thanks,
>
> CJ
>
> [[alternative HTML version deleted]]


You should now read the Posting Guide which will explain why you should
NOT post in HTML.


Best;

David.
