File conca.

File conca.

Val-17
Hi All,

I have data files in several folders and want to combine all of these
files into one file. Each folder contains several files, and these
files have the same structure but different names. First, within each
folder, I want to concatenate (rbind) all files into one. While reading
and concatenating the files, I want to add the folder name as a
variable in each row. I am reading the folder names from a file; for
demonstration I am using only two folders, as shown below.
Data\week1             # folder name 1
           WT13.csv
           WT26.csv           ...
           WT10.csv
Data\week2            #folder name 2
           WT02.csv
           WT12.csv

Please find my attempt below:

folders=c("week1","week2")
for(i in folders){
  path=paste("\\data\\", i , sep = "")   # backslashes must be escaped in R strings
  setwd(path)
  Flist = list.files(path,pattern = "^WT")
  dataA =  lapply(Flist, function(x)read.csv(x, header=T))
  Alldata = do.call("rbind", dataA)     # combine all files
  Alldata$foldername=i                  # adding the folder name
}
The above works for one folder, but how can I do it for more than
one folder?

Thank you in advance,

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: File conca.

PIKAL Petr
Hi

Help with such operations is rather tricky, as only you know the exact
structure of your folders.

See some hints inline.

> -----Original Message-----
> From: R-help <[hidden email]> On Behalf Of Val
> Sent: Tuesday, November 5, 2019 4:33 AM
> To: [hidden email] ([hidden email]) <[hidden email]>
> Subject: [R] File conca.
>
> folders=c("week1","week2")
> for(i in folders){
>   path=paste("\\data\\", i , sep = "")
>   setwd(path)

You should use
wd <- setwd(path)

which keeps the original directory for subsequent use.

>   Flist = list.files(path,pattern = "^WT")
>   dataA =  lapply(Flist, function(x)read.csv(x, header=T))
>   Alldata = do.call("rbind", dataA)     # combine all files
>   Alldata$foldername=i                  # adding the folder name
>

Now you can do

setwd(wd)

to return to the original directory.
}
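
The save-and-restore hint above can be sketched in isolation as follows. This is only an illustration: `tempdir()` stands in for one of the data folders, and `start_dir` is a name introduced here just to check the result.

```r
# Remember where we are so the result can be checked afterwards.
start_dir <- getwd()

# setwd() invisibly returns the directory that was current before the change.
wd <- setwd(tempdir())

# ... read files relative to the new working directory here ...

# Restore the original working directory.
setwd(wd)

# TRUE if we are back where we started.
identical(getwd(), start_dir)
```

Restoring the directory inside the loop matters when `path` is relative, because each later `setwd(path)` is resolved against whatever directory is current at that moment.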

> The above works for one folder, but how can I do it for more than one
> folder?

You also need to decide whether you want all data from all folders in one
object called Alldata, or several Alldata objects, one for each folder.

In the second case you could use a list structure for Alldata. In the first
case you could store the data from each folder in a temporary object and use
rbind directly.

Something like

temp <- do.call("rbind", dataA)
temp$foldername <- i

Alldata <- temp
in the first cycle
and
Alldata <- rbind(Alldata, temp)
in the second and all later cycles.

Or you could initialize Alldata manually first and use only
Alldata <- rbind(Alldata, temp)

in your loop.
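
As a rough sketch of the two options described above, the following keeps one data frame per folder in a list and then builds a single combined object from it. The demo files, their columns, and the names `base_dir` and `per_folder` are assumptions made up for this illustration; only `folders`, `temp`, `Alldata`, and `foldername` come from the thread.

```r
# Demo data in a temporary directory; file names and columns are made up.
base_dir <- tempfile("Data")
folders <- c("week1", "week2")
for (i in folders) {
  dir.create(file.path(base_dir, i), recursive = TRUE)
  write.csv(data.frame(id = 1:3, wt = c(10, 11, 12)),
            file.path(base_dir, i, "WT01.csv"), row.names = FALSE)
  write.csv(data.frame(id = 4:5, wt = c(13, 14)),
            file.path(base_dir, i, "WT02.csv"), row.names = FALSE)
}

# Second case: one data frame per folder, stored in a named list.
per_folder <- list()
for (i in folders) {
  files <- list.files(file.path(base_dir, i), pattern = "^WT",
                      full.names = TRUE)
  temp <- do.call("rbind", lapply(files, read.csv, header = TRUE))
  temp$foldername <- i          # tag every row with its folder
  per_folder[[i]] <- temp
}

# First case: a single object, built from the list in one step.
Alldata <- do.call("rbind", per_folder)
table(Alldata$foldername)
```

Keeping the list around costs little and leaves both forms available: `per_folder[["week1"]]` for one folder, `Alldata` for everything.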

Cheers
Petr


Re: File conca.

Jeff Newmiller
I recommend not using setwd unless you have to (e.g., at the beginning of a script run by cron or another task scheduler). It is much simpler to build paths to directories and files using file.path.
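
A minimal sketch of that approach, assuming a layout like the one in the original question: `read_folder` is a hypothetical helper written for this illustration, and the demo files and columns are made up. The working directory is never changed.

```r
# Hypothetical helper: read every "WT*" file in one folder and tag each
# row with the folder name. Paths are built with file.path(), so no
# setwd() is needed.
read_folder <- function(folder, base_dir) {
  files <- list.files(file.path(base_dir, folder), pattern = "^WT",
                      full.names = TRUE)   # full paths, ready for read.csv
  combined <- do.call(rbind, lapply(files, read.csv, header = TRUE))
  combined$foldername <- folder
  combined
}

# Demo data in a temporary directory (made up for this sketch).
base_dir <- tempfile("Data")
folders <- c("week1", "week2")
for (f in folders) {
  dir.create(file.path(base_dir, f), recursive = TRUE)
  write.csv(data.frame(id = 1:2, wt = c(10.5, 11.2)),
            file.path(base_dir, f, "WT13.csv"), row.names = FALSE)
  write.csv(data.frame(id = 3:4, wt = c(9.8, 12.1)),
            file.path(base_dir, f, "WT26.csv"), row.names = FALSE)
}

# Read each folder, then stack the per-folder results.
Alldata <- do.call(rbind, lapply(folders, read_folder, base_dir = base_dir))
table(Alldata$foldername)
```

The key detail is `full.names = TRUE`: without it, `list.files` returns bare file names that only resolve if the working directory happens to be the data folder.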


Re: File conca.

Val-17
In reply to this post by PIKAL Petr
Thank you, Petr and Jeff, for your suggestions.

I made some improvements, but the code still needs some tweaking. I could
not get the folder names added correctly to each row. Only the last
folder name was added.
table(Alldata$foldername) resulted in
   week2
    25500

Please see the complete code:

####################################################
folders=c("week1","week2")
for(i in folders){
  path=paste("\\data\\", i , sep = "")   # backslashes must be escaped in R strings
  wd <-  setwd(path)
  Flist = list.files(path,pattern = "^WT")
  dataA =  lapply(Flist, function(x)read.csv(x, header=T))
  setwd(wd)
  temp = do.call("rbind", Alldata)
  temp$foldername <- i
  Alldata <- temp
  Alldata <- rbind(Alldata, temp)
}
#######################################################
Any suggestion please?



Re: File conca.

PIKAL Petr
Hi,

See my comments inline.

> -----Original Message-----
> From: Val <[hidden email]>
> Sent: Wednesday, November 6, 2019 3:24 AM
> To: PIKAL Petr <[hidden email]>
> Cc: [hidden email] ([hidden email]) <[hidden email]>
> Subject: Re: [R] File conca.
>
> ####################################################
> folders=c("week1","week2")
> for(i in folders){
>   path=paste("\\data\\", i , sep = "")
>   wd <-  setwd(path)
>   Flist = list.files(path,pattern = "^WT")
>   dataA =  lapply(Flist, function(x)read.csv(x, header=T))
>   setwd(wd)
>   temp = do.call("rbind", Alldata)

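Shouldn't it be
temp = do.call("rbind", dataA)
?

This is the problematic piece:

>   temp$foldername <- i # this seems to be OK

But the next two lines, in each cycle, put the most recent temp into Alldata
and then add the same temp again by rbinding. Everything from earlier folders
is overwritten, which is why only week2 appears in your table.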

>   Alldata <- temp
>   Alldata <- rbind(Alldata, temp)

I understand from your description that you want all data from all files in
one Alldata object.

You could either read the files from the first folder and put them into
Alldata **before** your cycle:
Alldata <- temp
After you initialize Alldata this way, you need only

Alldata <- rbind(Alldata, temp)

in your cycle to add the data from the other folders.

Or you could use a counter variable to check whether it is the first run.

Something like

k <- 0

for(i in folders){...
k <- k+1
....
if (k==1)    Alldata <- temp else Alldata <- rbind(Alldata, temp)
...
}
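
Filled in, the counter outline above might look like the sketch below. It also uses `dataA` in the `do.call`, as suggested earlier in this reply. The demo files, their columns, and `base_dir` are assumptions made up so the example runs on its own; paths are built with file.path rather than setwd.

```r
# Demo data in a temporary directory; file names and columns are made up.
base_dir <- tempfile("Data")
folders <- c("week1", "week2")
for (i in folders) {
  dir.create(file.path(base_dir, i), recursive = TRUE)
  write.csv(data.frame(id = 1:3, wt = c(10, 11, 12)),
            file.path(base_dir, i, "WT02.csv"), row.names = FALSE)
  write.csv(data.frame(id = 4:6, wt = c(13, 14, 15)),
            file.path(base_dir, i, "WT12.csv"), row.names = FALSE)
}

k <- 0
for (i in folders) {
  k <- k + 1
  Flist <- list.files(file.path(base_dir, i), pattern = "^WT",
                      full.names = TRUE)
  dataA <- lapply(Flist, function(x) read.csv(x, header = TRUE))
  temp <- do.call("rbind", dataA)   # combine the files of this folder
  temp$foldername <- i              # add the folder name to every row
  # First cycle: start Alldata; later cycles: append to it.
  if (k == 1) Alldata <- temp else Alldata <- rbind(Alldata, temp)
}

# Both folder names should now appear.
table(Alldata$foldername)
```

Growing a data frame with rbind inside a loop copies it on every pass, which is fine for a handful of folders; with many folders, collecting the pieces in a list and binding once at the end is usually faster.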

Cheers
Petr
