|
Hi,
I am relatively new to R. I have scoured the help files and the web but haven't been able to find a solution.

I have around 250 csv files, one for each date. They have columns of all types (numeric, string, etc.). The name of each file is the date in the form 'yyyymmdd'. No column within a file identifies the date on which the file was generated; only the filename carries that information.

I am selecting some data (using read.csv.sql) from each file and creating a dataset for each day. Ultimately I will combine all the datasets. I can accomplish the select and combine parts, but after combining I won't have a record of the date corresponding to each row.

Hence I want to insert the filename as a column in the respective file, so that I can identify which date each data row belongs to.

Sorry for the long mail, but I wanted to make myself clear. Any help would be greatly appreciated.

Thanks in advance,
Shivam

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code. |
|
This might do it for you:
for (i in fileNames) {
    input <- read.table(i, .....)
    # you might want to use regular expressions to extract just the date
    input$fileName <- i
    write.table(input, ....)  # write the modified data frame, not the file name
}

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it. |
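The comment in the loop above suggests a regular expression for pulling the date out of the file name. A minimal sketch of one way to do that follows; the file names are hypothetical, and it assumes each name contains exactly one run of eight digits in the 'yyyymmdd' form described in the question.

```r
# Hypothetical file names in the 'yyyymmdd' form described in the thread.
file_names <- c("20120423.csv", "data/20120424.csv")

# Drop any directory part first so digits in a path cannot be matched by mistake.
base_names <- basename(file_names)

# Extract the first run of eight digits from each name.
date_strings <- regmatches(base_names, regexpr("[0-9]{8}", base_names))

# Optionally convert to Date so the column sorts and subsets chronologically.
dates <- as.Date(date_strings, format = "%Y%m%d")

print(date_strings)
print(dates)
```

Note that `regmatches()` silently drops names with no match, so for real data it may be worth checking that `length(date_strings)` equals `length(file_names)`.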
|
This little example might help.
> foo <- data.frame(a=1:10, b=letters[1:0])
> foo
    a b
1   1 a
2   2 a
3   3 a
4   4 a
5   5 a
6   6 a
7   7 a
8   8 a
9   9 a
10 10 a
> foo$date <- '20120423'
> foo
    a b     date
1   1 a 20120423
2   2 a 20120423
3   3 a 20120423
4   4 a 20120423
5   5 a 20120423
6   6 a 20120423
7   7 a 20120423
8   8 a 20120423
9   9 a 20120423
10 10 a 20120423

In other words, immediately after reading the data into a data frame, add a date column as in the example. You'll have to extract the date from the filename, of course.

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062 |
|
Thanks for the quick response. It works for an individual dataframe, but I have many dataframes. This is the code so far:

fnames = list.files(path = getwd())
for (i in 1:length(fnames)){
  assign(paste("file",i,sep=""),
         read.csv.sql(fnames[i],
                      sql = "select * from file where V3 == 'XXX' and V5=='YYY'",
                      header = FALSE, sep = '|', eol = "\n"))
}

This generates dataframes named file1, file2, ..., file250. Is there a way to do something like the following within the same loop?

file1$date = substr(fnames[1],1,8)
file2$date = substr(fnames[2],1,8)
.
.
file250$date = substr(fnames[250],1,8)

An attempt along the lines of assign(paste("file",i,sep="")$date doesn't work.

Any help?

--
*Victoria Concordia Crescit* |
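For readers who want to keep the separately named data frames, one possible pattern is to fetch each object with `get()`, modify the copy, and store it back with `assign()`. The sketch below uses small made-up data frames in place of the `read.csv.sql()` results, and the file names are hypothetical.

```r
# Hypothetical stand-ins for the file names and for the data frames
# that the original loop creates with assign().
fnames <- c("20120423.csv", "20120424.csv")
for (i in seq_along(fnames)) {
  assign(paste0("file", i), data.frame(V1 = 1:3))
}

# assign() returns a value rather than a place that can be assigned into,
# so the column cannot be added through it directly. Fetch the object by
# name, change the copy, and write it back under the same name.
for (i in seq_along(fnames)) {
  obj_name <- paste0("file", i)
  tmp <- get(obj_name)
  tmp$date <- substr(fnames[i], 1, 8)
  assign(obj_name, tmp)
}

print(file1)
print(file2)
```

This works, but every later step (combining, summarising) needs the same `get()`/`assign()` dance, which is the maintenance cost of keeping many named objects.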
|
Reposting in hope of a reply.
|
|
Programmatically dealing with large numbers of separately named objects leads to syntactically complicated code that is hard to read and maintain.

Load the data frames into a list so you can access them by numeric or named index; getting at the loaded data will then be much easier.

fnames = list.files(path = getwd())
# preallocating the list for efficiency (execution speed)
dtalist <- vector( "list", length(fnames) )
for (i in seq_along(fnames)){
  dtalist[[i]] <- read.csv.sql(fnames[i],
                               sql = "select * from file where V3 == 'XXX' and V5=='YYY'",
                               header = FALSE, sep = '|', eol = "\n")
  dtalist[[i]]$date <- substr(fnames[i],1,8)
}
names(dtalist) <- fnames
# now you can optionally refer to an element by its file name,
# e.g. dtalist[["20120424.csv"]] (assuming a file of that name exists), if you wish.

---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
DCN: <[hidden email]>
Sent from my phone. Please excuse my brevity. |
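The list approach also makes the final combine step short. The following is a self-contained sketch under some assumptions: it writes two tiny pipe-delimited files to a temporary directory (made-up data), reads them with base `read.table()` instead of `read.csv.sql()` so that it runs without extra packages, and stacks the results with `do.call(rbind, ...)`.

```r
# Write two small pipe-delimited files into a temporary directory
# (hypothetical data standing in for the real daily files).
data_dir <- file.path(tempdir(), "daily_files")
dir.create(data_dir, showWarnings = FALSE)
writeLines(c("1|XXX", "2|YYY"), file.path(data_dir, "20120423.csv"))
writeLines(c("3|XXX"),          file.path(data_dir, "20120424.csv"))

fnames <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file and tag its rows with the date taken from the file name.
dtalist <- lapply(fnames, function(f) {
  d <- read.table(f, header = FALSE, sep = "|", stringsAsFactors = FALSE)
  d$date <- rep(substr(basename(f), 1, 8), nrow(d))
  d
})
names(dtalist) <- basename(fnames)

# Stack all per-day data frames into one; each row keeps its date.
combined <- do.call(rbind, dtalist)
rownames(combined) <- NULL
print(combined)
```

Using `basename()` before `substr()` matters once `full.names = TRUE` is used, because the first eight characters of a full path are not the date.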
|
Thanks Jeff. I had tried the 'list' approach as well but got stuck with the error below:

Error in `$<-.data.frame`(`*tmp*`, "date", value = "20100701") :
  replacement has 1 rows, data has 0

I couldn't find a workaround for this, hence resorted to the multiple-dataframes approach. Any insights into this? |
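The error message says the data frame being modified has zero rows, which is what would happen if the query matched nothing in one of the files (here apparently the file for 20100701). That reading is an assumption, since the thread does not show the file's contents. Under that assumption, a sketch of one workaround is to repeat the date `nrow()` times, which gives a zero-length vector for an empty data frame. The helper name `add_date` is hypothetical.

```r
# A data frame with columns but no rows, as a filter that matches nothing might return.
empty <- data.frame(V3 = character(0), V5 = character(0), stringsAsFactors = FALSE)

# Assigning a length-1 value to a zero-row data frame reproduces the kind of
# error reported in the thread; capture the message instead of stopping.
err <- tryCatch({
  empty$date <- "20100701"
  NULL
}, error = function(e) conditionMessage(e))
print(err)

# Repeating the value nrow(d) times yields a zero-length vector for an empty
# data frame, so the assignment succeeds and the column still exists.
add_date <- function(d, date_string) {
  d$date <- rep(date_string, nrow(d))
  d
}

filled_empty <- add_date(empty, "20100701")
filled_full  <- add_date(data.frame(V3 = c("a", "b")), "20100701")
str(filled_empty)
print(filled_full)
```

An alternative is to skip files whose result has `nrow(d) == 0` before adding the column; which is preferable depends on whether empty days should leave a trace in the combined data.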
