I have a text file with semicolon-separated values. The table is nearly 10,000 rows by 585 columns. The file looks as follows:

*******************************************
First line: Skip this line
Second line: skip this line
Third line: skip this line
variable1 Variable2 Variable3 Variable4
Unit1 Unit2 Unit3
10 0.1 0.01 0.001
20 0.2 0.02 0.002
30 0.3 0.03 0.003
40 0.4 0.04 0.004
*******************************************

The first three lines need to be skipped. Line 5 doesn't have units for all the variables, so it has to be skipped as well. In effect, I want the following to be read into a data frame, skipping lines 1, 2, 3 and 5:

*******************************************
variable1 Variable2 Variable3 Variable4
10 0.1 0.01 0.001
20 0.2 0.02 0.002
30 0.3 0.03 0.003
40 0.4 0.04 0.004
*******************************************

I tried using read.table with skip for lines 1-3 as follows:

inputData <- read.table("test.txt", sep = ";", skip = 3)

but line 4 is causing a problem, with the following error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 3 did not have 585 elements

Can someone help me with this?

Thank you.

Ravi
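A minimal sketch of the general idea, assuming the file fits comfortably in memory (about 10,000 lines should): read every line with readLines, drop lines 1, 2, 3 and 5 by position, and hand the rest to read.table through its text argument. The sample lines from the question are inlined so the snippet runs on its own; for the real semicolon-separated file you would start from readLines("test.txt") and add sep = ";".

```r
# Inline copy of the sample from the question; in practice: raw <- readLines("test.txt")
raw <- c("First line: Skip this line",
         "Second line: skip this line",
         "Third line: skip this line",
         "variable1 Variable2 Variable3 Variable4",
         "Unit1 Unit2 Unit3",
         "10 0.1 0.01 0.001",
         "20 0.2 0.02 0.002",
         "30 0.3 0.03 0.003",
         "40 0.4 0.04 0.004")

# Drop lines 1, 2, 3 and 5 by position; the first remaining line
# (the variable names) then serves as the header
keep <- raw[-c(1:3, 5)]
dat <- read.table(text = keep, header = TRUE)
str(dat)
```

Because the units row never reaches read.table, the column types are guessed from the numbers alone, so no row has to be deleted afterwards.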
Hello,
Try the following.

head <- readLines("test.txt", n=4)[4]      # line 4 holds the variable names
dat <- read.table("test.txt", skip=5)      # the data start after the units line
names(dat) <- unlist(strsplit(head, " "))
dat

Hope this helps,

Rui Barradas

On 09-07-2012 11:23, vioravis wrote:
> I have a text file that has semi-colon separated values. [...]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi,

I think you need fill=TRUE in read.table.

dat1<-read.table(text="
First line: Skip this line
Second line: skip this line
Third line: skip this line
variable1 Variable2 Variable3 Variable4
Unit1 Unit2 Unit3
10 0.1 0.01 0.001
20 0.2 0.02 0.002
30 0.3 0.03 0.003
40 0.4 0.04 0.004
",sep="",skip=4, fill=TRUE,header=TRUE)
dat1<-dat1[-1,]                  # drop the units row
row.names(dat1)<-1:nrow(dat1)    # renumber the rows
dat1
  variable1 Variable2 Variable3 Variable4
1        10       0.1      0.01     0.001
2        20       0.2      0.02     0.002
3        30       0.3      0.03     0.003
4        40       0.4      0.04     0.004

A.K.

----- Original Message -----
From: vioravis <[hidden email]>
To: [hidden email]
Sent: Monday, July 9, 2012 6:23 AM
Subject: [R] Skipping lines and incomplete rows
Hello,
I just checked reading directly from a .txt file instead of the inline text used in my earlier reply.

#Use skip=3 instead of 4 (the inline text started with an empty line)
dat1<-read.table("dat1.txt",sep="",skip=3,fill=TRUE,header=TRUE)
dat1<-dat1[-1,]
row.names(dat1)<-1:nrow(dat1)
dat1
  variable1 Variable2 Variable3 Variable4
1        10       0.1      0.01     0.001
2        20       0.2      0.02     0.002
3        30       0.3      0.03     0.003
4        40       0.4      0.04     0.004

Hope it works.

A.K.
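Deleting the units row after reading leaves the columns that held unit labels as character (or factor, under the stringsAsFactors = TRUE default of R versions before 4.0), because that row took part in the type guessing. A sketch of one way to restore numeric columns, assuming the sample data from the question, is to re-run type.convert on each column:

```r
# Inline copy of the sample; a character vector has no leading empty line,
# so skip = 3 works here just as it does for the file
txt <- c("First line: Skip this line",
         "Second line: skip this line",
         "Third line: skip this line",
         "variable1 Variable2 Variable3 Variable4",
         "Unit1 Unit2 Unit3",
         "10 0.1 0.01 0.001",
         "20 0.2 0.02 0.002",
         "30 0.3 0.03 0.003",
         "40 0.4 0.04 0.004")

# Header on line 4; fill = TRUE pads the short units row
dat1 <- read.table(text = txt, skip = 3, header = TRUE, fill = TRUE,
                   stringsAsFactors = FALSE)
dat1 <- dat1[-1, ]   # drop the units row

# Guess each column's type again; as.character() also copes with
# factor or already-numeric columns
dat1[] <- lapply(dat1, function(col) type.convert(as.character(col), as.is = TRUE))
rownames(dat1) <- NULL   # renumber rows 1..n
str(dat1)
```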
Thanks a lot Rui and Arun.
The methods work fine with the data I gave, but when I tried them on the following semicolon-separated data using sep = ";", only the first 3 columns are read properly; the rest of the columns are either empty or NA.

**********************************************************************************************
Remove this line
Remove this line
Remove this line
Time;Actual Speed;Actual Direction;Temp;Press;Value1;Value2
;[m/s];[°];°C;[hPa];[MWh];[MWh]
1/1/2012;0.0;0;#N/A;#N/A;0.0000;0.0000
1/2/2012;0.0;0;#N/A;#N/A;0.0000;0.0000
1/3/2012;0.0;0;#N/A;#N/A;1.5651;2.2112
1/4/2012;0.0;0;#N/A;#N/A;1.0000;2.0000
1/5/2012;0.0;0;#N/A;#N/A;3.2578;7.5455
***********************************************************************************************

I used the following code:

dat1<-read.table("testInput.txt",sep=";",skip=3,fill=TRUE,header=TRUE)
dat1<-dat1[-1,]
row.names(dat1)<-1:nrow(dat1)

Could you please let me know what is wrong with this approach?

Thank you.

Ravi
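The blank columns come from read.table's default comment.char = "#": on each data line, everything from the first "#" (here the "#N/A" in Temp) is discarded, and fill = TRUE then pads the missing fields. A small sketch comparing the default with comment.char = "", using a shortened copy of the sample; the degree signs in the units row are replaced by ASCII text, an assumption made only to keep the snippet encoding-neutral:

```r
# Shortened copy of the sample; "[deg]" and "degC" stand in for the degree signs
txt <- c("Remove this line",
         "Remove this line",
         "Remove this line",
         "Time;Actual Speed;Actual Direction;Temp;Press;Value1;Value2",
         ";[m/s];[deg];degC;[hPa];[MWh];[MWh]",
         "1/1/2012;0.0;0;#N/A;#N/A;0.0000;0.0000",
         "1/3/2012;0.0;0;#N/A;#N/A;1.5651;2.2112")

# Default comment.char = "#": each data line is cut at "#N/A",
# so Temp through Value2 come out blank
bad <- read.table(text = txt, sep = ";", skip = 3, header = TRUE, fill = TRUE,
                  stringsAsFactors = FALSE)
bad$Value1

# With comment processing switched off, the whole line is kept
good <- read.table(text = txt, sep = ";", skip = 3, header = TRUE, fill = TRUE,
                   stringsAsFactors = FALSE, comment.char = "")
good$Value1
```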
Hello,
My approach was slightly different: use readLines to take care of the header and read.table for the data. This works with the new dataset you've posted, but we must use the option comment.char = "". Try the following.

head <- readLines("test.txt", n=4)[4]
# comment.char="" stops "#" from being treated as the start of a comment
dat <- read.table("test.txt", skip=5, sep=";", stringsAsFactors=FALSE, comment.char="")
names(dat) <- unlist(strsplit(head, ";"))

dat$Time <- as.Date(dat$Time, format="%m/%d/%Y")
dat$Temp[dat$Temp == '#N/A'] <- NA
dat$Press[dat$Press == '#N/A'] <- NA
dat

It works for me, good luck.

Rui Barradas

On 10-07-2012 06:41, vioravis wrote:
> Thanks a lot Rui and Arun. [...]
Or maybe it's better to coerce Temp and Press to numeric, if they are temperature and pressure variables.

dat$Time <- as.Date(dat$Time, format="%m/%d/%Y")
dat$Temp <- as.numeric(dat$Temp)
dat$Press <- as.numeric(dat$Press)

This turns those '#N/A' values into NA (as.numeric warns "NAs introduced by coercion").

Rui Barradas

On 10-07-2012 09:34, Rui Barradas wrote:
> My approach was slightly different, to use readLines to take care of the
> header and read.table for the data. [...]
Hello Ravi,
I was not aware that your dataset has the special character "#" before NA. If it were just plain NA, it would have worked, so the problem is not sep=";". See below:

#Without "#"
dat1<-read.table(text="
Remove this line
Remove this line
Remove this line
Time;Actual Speed;Actual Direction;Temp;Press;Value1;Value2
;[m/s];[°];°C;[hPa];[MWh];[MWh]
1/1/2012;0.0;0;NA;NA;0.0000;0.0000
1/2/2012;0.0;0;NA;NA;0.0000;0.0000
1/3/2012;0.0;0;NA;NA;1.5651;2.2112
1/4/2012;0.0;0;NA;NA;1.0000;2.0000
1/5/2012;0.0;0;NA;NA;3.2578;7.5455
",sep=";",header=TRUE,fill=TRUE,skip=4,stringsAsFactors=FALSE)
> dat1
      Time Actual.Speed Actual.Direction Temp Press Value1 Value2
1                 [m/s]              [°]   °C [hPa]  [MWh]  [MWh]
2 1/1/2012          0.0                0 <NA>  <NA> 0.0000 0.0000
3 1/2/2012          0.0                0 <NA>  <NA> 0.0000 0.0000
4 1/3/2012          0.0                0 <NA>  <NA> 1.5651 2.2112
5 1/4/2012          0.0                0 <NA>  <NA> 1.0000 2.0000
6 1/5/2012          0.0                0 <NA>  <NA> 3.2578 7.5455

#With "#": reading data from the .txt file.
# As the documentation (http://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html) shows, comment.char="#" is the default in read.table, so everything from "#" onwards is dropped and only blank columns appear after the first three.
#I think Rui's method of reading the header separately using readLines might be a good option. Or, if you know the column headings, you can do this:

dat2<-read.table("dat2.txt",skip=4,col.names=c("Time","Actual Speed","Actual Direction","Temp","Press","Value1","Value2"),fill=TRUE,sep=";",comment.char="")
> dat2
      Time Actual.Speed Actual.Direction Temp Press Value1 Value2
1                 [m/s]              [°]   °C [hPa]  [MWh]  [MWh]
2 1/1/2012          0.0                0  #NA   #NA 0.0000 0.0000
3 1/2/2012          0.0                0  #NA   #NA 0.0000 0.0000
4 1/3/2012          0.0                0  #NA   #NA 1.5651 2.2112
5 1/4/2012          0.0                0  #NA   #NA 1.0000 2.0000
6 1/5/2012          0.0                0  #NA   #NA 3.2578 7.5455

A.K.

----- Original Message -----
From: vioravis <[hidden email]>
To: [hidden email]
Sent: Tuesday, July 10, 2012 1:41 AM
Subject: Re: [R] Skipping lines and incomplete rows
You actually just need to say what the comment char and the na.strings are:

read.table(filename, sep=";", skip=3, header=TRUE, na.strings="#N/A", comment.char="")

Uwe Ligges

On 10.07.2012 19:30, arun wrote:
> Hello Ravi,
>
> I was not aware that your dataset have special character "#" before NA. [...]
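Combining the suggestions in this thread (the header read separately with readLines, skip = 5 to jump past the units row, and Uwe's na.strings and comment.char arguments) gives numeric columns directly, with no row to delete afterwards. A sketch, assuming the seven-column sample written to a temporary file, with ASCII stand-ins for the degree signs. The colClasses vector is an assumption that matches only this sample and would have to be built for the 585-column file; without it, an all-"#N/A" column such as Temp would come back as logical NA rather than numeric.

```r
# Sample data; "[deg]" and "degC" stand in for the degree signs
txt <- c("Remove this line",
         "Remove this line",
         "Remove this line",
         "Time;Actual Speed;Actual Direction;Temp;Press;Value1;Value2",
         ";[m/s];[deg];degC;[hPa];[MWh];[MWh]",
         "1/1/2012;0.0;0;#N/A;#N/A;0.0000;0.0000",
         "1/2/2012;0.0;0;#N/A;#N/A;0.0000;0.0000",
         "1/3/2012;0.0;0;#N/A;#N/A;1.5651;2.2112",
         "1/4/2012;0.0;0;#N/A;#N/A;1.0000;2.0000",
         "1/5/2012;0.0;0;#N/A;#N/A;3.2578;7.5455")
path <- tempfile(fileext = ".txt")
writeLines(txt, path)

# Column names come from line 4, split on ";"
hdr <- strsplit(readLines(path, n = 4)[4], ";", fixed = TRUE)[[1]]

# skip = 5 passes the junk lines, the header and the units row;
# comment.char = "" keeps "#" from truncating lines; na.strings maps "#N/A" to NA
dat <- read.table(path, sep = ";", skip = 5, col.names = hdr,
                  colClasses = c("character", rep("numeric", 6)),
                  na.strings = "#N/A", comment.char = "")
dat$Time <- as.Date(dat$Time, format = "%m/%d/%Y")
str(dat)
unlink(path)
```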
Thanks a lot for the guidance. I have another text file with a time stamp and an empty column as given below:
********************************************************************************************
First line: Skip this line
Second line: skip this line
Third line: skip this line
variable1 Variable2 Variable3 Variable4
Unit1 Unit2 Unit3
11/1/2004 0:00 0.1 0.001
11/1/2004 0:10 0.2 0.002
11/1/2004 0:20 0.3 0.003
11/1/2004 0:30 0.4 0.004
********************************************************************************************

This is a space-separated text file. When I use the following code:

head <- readLines("testInput.txt", n=4)[4]
dat <- read.table("testInput.txt", skip=5, sep="", fill = TRUE, stringsAsFactors=FALSE)
names(dat) <- unlist(strsplit(head, " "))

I get the following output:

> str(dat)
'data.frame':   4 obs. of  4 variables:
 $ variable1: chr  "11/1/2004" "11/1/2004" "11/1/2004" "11/1/2004"
 $ Variable2: chr  "0:00" "0:10" "0:20" "0:30"
 $ Variable3: num  0.1 0.2 0.3 0.4
 $ Variable4: num  0.001 0.002 0.003 0.004

The date and time in variable1 get split into variable1 and Variable2, whereas both should be part of variable1.

Also, the empty column is missing from the data frame.

Is there a way to handle these two cases?

Thank you.

Ravi
Hello,
That seems easy.

dat$variable1 <- with(dat, paste(variable1, Variable2))   # rejoin date and time
dat$Variable2 <- dat$Variable3
dat$Variable3 <- ""

Then convert variable1 to date/time using as.POSIXct or strptime. See ?strptime.

Hope this helps,

Rui Barradas

On 11-07-2012 13:30, vioravis wrote:
> Thanks a lot for the guidance. I have another text file with a time stamp and
> an empty column as given below: [...]
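A sketch of the date/time step, using as.POSIXct on a hypothetical data frame that mirrors str(dat) in Ravi's message. Two choices here are assumptions: the empty column is filled with NA_real_ rather than "" so that it stays numeric, and the time zone is set to "UTC" because the file does not state one.

```r
# Hypothetical data frame mirroring str(dat) from the question
dat <- data.frame(variable1 = rep("11/1/2004", 4),
                  Variable2 = c("0:00", "0:10", "0:20", "0:30"),
                  Variable3 = c(0.1, 0.2, 0.3, 0.4),
                  Variable4 = c(0.001, 0.002, 0.003, 0.004),
                  stringsAsFactors = FALSE)

# Rejoin date and time, then shift the values one column to the right
dat$variable1 <- paste(dat$variable1, dat$Variable2)
dat$Variable2 <- dat$Variable3
dat$Variable3 <- NA_real_   # the empty column, kept numeric

# Parse the timestamp; tz = "UTC" is an assumption, use the data's own zone
dat$variable1 <- as.POSIXct(dat$variable1, format = "%m/%d/%Y %H:%M", tz = "UTC")
str(dat)
```

Unlike as.Date, as.POSIXct keeps the time of day, which matters for ten-minute readings like these.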
Hello,
Try this:

dat3<-read.table("dat3.txt",sep="",skip=3,header=TRUE,fill=TRUE)
dat4<-data.frame(variable1=paste(dat3[,1],dat3[,2],sep=" "),Variable2=dat3[,3],Variable3="",Variable4=dat3[,4])
dat4<-dat4[-1,]                  # drop the units row
row.names(dat4)<-1:nrow(dat4)
dat4
       variable1 Variable2 Variable3 Variable4
1 11/1/2004 0:00       0.1               0.001
2 11/1/2004 0:10       0.2               0.002
3 11/1/2004 0:20       0.3               0.003
4 11/1/2004 0:30       0.4               0.004

#If you need to convert the date to class "Date" (this drops the time of day):
dat4$variable1<-as.Date(dat4[,1],format="%m/%d/%Y %H:%M")

A.K.

----- Original Message -----
From: vioravis <[hidden email]>
To: [hidden email]
Sent: Wednesday, July 11, 2012 8:30 AM
Subject: Re: [R] Skipping lines and incomplete rows
