Skipping lines and incomplete rows

Skipping lines and incomplete rows

vioravis
I have a text file with semicolon-separated values. The table is nearly 10,000 by 585. The file looks as follows:

*******************************************
First line: Skip this line
Second line: skip this line
Third line: skip this line
variable1 Variable2 Variable3 Variable4
                   Unit1         Unit2         Unit3
10              0.1               0.01           0.001
20              0.2               0.02           0.002
30              0.3               0.03           0.003
40              0.4               0.04           0.004
*******************************************

The first three lines need to be skipped. Moreover, line 5 doesn't have units for all the variables and hence has to be skipped as well. Effectively, I want the following to be read into a data frame, skipping rows 1, 2, 3 and 5.

*******************************************
variable1 Variable2 Variable3 Variable4
10              0.1               0.01           0.001
20              0.2               0.02           0.002
30              0.3               0.03           0.003
40              0.4               0.04           0.004
*******************************************

I tried using read.table with skip for lines 1-3 as follows:

inputData <- read.table("test.txt", sep = ";", skip = 3)

but line 4 causes a problem, with the following error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 3 did not have 585 elements

Can someone help me with this?

Thank you.

Ravi

Re: Skipping lines and incomplete rows

Rui Barradas
Hello,

Try the following.

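# Read the column names from line 4 (the header line).
head <- readLines("test.txt", n=4)[4]
# Skip the 3 unwanted lines, the header and the units line (5 lines in total).
dat <- read.table("test.txt", skip=5)
names(dat) <- unlist(strsplit(head, " "))
dat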
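
For anyone who wants to try this without the original file, here is a minimal, self-contained sketch of the same idea. The temporary file and the names `sample_lines`, `tmp` and `hdr` are illustrative assumptions, and the data is the small whitespace-separated sample from the question, not the real 585-column file.

```r
# Recreate the sample from the question in a temporary file (illustrative data only).
sample_lines <- c(
  "First line: Skip this line",
  "Second line: skip this line",
  "Third line: skip this line",
  "variable1 Variable2 Variable3 Variable4",
  "                   Unit1         Unit2         Unit3",
  "10              0.1               0.01           0.001",
  "20              0.2               0.02           0.002",
  "30              0.3               0.03           0.003",
  "40              0.4               0.04           0.004"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Line 4 holds the column names; scan() splits on any run of whitespace.
hdr <- scan(text = readLines(tmp, n = 4)[4], what = "", quiet = TRUE)

# Skip the three unwanted lines, the header and the units line, then attach the names.
dat <- read.table(tmp, skip = 5, header = FALSE, col.names = hdr)
str(dat)
```

Splitting the header with scan() rather than strsplit(head, " ") is a design choice for the sketch: it should tolerate header names separated by more than one space.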


hope this helps,

Rui Barradas

(In reply to vioravis's message of 09-07-2012 11:23.)


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Skipping lines and incomplete rows

arun kirshna
In reply to this post by vioravis


Hi,

I guess you should have "fill=TRUE" in the read.table.

dat1<-read.table(text="
First line: Skip this line
Second line: skip this line
Third line: skip this line
variable1 Variable2 Variable3 Variable4
                  Unit1        Unit2        Unit3
10              0.1              0.01          0.001
20              0.2              0.02          0.002
30              0.3              0.03          0.003
40              0.4              0.04          0.004
",sep="",skip=4, fill=TRUE,header=TRUE)
dat1<-dat1[-1,]
row.names(dat1)<-1:nrow(dat1)
dat1
  variable1 Variable2 Variable3 Variable4
1        10       0.1      0.01     0.001
2        20       0.2      0.02     0.002
3        30       0.3      0.03     0.003
4        40       0.4      0.04     0.004
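
One thing to watch with this approach: because the units row is read as data, the columns that contain it are likely to come back as text rather than numbers, even after that row is dropped. The sketch below, using the same sample and the illustrative names `sample_lines` and `is_chr`, shows one way to re-detect the column types afterwards with type.convert(). It assumes stringsAsFactors = FALSE so that text columns are plain character vectors.

```r
# Same sample as in the question, passed as a character vector (one element per line).
sample_lines <- c(
  "First line: Skip this line",
  "Second line: skip this line",
  "Third line: skip this line",
  "variable1 Variable2 Variable3 Variable4",
  "                   Unit1         Unit2         Unit3",
  "10              0.1               0.01           0.001",
  "20              0.2               0.02           0.002",
  "30              0.3               0.03           0.003",
  "40              0.4               0.04           0.004"
)

# fill = TRUE pads the short units row, so it is kept as the first data row.
dat1 <- read.table(text = sample_lines, skip = 3, header = TRUE,
                   fill = TRUE, stringsAsFactors = FALSE)

# Drop the units row, then let R re-detect the type of every text column.
dat1 <- dat1[-1, ]
is_chr <- vapply(dat1, is.character, logical(1))
dat1[is_chr] <- lapply(dat1[is_chr], type.convert, as.is = TRUE)
row.names(dat1) <- NULL
str(dat1)
```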


A.K.





Re: Skipping lines and incomplete rows

arun kirshna
In reply to this post by vioravis
Hello,

I have just checked reading directly from a .txt file instead of the inline text shown in my earlier reply.

# Use skip=3 instead of 4: the inline text began with an empty line, the file does not.

dat1<-read.table("dat1.txt",sep="",skip=3,fill=TRUE,header=TRUE)
dat1<-dat1[-1,]
row.names(dat1)<-1:nrow(dat1)
dat1
  variable1 Variable2 Variable3 Variable4
1        10       0.1      0.01     0.001
2        20       0.2      0.02     0.002
3        30       0.3      0.03     0.003
4        40       0.4      0.04     0.004

Hope it works.


A.K.





Re: Skipping lines and incomplete rows

vioravis
Thanks a lot Rui and Arun.

The methods work fine with the data I gave, but when I tried the two methods on the following semicolon-separated data using sep = ";", only the first 3 columns are read properly; the rest of the columns are either empty or NA.


**********************************************************************************************
Remove this line
Remove this line
Remove this line
Time;Actual Speed;Actual Direction;Temp;Press;Value1;Value2
;[m/s];[°];°C;[hPa];[MWh];[MWh]
1/1/2012;0.0;0;#N/A;#N/A;0.0000;0.0000
1/2/2012;0.0;0;#N/A;#N/A;0.0000;0.0000
1/3/2012;0.0;0;#N/A;#N/A;1.5651;2.2112
1/4/2012;0.0;0;#N/A;#N/A;1.0000;2.0000
1/5/2012;0.0;0;#N/A;#N/A;3.2578;7.5455
***********************************************************************************************

I used the following code:
dat1<-read.table("testInput.txt",sep=";",skip=3,fill=TRUE,header=TRUE)
dat1<-dat1[-1,]
row.names(dat1)<-1:nrow(dat1)

Could you please let me know what is wrong with this approach?

Thank you.

Ravi

Re: Skipping lines and incomplete rows

Rui Barradas
Hello,

My approach was slightly different, to use readLines to take care of the
header and read.table for the data. This works with the new dataset
you've posted, but we must use the option comment.char = "".

Try the following.


head <- readLines("test.txt", n=4)[4]
dat <- read.table("test.txt", skip=5, sep=";", stringsAsFactors=FALSE,
                  comment.char="")
names(dat) <- unlist(strsplit(head, ";"))

dat$Time <- as.Date(dat$Time, format="%m/%d/%Y")
dat$Temp[dat$Temp == '#N/A'] <- NA
dat$Press[dat$Press == '#N/A'] <- NA
dat
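
To see why comment.char matters here, the following small sketch counts the fields that R finds in one of the data rows, with and without comment handling. The temporary file and the names `n_default` and `n_all` are illustrative assumptions.

```r
# One data row from the posted sample, written to a temporary file.
row <- "1/1/2012;0.0;0;#N/A;#N/A;0.0000;0.0000"
tmp <- tempfile(fileext = ".txt")
writeLines(row, tmp)

# Default comment.char = "#": everything from the first "#" onwards is discarded.
n_default <- count.fields(tmp, sep = ";")

# comment.char = "" switches comment handling off, so all seven fields are seen.
n_all <- count.fields(tmp, sep = ";", comment.char = "")

c(default = n_default, no_comments = n_all)
```

count.fields() takes the same sep and comment.char arguments as read.table(), which makes it a convenient way to check how a problematic line is being split before reading the whole file.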


It works with me, good luck.

Rui Barradas

(In reply to vioravis's message of 10-07-2012 06:41.)


Re: Skipping lines and incomplete rows

Rui Barradas
Or maybe it's better to coerce Temp and Press to numeric, if they are
the variables temperature and pressure.

dat$Time <- as.Date(dat$Time, format="%m/%d/%Y")
dat$Temp <- as.numeric(dat$Temp)
dat$Press <- as.numeric(dat$Press)

This makes those '#N/A' values NA.
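
A minimal sketch of that coercion on a made-up vector (the values and the names `temp_chr`, `temp_num` and `temp_fac` are illustrative, not taken from the real file). It also covers the case where the column was read as a factor, which is an assumption about how the data might have been loaded if stringsAsFactors=FALSE was not used.

```r
# Values as they arrive when "#N/A" is read as plain text (illustrative vector).
temp_chr <- c("#N/A", "12.5", "#N/A", "13.1")

# as.numeric() turns anything it cannot parse into NA and raises a coercion warning,
# which is suppressed here because the NAs are intended.
temp_num <- suppressWarnings(as.numeric(temp_chr))
temp_num

# If the column was read as a factor, convert to character first;
# as.numeric() applied directly to a factor returns the internal level codes.
temp_fac <- factor(temp_chr)
temp_from_factor <- suppressWarnings(as.numeric(as.character(temp_fac)))
identical(temp_num, temp_from_factor)
```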

Rui Barradas

(In reply to Rui Barradas's message of 10-07-2012 09:34.)


Re: Skipping lines and incomplete rows

arun kirshna
In reply to this post by vioravis
Hello Ravi,

I was not aware that your dataset has the special character "#" before NA.  If it were just plain NA, it would have worked.  So it's not because of sep = ";".

See below:

#Without "#"
dat1<-read.table(text="
 Remove this line
 Remove this line
 Remove this line
 Time;Actual Speed;Actual Direction;Temp;Press;Value1;Value2
  ;[m/s];[°];°C;[hPa];[MWh];[MWh]
 1/1/2012;0.0;0;NA;NA;0.0000;0.0000
 1/2/2012;0.0;0;NA;NA;0.0000;0.0000
 1/3/2012;0.0;0;NA;NA;1.5651;2.2112
 1/4/2012;0.0;0;NA;NA;1.0000;2.0000
 1/5/2012;0.0;0;NA;NA;3.2578;7.5455
 ",sep=";",header=TRUE,fill=TRUE,skip=4,stringsAsFactors=FALSE)
> dat1
      Time Actual.Speed Actual.Direction Temp Press Value1 Value2
1                 [m/s]              [°]   °C [hPa]  [MWh]  [MWh]
2 1/1/2012          0.0                0 <NA>  <NA> 0.0000 0.0000
3 1/2/2012          0.0                0 <NA>  <NA> 0.0000 0.0000
4 1/3/2012          0.0                0 <NA>  <NA> 1.5651 2.2112
5 1/4/2012          0.0                0 <NA>  <NA> 1.0000 2.0000
6 1/5/2012          0.0                0 <NA>  <NA> 3.2578 7.5455


#With "#": Reading data from the .txt file.

# In the documentation (http://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html), comment.char="#" is the default in read.table, so everything after the first "#" on a line is treated as a comment. That is why only blank columns show after the first three columns.


#I think Rui's method of reading the header separately using readLines might be a good option.  Or, if you know the column headings, you can do this:

dat2<-read.table("dat2.txt",skip=4,col.names=c("Time","Actual Speed","Actual Direction", "Temp","Press","Value1","Value2"),fill=TRUE,sep=";",comment.char="")
> dat2
      Time Actual.Speed Actual.Direction Temp Press Value1 Value2
1                 [m/s]              [°]   °C [hPa]  [MWh]  [MWh]
2 1/1/2012          0.0                0  #NA   #NA 0.0000 0.0000
3 1/2/2012          0.0                0  #NA   #NA 0.0000 0.0000
4 1/3/2012          0.0                0  #NA   #NA 1.5651 2.2112
5 1/4/2012          0.0                0  #NA   #NA 1.0000 2.0000
6 1/5/2012          0.0                0  #NA   #NA 3.2578 7.5455


A.K.











Re: Skipping lines and incomplete rows

Uwe Ligges-3
You actually just need to say what the comment char and the
na.strings are:

read.table(filename, sep=";", skip=3, header=TRUE, na.strings="#N/A",
           comment.char="")
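
Putting the pieces of the thread together, here is a hedged, self-contained sketch on the posted sample. With skip=3 and header=TRUE the units line would still appear to be read as the first data row, so this version reads the header separately and skips five lines instead. The temporary file, the names `sample_lines`, `tmp` and `hdr`, and the ASCII stand-in for the units line (used to avoid encoding issues with the degree sign) are all assumptions made for illustration.

```r
# Sample from the thread; the units line uses ASCII stand-ins for the degree sign.
sample_lines <- c(
  "Remove this line",
  "Remove this line",
  "Remove this line",
  "Time;Actual Speed;Actual Direction;Temp;Press;Value1;Value2",
  ";[m/s];[deg];degC;[hPa];[MWh];[MWh]",
  "1/1/2012;0.0;0;#N/A;#N/A;0.0000;0.0000",
  "1/2/2012;0.0;0;#N/A;#N/A;0.0000;0.0000",
  "1/3/2012;0.0;0;#N/A;#N/A;1.5651;2.2112",
  "1/4/2012;0.0;0;#N/A;#N/A;1.0000;2.0000",
  "1/5/2012;0.0;0;#N/A;#N/A;3.2578;7.5455"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Column names come from line 4; make.names() turns "Actual Speed" into "Actual.Speed".
hdr <- strsplit(readLines(tmp, n = 4)[4], ";", fixed = TRUE)[[1]]

# Skip the 3 unwanted lines, the header and the units line; treat "#N/A" as missing
# and switch off comment handling so "#" does not truncate the rows.
dat <- read.table(tmp, sep = ";", skip = 5, header = FALSE,
                  col.names = make.names(hdr), na.strings = "#N/A",
                  comment.char = "", stringsAsFactors = FALSE)

# Dates are assumed to be month/day/year, as elsewhere in the thread.
dat$Time <- as.Date(dat$Time, format = "%m/%d/%Y")
str(dat)
```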

Uwe Ligges


(In reply to arun's message of 10.07.2012 19:30.)


Re: Skipping lines and incomplete rows

vioravis
Thanks a lot for the guidance. I have another text file with a time stamp and an empty column as given below:

********************************************************************************************
First line: Skip this line
Second line: skip this line
Third line: skip this line
variable1 Variable2 Variable3 Variable4
                Unit1     Unit2     Unit3
11/1/2004 0:00  0.1                 0.001
11/1/2004 0:10  0.2                 0.002
11/1/2004 0:20  0.3                 0.003
11/1/2004 0:30  0.4                 0.004
********************************************************************************************

This is a space-separated text file. When I use the following code:

head <- readLines("testInput.txt", n=4)[4]
dat <- read.table("testInput.txt", skip=5, sep="",fill = TRUE, stringsAsFactors=FALSE)
names(dat) <- unlist(strsplit(head, " "))

I get the following output:

> str(dat)
'data.frame':   4 obs. of  4 variables:
 $ variable1: chr  "11/1/2004" "11/1/2004" "11/1/2004" "11/1/2004"
 $ Variable2: chr  "0:00" "0:10" "0:20" "0:30"
 $ Variable3: num  0.1 0.2 0.3 0.4
 $ Variable4: num  0.001 0.002 0.003 0.004

Variable1's date and time get split into Variable1 and Variable2, whereas they should both be part of Variable1.

Also, the empty column is missing from the data frame.

Is there a way to handle these two cases?

Thank you.

Ravi

Re: Skipping lines and incomplete rows

Rui Barradas
Hello,

That seems easy.

dat$variable1 <- with(dat, paste(variable1, Variable2))
dat$Variable2 <- dat$Variable3
dat$Variable3 <- ""

Then convert variable1 to date/time using as.POSIXct or strptime.

See ?strptime.
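
A minimal sketch of those steps on a hand-built data frame that mimics the str(dat) output from the question. The data frame itself, the use of NA instead of an empty string for the empty column, and tz = "UTC" are assumptions made for illustration; the month/day/year order is assumed as well.

```r
# Data frame shaped like the str(dat) output in the question (illustrative values).
dat <- data.frame(
  variable1 = rep("11/1/2004", 4),
  Variable2 = c("0:00", "0:10", "0:20", "0:30"),
  Variable3 = c(0.1, 0.2, 0.3, 0.4),
  Variable4 = c(0.001, 0.002, 0.003, 0.004),
  stringsAsFactors = FALSE
)

# Rejoin date and time, shift the real Variable2 back, and mark Variable3 as empty.
dat$variable1 <- paste(dat$variable1, dat$Variable2)
dat$Variable2 <- dat$Variable3
dat$Variable3 <- NA_real_

# as.POSIXct keeps the time of day; a fixed time zone avoids daylight-saving surprises.
dat$variable1 <- as.POSIXct(dat$variable1, format = "%m/%d/%Y %H:%M", tz = "UTC")
str(dat)
```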

Hope this helps,

Rui Barradas

Em 11-07-2012 13:30, vioravis escreveu:


Re: Skipping lines and incomplete rows

arun kirshna
In reply to this post by vioravis
Hello,
Try this:
dat3<-read.table("dat3.txt",sep="",skip=3,header=TRUE,fill=TRUE)
dat4<-data.frame(variable1=paste(dat3[,1],dat3[,2],sep=" "),Variable2=dat3[,3],Variable3="",Variable4=dat3[,4])
dat4<-dat4[-1,]
row.names(dat4)<-1:nrow(dat4)
dat4
      variable1 Variable2 Variable3 Variable4
1 11/1/2004 0:00       0.1               0.001
2 11/1/2004 0:10       0.2               0.002
3 11/1/2004 0:20       0.3               0.003
4 11/1/2004 0:30       0.4               0.004
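#If you need to convert the date to class "Date" (note that as.Date keeps only the date and drops the time of day; use as.POSIXct to keep the time)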
dat4$variable1<-as.Date(dat4[,1],format="%m/%d/%Y %H:%M")
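
If the file really is laid out in fixed-width columns, which is an assumption based only on how the sample is displayed in the question, read.fwf() may be able to recover the empty column directly instead of rebuilding it by hand. The widths c(16, 10, 10, 10), the temporary file and the names `fmt`, `sample_lines` and `tmp` below are hypothetical and would have to be adapted to the real file.

```r
# Build a fixed-width sample; the column widths (16, 10, 10, then the rest) are assumed.
fmt <- "%-16s%-10s%-10s%s"
sample_lines <- c(
  "First line: Skip this line",
  "Second line: skip this line",
  "Third line: skip this line",
  "variable1 Variable2 Variable3 Variable4",
  sprintf(fmt, "", "Unit1", "Unit2", "Unit3"),
  sprintf(fmt, paste("11/1/2004", c("0:00", "0:10", "0:20", "0:30")),
          c("0.1", "0.2", "0.3", "0.4"), "",
          c("0.001", "0.002", "0.003", "0.004"))
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Cut every data line at fixed positions, so the blank third column is preserved.
dat <- read.fwf(tmp, widths = c(16, 10, 10, 10), skip = 5,
                col.names = c("variable1", "Variable2", "Variable3", "Variable4"),
                strip.white = TRUE, stringsAsFactors = FALSE)

# Date and time stay together in the first field; month/day/year order is assumed.
dat$variable1 <- as.POSIXct(dat$variable1, format = "%m/%d/%Y %H:%M", tz = "UTC")
str(dat)
```

With whitespace as the separator there is no way for read.table() to tell an empty column from extra padding, which is why the sketch switches to position-based splitting.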
A.K.



