Extraneous full stop in csv read


JohnDee
I ran into a puzzling minor behaviour I would like to understand.
Reading in a csv file, I find an extraneous "." after a column header,
"in" [short for "inches"], which becomes "in.". Is this due to "in" being
reserved?  I initially blamed this on RStudio or on processing the data
through LibreCalc. However, the same result occurs in a console R
session.  Sending the file to the console via less reveals no strange
characters in the first line.  The data is California statewide
rainfall, which was screen-captured from the Western Regional Climate
Center web site.

First 15 lines including header line:

"yr","mo","Data","in"
1895,1,8243,8.243
1895,2,2265,2.265
1895,3,2340,2.34
1895,4,1014,1.014
1895,5,1281,1.281
1895,6,58,0.058
1895,7,156,0.156
1895,8,140,0.14
1895,9,1087,1.087
1895,10,322,0.322
1895,11,1331,1.331
1895,12,2428,2.428
1896,1,7156,7.156
1896,2,712,0.712
1896,3,2982,2.982

File read in as follows:

x <- read.csv('DRI-mo-prp.csv', header = TRUE)

Structure:

 str(x)
'data.frame':   1469 obs. of  4 variables:
 $ yr  : int  1895 1895 1895 1895 1895 1895 1895 1895 1895 1895 ...
 $ mo  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Data: int  8243 2265 2340 1014 1281 58 156 140 1087 322 ...
 $ in. : num  8.24 2.27 2.34 1.01 1.28 ...
[note "in" is now "in."]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Extraneous full stop in csv read

jholtman
try the 'read_csv' function in the 'readr' package:

> x <- readr::read_csv('"yr","mo","Data","in"
+ 1895,1,8243,8.243
+ 1895,2,2265,2.265
+ 1895,3,2340,2.34
+ 1895,4,1014,1.014
+ 1895,5,1281,1.281
+ 1895,6,58,0.058
+ 1895,7,156,0.156
+ 1895,8,140,0.14
+ 1895,9,1087,1.087
+ 1895,10,322,0.322
+ 1895,11,1331,1.331
+ 1895,12,2428,2.428
+ 1896,1,7156,7.156
+ 1896,2,712,0.712
+ 1896,3,2982,2.982
+ ')
> str(x)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 15 obs. of  4 variables:
 $ yr  : int  1895 1895 1895 1895 1895 1895 1895 1895 1895 1895 ...
 $ mo  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Data: int  8243 2265 2340 1014 1281 58 156 140 1087 322 ...
 $ in  : num  8.24 2.27 2.34 1.01 1.28 ...


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Re: Extraneous full stop in csv read

jholtman
In reply to this post by JohnDee
Or use the 'check.names = FALSE' argument to read.csv:

> x <- read.csv(text = '"yr","mo","Data","in"
+ 1895,1,8243,8.243
+ 1895,2,2265,2.265
+ 1895,3,2340,2.34
+ 1895,4,1014,1.014
+ 1895,5,1281,1.281
+ 1895,6,58,0.058
+ 1895,7,156,0.156
+ 1895,8,140,0.14
+ 1895,9,1087,1.087
+ 1895,10,322,0.322
+ 1895,11,1331,1.331
+ 1895,12,2428,2.428
+ 1896,1,7156,7.156
+ 1896,2,712,0.712
+ 1896,3,2982,2.982
+ ', check.names = FALSE)
> str(x)
'data.frame': 15 obs. of  4 variables:
 $ yr  : int  1895 1895 1895 1895 1895 1895 1895 1895 1895 1895 ...
 $ mo  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Data: int  8243 2265 2340 1014 1281 58 156 140 1087 322 ...
 $ in  : num  8.24 2.27 2.34 1.01 1.28 ...


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Re: Extraneous full stop in csv read

David Winsemius
In reply to this post by JohnDee


If I change one of those other headers to "for", I also see the period suffix appended, which supports your theory about reserved words being protected. If for some reason this were important to you, then I'd suggest first looking at the code for make.names, which in turn indicates that it's done with a .Internal call, so you'll need to look at the source code for the base package.
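
A quick way to see this without reading any file is to call make.names directly on the header names. This is only a minimal sketch: the vector hdr is made up for illustration (the four headers from the posted file plus "for"), and it assumes read.csv passes headers through make.names when check.names is left at its default.

```r
# Header names from the posted file, plus "for" as a second reserved word.
# 'hdr' is a hypothetical vector used only for this illustration.
hdr <- c("yr", "mo", "Data", "in", "for")

# make.names() turns each string into a syntactically valid R name;
# reserved words get a "." appended, valid names pass through unchanged.
fixed <- make.names(hdr)
print(fixed)

# Show which of the original names were altered.
changed <- hdr[hdr != fixed]
print(changed)
```

Only "in" and "for" should show up in changed; "yr", "mo" and "Data" are already valid names.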
--
David.


David Winsemius
Alameda, CA, USA


Re: Extraneous full stop in csv read

Duncan Murdoch-2
In reply to this post by JohnDee

Yes, "in" is not a valid variable name, because of its syntactic use.
You can stop this correction by setting check.names=FALSE in your call
to read.csv.  This will make it a little tricky to deal with in some
situations, e.g.

 > x <- data.frame(4)
 > names(x) <- "in"
 > x
   in
1  4
 > x$in
Error: unexpected 'in' in "x$in"

but you can work around this problem: x[, "in"] and x$`in` are both fine.
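
Putting the pieces together, the following is a minimal sketch of both routes: keeping the header as "in" and quoting it on access, or renaming the column to something syntactic after reading. The inline csv_text (the first two data rows of the posted file) and the replacement name "inches" are assumptions chosen for illustration only.

```r
# Two rows of the posted data, inlined so the example is self-contained.
csv_text <- '"yr","mo","Data","in"
1895,1,8243,8.243
1895,2,2265,2.265'

# check.names = FALSE keeps the header exactly as written in the file.
x <- read.csv(text = csv_text, check.names = FALSE)
orig_names <- names(x)
print(orig_names)

# A non-syntactic name needs quoting or backticks when accessed.
by_bracket  <- x[["in"]]
by_backtick <- x$`in`

# Alternatively, rename the column once so plain $ access works afterwards.
# "inches" is an arbitrary choice for this example.
names(x)[names(x) == "in"] <- "inches"
print(x$inches)
```

Renaming once after the read avoids having to remember the backticks in every later expression.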

Duncan Murdoch
