Reading in csv with footer

Noah Silverman
Hi,

I have a CSV file that is formatted well, except that the last line is a "summary" that is not in CSV format.

Toy example:

label_1, label_2, label_3
1,2,3
3,2,4
2,3,4
Total Rows: 3


When I try to import this into R with:  d <- read.table("foo.csv", header=T, sep=",")
It fails to import properly because of the last line.

Currently, I have a shell script that strips the last line from the file, then it imports to R cleanly.  I don't like this extra layer of processing.

Is there a way to import something like this cleanly in R?

Thanks!

--
Noah
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Reading in csv with footer

Steve Lianoglou-6
Hi,

On Sun, Feb 12, 2012 at 7:05 PM, Noah Silverman <[hidden email]> wrote:

> Is there a way to import something like this cleanly in R.

This is arguably the file's problem, so I'm not sure how many "clean"
solutions you will find, but one thing you can do is perhaps count the
number of lines in the file, then set the `nrows` argument in your
call to read.table to be 1 less than that.

How to count the lines, though? Assuming you're on *nix (or have
cygwin), you can do something like:

N <- system("wc -l /path/to/file.csv", intern = TRUE)

(you'll have to do some parsing on N)
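
A minimal sketch of that parsing step, assuming a Unix-like system with `wc` on the PATH. `system()` returns the command's exit status unless `intern = TRUE` is given, so the output has to be captured explicitly. The temporary file and the variable names are illustrative and not part of the original post. Because the file has a header as well as a footer, the number of data rows here is the line count minus two.

```r
# Recreate the toy file from the original post in a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c("label_1, label_2, label_3",
             "1,2,3", "3,2,4", "2,3,4",
             "Total Rows: 3"), path)

# intern = TRUE captures the command's output instead of its exit status.
# wc -l prints "<count> <path>", sometimes with leading spaces.
out <- system(paste("wc -l", shQuote(path)), intern = TRUE)
n_lines <- as.integer(strsplit(trimws(out), "[[:space:]]+")[[1]][1])

# Line count minus the header line and the footer line.
d <- read.table(path, header = TRUE, sep = ",", nrows = n_lines - 2)
str(d)
```

Note that `wc -l` counts newline characters, so a file whose footer lacks a trailing newline will report one line fewer than it visibly contains.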

You could also first call `readLines` and find the length of the
result, but this would require you to read the file twice, so ... pick
your poison.

Too bad the person authoring the file doesn't prefix those lines with
some comment character ...

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

Re: Reading in csv with footer

Henrique Dallazuanna
In reply to this post by Noah Silverman
This works for me:

Lines <- "label_1, label_2, label_3
1,2,3
3,2,4
2,3,4
Total Rows: 3"

d <- head(read.csv(textConnection(Lines)), -1)
closeAllConnections()

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


Re: Reading in csv with footer

Noah Silverman
In reply to this post by Steve Lianoglou-6
Thanks Steve,

Your suggestion about nrows seems like the easiest.

Thanks!

--
Noah Silverman
UCLA Department of Statistics
8208 Math Sciences Building
Los Angeles, CA 90095

On Feb 12, 2012, at 4:23 PM, Steve Lianoglou wrote:

> This is arguably the file's problem, so I'm not sure how many "clean"
> solutions you will find, but one thing you can do is perhaps count the
> number of lines in the file, then set the `nrows` argument in your
> call to read.table to be 1 less than that.

Re: Reading in csv with footer

Noah Silverman
In reply to this post by Henrique Dallazuanna
Nice one!!!

Thanks.

--
Noah Silverman
UCLA Department of Statistics
8208 Math Sciences Building
Los Angeles, CA 90095

On Feb 12, 2012, at 4:26 PM, Henrique Dallazuanna wrote:

> This works for me:
>
> Lines <- "label_1, label_2, label_3
> 1,2,3
> 3,2,4
> 2,3,4
> Total Rows: 3"
>
> d <- head(read.csv(textConnection(Lines)), -1)
> closeAllConnections()

Re: Reading in csv with footer

William Dunlap
In reply to this post by Henrique Dallazuanna
That prints nicely, but the first column in the
result got turned into a factor:
  > d <- head(read.csv(textConnection(Lines)), -1)
  > str(d)
  'data.frame':   3 obs. of  3 variables:
   $ label_1: Factor w/ 4 levels "1","2","3","Total Rows: 3": 1 3 2
   $ label_2: int  2 2 3
   $ label_3: int  3 4 4
(Remove the call to head and you will see why.)
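
A short sketch of what removing `head` reveals, using the `text` argument of `read.csv` in place of an explicit `textConnection`. Because `read.csv` fills short rows, the footer is parsed as a fourth row, and its text forces the whole first column to be non-numeric. The explicit conversion at the end is one possible repair and is an addition here, not something proposed in the thread.

```r
Lines <- "label_1, label_2, label_3
1,2,3
3,2,4
2,3,4
Total Rows: 3"

# Read everything, footer included.
full <- read.csv(text = Lines)
print(full)            # the footer appears as a fourth row padded with NA
class(full$label_1)    # "factor" before R 4.0.0, "character" from 4.0.0 on

# Dropping the row afterwards does not undo the type change,
# so the column has to be converted back explicitly.
d <- head(full, -1)
d$label_1 <- as.integer(as.character(d$label_1))
str(d)
```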

You could use head(,-1) on the output of readLines so
read.csv never sees the last value:
  > d2 <- read.csv(textConnection(head(readLines(textConnection(Lines)), -1)))
  > str(d2)
  'data.frame':   3 obs. of  3 variables:
   $ label_1: int  1 3 2
   $ label_2: int  2 2 3
   $ label_3: int  3 4 4
or you could use a pipe connection that called the shell script.
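
For a file on disk, the `readLines` idea can be wrapped in a small helper. This is only a sketch: the function name `read_csv_drop_footer` and its `n_footer` parameter are hypothetical, and the whole file is held in memory as text before parsing. Passing the kept lines through the `text` argument means no `textConnection` has to be closed by hand.

```r
# Hypothetical helper: read a CSV whose last n_footer lines are not data.
read_csv_drop_footer <- function(path, n_footer = 1L, ...) {
  lines <- readLines(path)
  # Keep everything except the trailing footer lines.
  keep <- lines[seq_len(max(length(lines) - n_footer, 0L))]
  # read.csv parses the character vector directly via text=.
  read.csv(text = keep, ...)
}

# Recreate the toy file from the original post and read it.
path <- tempfile(fileext = ".csv")
writeLines(c("label_1, label_2, label_3",
             "1,2,3", "3,2,4", "2,3,4",
             "Total Rows: 3"), path)

d2 <- read_csv_drop_footer(path)
str(d2)
```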

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

Re: Reading in csv with footer

Rolf Turner-3
In reply to this post by Noah Silverman
On 13/02/12 13:05, Noah Silverman wrote:

> Is there a way to import something like this cleanly in R.

How clean is clean?

You need to count the number of lines in the file, and then set the nrows
argument of read.csv() to be two less.  (*Two* rather than one, because of the
header.)

Counting the lines --- three possibilities that I can see:

     (1) nlines() from the "parser" package
     (2) countLines() from the "R.utils" package
     (3) brute force:
         x <- readLines("foo.csv")
         n <- length(x)

Having determined n, do:

         y <- read.csv("foo.csv", nrows = n - 2)
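
The brute-force variant can be sketched end to end as follows. The temporary file stands in for the real one, and the final cross-check assumes the footer always has the form "Total Rows: <count>", which is true of the toy example but may not hold for other files.

```r
# Recreate the toy file from the original post.
path <- tempfile(fileext = ".csv")
writeLines(c("label_1, label_2, label_3",
             "1,2,3", "3,2,4", "2,3,4",
             "Total Rows: 3"), path)

# Count the lines, then read all but the header and the footer.
lines <- readLines(path)
n <- length(lines)
y <- read.csv(path, nrows = n - 2)

# Assumption: the footer looks like "Total Rows: <count>".
# If so, it can double as a sanity check on the import.
declared <- as.integer(sub("^Total Rows:[[:space:]]*", "", lines[n]))
stopifnot(nrow(y) == declared)
str(y)
```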

     cheers,

         Rolf Turner

Re: Reading in csv with footer

chuck.01
In reply to this post by Noah Silverman
I believe this should work

d <- read.table("foo.csv", header=TRUE, sep=",", comment.char="T")

although it's spitting back a warning... this used to work for me.
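
A sketch of the comment-character trick on the toy file, with one caveat made explicit: `comment.char` applies anywhere in a line, so a capital "T" in the header or in any data field would cut that line short. The guard below is an addition for illustration and assumes the footer is exactly one line.

```r
# Recreate the toy file from the original post.
path <- tempfile(fileext = ".csv")
writeLines(c("label_1, label_2, label_3",
             "1,2,3", "3,2,4", "2,3,4",
             "Total Rows: 3"), path)

# Guard: refuse to proceed if any non-footer line contains the
# character that is about to be treated as a comment marker.
lines <- readLines(path)
stopifnot(!any(grepl("T", head(lines, -1), fixed = TRUE)))

# Everything from "T" to the end of a line is ignored, which
# removes the "Total Rows: 3" footer.
d <- read.table(path, header = TRUE, sep = ",", comment.char = "T")
str(d)
```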


