read.csv and field containing single quotes

Benilton Carvalho
I need to read in CSV files, created by a third party, that have fields
containing a lone (unescaped) double-quote character, as shown below.

"header1","header2","header3","header4"
"field1r1","field2r1","field3r1","field4r1"
"field1r2","field2r2","field3r2PartA), field3r2PartB Very" Long","field4r2"
"field1r3","field2r3","field3r3","field4r3"


read.csv(filename, quote="\"'", header=TRUE) won't read the file
represented above unless the third line has Very"" (a doubled quote)
instead of Very" (a lone quote). This is documented on the scan() help
page: with a non-default sep, a quote inside a quoted field has to be
escaped by doubling it.
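To illustrate the documented rule, here is a minimal sketch that uses a hypothetical inline copy of the sample with the stray quote doubled. read.csv() then parses the third field of that row as intended. It assumes R >= 2.14.0 for the text= argument.

```r
## Hypothetical inline copy of the sample file, with the stray quote doubled
## ("Very"" Long"), which is the escaping that read.csv()/scan() expect
doubled <- c('"header1","header2","header3","header4"',
             '"field1r1","field2r1","field3r1","field4r1"',
             '"field1r2","field2r2","field3r2PartA), field3r2PartB Very"" Long","field4r2"',
             '"field1r3","field2r3","field3r3","field4r3"')

## text= reads from a character vector instead of a file
df <- read.csv(text = doubled, stringsAsFactors = FALSE)
df$header3[2]
## [1] "field3r2PartA), field3r2PartB Very\" Long"
```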

Since I am not in a position to change how these CSV files are created,
are there any suggestions (preferably all in R) on how to handle this?

For the moment, I'm using my poor man's solution (below), but any
tricks that would simplify this task would be great.

Thank you very much,

benilton


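parser <- function(fname, header=TRUE, stringsAsFactors=FALSE){
    txt <- readLines(fname)
    ## drop the quote that opens the first field and closes the last one
    txt <- gsub("^\"|\"$", "", txt)
    ## split on the "," delimiter; a stray " inside a field stays as is
    txt <- strsplit(txt, "\",\"")
    ## no need to double stray quotes here: the fields never go back
    ## through read.csv(), so doubling would leave "" in the values
    txt <- do.call(rbind, txt)
    if (header){
        nms <- txt[1,]
        txt <- txt[-1, , drop=FALSE]  # stays a matrix with one data row
    }
    txt <- as.data.frame(txt, stringsAsFactors=stringsAsFactors)
    if (header) names(txt) <- nms
    txt
}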
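One way to shorten the parser, sketched here under two assumptions: a stray quote never sits directly next to a comma, and no field already contains a doubled quote (it would be doubled again). Every " with a non-comma character on both sides is doubled, which leaves the delimiting quotes alone, and the repaired lines go straight to read.csv(). The names fix_stray_quotes() and read_messy_csv() are hypothetical.

```r
## Hypothetical helper: double any " that is not a field delimiter.
## A delimiting quote always touches a comma or a line boundary, so a quote
## with a non-comma character on BOTH sides is assumed to be a stray one.
fix_stray_quotes <- function(lines) {
  gsub('(?<=[^,])"(?=[^,])', '""', lines, perl = TRUE)
}

## Hypothetical wrapper: repair the lines, then let read.csv() parse them
read_messy_csv <- function(fname, ...) {
  read.csv(text = fix_stray_quotes(readLines(fname)), ...)
}

## Usage on an inline copy of the sample file
raw <- c('"header1","header2","header3","header4"',
         '"field1r1","field2r1","field3r1","field4r1"',
         '"field1r2","field2r2","field3r2PartA), field3r2PartB Very" Long","field4r2"',
         '"field1r3","field2r3","field3r3","field4r3"')
df <- read.csv(text = fix_stray_quotes(raw), stringsAsFactors = FALSE)
```

A quote at the very start or end of a line has no neighbour on one side, so the lookbehind/lookahead pair never matches it; only quotes inside fields are touched.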

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: read.csv and field containing single quotes

Henrique Dallazuanna
Benilton,

Try this:

read.table(textConnection(gsub('","', "','", gsub('^\"|\"$', "'",
readLines('../teste.csv')))), sep = ',', quote = "'", header = TRUE)
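A self-contained way to try the one-liner above, run on a hypothetical inline copy of the sample instead of '../teste.csv'. It makes the same two substitutions; the text= argument (R >= 2.14.0) stands in for textConnection(), and comment.char = "" and stringsAsFactors = FALSE are additions not in the original reply, since read.table() treats # as a comment character by default.

```r
## Hypothetical inline stand-in for '../teste.csv'
raw <- c('"header1","header2","header3","header4"',
         '"field1r1","field2r1","field3r1","field4r1"',
         '"field1r2","field2r2","field3r2PartA), field3r2PartB Very" Long","field4r2"',
         '"field1r3","field2r3","field3r3","field4r3"')

## Same substitutions as the one-liner: the outer " of each line and every
## "," delimiter become ', so the stray " is just an ordinary character
requoted <- gsub('","', "','", gsub('^"|"$', "'", raw))

df <- read.table(text = requoted, sep = ",", quote = "'", header = TRUE,
                 comment.char = "", stringsAsFactors = FALSE)
df$header3[2]
```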

On Mon, Mar 26, 2012 at 8:09 PM, Benilton Carvalho
<[hidden email]> wrote:
> [...]



--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


Re: read.csv and field containing single quotes

Benilton Carvalho
Thanks Henrique...

giving it a try now, but it'll take a good while, given the file size.
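If memory or speed is the concern, one option (a sketch, not benchmarked) is to repair the file once in chunks and then call read.csv() on the cleaned copy, so the whole file never sits in memory as one text vector. It assumes that a stray quote never touches a comma and that no field already has doubled quotes, and it doubles every " with a non-comma character on both sides. The function name, the chunk size and the file names in the usage comment are hypothetical.

```r
## Hypothetical helper: copy 'infile' to 'outfile' chunk by chunk, doubling
## every " that has a non-comma character on both sides (assumed stray)
fix_quotes_file <- function(infile, outfile, chunk = 100000L) {
  inp <- file(infile, open = "r")
  on.exit(close(inp))
  out <- file(outfile, open = "w")
  on.exit(close(out), add = TRUE)
  repeat {
    x <- readLines(inp, n = chunk)  # next block of lines, or none at EOF
    if (length(x) == 0L) break
    writeLines(gsub('(?<=[^,])"(?=[^,])', '""', x, perl = TRUE), out)
  }
  invisible(outfile)
}

## Usage: clean once, then let read.csv() parse the repaired copy
## clean <- fix_quotes_file("big.csv", "big_fixed.csv")
## df <- read.csv(clean, stringsAsFactors = FALSE)
```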

Cheers,
b

On 27 March 2012 02:35, Henrique Dallazuanna <[hidden email]> wrote:
> [...]

Re: read.csv and field containing single quotes

Rainer M Krug
On 27/03/12 01:09, Benilton Carvalho wrote:
> I need to read in csv files, created by 3rd party, with fields
> containing single quotes (as shown below).
>
> "header1","header2","header3","header4"
> "field1r1","field2r1","field3r1","field4r1"
> "field1r2","field2r2","field3r2PartA), field3r2PartB Very" Long","field4r2"
> "field1r3","field2r3","field3r3","field4r3"

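You could try, at the OS level, to

1) replace the double quotes that delimit fields with single quotes,
   i.e. "," with ',' and the " at the start and end of each line with '
   (assuming that the CSV does not contain any single quotes), and
2) read the result into R with sep="," and quote="'".

If the file is huge, doing this replacement at the OS level is
probably the best solution.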
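The two steps can also be done inside R. In this sketch (the helper name read_requoted() is hypothetical) the assumption that the file has no single quotes is checked rather than taken on trust, and count.fields() serves as a cheap check that catches some, not all, bad splits before reading.

```r
## Hypothetical helper: re-quote with ' and read, after checking the input
read_requoted <- function(lines, ...) {
  if (any(grepl("'", lines, fixed = TRUE)))
    stop("input already contains ', so it cannot serve as the quote character")
  ## Step 1: every delimiting " (line start, line end, each ",") becomes '
  lines <- gsub('^"|"$', "'", lines)
  lines <- gsub('","', "','", lines, fixed = TRUE)
  ## Guard: rows with differing field counts are one symptom of a bad split
  con <- textConnection(lines)
  on.exit(close(con))
  n <- count.fields(con, sep = ",", quote = "'", comment.char = "")
  if (length(unique(n)) != 1L) stop("rows have differing numbers of fields")
  ## Step 2: read with ' (not ") as the quote character
  read.table(text = lines, sep = ",", quote = "'", header = TRUE,
             comment.char = "", ...)
}
```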

Cheers,

Rainer




--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :       +33 - (0)9 53 10 27 44
Cell:       +33 - (0)6 85 62 59 98
Fax :       +33 - (0)9 58 10 27 44

Fax (D):    +49 - (0)3 21 21 25 22 44

email:      [hidden email]

Skype:      RMkrug
