what do you think about write.table(... qmethod = "excel")?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

what do you think about write.table(... qmethod = "excel")?

PaulJohnson32gmail
Last week one of our clients reported trouble with a csv file I
generated with write.table.  He said that columns with quotes for
character variables were rejected by their data importer, which was
revised to match the way Microsoft Excel uses quotation marks in
character variables.  I explained to them that quoted character
variables are virtuous and wise, of course, but they say Microsoft
Excel CSV export no longer quotes characters unless they include
commas in the values.

They showed me a CSV file from Excel that looked like this

x1,x2,x3,x4 5 6
fred,barney,betty,x
bambam,"fred,wilma",pebbles,y

Note how the quotes only happen on row 2 column 2. I was surprised it
did that, but now I have some pressure to write a csv maker that has
that structure.  Its weird, even when there are spaces in values there
are no quotation marks.

Has anybody done this and verified that it matches CSV from MS Excel?
If I succeed will you consider a patch?

pj
--
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: what do you think about write.table(... qmethod = "excel")?

Dirk Eddelbuettel

On 19 September 2017 at 12:04, Paul Johnson wrote:
| They showed me a CSV file from Excel that looked like this
|
| x1,x2,x3,x4 5 6
| fred,barney,betty,x
| bambam,"fred,wilma",pebbles,y
|
| Note how the quotes only happen on row 2 column 2. I was surprised it
| did that, but now I have some pressure to write a csv maker that has
| that structure.  Its weird, even when there are spaces in values there
| are no quotation marks.
|
| Has anybody done this and verified that it matches CSV from MS Excel?
| If I succeed will you consider a patch?

R> data.table::fread("/tmp/paul.csv")
       x1         x2      x3 x4 5 6
1:   fred     barney   betty      x
2: bambam fred,wilma pebbles      y
R> data.table::fread("/tmp/paul.csv")
       x1         x2      x3 x4
1:   fred     barney   betty  x
2: bambam fred,wilma pebbles  y
R>

The only difference is that between calls one and two, I removed the stray
    "5 6"
from the first line.

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | [hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: what do you think about write.table(... qmethod = "excel")?

Ista Zahn
In reply to this post by PaulJohnson32gmail
On Tue, Sep 19, 2017 at 1:04 PM, Paul Johnson <[hidden email]> wrote:

> Last week one of our clients reported trouble with a csv file I
> generated with write.table.  He said that columns with quotes for
> character variables were rejected by their data importer, which was
> revised to match the way Microsoft Excel uses quotation marks in
> character variables.  I explained to them that quoted character
> variables are virtuous and wise, of course, but they say Microsoft
> Excel CSV export no longer quotes characters unless they include
> commas in the values.
>
> They showed me a CSV file from Excel that looked like this
>
> x1,x2,x3,x4 5 6
> fred,barney,betty,x
> bambam,"fred,wilma",pebbles,y
>
> Note how the quotes only happen on row 2 column 2. I was surprised it
> did that, but now I have some pressure to write a csv maker that has
> that structure.

I think you should resist that pressure. It really makes no sense to
write a .csv parser that _only_ supports .csv files created by Excel.
If you're going to use Excel as a model, a more sensible approach
would be to write a csv parser that supports all the formats that
Excel itself supports; Excel of course has no problem importing

"x1","x2","x3","x4"
"fred","barney","betty","x"
"bambam","fred,wilma","pebbles","y"

So, seriously, tell them to just fix their csv parser. Since they seem
hung up on Excel, it may help to point out that it does in fact import
csv produced by write.csv without complaint.

Best,
Ista

 Its weird, even when there are spaces in values there

> are no quotation marks.
>
> Has anybody done this and verified that it matches CSV from MS Excel?
> If I succeed will you consider a patch?
>
> pj
> --
> Paul E. Johnson   http://pj.freefaculty.org
> Director, Center for Research Methods and Data Analysis http://crmda.ku.edu
>
> To write to me directly, please address me at pauljohn at ku.edu.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: what do you think about write.table(... qmethod = "excel")?

Duncan Murdoch-2
On 19/09/2017 4:10 PM, Ista Zahn wrote:

> On Tue, Sep 19, 2017 at 1:04 PM, Paul Johnson <[hidden email]> wrote:
>> Last week one of our clients reported trouble with a csv file I
>> generated with write.table.  He said that columns with quotes for
>> character variables were rejected by their data importer, which was
>> revised to match the way Microsoft Excel uses quotation marks in
>> character variables.  I explained to them that quoted character
>> variables are virtuous and wise, of course, but they say Microsoft
>> Excel CSV export no longer quotes characters unless they include
>> commas in the values.
>>
>> They showed me a CSV file from Excel that looked like this
>>
>> x1,x2,x3,x4 5 6
>> fred,barney,betty,x
>> bambam,"fred,wilma",pebbles,y
>>
>> Note how the quotes only happen on row 2 column 2. I was surprised it
>> did that, but now I have some pressure to write a csv maker that has
>> that structure.
>
> I think you should resist that pressure.

That depends on whether this is a paying client or not.

 > It really makes no sense to
> write a .csv parser that _only_ supports .csv files created by Excel.

That's true, but if that's what they want to do, and they're willing to
pay to be able to write files that imitate Excel, then why not do what
they ask?

On the other hand, if they aren't willing to pay for the work, then you
should lecture them on how silly their request is.

In any case, base R functions should not include nonsense, so this is
not something that should go into R.

Duncan Murdoch

> If you're going to use Excel as a model, a more sensible approach
> would be to write a csv parser that supports all the formats that
> Excel itself supports; Excel of course has no problem importing
>
> "x1","x2","x3","x4"
> "fred","barney","betty","x"
> "bambam","fred,wilma","pebbles","y"
>
> So, seriously, tell them to just fix their csv parser. Since they seem
> hung up on Excel, it may help to point out that it does in fact import
> csv produced by write.csv without complaint.
>
> Best,
> Ista
>
>   Its weird, even when there are spaces in values there
>> are no quotation marks.
>>
>> Has anybody done this and verified that it matches CSV from MS Excel?
>> If I succeed will you consider a patch?
>>
>> pj
>> --
>> Paul E. Johnson   http://pj.freefaculty.org
>> Director, Center for Research Methods and Data Analysis http://crmda.ku.edu
>>
>> To write to me directly, please address me at pauljohn at ku.edu.
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: what do you think about write.table(... qmethod = "excel")?

PaulJohnson32gmail
On Tue, Sep 19, 2017 at 4:45 PM, Duncan Murdoch
<[hidden email]> wrote:

>
>
> That's true, but if that's what they want to do, and they're willing to pay
> to be able to write files that imitate Excel, then why not do what they ask?
>
> On the other hand, if they aren't willing to pay for the work, then you
> should lecture them on how silly their request is.
>
> In any case, base R functions should not include nonsense, so this is not
> something that should go into R.
>
> Duncan Murdoch
>

I understand.  This is a paying client, I'm going where the money goes.

Here's my current working example of a function that writes a CSV
exactly as Excel does. I've posted this into StackOverflow
(https://stackoverflow.com/questions/25743018/how-to-conditionally-remove-quotes-in-write-csv).
It is buried under a thread, maybe will not get much attention and
bug-checking. Oh, well, I can hope. Processes variables
column-by-column, I don't know how to do it differently.



##' Write CSV files with quotes same as MS Excel 2013 or newer
##'
##' R's write.csv inserts quotes around all elements in a character
##' vector (if quote = TRUE).  In contrast, MS Excel CSV export no
##' longer inserts quotation marks on all elements in character
##' variables, except when the cells include commas or quotation
##' marks.  This function generates CSV files that are, so far as we
##' know, exactly the same "quoted style" as MS Excel CSV export
##' files.
##'
##' This works by manually inserting quotation marks where necessary and
##' turning FALSE R's own method to insert quotation marks.
##' @param x a data frame
##' @param file character string for file name
##' @param row.names Default FALSE for row.names
##' @importFrom utils write.table
##' @return the return from write.table, using revised quotes
##' @export
##' @author Paul Johnson
##' @examples
##' set.seed(234)
##' x1 <- data.frame(x1 = c("a", "b,c", "b", "The \"Washington, DC\""),
##'       x2 = rnorm(4), stringsAsFactors = FALSE)
##' x1
##' fn <- tempfile(pattern = "testcsv", fileext = ".csv")
##' writeCSV(x1, file = fn)
##' readLines(fn)
##' x2 <- read.table(fn, sep = ",", header = TRUE, stringsAsFactors = FALSE)
##' all.equal(x1,x2)
writeCSV <- function(x, file, row.names = FALSE){
    xischar <- colnames(x)[sapply(x, is.character)]
    for(jj in xischar){
        x[ , jj] <- gsub('"', '""', x[ , jj], fixed = TRUE)
        needsquotes <- grep('[\",]', x[ ,jj])
        x[needsquotes, jj] <- paste0("\"", x[needsquotes, jj], "\"")
    }
    write.table(x, file = file, sep = ",", quote = FALSE,
                row.names = row.names)
}

Output:

>  set.seed(234)
>  x1 <- data.frame(x1 = c("a", "b,c", "b", "The \"Washington, DC\""),
+        x2 = rnorm(4), stringsAsFactors = FALSE)
>  x1
                    x1         x2
1                    a  0.6607697
2                  b,c -2.0529830
3                    b -1.4992061
4 The "Washington, DC"  1.4712331
>  fn <- tempfile(pattern = "testcsv", fileext = ".csv")
>  writeCSV(x1, file = fn)
>  readLines(fn)
[1] "x1,x2"
[2] "a,0.660769736644892"
[3] "\"b,c\",-2.052983003941"
[4] "b,-1.49920605110092"
[5] "\"The \"\"Washington, DC\"\"\",1.4712331168047"
>  x2 <- read.table(fn, sep = ",", header = TRUE, stringsAsFactors = FALSE)
>  all.equal(x1,x2)
[1] TRUE

I already see one problem, that I've got no special arrangement for
column names with commas or quotes. People who want column names with
those things are even more wrong than the people write a parser that
can't understand quotes on character variables.

--
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: what do you think about write.table(... qmethod = "excel")?

David Winsemius

> On Sep 20, 2017, at 9:53 AM, Paul Johnson <[hidden email]> wrote:
>
> On Tue, Sep 19, 2017 at 4:45 PM, Duncan Murdoch
> <[hidden email]> wrote:
>>
>>
>> That's true, but if that's what they want to do, and they're willing to pay
>> to be able to write files that imitate Excel, then why not do what they ask?
>>
>> On the other hand, if they aren't willing to pay for the work, then you
>> should lecture them on how silly their request is.
>>
>> In any case, base R functions should not include nonsense, so this is not
>> something that should go into R.
>>
>> Duncan Murdoch
>>
>
> I understand.  This is a paying client, I'm going where the money goes.
>
> Here's my current working example of a function that writes a CSV
> exactly as Excel does. I've posted this into StackOverflow
> (https://stackoverflow.com/questions/25743018/how-to-conditionally-remove-quotes-in-write-csv).
> It is buried under a thread, maybe will not get much attention and
> bug-checking. Oh, well, I can hope. Processes variables
> column-by-column, I don't know how to do it differently.
>
>
>
> ##' Write CSV files with quotes same as MS Excel 2013 or newer
> ##'
> ##' R's write.csv inserts quotes around all elements in a character
> ##' vector (if quote = TRUE).  In contrast, MS Excel CSV export no
> ##' longer inserts quotation marks on all elements in character
> ##' variables, except when the cells include commas or quotation
> ##' marks.  This function generates CSV files that are, so far as we
> ##' know, exactly the same "quoted style" as MS Excel CSV export
> ##' files.
> ##'
> ##' This works by manually inserting quotation marks where necessary and
> ##' turning FALSE R's own method to insert quotation marks.
> ##' @param x a data frame
> ##' @param file character string for file name
> ##' @param row.names Default FALSE for row.names
> ##' @importFrom utils write.table
> ##' @return the return from write.table, using revised quotes
> ##' @export
> ##' @author Paul Johnson
> ##' @examples
> ##' set.seed(234)
> ##' x1 <- data.frame(x1 = c("a", "b,c", "b", "The \"Washington, DC\""),
> ##'       x2 = rnorm(4), stringsAsFactors = FALSE)
> ##' x1
> ##' fn <- tempfile(pattern = "testcsv", fileext = ".csv")
> ##' writeCSV(x1, file = fn)
> ##' readLines(fn)
> ##' x2 <- read.table(fn, sep = ",", header = TRUE, stringsAsFactors = FALSE)
> ##' all.equal(x1,x2)
> writeCSV <- function(x, file, row.names = FALSE){
>    xischar <- colnames(x)[sapply(x, is.character)]

? Should that be:


    xIsCharOrFactor <- colnames(x)[sapply(x, function(z) is.character(z) | is.factor(z) )]

Just asking. No testing done.
__
David.

>    for(jj in xischar){
>        x[ , jj] <- gsub('"', '""', x[ , jj], fixed = TRUE)
>        needsquotes <- grep('[\",]', x[ ,jj])
>        x[needsquotes, jj] <- paste0("\"", x[needsquotes, jj], "\"")
>    }
>    write.table(x, file = file, sep = ",", quote = FALSE,
>                row.names = row.names)
> }
>
> Output:
>
>> set.seed(234)
>> x1 <- data.frame(x1 = c("a", "b,c", "b", "The \"Washington, DC\""),
> +        x2 = rnorm(4), stringsAsFactors = FALSE)
>> x1
>                    x1         x2
> 1                    a  0.6607697
> 2                  b,c -2.0529830
> 3                    b -1.4992061
> 4 The "Washington, DC"  1.4712331
>> fn <- tempfile(pattern = "testcsv", fileext = ".csv")
>> writeCSV(x1, file = fn)
>> readLines(fn)
> [1] "x1,x2"
> [2] "a,0.660769736644892"
> [3] "\"b,c\",-2.052983003941"
> [4] "b,-1.49920605110092"
> [5] "\"The \"\"Washington, DC\"\"\",1.4712331168047"
>> x2 <- read.table(fn, sep = ",", header = TRUE, stringsAsFactors = FALSE)
>> all.equal(x1,x2)
> [1] TRUE
>
> I already see one problem, that I've got no special arrangement for
> column names with commas or quotes. People who want column names with
> those things are even more wrong than the people write a parser that
> can't understand quotes on character variables.
>
> --
> Paul E. Johnson   http://pj.freefaculty.org
> Director, Center for Research Methods and Data Analysis http://crmda.ku.edu
>
> To write to me directly, please address me at pauljohn at ku.edu.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel