readxl issue

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

readxl issue

Thomas Subia-2
Colleagues,

I'm using readxl and dplyr to extract a specific cell from all worksheets in a directory.
All of these worksheets have the same physical layout.

Issue 1: Minus sign replaced by an X after data extraction.

library(plyr)
library(readxl)

files <- list.files(pattern="*.xls", full.names = FALSE)
avgs <- lapply(files, read_excel, sheet="Flow Data", range=("c9"))
avg_list <- as.data.frame(avgs)
trans_avgs <- t(avg_list)
write.table(trans_avgs,"avgs.txt")

Here are the first lines of the avgs.txt file.
"X.0.51571428571428557"
"X.0.5349795918367346"
"X.0.4895714285714286"
"X.0.5112448979591836"

The original Excel file contains
-0.516
-0.535
-0.490
-0.511

It appears that readxl is changing the - sign to an X.
Is there any feature in readxl which I can change so that readxl extracts the minus sign?

Issue 2: Duplicate dates contain additional characters

dates <- lapply(files, read_excel, sheet="Flow Data", range=("h14"))
dates_list <- as.data.frame(dates)
trans_dates <- t(dates_list)
write.table(trans_dates ,"dates.txt")

Here are the first lines of the dates.txt file.

"X43859"
"X43859.1"
"X43859.2"
"X43859.3"
"X43833"

In Excel, this is what is recorded.

1/29/2020
1/29/2020
1/29/2020
1/29/2020
1/3/2020

It appears that readxl is adding additional characters which are signaling duplicate dates.
Is there any feature in readxl which can I can change to eliminate these additional characters?

Some advice would be appreciated.


Thomas Subia

Statistician / Senior Quality Engineer
IMG Companies 
225 Mountain Vista Parkway
Livermore, CA 94551
T. (925) 273-1106
F. (925) 273-1111
E. [hidden email]


Precision Manufacturing for Emerging Technologies
imgprecision.com 

The contents of this message, together with any attachments, are intended only for the use of the individual or entity to which they are addressed and may contain information that is legally privileged, confidential and exempt from disclosure. If you are not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this message, or any attachment, is strictly prohibited. If you have received this message in error, please notify the original sender or IMG Companies, LLC at Tel: 925-273-1100 immediately by telephone or by return E-mail and delete this message, along with any attachments, from your computer. Thank you.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: readxl issue

Jeff Newmiller
Pay attention to whether the read_csv call is configured to expect first line as header.

On February 5, 2020 11:09:01 AM PST, Thomas Subia <[hidden email]> wrote:

>Colleagues,
>
>I'm using readxl and dplyr to extract a specific cell from all
>worksheets in a directory.
>All of these worksheets have the same physical layout.
>
>Issue 1: Minus sign replaced by an X after data extraction.
>
>library(plyr)
>library(readxl)
>
>files <- list.files(pattern="*.xls", full.names = FALSE)
>avgs <- lapply(files, read_excel, sheet="Flow Data", range=("c9"))
>avg_list <- as.data.frame(avgs)
>trans_avgs <- t(avg_list)
>write.table(trans_avgs,"avgs.txt")
>
>Here are the first lines of the avgs.txt file.
>"X.0.51571428571428557"
>"X.0.5349795918367346"
>"X.0.4895714285714286"
>"X.0.5112448979591836"
>
>The original Excel file contains
>-0.516
>-0.535
>-0.490
>-0.511
>
>It appears that readxl is changing the - sign to an X.
>Is there any feature in readxl which I can change so that readxl
>extracts the minus sign?
>
>Issue 2: Duplicate dates contain additional characters
>
>dates <- lapply(files, read_excel, sheet="Flow Data", range=("h14"))
>dates_list <- as.data.frame(dates)
>trans_dates <- t(dates_list)
>write.table(trans_dates ,"dates.txt")
>
>Here are the first lines of the dates.txt file.
>
>"X43859"
>"X43859.1"
>"X43859.2"
>"X43859.3"
>"X43833"
>
>In Excel, this is what is recorded.
>
>1/29/2020
>1/29/2020
>1/29/2020
>1/29/2020
>1/3/2020
>
>It appears that readxl is adding additional characters which are
>signaling duplicate dates.
>Is there any feature in readxl which can I can change to eliminate
>these additional characters?
>
>Some advice would be appreciated.
>
>
>Thomas Subia
>
>Statistician / Senior Quality Engineer
>IMG Companies 
>225 Mountain Vista Parkway
>Livermore, CA 94551
>T. (925) 273-1106
>F. (925) 273-1111
>E. [hidden email]
>
>
>Precision Manufacturing for Emerging Technologies
>imgprecision.com 
>
>The contents of this message, together with any attachments, are
>intended only for the use of the individual or entity to which they are
>addressed and may contain information that is legally privileged,
>confidential and exempt from disclosure. If you are not the intended
>recipient, you are hereby notified that any dissemination,
>distribution, or copying of this message, or any attachment, is
>strictly prohibited. If you have received this message in error, please
>notify the original sender or IMG Companies, LLC at Tel: 925-273-1100
>immediately by telephone or by return E-mail and delete this message,
>along with any attachments, from your computer. Thank you.
>
>______________________________________________
>[hidden email] mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

--
Sent from my phone. Please excuse my brevity.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: readxl issue

Thomas Subia-2
Jeff,

You wrote: " Pay attention to whether the read_csv call is configured to expect first line as header."

Here is the code I'm using to extract one cell from a series of Excel files having the same physical format.

library(plyr)
library(readxl)
files <- list.files(pattern="*.xls", full.names = TRUE)
# Extract part average from cell c6 for all Excel files
avgs <- lapply(files, read_excel, sheet="Flow Data", range=("c9"))
# Write data to text file
write.table(avgs ,"avgs.txt",sep="\t")

I'm not sure where read_csv applies here.

Thanks for your help!

Thomas Subia

-----Original Message-----
From: Jeff Newmiller <[hidden email]>
Sent: Wednesday, February 05, 2020 11:42 AM
To: [hidden email]; Thomas Subia <[hidden email]>; [hidden email]
Subject: Re: [R] readxl issue

Pay attention to whether the read_csv call is configured to expect first line as header.

On February 5, 2020 11:09:01 AM PST, Thomas Subia <[hidden email]> wrote:

>Colleagues,
>
>I'm using readxl and dplyr to extract a specific cell from all
>worksheets in a directory.
>All of these worksheets have the same physical layout.
>
>Issue 1: Minus sign replaced by an X after data extraction.
>
>library(plyr)
>library(readxl)
>
>files <- list.files(pattern="*.xls", full.names = FALSE) avgs <-
>lapply(files, read_excel, sheet="Flow Data", range=("c9")) avg_list <-
>as.data.frame(avgs) trans_avgs <- t(avg_list)
>write.table(trans_avgs,"avgs.txt")
>
>Here are the first lines of the avgs.txt file.
>"X.0.51571428571428557"
>"X.0.5349795918367346"
>"X.0.4895714285714286"
>"X.0.5112448979591836"
>
>The original Excel file contains
>-0.516
>-0.535
>-0.490
>-0.511
>
>It appears that readxl is changing the - sign to an X.
>Is there any feature in readxl which I can change so that readxl
>extracts the minus sign?
>
>Issue 2: Duplicate dates contain additional characters
>
>dates <- lapply(files, read_excel, sheet="Flow Data", range=("h14"))
>dates_list <- as.data.frame(dates) trans_dates <- t(dates_list)
>write.table(trans_dates ,"dates.txt")
>
>Here are the first lines of the dates.txt file.
>
>"X43859"
>"X43859.1"
>"X43859.2"
>"X43859.3"
>"X43833"
>
>In Excel, this is what is recorded.
>
>1/29/2020
>1/29/2020
>1/29/2020
>1/29/2020
>1/3/2020
>
>It appears that readxl is adding additional characters which are
>signaling duplicate dates.
>Is there any feature in readxl which can I can change to eliminate
>these additional characters?
>
>Some advice would be appreciated.
>
>
>Thomas Subia
>
>Statistician / Senior Quality Engineer
>IMG Companies
>225 Mountain Vista Parkway
>Livermore, CA 94551
>T. (925) 273-1106
>F. (925) 273-1111
>E. [hidden email]
>
>
>Precision Manufacturing for Emerging Technologies imgprecision.com
>
>The contents of this message, together with any attachments, are
>intended only for the use of the individual or entity to which they are
>addressed and may contain information that is legally privileged,
>confidential and exempt from disclosure. If you are not the intended
>recipient, you are hereby notified that any dissemination,
>distribution, or copying of this message, or any attachment, is
>strictly prohibited. If you have received this message in error, please
>notify the original sender or IMG Companies, LLC at Tel: 925-273-1100
>immediately by telephone or by return E-mail and delete this message,
>along with any attachments, from your computer. Thank you.
>
>______________________________________________
>[hidden email] mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

--
Sent from my phone. Please excuse my brevity.
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: [FORGED] Re: readxl issue

Rolf Turner

On 6/02/20 11:01 am, Thomas Subia wrote:

> Jeff,
>
> You wrote: " Pay attention to whether the read_csv call is configured to expect first line as header."
>
> Here is the code I'm using to extract one cell from a series of Excel files having the same physical format.
>
> library(plyr)
> library(readxl)
> files <- list.files(pattern="*.xls", full.names = TRUE)
> # Extract part average from cell c6 for all Excel files
> avgs <- lapply(files, read_excel, sheet="Flow Data", range=("c9"))
> # Write data to text file
> write.table(avgs ,"avgs.txt",sep="\t")
>
> I'm not sure where read_csv applies here.

Jeff probably did not read your email as carefully as he might have, and
made the unwarranted assumption that you'd be doing something sensible
like using *.csv data files rather than mucking about with the
notoriously perilous arcana of Excel.

I *don't* muck about with Excel myself, unless absolutely forced to, so
I can't make any specific suggestions, but I *can* make a general
suggestion.  When things go wrong, break the procedure down into simple
atomic steps.

* initially deal with just *one* of your *.xls files
* read it in:  xxx <- read_excel(<whatever>)
* *don't* write it out to a text file yet, *look* at the <expletive
deleted> thing in R first; print it (just type "xxx" or "head(xxx)" if
it's large)
* perhaps convert it to a data frame (it will be a "tibble" and tibbles
confuse the issue): yyy <- as.data.frame(xxx)
* investigate whether supplying some more arguments to read_excel()
solves your problem
* perhaps try read.xlsx from the xlsx package and see if any insights
are revealed.
* if you're still stumped, make available to the list one of the
problematic *.xls files (note that you *cannot* attach such a file to
your email; it will get stripped; you'll need to provide a URL from
which the file can be obtained).  Given such a file someone on the list
may be able to help you.

cheers,

Rolf Turner

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: [FORGED] Re: readxl issue

Jeff Newmiller
My assessment was and is likely correct.

My error was in writing the wrong function name... should have been read_excel whose parameters need to be adjusted to not look for a header line.

On February 5, 2020 3:48:36 PM PST, Rolf Turner <[hidden email]> wrote:

>
>On 6/02/20 11:01 am, Thomas Subia wrote:
>
>> Jeff,
>>
>> You wrote: " Pay attention to whether the read_csv call is configured
>to expect first line as header."
>>
>> Here is the code I'm using to extract one cell from a series of Excel
>files having the same physical format.
>>
>> library(plyr)
>> library(readxl)
>> files <- list.files(pattern="*.xls", full.names = TRUE)
>> # Extract part average from cell c6 for all Excel files
>> avgs <- lapply(files, read_excel, sheet="Flow Data", range=("c9"))
>> # Write data to text file
>> write.table(avgs ,"avgs.txt",sep="\t")
>>
>> I'm not sure where read_csv applies here.
>
>Jeff probably did not read your email as carefully as he might have,
>and
>made the unwarranted assumption that you'd be doing something sensible
>like using *.csv data files rather than mucking about with the
>notoriously perilous arcana of Excel.
>
>I *don't* muck about with Excel myself, unless absolutely forced to, so
>
>I can't make any specific suggestions, but I *can* make a general
>suggestion.  When things go wrong, break the procedure down into simple
>
>atomic steps.
>
>* initially deal with just *one* of your *.xls files
>* read it in:  xxx <- read_excel(<whatever>)
>* *don't* write it out to a text file yet, *look* at the <expletive
>deleted> thing in R first; print it (just type "xxx" or "head(xxx)" if
>it's large)
>* perhaps convert it to a data frame (it will be a "tibble" and tibbles
>
>confuse the issue): yyy <- as.data.frame(xxx)
>* investigate whether supplying some more arguments to read_excel()
>solves your problem
>* perhaps try read.xlsx from the xlsx package and see if any insights
>are revealed.
>* if you're still stumped, make available to the list one of the
>problematic *.xls files (note that you *cannot* attach such a file to
>your email; it will get stripped; you'll need to provide a URL from
>which the file can be obtained).  Given such a file someone on the list
>
>may be able to help you.
>
>cheers,
>
>Rolf Turner

--
Sent from my phone. Please excuse my brevity.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: readxl issue

Abby Spurdle
In reply to this post by Thomas Subia-2
> I'm using readxl and dplyr to extract a specific cell from all worksheets in a directory.
> All of these worksheets have the same physical layout.

I don't have access to Excel, so can't test this.

I have a suspicion that the problem is not readxl.
But rather the "as.data.frame" step.

Try reading one file at a time.
And inspect (1) your immediate object after reading the Excel file,
and then (2) the data.frame after the as.data.frame call.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: [FORGED] Re: readxl issue

Abby Spurdle
In reply to this post by Jeff Newmiller
> should have been read_excel whose parameters need to be adjusted to not look for a header line.

And Jeff's probably correct (from the package documentation):
> read_excel(path, sheet = NULL, range = NULL, col_names = TRUE,

Set col_names to FALSE...???

But still be careful with the as.data.frame call.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.