readxl question

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

readxl question

R help mailing list-2
Colleagues,

 

I have 250 Excel files in a directory. Each of those files has the same
layout. The problem is that the data in each Excel data is not in
rectangular form. I've been using readxl to extract the data which I need.
Each of my metrics are stored in a particular cell. For each metric, I
create text files which stores my metrics.

 

library(plyr)

library(readxl)

 

files <- list.files(pattern="*.xls", full.names = FALSE)

 

# Extract Work Order

WO <- lapply(files, read_excel, sheet="Sheet1", range=("B9")) WO_list <-
as.data.frame(WO) trans_WO <- t(WO_list) write.table(trans_WO ,"WO.txt")

 

# Extract bubble 14_1

BUBBLE_14_1 <- lapply(files, read_excel, sheet="Sheet1", range=("c46"))
BUBBLE_14_1_list <- as.data.frame(BUBBLE_14_1)

trans_BUBBLE_14_1 <- t(BUBBLE_14_1_list)

 

write.table(trans_BUBBLE_14_1,"BUBBLE_14_1.txt")

 

 

# Extract bubble 14_2

BUBBLE_14_2 <- lapply(files, read_excel, sheet="Sheet1", range=("c62"))
BUBBLE_14_2_list <- as.data.frame(BUBBLE_14_2)

trans_BUBBLE_14_2 <- t(BUBBLE_14_2_list)

write.table(trans_BUBBLE_14_2,"BUBBLE_14_2.txt")

 

After the text files have been created, I cut and paste the contents of each
text file to Excel.

This has worked fine if the number of cells I am extracting from a file is
small.

If the number gets larger, this method is inefficient.

 

Any advice on how to do this would be appreciated.

 

All the best,

 

Thomas Subia


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: readxl question

PIKAL Petr
Hi


Are you sure that your command read values from respective cells?

I tried it and got empty data frame with names
> WO <- lapply(files, read_excel, sheet=1, range=("B3"))
> as.data.frame(WO)
[1] ano                 TP303               X96                
[4] X0                  X3.7519999999999998 X26.7              
<0 rows> (or 0-length row.names)

To get data, col_names argument should be set to FALSE
WO <- lapply(files, read_excel, sheet=1, range=("B3"), col_names=FALSE)
WO2 <- lapply(files, read_excel, sheet=1, range=("B5"), col_names=FALSE)

After that unlist and one rbind together with t should be enough to give you
one table
WO <- unlist(WO)
WO2 <- unlist(WO2)
result <- t(rbind(WO, WO2))
result
     WO      WO2      
...1 "ano"   "ano"    
...1 "TP303" "261119/2"
...1 "96"    "288"    
...1 "0"     "192"    
...1 "3.752" "25.92094"
...1 "26.7"  "38.6"    
>

And instead txt document you could do

write.table(result, "result.xls", sep = "\t", row.names = F)

And now "result.xls" is directly readable with Excel

Cheers
Petr

>
> -----Original Message-----
> From: R-help <[hidden email]> On Behalf Of Thomas Subia via
> R-help
> Sent: Saturday, August 22, 2020 6:25 AM
> To: [hidden email]
> Subject: [R] readxl question
>
> Colleagues,
>
>
>
> I have 250 Excel files in a directory. Each of those files has the same
layout.
> The problem is that the data in each Excel data is not in rectangular
form. I've
> been using readxl to extract the data which I need.
> Each of my metrics are stored in a particular cell. For each metric, I
create text

> files which stores my metrics.
>
>
>
> library(plyr)
>
> library(readxl)
>
>
>
> files <- list.files(pattern="*.xls", full.names = FALSE)
>
>
>
> # Extract Work Order
>
> WO <- lapply(files, read_excel, sheet="Sheet1", range=("B9")) WO_list <-
> as.data.frame(WO) trans_WO <- t(WO_list) write.table(trans_WO ,"WO.txt")
>
>
>
> # Extract bubble 14_1
>
> BUBBLE_14_1 <- lapply(files, read_excel, sheet="Sheet1", range=("c46"))
> BUBBLE_14_1_list <- as.data.frame(BUBBLE_14_1)
>
> trans_BUBBLE_14_1 <- t(BUBBLE_14_1_list)
>
>
>
> write.table(trans_BUBBLE_14_1,"BUBBLE_14_1.txt")
>
>
>
>
>
> # Extract bubble 14_2
>
> BUBBLE_14_2 <- lapply(files, read_excel, sheet="Sheet1", range=("c62"))
> BUBBLE_14_2_list <- as.data.frame(BUBBLE_14_2)
>
> trans_BUBBLE_14_2 <- t(BUBBLE_14_2_list)
>
> write.table(trans_BUBBLE_14_2,"BUBBLE_14_2.txt")
>
>
>
> After the text files have been created, I cut and paste the contents of
each
> text file to Excel.
>
> This has worked fine if the number of cells I am extracting from a file is
small.

>
> If the number gets larger, this method is inefficient.
>
>
>
> Any advice on how to do this would be appreciated.
>
>
>
> All the best,
>
>
>
> Thomas Subia
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: readxl question

Upton, Stephen C
From your example, it appears you are reading in the same excel file for
each function to get a value. I would look at creating a function that
extracts what you need from each file all at once, rather than separate
reads.

Stephen C. Upton
SEED (Simulation Experiments & Efficient Designs) Center for Data Farming
SEED Center website: https://harvest.nps.edu

-----Original Message-----
From: R-help [mailto:[hidden email]] On Behalf Of PIKAL Petr
Sent: Wednesday, August 26, 2020 3:50 AM
To: Thomas Subia <[hidden email]>
Cc: [hidden email]
Subject: Re: [R] readxl question

NPS WARNING: *external sender* verify before acting.


Hi


Are you sure that your command read values from respective cells?

I tried it and got empty data frame with names
> WO <- lapply(files, read_excel, sheet=1, range=("B3"))
> as.data.frame(WO)
[1] ano                 TP303               X96
[4] X0                  X3.7519999999999998 X26.7
<0 rows> (or 0-length row.names)

To get data, col_names argument should be set to FALSE WO <- lapply(files,
read_excel, sheet=1, range=("B3"), col_names=FALSE)
WO2 <- lapply(files, read_excel, sheet=1, range=("B5"), col_names=FALSE)

After that unlist and one rbind together with t should be enough to give you
one table WO <- unlist(WO)
WO2 <- unlist(WO2)
result <- t(rbind(WO, WO2))
result
     WO      WO2
...1 "ano"   "ano"
...1 "TP303" "261119/2"
...1 "96"    "288"
...1 "0"     "192"
...1 "3.752" "25.92094"
...1 "26.7"  "38.6"
>

And instead txt document you could do

write.table(result, "result.xls", sep = "\t", row.names = F)

And now "result.xls" is directly readable with Excel

Cheers
Petr

>
> -----Original Message-----
> From: R-help <[hidden email]> On Behalf Of Thomas Subia
> via R-help
> Sent: Saturday, August 22, 2020 6:25 AM
> To: [hidden email]
> Subject: [R] readxl question
>
> Colleagues,
>
>
>
> I have 250 Excel files in a directory. Each of those files has the
> same
layout.
> The problem is that the data in each Excel data is not in rectangular
form. I've
> been using readxl to extract the data which I need.
> Each of my metrics are stored in a particular cell. For each metric, I
create text

> files which stores my metrics.
>
>
>
> library(plyr)
>
> library(readxl)
>
>
>
> files <- list.files(pattern="*.xls", full.names = FALSE)
>
>
>
> # Extract Work Order
>
> WO <- lapply(files, read_excel, sheet="Sheet1", range=("B9")) WO_list
> <-
> as.data.frame(WO) trans_WO <- t(WO_list) write.table(trans_WO
> ,"WO.txt")
>
>
>
> # Extract bubble 14_1
>
> BUBBLE_14_1 <- lapply(files, read_excel, sheet="Sheet1",
> range=("c46")) BUBBLE_14_1_list <- as.data.frame(BUBBLE_14_1)
>
> trans_BUBBLE_14_1 <- t(BUBBLE_14_1_list)
>
>
>
> write.table(trans_BUBBLE_14_1,"BUBBLE_14_1.txt")
>
>
>
>
>
> # Extract bubble 14_2
>
> BUBBLE_14_2 <- lapply(files, read_excel, sheet="Sheet1",
> range=("c62")) BUBBLE_14_2_list <- as.data.frame(BUBBLE_14_2)
>
> trans_BUBBLE_14_2 <- t(BUBBLE_14_2_list)
>
> write.table(trans_BUBBLE_14_2,"BUBBLE_14_2.txt")
>
>
>
> After the text files have been created, I cut and paste the contents
> of
each
> text file to Excel.
>
> This has worked fine if the number of cells I am extracting from a
> file is
small.

>
> If the number gets larger, this method is inefficient.
>
>
>
> Any advice on how to do this would be appreciated.
>
>
>
> All the best,
>
>
>
> Thomas Subia
>
>
>       [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html and provide commented, minimal, self-contained,
> reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: readxl question

PIKAL Petr
Hi

As OP has only about 250 files and in read_excel you cannot specify several
ranges at once, reading those values separately and concatenating them
together in one step seems to be the most efficient way. One probably could
design such function, but time spent on the function performing the task
only once is probably bigger than performing 250*3 reads.

I see inefficiency in writing each column into separate text file and
coppying it back to Excel file.

Cheers
Petr

> -----Original Message-----
> From: Upton, Stephen (Steve) (CIV) <[hidden email]>
> Sent: Wednesday, August 26, 2020 2:44 PM
> To: PIKAL Petr <[hidden email]>; Thomas Subia <[hidden email]>
> Cc: [hidden email]
> Subject: RE: [R] readxl question
>
> From your example, it appears you are reading in the same excel file for
> each function to get a value. I would look at creating a function that
> extracts what you need from each file all at once, rather than separate
> reads.
>
> Stephen C. Upton
> SEED (Simulation Experiments & Efficient Designs) Center for Data Farming
> SEED Center website: https://harvest.nps.edu
>
> -----Original Message-----
> From: R-help [mailto:[hidden email]] On Behalf Of PIKAL Petr
> Sent: Wednesday, August 26, 2020 3:50 AM
> To: Thomas Subia <[hidden email]>
> Cc: [hidden email]
> Subject: Re: [R] readxl question
>
> NPS WARNING: *external sender* verify before acting.
>
>
> Hi
>
>
> Are you sure that your command read values from respective cells?
>
> I tried it and got empty data frame with names
> > WO <- lapply(files, read_excel, sheet=1, range=("B3"))
> > as.data.frame(WO)
> [1] ano                 TP303               X96
> [4] X0                  X3.7519999999999998 X26.7
> <0 rows> (or 0-length row.names)
>
> To get data, col_names argument should be set to FALSE WO <- lapply(files,
> read_excel, sheet=1, range=("B3"), col_names=FALSE)
> WO2 <- lapply(files, read_excel, sheet=1, range=("B5"), col_names=FALSE)
>
> After that unlist and one rbind together with t should be enough to give
you

> one table WO <- unlist(WO)
> WO2 <- unlist(WO2)
> result <- t(rbind(WO, WO2))
> result
>      WO      WO2
> ...1 "ano"   "ano"
> ...1 "TP303" "261119/2"
> ...1 "96"    "288"
> ...1 "0"     "192"
> ...1 "3.752" "25.92094"
> ...1 "26.7"  "38.6"
> >
>
> And instead txt document you could do
>
> write.table(result, "result.xls", sep = "\t", row.names = F)
>
> And now "result.xls" is directly readable with Excel
>
> Cheers
> Petr
>
> >
> > -----Original Message-----
> > From: R-help <[hidden email]> On Behalf Of Thomas Subia
> > via R-help
> > Sent: Saturday, August 22, 2020 6:25 AM
> > To: [hidden email]
> > Subject: [R] readxl question
> >
> > Colleagues,
> >
> >
> >
> > I have 250 Excel files in a directory. Each of those files has the
> > same
> layout.
> > The problem is that the data in each Excel data is not in rectangular
> form. I've
> > been using readxl to extract the data which I need.
> > Each of my metrics are stored in a particular cell. For each metric, I
> create text
> > files which stores my metrics.
> >
> >
> >
> > library(plyr)
> >
> > library(readxl)
> >
> >
> >
> > files <- list.files(pattern="*.xls", full.names = FALSE)
> >
> >
> >
> > # Extract Work Order
> >
> > WO <- lapply(files, read_excel, sheet="Sheet1", range=("B9")) WO_list
> > <-
> > as.data.frame(WO) trans_WO <- t(WO_list) write.table(trans_WO
> > ,"WO.txt")
> >
> >
> >
> > # Extract bubble 14_1
> >
> > BUBBLE_14_1 <- lapply(files, read_excel, sheet="Sheet1",
> > range=("c46")) BUBBLE_14_1_list <- as.data.frame(BUBBLE_14_1)
> >
> > trans_BUBBLE_14_1 <- t(BUBBLE_14_1_list)
> >
> >
> >
> > write.table(trans_BUBBLE_14_1,"BUBBLE_14_1.txt")
> >
> >
> >
> >
> >
> > # Extract bubble 14_2
> >
> > BUBBLE_14_2 <- lapply(files, read_excel, sheet="Sheet1",
> > range=("c62")) BUBBLE_14_2_list <- as.data.frame(BUBBLE_14_2)
> >
> > trans_BUBBLE_14_2 <- t(BUBBLE_14_2_list)
> >
> > write.table(trans_BUBBLE_14_2,"BUBBLE_14_2.txt")
> >
> >
> >
> > After the text files have been created, I cut and paste the contents
> > of
> each
> > text file to Excel.
> >
> > This has worked fine if the number of cells I am extracting from a
> > file is
> small.
> >
> > If the number gets larger, this method is inefficient.
> >
> >
> >
> > Any advice on how to do this would be appreciated.
> >
> >
> >
> > All the best,
> >
> >
> >
> > Thomas Subia
> >
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> > guide.html and provide commented, minimal, self-contained,
> > reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: readxl question

R help mailing list-2
Hi Thomas,

I am not familiar with the use of the range argument, but it seems to me
that the cell value becomes the column name. This might be fine, but you
might get into trouble if you have repeated cell values since
as.data.frame() will fix these.

I am also not sure about what you want, but this seems to capture your
example (reading the same cells in a number of files):

```
library(readxl)

# Create test set
path <- readxl_example("geometry.xls")

read_xls(path) # See the content

example_file1 <- tempfile(fileext = ".xls")
example_file2 <- tempfile(fileext = ".xls")

file.copy(path, example_file1, overwrite = TRUE)
file.copy(path, example_file2, overwrite = TRUE)

# Solve the problem using loops
files <- c(example_file1, example_file2)
ranges <- c("B4", "C5", "D6")

fr <- lapply(ranges, function(cur_range, files){
   x <- lapply(files, read_xls, sheet = 1, range = cur_range)
   t(as.data.frame(x))
}, files = files)

# Loop over fr and save content if needed
```

A couple of variations over the theme, where the cell content is
accessed after reading the file. This will not work well if the data in
the excel files does not start at A1, but if you can adjust for this it
should work just fine

```
# Solution #2

# Read the whole excel file, and access just the column - row
# This will give really unexpected results if the data does not start in
the
# cell A1 as is the case for geometry.xls. Also, it does not work with
ranges
# spaning more than a single cell
files <- rep(readxl_example("datasets.xlsx"), 3)
ranges <- c("B4", "C5", "D6")

# Loop over the files to avoid re-reading
fr <- lapply(files, function(cur_file, ranges){
   df <- read_excel(cur_file, sheet = 1)
   x <- lapply(ranges, function(cur_range, df){
     cr <- cellranger::as.cell_addr(cur_range, strict = FALSE)
     df[cr$row, cr$col][[1]]
   }, df = df)
   as.data.frame(setNames(x, ranges))

}, ranges = ranges)

# Solution 3
# Like solution 2 but using purr

library(purrr)

files <- rep(readxl_example("datasets.xlsx"), 3)
ranges <- c("B4", "C5", "D6")

map_dfr(files, function(cur_file, ranges){
   map_dfc(ranges, function(cur_range, df){
     df <- read_excel(cur_file, sheet = 1)
     cr <- cellranger::as.cell_addr(cur_range, strict = FALSE)
     setNames(df[cr$row, cr$col], cur_range)
   }, df = df)

}, ranges = ranges)

# Solution 4
# Like solution 3, but with the addition of the file name and producing
a single
# data.frame at the end

library(purrr)

path <- readxl_example("datasets.xls")
example_file1 <- tempfile(fileext = "_1.xls")
example_file2 <- tempfile(fileext = "_2.xls")
example_file3 <- tempfile(fileext = "_3.xls")

file.copy(path, example_file1, overwrite = TRUE)
file.copy(path, example_file2, overwrite = TRUE)
file.copy(path, example_file3, overwrite = TRUE)

files <- c(example_file1, example_file2, example_file3)

# Name the file paths with the file names. We can them make use of the
.id
# argument to map_dfr()
files <- setNames(files, basename(files))
ranges <- c("B4", "C5", "D6")

map_dfr(files, function(cur_file, ranges){
   map_dfc(ranges, function(cur_range, df){
     df <- read_excel(cur_file, sheet = 1)
     cr <- cellranger::as.cell_addr(cur_range, strict = FALSE)
     setNames(df[cr$row, cr$col], cur_range)
   }, df = df)
}, ranges = ranges, .id = "filename")
```

HTH
Ulrik

On 2020-08-26 15:38, PIKAL Petr wrote:

> Hi
>
> As OP has only about 250 files and in read_excel you cannot specify
> several
> ranges at once, reading those values separately and concatenating them
> together in one step seems to be the most efficient way. One probably
> could
> design such function, but time spent on the function performing the
> task
> only once is probably bigger than performing 250*3 reads.
>
> I see inefficiency in writing each column into separate text file and
> coppying it back to Excel file.
>
> Cheers
> Petr
>
>> -----Original Message-----
>> From: Upton, Stephen (Steve) (CIV) <[hidden email]>
>> Sent: Wednesday, August 26, 2020 2:44 PM
>> To: PIKAL Petr <[hidden email]>; Thomas Subia
>> <[hidden email]>
>> Cc: [hidden email]
>> Subject: RE: [R] readxl question
>>
>> From your example, it appears you are reading in the same excel file
>> for
>> each function to get a value. I would look at creating a function that
>> extracts what you need from each file all at once, rather than
>> separate
>> reads.
>>
>> Stephen C. Upton
>> SEED (Simulation Experiments & Efficient Designs) Center for Data
>> Farming
>> SEED Center website: https://harvest.nps.edu
>>
>> -----Original Message-----
>> From: R-help [mailto:[hidden email]] On Behalf Of PIKAL
>> Petr
>> Sent: Wednesday, August 26, 2020 3:50 AM
>> To: Thomas Subia <[hidden email]>
>> Cc: [hidden email]
>> Subject: Re: [R] readxl question
>>
>> NPS WARNING: *external sender* verify before acting.
>>
>>
>> Hi
>>
>>
>> Are you sure that your command read values from respective cells?
>>
>> I tried it and got empty data frame with names
>> > WO <- lapply(files, read_excel, sheet=1, range=("B3"))
>> > as.data.frame(WO)
>> [1] ano                 TP303               X96
>> [4] X0                  X3.7519999999999998 X26.7
>> <0 rows> (or 0-length row.names)
>>
>> To get data, col_names argument should be set to FALSE WO <-
>> lapply(files,
>> read_excel, sheet=1, range=("B3"), col_names=FALSE)
>> WO2 <- lapply(files, read_excel, sheet=1, range=("B5"),
>> col_names=FALSE)
>>
>> After that unlist and one rbind together with t should be enough to
>> give
> you
>> one table WO <- unlist(WO)
>> WO2 <- unlist(WO2)
>> result <- t(rbind(WO, WO2))
>> result
>>      WO      WO2
>> ...1 "ano"   "ano"
>> ...1 "TP303" "261119/2"
>> ...1 "96"    "288"
>> ...1 "0"     "192"
>> ...1 "3.752" "25.92094"
>> ...1 "26.7"  "38.6"
>> >
>>
>> And instead txt document you could do
>>
>> write.table(result, "result.xls", sep = "\t", row.names = F)
>>
>> And now "result.xls" is directly readable with Excel
>>
>> Cheers
>> Petr
>>
>> >
>> > -----Original Message-----
>> > From: R-help <[hidden email]> On Behalf Of Thomas Subia
>> > via R-help
>> > Sent: Saturday, August 22, 2020 6:25 AM
>> > To: [hidden email]
>> > Subject: [R] readxl question
>> >
>> > Colleagues,
>> >
>> >
>> >
>> > I have 250 Excel files in a directory. Each of those files has the
>> > same
>> layout.
>> > The problem is that the data in each Excel data is not in rectangular
>> form. I've
>> > been using readxl to extract the data which I need.
>> > Each of my metrics are stored in a particular cell. For each metric, I
>> create text
>> > files which stores my metrics.
>> >
>> >
>> >
>> > library(plyr)
>> >
>> > library(readxl)
>> >
>> >
>> >
>> > files <- list.files(pattern="*.xls", full.names = FALSE)
>> >
>> >
>> >
>> > # Extract Work Order
>> >
>> > WO <- lapply(files, read_excel, sheet="Sheet1", range=("B9")) WO_list
>> > <-
>> > as.data.frame(WO) trans_WO <- t(WO_list) write.table(trans_WO
>> > ,"WO.txt")
>> >
>> >
>> >
>> > # Extract bubble 14_1
>> >
>> > BUBBLE_14_1 <- lapply(files, read_excel, sheet="Sheet1",
>> > range=("c46")) BUBBLE_14_1_list <- as.data.frame(BUBBLE_14_1)
>> >
>> > trans_BUBBLE_14_1 <- t(BUBBLE_14_1_list)
>> >
>> >
>> >
>> > write.table(trans_BUBBLE_14_1,"BUBBLE_14_1.txt")
>> >
>> >
>> >
>> >
>> >
>> > # Extract bubble 14_2
>> >
>> > BUBBLE_14_2 <- lapply(files, read_excel, sheet="Sheet1",
>> > range=("c62")) BUBBLE_14_2_list <- as.data.frame(BUBBLE_14_2)
>> >
>> > trans_BUBBLE_14_2 <- t(BUBBLE_14_2_list)
>> >
>> > write.table(trans_BUBBLE_14_2,"BUBBLE_14_2.txt")
>> >
>> >
>> >
>> > After the text files have been created, I cut and paste the contents
>> > of
>> each
>> > text file to Excel.
>> >
>> > This has worked fine if the number of cells I am extracting from a
>> > file is
>> small.
>> >
>> > If the number gets larger, this method is inefficient.
>> >
>> >
>> >
>> > Any advice on how to do this would be appreciated.
>> >
>> >
>> >
>> > All the best,
>> >
>> >
>> >
>> > Thomas Subia
>> >
>> >
>> >       [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-
>> > guide.html and provide commented, minimal, self-contained,
>> > reproducible code.
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: readxl question

R help mailing list-2
I clearly didn't read well enough. As Petr pointed out, there is also
the col_names argument.

```
# Solution 4a

map_dfr(files, function(cur_file, ranges){
   map_dfc(ranges, function(cur_range, df){
     read_excel(cur_file, sheet = 1, col_names = cur_range, range =
cur_range)
   }, df = df)
}, ranges = ranges, .id = "filename")

```

On 2020-08-27 17:33, Ulrik Stervbo via R-help wrote:

> Hi Thomas,
>
> I am not familiar with the use of the range argument, but it seems to
> me that the cell value becomes the column name. This might be fine,
> but you might get into trouble if you have repeated cell values since
> as.data.frame() will fix these.
>
> I am also not sure about what you want, but this seems to capture your
> example (reading the same cells in a number of files):
>
> ```
> library(readxl)
>
> # Create test set
> path <- readxl_example("geometry.xls")
>
> read_xls(path) # See the content
>
> example_file1 <- tempfile(fileext = ".xls")
> example_file2 <- tempfile(fileext = ".xls")
>
> file.copy(path, example_file1, overwrite = TRUE)
> file.copy(path, example_file2, overwrite = TRUE)
>
> # Solve the problem using loops
> files <- c(example_file1, example_file2)
> ranges <- c("B4", "C5", "D6")
>
> fr <- lapply(ranges, function(cur_range, files){
>   x <- lapply(files, read_xls, sheet = 1, range = cur_range)
>   t(as.data.frame(x))
> }, files = files)
>
> # Loop over fr and save content if needed
> ```
>
> A couple of variations over the theme, where the cell content is
> accessed after reading the file. This will not work well if the data
> in the excel files does not start at A1, but if you can adjust for
> this it should work just fine
>
> ```
> # Solution #2
>
> # Read the whole excel file, and access just the column - row
> # This will give really unexpected results if the data does not start
> in the
> # cell A1 as is the case for geometry.xls. Also, it does not work with
> ranges
> # spaning more than a single cell
> files <- rep(readxl_example("datasets.xlsx"), 3)
> ranges <- c("B4", "C5", "D6")
>
> # Loop over the files to avoid re-reading
> fr <- lapply(files, function(cur_file, ranges){
>   df <- read_excel(cur_file, sheet = 1)
>   x <- lapply(ranges, function(cur_range, df){
>     cr <- cellranger::as.cell_addr(cur_range, strict = FALSE)
>     df[cr$row, cr$col][[1]]
>   }, df = df)
>   as.data.frame(setNames(x, ranges))
>
> }, ranges = ranges)
>
> # Solution 3
> # Like solution 2 but using purr
>
> library(purrr)
>
> files <- rep(readxl_example("datasets.xlsx"), 3)
> ranges <- c("B4", "C5", "D6")
>
> map_dfr(files, function(cur_file, ranges){
>   map_dfc(ranges, function(cur_range, df){
>     df <- read_excel(cur_file, sheet = 1)
>     cr <- cellranger::as.cell_addr(cur_range, strict = FALSE)
>     setNames(df[cr$row, cr$col], cur_range)
>   }, df = df)
>
> }, ranges = ranges)
>
> # Solution 4
> # Like solution 3, but with the addition of the file name and producing
> a single
> # data.frame at the end
>
> library(purrr)
>
> path <- readxl_example("datasets.xls")
> example_file1 <- tempfile(fileext = "_1.xls")
> example_file2 <- tempfile(fileext = "_2.xls")
> example_file3 <- tempfile(fileext = "_3.xls")
>
> file.copy(path, example_file1, overwrite = TRUE)
> file.copy(path, example_file2, overwrite = TRUE)
> file.copy(path, example_file3, overwrite = TRUE)
>
> files <- c(example_file1, example_file2, example_file3)
>
> # Name the file paths with the file names. We can them make use of the
> .id
> # argument to map_dfr()
> files <- setNames(files, basename(files))
> ranges <- c("B4", "C5", "D6")
>
> map_dfr(files, function(cur_file, ranges){
>   map_dfc(ranges, function(cur_range, df){
>     df <- read_excel(cur_file, sheet = 1)
>     cr <- cellranger::as.cell_addr(cur_range, strict = FALSE)
>     setNames(df[cr$row, cr$col], cur_range)
>   }, df = df)
> }, ranges = ranges, .id = "filename")
> ```
>
> HTH
> Ulrik
>
> On 2020-08-26 15:38, PIKAL Petr wrote:
>> Hi
>>
>> As OP has only about 250 files and in read_excel you cannot specify
>> several
>> ranges at once, reading those values separately and concatenating them
>> together in one step seems to be the most efficient way. One probably
>> could
>> design such function, but time spent on the function performing the
>> task
>> only once is probably bigger than performing 250*3 reads.
>>
>> I see inefficiency in writing each column into separate text file and
>> coppying it back to Excel file.
>>
>> Cheers
>> Petr
>>
>>> -----Original Message-----
>>> From: Upton, Stephen (Steve) (CIV) <[hidden email]>
>>> Sent: Wednesday, August 26, 2020 2:44 PM
>>> To: PIKAL Petr <[hidden email]>; Thomas Subia
>>> <[hidden email]>
>>> Cc: [hidden email]
>>> Subject: RE: [R] readxl question
>>>
>>> From your example, it appears you are reading in the same excel file
>>> for
>>> each function to get a value. I would look at creating a function
>>> that
>>> extracts what you need from each file all at once, rather than
>>> separate
>>> reads.
>>>
>>> Stephen C. Upton
>>> SEED (Simulation Experiments & Efficient Designs) Center for Data
>>> Farming
>>> SEED Center website: https://harvest.nps.edu
>>>
>>> -----Original Message-----
>>> From: R-help [mailto:[hidden email]] On Behalf Of PIKAL
>>> Petr
>>> Sent: Wednesday, August 26, 2020 3:50 AM
>>> To: Thomas Subia <[hidden email]>
>>> Cc: [hidden email]
>>> Subject: Re: [R] readxl question
>>>
>>> NPS WARNING: *external sender* verify before acting.
>>>
>>>
>>> Hi
>>>
>>>
>>> Are you sure that your command read values from respective cells?
>>>
>>> I tried it and got empty data frame with names
>>> > WO <- lapply(files, read_excel, sheet=1, range=("B3"))
>>> > as.data.frame(WO)
>>> [1] ano                 TP303               X96
>>> [4] X0                  X3.7519999999999998 X26.7
>>> <0 rows> (or 0-length row.names)
>>>
>>> To get data, col_names argument should be set to FALSE WO <-
>>> lapply(files,
>>> read_excel, sheet=1, range=("B3"), col_names=FALSE)
>>> WO2 <- lapply(files, read_excel, sheet=1, range=("B5"),
>>> col_names=FALSE)
>>>
>>> After that unlist and one rbind together with t should be enough to
>>> give
>> you
>>> one table WO <- unlist(WO)
>>> WO2 <- unlist(WO2)
>>> result <- t(rbind(WO, WO2))
>>> result
>>>      WO      WO2
>>> ...1 "ano"   "ano"
>>> ...1 "TP303" "261119/2"
>>> ...1 "96"    "288"
>>> ...1 "0"     "192"
>>> ...1 "3.752" "25.92094"
>>> ...1 "26.7"  "38.6"
>>> >
>>>
>>> And instead txt document you could do
>>>
>>> write.table(result, "result.xls", sep = "\t", row.names = F)
>>>
>>> And now "result.xls" is directly readable with Excel
>>>
>>> Cheers
>>> Petr
>>>
>>> >
>>> > -----Original Message-----
>>> > From: R-help <[hidden email]> On Behalf Of Thomas Subia
>>> > via R-help
>>> > Sent: Saturday, August 22, 2020 6:25 AM
>>> > To: [hidden email]
>>> > Subject: [R] readxl question
>>> >
>>> > Colleagues,
>>> >
>>> >
>>> >
>>> > I have 250 Excel files in a directory. Each of those files has the
>>> > same
>>> layout.
>>> > The problem is that the data in each Excel data is not in rectangular
>>> form. I've
>>> > been using readxl to extract the data which I need.
>>> > Each of my metrics are stored in a particular cell. For each metric, I
>>> create text
>>> > files which stores my metrics.
>>> >
>>> >
>>> >
>>> > library(plyr)
>>> >
>>> > library(readxl)
>>> >
>>> >
>>> >
>>> > files <- list.files(pattern="*.xls", full.names = FALSE)
>>> >
>>> >
>>> >
>>> > # Extract Work Order
>>> >
>>> > WO <- lapply(files, read_excel, sheet="Sheet1", range=("B9")) WO_list
>>> > <-
>>> > as.data.frame(WO) trans_WO <- t(WO_list) write.table(trans_WO
>>> > ,"WO.txt")
>>> >
>>> >
>>> >
>>> > # Extract bubble 14_1
>>> >
>>> > BUBBLE_14_1 <- lapply(files, read_excel, sheet="Sheet1",
>>> > range=("c46")) BUBBLE_14_1_list <- as.data.frame(BUBBLE_14_1)
>>> >
>>> > trans_BUBBLE_14_1 <- t(BUBBLE_14_1_list)
>>> >
>>> >
>>> >
>>> > write.table(trans_BUBBLE_14_1,"BUBBLE_14_1.txt")
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > # Extract bubble 14_2
>>> >
>>> > BUBBLE_14_2 <- lapply(files, read_excel, sheet="Sheet1",
>>> > range=("c62")) BUBBLE_14_2_list <- as.data.frame(BUBBLE_14_2)
>>> >
>>> > trans_BUBBLE_14_2 <- t(BUBBLE_14_2_list)
>>> >
>>> > write.table(trans_BUBBLE_14_2,"BUBBLE_14_2.txt")
>>> >
>>> >
>>> >
>>> > After the text files have been created, I cut and paste the contents
>>> > of
>>> each
>>> > text file to Excel.
>>> >
>>> > This has worked fine if the number of cells I am extracting from a
>>> > file is
>>> small.
>>> >
>>> > If the number gets larger, this method is inefficient.
>>> >
>>> >
>>> >
>>> > Any advice on how to do this would be appreciated.
>>> >
>>> >
>>> >
>>> > All the best,
>>> >
>>> >
>>> >
>>> > Thomas Subia
>>> >
>>> >
>>> >       [[alternative HTML version deleted]]
>>> >
>>> > ______________________________________________
>>> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > PLEASE do read the posting guide http://www.R-project.org/posting-
>>> > guide.html and provide commented, minimal, self-contained,
>>> > reproducible code.
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.