Convert list of data frames to one data frame

R help mailing list-2
I have a list of data frames which I would like to combine into one data
frame doing something like rbind. I wish to combine in column order and
not by names. However, there are issues.

The number of columns is not the same for each data frame. This is an
intermediate step to a problem and the number of columns could be
2, 4, 6, 8, or 10. There might be a few thousand data frames. Another problem
is that the names of the columns produced by the first step are garbage.

Below is a method that I obtained by asking a question on Stack
Overflow. Unfortunately, my example was not general enough. The code
below works for the simple case where the names of the people are
consistent. It does not work in the realistic case where the names differ.

https://stackoverflow.com/questions/50807970/converting-a-list-of-data-frames-not-a-simple-rbind-second-row-to-new-columns/50809432#50809432 


Please note that the lapply step sets things up except for the column
name issue. If I could figure out a way to change the column names, then
the bind_rows step will, I believe, work.

So I really have two questions. How to change all column names of all
the data frames and then how to solve the original problem.

library(dplyr)

# The non-general case works fine. It produces one data frame, and I can
# then change the column names to
# c("first1", "last1", "first2", "last2", "first3", "last3")

# Non-general easy case
employees4BList = list(
  data.frame(first1 = "Al", second1 = "Jones"),
  data.frame(first1 = c("Al", "Barb"), second1 = c("Jones", "Smith")),
  data.frame(first1 = c("Al", "Barb", "Carol"),
             second1 = c("Jones", "Smith", "Adams")),
  data.frame(first1 = "Al", second1 = "Jones"))

employees4BList

bind_rows(lapply(employees4BList, function(x) rbind.data.frame(c(t(x)))))

# This produces a nice list of data frames, except for the names
lapply(employees4BList, function(x) rbind.data.frame(c(t(x))))

# This list is a disaster. I am looking for a solution that works in
# this case.
employees4List = list(
  data.frame(first1 = "Al", second1 = "Jones"),
  data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
  data.frame(first3 = c("Al3", "Barbara", "Carol"),
             second3 = c("Jones", "Smith", "Adams")),
  data.frame(first4 = "Al", second4 = "Jones2"))

bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))

Thanks.

Ira


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Convert list of data frames to one data frame

Sarah Goslee
Hi,

It isn't super clear to me what you're after. Is this what you intend?

> dfbycol(employees4BList)
  first1 last1 first2 last2 first3 last3
1     Al Jones   <NA>  <NA>   <NA>  <NA>
2     Al Jones   Barb Smith   <NA>  <NA>
3     Al Jones   Barb Smith  Carol Adams
4     Al Jones   <NA>  <NA>   <NA>  <NA>
>
> dfbycol(employees4List)
  first1  last1  first2 last2 first3 last3
1     Al  Jones    <NA>  <NA>   <NA>  <NA>
2    Al2  Jones    Barb Smith   <NA>  <NA>
3    Al3  Jones Barbara Smith  Carol Adams
4     Al Jones2    <NA>  <NA>   <NA>  <NA>


If so:

employees4BList = list(
data.frame(first1 = "Al", second1 = "Jones"),
data.frame(first1 = c("Al", "Barb"), second1 = c("Jones", "Smith")),
data.frame(first1 = c("Al", "Barb", "Carol"), second1 = c("Jones",
"Smith", "Adams")),
data.frame(first1 = ("Al"), second1 = "Jones"))

employees4List = list(
data.frame(first1 = ("Al"), second1 = "Jones"),
data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
data.frame(first3 = c("Al3", "Barbara", "Carol"), second3 = c("Jones",
"Smith", "Adams")),
data.frame(first4 = ("Al"), second4 = "Jones2"))

###

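dfbycol <- function(x) {
  # flatten each data frame row by row: first, last, first, last, ...
  x <- lapply(x, function(y)as.vector(t(as.matrix(y))))
  # pad every vector with NA up to the length of the longest one
  x <- lapply(x, function(y){length(y) <- max(sapply(x, length)); y})
  # stack the padded vectors as the rows of a character matrix
  x <- do.call(rbind, x)
  x <- data.frame(x, stringsAsFactors=FALSE)
  # label the columns first1, last1, first2, last2, ...
  colnames(x) <- paste0(c("first", "last"), rep(seq(1, ncol(x)/2), each=2))
  x
}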

###

dfbycol(employees4BList)

dfbycol(employees4List)
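
The step in dfbycol that lets rows of different widths be stacked is the length assignment, which pads a vector with NA. A minimal sketch of just that step, using two made-up flattened records (the names short, long, and padded are illustrative only, not part of the original code):

```r
# Two flattened records of different lengths (illustrative values in the
# style of the examples above).
short <- c("Al", "Jones")
long  <- c("Al2", "Jones", "Barb", "Smith")

# Assigning a larger length to a vector pads it with NA on the right.
length(short) <- length(long)

# Once the lengths agree, rbind() can stack the vectors as matrix rows.
padded <- rbind(short, long)
padded
```

Because every vector is padded to the same length first, rbind() does not fall back on recycling the shorter vectors.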

--
Sarah Goslee
http://www.functionaldiversity.org

Re: Convert list of data frames to one data frame

David Winsemius

> On Jun 29, 2018, at 7:28 AM, Sarah Goslee <[hidden email]> wrote:
>
> Hi,
>
> It isn't super clear to me what you're after.

Agree.

Had a different read of the request. Thought the request was for a first step that "harmonized" the names of the columns and then used `dplyr::bind_rows`:

library(dplyr)
 newList <- lapply( employees4List, 'names<-', names(employees4List[[1]]) )
 bind_rows(newList)

#---------

   first1 second1
1      Al   Jones
2     Al2   Jones
3    Barb   Smith
4     Al3   Jones
5 Barbara   Smith
6   Carol   Adams
7      Al  Jones2

Might want to wrap suppressWarnings around the right side of that assignment since there were many warnings regarding incongruent factor levels.
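
An alternative to suppressing the warnings might be to convert the factor columns to character before binding, so there are no factor levels left to reconcile. The following is only a sketch in base R: the helper name harmonize and the target names "first" and "last" are assumptions for illustration, and stringsAsFactors = TRUE is set explicitly to mimic the default behaviour of R versions before 4.0.

```r
# Example list from the thread, built with factor columns on purpose.
employees4List <- list(
  data.frame(first1 = "Al", second1 = "Jones", stringsAsFactors = TRUE),
  data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith"),
             stringsAsFactors = TRUE),
  data.frame(first3 = c("Al3", "Barbara", "Carol"),
             second3 = c("Jones", "Smith", "Adams"),
             stringsAsFactors = TRUE),
  data.frame(first4 = "Al", second4 = "Jones2", stringsAsFactors = TRUE))

# Hypothetical helper: make every column character and apply common names.
harmonize <- function(df, new_names = c("first", "last")) {
  df[] <- lapply(df, as.character)  # factor -> character, shape unchanged
  names(df) <- new_names
  df
}

# Stack the harmonized data frames into one two-column data frame.
stacked <- do.call(rbind, lapply(employees4List, harmonize))
stacked
```

The harmonized list could equally be passed to bind_rows; do.call(rbind, ...) is used here only to keep the sketch free of package dependencies.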

--
David.

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law

Re: Convert list of data frames to one data frame

R help mailing list-2
 
Sarah and David,

Thank you for your responses. I will try to be clearer.

Base R solution: Sarah’s method worked perfectly.

Is there a dplyr solution?

START: list of data frames

FINISH: one data frame

DETAILS: The initial list of data frames might have hundreds or a few thousand data frames. Every data frame will have two columns. The first column will represent first names. The second column will represent last names. The column names are not consistent. Data frames will most likely have from one to five rows.

SUGGESTED STRATEGY: Convert the n by 2 data frames to 1 by 2n data frames. Then somehow do an rbind even though the number of columns differs from data frame to data frame.

EXAMPLE: List with two data frames

# DF1
First   Last
George  Washington

# DF2
Start   End
John    Adams
Thomas  Jefferson

# End result: one data frame
First1  Second1     First2  Second2
George  Washington  NA      NA
John    Adams       Thomas  Jefferson

DISCUSSION: As mentioned, I posted something on Stack Overflow. Unfortunately, my example was not general enough, so the suggested solutions worked on the easy case which I provided but not when the names were different.

The suggested solution was:

library(dplyr)

bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))

On this site I pointed out that the inner step,

lapply(employees4List, function(x) rbind.data.frame(c(t(x))))

correctly spreads the multiple rows of each data frame into a 1 by 2n data frame. However, the column names were derived from the values and were a mess. This caused a problem with bind_rows.

I felt that if I knew how to change all the names of all of the data frames that were created after lapply, then I could use bind_rows. So if someone knows how to change all of the names at this intermediate stage, I hope that person will provide the solution.

In the end a 1 by 2 data frame would have names First1, Second1. A 1 by 4 data frame would have names First1, Second1, First2, Second2.
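
One possible way to carry out this strategy with dplyr is to name each flattened record before it becomes a data frame, so that bind_rows can align the columns by name and fill the gaps with NA. This is a sketch under assumptions, not a tested solution for the real data: the helper name widen_one is hypothetical, the inputs are built with stringsAsFactors = FALSE, and every data frame is assumed to have exactly two columns and at least one row.

```r
library(dplyr)

# Example list from the thread, kept as character columns (an assumption).
employees4List <- list(
  data.frame(first1 = "Al", second1 = "Jones", stringsAsFactors = FALSE),
  data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith"),
             stringsAsFactors = FALSE),
  data.frame(first3 = c("Al3", "Barbara", "Carol"),
             second3 = c("Jones", "Smith", "Adams"),
             stringsAsFactors = FALSE),
  data.frame(first4 = "Al", second4 = "Jones2", stringsAsFactors = FALSE))

# Hypothetical helper: turn an n x 2 data frame into a 1 x 2n data frame
# named First1, Second1, First2, Second2, ...
widen_one <- function(df) {
  # read the values row by row: first, last, first, last, ...
  values <- as.vector(t(as.matrix(df)))
  names(values) <- paste0(c("First", "Second"),
                          rep(seq_len(nrow(df)), each = 2))
  as.data.frame(as.list(values), stringsAsFactors = FALSE)
}

# bind_rows matches columns by name and fills missing ones with NA.
combined <- bind_rows(lapply(employees4List, widen_one))
combined
```

Since the original column names are discarded inside widen_one, inconsistent names in the input list should not matter; only the column order is used.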

Ira


Re: Convert list of data frames to one data frame

Bert Gunter-2
Well, I don't know your constraints, of course; but if I understand
correctly, in situations like this, it is usually worthwhile to reconsider
your data structure.

This is a one-liner if you simply rbind all your data frames into one with
2 columns. Here's an example to indicate how:

## list of two data frames with different column names and numbers of rows:
zz <- list(one = data.frame(f = 1:3, g = letters[2:4]),
           two = data.frame(a = 5:9, b = letters[11:15]))

## create common column names and bind them up:
do.call(rbind,lapply(zz,function(x){   names(x) <- c("first","last"); x}))

Note that the row names of the result tell you which original frame the
rows came from. This can also be obtained just from a count of rows (?nrow)
of the original list.
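
Both ways of recovering the source frame can be sketched as follows. The variable names stacked and source_frame are illustrative assumptions; the rest repeats the example above.

```r
# List of two data frames with different column names and numbers of rows.
zz <- list(one = data.frame(f = 1:3, g = letters[2:4]),
           two = data.frame(a = 5:9, b = letters[11:15]))

# Give every data frame the same column names, then stack them.
stacked <- do.call(rbind, lapply(zz, function(x) {
  names(x) <- c("first", "last")
  x
}))

# The row names carry the list names as a prefix ("one.1", "two.1", ...).
rownames(stacked)

# The same information rebuilt from the row counts of the original list.
source_frame <- rep(names(zz), times = sapply(zz, nrow))
source_frame
```

Storing source_frame as an extra column of the stacked data frame would keep the grouping available without parsing the row names.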

Apologies if I misunderstand your query or if your constraints make this
simple approach impossible.

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

Re: Convert list of data frames to one data frame

R help mailing list-2
Bert,

Thanks for your idea. However, the end result is not what I am looking
for. Each initial data frame in the list will result in just one row in
the final data frame. In your case:

The first data frame of the initial structure will become
1 b 2 c 3 d NA NA NA NA in the end structure.

The second data frame of the initial structure will become
5 k 6 l 7 m 8 n 9 o.

Sarah’s code works:

> dfbycol(zz)
    first1 last1 first2 last2 first3 last3 first4 last4 first5 last5
one      1     b      2     c      3     d   <NA>  <NA>   <NA>  <NA>
two      5     k      6     l      7     m      8     n      9     o

dfbycol <- function(x) {

x <- lapply(x, function(y)as.vector(t(as.matrix(y))))

x <- lapply(x, function(y){length(y) <- max(sapply(x, length)); y})

x <- do.call(rbind, x)

x <- data.frame(x, stringsAsFactors=FALSE)

colnames(x) <- paste0(c("first", "last"), rep(seq(1, ncol(x)/2), each=2))

x

}
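
To see why this works, the first steps of dfbycol can be run one at a time. This is only a minimal sketch, assuming the zz list from Bert's example; the names flat, width, padded and wide are hypothetical and used purely for illustration.

```r
# zz as in Bert's example: two data frames with different names and sizes
zz <- list(one = data.frame(f = 1:3, g = letters[2:4]),
           two = data.frame(a = 5:9, b = letters[11:15]))

# Step 1: read each data frame row by row into one character vector
flat <- lapply(zz, function(y) as.vector(t(as.matrix(y))))

# Step 2: pad every vector to the longest length; `length<-` fills with NA
width  <- max(lengths(flat))
padded <- lapply(flat, function(y) { length(y) <- width; y })

# Step 3: stack the padded vectors, one row per original data frame
wide <- do.call(rbind, padded)
dim(wide)
```

The padding step is what lets rbind accept inputs of different widths; without it rbind would recycle the shorter vectors instead of filling with NA.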

Thanks.

By the way I am working with a colleague on this. Apparently the data
came from reading in XML data.

Ira


On 6/29/2018 6:33 PM, Bert Gunter wrote:

> Well, I don't know your constraints, of course; but if I understand
> correctly, in situations like this, it is usually worthwhile to
> reconsider your data structure.
>
> This is a one-liner if you simply rbind all your data frames into one
> with 2 columns. Here's an example to indicate how:
>
> ## list of two data frames with different column names and numbers of rows:
> zz <- list(one = data.frame(f = 1:3, g = letters[2:4]),
>            two = data.frame(a = 5:9, b = letters[11:15]))
>
> ## create common column names and bind them up:
> do.call(rbind,lapply(zz,function(x){   names(x) <- c("first","last"); x}))
>
> Note that the row names of the result tell you which original frame
> the rows came from. This can also be obtained just from a count of
> rows (?nrow) of the original list.
>
> Apologies if I misunderstand your query, or if your constraints make
> this simple approach impossible.
>
> Cheers,
> Bert
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Fri, Jun 29, 2018 at 5:29 PM, Ira Sharenow via R-help
> <[hidden email]> wrote:
>
>     Sarah and David,
>
>     Thank you for your responses. I will try to be clearer.
>
>     Base R solution: Sarah’s method worked perfectly.
>
>     Is there a dplyr solution?
>
>     START: list of data frames
>
>     FINISH: one data frame
>
>     DETAILS: The initial list of data frames might have hundreds or a
>     few thousand data frames. Every data frame will have two columns.
>     The first column will represent first names. The second column will
>     represent last names. The column names are not consistent. Data
>     frames will most likely have from one to five rows.
>
>     SUGGESTED STRATEGY: Convert the n by 2 data frames to 1 by 2n data
>     frames. Then somehow do an rbind even though the number of columns
>     differs from data frame to data frame.
>
>     EXAMPLE: List with two data frames
>
>     # DF1
>     First    Last
>     George   Washington
>
>     # DF2
>     Start    End
>     John     Adams
>     Thomas   Jefferson
>
>     # End result: one data frame
>     First1   Second1      First2   Second2
>     George   Washington   NA       NA
>     John     Adams        Thomas   Jefferson
>
>     DISCUSSION: As mentioned, I posted something on Stack Overflow.
>     Unfortunately, my example was not general enough, and so the
>     suggested solutions worked on the easy case which I provided
>     but not when the names were different.
>
>     The suggested solution was:
>
>     library(dplyr)
>     bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))
>
>     On this site I pointed out that the inner call,
>     lapply(employees4List, function(x) rbind.data.frame(c(t(x)))),
>     correctly spreads the multiple rows of each data frame into a 1 by
>     2n data frame. However, the column names were derived from the
>     values and were a mess. This caused a problem with bind_rows.
>
>     I felt that if I knew how to change all the names of all of the
>     data frames that were created after lapply, then I could use
>     bind_rows. So if someone knows how to change all of the names at
>     this intermediate stage, I hope that person will provide the solution.
>
>     In the end a 1 by 2 data frame would have names First1 Second1. A
>     1 by 4 data frame would have names First1 Second1 First2 Second2.
>
>     Ira
>
>
>     On Friday, June 29, 2018, 12:49:18 PM PDT, David Winsemius
>     <[hidden email]> wrote:
>
>
>     > On Jun 29, 2018, at 7:28 AM, Sarah Goslee <[hidden email]> wrote:
>     >
>     > Hi,
>     >
>     > It isn't super clear to me what you're after.
>
>     Agree.
>
>     Had a different read of the request. Thought the request was for a
>     first step that "harmonized" the names of the columns and then
>     used `dplyr::bind_rows`:
>
>     library(dplyr)
>     newList <- lapply(employees4List, 'names<-', names(employees4List[[1]]))
>     bind_rows(newList)
>
>     #---------
>
>        first1 second1
>     1      Al   Jones
>     2     Al2   Jones
>     3    Barb   Smith
>     4     Al3   Jones
>     5 Barbara   Smith
>     6   Carol   Adams
>     7      Al  Jones2
>
>     Might want to wrap suppressWarnings around the right side of that
>     assignment since there were many warnings regarding incongruent
>     factor levels.
>
>     --
>     David.
>     > Is this what you intend?
>     >
>     >> dfbycol(employees4BList)
>     >   first1 last1 first2 last2 first3 last3
>     > 1     Al Jones   <NA>  <NA>   <NA>  <NA>
>     > 2     Al Jones   Barb Smith   <NA>  <NA>
>     > 3     Al Jones   Barb Smith  Carol Adams
>     > 4     Al Jones   <NA>  <NA>   <NA>  <NA>
>     >
>     >> dfbycol(employees4List)
>     >   first1  last1  first2 last2 first3 last3
>     > 1     Al  Jones    <NA>  <NA>   <NA>  <NA>
>     > 2    Al2  Jones    Barb Smith   <NA>  <NA>
>     > 3    Al3  Jones Barbara Smith  Carol Adams
>     > 4     Al Jones2    <NA>  <NA>   <NA>  <NA>
>     >
>     >
>     > If so:
>     >
>     > employees4BList = list(
>     >   data.frame(first1 = "Al", second1 = "Jones"),
>     >   data.frame(first1 = c("Al", "Barb"), second1 = c("Jones", "Smith")),
>     >   data.frame(first1 = c("Al", "Barb", "Carol"),
>     >              second1 = c("Jones", "Smith", "Adams")),
>     >   data.frame(first1 = "Al", second1 = "Jones"))
>     >
>     > employees4List = list(
>     >   data.frame(first1 = "Al", second1 = "Jones"),
>     >   data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
>     >   data.frame(first3 = c("Al3", "Barbara", "Carol"),
>     >              second3 = c("Jones", "Smith", "Adams")),
>     >   data.frame(first4 = "Al", second4 = "Jones2"))
>     >
>     > ###
>     >
>     > dfbycol <- function(x) {
>     >   x <- lapply(x, function(y) as.vector(t(as.matrix(y))))
>     >   x <- lapply(x, function(y) {length(y) <- max(sapply(x, length)); y})
>     >   x <- do.call(rbind, x)
>     >   x <- data.frame(x, stringsAsFactors = FALSE)
>     >   colnames(x) <- paste0(c("first", "last"), rep(seq(1, ncol(x)/2), each = 2))
>     >   x
>     > }
>     >
>     > ###
>     >
>     > dfbycol(employees4BList)
>     >
>     > dfbycol(employees4List)
>     >
>     > --
>     > Sarah Goslee
>     > http://www.functionaldiversity.org
>     >



Re: Convert list of data frames to one data frame

Jeff Newmiller
In reply to this post by R help mailing list-2
Code below...

a) Just because something can be done with dplyr does not mean that is the
best way to do it. A solution in the hand is worth two on the Internet,
and dplyr is not always the fastest method anyway.

b) I highly recommend that you read Hadley Wickham's paper on tidy data
[1]. Also, having a group of one or more columns at all times that
uniquely identify where the data came from is a "key" to success [2].

c) Please read and follow one of the various online documents about making
reproducible examples in R (e.g. [3]). HTML formatting is really a pain
(at best... at worst, it corrupts your code) on a plain-text-only list
(you have read the Posting Guide, right?). Consider my example below as a
model for you to follow in the future, and make sure to set your email
program to send plain text. (Obviously your examples don't have to achieve
success... but they should bring us up to speed with where you are having
troubles IN R.)

[1] https://www.jstatsoft.org/article/view/v059i10
[2] http://r4ds.had.co.nz/relational-data.html#keys
[3] https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

----
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(tidyr)

# note that these data frames all have character columns
# rather than factors, due to the as.is option when the
# data are read in.
DF1 <- read.table( text =
"First          Last
George          Washington
", header=TRUE, as.is = TRUE )

# dput looks ugly but is actually much more practical for
# providing R data on the mailing list... here is an example
dput( DF1 )
#> structure(list(First = "George", Last = "Washington"), .Names = c("First",
#> "Last"), class = "data.frame", row.names = c(NA, -1L))

DF2 <- read.table( text =
"Start              End
John               Adams
Thomas        Jefferson
", header = TRUE, as.is = TRUE )

DFL <- list( DF1, DF2 )

# DFNames is a set of unique identifiers
DFL1 <- data_frame( .DFNames = sprintf( "DF%d", 1:2 )
                   , data = DFL
                   )

DFL2 <- (   DFL1
         %>% mutate( data = lapply( data
                                  , function( DF ) {
                                      DF[[ ".PK" ]] <- seq.int( nrow( DF ))
                                      gather( DF, ".Col", "value", -.PK )
                                    }
                                  )
                   )
         %>% unnest
         %>% spread( .Col, value )
         )
DFL2
#> # A tibble: 3 x 6
#>   .DFNames   .PK End       First  Last       Start
#>   <chr>    <int> <chr>     <chr>  <chr>      <chr>
#> 1 DF1          1 <NA>      George Washington <NA>
#> 2 DF2          1 Adams     <NA>   <NA>       John
#> 3 DF2          2 Jefferson <NA>   <NA>       Thomas

#' Created on 2018-06-29 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).
----
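
The same keyed, long-form idea can also be carried through to the wide layout that was requested. What follows is a hedged sketch rather than the code posted above: it assumes a tidyr version that provides pivot_wider() with the names_vary argument (1.2.0 or later), and the object names long and wide are hypothetical.

```r
library(dplyr)
library(tidyr)

DF1 <- data.frame(First = "George", Last = "Washington",
                  stringsAsFactors = FALSE)
DF2 <- data.frame(Start = c("John", "Thomas"), End = c("Adams", "Jefferson"),
                  stringsAsFactors = FALSE)
DFL <- list(DF1 = DF1, DF2 = DF2)

# Long, keyed form: .DFNames identifies the source, .PK the row within it
long <- bind_rows(lapply(DFL, function(DF) {
  names(DF) <- c("first", "last")   # combine by position, not by name
  DF$.PK <- seq_len(nrow(DF))
  DF
}), .id = ".DFNames")

# One row per source data frame, one first/last pair per original row
wide <- pivot_wider(long, id_cols = .DFNames, names_from = .PK,
                    values_from = c(first, last), names_sep = "",
                    names_vary = "slowest")
wide
```

Keeping .DFNames and .PK in the long table means the wide layout can be rebuilt at any time, while the long table stays easy to filter and join.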


---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------

Re: Convert list of data frames to one data frame

R help mailing list-2
I would like to thank everyone who helped me out. I have obtained some offline help, so I would like to summarize all the information I have received.

Before I summarize the thread, there is one loose end.

Initially I thought

library(dplyr)
dplyr::bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))

would work, but there were problems.

lapply(employees4List, function(x) rbind.data.frame(c(t(x))))

spreads out the data frames, converting them from long to wide, but it messes up the names. So one question I still have is: how can I programmatically change all of the names?

After this initial step, the first data frame's names might be derived from

c("George", "Washington")

and the second data frame's names might be derived from

c("John", "Adams", "Thomas", "Jefferson")

What I want is to change the names to

c("First1", "Second1")

and

c("First1", "Second1", "First2", "Second2")

I believe that will enable me to go back, use bind_rows, and complete that method of solution:

Step 1: lapply(employees4List, function(x) rbind.data.frame(c(t(x))))
Step 2: Clean up the names
Step 3: bind_rows

Immediately below is, hopefully, a clear and precise statement of the problem and the proposed solution path. Then there are the various solutions.

# Starting list of data frames
employees4List = list(data.frame(first1 = "Al", second1 = "Jones"),
                     data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
                     data.frame(first3 = c("Al3", "Barbara", "Carol"), second3 = c("Jones", "Smith", "Adams")),
                     data.frame(first4 = ("Al"), second4 = "Jones2"))

employees4List


# Intermediate step that messes up the names but successfully converts from long to wide
lapply(employees4List, function(x) rbind.data.frame(c(t(x))))

# The intermediate list should likely look like this listFinal
df1 = data.frame(First1 = "Al", Second1 = "Jones", First2 = NA, Second2 = NA, First3 = NA, Second3 = NA,
                 First4 = NA, Second4 = NA)
df2 = data.frame(First1 = "Al2", Second1 = "Jones", First2 = "Barb", Second2 = "Smith",
                 First3 = NA, Second3 = NA, First4 = NA, Second4 = NA)

df3 = data.frame(First1 = "Al3", Second1 = "Jones", First2 = "Barbara", Second2 = "Smith",
                 First3 = "Carol", Second3 = "Adams", First4 = NA, Second4 = NA)
df4 = data.frame(First1 = "Al", Second1 = "Jones2", First2 = NA, Second2 = NA, First3 = NA, Second3 = NA,
                 First4 = NA, Second4 = NA)
listFinal = list(df1, df2, df3, df4)
listFinal

# Requested data frame (except that the columns are not just character but some are factor or even logical)
dplyr::bind_rows(listFinal)
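
One possible way to build such an intermediate list programmatically, rather than by hand, is to name each wide row by position before binding. This is a minimal sketch under stated assumptions: the helper to_wide_row and the objects listWide and result are hypothetical names, and the output has only as many First/Second pairs as the longest data frame has rows (three here), so it does not contain the all-NA First4 and Second4 columns of listFinal.

```r
library(dplyr)

employees4List <- list(
  data.frame(first1 = "Al", second1 = "Jones"),
  data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
  data.frame(first3 = c("Al3", "Barbara", "Carol"),
             second3 = c("Jones", "Smith", "Adams")),
  data.frame(first4 = "Al", second4 = "Jones2"))

# Steps 1 and 2: one wide row per data frame, named by position, not by value
to_wide_row <- function(x) {
  values <- c(t(as.matrix(x)))            # row-by-row character vector
  row <- as.data.frame(t(values), stringsAsFactors = FALSE)
  names(row) <- paste0(c("First", "Second"), rep(seq_len(nrow(x)), each = 2))
  row
}
listWide <- lapply(employees4List, to_wide_row)

# Step 3: bind_rows matches on the now-consistent names and pads with NA
result <- bind_rows(listWide)
result
```

Because every column is character after as.matrix, bind_rows has no factor levels to reconcile.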

Sarah Goslee solved the problem using base R. Given

employees4List = list(
  data.frame(first1 = ("Al"), second1 = "Jones"),
  data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones2", "Smith")),
  data.frame(first3 = c("Al3", "Barbara", "Carol"), second3 = c("Jones3",
                                                                "Smith", "Adams")),
  data.frame(first4 = ("Al"), second4 = "Jones2"))

This function produces the solution in the requested structure.
dfbycol <- function(x) {
  x <- lapply(x, function(y)as.vector(t(as.matrix(y))))
  x <- lapply(x, function(y){length(y) <- max(sapply(x, length)); y})
  x <- do.call(rbind, x)
  x <- data.frame(x, stringsAsFactors=FALSE)
  colnames(x) <- paste0(c("first", "last"), rep(seq(1, ncol(x)/2), each=2))
  x
}
dfbycol(employees4List)

Offline, Jeff Newmiller and Bert Gunter provided alternative approaches to the problem as well as other advice. Their solutions meet the "tidy" criterion.

Bert suggested this online:

## list of two data frames with different column names and numbers of rows:
zz <-list(one = data.frame(f=1:3,g=letters[2:4]), two = data.frame(a = 5:9,b = letters[11:15]))
## create common column names and bind them up:
do.call(rbind,lapply(zz,function(x){   names(x) <- c("first","last"); x}))
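
In this long form, the row names are the only record of which data frame each row came from. A variation that stores the origin in an explicit column might look like the following base R sketch; the column name source and the object name long are hypothetical.

```r
zz <- list(one = data.frame(f = 1:3, g = letters[2:4]),
           two = data.frame(a = 5:9, b = letters[11:15]))

# One block of rows per list element, tagged with the element's name
long <- do.call(rbind, lapply(names(zz), function(nm) {
  x <- zz[[nm]]
  data.frame(source = nm,                    # which list element the rows came from
             first  = as.character(x[[1]]),  # combine by position, not by name
             last   = as.character(x[[2]]),
             stringsAsFactors = FALSE)
}))

table(long$source)
```

An explicit key column survives later sorting or filtering, which row names may not.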

This and the next suggestion, by Jeff, produced useful solutions but not in the requested form.

library(dplyr)
library(tidyr)  # gather(), unnest() and spread() come from tidyr
# note that these data frames all have character columns
# rather than factors, due to the as.is option when the
# data are read in.
DF1 <- read.table( text =
"First          Last
George          Washington
", header=TRUE, as.is = TRUE )
# dput looks ugly but is actually much more practical for
# providing R data on the mailing list... here is an example
dput( DF1 )
#> structure(list(First = "George", Last = "Washington"), .Names = c("First",
#> "Last"), class = "data.frame", row.names = c(NA, -1L))

DF2 <- read.table( text =
"Start              End
John              Adams
Thomas        Jefferson
", header = TRUE, as.is = TRUE )

DFL <- list( DF1, DF2 )

# DFNames is a set of unique identifiers
DFL1 <- data_frame( .DFNames = sprintf( "DF%d", 1:2 )
                  , data = DFL
                  )

DFL2 <- (  DFL1
        %>% mutate( data = lapply( data
                                  , function( DF ) {
                                      DF[[ ".PK" ]] <- seq.int( nrow( DF ))
                                      gather( DF, ".Col", "value", -.PK )
                                    }
                                  )
                  )
        %>% unnest
        %>% spread( .Col, value )
        )
DFL2

During the discussion, the following useful links were recommended:

[1] https://www.jstatsoft.org/article/view/v059i10 (Hadley Wickham on tidy data)
[2] http://r4ds.had.co.nz/relational-data.html#keys (Hadley Wickham on relational data)
[3] https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example (how to make a great reproducible example)
[4] http://adv-r.had.co.nz/Functionals.html (improving lapply and related skills)

Thanks again to everyone!
Ira




    On Friday, June 29, 2018, 7:47:13 PM PDT, Jeff Newmiller <[hidden email]> wrote:  
 
 Code below...

a) Just because something can be done with dplyr does not mean that is the
best way to do it. A solution in the hand is worth two on the Internet,
and dplyr is not always the fastest method anyway.

b) I highly recommend that you read Hadley Wickham's paper on tidy data
[1]. Also, having a group of one or more columns at all times that
uniquely identify where the data came from is a "key" to success [2].

c) Please read and follow one of the various online documents about making
reproducible examples in R (e.g. [3]). HTML formatting is really a pain
(at best... at worst, it corrupts your code) on a plain-text-only list
(you have read the Posting Guide, right?). Consider my example below as a
model for you to follow in the future, and make sure to set your email
program to send plain text. (Obviously your examples don't have to achieve
success... but they should bring us up to speed with where you are having
troubles IN R.)

[1] https://www.jstatsoft.org/article/view/v059i10
[2] http://r4ds.had.co.nz/relational-data.html#keys
[3] https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

----
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>    filter, lag
#> The following objects are masked from 'package:base':
#>
#>    intersect, setdiff, setequal, union
library(tidyr)

# note that these data frames all have character columns
# rather than factors, due to the as.is option when the
# data are read in.
DF1 <- read.table( text =
"First          Last
George          Washington
", header=TRUE, as.is = TRUE )

# dput looks ugly but is actually much more practical for
# providing R data on the mailing list... here is an example
dput( DF1 )
#> structure(list(First = "George", Last = "Washington")
#>, .Names = c("First",
#> "Last"), class = "data.frame", row.names = c(NA, -1L))

DF2 <- read.table( text =
"Start              End
John              Adams
Thomas        Jefferson
", header = TRUE, as.is = TRUE )

DFL <- list( DF1, DF2 )

# DFNames is a set of unique identifiers
DFL1 <- data_frame( .DFNames = sprintf( "DF%d", 1:2 )
                  , data = DFL
                  )

DFL2 <- (  DFL1
        %>% mutate( data = lapply( data
                                  , function( DF ) {
                                      DF[[ ".PK" ]] <- seq.int( nrow( DF ))
                                      gather( DF, ".Col", "value", -.PK )
                                    }
                                  )
                  )
        %>% unnest
        %>% spread( .Col, value )
        )
DFL2
#> # A tibble: 3 x 6
#>   .DFNames   .PK End       First  Last       Start
#>   <chr>    <int> <chr>     <chr>  <chr>      <chr>
#> 1 DF1          1 <NA>      George Washington <NA>
#> 2 DF2          1 Adams     <NA>   <NA>       John
#> 3 DF2          2 Jefferson <NA>   <NA>       Thomas

#' Created on 2018-06-29 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).
----
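
For readers without tidyr, the same keyed long layout can be sketched in base R. This is only an illustration, not part of the reprex above: the helper name stack_keyed and the output column names .DFName, First and Last are assumptions made here, and the columns are matched by position rather than by name.

```r
# Hypothetical helper: stack two-column data frames by position,
# tagging every row with its source (.DFName) and its row number (.PK).
stack_keyed <- function(dfl) {
  ids <- sprintf("DF%d", seq_along(dfl))
  pieces <- Map(function(DF, id) {
    # each input must be a pair of columns
    stopifnot(length(DF) == 2)
    data.frame(.DFName = id,
               .PK     = seq_len(nrow(DF)),
               First   = as.character(DF[[1]]),
               Last    = as.character(DF[[2]]),
               stringsAsFactors = FALSE)
  }, dfl, ids)
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

DFL <- list(
  data.frame(First = "George", Last = "Washington",
             stringsAsFactors = FALSE),
  data.frame(Start = c("John", "Thomas"), End = c("Adams", "Jefferson"),
             stringsAsFactors = FALSE)
)
long <- stack_keyed(DFL)
long
```

The pair (.DFName, .PK) identifies every row, so the wide layout can be rebuilt from this table later if it is really needed.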

On Sat, 30 Jun 2018, Ira Sharenow via R-help wrote:

>
> Sarah and David,
>
> Thank you for your responses. I will try to be clearer.
>
> Base R solution: Sarah's method worked perfectly.
>
> Is there a dplyr solution?
>
> START: list of data frames
>
> FINISH: one data frame
>
> DETAILS: The initial list of data frames might have hundreds or a few thousand data frames. Every data frame will have two columns. The first column will represent first names. The second column will represent last names. The column names are not consistent. Data frames will most likely have from one to five rows.
>
> SUGGESTED STRATEGY: Convert the n by 2 data frames to 1 by 2n data frames. Then somehow do an rbind even though the number of columns differs from data frame to data frame.
>
> EXAMPLE: List with two data frames
>
> # DF1
> First   Last
> George  Washington
>
> # DF2
> Start   End
> John    Adams
> Thomas  Jefferson
>
> # End result: one data frame
> First1  Second1     First2  Second2
> George  Washington  NA      NA
> John    Adams       Thomas  Jefferson
>
> DISCUSSION: As mentioned, I posted something on Stack Overflow. Unfortunately, my example was not general enough, so the suggested solutions worked on the easy case which I provided but not when the names were different.
>
> The suggested solution was:
>
> library(dplyr)
>
> bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))
>
> On this site I pointed out that the inner call, lapply(employees4List, function(x) rbind.data.frame(c(t(x)))), correctly spreads the multiple rows of each data frame into 1 by 2n data frames. However, the column names were derived from the values and were a mess. This caused a problem with bind_rows.
>
> I felt that if I knew how to change all the names of all of the data frames that were created after lapply, then I could use bind_rows. So if someone knows how to change all of the names at this intermediate stage, I hope that person will provide the solution.
>
> In the end a 1 by 2 data frame would have names First1 Second1. A 1 by 4 data frame would have names First1 Second1 First2 Second2.
>
> Ira
>
>
>    On Friday, June 29, 2018, 12:49:18 PM PDT, David Winsemius <[hidden email]> wrote:
>
>
>> On Jun 29, 2018, at 7:28 AM, Sarah Goslee <[hidden email]> wrote:
>>
>> Hi,
>>
>> It isn't super clear to me what you're after.
>
> Agree.
>
> Had a different read of the request. Thought the request was for a first step that "harmonized" the names of the columns and then used `dplyr::bind_rows`:
>
> library(dplyr)
> newList <- lapply( employees4List, 'names<-', names(employees4List[[1]]) )
> bind_rows(newList)
>
> #---------
>
>    first1 second1
> 1      Al   Jones
> 2     Al2   Jones
> 3    Barb   Smith
> 4     Al3   Jones
> 5 Barbara   Smith
> 6   Carol   Adams
> 7      Al  Jones2
>
> Might want to wrap suppressWarnings around the right side of that assignment since there were many warnings regarding incongruent factor levels.
>
> --
> David.
>> Is this what you intend?
>>
>>> dfbycol(employees4BList)
>>   first1 last1 first2 last2 first3 last3
>> 1     Al Jones   <NA>  <NA>   <NA>  <NA>
>> 2     Al Jones   Barb Smith   <NA>  <NA>
>> 3     Al Jones   Barb Smith  Carol Adams
>> 4     Al Jones   <NA>  <NA>   <NA>  <NA>
>>>
>>> dfbycol(employees4List)
>>   first1  last1  first2 last2 first3 last3
>> 1     Al  Jones    <NA>  <NA>   <NA>  <NA>
>> 2    Al2  Jones    Barb Smith   <NA>  <NA>
>> 3    Al3  Jones Barbara Smith  Carol Adams
>> 4     Al Jones2    <NA>  <NA>   <NA>  <NA>
>>
>>
>> If so:
>>
>> employees4BList = list(
>> data.frame(first1 = "Al", second1 = "Jones"),
>> data.frame(first1 = c("Al", "Barb"), second1 = c("Jones", "Smith")),
>> data.frame(first1 = c("Al", "Barb", "Carol"), second1 = c("Jones",
>> "Smith", "Adams")),
>> data.frame(first1 = ("Al"), second1 = "Jones"))
>>
>> employees4List = list(
>> data.frame(first1 = ("Al"), second1 = "Jones"),
>> data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
>> data.frame(first3 = c("Al3", "Barbara", "Carol"), second3 = c("Jones",
>> "Smith", "Adams")),
>> data.frame(first4 = ("Al"), second4 = "Jones2"))
>>
>> ###
>>
>> dfbycol <- function(x) {
>>   x <- lapply(x, function(y)as.vector(t(as.matrix(y))))
>>   x <- lapply(x, function(y){length(y) <- max(sapply(x, length)); y})
>>   x <- do.call(rbind, x)
>>   x <- data.frame(x, stringsAsFactors=FALSE)
>>   colnames(x) <- paste0(c("first", "last"), rep(seq(1, ncol(x)/2), each=2))
>>   x
>> }
>>
>> ###
>>
>> dfbycol(employees4BList)
>>
>> dfbycol(employees4List)
>>
>> On Fri, Jun 29, 2018 at 2:36 AM, Ira Sharenow via R-help
>> <[hidden email]> wrote:
>>> I have a list of data frames which I would like to combine into one data
>>> frame doing something like rbind. I wish to combine in column order and
>>> not by names. However, there are issues.
>>>
>>> The number of columns is not the same for each data frame. This is an
>>> intermediate step to a problem and the number of columns could be
>>> 2, 4, 6, 8, or 10. There might be a few thousand data frames. Another problem
>>> is that the names of the columns produced by the first step are garbage.
>>>
>>> Below is a method that I obtained by asking a question on stack
>>> overflow. Unfortunately, my example was not general enough. The code
>>> below works for the simple case where the names of the people are
>>> consistent. It does not work when the names are realistically not the same.
>>>
>>> https://stackoverflow.com/questions/50807970/converting-a-list-of-data-frames-not-a-simple-rbind-second-row-to-new-columns/50809432#50809432
>>>
>>>
>>> Please note that the lapply step sets things up except for the column
>>> name issue. If I could figure out a way to change the column names, then
>>> the bind_rows step will, I believe, work.
>>>
>>> So I really have two questions. How to change all column names of all
>>> the data frames and then how to solve the original problem.
>>>
>>> # The non general case works fine. It produces one data frame and I can
>>> then change the column names to
>>>
>>> # c("first1", "last1","first2", "last2","first3", "last3",)
>>>
>>> #Non general easy case
>>>
>>> employees4BList = list(data.frame(first1 = "Al", second1 = "Jones"),
>>>
>>> data.frame(first1 = c("Al", "Barb"), second1 = c("Jones", "Smith")),
>>>
>>> data.frame(first1 = c("Al", "Barb", "Carol"), second1 = c("Jones",
>>> "Smith", "Adams")),
>>>
>>> data.frame(first1 = ("Al"), second1 = "Jones"))
>>>
>>> employees4BList
>>>
>>> bind_rows(lapply(employees4BList, function(x) rbind.data.frame(c(t(x)))))
>>>
>>> # This produces a nice list of data frames, except for the names
>>>
>>> lapply(employees4BList, function(x) rbind.data.frame(c(t(x))))
>>>
>>> # This list is a disaster. I am looking for a solution that works in
>>> this case.
>>>
>>> employees4List = list(data.frame(first1 = ("Al"), second1 = "Jones"),
>>>
>>> data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
>>>
>>> data.frame(first3 = c("Al3", "Barbara", "Carol"), second3 = c("Jones",
>>> "Smith", "Adams")),
>>>
>>> data.frame(first4 = ("Al"), second4 = "Jones2"))
>>>
>>>   bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))
>>>
>>> Thanks.
>>>
>>> Ira
>>>
>>
>> --
>> Sarah Goslee
>> http://www.functionaldiversity.org
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently advanced.'  -Gehm's Corollary to Clarke's Third Law

---------------------------------------------------------------------------
Jeff Newmiller                        The    .....      .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.      ##.#.  Live Go...
                                      Live:  OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.      #.O#.  with
/Software/Embedded Controllers)              .OO#.      .OO#.  rocks...1k
---------------------------------------------------------------------------  

Re: Convert list of data frames to one data frame

Jeff Newmiller
Your request is getting a bit complicated with so much re-hashing, but
here are three solutions: base only, a bit of dplyr, and dplyr+tidyr:

#########
# input data
employees4List = list(data.frame(first1 = "Al", second1 = "Jones"),
                       data.frame(first2 = c("Al2", "Barb"),
                                  second2 = c("Jones", "Smith")),
                       data.frame(first3 = c("Al3", "Barbara", "Carol"),
                                  second3 = c("Jones", "Smith", "Adams")),
                       data.frame(first4 = ("Al"), second4 = "Jones2"))
employees4List
#> [[1]]
#>   first1 second1
#> 1     Al   Jones
#>
#> [[2]]
#>   first2 second2
#> 1    Al2   Jones
#> 2   Barb   Smith
#>
#> [[3]]
#>    first3 second3
#> 1     Al3   Jones
#> 2 Barbara   Smith
#> 3   Carol   Adams
#>
#> [[4]]
#>   first4 second4
#> 1     Al  Jones2

# expected output
df1 = data.frame(First1 = "Al", Second1 = "Jones",
                  First2 = NA, Second2 = NA,
                  First3 = NA, Second3 = NA,
                  First4 = NA, Second4 = NA)
df2 = data.frame(First1 = "Al2", Second1 = "Jones",
                  First2 = "Barb", Second2 = "Smith",
                  First3 = NA, Second3 = NA,
                  First4 = NA, Second4 = NA)
df3 = data.frame(First1 = "Al3", Second1 = "Jones",
                  First2 = "Barbara", Second2 = "Smith",
                  First3 = "Carol", Second3 = "Adams",
                  First4 = NA, Second4 = NA)
df4 = data.frame(First1 = "Al", Second1 = "Jones2",
                  First2 = NA, Second2 = NA,
                  First3 = NA, Second3 = NA,
                  First4 = NA, Second4 = NA)
listFinal = list(df1, df2, df3, df4)
listFinal
#> [[1]]
#>   First1 Second1 First2 Second2 First3 Second3 First4 Second4
#> 1     Al   Jones     NA      NA     NA      NA     NA      NA
#>
#> [[2]]
#>   First1 Second1 First2 Second2 First3 Second3 First4 Second4
#> 1    Al2   Jones   Barb   Smith     NA      NA     NA      NA
#>
#> [[3]]
#>   First1 Second1  First2 Second2 First3 Second3 First4 Second4
#> 1    Al3   Jones Barbara   Smith  Carol   Adams     NA      NA
#>
#> [[4]]
#>   First1 Second1 First2 Second2 First3 Second3 First4 Second4
#> 1     Al  Jones2     NA      NA     NA      NA     NA      NA

myrename1 <- function( DF, m ) {
   # if a pair of columns is not present, raise an error
   stopifnot( 2 == length( DF ) )
   n <- nrow( DF )
   # use memory layout of elements of matrix
   # t() automatically converts to matrix (nrow=2)
   # matrix(,nrow=1) re-interprets the column-major output of t()
   # as a single row matrix
   result <- as.data.frame( matrix( t( DF ), nrow = 1 )
                          , stringsAsFactors = FALSE
                          )
   if ( n < m ) {
     result[ , seq( 2 * n + 1, 2 * m ) ] <- NA
   }
   setNames( result
           , sprintf( "%s%d"
                    , c( "First", "Second" )
                    , rep( seq.int( m ), each = 2 )
                    )
           )
}

m <- max( unlist( lapply( employees4List, nrow ) ) )
listFinal1 <- lapply( employees4List, myrename1, m = m )
listFinal1
#> [[1]]
#>   First1 Second1 First2 Second2 First3 Second3
#> 1     Al   Jones     NA      NA     NA      NA
#>
#> [[2]]
#>   First1 Second1 First2 Second2 First3 Second3
#> 1    Al2   Jones   Barb   Smith     NA      NA
#>
#> [[3]]
#>   First1 Second1  First2 Second2 First3 Second3
#> 1    Al3   Jones Barbara   Smith  Carol   Adams
#>
#> [[4]]
#>   First1 Second1 First2 Second2 First3 Second3
#> 1     Al  Jones2     NA      NA     NA      NA
result1 <- do.call( rbind, listFinal1 )
result1
#>   First1 Second1  First2 Second2 First3 Second3
#> 1     Al   Jones    <NA>    <NA>   <NA>    <NA>
#> 2    Al2   Jones    Barb   Smith   <NA>    <NA>
#> 3    Al3   Jones Barbara   Smith  Carol   Adams
#> 4     Al  Jones2    <NA>    <NA>   <NA>    <NA>

myrename2 <- function( DF ) {
   # if a pair of columns is not present, raise an error
   stopifnot( 2 == length( DF ) )
   n <- nrow( DF )
   # use memory layout of elements of matrix
   # t() automatically converts to matrix (nrow=2)
   # matrix(,nrow=1) re-interprets the column-major output of t()
   # as a single row matrix
   setNames( as.data.frame( matrix( t( DF ), nrow = 1 )
                          , stringsAsFactors = FALSE
                          )
           , sprintf( "%s%d"
                    , c( "First", "Second" )
                    , rep( seq.int( n ), each = 2 )
                    )
           )
}

listFinal2 <- lapply( employees4List, myrename2 )
listFinal2
#> [[1]]
#>   First1 Second1
#> 1     Al   Jones
#>
#> [[2]]
#>   First1 Second1 First2 Second2
#> 1    Al2   Jones   Barb   Smith
#>
#> [[3]]
#>   First1 Second1  First2 Second2 First3 Second3
#> 1    Al3   Jones Barbara   Smith  Carol   Adams
#>
#> [[4]]
#>   First1 Second1
#> 1     Al  Jones2
result2 <- dplyr::bind_rows( listFinal2 )
result2
#>   First1 Second1  First2 Second2 First3 Second3
#> 1     Al   Jones    <NA>    <NA>   <NA>    <NA>
#> 2    Al2   Jones    Barb   Smith   <NA>    <NA>
#> 3    Al3   Jones Barbara   Smith  Carol   Adams
#> 4     Al  Jones2    <NA>    <NA>   <NA>    <NA>

library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(tidyr)
myrename3 <- function( DF ) {
   # if a pair of columns is not present, raise an error
   stopifnot( 2 == length( DF ) )
   names( DF ) <- c( "a", "b" )
   m <- nrow( DF )
   (   DF
   %>% mutate_all( as.character )
   %>% mutate( rw = LETTERS[ seq.int( n() ) ] )
   %>% gather( col, val, -rw )
   %>% tidyr::unite( "labels", rw, col, sep="" )
   %>% spread( labels, val )
   %>% setNames( sprintf( "%s%d"
                        , c( "First", "Second" )
                        , rep( seq.int( m ), each = 2 )
                        )
               )
   )
}

listFinal3 <- lapply( employees4List, myrename3 )
listFinal3
#> [[1]]
#>   First1 Second1
#> 1     Al   Jones
#>
#> [[2]]
#>   First1 Second1 First2 Second2
#> 1    Al2   Jones   Barb   Smith
#>
#> [[3]]
#>   First1 Second1  First2 Second2 First3 Second3
#> 1    Al3   Jones Barbara   Smith  Carol   Adams
#>
#> [[4]]
#>   First1 Second1
#> 1     Al  Jones2
result3 <- dplyr::bind_rows( listFinal3 )
result3
#>   First1 Second1  First2 Second2 First3 Second3
#> 1     Al   Jones    <NA>    <NA>   <NA>    <NA>
#> 2    Al2   Jones    Barb   Smith   <NA>    <NA>
#> 3    Al3   Jones Barbara   Smith  Carol   Adams
#> 4     Al  Jones2    <NA>    <NA>   <NA>    <NA>

#' Created on 2018-06-30 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).
#########
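
A side note on the factor-level warnings mentioned earlier in the thread: the three functions above go through t() or as.character, so they end up with character columns either way. If the original data frames are bound directly after renaming, one option is to convert factor columns to character first. The following is a minimal base R sketch under that assumption; the helper name unfactor and the two small data frames are made up for illustration.

```r
# Two data frames whose columns are factors with different level sets.
# stringsAsFactors = TRUE is set explicitly because it is no longer the
# default in R >= 4.0.
a <- data.frame(first = "Al", second = "Jones", stringsAsFactors = TRUE)
b <- data.frame(first = c("Al2", "Barb"), second = c("Jones", "Smith"),
                stringsAsFactors = TRUE)

# Hypothetical helper: turn every factor column into character and
# leave all other columns untouched.
unfactor <- function(DF) {
  DF[] <- lapply(DF, function(col) {
    if (is.factor(col)) as.character(col) else col
  })
  DF
}

clean <- lapply(list(a, b), unfactor)
combined <- do.call(rbind, clean)
str(combined)
```

With character columns there are no level sets left to reconcile, whichever binding function is used afterwards.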

On Sat, 30 Jun 2018, Ira Sharenow via R-help wrote:

> I would like to thank everyone who helped me out. I have obtained some offline help, so I would like to summarize all the information I have received.
> Before I summarize the thread, there is one loose end.
> Initially I thought
> library(dplyr)
> dplyr::bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))
> would work, but there were problems.
> lapply(employees4List, function(x) rbind.data.frame(c(t(x))))
> spreads out the data frames converting the data frames from long to wide, but it messes up the names. So one question I still have, is how can I programmatically change all of the names?
> After this initial step, the first data frame's names might be derived from
> c("George", "Washington")
> and the second data frame's names might be derived from
> c("John", "Adams", "Thomas", "Jefferson")
> What I want to change the names to:
> c("First1", "Second1")
> and
> c("First1", "Second1", "First2", "Second2")
> I believe that will enable me to then go back and use bind_rows and complete that method of solution:
> Step 1: lapply(employees4List, function(x) rbind.data.frame(c(t(x))))
> Step 2: Clean up the names
> Step 3: bind_rows
> Immediately below is hopefully a clear and precise statement of the problem and the proposed solution path. Then there are the various solutions.
> # Starting list of data frames
> employees4List = list(data.frame(first1 = "Al", second1 = "Jones"),
>                      data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
>                      data.frame(first3 = c("Al3", "Barbara", "Carol"), second3 = c("Jones", "Smith", "Adams")),
>                      data.frame(first4 = ("Al"), second4 = "Jones2"))
>
> employees4List
>
>
> # Intermediate step that messes up the names but successfully converts from long to wide
> lapply(employees4List, function(x) rbind.data.frame(c(t(x))))
>
> # The intermediate list should likely look like this listFinal
> df1 = data.frame(First1 = "Al", Second1 = "Jones", First2 = NA, Second2 = NA, First3 = NA, Second3 = NA,
>                  First4 = NA, Second4 = NA)
> df2 = data.frame(First1 = "Al2", Second1 = "Jones", First2 = "Barb", Second2 = "Smith",
>                  First3 = NA, Second3 = NA, First4 = NA, Second4 = NA)
>
> df3 = data.frame(First1 = "Al3", Second1 = "Jones", First2 = "Barbara", Second2 = "Smith",
>                  First3 = "Carol", Second3 = "Adams", First4 = NA, Second4 = NA)
> df4 = data.frame(First1 = "Al", Second1 = "Jones2", First2 = NA, Second2 = NA, First3 = NA, Second3 = NA,
>                  First4 = NA, Second4 = NA)
> listFinal = list(df1, df2, df3, df4)
> listFinal
>
> # Requested data frame (except that the columns are not just character but some are factor or even logical)
> dplyr::bind_rows(listFinal)
> Sarah Goslee solved the problem using base R.
> Given
> employees4List = list(
>   data.frame(first1 = ("Al"), second1 = "Jones"),
>   data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones2", "Smith")),
>   data.frame(first3 = c("Al3", "Barbara", "Carol"), second3 = c("Jones3",
>                                                                 "Smith", "Adams")),
>   data.frame(first4 = ("Al"), second4 = "Jones2"))
>
> This function produces the solution in the requested structure.
> dfbycol <- function(x) {
>   x <- lapply(x, function(y)as.vector(t(as.matrix(y))))
>   x <- lapply(x, function(y){length(y) <- max(sapply(x, length)); y})
>   x <- do.call(rbind, x)
>   x <- data.frame(x, stringsAsFactors=FALSE)
>   colnames(x) <- paste0(c("first", "last"), rep(seq(1, ncol(x)/2), each=2))
>   x
> }
> dfbycol(employees4List)
> Offline, Jeff Newmiller and Bert Gunter provided alternative approaches to the problem as well as other advice. Their solutions meet the "tidy" criterion.
> Bert suggested this online.
> ## list of two data frames with different column names and numbers of rows:
> zz <-list(one = data.frame(f=1:3,g=letters[2:4]), two = data.frame(a = 5:9,b = letters[11:15]))
> ## create common column names and bind them up:
> do.call(rbind,lapply(zz,function(x){   names(x) <- c("first","last"); x}))
> This and the next suggestion by Jeff produced useful solutions but not in the requested form.
> library(dplyr)
> # note that these data frames all have character columns
> # rather than factors, due to the as.is option when the
> # data are read in.
> DF1 <- read.table( text =
> "First          Last
> George          Washington
> ", header=TRUE, as.is = TRUE )
> # dput looks ugly but is actually much more practical for
> # providing R data on the mailing list... here is an example
> dput( DF1 )
> #> structure(list(First = "George", Last = "Washington"), .Names = c("First",
> #> "Last"), class = "data.frame", row.names = c(NA, -1L))
>
> DF2 <- read.table( text =
> "Start              End
> John              Adams
> Thomas        Jefferson
> ", header = TRUE, as.is = TRUE )
>
> DFL <- list( DF1, DF2 )
>
> # DFNames is a set of unique identifiers
> DFL1 <- data_frame( .DFNames = sprintf( "DF%d", 1:2 )
>                   , data = DFL
>                   )
>
> DFL2 <- (  DFL1
>         %>% mutate( data = lapply( data
>                                   , function( DF ) {
>                                       DF[[ ".PK" ]] <- seq.int( nrow( DF ))
>                                       gather( DF, ".Col", "value", -.PK )
>                                     }
>                                   )
>                   )
>         %>% unnest
>         %>% spread( .Col, value )
>         )
> DFL2
> During the discussion, useful links were recommended
> [1] https://www.jstatsoft.org/article/view/v059i10   Hadley on tidy data
> [2] http://r4ds.had.co.nz/relational-data.html#keys  Hadley on relational data
> [3] https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example  How to make a great reproducible example
> http://adv-r.had.co.nz/Functionals.html     Improving lapply and related skills
> Thanks again to everyone!
> Ira
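
One possible refinement for the "few thousand data frames" case: in the dfbycol function quoted in the summary, max(sapply(x, length)) is re-evaluated for every list element. The sketch below follows the same steps but computes the padded width once. The name dfbycol_once is hypothetical, and whether the difference matters at this scale has not been measured here.

```r
# Hypothetical variant of dfbycol: same steps, but the target width is
# computed a single time instead of once per list element.
dfbycol_once <- function(x) {
  # flatten each n-by-2 data frame row by row: first1, last1, first2, ...
  rows <- lapply(x, function(y) as.vector(t(as.matrix(y))))
  width <- max(lengths(rows))
  # pad the shorter vectors with NA up to the common width
  rows <- lapply(rows, function(y) { length(y) <- width; y })
  out <- data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  colnames(out) <- paste0(c("first", "last"),
                          rep(seq_len(width / 2), each = 2))
  out
}

employees4List <- list(
  data.frame(first1 = "Al", second1 = "Jones"),
  data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
  data.frame(first3 = c("Al3", "Barbara", "Carol"),
             second3 = c("Jones", "Smith", "Adams")),
  data.frame(first4 = "Al", second4 = "Jones2"))

res <- dfbycol_once(employees4List)
res
```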
>> David Winsemius
>> Alameda, CA, USA
>>
>> 'Any technology distinguishable from magic is insufficiently advanced.'  -Gehm's Corollary to Clarke's Third Law
>>
>>
>
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The    .....      .....  Go Live...
> DCN:<[hidden email]>        Basics: ##.#.      ##.#.  Live Go...
>                                       Live:  OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.      #.O#.  with
> /Software/Embedded Controllers)              .OO#.      .OO#.  rocks...1k
> ---------------------------------------------------------------------------
>

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------

Re: Convert list of data frames to one data frame

R help mailing list-2
 
My final post for this thread!

Since I first asked my question on Stack Overflow, I posted all the solutions along with my timing study there.

https://stackoverflow.com/questions/50807970/converting-a-list-of-data-frames-not-a-simple-rbind-second-row-to-new-columns/51129202#51129202

Thanks again to everyone for their help.

Ira
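
For readers who want to repeat a timing comparison of this kind, a minimal sketch follows. It is not the timing study posted on Stack Overflow. `make_employee_list` is a hypothetical generator, the list size and row counts are assumptions taken from the problem description (many small data frames with one to five rows each), and `dfbycol` is Sarah Goslee's base R function copied from earlier in the thread.

```r
# Sarah Goslee's base R solution, as posted in the thread
dfbycol <- function(x) {
  x <- lapply(x, function(y) as.vector(t(as.matrix(y))))
  x <- lapply(x, function(y) {length(y) <- max(sapply(x, length)); y})
  x <- do.call(rbind, x)
  x <- data.frame(x, stringsAsFactors = FALSE)
  colnames(x) <- paste0(c("first", "last"), rep(seq(1, ncol(x)/2), each = 2))
  x
}

# Hypothetical test-data generator: n two-column data frames,
# each with between 1 and max_rows rows (assumed sizes, not from the thread)
make_employee_list <- function(n, max_rows = 5) {
  lapply(seq_len(n), function(i) {
    k <- sample.int(max_rows, 1)
    data.frame(a = paste0("F", i, "_", seq_len(k)),
               b = paste0("L", i, "_", seq_len(k)),
               stringsAsFactors = FALSE)
  })
}

set.seed(1)
big <- make_employee_list(500)

# system.time() reports user, system and elapsed seconds for one run
timing <- system.time(res <- dfbycol(big))
timing[["elapsed"]]
dim(res)
```

A single `system.time()` call is noisy, so any real comparison would repeat each candidate several times on the same input before drawing conclusions.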


    On Saturday, June 30, 2018, 6:11:00 PM PDT, Jeff Newmiller <[hidden email]> wrote:  
 
 Your request is getting a bit complicated with so much re-hashing, but
here are three solutions: base only, a bit of dplyr, and dplyr+tidyr:

#########
# input data
employees4List = list(data.frame(first1 = "Al", second1 = "Jones"),
                      data.frame(first2 = c("Al2", "Barb"),
                                 second2 = c("Jones", "Smith")),
                      data.frame(first3 = c("Al3", "Barbara", "Carol"),
                                 second3 = c("Jones", "Smith", "Adams")),
                      data.frame(first4 = ("Al"), second4 = "Jones2"))
employees4List
#> [[1]]
#>  first1 second1
#> 1    Al  Jones
#>
#> [[2]]
#>  first2 second2
#> 1    Al2  Jones
#> 2  Barb  Smith
#>
#> [[3]]
#>    first3 second3
#> 1    Al3  Jones
#> 2 Barbara  Smith
#> 3  Carol  Adams
#>
#> [[4]]
#>  first4 second4
#> 1    Al  Jones2

# expected output
df1 = data.frame(First1 = "Al", Second1 = "Jones",
                  First2 = NA, Second2 = NA,
                  First3 = NA, Second3 = NA,
                  First4 = NA, Second4 = NA)
df2 = data.frame(First1 = "Al2", Second1 = "Jones",
                  First2 = "Barb", Second2 = "Smith",
                  First3 = NA, Second3 = NA,
                  First4 = NA, Second4 = NA)
df3 = data.frame(First1 = "Al3", Second1 = "Jones",
                  First2 = "Barbara", Second2 = "Smith",
                  First3 = "Carol", Second3 = "Adams",
                  First4 = NA, Second4 = NA)
df4 = data.frame(First1 = "Al", Second1 = "Jones2",
                  First2 = NA, Second2 = NA,
                  First3 = NA, Second3 = NA,
                  First4 = NA, Second4 = NA)
listFinal = list(df1, df2, df3, df4)
listFinal
#> [[1]]
#>  First1 Second1 First2 Second2 First3 Second3 First4 Second4
#> 1    Al  Jones    NA      NA    NA      NA    NA      NA
#>
#> [[2]]
#>  First1 Second1 First2 Second2 First3 Second3 First4 Second4
#> 1    Al2  Jones  Barb  Smith    NA      NA    NA      NA
#>
#> [[3]]
#>  First1 Second1  First2 Second2 First3 Second3 First4 Second4
#> 1    Al3  Jones Barbara  Smith  Carol  Adams    NA      NA
#>
#> [[4]]
#>  First1 Second1 First2 Second2 First3 Second3 First4 Second4
#> 1    Al  Jones2    NA      NA    NA      NA    NA      NA

myrename1 <- function( DF, m ) {
  # if a pair of columns is not present, raise an error
  stopifnot( 2 == length( DF ) )
  n <- nrow( DF )
  # use memory layout of elements of matrix
  # t() automatically converts to matrix (nrow=2)
  # matrix(,nrow=1) re-interprets the column-major output of t()
  # as a single row matrix
  result <- as.data.frame( matrix( t( DF ), nrow = 1 )
                          , stringsAsFactors = FALSE
                          )
  if ( n < m ) {
    result[ , seq( 2 * n + 1, 2 * m ) ] <- NA
  }
  setNames( result
          , sprintf( "%s%d"
                    , c( "First", "Second" )
                      , rep( seq.int( m ), each = 2 )
                      )
          )
}

m <- max( unlist( lapply( employees4List, nrow ) ) )
listFinal1 <- lapply( employees4List, myrename1, m = m )
listFinal1
#> [[1]]
#>  First1 Second1 First2 Second2 First3 Second3
#> 1    Al  Jones    NA      NA    NA      NA
#>
#> [[2]]
#>  First1 Second1 First2 Second2 First3 Second3
#> 1    Al2  Jones  Barb  Smith    NA      NA
#>
#> [[3]]
#>  First1 Second1  First2 Second2 First3 Second3
#> 1    Al3  Jones Barbara  Smith  Carol  Adams
#>
#> [[4]]
#>  First1 Second1 First2 Second2 First3 Second3
#> 1    Al  Jones2    NA      NA    NA      NA
result1 <- do.call( rbind, listFinal1 )
result1
#>  First1 Second1  First2 Second2 First3 Second3
#> 1    Al  Jones    <NA>    <NA>  <NA>    <NA>
#> 2    Al2  Jones    Barb  Smith  <NA>    <NA>
#> 3    Al3  Jones Barbara  Smith  Carol  Adams
#> 4    Al  Jones2    <NA>    <NA>  <NA>    <NA>
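
The memory-layout trick described in the comments of myrename1 can be seen in isolation. This is a minimal sketch on a small hypothetical data frame `DF` (the variable names `tm`, `one_row` and `wide` are illustrative only): t() turns the n-by-2 data frame into a 2-by-n character matrix, and because R stores matrices column by column, re-reading the same elements as a single row interleaves first and last names.

```r
# Hypothetical two-row input; stringsAsFactors = FALSE keeps the columns
# as character regardless of the R version in use.
DF <- data.frame(first = c("Al2", "Barb"),
                 second = c("Jones", "Smith"),
                 stringsAsFactors = FALSE)

tm <- t(DF)                      # 2 x 2 character matrix, one column per person
one_row <- matrix(tm, nrow = 1)  # same elements, read in column-major order
one_row
# element order: "Al2" "Jones" "Barb" "Smith"

# Padding to m = 3 people: assigning NA to the next contiguous column
# positions appends new columns, which myrename1 uses for short data frames.
wide <- as.data.frame(one_row, stringsAsFactors = FALSE)
wide[, seq(2 * 2 + 1, 2 * 3)] <- NA
ncol(wide)
# 6
```

The padding only works because the new column indices follow directly after the existing ones; a gap in the indices would be rejected by the data frame assignment method.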

myrename2 <- function( DF ) {
  # if a pair of columns is not present, raise an error
  stopifnot( 2 == length( DF ) )
  n <- nrow( DF )
  # use memory layout of elements of matrix
  # t() automatically converts to matrix (nrow=2)
  # matrix(,nrow=1) re-interprets the column-major output of t()
  # as a single row matrix
  setNames( as.data.frame( matrix( t( DF ), nrow = 1 )
                          , stringsAsFactors = FALSE
                          )
          , sprintf( "%s%d"
                    , c( "First", "Second" )
                    , rep( seq.int( n ), each = 2 )
                    )
          )
}

listFinal2 <- lapply( employees4List, myrename2 )
listFinal2
#> [[1]]
#>  First1 Second1
#> 1    Al  Jones
#>
#> [[2]]
#>  First1 Second1 First2 Second2
#> 1    Al2  Jones  Barb  Smith
#>
#> [[3]]
#>  First1 Second1  First2 Second2 First3 Second3
#> 1    Al3  Jones Barbara  Smith  Carol  Adams
#>
#> [[4]]
#>  First1 Second1
#> 1    Al  Jones2
result2 <- dplyr::bind_rows( listFinal2 )
result2
#>  First1 Second1  First2 Second2 First3 Second3
#> 1    Al  Jones    <NA>    <NA>  <NA>    <NA>
#> 2    Al2  Jones    Barb  Smith  <NA>    <NA>
#> 3    Al3  Jones Barbara  Smith  Carol  Adams
#> 4    Al  Jones2    <NA>    <NA>  <NA>    <NA>
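
The reason myrename2 does not need the maximum row count is that dplyr::bind_rows() matches columns by name and fills absent ones with NA, whereas base rbind() on data frames stops when the column sets differ. A minimal base-only sketch of that fill behaviour follows. `rbind_fill` is a hypothetical helper written for this illustration (it is not part of base R or dplyr), and it assumes every input is a data frame of character columns.

```r
rbind_fill <- function(dfs) {
  # union of all column names, in order of first appearance
  all_names <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    # add each absent column as a character NA, recycled to nrow(d)
    for (nm in setdiff(all_names, names(d))) {
      d[[nm]] <- NA_character_
    }
    d[all_names]  # common column order so rbind() lines up
  })
  do.call(rbind, padded)
}

a <- data.frame(First1 = "Al", Second1 = "Jones", stringsAsFactors = FALSE)
b <- data.frame(First1 = "Al2", Second1 = "Jones",
                First2 = "Barb", Second2 = "Smith",
                stringsAsFactors = FALSE)

out <- rbind_fill(list(a, b))
dim(out)
# 2 rows, 4 columns; First2 and Second2 are NA in the first row
```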

library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>    filter, lag
#> The following objects are masked from 'package:base':
#>
#>    intersect, setdiff, setequal, union
library(tidyr)
myrename3 <- function( DF ) {
  # if a pair of columns is not present, raise an error
  stopifnot( 2 == length( DF ) )
  names( DF ) <- c( "a", "b" )
  m <- nrow( DF )
  (  DF
  %>% mutate_all( as.character )
  %>% mutate( rw = LETTERS[ seq.int( n() ) ] )
  %>% gather( col, val, -rw )
  %>% tidyr::unite( "labels", rw, col, sep="" )
  %>% spread( labels, val )
  %>% setNames( sprintf( "%s%d"
                        , c( "First", "Second" )
                        , rep( seq.int( m ), each = 2 )
                        )
              )
  )
}

listFinal3 <- lapply( employees4List, myrename3 )
listFinal3
#> [[1]]
#>  First1 Second1
#> 1    Al  Jones
#>
#> [[2]]
#>  First1 Second1 First2 Second2
#> 1    Al2  Jones  Barb  Smith
#>
#> [[3]]
#>  First1 Second1  First2 Second2 First3 Second3
#> 1    Al3  Jones Barbara  Smith  Carol  Adams
#>
#> [[4]]
#>  First1 Second1
#> 1    Al  Jones2
result3 <- dplyr::bind_rows( listFinal3 )
result3
#>  First1 Second1  First2 Second2 First3 Second3
#> 1    Al  Jones    <NA>    <NA>  <NA>    <NA>
#> 2    Al2  Jones    Barb  Smith  <NA>    <NA>
#> 3    Al3  Jones Barbara  Smith  Carol  Adams
#> 4    Al  Jones2    <NA>    <NA>  <NA>    <NA>

#' Created on 2018-06-30 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).
#########
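
All three functions above build the column names in the same way. A small sketch of that step on its own, using a hypothetical helper name `pair_names` (not from the thread): sprintf() is vectorised, so the two-element prefix vector is recycled against the index vector 1, 1, 2, 2, and so on.

```r
pair_names <- function(k) {
  # c("First", "Second") is recycled k times against 1,1,2,2,...,k,k
  sprintf("%s%d", c("First", "Second"), rep(seq_len(k), each = 2))
}

pair_names(2)
# "First1" "Second1" "First2" "Second2"
```

Applied to the renaming question raised in the thread, `names(DF) <- pair_names(ncol(DF) / 2)` would relabel any single-row wide data frame with an even number of columns, whatever names the earlier step produced.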

On Sat, 30 Jun 2018, Ira Sharenow via R-help wrote:

> I would like to thank everyone who helped me out. I have obtained some offline help, so I would like to summarize all the information I have received.
> Before I summarize the thread, there is one loose end.
> Initially I thought
> library(dplyr)
> dplyr::bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))
> would work, but there were problems.
> lapply(employees4List, function(x) rbind.data.frame(c(t(x))))
> spreads out the data frames, converting them from long to wide, but it messes up the names. So one question I still have is: how can I programmatically change all of the names?
> After this initial step, the first data frame's names might be derived from
> c("George", "Washington")
> and the second data frame's names might be derived from
> c("John", "Adams", "Thomas", "Jefferson")
> What I want to change the names to:
> c("First1", "Second1")
> and
> c("First1", "Second1", "First2", "Second2")
> I believe that will enable me to then go back and use bind_rows and complete that method of solution:
> Step 1: lapply(employees4List, function(x) rbind.data.frame(c(t(x))))
> Step 2: Clean up the names
> Step 3: bind_rows
> Immediately below is hopefully a clear and precise statement of the problem and the proposed solution path. Then there are the various solutions.
> # Starting list of data frames
> employees4List = list(data.frame(first1 = "Al", second1 = "Jones"),
>                      data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
>                      data.frame(first3 = c("Al3", "Barbara", "Carol"), second3 = c("Jones", "Smith", "Adams")),
>                      data.frame(first4 = ("Al"), second4 = "Jones2"))
>
> employees4List
>
>
> # Intermediate step that messes up the names but successfully converts from long to wide
> lapply(employees4List, function(x) rbind.data.frame(c(t(x))))
>
> # The intermediate list should likely look like this listFinal
> df1 = data.frame(First1 = "Al", Second1 = "Jones", First2 = NA, Second2 = NA, First3 = NA, Second3 = NA,
>                  First4 = NA, Second4 = NA)
> df2 = data.frame(First1 = "Al2", Second1 = "Jones", First2 = "Barb", Second2 = "Smith",
>                  First3 = NA, Second3 = NA, First4 = NA, Second4 = NA)
>
> df3 = data.frame(First1 = "Al3", Second1 = "Jones", First2 = "Barbara", Second2 = "Smith",
>                  First3 = "Carol", Second3 = "Adams", First4 = NA, Second4 = NA)
> df4 = data.frame(First1 = "Al", Second1 = "Jones2", First2 = NA, Second2 = NA, First3 = NA, Second3 = NA,
>                  First4 = NA, Second4 = NA)
> listFinal = list(df1, df2, df3, df4)
> listFinal
>
> # Requested data frame (except that the columns are not just character but some are factor or even logical)
> dplyr::bind_rows(listFinal)
> Sarah Goslee solved the problem using base R.
> Given
> employees4List = list(
>   data.frame(first1 = ("Al"), second1 = "Jones"),
>   data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones2", "Smith")),
>   data.frame(first3 = c("Al3", "Barbara", "Carol"), second3 = c("Jones3",
>                                                                 "Smith", "Adams")),
>   data.frame(first4 = ("Al"), second4 = "Jones2"))
>
> This function produces the solution in the requested structure.
> dfbycol <- function(x) {
>   x <- lapply(x, function(y)as.vector(t(as.matrix(y))))
>   x <- lapply(x, function(y){length(y) <- max(sapply(x, length)); y})
>   x <- do.call(rbind, x)
>   x <- data.frame(x, stringsAsFactors=FALSE)
>   colnames(x) <- paste0(c("first", "last"), rep(seq(1, ncol(x)/2), each=2))
>   x
> }
> dfbycol(employees4List)
> Offline, Jeff Newmiller and Bert Gunter provided alternative approaches to the problem as well as other advice. Their solutions meet the "tidy" criterion.
> Bert suggested this online.
> ## list of two data frames with different column names and numbers of rows:
> zz <-list(one = data.frame(f=1:3,g=letters[2:4]), two = data.frame(a = 5:9,b = letters[11:15]))
> ## create common column names and bind them up:
> do.call(rbind,lapply(zz,function(x){   names(x) <- c("first","last"); x}))
> This and the next suggestion by Jeff produced useful solutions but not in the requested form.
> library(dplyr)
> library(tidyr)  # gather(), unnest() and spread() below come from tidyr
> # note that these data frames all have character columns
> # rather than factors, due to the as.is option when the
> # data are read in.
> DF1 <- read.table( text =
> "First          Last
> George          Washington
> ", header=TRUE, as.is = TRUE )
> # dput looks ugly but is actually much more practical for
> # providing R data on the mailing list... here is an example
> dput( DF1 )
> #> structure(list(First = "George", Last = "Washington")
> #>, .Names = c("First",
> #> "Last"), class = "data.frame", row.names = c(NA, -1L))
>
> DF2 <- read.table( text =
> "Start              End
> John              Adams
> Thomas        Jefferson
> ", header = TRUE, as.is = TRUE )
>
> DFL <- list( DF1, DF2 )
>
> # DFNames is a set of unique identifiers
> DFL1 <- data_frame( .DFNames = sprintf( "DF%d", 1:2 )
>                   , data = DFL
>                   )
>
> DFL2 <- (  DFL1
>         %>% mutate( data = lapply( data
>                                   , function( DF ) {
>                                       DF[[ ".PK" ]] <- seq.int( nrow( DF ))
>                                       gather( DF, ".Col", "value", -.PK )
>                                     }
>                                   )
>                   )
>         %>% unnest
>         %>% spread( .Col, value )
>         )
> DFL2
> During the discussion, useful links were recommended
> [1] https://www.jstatsoft.org/article/view/v059i10   Hadley on tidy data
> [2] http://r4ds.had.co.nz/relational-data.html#keys  Hadley on relational data
> [3] https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example  How to make a great reproducible example
> [4] http://adv-r.had.co.nz/Functionals.html     Improving lapply and related skills
> Thanks again to everyone!
> Ira
>
>
>
>
>    On Friday, June 29, 2018, 7:47:13 PM PDT, Jeff Newmiller <[hidden email]> wrote:
>
> Code below...
>
> a) Just because something can be done with dplyr does not mean that is the
> best way to do it. A solution in the hand is worth two on the Internet,
> and dplyr is not always the fastest method anyway.
>
> b) I highly recommend that you read Hadley Wickham's paper on tidy data
> [1]. Also, having a group of one or more columns at all times that
> uniquely identify where the data came from is a "key" to success [2].
>
> c) Please read and follow one of the various online documents about making
> reproducible examples in R (e.g. [3]). HTML formatting is really a pain
> (at best... at worst, it corrupts your code) on a plain-text-only list
> (you have read the Posting Guide, right?). Consider my example below as a
> model for you to follow in the future, and make sure to set your email
> program to send plain text. (Obviously your examples don't have to achieve
> success... but they should bring us up to speed with where you are having
> troubles IN R.)
>
> [1] https://www.jstatsoft.org/article/view/v059i10
> [2] http://r4ds.had.co.nz/relational-data.html#keys
> [3] https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
>
> ----
> library(dplyr)
> #>
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #>
> #>    filter, lag
> #> The following objects are masked from 'package:base':
> #>
> #>    intersect, setdiff, setequal, union
> library(tidyr)
>
> # note that these data frames all have character columns
> # rather than factors, due to the as.is option when the
> # data are read in.
> DF1 <- read.table( text =
> "First          Last
> George          Washington
> ", header=TRUE, as.is = TRUE )
>
> # dput looks ugly but is actually much more practical for
> # providing R data on the mailing list... here is an example
> dput( DF1 )
> #> structure(list(First = "George", Last = "Washington")
> #>, .Names = c("First",
> #> "Last"), class = "data.frame", row.names = c(NA, -1L))
>
> DF2 <- read.table( text =
> "Start              End
> John              Adams
> Thomas        Jefferson
> ", header = TRUE, as.is = TRUE )
>
> DFL <- list( DF1, DF2 )
>
> # DFNames is a set of unique identifiers
> DFL1 <- data_frame( .DFNames = sprintf( "DF%d", 1:2 )
>                   , data = DFL
>                   )
>
> DFL2 <- (  DFL1
>         %>% mutate( data = lapply( data
>                                   , function( DF ) {
>                                       DF[[ ".PK" ]] <- seq.int( nrow( DF ))
>                                       gather( DF, ".Col", "value", -.PK )
>                                     }
>                                   )
>                   )
>         %>% unnest
>         %>% spread( .Col, value )
>         )
> DFL2
> #> # A tibble: 3 x 6
> #>  .DFNames  .PK End      First  Last      Start
> #>  <chr>    <int> <chr>    <chr>  <chr>      <chr>
> #> 1 DF1          1 <NA>      George Washington <NA>
> #> 2 DF2          1 Adams    <NA>  <NA>      John
> #> 3 DF2          2 Jefferson <NA>  <NA>      Thomas
>
> #' Created on 2018-06-29 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).
> ----
>
> On Sat, 30 Jun 2018, Ira Sharenow via R-help wrote:
>
>>
>> Sarah and David,
>>
>> Thank you for your responses. I will try to be clearer.
>>
>> Base R solution: Sarah's method worked perfectly.
>>
>> Is there a dplyr solution?
>>
>> START: list of data frames
>>
>> FINISH: one data frame
>>
>> DETAILS: The initial list of data frames might have hundreds or a few thousand data frames. Every data frame will have two columns. The first column will represent first names. The second column will represent last names. The column names are not consistent. Data frames will most likely have from one to five rows.
>>
>> SUGGESTED STRATEGY: Convert the n by 2 data frames to 1 by 2n data frames. Then somehow do an rbind even though the number of columns differs from data frame to data frame.
>>
>> EXAMPLE: List with two data frames
>>
>> # DF1
>> First    Last
>> George   Washington
>>
>> # DF2
>> Start    End
>> John     Adams
>> Thomas   Jefferson
>>
>> # End Result. One data frame
>> First1   Second1      First2   Second2
>> George   Washington   NA       NA
>> John     Adams        Thomas   Jefferson
>>
>>  
>>
>> DISCUSSION: As mentioned, I posted something on Stack Overflow. Unfortunately, my example was not general enough, and so the suggested solutions worked on the easy case which I provided but not when the names were different.
>>
>> The suggested solution was:
>>
>> library(dplyr)
>>
>> bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))
>>
>>
>> On this site I pointed out that the inner function, lapply(employees4List, function(x) rbind.data.frame(c(t(x)))), correctly spread the multiple rows of each data frame into 1 by 2n data frames. However, the column names were derived from the values and were a mess. This caused a problem with bind_rows.
>>
>> I felt that if I knew how to change all the names of all of the data frames that were created after lapply, then I could use bind_rows. So if someone knows how to change all of the names at this intermediate stage, I hope that person will provide the solution.
>>
>> In the end a 1 by 2 data frame would have names First1, Second1. A 1 by 4 data frame would have names First1, Second1, First2, Second2.
>>
>> Ira
>>
>>
>>     On Friday, June 29, 2018, 12:49:18 PM PDT, David Winsemius <[hidden email]> wrote:
>>
>>
>>> On Jun 29, 2018, at 7:28 AM, Sarah Goslee <[hidden email]> wrote:
>>>
>>> Hi,
>>>
>>> It isn't super clear to me what you're after.
>>
>> Agree.
>>
>> Had a different read of the request. Thought the request was for a first step that "harmonized" the names of the columns and then used `dplyr::bind_rows`:
>>
>> library(dplyr)
>> newList <- lapply( employees4List, 'names<-', names(employees4List[[1]]) )
>> bind_rows(newList)
>>
>> #---------
>>
>>   first1 second1
>> 1      Al  Jones
>> 2    Al2  Jones
>> 3    Barb  Smith
>> 4    Al3  Jones
>> 5 Barbara  Smith
>> 6  Carol  Adams
>> 7      Al  Jones2
>>
>> Might want to wrap suppressWarnings around the right side of that assignment since there were many warnings regarding incongruent factor levels.
>>
>> --
>> David.
>>> Is this what you intend?
>>>
>>>> dfbycol(employees4BList)
>>>   first1 last1 first2 last2 first3 last3
>>> 1    Al Jones  <NA>  <NA>  <NA>  <NA>
>>> 2    Al Jones  Barb Smith  <NA>  <NA>
>>> 3    Al Jones  Barb Smith  Carol Adams
>>> 4    Al Jones  <NA>  <NA>  <NA>  <NA>
>>>>
>>>> dfbycol(employees4List)
>>>   first1  last1  first2 last2 first3 last3
>>> 1    Al  Jones    <NA>  <NA>  <NA>  <NA>
>>> 2    Al2  Jones    Barb Smith  <NA>  <NA>
>>> 3    Al3  Jones Barbara Smith  Carol Adams
>>> 4    Al Jones2    <NA>  <NA>  <NA>  <NA>
>>>
>>>
>>> If so:
>>>
>>> employees4BList = list(
>>> data.frame(first1 = "Al", second1 = "Jones"),
>>> data.frame(first1 = c("Al", "Barb"), second1 = c("Jones", "Smith")),
>>> data.frame(first1 = c("Al", "Barb", "Carol"), second1 = c("Jones",
>>> "Smith", "Adams")),
>>> data.frame(first1 = ("Al"), second1 = "Jones"))
>>>
>>> employees4List = list(
>>> data.frame(first1 = ("Al"), second1 = "Jones"),
>>> data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
>>> data.frame(first3 = c("Al3", "Barbara", "Carol"), second3 = c("Jones",
>>> "Smith", "Adams")),
>>> data.frame(first4 = ("Al"), second4 = "Jones2"))
>>>
>>> ###
>>>
>>> dfbycol <- function(x) {
>>>   x <- lapply(x, function(y)as.vector(t(as.matrix(y))))
>>>   x <- lapply(x, function(y){length(y) <- max(sapply(x, length)); y})
>>>   x <- do.call(rbind, x)
>>>   x <- data.frame(x, stringsAsFactors=FALSE)
>>>   colnames(x) <- paste0(c("first", "last"), rep(seq(1, ncol(x)/2), each=2))
>>>   x
>>> }
>>>
>>> ###
>>>
>>> dfbycol(employees4BList)
>>>
>>> dfbycol(employees4List)
>>>
>>> On Fri, Jun 29, 2018 at 2:36 AM, Ira Sharenow via R-help
>>> <[hidden email]> wrote:
>>>> I have a list of data frames which I would like to combine into one data
>>>> frame doing something like rbind. I wish to combine in column order and
>>>> not by names. However, there are issues.
>>>>
>>>> The number of columns is not the same for each data frame. This is an
>>>> intermediate step to a problem and the number of columns could be
>>>> 2,4,6,8,or10. There might be a few thousand data frames. Another problem
>>>> is that the names of the columns produced by the first step are garbage.
>>>>
>>>> Below is a method that I obtained by asking a question on stack
>>>> overflow. Unfortunately, my example was not general enough. The code
>>>> below works for the simple case where the names of the people are
>>>> consistent. It does not work when the names are realistically not the same.
>>>>
>>>> https://stackoverflow.com/questions/50807970/converting-a-list-of-data-frames-not-a-simple-rbind-second-row-to-new-columns/50809432#50809432
>>>>
>>>>
>>>> Please note that the lapply step sets things up except for the column
>>>> name issue. If I could figure out a way to change the column names, then
>>>> the bind_rows step will, I believe, work.
>>>>
>>>> So I really have two questions. How to change all column names of all
>>>> the data frames and then how to solve the original problem.
>>>>
>>>> # The non general case works fine. It produces one data frame and I can
>>>> then change the column names to
>>>>
>>>> # c("first1", "last1","first2", "last2","first3", "last3",)
>>>>
>>>> #Non general easy case
>>>>
>>>> employees4BList = list(data.frame(first1 = "Al", second1 = "Jones"),
>>>>
>>>> data.frame(first1 = c("Al", "Barb"), second1 = c("Jones", "Smith")),
>>>>
>>>> data.frame(first1 = c("Al", "Barb", "Carol"), second1 = c("Jones",
>>>> "Smith", "Adams")),
>>>>
>>>> data.frame(first1 = ("Al"), second1 = "Jones"))
>>>>
>>>> employees4BList
>>>>
>>>> bind_rows(lapply(employees4BList, function(x) rbind.data.frame(c(t(x)))))
>>>>
>>>> # This produces a nice list of data frames, except for the names
>>>>
>>>> lapply(employees4BList, function(x) rbind.data.frame(c(t(x))))
>>>>
>>>> # This list is a disaster. I am looking for a solution that works in
>>>> this case.
>>>>
>>>> employees4List = list(data.frame(first1 = ("Al"), second1 = "Jones"),
>>>>
>>>> data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
>>>>
>>>> data.frame(first3 = c("Al3", "Barbara", "Carol"), second3 = c("Jones",
>>>> "Smith", "Adams")),
>>>>
>>>> data.frame(first4 = ("Al"), second4 = "Jones2"))
>>>>
>>>>   bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))
>>>>
>>>> Thanks.
>>>>
>>>> Ira
>>>>
>>>
>>> --
>>> Sarah Goslee
>>> http://www.functionaldiversity.org
>>>
>>
>> David Winsemius
>> Alameda, CA, USA
>>
>> 'Any technology distinguishable from magic is insufficiently advanced.'  -Gehm's Corollary to Clarke's Third Law
>>
>>
>
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The    .....      .....  Go Live...
> DCN:<[hidden email]>        Basics: ##.#.      ##.#.  Live Go...
>                                       Live:  OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.      #.O#.  with
> /Software/Embedded Controllers)              .OO#.      .OO#.  rocks...1k
> ---------------------------------------------------------------------------
>

---------------------------------------------------------------------------
Jeff Newmiller                        The    .....      .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.      ##.#.  Live Go...
                                      Live:  OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.      #.O#.  with
/Software/Embedded Controllers)              .OO#.      .OO#.  rocks...1k
---------------------------------------------------------------------------  