merging or joining 2 dataframes: merge, rbind.fill, etc.?

merging or joining 2 dataframes: merge, rbind.fill, etc.?

Anika Masters
#I want to "merge" or "join" 2 dataframes (df1 & df2) into a 3rd
(mydf).  I want the 3rd dataframe to contain 1 row for each row in df1
& df2, and all the columns in both df1 & df2. The solution should
"work" even if the 2 dataframes are identical, and even if the 2
dataframes do not have the same column names.  The rbind.fill function
seems to work.  For learning purposes, are there other "good" ways to
solve this problem, using merge or other functions other than
rbind.fill?

#e.g. These examples seem to "work" correctly and as I hoped:

df1 <- data.frame(matrix(data=c(7, 99, 12) ,  nrow=1 ,  dimnames =
list( NULL ,  c('a' , 'b' , 'd') ) ) )
df2 <- data.frame(matrix(data=c(88, 34, 12, 44, 56) ,  nrow=1 ,
dimnames = list( NULL ,  c('d' , 'b' , 'x' ,  'y', 'c') ) ) )
mydf <- merge(df2, df1, all.y=T, all.x=T)
mydf

#e.g. this works:
library(reshape)
mydf <- rbind.fill(df1, df2)
mydf

#But this does not (the 2 dataframes are identical)
df1 <- data.frame(matrix(data=c(7, 99, 12) ,  nrow=1 ,  dimnames =
list( NULL ,  c('a' , 'b' , 'd') ) ) )
df2 <- df1
mydf <- merge(df2, df1, all.y=T, all.x=T)
mydf

#Any way to get "merge" to work for this final example? Any other good solutions?

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: merging or joining 2 dataframes: merge, rbind.fill, etc.?

arun kirshna
Hi,

You could also try:
library(gtools)
smartbind(df2,df1)
#  a  b  d
#1 7 99 12
#2 7 99 12


When df1!=df2
smartbind(df1,df2)
#   a  b  d  x  y  c
#1  7 99 12 NA NA NA
#2 NA 34 88 12 44 56
A.K.

Re: merging or joining 2 dataframes: merge, rbind.fill, etc.?

Nordlund, Dan (DSHS/RDA)
In reply to this post by Anika Masters
If rbind.fill(df1,df2) works, why do you need to use merge?

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204

Re: merging or joining 2 dataframes: merge, rbind.fill, etc.?

David Carlson
In reply to this post by arun kirshna
Clumsy but it doesn't require any packages:

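merge2 <- function(x, y) {
    # Same set of column names: stack the rows (rbind matches columns by name)
    if (setequal(names(x), names(y))) {
        rbind(x, y)
    } else {
        # Different columns: full outer join on whichever columns are shared
        merge(x, y, all = TRUE)
    }
}
merge2(df1, df2)
df3 <- df1
merge2(df1, df3)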
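
For background on why the plain merge() call collapses two identical frames: by default merge() joins on every column name the two frames share, so identical rows match each other and come back as a single row. Below is a minimal sketch of one possible workaround, assuming it is acceptable to add a temporary tag column (the name "src" is made up for this illustration) so that rows from different frames can never match.

```r
# Two identical one-row data frames, as in the original question
df1 <- data.frame(a = 7, b = 99, d = 12)
df2 <- df1

# Default behaviour: all three columns act as join keys, so the rows match
collapsed <- merge(df1, df2, all = TRUE)

# Workaround (assumption: a temporary column named "src" is free to use).
# Tagging each frame makes the keys differ, so no rows are matched.
df1$src <- "df1"
df2$src <- "df2"
stacked <- merge(df1, df2, all = TRUE)
stacked$src <- NULL   # drop the helper column again

nrow(collapsed)
nrow(stacked)
```

Comparing the two row counts shows the difference between joining and stacking. The same tagging trick should also apply when the two frames have different columns, since "src" is then simply one more shared key.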

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

Re: merging or joining 2 dataframes: merge, rbind.fill, etc.?

Anika Masters
Thanks Arun and David.  Another issue I am running into is running out
of memory when one of the data frames I'm trying to rbind to or merge
with is "very large".  (This is a repetitive problem, as I am trying
to merge/rbind thousands of small dataframes into a single "very
large" dataframe.)

I'm thinking of writing a function that creates an empty dataframe to
which I can add data, but I will first need to ensure that
each dataframe has exactly the same columns, in exactly the same
"location".

Before I write any new code, are there any pre-existing functions or
code that might solve this problem of merging small or medium-sized
dataframes with a "very large" dataframe?
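
The idea of first making sure every data frame has the same columns in the same positions can be sketched in base R. This is only an illustration: align_columns and the toy frames list are hypothetical names, and columns that a frame lacks are filled with NA.

```r
# Toy input: small data frames whose columns only partly overlap
frames <- list(
  data.frame(a = 7, b = 99, d = 12),
  data.frame(d = 88, b = 34, x = 12, y = 44, c = 56)
)

# Every column name that occurs anywhere, in order of first appearance
all_cols <- Reduce(union, lapply(frames, names))

# Hypothetical helper: add the missing columns as NA, then put the
# columns in one fixed order
align_columns <- function(df, cols) {
  for (col in setdiff(cols, names(df))) {
    df[[col]] <- rep(NA, nrow(df))
  }
  df[, cols, drop = FALSE]
}

aligned <- lapply(frames, align_columns, cols = all_cols)

# Once the layouts agree, a plain rbind stacks the rows
combined <- do.call(rbind, aligned)
combined
```

Aligning everything up front keeps the combining step itself trivial, which matters when there are thousands of pieces.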


Re: merging or joining 2 dataframes: merge, rbind.fill, etc.?

Jeff Newmiller
merge and rbind have very different memory usage profiles. There are some optimizations you can take advantage of if you store all of your small data frames in a list first, and then feed it through sapply (base) or ldply (plyr) to form the large data frame all at once, which can avoid the memory fragmentation associated with incrementally appending the data.
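
A rough sketch of the "collect in a list, then combine once" pattern described above. It uses do.call(rbind, ...) from base R for the combining step rather than the functions named in the message, and make_piece is a hypothetical stand-in for whatever produces each small data frame.

```r
# Hypothetical stand-in for the code that produces one small data frame
make_piece <- function(i) {
  data.frame(id = i, value = i^2)
}

# Step 1: collect the pieces in a list
pieces <- lapply(1:1000, make_piece)

# Step 2: bind them in a single call instead of growing a data frame
# inside a loop with big <- rbind(big, piece)
big <- do.call(rbind, pieces)

dim(big)
```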
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.


Re: merging or joining 2 dataframes: merge, rbind.fill, etc.?

David Kulp-2
In reply to this post by Anika Masters
Consider plyr. Memory issues can be a problem, but it's a piece of
cake to write a one liner that iterates over a list of data frames and
returns them all rbind'd together.  Or just: do.call(rbind,
list.of.data.frames).

If memory is a serious problem then I think it's best to write your
own code that appends each row by index - which avoids copying entire
data frames in memory.
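
One way to read the "append by index" suggestion is to preallocate storage for the final size and write each piece into its slot. The sketch below makes several assumptions: every piece has the same two numeric columns (here called id and value, names invented for the example), and plain vectors are filled first, with the data frame built once at the end. Whether this actually saves memory depends on R's copy semantics and on the data, so treat it as a starting point.

```r
# Toy pieces: five small data frames with identical columns
pieces <- lapply(1:5, function(i) {
  data.frame(id = rep(i, 2), value = c(i, i * 10))
})

# Preallocate one vector per column at the final length
total_rows <- sum(vapply(pieces, nrow, integer(1)))
ids <- numeric(total_rows)
values <- numeric(total_rows)

# Write each piece into its own block of positions
pos <- 0
for (p in pieces) {
  idx <- pos + seq_len(nrow(p))
  ids[idx] <- p$id
  values[idx] <- p$value
  pos <- pos + nrow(p)
}

# Build the data frame once, after all pieces are in place
out <- data.frame(id = ids, value = values)
dim(out)
```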


Re: merging or joining 2 dataframes: merge, rbind.fill, etc.?

djmuseR
Hi:

The other day I ran 100K simulations, each of which returned a 20 x 4
data frame. I stored these in a list object. When attempting to rbind
them into a single large data frame, my first thought was to try plyr:

library(plyr)
bigD <- ldply(L, rbind)   # where L is the list object

I quit at around a half hour. Ditto for do.call(rbind, L). [Sorry, I
didn't time it - these are approximate times.] I then checked to see
if the data.table package could do this, and lo and behold, I
discovered the rbindlist() function. When applied to my list object,
it ran correctly in under a second. Here's the actual example with
some names changed to mask the application:

> g <- gs[1:100000]   # gs is a list of lists
> length(g)
[1] 100000
> class(g)
[1] "list"
> dim(g[[1]])
[1] 20  4
> dim(g[[100000]])
[1] 20  4
> library(data.table)
> system.time(bigD <- rbindlist(g))
   user  system elapsed
   0.45    0.02    0.47
> dim(bigD)
[1] 2000000       4
> class(bigD)
[1] "data.table" "data.frame"
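
To tie this back to the original question, rbindlist() can also stack frames whose columns differ. This is a small sketch, assuming an installed data.table version whose rbindlist() accepts the use.names and fill arguments; df1 and df2 are rebuilt here with the values from the first post.

```r
df1 <- data.frame(a = 7, b = 99, d = 12)
df2 <- data.frame(d = 88, b = 34, x = 12, y = 44, c = 56)

if (requireNamespace("data.table", quietly = TRUE)) {
  # Match columns by name and fill the ones a frame lacks with NA
  stacked <- data.table::rbindlist(list(df1, df2), use.names = TRUE, fill = TRUE)
  # Convert back to a plain data frame if that is what later code expects
  stacked <- as.data.frame(stacked)
  print(stacked)
} else {
  message("data.table is not installed; skipping the rbindlist() example")
}
```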

Dennis
