Quantcast

help with merging 2 data frames

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

help with merging 2 data frames

Dimitri Liakhovitski-2
Dear R-ers,

I feel I am close, but can't get it quite right.
Thanks a lot for your help!

Dimitri

# I have 2 data frames:

x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80))
y<-data.frame(a2=c("aa","aa","ba","ba"),a3=c("ab","ab","bb","bb"),b=c(1:2,1:2),e1=c(100,200,300,400),e2=c(101,201,301,401))
(x);(y)

# I'd like to merge them so that the result looks like this:

desired<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80),
        e1=c(100,200,100,200,300,400,300,400),e2=c(101,201,101,201,301,401,301,401))
(desired)

# In other words, I want column e1 and e2 entries from data frame y to
be repeated based on matching of column a from x and columns a2 and
then a3 from y.

# I am trying step-by-step - first I am using column a2 from data
frame y for merging:
out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
(out1)   # looking good - half of the job is done

# Step2 - does not work

# next line produces columns e1 and e2 twice (in real life I have tons
of columns like e1 and e2):
merge(out1,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)

# next line also doesn't do the job:
merge(out1,y[-1],by.x=c("a","b","e1","e2"),by.y=c("a3","b","e1","e2"),all.x=T,all.y=F)

# Finally, I tried this approach:
out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
out2<-merge(x,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)
(out1); (out2)

# Now I need to merge these 2 - however, the next line doubles the
number of entries:
merge(out1,out2,by=names(out1),all.x=T,all.y=T)

--
Dimitri Liakhovitski
marketfusionanalytics.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: help with merging 2 data frames

Jorge I Velez
Hi Dimitri,

Try creating a key for "x" and "y" and then merging the result by that
variable:

x$key <- with(x, paste(a, b, sep = "/"))
y$key <- with(y, paste(a2, b, sep = "/"))
merge(x, y, by = 'key')[, c(2:4, 8:9)]

HTH,
Jorge.-


On Wed, Jul 11, 2012 at 6:28 PM, Dimitri Liakhovitski <> wrote:

> Dear R-ers,
>
> I feel I am close, but can't get it quite right.
> Thanks a lot for your help!
>
> Dimitri
>
> # I have 2 data frames:
>
>
> x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80))
>
> y<-data.frame(a2=c("aa","aa","ba","ba"),a3=c("ab","ab","bb","bb"),b=c(1:2,1:2),e1=c(100,200,300,400),e2=c(101,201,301,401))
> (x);(y)
>
> # I'd like to merge them so that the result looks like this:
>
>
> desired<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80),
>
> e1=c(100,200,100,200,300,400,300,400),e2=c(101,201,101,201,301,401,301,401))
> (desired)
>
> # In other words, I want column e1 and e2 entries from data frame y to
> be repeated based on matching of column a from x and columns a2 and
> then a3 from y.
>
> # I am trying step-by-step - first I am using column a2 from data
> frame y for merging:
> out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
> (out1)   # looking good - half of the job is done
>
> # Step2 - does not work
>
> # next line produces columns e1 and e2 twice (in real life I have tons
> of columns like e1 and e2):
> merge(out1,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)
>
> # next line also doesn't do the job:
>
> merge(out1,y[-1],by.x=c("a","b","e1","e2"),by.y=c("a3","b","e1","e2"),all.x=T,all.y=F)
>
> # Finally, I tried this approach:
> out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
> out2<-merge(x,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)
> (out1); (out2)
>
> # Now I need to merge these 2 - however, the next line doubles the
> number of entries:
> merge(out1,out2,by=names(out1),all.x=T,all.y=T)
>
> --
> Dimitri Liakhovitski
> marketfusionanalytics.com
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: help with merging 2 data frames

Dimitri Liakhovitski-2
Jorge, thank you!
that seems to be working, but unfortunately in real life I have
thousands of variables (except for a, a2, a3 and b) so that manually
selecting columns (as in c(2:4, 8:9)) would be too difficult...
Dimitri

On Wed, Jul 11, 2012 at 6:36 PM, Jorge I Velez <[hidden email]> wrote:

> Hi Dimitri,
>
> Try creating a key for "x" and "y" and then merging the result by that
> variable:
>
> x$key <- with(x, paste(a, b, sep = "/"))
> y$key <- with(y, paste(a2, b, sep = "/"))
> merge(x, y, by = 'key')[, c(2:4, 8:9)]
>
> HTH,
> Jorge.-
>
>
> On Wed, Jul 11, 2012 at 6:28 PM, Dimitri Liakhovitski <> wrote:
>>
>> Dear R-ers,
>>
>> I feel I am close, but can't get it quite right.
>> Thanks a lot for your help!
>>
>> Dimitri
>>
>> # I have 2 data frames:
>>
>>
>> x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80))
>>
>> y<-data.frame(a2=c("aa","aa","ba","ba"),a3=c("ab","ab","bb","bb"),b=c(1:2,1:2),e1=c(100,200,300,400),e2=c(101,201,301,401))
>> (x);(y)
>>
>> # I'd like to merge them so that the result looks like this:
>>
>>
>> desired<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80),
>>
>> e1=c(100,200,100,200,300,400,300,400),e2=c(101,201,101,201,301,401,301,401))
>> (desired)
>>
>> # In other words, I want column e1 and e2 entries from data frame y to
>> be repeated based on matching of column a from x and columns a2 and
>> then a3 from y.
>>
>> # I am trying step-by-step - first I am using column a2 from data
>> frame y for merging:
>> out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
>> (out1)   # looking good - half of the job is done
>>
>> # Step2 - does not work
>>
>> # next line produces columns e1 and e2 twice (in real life I have tons
>> of columns like e1 and e2):
>> merge(out1,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)
>>
>> # next line also doesn't do the job:
>>
>> merge(out1,y[-1],by.x=c("a","b","e1","e2"),by.y=c("a3","b","e1","e2"),all.x=T,all.y=F)
>>
>> # Finally, I tried this approach:
>> out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
>> out2<-merge(x,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)
>> (out1); (out2)
>>
>> # Now I need to merge these 2 - however, the next line doubles the
>> number of entries:
>> merge(out1,out2,by=names(out1),all.x=T,all.y=T)
>>
>> --
>> Dimitri Liakhovitski
>> marketfusionanalytics.com
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>



--
Dimitri Liakhovitski
marketfusionanalytics.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: help with merging 2 data frames

Mercier Eloi
This should do the trick :

colnames(y)[1:2]=c("a","a")
y2=rbind(y[,-1], y[,-2]) #duplicating the "y" matrix so the identifiers
are only in 1 column
merged = merge(x,y2)

merged
    a b  d  e1  e2
1 aa 1 10 100 101
2 aa 2 20 200 201
3 ab 1 30 100 101
4 ab 2 40 200 201
5 ba 1 50 300 301
6 ba 2 60 400 401
7 bb 1 70 300 301
8 bb 2 80 400 401

desired
    a b  d  e1  e2
1 aa 1 10 100 101
2 aa 2 20 200 201
3 ab 1 30 100 101
4 ab 2 40 200 201
5 ba 1 50 300 301
6 ba 2 60 400 401
7 bb 1 70 300 301
8 bb 2 80 400 401

  all(merged==desired)
[1] TRUE

Cheers,

Eloi

On 12-07-11 03:50 PM, Dimitri Liakhovitski wrote:

> Jorge, thank you!
> that seems to be working, but unfortunately in real life I have
> thousands of variables (except for a, a2, a3 and b) so that manually
> selecting columns (as in c(2:4, 8:9)) would be too difficult...
> Dimitri
>
> On Wed, Jul 11, 2012 at 6:36 PM, Jorge I Velez <[hidden email]> wrote:
>> Hi Dimitri,
>>
>> Try creating a key for "x" and "y" and then merging the result by that
>> variable:
>>
>> x$key <- with(x, paste(a, b, sep = "/"))
>> y$key <- with(y, paste(a2, b, sep = "/"))
>> merge(x, y, by = 'key')[, c(2:4, 8:9)]
>>
>> HTH,
>> Jorge.-
>>
>>
>> On Wed, Jul 11, 2012 at 6:28 PM, Dimitri Liakhovitski <> wrote:
>>> Dear R-ers,
>>>
>>> I feel I am close, but can't get it quite right.
>>> Thanks a lot for your help!
>>>
>>> Dimitri
>>>
>>> # I have 2 data frames:
>>>
>>>
>>> x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80))
>>>
>>> y<-data.frame(a2=c("aa","aa","ba","ba"),a3=c("ab","ab","bb","bb"),b=c(1:2,1:2),e1=c(100,200,300,400),e2=c(101,201,301,401))
>>> (x);(y)
>>>
>>> # I'd like to merge them so that the result looks like this:
>>>
>>>
>>> desired<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80),
>>>
>>> e1=c(100,200,100,200,300,400,300,400),e2=c(101,201,101,201,301,401,301,401))
>>> (desired)
>>>
>>> # In other words, I want column e1 and e2 entries from data frame y to
>>> be repeated based on matching of column a from x and columns a2 and
>>> then a3 from y.
>>>
>>> # I am trying step-by-step - first I am using column a2 from data
>>> frame y for merging:
>>> out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
>>> (out1)   # looking good - half of the job is done
>>>
>>> # Step2 - does not work
>>>
>>> # next line produces columns e1 and e2 twice (in real life I have tons
>>> of columns like e1 and e2):
>>> merge(out1,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)
>>>
>>> # next line also doesn't do the job:
>>>
>>> merge(out1,y[-1],by.x=c("a","b","e1","e2"),by.y=c("a3","b","e1","e2"),all.x=T,all.y=F)
>>>
>>> # Finally, I tried this approach:
>>> out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
>>> out2<-merge(x,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)
>>> (out1); (out2)
>>>
>>> # Now I need to merge these 2 - however, the next line doubles the
>>> number of entries:
>>> merge(out1,out2,by=names(out1),all.x=T,all.y=T)
>>>
>>> --
>>> Dimitri Liakhovitski
>>> marketfusionanalytics.com
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>


--
Eloi Mercier
Bioinformatics PhD Student, UBC
Paul Pavlidis Lab
2185 East Mall
University of British Columbia
Vancouver BC V6T1Z4

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: help with merging 2 data frames

Rui Barradas
In reply to this post by Dimitri Liakhovitski-2
Hello,

About many columns like 'e1' and 'e2' I don't know but with the provided
example the following does NOT depend on them, only on 'a', 'b' and 'a2'
and 'a3'.


z <- lapply(c("a2", "a3"), function(cc) merge(x, y, by.x=c("a", "b"),
by.y=c(cc, "b")))
z <- lapply(seq_along(z), function(i)
         z[[i]][ -which(names(z[[i]]) %in% c("a2", "a3")) ])
z <- do.call(rbind, z)
z <- z[order(z$a, z$b), ]
rownames(z) <- seq_len(nrow(z))
all.equal(desired, z)


Hope this helps,

Rui Barradas

Em 11-07-2012 23:28, Dimitri Liakhovitski escreveu:

> Dear R-ers,
>
> I feel I am close, but can't get it quite right.
> Thanks a lot for your help!
>
> Dimitri
>
> # I have 2 data frames:
>
> x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80))
> y<-data.frame(a2=c("aa","aa","ba","ba"),a3=c("ab","ab","bb","bb"),b=c(1:2,1:2),e1=c(100,200,300,400),e2=c(101,201,301,401))
> (x);(y)
>
> # I'd like to merge them so that the result looks like this:
>
> desired<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80),
> e1=c(100,200,100,200,300,400,300,400),e2=c(101,201,101,201,301,401,301,401))
> (desired)
>
> # In other words, I want column e1 and e2 entries from data frame y to
> be repeated based on matching of column a from x and columns a2 and
> then a3 from y.
>
> # I am trying step-by-step - first I am using column a2 from data
> frame y for merging:
> out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
> (out1)   # looking good - half of the job is done
>
> # Step2 - does not work
>
> # next line produces columns e1 and e2 twice (in real life I have tons
> of columns like e1 and e2):
> merge(out1,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)
>
> # next line also doesn't do the job:
> merge(out1,y[-1],by.x=c("a","b","e1","e2"),by.y=c("a3","b","e1","e2"),all.x=T,all.y=F)
>
> # Finally, I tried this approach:
> out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
> out2<-merge(x,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)
> (out1); (out2)
>
> # Now I need to merge these 2 - however, the next line doubles the
> number of entries:
> merge(out1,out2,by=names(out1),all.x=T,all.y=T)
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: help with merging 2 data frames

Dimitri Liakhovitski-2
In reply to this post by Mercier Eloi
Thanks a lot, everyone for your helpful suggestions!

Eloi, this is very elegant - thank you!
I did not know 2 columns are allowed to have the same names! Always
good to learn something new.
Thanks again!
Dimitri

On Wed, Jul 11, 2012 at 6:58 PM, Mercier Eloi <[hidden email]> wrote:

> This should do the trick :
>
> colnames(y)[1:2]=c("a","a")
> y2=rbind(y[,-1], y[,-2]) #duplicating the "y" matrix so the identifiers are
> only in 1 column
> merged = merge(x,y2)
>
> merged
>    a b  d  e1  e2
> 1 aa 1 10 100 101
> 2 aa 2 20 200 201
> 3 ab 1 30 100 101
> 4 ab 2 40 200 201
> 5 ba 1 50 300 301
> 6 ba 2 60 400 401
> 7 bb 1 70 300 301
> 8 bb 2 80 400 401
>
> desired
>    a b  d  e1  e2
> 1 aa 1 10 100 101
> 2 aa 2 20 200 201
> 3 ab 1 30 100 101
> 4 ab 2 40 200 201
> 5 ba 1 50 300 301
> 6 ba 2 60 400 401
> 7 bb 1 70 300 301
> 8 bb 2 80 400 401
>
>  all(merged==desired)
> [1] TRUE
>
> Cheers,
>
> Eloi
>
>
> On 12-07-11 03:50 PM, Dimitri Liakhovitski wrote:
>>
>> Jorge, thank you!
>> that seems to be working, but unfortunately in real life I have
>> thousands of variables (except for a, a2, a3 and b) so that manually
>> selecting columns (as in c(2:4, 8:9)) would be too difficult...
>> Dimitri
>>
>> On Wed, Jul 11, 2012 at 6:36 PM, Jorge I Velez <[hidden email]>
>> wrote:
>>>
>>> Hi Dimitri,
>>>
>>> Try creating a key for "x" and "y" and then merging the result by that
>>> variable:
>>>
>>> x$key <- with(x, paste(a, b, sep = "/"))
>>> y$key <- with(y, paste(a2, b, sep = "/"))
>>> merge(x, y, by = 'key')[, c(2:4, 8:9)]
>>>
>>> HTH,
>>> Jorge.-
>>>
>>>
>>> On Wed, Jul 11, 2012 at 6:28 PM, Dimitri Liakhovitski <> wrote:
>>>>
>>>> Dear R-ers,
>>>>
>>>> I feel I am close, but can't get it quite right.
>>>> Thanks a lot for your help!
>>>>
>>>> Dimitri
>>>>
>>>> # I have 2 data frames:
>>>>
>>>>
>>>>
>>>> x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80))
>>>>
>>>>
>>>> y<-data.frame(a2=c("aa","aa","ba","ba"),a3=c("ab","ab","bb","bb"),b=c(1:2,1:2),e1=c(100,200,300,400),e2=c(101,201,301,401))
>>>> (x);(y)
>>>>
>>>> # I'd like to merge them so that the result looks like this:
>>>>
>>>>
>>>>
>>>> desired<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80),
>>>>
>>>>
>>>> e1=c(100,200,100,200,300,400,300,400),e2=c(101,201,101,201,301,401,301,401))
>>>> (desired)
>>>>
>>>> # In other words, I want column e1 and e2 entries from data frame y to
>>>> be repeated based on matching of column a from x and columns a2 and
>>>> then a3 from y.
>>>>
>>>> # I am trying step-by-step - first I am using column a2 from data
>>>> frame y for merging:
>>>> out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
>>>> (out1)   # looking good - half of the job is done
>>>>
>>>> # Step2 - does not work
>>>>
>>>> # next line produces columns e1 and e2 twice (in real life I have tons
>>>> of columns like e1 and e2):
>>>> merge(out1,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)
>>>>
>>>> # next line also doesn't do the job:
>>>>
>>>>
>>>> merge(out1,y[-1],by.x=c("a","b","e1","e2"),by.y=c("a3","b","e1","e2"),all.x=T,all.y=F)
>>>>
>>>> # Finally, I tried this approach:
>>>> out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
>>>> out2<-merge(x,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)
>>>> (out1); (out2)
>>>>
>>>> # Now I need to merge these 2 - however, the next line doubles the
>>>> number of entries:
>>>> merge(out1,out2,by=names(out1),all.x=T,all.y=T)
>>>>
>>>> --
>>>> Dimitri Liakhovitski
>>>> marketfusionanalytics.com
>>>>
>>>> ______________________________________________
>>>> [hidden email] mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>>
>
>
> --
> Eloi Mercier
> Bioinformatics PhD Student, UBC
> Paul Pavlidis Lab
> 2185 East Mall
> University of British Columbia
> Vancouver BC V6T1Z4
>



--
Dimitri Liakhovitski
marketfusionanalytics.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Loading...