|
Dear R-ers,
I feel I am close, but can't get it quite right. Thanks a lot for your help! Dimitri # I have 2 data frames: x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80)) y<-data.frame(a2=c("aa","aa","ba","ba"),a3=c("ab","ab","bb","bb"),b=c(1:2,1:2),e1=c(100,200,300,400),e2=c(101,201,301,401)) (x);(y) # I'd like to merge them so that the result looks like this: desired<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80), e1=c(100,200,100,200,300,400,300,400),e2=c(101,201,101,201,301,401,301,401)) (desired) # In other words, I want column e1 and e2 entries from data frame y to be repeated based on matching of column a from x and columns a2 and then a3 from y. # I am trying step-by-step - first I am using column a2 from data frame y for merging: out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F) (out1) # looking good - half of the job is done # Step2 - does not work # next line produces columns e1 and e2 twice (in real life I have tons of columns like e1 and e2): merge(out1,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F) # next line also doesn't do the job: merge(out1,y[-1],by.x=c("a","b","e1","e2"),by.y=c("a3","b","e1","e2"),all.x=T,all.y=F) # Finally, I tried this approach: out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F) out2<-merge(x,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F) (out1); (out2) # Now I need to merge these 2 - however, the next line doubles the number of entries: merge(out1,out2,by=names(out1),all.x=T,all.y=T) -- Dimitri Liakhovitski marketfusionanalytics.com ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
Hi Dimitri,
Try creating a key for "x" and "y" and then merging the result by that variable: x$key <- with(x, paste(a, b, sep = "/")) y$key <- with(y, paste(a2, b, sep = "/")) merge(x, y, by = 'key')[, c(2:4, 8:9)] HTH, Jorge.- On Wed, Jul 11, 2012 at 6:28 PM, Dimitri Liakhovitski <> wrote: > Dear R-ers, > > I feel I am close, but can't get it quite right. > Thanks a lot for your help! > > Dimitri > > # I have 2 data frames: > > > x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80)) > > y<-data.frame(a2=c("aa","aa","ba","ba"),a3=c("ab","ab","bb","bb"),b=c(1:2,1:2),e1=c(100,200,300,400),e2=c(101,201,301,401)) > (x);(y) > > # I'd like to merge them so that the result looks like this: > > > desired<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80), > > e1=c(100,200,100,200,300,400,300,400),e2=c(101,201,101,201,301,401,301,401)) > (desired) > > # In other words, I want column e1 and e2 entries from data frame y to > be repeated based on matching of column a from x and columns a2 and > then a3 from y. > > # I am trying step-by-step - first I am using column a2 from data > frame y for merging: > out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F) > (out1) # looking good - half of the job is done > > # Step2 - does not work > > # next line produces columns e1 and e2 twice (in real life I have tons > of columns like e1 and e2): > merge(out1,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F) > > # next line also doesn't do the job: > > merge(out1,y[-1],by.x=c("a","b","e1","e2"),by.y=c("a3","b","e1","e2"),all.x=T,all.y=F) > > # Finally, I tried this approach: > out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F) > out2<-merge(x,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F) > (out1); (out2) > > # Now I need to merge these 2 - however, the next line doubles the > number of entries: > merge(out1,out2,by=names(out1),all.x=T,all.y=T) > > -- > Dimitri Liakhovitski > marketfusionanalytics.com > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
Jorge, thank you!
that seems to be working, but unfortunately in real life I have thousands of variables (except for a, a2, a3 and b) so that manually selecting columns (as in c(2:4, 8:9)) would be too difficult... Dimitri On Wed, Jul 11, 2012 at 6:36 PM, Jorge I Velez <[hidden email]> wrote: > Hi Dimitri, > > Try creating a key for "x" and "y" and then merging the result by that > variable: > > x$key <- with(x, paste(a, b, sep = "/")) > y$key <- with(y, paste(a2, b, sep = "/")) > merge(x, y, by = 'key')[, c(2:4, 8:9)] > > HTH, > Jorge.- > > > On Wed, Jul 11, 2012 at 6:28 PM, Dimitri Liakhovitski <> wrote: >> >> Dear R-ers, >> >> I feel I am close, but can't get it quite right. >> Thanks a lot for your help! >> >> Dimitri >> >> # I have 2 data frames: >> >> >> x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80)) >> >> y<-data.frame(a2=c("aa","aa","ba","ba"),a3=c("ab","ab","bb","bb"),b=c(1:2,1:2),e1=c(100,200,300,400),e2=c(101,201,301,401)) >> (x);(y) >> >> # I'd like to merge them so that the result looks like this: >> >> >> desired<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80), >> >> e1=c(100,200,100,200,300,400,300,400),e2=c(101,201,101,201,301,401,301,401)) >> (desired) >> >> # In other words, I want column e1 and e2 entries from data frame y to >> be repeated based on matching of column a from x and columns a2 and >> then a3 from y. >> >> # I am trying step-by-step - first I am using column a2 from data >> frame y for merging: >> out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F) >> (out1) # looking good - half of the job is done >> >> # Step2 - does not work >> >> # next line produces columns e1 and e2 twice (in real life I have tons >> of columns like e1 and e2): >> merge(out1,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F) >> >> # next line also doesn't do the job: >> >> merge(out1,y[-1],by.x=c("a","b","e1","e2"),by.y=c("a3","b","e1","e2"),all.x=T,all.y=F) >> >> # Finally, I tried this approach: >> out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F) >> out2<-merge(x,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F) >> (out1); (out2) >> >> # Now I need to merge these 2 - however, the next line doubles the >> number of entries: >> merge(out1,out2,by=names(out1),all.x=T,all.y=T) >> >> -- >> Dimitri Liakhovitski >> marketfusionanalytics.com >> >> ______________________________________________ >> [hidden email] mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > -- Dimitri Liakhovitski marketfusionanalytics.com ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
This should do the trick :
colnames(y)[1:2]=c("a","a") y2=rbind(y[,-1], y[,-2]) #duplicating the "y" matrix so the identifiers are only in 1 column merged = merge(x,y2) merged a b d e1 e2 1 aa 1 10 100 101 2 aa 2 20 200 201 3 ab 1 30 100 101 4 ab 2 40 200 201 5 ba 1 50 300 301 6 ba 2 60 400 401 7 bb 1 70 300 301 8 bb 2 80 400 401 desired a b d e1 e2 1 aa 1 10 100 101 2 aa 2 20 200 201 3 ab 1 30 100 101 4 ab 2 40 200 201 5 ba 1 50 300 301 6 ba 2 60 400 401 7 bb 1 70 300 301 8 bb 2 80 400 401 all(merged==desired) [1] TRUE Cheers, Eloi On 12-07-11 03:50 PM, Dimitri Liakhovitski wrote: > Jorge, thank you! > that seems to be working, but unfortunately in real life I have > thousands of variables (except for a, a2, a3 and b) so that manually > selecting columns (as in c(2:4, 8:9)) would be too difficult... > Dimitri > > On Wed, Jul 11, 2012 at 6:36 PM, Jorge I Velez <[hidden email]> wrote: >> Hi Dimitri, >> >> Try creating a key for "x" and "y" and then merging the result by that >> variable: >> >> x$key <- with(x, paste(a, b, sep = "/")) >> y$key <- with(y, paste(a2, b, sep = "/")) >> merge(x, y, by = 'key')[, c(2:4, 8:9)] >> >> HTH, >> Jorge.- >> >> >> On Wed, Jul 11, 2012 at 6:28 PM, Dimitri Liakhovitski <> wrote: >>> Dear R-ers, >>> >>> I feel I am close, but can't get it quite right. >>> Thanks a lot for your help! >>> >>> Dimitri >>> >>> # I have 2 data frames: >>> >>> >>> x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80)) >>> >>> y<-data.frame(a2=c("aa","aa","ba","ba"),a3=c("ab","ab","bb","bb"),b=c(1:2,1:2),e1=c(100,200,300,400),e2=c(101,201,301,401)) >>> (x);(y) >>> >>> # I'd like to merge them so that the result looks like this: >>> >>> >>> desired<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80), >>> >>> e1=c(100,200,100,200,300,400,300,400),e2=c(101,201,101,201,301,401,301,401)) >>> (desired) >>> >>> # In other words, I want column e1 and e2 entries from data frame y to >>> be repeated based on matching of column a from x and columns a2 and >>> then a3 from y. >>> >>> # I am trying step-by-step - first I am using column a2 from data >>> frame y for merging: >>> out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F) >>> (out1) # looking good - half of the job is done >>> >>> # Step2 - does not work >>> >>> # next line produces columns e1 and e2 twice (in real life I have tons >>> of columns like e1 and e2): >>> merge(out1,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F) >>> >>> # next line also doesn't do the job: >>> >>> merge(out1,y[-1],by.x=c("a","b","e1","e2"),by.y=c("a3","b","e1","e2"),all.x=T,all.y=F) >>> >>> # Finally, I tried this approach: >>> out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F) >>> out2<-merge(x,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F) >>> (out1); (out2) >>> >>> # Now I need to merge these 2 - however, the next line doubles the >>> number of entries: >>> merge(out1,out2,by=names(out1),all.x=T,all.y=T) >>> >>> -- >>> Dimitri Liakhovitski >>> marketfusionanalytics.com >>> >>> ______________________________________________ >>> [hidden email] mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> > > -- Eloi Mercier Bioinformatics PhD Student, UBC Paul Pavlidis Lab 2185 East Mall University of British Columbia Vancouver BC V6T1Z4 ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
In reply to this post by Dimitri Liakhovitski-2
Hello,
About many columns like 'e1' and 'e2' I don't know but with the provided example the following does NOT depend on them, only on 'a', 'b' and 'a2' and 'a3'. z <- lapply(c("a2", "a3"), function(cc) merge(x, y, by.x=c("a", "b"), by.y=c(cc, "b"))) z <- lapply(seq_along(z), function(i) z[[i]][ -which(names(z[[i]]) %in% c("a2", "a3")) ]) z <- do.call(rbind, z) z <- z[order(z$a, z$b), ] rownames(z) <- seq_len(nrow(z)) all.equal(desired, z) Hope this helps, Rui Barradas Em 11-07-2012 23:28, Dimitri Liakhovitski escreveu: > Dear R-ers, > > I feel I am close, but can't get it quite right. > Thanks a lot for your help! > > Dimitri > > # I have 2 data frames: > > x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80)) > y<-data.frame(a2=c("aa","aa","ba","ba"),a3=c("ab","ab","bb","bb"),b=c(1:2,1:2),e1=c(100,200,300,400),e2=c(101,201,301,401)) > (x);(y) > > # I'd like to merge them so that the result looks like this: > > desired<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80), > e1=c(100,200,100,200,300,400,300,400),e2=c(101,201,101,201,301,401,301,401)) > (desired) > > # In other words, I want column e1 and e2 entries from data frame y to > be repeated based on matching of column a from x and columns a2 and > then a3 from y. > > # I am trying step-by-step - first I am using column a2 from data > frame y for merging: > out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F) > (out1) # looking good - half of the job is done > > # Step2 - does not work > > # next line produces columns e1 and e2 twice (in real life I have tons > of columns like e1 and e2): > merge(out1,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F) > > # next line also doesn't do the job: > merge(out1,y[-1],by.x=c("a","b","e1","e2"),by.y=c("a3","b","e1","e2"),all.x=T,all.y=F) > > # Finally, I tried this approach: > out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F) > out2<-merge(x,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F) > (out1); (out2) > > # Now I need to merge these 2 - however, the next line doubles the > number of entries: > merge(out1,out2,by=names(out1),all.x=T,all.y=T) > ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
In reply to this post by Mercier Eloi
Thanks a lot, everyone for your helpful suggestions!
Eloi, this is very elegant - thank you! I did not know 2 columns are allowed to have the same names! Always good to learn something new. Thanks again! Dimitri On Wed, Jul 11, 2012 at 6:58 PM, Mercier Eloi <[hidden email]> wrote: > This should do the trick : > > colnames(y)[1:2]=c("a","a") > y2=rbind(y[,-1], y[,-2]) #duplicating the "y" matrix so the identifiers are > only in 1 column > merged = merge(x,y2) > > merged > a b d e1 e2 > 1 aa 1 10 100 101 > 2 aa 2 20 200 201 > 3 ab 1 30 100 101 > 4 ab 2 40 200 201 > 5 ba 1 50 300 301 > 6 ba 2 60 400 401 > 7 bb 1 70 300 301 > 8 bb 2 80 400 401 > > desired > a b d e1 e2 > 1 aa 1 10 100 101 > 2 aa 2 20 200 201 > 3 ab 1 30 100 101 > 4 ab 2 40 200 201 > 5 ba 1 50 300 301 > 6 ba 2 60 400 401 > 7 bb 1 70 300 301 > 8 bb 2 80 400 401 > > all(merged==desired) > [1] TRUE > > Cheers, > > Eloi > > > On 12-07-11 03:50 PM, Dimitri Liakhovitski wrote: >> >> Jorge, thank you! >> that seems to be working, but unfortunately in real life I have >> thousands of variables (except for a, a2, a3 and b) so that manually >> selecting columns (as in c(2:4, 8:9)) would be too difficult... >> Dimitri >> >> On Wed, Jul 11, 2012 at 6:36 PM, Jorge I Velez <[hidden email]> >> wrote: >>> >>> Hi Dimitri, >>> >>> Try creating a key for "x" and "y" and then merging the result by that >>> variable: >>> >>> x$key <- with(x, paste(a, b, sep = "/")) >>> y$key <- with(y, paste(a2, b, sep = "/")) >>> merge(x, y, by = 'key')[, c(2:4, 8:9)] >>> >>> HTH, >>> Jorge.- >>> >>> >>> On Wed, Jul 11, 2012 at 6:28 PM, Dimitri Liakhovitski <> wrote: >>>> >>>> Dear R-ers, >>>> >>>> I feel I am close, but can't get it quite right. >>>> Thanks a lot for your help! >>>> >>>> Dimitri >>>> >>>> # I have 2 data frames: >>>> >>>> >>>> >>>> x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80)) >>>> >>>> >>>> y<-data.frame(a2=c("aa","aa","ba","ba"),a3=c("ab","ab","bb","bb"),b=c(1:2,1:2),e1=c(100,200,300,400),e2=c(101,201,301,401)) >>>> (x);(y) >>>> >>>> # I'd like to merge them so that the result looks like this: >>>> >>>> >>>> >>>> desired<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80), >>>> >>>> >>>> e1=c(100,200,100,200,300,400,300,400),e2=c(101,201,101,201,301,401,301,401)) >>>> (desired) >>>> >>>> # In other words, I want column e1 and e2 entries from data frame y to >>>> be repeated based on matching of column a from x and columns a2 and >>>> then a3 from y. >>>> >>>> # I am trying step-by-step - first I am using column a2 from data >>>> frame y for merging: >>>> out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F) >>>> (out1) # looking good - half of the job is done >>>> >>>> # Step2 - does not work >>>> >>>> # next line produces columns e1 and e2 twice (in real life I have tons >>>> of columns like e1 and e2): >>>> merge(out1,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F) >>>> >>>> # next line also doesn't do the job: >>>> >>>> >>>> merge(out1,y[-1],by.x=c("a","b","e1","e2"),by.y=c("a3","b","e1","e2"),all.x=T,all.y=F) >>>> >>>> # Finally, I tried this approach: >>>> out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F) >>>> out2<-merge(x,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F) >>>> (out1); (out2) >>>> >>>> # Now I need to merge these 2 - however, the next line doubles the >>>> number of entries: >>>> merge(out1,out2,by=names(out1),all.x=T,all.y=T) >>>> >>>> -- >>>> Dimitri Liakhovitski >>>> marketfusionanalytics.com >>>> >>>> ______________________________________________ >>>> [hidden email] mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>> >>> >> >> > > > -- > Eloi Mercier > Bioinformatics PhD Student, UBC > Paul Pavlidis Lab > 2185 East Mall > University of British Columbia > Vancouver BC V6T1Z4 > -- Dimitri Liakhovitski marketfusionanalytics.com ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
| Powered by Nabble | Edit this page |
