Hi everyone!
I'm matching two samples to create one sample that has pairs of observations with equal values of the k1 variable. merge() doesn't work because I don't want to recycle the values.

    x <- data.frame(k1=c(1,1,2,3,3,5), k2=c(20,21,22,23,24,25))
    x
    y <- data.frame(k1=c(1,1,2,2,3,4,5,5), k2=c(10,11,12,13,14,15,16,17))
    y
    merge(x,y,by="k1")
       k1 k2.x k2.y
    1   1   20   10
    2   1   20   11
    3   1   21   10
    4   1   21   11
    5   2   22   12
    6   2   22   13
    7   3   23   14
    8   3   24   14
    9   5   25   16
    10  5   25   17

I have a final dataframe with 10 rows, but I want it with 5 rows, like this:

      k1 k2.x k2.y
    1  1   20   10
    2  1   21   11
    3  2   22   12
    4  3   23   14
    5  5   25   16

Thanks for any help.

Cecília Carmo
(Universidade de Aveiro)

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi Cecília,
Assuming that you want to pair the two samples as much as possible, let's relabel the k1 indices. The x$k1 values c(1,1,2,3,3,5) could be relabeled as c(1.01,1.02,2.01,3.01,3.02,5.01), and likewise for y$k1.

    x$k1 <- x$k1+sequence(rle(x$k1)$lengths)/100
    y$k1 <- y$k1+sequence(rle(y$k1)$lengths)/100
    merge(x,y,by="k1")

Regards,
Wu
In reply to this post by Cecilia Carmo
On Aug 20, 2010, at 6:44 AM, Cecilia Carmo wrote:

> Hi everyone!
>
> I'm matching two samples to create one sample that have
> pairs of observations equal for the k1 variable. merge() doesn't
> work because I don't want to recycle the values.

When there is more than one possible match in either y or x to a possible match on k1 in the other set of values, is there some rule that lets you determine which one should be chosen? Your offered solution suggests that you think the order in the original data frames is a proper rule, but why should we believe that rule is anything other than convenience?

--
David.

David Winsemius, MD
West Hartford, CT
In reply to this post by Cecilia Carmo
On Fri, Aug 20, 2010 at 6:44 AM, Cecilia Carmo <[hidden email]> wrote:
> I have a final dataframe with 10 rows, but I want it with 5 rows, like this:
>   k1 k2.x k2.y
> 1  1   20   10
> 2  1   21   11
> 3  2   22   12
> 4  3   23   14
> 5  5   25   16

Try this:

    x$k3 <- with(x, ave(k1, k1, FUN = seq_along))
    y$k3 <- with(y, ave(k1, k1, FUN = seq_along))

    merge(x, y, by = c("k1", "k3"))
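For readers following along, here is a minimal sketch of the ave()/seq_along suggestion above, run on the example data from the original post. The object name `paired` and the print calls are only for illustration; everything else is taken from the thread.

```r
# Example data from the original post.
x <- data.frame(k1 = c(1, 1, 2, 3, 3, 5), k2 = c(20, 21, 22, 23, 24, 25))
y <- data.frame(k1 = c(1, 1, 2, 2, 3, 4, 5, 5),
                k2 = c(10, 11, 12, 13, 14, 15, 16, 17))

# k3 numbers the rows within each k1 group: 1, 2, ... in order of appearance.
x$k3 <- with(x, ave(k1, k1, FUN = seq_along))
y$k3 <- with(y, ave(k1, k1, FUN = seq_along))

# Joining on (k1, k3) pairs the i-th x row of a key with the i-th y row only,
# so no row is reused. "paired" is an illustrative name.
paired <- merge(x, y, by = c("k1", "k3"))

print(x$k3)    # 1 2 1 1 2 1
print(paired)  # 5 rows: k2.x = 20 21 22 23 25, k2.y = 10 11 12 14 16
```

The helper column k3 stays in the result; it can be dropped afterwards (for example with paired$k3 <- NULL) if only k1, k2.x and k2.y are wanted.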
In reply to this post by Cecilia Carmo
hello cecilia,
i tried:

    yn<-y[y$k1%in%x$k1,]
    xn<-x[x$k1%in%y$k1,]
    x1st<-unique(match(yn$k1,xn$k1))
    y1st<-unique(match(xn$k1,yn$k1))
    new<-cbind(k1=intersect(y$k1,x$k1),k2.x=xn[x1st,2],k2.y=yn[y1st,2])

giving:

         k1 k2.x k2.y
    [1,]  1   20   10
    [2,]  2   22   12
    [3,]  3   23   14
    [4,]  5   25   16

...but with the irregularly duplicated values in k1 i hit a dead end, and i guess it could get tricky to solve this.

greetings,
kay
...gabor gave the solution while i was typing, so please disregard this.

yours,
kay
In reply to this post by Gabor Grothendieck
This is what I need, but my dataframe has many rows and it returns the following message:

    Error: cannot allocate vector of size 120 Kb
    In addition: There were 18 warnings (use warnings() to see them)

Could you help me?

Thanks,
Cecília

On Fri, 20 Aug 2010 08:30:16 -0400, Gabor Grothendieck <[hidden email]> wrote:
> Try this:
>
> x$k3 <- with(x, ave(k1, k1, FUN = seq_along))
> y$k3 <- with(y, ave(k1, k1, FUN = seq_along))
>
> merge(x, y, by = c("k1", "k3"))
In reply to this post by David Winsemius
The rule is not important to me. I'm selecting a sample that must have one important feature: the same number of observations from x and from y for each k1 value.

Thanks,
Cecília

On Fri, 20 Aug 2010 08:28:24 -0400, David Winsemius <[hidden email]> wrote:
> When there is more than one possible match in either y or x to a
> possible match on k1 in the other set of values, is there some rule
> that lets you determine which one should be chosen? Your offered
> solution suggests that you think the order in the original data frames
> is a proper rule, but why should we believe that rule is anything
> other than convenience?
In reply to this post by Cecilia Carmo
On Fri, Aug 20, 2010 at 9:07 AM, Cecilia Carmo <[hidden email]> wrote:
> This is what I need, but my dataframe has many rows and it returns to me the
> following message:
>
> Error: cannot allocate vector of size 120 Kb
> In addition: There were 18 warnings (use warnings() to see them)

Assuming that it's the merge command that generated the error, try this sqldf call instead of the merge line:

    library(sqldf)
    sqldf("select * from x, y using(k1, k3)", dbname = tempfile())

You can also try it without dbname = tempfile(), but that has a higher chance of overflowing memory.

See http://sqldf.googlecode.com for more info on sqldf.
It wasn't the merge command. It doesn't create the variable k3.

Cecília

On Fri, 20 Aug 2010 09:22:29 -0400, Gabor Grothendieck <[hidden email]> wrote:
> Assuming that it's the merge command that generated the error, try this
> sqldf call instead of the merge line:
>
> library(sqldf)
> sqldf("select * from x, y using(k1, k3)", dbname = tempfile())
On Fri, Aug 20, 2010 at 9:27 AM, Cecilia Carmo <[hidden email]> wrote:
> It wasn't the merge command. It doesn't create the variable k3.
>
> Cecília

How about:

    x$k3 <- with(x, unlist(tapply(k1, k1, seq_along)))
    y$k3 <- with(y, unlist(tapply(k1, k1, seq_along)))
On Fri, Aug 20, 2010 at 9:33 AM, Gabor Grothendieck <[hidden email]> wrote:
> How about:
>
> x$k3 <- with(x, unlist(tapply(k1, k1, seq_along)))
> y$k3 <- with(y, unlist(tapply(k1, k1, seq_along)))

And here is a second one in case that one overflows as well:

    x$k3 <- with(x, seq_along(k1) - match(k1, k1) + 1)
    y$k3 <- with(y, seq_along(k1) - match(k1, k1) + 1)

Note that this one assumes that the data frame is sorted by k1, which it is in your example.
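The sorting assumption in the match()-based version can be seen on a tiny example. This is only a sketch: the helper name `within_key_index` and the test vectors are made up for illustration and are not from the thread.

```r
# Hypothetical helper (not from the thread): running count within each key.
# match(k, k) gives the position of the first occurrence of each key, so the
# difference is only a within-group counter when equal keys are adjacent.
within_key_index <- function(k) seq_along(k) - match(k, k) + 1

sorted_keys   <- c(1, 1, 2, 3, 3, 5)   # x$k1 from the original post
unsorted_keys <- c(1, 2, 1)            # made-up vector with a split group

idx_sorted   <- within_key_index(sorted_keys)    # 1 2 1 1 2 1
idx_unsorted <- within_key_index(unsorted_keys)  # 1 1 3 (last value should be 2)

# Sorting first restores the assumption; order() leaves ties in their
# original order, so the counter still follows order of appearance.
ord <- order(unsorted_keys)
idx_fixed <- integer(length(unsorted_keys))
idx_fixed[ord] <- within_key_index(unsorted_keys[ord])  # 1 1 2

print(idx_sorted)
print(idx_unsorted)
print(idx_fixed)
```

The unlist(tapply(...)) version relies on the same ordering, because tapply() returns its results grouped by key; the ave() version does not, since it writes each group's result back to that group's original positions.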
The last one worked! The other one did not.
Thank you very much!

Another question about merge(): sometimes when I merge two dataframes, the merged dataframe has many more rows than the two I merged. I think this should not happen, should it?

Cecília

On Fri, 20 Aug 2010 09:37:26 -0400, Gabor Grothendieck <[hidden email]> wrote:
> And here is a second one in case that one overflows as well:
>
> x$k3 <- with(x, seq_along(k1) - match(k1, k1) + 1)
> y$k3 <- with(y, seq_along(k1) - match(k1, k1) + 1)
>
> Note that this one assumes that the data frame is sorted by k1 which
> in your example it is.
On Fri, Aug 20, 2010 at 9:59 AM, Cecilia Carmo <[hidden email]> wrote:
> Another question about merge(): sometimes I'm merging two dataframes and the
> merged dataframe has many more rows than the two merged. I think this should
> not happen, should it?

That is normal behavior. If there are m rows with a given key in x and n rows with that key in y, then m*n rows will be generated for that key. Every such row in x will be matched to every such row in y.
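As a rough check of the m*n rule, the row count of a plain merge() can be predicted from the per-key counts in each data frame. The sketch below uses the example data from the original post; the names `cx`, `cy`, `common`, `expected_rows` and `actual_rows` are illustrative.

```r
# Example data from the original post.
x <- data.frame(k1 = c(1, 1, 2, 3, 3, 5), k2 = c(20, 21, 22, 23, 24, 25))
y <- data.frame(k1 = c(1, 1, 2, 2, 3, 4, 5, 5),
                k2 = c(10, 11, 12, 13, 14, 15, 16, 17))

# Number of rows per key in each data frame.
cx <- table(x$k1)
cy <- table(y$k1)

# Only keys present in both data frames contribute rows to the default merge.
common <- intersect(names(cx), names(cy))

# Each common key contributes m * n rows: 2*2 + 1*2 + 2*1 + 1*2.
expected_rows <- sum(as.vector(cx[common]) * as.vector(cy[common]))
actual_rows   <- nrow(merge(x, y, by = "k1"))

print(expected_rows)  # 10
print(actual_rows)    # 10
```

Running a count like this before merging large data frames gives an idea of how big the result will be, which may help when memory is tight.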