

Hi everyone!
I'm matching two samples to create one sample that has
pairs of observations with equal values of the k1 variable. merge()
doesn't work because I don't want to recycle the values.
x <- data.frame(k1=c(1,1,2,3,3,5),
k2=c(20,21,22,23,24,25))
x
y <- data.frame(k1=c(1,1,2,2,3,4,5,5),
k2=c(10,11,12,13,14,15,16,17))
y
merge(x,y,by="k1")
k1 k2.x k2.y
1 1 20 10
2 1 20 11
3 1 21 10
4 1 21 11
5 2 22 12
6 2 22 13
7 3 23 14
8 3 24 14
9 5 25 16
10 5 25 17
I have a final dataframe with 10 rows, but I want it with
5 rows, like this:
k1 k2.x k2.y
1 1 20 10
2 1 21 11
3 2 22 12
4 3 23 14
5 5 25 16
Thanks for any help.
Cecília Carmo
(Universidade de Aveiro)
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Hi Cecília,
Assuming that you want to pair as many observations from the two samples as possible, let's relabel the k1 indices. The x$k1 values c(1,1,2,3,3,5) could be relabeled as c(1.01,1.02,2.01,3.01,3.02,5.01), and likewise for y$k1.
x$k1 <- x$k1+sequence(rle(x$k1)$lengths)/100
y$k1 <- y$k1+sequence(rle(y$k1)$lengths)/100
merge(x,y,by="k1")
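A self-contained sketch of this relabelling on the example data. It assumes both data frames are sorted by k1 (rle() counts consecutive runs only) and that no key repeats more than 99 times, since the counter is stored in two decimal places. The name paired and the final floor() step are additions for illustration, not part of the suggestion above.

```r
x <- data.frame(k1 = c(1, 1, 2, 3, 3, 5),
                k2 = c(20, 21, 22, 23, 24, 25))
y <- data.frame(k1 = c(1, 1, 2, 2, 3, 4, 5, 5),
                k2 = c(10, 11, 12, 13, 14, 15, 16, 17))

# Number the repeats within each run of equal k1 values and append
# that counter as a decimal part: 1, 1 becomes 1.01, 1.02.
x$k1 <- x$k1 + sequence(rle(x$k1)$lengths) / 100
y$k1 <- y$k1 + sequence(rle(y$k1)$lengths) / 100

# Each relabelled key is now unique within x and within y,
# so merge() can no longer recycle rows.
paired <- merge(x, y, by = "k1")

# Drop the decimal counter to recover the original key (assumed step).
paired$k1 <- floor(paired$k1)
paired
```

On the example data this should give the five rows asked for in the original post.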
Regards,
Wu


On Aug 20, 2010, at 6:44 AM, Cecilia Carmo wrote:
> Hi everyone!
>
> I'm matching two samples to create one sample that has
> pairs of observations with equal values of the k1 variable. merge() doesn't
> work because I don't want to recycle the values.
When there is more than one possible match in either y or x for a
given value of k1 in the other set of values, is there some rule
that lets you determine which one should be chosen? Your offered
solution suggests that you think the order in the original data frames
is a proper rule, but why should we believe that rule is anything
other than convenience?

David.
David Winsemius, MD
West Hartford, CT


On Fri, Aug 20, 2010 at 6:44 AM, Cecilia Carmo < [hidden email]> wrote:
> I have a final dataframe with 10 rows, but I want it with 5 rows, like this:
> k1 k2.x k2.y
> 1 1 20 10
> 2 1 21 11
> 3 2 22 12
> 4 3 23 14
> 5 5 25 16
>
Try this:
x$k3 <- with(x, ave(k1, k1, FUN = seq_along))
y$k3 <- with(y, ave(k1, k1, FUN = seq_along))
merge(x, y, by = c("k1", "k3"))
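For reference, a runnable sketch of the same idea on the example data from the original post. The name paired is introduced here only for illustration.

```r
x <- data.frame(k1 = c(1, 1, 2, 3, 3, 5),
                k2 = c(20, 21, 22, 23, 24, 25))
y <- data.frame(k1 = c(1, 1, 2, 2, 3, 4, 5, 5),
                k2 = c(10, 11, 12, 13, 14, 15, 16, 17))

# k3 counts the occurrences of each k1 value: 1, 2, ... per key.
x$k3 <- with(x, ave(k1, k1, FUN = seq_along))
y$k3 <- with(y, ave(k1, k1, FUN = seq_along))

# Merging on (k1, k3) pairs the i-th x row of a key with the i-th
# y row of the same key; rows without a partner are dropped.
paired <- merge(x, y, by = c("k1", "k3"))
paired
```

The first occurrence of a key in x is paired with its first occurrence in y, the second with the second, and so on, so the pairing follows the row order of the inputs.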


hello cecilia,
i tried:
yn <- y[y$k1 %in% x$k1, ]
xn <- x[x$k1 %in% y$k1, ]
x1st <- unique(match(yn$k1, xn$k1))
y1st <- unique(match(xn$k1, yn$k1))
new <- cbind(k1 = intersect(y$k1, x$k1), k2.x = xn[x1st, 2], k2.y = yn[y1st, 2])
giving:
k1 k2.x k2.y
[1,] 1 20 10
[2,] 2 22 12
[3,] 3 23 14
[4,] 5 25 16
...but with the irregularly duplicated values in k1 i hit a dead end, and i guess it could get tricky to solve this.
greetings,
kay


..gabor gave the solution while i was typing,
so please disregard this.
yours,
kay


This is what I need, but my dataframe has many rows and it
returns to me the following message:
Error: cannot allocate vector of size 120 Kb
In addition: There were 18 warnings (use warnings() to see
them)
Could you help me,
Thanks
Cecília
On Fri, 20 Aug 2010 08:30:16 -0400
Gabor Grothendieck < [hidden email]> wrote:
>
> Try this:
>
> x$k3 <- with(x, ave(k1, k1, FUN = seq_along))
> y$k3 <- with(y, ave(k1, k1, FUN = seq_along))
>
> merge(x, y, by = c("k1", "k3"))


The rule is not important to me. I'm selecting a sample
that must have one important feature: the same number of
obs from x and from y with the same k1.
Thanks
Cecília
On Fri, 20 Aug 2010 08:28:24 -0400
David Winsemius < [hidden email]> wrote:
>
> When there is more than one possible match in either y
> or x for a given value of k1 in the other set of
> values, is there some rule that lets you determine which
> one should be chosen? Your offered solution suggests
> that you think the order in the original data frames is a
> proper rule, but why should we believe that rule is
> anything other than convenience?
>
> David.


On Fri, Aug 20, 2010 at 9:07 AM, Cecilia Carmo < [hidden email]> wrote:
> This is what I need, but my dataframe has many rows and it returns to me the
> following message:
>
> Error: cannot allocate vector of size 120 Kb
> In addition: There were 18 warnings (use warnings() to see them)
>
Assuming that it's the merge command that generated the error, try this
sqldf call instead of the merge line:
library(sqldf)
sqldf("select * from x, y using(k1, k3)", dbname = tempfile())
You can also try it without dbname = tempfile(), but that has a higher
chance of overflowing memory.
See http://sqldf.googlecode.com for more info on sqldf.


It wasn't the merge command. It doesn't create the
variable k3.
Cecília
On Fri, 20 Aug 2010 09:22:29 -0400
Gabor Grothendieck < [hidden email]> wrote:
>
> Assuming that it's the merge command that generated the error, try this
> sqldf call instead of the merge line:
>
> library(sqldf)
> sqldf("select * from x, y using(k1, k3)", dbname = tempfile())
>
> You can also try it without dbname = tempfile(), but that has a higher
> chance of overflowing memory.
> See http://sqldf.googlecode.com for more info on sqldf.


On Fri, Aug 20, 2010 at 9:27 AM, Cecilia Carmo < [hidden email]> wrote:
> It wasn't the merge command. It doesn't create the variable k3.
>
> Cecília
>
>
How about:
x$k3 <- with(x, unlist(tapply(k1, k1, seq_along)))
y$k3 <- with(y, unlist(tapply(k1, k1, seq_along)))


On Fri, Aug 20, 2010 at 9:33 AM, Gabor Grothendieck
< [hidden email]> wrote:
> How about:
>
> x$k3 <- with(x, unlist(tapply(k1, k1, seq_along)))
> y$k3 <- with(y, unlist(tapply(k1, k1, seq_along)))
>
And here is a second one in case that one overflows as well:
x$k3 <- with(x, seq_along(k1) - match(k1, k1) + 1)
y$k3 <- with(y, seq_along(k1) - match(k1, k1) + 1)
Note that this one assumes that the data frame is sorted by k1, which
it is in your example.
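If the data are not already sorted, sorting by k1 first should make this version applicable. A minimal sketch, using a hypothetical unsorted data frame d made up only to illustrate the sorting step:

```r
# Hypothetical unsorted input, used only for illustration.
d <- data.frame(k1 = c(3, 1, 5, 1, 3, 2),
                k2 = c(23, 20, 25, 21, 24, 22))

# Sort by k1 first; order() leaves ties in their original order.
d <- d[order(d$k1), ]

# match(k1, k1) gives the position of the first row of each key,
# so the offset from that position is the within-key counter.
d$k3 <- with(d, seq_along(k1) - match(k1, k1) + 1)

# Cross-check against the ave() version, which does not need sorting.
stopifnot(all(d$k3 == with(d, ave(k1, k1, FUN = seq_along))))
d
```

The counter relies only on vectorised arithmetic and a single match() call, which may be why it copes better with a large data frame than the grouped versions.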


The last one worked! The other one did not.
Thank you very much!
Another question about merge(): sometimes I'm merging two
data frames and the merged data frame has many more rows
than the two being merged. I think this should not happen, should
it?
Cecília
On Fri, 20 Aug 2010 09:37:26 -0400
Gabor Grothendieck < [hidden email]> wrote:
>> How about:
>>
>> x$k3 <- with(x, unlist(tapply(k1, k1, seq_along)))
>> y$k3 <- with(y, unlist(tapply(k1, k1, seq_along)))
>>
>
> And here is a second one in case that one overflows as well:
>
> x$k3 <- with(x, seq_along(k1) - match(k1, k1) + 1)
> y$k3 <- with(y, seq_along(k1) - match(k1, k1) + 1)
>
> Note that this one assumes that the data frame is sorted by k1, which
> it is in your example.


On Fri, Aug 20, 2010 at 9:59 AM, Cecilia Carmo < [hidden email]> wrote:
> The last one worked! The other one, not.
> Thank you very much!
>
> Another question about merge(): sometimes I'm merging two dataframes and the
> merged dataframe has much more rows than the two merged? I think this should
> not happen,does it?
>
That is normal behavior. If there are m rows with a given key in x and n
rows with that key in y, then m*n rows will be generated. Every
such row in x will be matched to every such row in y.
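As a quick check of that rule, the expected number of rows can be computed from the key counts before merging. This is a sketch on the example data from the start of the thread; the names nx, ny, common and expected_rows are introduced here for illustration.

```r
x <- data.frame(k1 = c(1, 1, 2, 3, 3, 5),
                k2 = c(20, 21, 22, 23, 24, 25))
y <- data.frame(k1 = c(1, 1, 2, 2, 3, 4, 5, 5),
                k2 = c(10, 11, 12, 13, 14, 15, 16, 17))

# Count how often each key occurs on each side.
nx <- table(x$k1)
ny <- table(y$k1)

# Only keys present in both data frames contribute rows.
common <- intersect(names(nx), names(ny))

# Each common key contributes m * n rows to the merged result.
expected_rows <- sum(as.integer(nx[common]) * as.integer(ny[common]))

expected_rows
nrow(merge(x, y, by = "k1"))
```

Both values are 10 here: key 1 contributes 2*2 rows, and keys 2, 3 and 5 contribute 2 rows each.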

