paired samples, matching rows, merge()

paired samples, matching rows, merge()

Cecilia Carmo
Hi everyone!

I'm matching two samples to create one sample that has
pairs of observations with equal values of the k1 variable. merge()
doesn't work because I don't want to recycle the values.

x <- data.frame(k1=c(1,1,2,3,3,5),
k2=c(20,21,22,23,24,25))
x
y <- data.frame(k1=c(1,1,2,2,3,4,5,5),
k2=c(10,11,12,13,14,15,16,17))
y
merge(x,y,by="k1")
   k1 k2.x k2.y
1   1   20   10
2   1   20   11
3   1   21   10
4   1   21   11
5   2   22   12
6   2   22   13
7   3   23   14
8   3   24   14
9   5   25   16
10  5   25   17

I have a final dataframe with 10 rows, but I want it with
5 rows, like this:
   k1 k2.x k2.y
1   1   20   10
2   1   21   11
3   2   22   12
4   3   23   14
5   5   25   16

Thanks for any help.

Cecília Carmo
(Universidade de Aveiro)

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: paired samples, matching rows, merge()

Wu Gong
Hi Cecília,

Assuming that you want to pair as many observations from the two samples as possible, let's relabel the k1 indices. The x$k1 values c(1,1,2,3,3,5) could be relabeled as c(1.01,1.02,2.01,3.01,3.02,5.01), and likewise for y$k1.

x$k1 <- x$k1+sequence(rle(x$k1)$lengths)/100
y$k1 <- y$k1+sequence(rle(y$k1)$lengths)/100
merge(x,y,by="k1")
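
An editorial sketch of a variant of the relabeling above: the run counter can be kept in its own key column instead of being folded into k1 as a decimal. The column name `run` is made up for illustration. Like the relabeling, this assumes that equal k1 values sit in consecutive rows. It also avoids the case where a key repeats 100 or more times, where k1 + 100/100 would collide with the next integer key.

```r
x <- data.frame(k1 = c(1, 1, 2, 3, 3, 5),
                k2 = c(20, 21, 22, 23, 24, 25))
y <- data.frame(k1 = c(1, 1, 2, 2, 3, 4, 5, 5),
                k2 = c(10, 11, 12, 13, 14, 15, 16, 17))

# Position of each row within its run of identical k1 values
# ("run" is a hypothetical column name, not from the thread)
x$run <- sequence(rle(x$k1)$lengths)
y$run <- sequence(rle(y$k1)$lengths)

# Matching on both columns pairs the i-th row of a key in x
# with the i-th row of the same key in y
paired <- merge(x, y, by = c("k1", "run"))
paired
```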

Regards,

Wu

Re: paired samples, matching rows, merge()

David Winsemius
In reply to this post by Cecilia Carmo

On Aug 20, 2010, at 6:44 AM, Cecilia Carmo wrote:

> Hi everyone!
>
> I'm matching two samples to create one sample that has
> pairs of observations with equal values of the k1 variable. merge() doesn't
> work because I don't want to recycle the values.

When there is more than one possible match in either y or x for a
given value of k1 in the other set of values, is there some rule
that lets you determine which one should be chosen? Your offered
solution suggests that you think the order in the original data.frames
is a proper rule, but why should we believe that rule is anything
other than convenience?

--
David.


David Winsemius, MD
West Hartford, CT


Re: paired samples, matching rows, merge()

Gabor Grothendieck
In reply to this post by Cecilia Carmo

Try this:

x$k3 <- with(x, ave(k1, k1, FUN = seq_along))
y$k3 <- with(y, ave(k1, k1, FUN = seq_along))

merge(x, y, by = c("k1", "k3"))
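
For readers following along, here is a self-contained sketch of the suggestion above run on the sample data from the question. The object name `paired` and the small unsorted vector `k` are added for illustration only. Because ave() writes each group's result back to the rows it came from, the counter should not depend on the rows being sorted by k1.

```r
x <- data.frame(k1 = c(1, 1, 2, 3, 3, 5),
                k2 = c(20, 21, 22, 23, 24, 25))
y <- data.frame(k1 = c(1, 1, 2, 2, 3, 4, 5, 5),
                k2 = c(10, 11, 12, 13, 14, 15, 16, 17))

# k3 numbers the rows within each k1 group: 1, 2, ... per key
x$k3 <- with(x, ave(k1, k1, FUN = seq_along))
y$k3 <- with(y, ave(k1, k1, FUN = seq_along))

# Merging on (k1, k3) pairs the i-th x row with the i-th y row of each key
paired <- merge(x, y, by = c("k1", "k3"))
paired

# The same counter on an unsorted key vector (illustrative data)
k <- c(3, 1, 3, 1, 2)
ave(k, k, FUN = seq_along)
```

Which rows get paired is decided purely by row order within each key, which is the convenience rule questioned earlier in the thread.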


Re: paired samples, matching rows, merge()

Kay Cichini
In reply to this post by Cecilia Carmo
hello cecilia,
i tried:

yn<-y[y$k1%in%x$k1,]
xn<-x[x$k1%in%y$k1,]
x1st<-unique(match(yn$k1,xn$k1))
y1st<-unique(match(xn$k1,yn$k1))
new<-cbind(k1=intersect(y$k1,x$k1),k2.x=xn[x1st,2],k2.y=yn[y1st,2])

giving:

     k1 k2.x k2.y
[1,]  1   20   10
[2,]  2   22   12
[3,]  3   23   14
[4,]  5   25   16

...but with the irregularly duplicated values in k1 i hit a dead end and i guess it could get tricky to solve this.

greetings,
kay

Re: paired samples, matching rows, merge()

Kay Cichini
..gabor gave the solution while i was typing -
so please disregard this.

yours,
kay

Re: paired samples, matching rows, merge()

Cecilia Carmo
In reply to this post by Gabor Grothendieck
This is what I need, but my dataframe has many rows and it
returns to me the following message:

Error: cannot allocate vector of size 120 Kb
In addition: There were 18 warnings (use warnings() to see
them)

Could you help me,

Thanks
Cecília


Re: paired samples, matching rows, merge()

Cecilia Carmo
In reply to this post by David Winsemius
The rule is not important to me. I'm selecting a sample
that must have one important feature: the same number of
obs from x and from y with the same k1.

Thanks
Cecília


Re: paired samples, matching rows, merge()

Gabor Grothendieck
In reply to this post by Cecilia Carmo
On Fri, Aug 20, 2010 at 9:07 AM, Cecilia Carmo <[hidden email]> wrote:
> This is what I need, but my dataframe has many rows and it returns to me the
> following message:
>
> Error: cannot allocate vector of size 120 Kb
> In addition: There were 18 warnings (use warnings() to see them)
>

Assuming that it's the merge command that generated the error, try this
sqldf call instead of the merge line:

library(sqldf)
sqldf("select * from x, y using(k1, k3)", dbname = tempfile())

You can also try it without dbname = tempfile() but that has a higher
chance of overflowing memory.
See http://sqldf.googlecode.com for more info on sqldf.


Re: paired samples, matching rows, merge()

Cecilia Carmo
It wasn't the merge command. It doesn't create the
variable k3.

Cecília



Re: paired samples, matching rows, merge()

Gabor Grothendieck
On Fri, Aug 20, 2010 at 9:27 AM, Cecilia Carmo <[hidden email]> wrote:
> It wasn't the merge command. It doesn't create the variable k3.
>
> Cecília
>
>

How about:

x$k3 <- with(x, unlist(tapply(k1, k1, seq_along)))
y$k3 <- with(y, unlist(tapply(k1, k1, seq_along)))


Re: paired samples, matching rows, merge()

Gabor Grothendieck

And here is a second one in case that one overflows as well:

x$k3 <- with(x, seq_along(k1) - match(k1, k1) + 1)
y$k3 <- with(y, seq_along(k1) - match(k1, k1) + 1)

Note that this one assumes that the data frame is sorted by k1 which
in your example it is.
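
A small sketch of why the sorting assumption matters. The helper name `within_key_index` and the vector `unsorted_keys` are made up for illustration. match(k1, k1) returns the position of the first occurrence of each key, so subtracting it from the row position only counts rows of the same key when those rows are contiguous.

```r
# Hypothetical helper wrapping the expression from the message above
within_key_index <- function(k) seq_along(k) - match(k, k) + 1

sorted_keys <- c(1, 1, 2, 3, 3, 5)
within_key_index(sorted_keys)      # 1 2 1 1 2 1

unsorted_keys <- c(1, 2, 1)
within_key_index(unsorted_keys)    # 1 1 3, not the intended 1 1 2

# One way to restore the assumption: compute on the sorted keys, then
# write the result back to the original positions. order() leaves ties
# in their original order, so the within-key numbering is preserved.
o <- order(unsorted_keys)
idx <- numeric(length(unsorted_keys))
idx[o] <- within_key_index(unsorted_keys[o])
idx                                # 1 1 2
```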


Re: paired samples, matching rows, merge()

Cecilia Carmo
The last one worked! The other one, not.
Thank you very much!

Another question about merge(): sometimes I'm merging two
dataframes and the merged dataframe has many more rows
than either of the two being merged. I think this should not
happen, should it?

Cecília



Re: paired samples, matching rows, merge()

Gabor Grothendieck
On Fri, Aug 20, 2010 at 9:59 AM, Cecilia Carmo <[hidden email]> wrote:
> The last one worked! The other one, not.
> Thank you very much!
>
> Another question about merge(): sometimes I'm merging two dataframes and the
> merged dataframe has many more rows than either of the two being merged. I think
> this should not happen, should it?
>

That is normal behavior.  If there are m rows with a given key in x and n
rows with that key in y, then m*n rows will be generated for that key.  Every
such row in x will be matched to every such row in y.
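
The m*n rule can be checked on the sample data from the start of the thread. This is a sketch; the names `nx`, `ny`, `common`, `per_key` and `expected_rows` are introduced here for illustration. Counting the rows per key in each data frame and summing the products over the shared keys should reproduce the number of rows merge() returns.

```r
x <- data.frame(k1 = c(1, 1, 2, 3, 3, 5),
                k2 = c(20, 21, 22, 23, 24, 25))
y <- data.frame(k1 = c(1, 1, 2, 2, 3, 4, 5, 5),
                k2 = c(10, 11, 12, 13, 14, 15, 16, 17))

# Rows per key in each data frame
nx <- table(x$k1)
ny <- table(y$k1)

# Only keys present in both contribute rows to the default (inner) merge
common <- intersect(names(nx), names(ny))
per_key <- as.vector(nx[common]) * as.vector(ny[common])
names(per_key) <- common
per_key                     # key 1: 2*2, key 2: 1*2, key 3: 2*1, key 5: 1*2

expected_rows <- sum(per_key)
expected_rows == nrow(merge(x, y, by = "k1"))
```

Here the products are 4, 2, 2 and 2, which add up to the 10 rows shown in the original question.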
