Merging data

Merging data

Brian Perron
Hello all,

I am fairly new to R and am trying to bring together data from multiple sources.  Here is one problem that I cannot seem to crack – I hope somebody can help.  Let me simplify the problem:  Let’s say I have two datasets:  DATA1 and DATA2.  I would like to work with all the cases in DATA2.  I have additional variables on these cases in DATA1, which is a larger data set with many additional cases.  I know how to merge data sets if the datasets contain the same cases.  However, I want to eliminate all the cases from DATA1 that are not present in DATA2 and then merge.  The CASEID is my matching variable, and there are no duplicate variable names.
Any guidance would be greatly appreciated.

Thanks in advance,
Brian  






______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: Merging data

Chuck Cleland
Brian Perron wrote:
> Hello all,
>
> I am fairly new to R and am trying to bring together data from multiple sources.  Here is one problem that I cannot seem to crack – I hope somebody can help.  Let me simplify the problem:  Let’s say I have two datasets:  DATA1 and DATA2.  I would like to work with all the cases in DATA2.  I have additional variables on these cases in DATA1, which is a larger data set with many additional cases.  I know how to merge data sets if the datasets contain the same cases.  However, I want to eliminate all the cases from DATA1 that are not present in DATA2 and then merge.  The CASEID is my matching variable, and there are no duplicate variable names.
> Any guidance would be greatly appreciated.

Take a closer look at the all.x and all.y arguments in ?merge.  Does this
give you what you want?

merge(DATA1, DATA2, by="CASEID", all.x=FALSE, all.y=TRUE)
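
To see what those two arguments do, here is a minimal sketch on made-up data. The column names age and score and all the values are hypothetical, chosen only for illustration; only CASEID, DATA1 and DATA2 come from the question.

```r
# Toy data, invented for illustration only.
# DATA1 is the larger set; DATA2 holds the cases of interest.
DATA1 <- data.frame(CASEID = 1:6, age = c(34, 51, 27, 45, 62, 38))
DATA2 <- data.frame(CASEID = c(2, 4, 7), score = c(10.5, 8.2, 9.9))

# all.y = TRUE keeps every case in DATA2 (the second argument);
# all.x = FALSE drops cases that appear only in DATA1.
merged <- merge(DATA1, DATA2, by = "CASEID", all.x = FALSE, all.y = TRUE)
print(merged)
```

Case 7 is in DATA2 but not in DATA1, so it is kept and its age is filled with NA.  Cases 1, 3, 5 and 6 appear only in DATA1 and are dropped.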

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 452-1424 (M, W, F)
fax: (917) 438-0894


Re: Merging data

Liaw, Andy
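Something like this?  (Here data1 plays the role of your DATA2, the smaller
set whose cases you want to keep, and data2 plays the role of your DATA1.)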

> data1 <- data.frame(id=c(1, 3, 5), x=runif(3))
> data2 <- data.frame(id=1:10, y=runif(10))
> data3 <- merge(data1, data2, by="id", all.x=TRUE, all.y=FALSE)
> data3
  id         x         y
1  1 0.9533341 0.1803271
2  3 0.9143624 0.5033228
3  5 0.2866931 0.4233733
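
If you would rather do it in the two steps you described (drop the unwanted cases first, then merge), a rough sketch could look like the following.  The names big and small and their values are hypothetical stand-ins for your DATA1 and DATA2, and this assumes every id in the smaller set also occurs in the larger one.

```r
# Hypothetical toy data: 'big' stands in for DATA1, 'small' for DATA2.
big   <- data.frame(id = 1:10, y = seq(0.1, 1, by = 0.1))
small <- data.frame(id = c(1, 3, 5), x = c(2.5, 4.0, 1.5))

# Step 1: keep only the rows of 'big' whose id also occurs in 'small'.
big_kept <- big[big$id %in% small$id, ]

# Step 2: merge; the default (all = FALSE) keeps ids present in both.
two_step <- merge(small, big_kept, by = "id")

# The single call does the filtering and the merging in one go.
one_call <- merge(small, big, by = "id", all.x = TRUE, all.y = FALSE)

print(two_step)
print(one_call)
```

The two results differ only if the smaller set has ids missing from the larger one: the two-step version drops those rows, while all.x=TRUE keeps them with NA in the added columns.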

Andy

