Merging Issue

R help mailing list-2
Hi all, 
I have two data sets similar to the ones below and want to merge them on the variable "deps". Since this is sample data with a small sample size, I have no problem using merge. However, the actual data set has ~60,000 observations with a lot of repeated measures. For example, for a given ID I have 100 different dates and groups. The problem is that merge gives me a lot of duplicates that I can't even track. I was wondering if there is any other way to merge such data. Any help is appreciated. Thanks.
## Data A
Subject <- c("2", "2", "2", "3", "3", "3", "4", "4", "5", "5", "5", "5")
dates <- seq(as.Date('2011-01-01'), as.Date('2011-01-12'), by = 1)
deps <- c("A", "B", "C", "C", "D", "A", "F", "G", "A", "F", "A", "D")
df <- data.frame(Subject, dates, deps)
## Data B
loc <- c("CA", "NY", "CA", "NY", "WA", "WA")
grp <- c("DE", "OC", "DE", "OT", "DE", "OC")
deps <- c("A", "B", "C", "D", "F", "G")
df2 <- data.frame(loc, grp, deps)
dat <- merge(df, df2, by = "deps")
 


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Merging Issue

jholtman
Don't send HTML email; it messes up the data.

What do you mean that you get lots of duplicates?  If you have duplicated
entries in df2 this will lead to dups because of the way merge works (here
is the help file):

 If there is more than one match, all possible matches contribute
     one row each.  For the precise meaning of ‘match’, see ‘match’.
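
As a rough illustration of that rule, here is a minimal sketch using two hypothetical toy data frames, x and y (they are not from this thread). For each key value, merge should return (rows in x with that key) times (rows in y with that key), so the expected size of the result can be worked out before merging:

```r
# Hypothetical toy data, for illustration only (not the poster's data)
x <- data.frame(key = c("A", "A", "B"), xval = 1:3)
y <- data.frame(key = c("A", "A", "B"), yval = c(10, 20, 30))

# Count how often each key occurs on each side
nx <- table(x$key)
ny <- table(y$key)

# Only keys present on both sides survive the default (inner) merge;
# each such key contributes nx * ny rows
common <- intersect(names(nx), names(ny))
expected_rows <- sum(as.integer(nx[common]) * as.integer(ny[common]))

m <- merge(x, y, by = "key")
nrow(m) == expected_rows
```

If the expected count is much larger than the number of rows in either input, the key is repeated on both sides, and that is where the extra rows come from.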

So you need to define the problem that you want to solve in doing the
merge.  Here is what happens in your data if I duplicate some entries in
df2.  Is this what you are seeing?

>  #Data A
>  Subject<- c("2", "2", "2", "3", "3", "3", "4", "4", "5", "5", "5", "5")
>  dates<-seq(as.Date('2011-01-01'),as.Date('2011-01-12'),by = 1)
>  deps<-c("A", "B", "C", "C", "D", "A", "F", "G", "A", "F", "A", "D")
>  df <- data.frame(Subject, dates, deps)
>  ##
>  #Data B
>  loc<-c("CA","NY", "CA", "NY", "WA", "WA", 'yy')
>  grp<-c("DE", "OC", "DE", "OT", "DE", "OC", "xx")
>  deps<-c("A","B","C", "D", "F","G", "A")
>  df2<-data.frame(loc, grp, deps )
>  dat<-merge(df, df2, by="deps")
>
> dat
   deps Subject      dates loc grp
1     A       2 2011-01-01  CA  DE
2     A       2 2011-01-01  yy  xx
3     A       3 2011-01-06  CA  DE
4     A       3 2011-01-06  yy  xx
5     A       5 2011-01-11  CA  DE
6     A       5 2011-01-11  yy  xx
7     A       5 2011-01-09  CA  DE
8     A       5 2011-01-09  yy  xx
9     B       2 2011-01-02  NY  OC
10    C       3 2011-01-04  CA  DE
11    C       2 2011-01-03  CA  DE
12    D       5 2011-01-12  NY  OT
13    D       3 2011-01-05  NY  OT
14    F       5 2011-01-10  WA  DE
15    F       4 2011-01-07  WA  DE
16    G       4 2011-01-08  WA  OC
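
One possible way to track this down, sketched with the same data as above: list the key values that occur more than once in df2, then decide what to do with them before merging. The names dup_keys and df2_unique are introduced here for illustration, and keeping only the first row per key is an assumption; whether that is the right choice depends on what the duplicates in the real data mean.

```r
# Data from the example above
Subject <- c("2", "2", "2", "3", "3", "3", "4", "4", "5", "5", "5", "5")
dates <- seq(as.Date("2011-01-01"), as.Date("2011-01-12"), by = 1)
deps <- c("A", "B", "C", "C", "D", "A", "F", "G", "A", "F", "A", "D")
df <- data.frame(Subject, dates, deps)

loc <- c("CA", "NY", "CA", "NY", "WA", "WA", "yy")
grp <- c("DE", "OC", "DE", "OT", "DE", "OC", "xx")
deps <- c("A", "B", "C", "D", "F", "G", "A")
df2 <- data.frame(loc, grp, deps)

# Which key values occur more than once in the lookup table?
dup_keys <- unique(as.character(df2$deps[duplicated(df2$deps)]))
dup_keys

# Assumption: keep the first row for each key and drop the rest
df2_unique <- df2[!duplicated(df2$deps), ]

# With unique keys in df2, each row of df matches at most one row
dat <- merge(df, df2_unique, by = "deps")
nrow(dat) == nrow(df)
```

If the repeated keys in df2 are legitimate (for example, one row per date), then the merge probably needs more columns in by rather than de-duplication.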



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Fri, Jun 17, 2016 at 8:33 PM, Farnoosh Sheikhi via R-help <
[hidden email]> wrote:

> Hi all,
> I have two data sets similar to the ones below and want to merge them on
> the variable "deps". Since this is sample data with a small sample size, I
> have no problem using merge. However, the actual data set has ~60,000
> observations with a lot of repeated measures. For example, for a given ID
> I have 100 different dates and groups. The problem is that merge gives me
> a lot of duplicates that I can't even track. I was wondering if there is
> any other way to merge such data. Any help is appreciated. Thanks.
> ## Data A
> Subject <- c("2", "2", "2", "3", "3", "3", "4", "4", "5", "5", "5", "5")
> dates <- seq(as.Date('2011-01-01'), as.Date('2011-01-12'), by = 1)
> deps <- c("A", "B", "C", "C", "D", "A", "F", "G", "A", "F", "A", "D")
> df <- data.frame(Subject, dates, deps)
> ## Data B
> loc <- c("CA", "NY", "CA", "NY", "WA", "WA")
> grp <- c("DE", "OC", "DE", "OT", "DE", "OC")
> deps <- c("A", "B", "C", "D", "F", "G")
> df2 <- data.frame(loc, grp, deps)
> dat <- merge(df, df2, by = "deps")