Matching values between 2 data.frame.

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Matching values between 2 data.frame.

Bogaso
Hi again,

Let say I have below 2 data frames.

OriginalData = data.frame('Value1' = 1:12, 'Value2' = 11:22, 'AA1' =
c('AA4', 'AA3', 'AA4', 'AA1', 'AA2', 'AA1', 'AA6', 'AA6', 'AA3',
'AA3', 'AA4', 'AA3'), 'Value' = NA)

TargetValue = data.frame('AA' = c('AA1', 'AA2', 'AA3', 'AA4', 'AA5',
'AA6'), 'BB' = c('B', 'B', 'B', 'B', 'CC', 'CC'), 'Value' = c(5, 10,
25, 7, 35, 21))

OriginalData
TargetValue

Now I need to replace OriginalData's 'AA1' column with TargetValue's
'BB' column, based on matched values between 'AA1' & 'AA' columns of
OriginalData & TargetValue respectively. With this same law, I need to
update 'Value' column of  OriginalData with that of TargetValue.

As an example, after replacement by above rule, 1st row of
OriginalData should look like :

> OriginalData

   Value1 Value2 AA1 Value

1       1     11 B    7

Values of TargetValue's 'AA' column are unique i.e. no duplication

Previously I have implemented a 'for' loop to implement above, however
since both of my data.frames are quite big, it is taking long time to
execute. Is there any 'R' way to implement this quickly.

Appreciate for any pointer.

Thanks,

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Matching values between 2 data.frame.

David Winsemius

> On May 20, 2017, at 11:23 AM, Christofer Bogaso <[hidden email]> wrote:
>
> Hi again,
>
> Let say I have below 2 data frames.
>
> OriginalData = data.frame('Value1' = 1:12, 'Value2' = 11:22, 'AA1' =
> c('AA4', 'AA3', 'AA4', 'AA1', 'AA2', 'AA1', 'AA6', 'AA6', 'AA3',
> 'AA3', 'AA4', 'AA3'), 'Value' = NA)
>
> TargetValue = data.frame('AA' = c('AA1', 'AA2', 'AA3', 'AA4', 'AA5',
> 'AA6'), 'BB' = c('B', 'B', 'B', 'B', 'CC', 'CC'), 'Value' = c(5, 10,
> 25, 7, 35, 21))
>
> OriginalData
> TargetValue
>
> Now I need to replace OriginalData's 'AA1' column with TargetValue's
> 'BB' column, based on matched values between 'AA1' & 'AA' columns of
> OriginalData & TargetValue respectively. With this same law, I need to
> update 'Value' column of  OriginalData with that of TargetValue.
>
> As an example, after replacement by above rule, 1st row of
> OriginalData should look like :
>
>> OriginalData
>
>   Value1 Value2 AA1 Value
>
> 1       1     11 B    7
>
> Values of TargetValue's 'AA' column are unique i.e. no duplication
>
> Previously I have implemented a 'for' loop to implement above, however
> since both of my data.frames are quite big, it is taking long time to
> execute. Is there any 'R' way to implement this quickly.
>
> Appreciate for any pointer.

It's going to have a greater chance of delivering the desired result if you convert the factor columns into character.

>
> Thanks,
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Matching values between 2 data.frame.

Bert Gunter-2
Like this?  (use indexing to avoid explicit loops whenever possible):

## first convert factor columns to character, as David W. suggested
i <- sapply(od,is.factor)
od[i]<- lapply(od[i],as.character)
i <- sapply(tv, is.factor)
tv[i]<- lapply(tv[i],as.character)


## Now use ?match
wh  <-   match(od[,"AA1"], tv[,"AA"])
matched <- !is.na(wh)  ## only needed if not all AA1's match in AA
od[matched,c("AA1", "Value")] <- tv[wh[matched],c("BB","Value")]

> od
   Value1 Value2 AA1 Value
1       1     11   B     7
2       2     12   B    25
3       3     13   B     7
4       4     14   B     5
5       5     15   B    10
6       6     16   B     5
7       7     17  CC    21
8       8     18  CC    21
9       9     19   B    25
10     10     20   B    25
11     11     21   B     7
12     12     22   B    25


Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sat, May 20, 2017 at 11:53 AM, David Winsemius
<[hidden email]> wrote:

>
>> On May 20, 2017, at 11:23 AM, Christofer Bogaso <[hidden email]> wrote:
>>
>> Hi again,
>>
>> Let say I have below 2 data frames.
>>
>> OriginalData = data.frame('Value1' = 1:12, 'Value2' = 11:22, 'AA1' =
>> c('AA4', 'AA3', 'AA4', 'AA1', 'AA2', 'AA1', 'AA6', 'AA6', 'AA3',
>> 'AA3', 'AA4', 'AA3'), 'Value' = NA)
>>
>> TargetValue = data.frame('AA' = c('AA1', 'AA2', 'AA3', 'AA4', 'AA5',
>> 'AA6'), 'BB' = c('B', 'B', 'B', 'B', 'CC', 'CC'), 'Value' = c(5, 10,
>> 25, 7, 35, 21))
>>
>> OriginalData
>> TargetValue
>>
>> Now I need to replace OriginalData's 'AA1' column with TargetValue's
>> 'BB' column, based on matched values between 'AA1' & 'AA' columns of
>> OriginalData & TargetValue respectively. With this same law, I need to
>> update 'Value' column of  OriginalData with that of TargetValue.
>>
>> As an example, after replacement by above rule, 1st row of
>> OriginalData should look like :
>>
>>> OriginalData
>>
>>   Value1 Value2 AA1 Value
>>
>> 1       1     11 B    7
>>
>> Values of TargetValue's 'AA' column are unique i.e. no duplication
>>
>> Previously I have implemented a 'for' loop to implement above, however
>> since both of my data.frames are quite big, it is taking long time to
>> execute. Is there any 'R' way to implement this quickly.
>>
>> Appreciate for any pointer.
>
> It's going to have a greater chance of delivering the desired result if you convert the factor columns into character.
>
>>
>> Thanks,
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Matching values between 2 data.frame.

R help mailing list-2
merge() may be useful here:

> merge(OriginalData[1:3], TargetValue, by.x="AA1", by.y="AA",
sort=FALSE)[-1]
   Value1 Value2 BB Value
1       1     11  B     7
2       3     13  B     7
3      11     21  B     7
4       2     12  B    25
5      12     22  B    25
6       9     19  B    25
7      10     20  B    25
8       4     14  B     5
9       6     16  B     5
10      5     15  B    10
11      7     17 CC    21
12      8     18 CC    21

Rename the columns as desired.


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sat, May 20, 2017 at 1:13 PM, Bert Gunter <[hidden email]> wrote:

> Like this?  (use indexing to avoid explicit loops whenever possible):
>
> ## first convert factor columns to character, as David W. suggested
> i <- sapply(od,is.factor)
> od[i]<- lapply(od[i],as.character)
> i <- sapply(tv, is.factor)
> tv[i]<- lapply(tv[i],as.character)
>
>
> ## Now use ?match
> wh  <-   match(od[,"AA1"], tv[,"AA"])
> matched <- !is.na(wh)  ## only needed if not all AA1's match in AA
> od[matched,c("AA1", "Value")] <- tv[wh[matched],c("BB","Value")]
>
> > od
>    Value1 Value2 AA1 Value
> 1       1     11   B     7
> 2       2     12   B    25
> 3       3     13   B     7
> 4       4     14   B     5
> 5       5     15   B    10
> 6       6     16   B     5
> 7       7     17  CC    21
> 8       8     18  CC    21
> 9       9     19   B    25
> 10     10     20   B    25
> 11     11     21   B     7
> 12     12     22   B    25
>
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Sat, May 20, 2017 at 11:53 AM, David Winsemius
> <[hidden email]> wrote:
> >
> >> On May 20, 2017, at 11:23 AM, Christofer Bogaso <
> [hidden email]> wrote:
> >>
> >> Hi again,
> >>
> >> Let say I have below 2 data frames.
> >>
> >> OriginalData = data.frame('Value1' = 1:12, 'Value2' = 11:22, 'AA1' =
> >> c('AA4', 'AA3', 'AA4', 'AA1', 'AA2', 'AA1', 'AA6', 'AA6', 'AA3',
> >> 'AA3', 'AA4', 'AA3'), 'Value' = NA)
> >>
> >> TargetValue = data.frame('AA' = c('AA1', 'AA2', 'AA3', 'AA4', 'AA5',
> >> 'AA6'), 'BB' = c('B', 'B', 'B', 'B', 'CC', 'CC'), 'Value' = c(5, 10,
> >> 25, 7, 35, 21))
> >>
> >> OriginalData
> >> TargetValue
> >>
> >> Now I need to replace OriginalData's 'AA1' column with TargetValue's
> >> 'BB' column, based on matched values between 'AA1' & 'AA' columns of
> >> OriginalData & TargetValue respectively. With this same law, I need to
> >> update 'Value' column of  OriginalData with that of TargetValue.
> >>
> >> As an example, after replacement by above rule, 1st row of
> >> OriginalData should look like :
> >>
> >>> OriginalData
> >>
> >>   Value1 Value2 AA1 Value
> >>
> >> 1       1     11 B    7
> >>
> >> Values of TargetValue's 'AA' column are unique i.e. no duplication
> >>
> >> Previously I have implemented a 'for' loop to implement above, however
> >> since both of my data.frames are quite big, it is taking long time to
> >> execute. Is there any 'R' way to implement this quickly.
> >>
> >> Appreciate for any pointer.
> >
> > It's going to have a greater chance of delivering the desired result if
> you convert the factor columns into character.
> >
> >>
> >> Thanks,
> >>
> >> ______________________________________________
> >> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> > David Winsemius
> > Alameda, CA, USA
> >
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.