

Dear all,
please may I ask for a piece of advise regarding merging two dataframes :
A < data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
B < data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
function of the criteria :
if "the elements in the 1st column of A could be found among the elements
of the 1st column of B" i.e.
for the example above, we shall combine in the results only the row with
"a*b" of A with the row with "a*b::x*y" of B.
thank you,
bogdan
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Dear Riley,
thank you very much for your help and solution. I got some inspiration from
stackoverflow website,
and I did use sqldf library. It looks that the formula below works too.
Thanks a lot !
sqldf("select B.*, A.* from B left join A on instr(B.z, A.z)")
On Wed, Jul 18, 2018 at 3:57 PM, Riley Finn < [hidden email]> wrote:
> please may I ask for a piece of advise regarding merging two dataframes :
>> A < data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
>> B < data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
>> function of the criteria :
>> if "the elements in the 1st column of A could be found among the elements
>> of the 1st column of B" i.e.
>> for the example above, we shall combine in the results only the row with
>> "a*b" of A with the row with "a*b::x*y" of B.
>
>
> This may be what you are looking for:
>
> library(fuzzyjoin)
>
> The inner join returns just the one row where the string matches.
> B %>%
> regex_inner_join(A, by = c(z = 'z'))
>
> While the full join returns NA's where the string does not match.
> B %>%
> regex_full_join(A, by = c(z = 'z'))
>
> On Wed, Jul 18, 2018 at 5:20 PM Bogdan Tanasa < [hidden email]> wrote:
>
>> Dear all,
>>
>> please may I ask for a piece of advise regarding merging two dataframes :
>>
>> A < data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
>>
>> B < data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
>>
>> function of the criteria :
>>
>> if "the elements in the 1st column of A could be found among the elements
>> of the 1st column of B" i.e.
>>
>> for the example above, we shall combine in the results only the row with
>> "a*b" of A with the row with "a*b::x*y" of B.
>>
>> thank you,
>>
>> bogdan
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list  To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/rhelp>> PLEASE do read the posting guide http://www.Rproject.org/>> postingguide.html
>> and provide commented, minimal, selfcontained, reproducible code.
>>
>
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Thanks a lot ! It looks that I am getting the same results with :
B %>% regex_left_join(A, by = c(z = 'z'))
On Wed, Jul 18, 2018 at 3:57 PM, Riley Finn < [hidden email]> wrote:
> please may I ask for a piece of advise regarding merging two dataframes :
>> A < data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
>> B < data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
>> function of the criteria :
>> if "the elements in the 1st column of A could be found among the elements
>> of the 1st column of B" i.e.
>> for the example above, we shall combine in the results only the row with
>> "a*b" of A with the row with "a*b::x*y" of B.
>
>
> This may be what you are looking for:
>
> library(fuzzyjoin)
>
> The inner join returns just the one row where the string matches.
> B %>%
> regex_inner_join(A, by = c(z = 'z'))
>
> While the full join returns NA's where the string does not match.
> B %>%
> regex_full_join(A, by = c(z = 'z'))
>
> On Wed, Jul 18, 2018 at 5:20 PM Bogdan Tanasa < [hidden email]> wrote:
>
>> Dear all,
>>
>> please may I ask for a piece of advise regarding merging two dataframes :
>>
>> A < data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
>>
>> B < data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
>>
>> function of the criteria :
>>
>> if "the elements in the 1st column of A could be found among the elements
>> of the 1st column of B" i.e.
>>
>> for the example above, we shall combine in the results only the row with
>> "a*b" of A with the row with "a*b::x*y" of B.
>>
>> thank you,
>>
>> bogdan
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list  To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/rhelp>> PLEASE do read the posting guide http://www.Rproject.org/>> postingguide.html
>> and provide commented, minimal, selfcontained, reproducible code.
>>
>
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


The traditional (SQL) way to attack this problem is to make the data
structure simpler so that faster comparisons can be utilized:
################
A < data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
B < data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
Bx < ( B
%>% mutate( z_B = as.character( z ) )
%>% rename( t_B = t )
%>% separate_rows( z, sep="::" )
)
Bx
#> z t_B z_B
#> 1 a*b 1 a*b::x*y
#> 2 x*y 1 a*b::x*y
#> 3 c 2 c
#> 4 3
#> 5 g*h 4 g*h
result < ( A
%>% mutate( z = as.character( z ) )
%>% rename( t_A = t )
%>% inner_join( Bx, by="z" )
)
result
#> z t_A t_B z_B
#> 1 a*b 1 1 a*b::x*y
#' Created on 20180718 by the [reprex package]( http://reprex.tidyverse.org) (v0.2.0).
################
Note that this is preferable if you can avoid ever creating the
complex data z in B, but Bx is much more flexible and less error prone
than B. (Especially if you don't have to create B$z_B at all, but have
some other unique identifier(s) for the groupings represented by each row
in B.)
On Wed, 18 Jul 2018, Bogdan Tanasa wrote:
> Thanks a lot ! It looks that I am getting the same results with :
>
> B %>% regex_left_join(A, by = c(z = 'z'))
>
> On Wed, Jul 18, 2018 at 3:57 PM, Riley Finn < [hidden email]> wrote:
>
>> please may I ask for a piece of advise regarding merging two dataframes :
>>> A < data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
>>> B < data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
>>> function of the criteria :
>>> if "the elements in the 1st column of A could be found among the elements
>>> of the 1st column of B" i.e.
>>> for the example above, we shall combine in the results only the row with
>>> "a*b" of A with the row with "a*b::x*y" of B.
>>
>>
>> This may be what you are looking for:
>>
>> library(fuzzyjoin)
>>
>> The inner join returns just the one row where the string matches.
>> B %>%
>> regex_inner_join(A, by = c(z = 'z'))
>>
>> While the full join returns NA's where the string does not match.
>> B %>%
>> regex_full_join(A, by = c(z = 'z'))
>>
>> On Wed, Jul 18, 2018 at 5:20 PM Bogdan Tanasa < [hidden email]> wrote:
>>
>>> Dear all,
>>>
>>> please may I ask for a piece of advise regarding merging two dataframes :
>>>
>>> A < data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
>>>
>>> B < data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
>>>
>>> function of the criteria :
>>>
>>> if "the elements in the 1st column of A could be found among the elements
>>> of the 1st column of B" i.e.
>>>
>>> for the example above, we shall combine in the results only the row with
>>> "a*b" of A with the row with "a*b::x*y" of B.
>>>
>>> thank you,
>>>
>>> bogdan
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> [hidden email] mailing list  To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/rhelp>>> PLEASE do read the posting guide http://www.Rproject.org/>>> postingguide.html
>>> and provide commented, minimal, selfcontained, reproducible code.
>>>
>>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list  To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.
>

Jeff Newmiller The ..... ..... Go Live...
DCN:< [hidden email]> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


it looks great, thank you very much Jeff for your time and kind help !
On Wed, Jul 18, 2018 at 7:51 PM, Jeff Newmiller < [hidden email]>
wrote:
> The traditional (SQL) way to attack this problem is to make the data
> structure simpler so that faster comparisons can be utilized:
>
> ################
> A < data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
> B < data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
>
> library(dplyr)
> #>
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #>
> #> filter, lag
> #> The following objects are masked from 'package:base':
> #>
> #> intersect, setdiff, setequal, union
> library(tidyr)
> Bx < ( B
> %>% mutate( z_B = as.character( z ) )
> %>% rename( t_B = t )
> %>% separate_rows( z, sep="::" )
> )
> Bx
> #> z t_B z_B
> #> 1 a*b 1 a*b::x*y
> #> 2 x*y 1 a*b::x*y
> #> 3 c 2 c
> #> 4 3
> #> 5 g*h 4 g*h
> result < ( A
> %>% mutate( z = as.character( z ) )
> %>% rename( t_A = t )
> %>% inner_join( Bx, by="z" )
> )
> result
> #> z t_A t_B z_B
> #> 1 a*b 1 1 a*b::x*y
>
> #' Created on 20180718 by the [reprex package]( http://reprex.tidyver> se.org) (v0.2.0).
> ################
>
> Note that this is preferable if you can avoid ever creating the complex
> data z in B, but Bx is much more flexible and less error prone than B.
> (Especially if you don't have to create B$z_B at all, but have some other
> unique identifier(s) for the groupings represented by each row in B.)
>
>
> On Wed, 18 Jul 2018, Bogdan Tanasa wrote:
>
> Thanks a lot ! It looks that I am getting the same results with :
>>
>> B %>% regex_left_join(A, by = c(z = 'z'))
>>
>> On Wed, Jul 18, 2018 at 3:57 PM, Riley Finn < [hidden email]> wrote:
>>
>> please may I ask for a piece of advise regarding merging two dataframes :
>>>
>>>> A < data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
>>>> B < data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
>>>> function of the criteria :
>>>> if "the elements in the 1st column of A could be found among the
>>>> elements
>>>> of the 1st column of B" i.e.
>>>> for the example above, we shall combine in the results only the row with
>>>> "a*b" of A with the row with "a*b::x*y" of B.
>>>>
>>>
>>>
>>> This may be what you are looking for:
>>>
>>> library(fuzzyjoin)
>>>
>>> The inner join returns just the one row where the string matches.
>>> B %>%
>>> regex_inner_join(A, by = c(z = 'z'))
>>>
>>> While the full join returns NA's where the string does not match.
>>> B %>%
>>> regex_full_join(A, by = c(z = 'z'))
>>>
>>> On Wed, Jul 18, 2018 at 5:20 PM Bogdan Tanasa < [hidden email]> wrote:
>>>
>>> Dear all,
>>>>
>>>> please may I ask for a piece of advise regarding merging two dataframes
>>>> :
>>>>
>>>> A < data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
>>>>
>>>> B < data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
>>>>
>>>> function of the criteria :
>>>>
>>>> if "the elements in the 1st column of A could be found among the
>>>> elements
>>>> of the 1st column of B" i.e.
>>>>
>>>> for the example above, we shall combine in the results only the row with
>>>> "a*b" of A with the row with "a*b::x*y" of B.
>>>>
>>>> thank you,
>>>>
>>>> bogdan
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> [hidden email] mailing list  To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/rhelp>>>> PLEASE do read the posting guide http://www.Rproject.org/>>>> postingguide.html
>>>> and provide commented, minimal, selfcontained, reproducible code.
>>>>
>>>>
>>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list  To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/rhelp>> PLEASE do read the posting guide http://www.Rproject.org/posti>> ngguide.html
>> and provide commented, minimal, selfcontained, reproducible code.
>>
>>
> 
> 
> Jeff Newmiller The ..... ..... Go Live...
> DCN:< [hidden email]> Basics: ##.#. ##.#. Live
> Go...
> Live: OO#.. Dead: OO#.. Playing
> Research Engineer (Solar/Batteries O.O#. #.O#. with
> /Software/Embedded Controllers) .OO#. .OO#. rocks...1k
> 
> 
>
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.

