help with merging two dataframes function of "egrep"-like formulas

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

help with merging two dataframes function of "egrep"-like formulas

Bogdan Tanasa
Dear all,

please may I ask for a piece of advise regarding merging two dataframes :

A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))

B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))

function of the criteria :

if "the elements in the 1st column of A could be found among the elements
of the 1st column of B" i.e.

for the example above, we shall combine in the results only the row with
"a*b" of A with the row with "a*b::x*y" of B.

thank you,

bogdan

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: help with merging two dataframes function of "egrep"-like formulas

Bogdan Tanasa
Dear Riley,

thank you very much for your help and solution. I got some inspiration from
stackoverflow website,

and I did use sqldf library. It looks that the formula below works too.
Thanks a lot !

sqldf("select B.*, A.* from B left join A on instr(B.z,  A.z)")


On Wed, Jul 18, 2018 at 3:57 PM, Riley Finn <[hidden email]> wrote:

> please may I ask for a piece of advise regarding merging two dataframes :
>> A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
>> B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
>> function of the criteria :
>> if "the elements in the 1st column of A could be found among the elements
>> of the 1st column of B" i.e.
>> for the example above, we shall combine in the results only the row with
>> "a*b" of A with the row with "a*b::x*y" of B.
>
>
> This may be what you are looking for:
>
> library(fuzzyjoin)
>
> The inner join returns just the one row where the string matches.
> B %>%
>   regex_inner_join(A, by = c(z = 'z'))
>
> While the full join returns NA's where the string does not match.
> B %>%
>   regex_full_join(A, by = c(z = 'z'))
>
> On Wed, Jul 18, 2018 at 5:20 PM Bogdan Tanasa <[hidden email]> wrote:
>
>> Dear all,
>>
>> please may I ask for a piece of advise regarding merging two dataframes :
>>
>> A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
>>
>> B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
>>
>> function of the criteria :
>>
>> if "the elements in the 1st column of A could be found among the elements
>> of the 1st column of B" i.e.
>>
>> for the example above, we shall combine in the results only the row with
>> "a*b" of A with the row with "a*b::x*y" of B.
>>
>> thank you,
>>
>> bogdan
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/
>> posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: help with merging two dataframes function of "egrep"-like formulas

Bogdan Tanasa
In reply to this post by Bogdan Tanasa
Thanks a lot ! It looks that I am getting the same results with :

B %>% regex_left_join(A, by = c(z = 'z'))

On Wed, Jul 18, 2018 at 3:57 PM, Riley Finn <[hidden email]> wrote:

> please may I ask for a piece of advise regarding merging two dataframes :
>> A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
>> B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
>> function of the criteria :
>> if "the elements in the 1st column of A could be found among the elements
>> of the 1st column of B" i.e.
>> for the example above, we shall combine in the results only the row with
>> "a*b" of A with the row with "a*b::x*y" of B.
>
>
> This may be what you are looking for:
>
> library(fuzzyjoin)
>
> The inner join returns just the one row where the string matches.
> B %>%
>   regex_inner_join(A, by = c(z = 'z'))
>
> While the full join returns NA's where the string does not match.
> B %>%
>   regex_full_join(A, by = c(z = 'z'))
>
> On Wed, Jul 18, 2018 at 5:20 PM Bogdan Tanasa <[hidden email]> wrote:
>
>> Dear all,
>>
>> please may I ask for a piece of advise regarding merging two dataframes :
>>
>> A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
>>
>> B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
>>
>> function of the criteria :
>>
>> if "the elements in the 1st column of A could be found among the elements
>> of the 1st column of B" i.e.
>>
>> for the example above, we shall combine in the results only the row with
>> "a*b" of A with the row with "a*b::x*y" of B.
>>
>> thank you,
>>
>> bogdan
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/
>> posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: help with merging two dataframes function of "egrep"-like formulas

Jeff Newmiller
The traditional (SQL) way to attack this problem is to make the data
structure simpler so that faster comparisons can be utilized:

################
A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))

library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(tidyr)
Bx <- (   B
       %>% mutate( z_B = as.character( z ) )
       %>% rename( t_B = t )
       %>% separate_rows( z, sep="::" )
       )
Bx
#>     z t_B      z_B
#> 1 a*b   1 a*b::x*y
#> 2 x*y   1 a*b::x*y
#> 3   c   2        c
#> 4       3
#> 5 g*h   4      g*h
result <- (   A
           %>% mutate( z = as.character( z ) )
           %>% rename( t_A = t )
           %>% inner_join( Bx, by="z" )
           )
result
#>     z t_A t_B      z_B
#> 1 a*b   1   1 a*b::x*y

#' Created on 2018-07-18 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).
################

Note that this is preferable if you can avoid ever creating the
complex data z in B, but Bx is much more flexible and less error prone
than B. (Especially if you don't have to create B$z_B at all, but have
some other unique identifier(s) for the groupings represented by each row
in B.)

On Wed, 18 Jul 2018, Bogdan Tanasa wrote:

> Thanks a lot ! It looks that I am getting the same results with :
>
> B %>% regex_left_join(A, by = c(z = 'z'))
>
> On Wed, Jul 18, 2018 at 3:57 PM, Riley Finn <[hidden email]> wrote:
>
>> please may I ask for a piece of advise regarding merging two dataframes :
>>> A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
>>> B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
>>> function of the criteria :
>>> if "the elements in the 1st column of A could be found among the elements
>>> of the 1st column of B" i.e.
>>> for the example above, we shall combine in the results only the row with
>>> "a*b" of A with the row with "a*b::x*y" of B.
>>
>>
>> This may be what you are looking for:
>>
>> library(fuzzyjoin)
>>
>> The inner join returns just the one row where the string matches.
>> B %>%
>>   regex_inner_join(A, by = c(z = 'z'))
>>
>> While the full join returns NA's where the string does not match.
>> B %>%
>>   regex_full_join(A, by = c(z = 'z'))
>>
>> On Wed, Jul 18, 2018 at 5:20 PM Bogdan Tanasa <[hidden email]> wrote:
>>
>>> Dear all,
>>>
>>> please may I ask for a piece of advise regarding merging two dataframes :
>>>
>>> A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
>>>
>>> B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
>>>
>>> function of the criteria :
>>>
>>> if "the elements in the 1st column of A could be found among the elements
>>> of the 1st column of B" i.e.
>>>
>>> for the example above, we shall combine in the results only the row with
>>> "a*b" of A with the row with "a*b::x*y" of B.
>>>
>>> thank you,
>>>
>>> bogdan
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/
>>> posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: help with merging two dataframes function of "egrep"-like formulas

Bogdan Tanasa
it looks great, thank you very much Jeff for your time and kind help !

On Wed, Jul 18, 2018 at 7:51 PM, Jeff Newmiller <[hidden email]>
wrote:

> The traditional (SQL) way to attack this problem is to make the data
> structure simpler so that faster comparisons can be utilized:
>
> ################
> A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
> B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
>
> library(dplyr)
> #>
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #>
> #>     filter, lag
> #> The following objects are masked from 'package:base':
> #>
> #>     intersect, setdiff, setequal, union
> library(tidyr)
> Bx <- (   B
>       %>% mutate( z_B = as.character( z ) )
>       %>% rename( t_B = t )
>       %>% separate_rows( z, sep="::" )
>       )
> Bx
> #>     z t_B      z_B
> #> 1 a*b   1 a*b::x*y
> #> 2 x*y   1 a*b::x*y
> #> 3   c   2        c
> #> 4       3
> #> 5 g*h   4      g*h
> result <- (   A
>           %>% mutate( z = as.character( z ) )
>           %>% rename( t_A = t )
>           %>% inner_join( Bx, by="z" )
>           )
> result
> #>     z t_A t_B      z_B
> #> 1 a*b   1   1 a*b::x*y
>
> #' Created on 2018-07-18 by the [reprex package](http://reprex.tidyver
> se.org) (v0.2.0).
> ################
>
> Note that this is preferable if you can avoid ever creating the complex
> data z in B, but Bx is much more flexible and less error prone than B.
> (Especially if you don't have to create B$z_B at all, but have some other
> unique identifier(s) for the groupings represented by each row in B.)
>
>
> On Wed, 18 Jul 2018, Bogdan Tanasa wrote:
>
> Thanks a lot ! It looks that I am getting the same results with :
>>
>> B %>% regex_left_join(A, by = c(z = 'z'))
>>
>> On Wed, Jul 18, 2018 at 3:57 PM, Riley Finn <[hidden email]> wrote:
>>
>> please may I ask for a piece of advise regarding merging two dataframes :
>>>
>>>> A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
>>>> B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
>>>> function of the criteria :
>>>> if "the elements in the 1st column of A could be found among the
>>>> elements
>>>> of the 1st column of B" i.e.
>>>> for the example above, we shall combine in the results only the row with
>>>> "a*b" of A with the row with "a*b::x*y" of B.
>>>>
>>>
>>>
>>> This may be what you are looking for:
>>>
>>> library(fuzzyjoin)
>>>
>>> The inner join returns just the one row where the string matches.
>>> B %>%
>>>   regex_inner_join(A, by = c(z = 'z'))
>>>
>>> While the full join returns NA's where the string does not match.
>>> B %>%
>>>   regex_full_join(A, by = c(z = 'z'))
>>>
>>> On Wed, Jul 18, 2018 at 5:20 PM Bogdan Tanasa <[hidden email]> wrote:
>>>
>>> Dear all,
>>>>
>>>> please may I ask for a piece of advise regarding merging two dataframes
>>>> :
>>>>
>>>> A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
>>>>
>>>> B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
>>>>
>>>> function of the criteria :
>>>>
>>>> if "the elements in the 1st column of A could be found among the
>>>> elements
>>>> of the 1st column of B" i.e.
>>>>
>>>> for the example above, we shall combine in the results only the row with
>>>> "a*b" of A with the row with "a*b::x*y" of B.
>>>>
>>>> thank you,
>>>>
>>>> bogdan
>>>>
>>>>         [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/
>>>> posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posti
>> ng-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
> ------------------------------------------------------------
> ---------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live
> Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ------------------------------------------------------------
> ---------------
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.