Dear all,
please may I ask for a piece of advise regarding merging two dataframes : A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4)) B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4)) function of the criteria : if "the elements in the 1st column of A could be found among the elements of the 1st column of B" i.e. for the example above, we shall combine in the results only the row with "a*b" of A with the row with "a*b::x*y" of B. thank you, bogdan [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
Dear Riley,
thank you very much for your help and solution. I got some inspiration from stackoverflow website, and I did use sqldf library. It looks that the formula below works too. Thanks a lot ! sqldf("select B.*, A.* from B left join A on instr(B.z, A.z)") On Wed, Jul 18, 2018 at 3:57 PM, Riley Finn <[hidden email]> wrote: > please may I ask for a piece of advise regarding merging two dataframes : >> A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4)) >> B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4)) >> function of the criteria : >> if "the elements in the 1st column of A could be found among the elements >> of the 1st column of B" i.e. >> for the example above, we shall combine in the results only the row with >> "a*b" of A with the row with "a*b::x*y" of B. > > > This may be what you are looking for: > > library(fuzzyjoin) > > The inner join returns just the one row where the string matches. > B %>% > regex_inner_join(A, by = c(z = 'z')) > > While the full join returns NA's where the string does not match. > B %>% > regex_full_join(A, by = c(z = 'z')) > > On Wed, Jul 18, 2018 at 5:20 PM Bogdan Tanasa <[hidden email]> wrote: > >> Dear all, >> >> please may I ask for a piece of advise regarding merging two dataframes : >> >> A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4)) >> >> B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4)) >> >> function of the criteria : >> >> if "the elements in the 1st column of A could be found among the elements >> of the 1st column of B" i.e. >> >> for the example above, we shall combine in the results only the row with >> "a*b" of A with the row with "a*b::x*y" of B. >> >> thank you, >> >> bogdan >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> [hidden email] mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/ >> posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
In reply to this post by Bogdan Tanasa
Thanks a lot ! It looks that I am getting the same results with :
B %>% regex_left_join(A, by = c(z = 'z')) On Wed, Jul 18, 2018 at 3:57 PM, Riley Finn <[hidden email]> wrote: > please may I ask for a piece of advise regarding merging two dataframes : >> A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4)) >> B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4)) >> function of the criteria : >> if "the elements in the 1st column of A could be found among the elements >> of the 1st column of B" i.e. >> for the example above, we shall combine in the results only the row with >> "a*b" of A with the row with "a*b::x*y" of B. > > > This may be what you are looking for: > > library(fuzzyjoin) > > The inner join returns just the one row where the string matches. > B %>% > regex_inner_join(A, by = c(z = 'z')) > > While the full join returns NA's where the string does not match. > B %>% > regex_full_join(A, by = c(z = 'z')) > > On Wed, Jul 18, 2018 at 5:20 PM Bogdan Tanasa <[hidden email]> wrote: > >> Dear all, >> >> please may I ask for a piece of advise regarding merging two dataframes : >> >> A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4)) >> >> B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4)) >> >> function of the criteria : >> >> if "the elements in the 1st column of A could be found among the elements >> of the 1st column of B" i.e. >> >> for the example above, we shall combine in the results only the row with >> "a*b" of A with the row with "a*b::x*y" of B. >> >> thank you, >> >> bogdan >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> [hidden email] mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/ >> posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
The traditional (SQL) way to attack this problem is to make the data
structure simpler so that faster comparisons can be utilized: ################ A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4)) B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4)) library(dplyr) #> #> Attaching package: 'dplyr' #> The following objects are masked from 'package:stats': #> #> filter, lag #> The following objects are masked from 'package:base': #> #> intersect, setdiff, setequal, union library(tidyr) Bx <- ( B %>% mutate( z_B = as.character( z ) ) %>% rename( t_B = t ) %>% separate_rows( z, sep="::" ) ) Bx #> z t_B z_B #> 1 a*b 1 a*b::x*y #> 2 x*y 1 a*b::x*y #> 3 c 2 c #> 4 3 #> 5 g*h 4 g*h result <- ( A %>% mutate( z = as.character( z ) ) %>% rename( t_A = t ) %>% inner_join( Bx, by="z" ) ) result #> z t_A t_B z_B #> 1 a*b 1 1 a*b::x*y #' Created on 2018-07-18 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0). ################ Note that this is preferable if you can avoid ever creating the complex data z in B, but Bx is much more flexible and less error prone than B. (Especially if you don't have to create B$z_B at all, but have some other unique identifier(s) for the groupings represented by each row in B.) On Wed, 18 Jul 2018, Bogdan Tanasa wrote: > Thanks a lot ! It looks that I am getting the same results with : > > B %>% regex_left_join(A, by = c(z = 'z')) > > On Wed, Jul 18, 2018 at 3:57 PM, Riley Finn <[hidden email]> wrote: > >> please may I ask for a piece of advise regarding merging two dataframes : >>> A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4)) >>> B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4)) >>> function of the criteria : >>> if "the elements in the 1st column of A could be found among the elements >>> of the 1st column of B" i.e. >>> for the example above, we shall combine in the results only the row with >>> "a*b" of A with the row with "a*b::x*y" of B. >> >> >> This may be what you are looking for: >> >> library(fuzzyjoin) >> >> The inner join returns just the one row where the string matches. >> B %>% >> regex_inner_join(A, by = c(z = 'z')) >> >> While the full join returns NA's where the string does not match. >> B %>% >> regex_full_join(A, by = c(z = 'z')) >> >> On Wed, Jul 18, 2018 at 5:20 PM Bogdan Tanasa <[hidden email]> wrote: >> >>> Dear all, >>> >>> please may I ask for a piece of advise regarding merging two dataframes : >>> >>> A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4)) >>> >>> B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4)) >>> >>> function of the criteria : >>> >>> if "the elements in the 1st column of A could be found among the elements >>> of the 1st column of B" i.e. >>> >>> for the example above, we shall combine in the results only the row with >>> "a*b" of A with the row with "a*b::x*y" of B. >>> >>> thank you, >>> >>> bogdan >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________________________ >>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/ >>> posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> > > [[alternative HTML version deleted]] > > ______________________________________________ > [hidden email] mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > --------------------------------------------------------------------------- Jeff Newmiller The ..... ..... Go Live... DCN:<[hidden email]> Basics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/Batteries O.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
it looks great, thank you very much Jeff for your time and kind help !
On Wed, Jul 18, 2018 at 7:51 PM, Jeff Newmiller <[hidden email]> wrote: > The traditional (SQL) way to attack this problem is to make the data > structure simpler so that faster comparisons can be utilized: > > ################ > A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4)) > B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4)) > > library(dplyr) > #> > #> Attaching package: 'dplyr' > #> The following objects are masked from 'package:stats': > #> > #> filter, lag > #> The following objects are masked from 'package:base': > #> > #> intersect, setdiff, setequal, union > library(tidyr) > Bx <- ( B > %>% mutate( z_B = as.character( z ) ) > %>% rename( t_B = t ) > %>% separate_rows( z, sep="::" ) > ) > Bx > #> z t_B z_B > #> 1 a*b 1 a*b::x*y > #> 2 x*y 1 a*b::x*y > #> 3 c 2 c > #> 4 3 > #> 5 g*h 4 g*h > result <- ( A > %>% mutate( z = as.character( z ) ) > %>% rename( t_A = t ) > %>% inner_join( Bx, by="z" ) > ) > result > #> z t_A t_B z_B > #> 1 a*b 1 1 a*b::x*y > > #' Created on 2018-07-18 by the [reprex package](http://reprex.tidyver > se.org) (v0.2.0). > ################ > > Note that this is preferable if you can avoid ever creating the complex > data z in B, but Bx is much more flexible and less error prone than B. > (Especially if you don't have to create B$z_B at all, but have some other > unique identifier(s) for the groupings represented by each row in B.) > > > On Wed, 18 Jul 2018, Bogdan Tanasa wrote: > > Thanks a lot ! It looks that I am getting the same results with : >> >> B %>% regex_left_join(A, by = c(z = 'z')) >> >> On Wed, Jul 18, 2018 at 3:57 PM, Riley Finn <[hidden email]> wrote: >> >> please may I ask for a piece of advise regarding merging two dataframes : >>> >>>> A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4)) >>>> B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4)) >>>> function of the criteria : >>>> if "the elements in the 1st column of A could be found among the >>>> elements >>>> of the 1st column of B" i.e. >>>> for the example above, we shall combine in the results only the row with >>>> "a*b" of A with the row with "a*b::x*y" of B. >>>> >>> >>> >>> This may be what you are looking for: >>> >>> library(fuzzyjoin) >>> >>> The inner join returns just the one row where the string matches. >>> B %>% >>> regex_inner_join(A, by = c(z = 'z')) >>> >>> While the full join returns NA's where the string does not match. >>> B %>% >>> regex_full_join(A, by = c(z = 'z')) >>> >>> On Wed, Jul 18, 2018 at 5:20 PM Bogdan Tanasa <[hidden email]> wrote: >>> >>> Dear all, >>>> >>>> please may I ask for a piece of advise regarding merging two dataframes >>>> : >>>> >>>> A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4)) >>>> >>>> B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4)) >>>> >>>> function of the criteria : >>>> >>>> if "the elements in the 1st column of A could be found among the >>>> elements >>>> of the 1st column of B" i.e. >>>> >>>> for the example above, we shall combine in the results only the row with >>>> "a*b" of A with the row with "a*b::x*y" of B. >>>> >>>> thank you, >>>> >>>> bogdan >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________________________ >>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide http://www.R-project.org/ >>>> posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>>> >>> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> [hidden email] mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posti >> ng-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> >> > ------------------------------------------------------------ > --------------- > Jeff Newmiller The ..... ..... Go Live... > DCN:<[hidden email]> Basics: ##.#. ##.#. Live > Go... > Live: OO#.. Dead: OO#.. Playing > Research Engineer (Solar/Batteries O.O#. #.O#. with > /Software/Embedded Controllers) .OO#. .OO#. rocks...1k > ------------------------------------------------------------ > --------------- > [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
Free forum by Nabble | Edit this page |