Partial LookUP

gkchimz
I am working in R, using RStudio.
I have a data frame with 4 columns. Column A contains the passenger ID, B contains the passenger name, and C contains the husband's name.
I am attempting to create a new column which checks whether the husband's name in column C is listed in any of the records in column B. If so, it should return the passenger ID of the husband from column A.
To make things more complicated, in some cases (as in the first example) the husband's name given in column C might not include his second name, which would be included in column B.

Reproducible Example
library(stringr)
rm(list=ls())
passengerid <- c(0908,9883,7767,3302)

Name<- c("Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)",
          "Backstrom, Mr. Karl Alfred John",
          "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
          "Cumings, Mr. John Bradley")

HusbandName <- c("Backstrom, Mr. Karl Alfred","","Cumings, Mr. John Bradley","")



df1<- data.frame(cbind(passengerid,Name,HusbandName))
df1$Name <- as.character(df1$Name)
df1$HusbandName <- as.character(df1$HusbandName)

I have tried using stringr, but I am facing problems because 1) I need the code to look at only one element of the vector HusbandName and search for it in the whole vector Name, and 2) I found it difficult to use regular expressions given that the pattern I am looking for is itself a vector (HusbandName).
This is what I have tried so far:

Attempt 1 - only finds exact matches & doesn't return the passengerID & doesn't add column to df
df1$Husbandid <- for (i in 1:NROW(df1$HusbandName)) {
print(HusbandName[i] %in% Name)}


Attempt 2 - finds partial matches, but does not ignore blanks & does not tell me passenger id & doesn't add column to df
df1$Husbandid <- for (i in 1:NROW(df1$HusbandName)) {
print(which(str_detect(df1$Name,df1$HusbandName[i])))}


#Attempt 3 - almost works, but the printed results are different from those added to the data frame as a new column. How can I correct this? Ultimately I need the ones in the data frame to be correct. The error is that passengers without husbands are showing a husband ID when this should be blank or NA. Can this be corrected, or is there a way to convert the output of the for loop into a vector we can add to the data frame?
for (i in 1:NROW(df1$HusbandName)) {
     if (df1$HusbandName[i] =="") {
      print("Man") & next()
      }
    FoundHusbandNames<- c(which(str_detect(df1$Name,df1$HusbandName[i])))
    print(df1$passengerid[FoundHusbandNames]) -> df1$Husbandid[i] }


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Partial LookUP

PIKAL Petr
Hi

I did not see any answer, so I will try to give one.
It seems to me that your second attempt was quite close.

If passengerid were numeric, the following code would probably give you the required result.

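library(stringr)
res <- rep(NA, nrow(df1))
for (i in seq_len(nrow(df1))) {
  # skip rows with no husband name; an empty pattern is not a meaningful search
  if (df1$HusbandName[i] == "") next
  # literal (non-regex) partial match of this husband name against all names
  sel <- which(str_detect(df1$Name, coll(df1$HusbandName[i])))
  # if several names match, keep only the first one
  if (length(sel) > 0) res[i] <- df1$passengerid[sel[1]]
}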

res should contain the passengerid of the husband for each relevant row and NA where there is no match. You can then simply add it to your data frame as a new column.
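
As a rough sketch of the same idea without an explicit loop, the lookup could be wrapped in a small function and applied to every husband name, with the result stored directly as a new column. This uses base R only (grepl with fixed = TRUE for literal partial matching). The helper name find_husband_id is made up for illustration, and the data frame is rebuilt here on the assumption that passengerid is numeric and that the wrapped name is meant to be on one line.

```r
# Example data rebuilt with a numeric id column (assumption: ids are plain numbers).
df1 <- data.frame(
  passengerid = c(908, 9883, 7767, 3302),
  Name = c("Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)",
           "Backstrom, Mr. Karl Alfred John",
           "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
           "Cumings, Mr. John Bradley"),
  HusbandName = c("Backstrom, Mr. Karl Alfred", "",
                  "Cumings, Mr. John Bradley", ""),
  stringsAsFactors = FALSE
)

# find_husband_id() is a hypothetical helper, not part of any package.
# It returns the id of the first name containing `husband`, or NA.
find_husband_id <- function(husband, names, ids) {
  if (is.na(husband) || husband == "") return(NA_real_)
  hit <- which(grepl(husband, names, fixed = TRUE))  # literal substring match
  if (length(hit) == 0) NA_real_ else ids[hit[1]]
}

# One lookup per row; the result is a numeric vector the same length as the data.
df1$Husbandid <- vapply(df1$HusbandName, find_husband_id, numeric(1),
                        names = df1$Name, ids = df1$passengerid,
                        USE.NAMES = FALSE)

df1[, c("passengerid", "Husbandid")]
```

Taking only the first hit is a simplification; if two passengers could match the same husband name, you would have to decide how to break the tie.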

The problem is that although you provided "a kind of" example, the HTML format probably scrambled it somehow. It is better to use dput for sending test data and not to use HTML formatting.
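
To illustrate what dput gives you, here is a small sketch with a made-up data frame (small is not your data, just an example). dput prints a structure(...) call as plain text; pasting that text into R recreates the object, which is why it survives a plain-text email better than a formatted table.

```r
# A made-up two-row data frame, only for demonstration.
small <- data.frame(id = c(1, 2), name = c("a", "b"), stringsAsFactors = FALSE)

# Prints a structure(...) call that can be copied into a plain-text message.
dput(small)

# Round trip: capture the printed text, then parse and evaluate it,
# which is what the reader of the message does when pasting it into R.
txt <- capture.output(dput(small))
rebuilt <- eval(parse(text = txt))

# Check that the rebuilt object is the same as the original.
identical(small, rebuilt)
```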

This is the data frame I got from your mail.

> dput(df1)
structure(list(passengerid = structure(c(3L, 4L, 2L, 1L), .Label = c("3302",
"7767", "908", "9883"), class = "factor"), Name = c("Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)",
"Backstrom, Mr. Karl Alfred John", "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
"Cumings, Mr. John Bradley"), HusbandName = c("Backstrom, Mr. Karl Alfred",
"", "Cumings, Mr. John\nBradley", "")), row.names = c(NA, -4L
), class = "data.frame")
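
The output above shows two things that get in the way of the lookup: passengerid arrived as a factor, and one husband name contains a line break ("\n"). A possible cleanup, sketched on vectors copied from that output (the names ids_factor, ids, and husband are chosen here just for illustration), could look like this:

```r
# Values copied from the dput output above.
ids_factor <- factor(c("908", "9883", "7767", "3302"))
husband <- c("Backstrom, Mr. Karl Alfred", "", "Cumings, Mr. John\nBradley", "")

# Convert the factor to numbers via its labels;
# as.numeric() on the factor itself would return the internal level codes.
ids <- as.numeric(as.character(ids_factor))

# Collapse any run of whitespace (including the stray line break) to one space.
husband <- gsub("[[:space:]]+", " ", husband)

ids
husband
```

After this step the third husband name is a single-line string again, so a literal partial match against Name has a chance to succeed.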

Cheers
Petr
