Populate one data frame with values from another dataframe for rows that match

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Populate one data frame with values from another dataframe for rows that match

kevinkariuki
I'm trying to populate the column “pf_mcl” in myDF1 with values from myDF2, where rows match based on column "studyno" but the solutions I have found so far don't seem to be giving me the desired output.

Below is a snapshot of the data.frames.

myDF1 <- structure(list(studyno = c("J1000/9", "J1000/9", "J1000/9", "J1000/9",
"J1000/9", "J1000/9"), date = structure(c(17123, 17127, 17135,
17144, 17148, 17155), class = "Date"), pf_mcl = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), year = c(2016, 2016, 2016, 2016, 2016, 2016)), .Names = c("studyno",
"date", "pf_mcl", "year"), row.names = c(NA, 6L), class = "data.frame")

myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7", "J931/6",
"J609/1", "J941/3"), pf_mcl = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("studyno",
"pf_mcl"), row.names = c(NA, 6L), class = "data.frame")

myDF2 is a well curated subset of myDF1. Some rows in the two datasets match based on "studyno", one may find that values are missing in myDF1$pf_mcl or the values are wrong.

All I want to do is identify a matching row in myDF2 and populate myDF1$pf_mcl with the value in myDF2$pf_mcl. If a row does not match based on “studyno”, the value should remain the same.

It's probably worth mentioning, the two data frames have other columns...I have selected a few for example purposes.



______________________________________________________________________

This e-mail contains information which is confidential. It is intended only for the use of the named recipient. If you have received this e-mail in error, please let us know by replying to the sender, and immediately delete it from your system.  Please note, that in these circumstances, the use, disclosure, distribution or copying of this information is strictly prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility for the  accuracy or completeness of this message as it has been transmitted over a public network. Although the Programme has taken reasonable precautions to ensure no viruses are present in emails, it cannot accept responsibility for any loss or damage arising from the use of the email or attachments. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of KEMRI-Wellcome Trust Programme.
______________________________________________________________________

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Populate one data frame with values from another dataframe for rows that match

Bert Gunter-2
?merge

Bert

On Oct 13, 2017 12:09 PM, "Kevin Wamae" <[hidden email]> wrote:

> I'm trying to populate the column “pf_mcl” in myDF1 with values from
> myDF2, where rows match based on column "studyno" but the solutions I have
> found so far don't seem to be giving me the desired output.
>
> Below is a snapshot of the data.frames.
>
> myDF1 <- structure(list(studyno = c("J1000/9", "J1000/9", "J1000/9",
> "J1000/9",
> "J1000/9", "J1000/9"), date = structure(c(17123, 17127, 17135,
> 17144, 17148, 17155), class = "Date"), pf_mcl = c(NA_integer_,
> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
> ), year = c(2016, 2016, 2016, 2016, 2016, 2016)), .Names = c("studyno",
> "date", "pf_mcl", "year"), row.names = c(NA, 6L), class = "data.frame")
>
> myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7",
> "J931/6",
> "J609/1", "J941/3"), pf_mcl = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names =
> c("studyno",
> "pf_mcl"), row.names = c(NA, 6L), class = "data.frame")
>
> myDF2 is a well curated subset of myDF1. Some rows in the two datasets
> match based on "studyno", one may find that values are missing in
> myDF1$pf_mcl or the values are wrong.
>
> All I want to do is identify a matching row in myDF2 and populate
> myDF1$pf_mcl with the value in myDF2$pf_mcl. If a row does not match based
> on “studyno”, the value should remain the same.
>
> It's probably worth mentioning, the two data frames have other columns...I
> have selected a few for example purposes.
>
>
>
> ______________________________________________________________________
>
> This e-mail contains information which is confidential. It is intended
> only for the use of the named recipient. If you have received this e-mail
> in error, please let us know by replying to the sender, and immediately
> delete it from your system.  Please note, that in these circumstances, the
> use, disclosure, distribution or copying of this information is strictly
> prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility
> for the  accuracy or completeness of this message as it has been
> transmitted over a public network. Although the Programme has taken
> reasonable precautions to ensure no viruses are present in emails, it
> cannot accept responsibility for any loss or damage arising from the use of
> the email or attachments. Any views expressed in this message are those of
> the individual sender, except where the sender specifically states them to
> be the views of KEMRI-Wellcome Trust Programme.
> ______________________________________________________________________
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Populate one data frame with values from another dataframe for rows that match

Rui Barradas
In reply to this post by kevinkariuki
Hello,

Try the following.


myDF1$studyno <- as.character(myDF1$studyno)
myDF2$studyno <- as.character(myDF2$studyno)
i1 <- which(names(myDF1) == "pf_mcl")

merge(myDF1[-i1], myDF2, by = "studyno")


Hope this helps,

Rui Barradas

Em 13-10-2017 20:09, Kevin Wamae escreveu:

> I'm trying to populate the column “pf_mcl” in myDF1 with values from myDF2, where rows match based on column "studyno" but the solutions I have found so far don't seem to be giving me the desired output.
>
> Below is a snapshot of the data.frames.
>
> myDF1 <- structure(list(studyno = c("J1000/9", "J1000/9", "J1000/9", "J1000/9",
> "J1000/9", "J1000/9"), date = structure(c(17123, 17127, 17135,
> 17144, 17148, 17155), class = "Date"), pf_mcl = c(NA_integer_,
> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
> ), year = c(2016, 2016, 2016, 2016, 2016, 2016)), .Names = c("studyno",
> "date", "pf_mcl", "year"), row.names = c(NA, 6L), class = "data.frame")
>
> myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7", "J931/6",
> "J609/1", "J941/3"), pf_mcl = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("studyno",
> "pf_mcl"), row.names = c(NA, 6L), class = "data.frame")
>
> myDF2 is a well curated subset of myDF1. Some rows in the two datasets match based on "studyno", one may find that values are missing in myDF1$pf_mcl or the values are wrong.
>
> All I want to do is identify a matching row in myDF2 and populate myDF1$pf_mcl with the value in myDF2$pf_mcl. If a row does not match based on “studyno”, the value should remain the same.
>
> It's probably worth mentioning, the two data frames have other columns...I have selected a few for example purposes.
>
>
>
> ______________________________________________________________________
>
> This e-mail contains information which is confidential. It is intended only for the use of the named recipient. If you have received this e-mail in error, please let us know by replying to the sender, and immediately delete it from your system.  Please note, that in these circumstances, the use, disclosure, distribution or copying of this information is strictly prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility for the  accuracy or completeness of this message as it has been transmitted over a public network. Although the Programme has taken reasonable precautions to ensure no viruses are present in emails, it cannot accept responsibility for any loss or damage arising from the use of the email or attachments. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of KEMRI-Wellcome Trust Programme.
> ______________________________________________________________________
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Populate one data frame with values from another dataframe for rows that match

kevinkariuki
In reply to this post by Bert Gunter-2
Dear @Bert Gunter<mailto:[hidden email]>, I tried merge and I faced many challenges. @Rui Barradas<mailto:[hidden email]> solution is working.

From: Bert Gunter <[hidden email]>
Date: Friday, 13 October 2017 at 22:44
To: Kevin Wamae <[hidden email]>
Cc: R-help <[hidden email]>
Subject: Re: [R] Populate one data frame with values from another dataframe for rows that match

?merge

Bert

On Oct 13, 2017 12:09 PM, "Kevin Wamae" <[hidden email]<mailto:[hidden email]>> wrote:
I'm trying to populate the column “pf_mcl” in myDF1 with values from myDF2, where rows match based on column "studyno" but the solutions I have found so far don't seem to be giving me the desired output.

Below is a snapshot of the data.frames.

myDF1 <- structure(list(studyno = c("J1000/9", "J1000/9", "J1000/9", "J1000/9",
"J1000/9", "J1000/9"), date = structure(c(17123, 17127, 17135,
17144, 17148, 17155), class = "Date"), pf_mcl = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), year = c(2016, 2016, 2016, 2016, 2016, 2016)), .Names = c("studyno",
"date", "pf_mcl", "year"), row.names = c(NA, 6L), class = "data.frame")

myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7", "J931/6",
"J609/1", "J941/3"), pf_mcl = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("studyno",
"pf_mcl"), row.names = c(NA, 6L), class = "data.frame")

myDF2 is a well curated subset of myDF1. Some rows in the two datasets match based on "studyno", one may find that values are missing in myDF1$pf_mcl or the values are wrong.

All I want to do is identify a matching row in myDF2 and populate myDF1$pf_mcl with the value in myDF2$pf_mcl. If a row does not match based on “studyno”, the value should remain the same.

It's probably worth mentioning, the two data frames have other columns...I have selected a few for example purposes.



______________________________________________________________________

This e-mail contains information which is confidential. It is intended only for the use of the named recipient. If you have received this e-mail in error, please let us know by replying to the sender, and immediately delete it from your system.  Please note, that in these circumstances, the use, disclosure, distribution or copying of this information is strictly prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility for the  accuracy or completeness of this message as it has been transmitted over a public network. Although the Programme has taken reasonable precautions to ensure no viruses are present in emails, it cannot accept responsibility for any loss or damage arising from the use of the email or attachments. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of KEMRI-Wellcome Trust Programme.
______________________________________________________________________

        [[alternative HTML version deleted]]

______________________________________________
[hidden email]<mailto:[hidden email]> mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________________________________

This e-mail contains information which is confidential. It is intended only for the use of the named recipient. If you have received this e-mail in error, please let us know by replying to the sender, and immediately delete it from your system.  Please note, that in these circumstances, the use, disclosure, distribution or copying of this information is strictly prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility for the  accuracy or completeness of this message as it has been transmitted over a public network. Although the Programme has taken reasonable precautions to ensure no viruses are present in emails, it cannot accept responsibility for any loss or damage arising from the use of the email or attachments. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of KEMRI-Wellcome Trust Programme.
______________________________________________________________________

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Populate one data frame with values from another dataframe for rows that match

kevinkariuki
In reply to this post by Rui Barradas
Dear @Rui Barradas, thank you for the solution. It works perfectly.


On 13/10/2017, 23:35, "Rui Barradas" <[hidden email]> wrote:

    Hello,
   
    Try the following.
   
   
    myDF1$studyno <- as.character(myDF1$studyno)
    myDF2$studyno <- as.character(myDF2$studyno)
    i1 <- which(names(myDF1) == "pf_mcl")
   
    merge(myDF1[-i1], myDF2, by = "studyno")
   
   
    Hope this helps,
   
    Rui Barradas
   
    Em 13-10-2017 20:09, Kevin Wamae escreveu:
    > I'm trying to populate the column “pf_mcl” in myDF1 with values from myDF2, where rows match based on column "studyno" but the solutions I have found so far don't seem to be giving me the desired output.
    >
    > Below is a snapshot of the data.frames.
    >
    > myDF1 <- structure(list(studyno = c("J1000/9", "J1000/9", "J1000/9", "J1000/9",
    > "J1000/9", "J1000/9"), date = structure(c(17123, 17127, 17135,
    > 17144, 17148, 17155), class = "Date"), pf_mcl = c(NA_integer_,
    > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
    > ), year = c(2016, 2016, 2016, 2016, 2016, 2016)), .Names = c("studyno",
    > "date", "pf_mcl", "year"), row.names = c(NA, 6L), class = "data.frame")
    >
    > myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7", "J931/6",
    > "J609/1", "J941/3"), pf_mcl = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("studyno",
    > "pf_mcl"), row.names = c(NA, 6L), class = "data.frame")
    >
    > myDF2 is a well curated subset of myDF1. Some rows in the two datasets match based on "studyno", one may find that values are missing in myDF1$pf_mcl or the values are wrong.
    >
    > All I want to do is identify a matching row in myDF2 and populate myDF1$pf_mcl with the value in myDF2$pf_mcl. If a row does not match based on “studyno”, the value should remain the same.
    >
    > It's probably worth mentioning, the two data frames have other columns...I have selected a few for example purposes.
    >
    >
    >
    > ______________________________________________________________________
    >
    > This e-mail contains information which is confidential. It is intended only for the use of the named recipient. If you have received this e-mail in error, please let us know by replying to the sender, and immediately delete it from your system.  Please note, that in these circumstances, the use, disclosure, distribution or copying of this information is strictly prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility for the  accuracy or completeness of this message as it has been transmitted over a public network. Although the Programme has taken reasonable precautions to ensure no viruses are present in emails, it cannot accept responsibility for any loss or damage arising from the use of the email or attachments. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of KEMRI-Wellcome Trust Programme.
    > ______________________________________________________________________
    >
    > [[alternative HTML version deleted]]
    >
    > ______________________________________________
    > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
    > https://stat.ethz.ch/mailman/listinfo/r-help
    > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    > and provide commented, minimal, self-contained, reproducible code.
    >
   


______________________________________________________________________

This e-mail contains information which is confidential. It is intended only for the use of the named recipient. If you have received this e-mail in error, please let us know by replying to the sender, and immediately delete it from your system.  Please note, that in these circumstances, the use, disclosure, distribution or copying of this information is strictly prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility for the  accuracy or completeness of this message as it has been transmitted over a public network. Although the Programme has taken reasonable precautions to ensure no viruses are present in emails, it cannot accept responsibility for any loss or damage arising from the use of the email or attachments. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of KEMRI-Wellcome Trust Programme.
______________________________________________________________________
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Populate one data frame with values from another dataframe for rows that match

Eric Berger
Hi Kevin,
I think there are issues with Rui's proposed solution. For example, if
there are rows in myDF1 which have a studyno
which does not match any row in myDF2, then you will lose those rows. In
your original request you said that you wanted to keep those rows.

To demonstrate my point I need to modify your sample data. Specifically, I
changed some studyno settings in myDF1, and also the entries of pf_mcl in
myDF1.

myDF1 <- structure(list(studyno = c("J1000/8", "J1000/9", "J1000/9",
"J1000/9",
"J1000/5", "J1000/6"), date = structure(c(17123, 17127, 17135,
17144, 17148, 17155), class = "Date"), pf_mcl = c(1:6
), year = c(2016, 2016, 2016, 2016, 2016, 2016)), .Names = c("studyno",
"date", "pf_mcl", "year"), row.names = c(NA, 6L), class = "data.frame")

myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7", "J931/6",
"J609/1", "J941/3"), pf_mcl = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names =
c("studyno",
"pf_mcl"), row.names = c(NA, 6L), class = "data.frame")

#Rui's proposal gives the following result
# studyno       date year pf_mcl
# 1 J1000/9 2016-11-22 2016      0
# 2 J1000/9 2016-11-30 2016      0
# 3 J1000/9 2016-12-09 2016      0

My proposal

library(dplyr)

myDF1$studyno <- as.character(myDF1$studyno)
myDF2$studyno <- as.character(myDF2$studyno)
myDF3 <- merge(myDF1, myDF2, by="studyno", all.x=TRUE ) %>%
                dplyr::mutate( pf_mcl = ifelse( is.na(pf_mcl.y), pf_mcl.x,
pf_mcl.y ) ) %>%
                dplyr::select( studyno, date, pf_mcl )

# The results of this approach
#  studyno       date pf_mcl
# 1 J1000/5 2016-12-13      5
# 2 J1000/6 2016-12-20      6
# 3 J1000/8 2016-11-18      1
# 4 J1000/9 2016-11-22      0
# 5 J1000/9 2016-11-30      0
# 6 J1000/9 2016-12-09      0

Comparing the two results you see that no rows have been dropped in my
approach.

HTH,

Eric





On Sat, Oct 14, 2017 at 8:49 AM, Kevin Wamae <[hidden email]>
wrote:

> Dear @Rui Barradas, thank you for the solution. It works perfectly.
>
>
> On 13/10/2017, 23:35, "Rui Barradas" <[hidden email]> wrote:
>
>     Hello,
>
>     Try the following.
>
>
>     myDF1$studyno <- as.character(myDF1$studyno)
>     myDF2$studyno <- as.character(myDF2$studyno)
>     i1 <- which(names(myDF1) == "pf_mcl")
>
>     merge(myDF1[-i1], myDF2, by = "studyno")
>
>
>     Hope this helps,
>
>     Rui Barradas
>
>     Em 13-10-2017 20:09, Kevin Wamae escreveu:
>     > I'm trying to populate the column “pf_mcl” in myDF1 with values from
> myDF2, where rows match based on column "studyno" but the solutions I have
> found so far don't seem to be giving me the desired output.
>     >
>     > Below is a snapshot of the data.frames.
>     >
>     > myDF1 <- structure(list(studyno = c("J1000/9", "J1000/9", "J1000/9",
> "J1000/9",
>     > "J1000/9", "J1000/9"), date = structure(c(17123, 17127, 17135,
>     > 17144, 17148, 17155), class = "Date"), pf_mcl = c(NA_integer_,
>     > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
>     > ), year = c(2016, 2016, 2016, 2016, 2016, 2016)), .Names =
> c("studyno",
>     > "date", "pf_mcl", "year"), row.names = c(NA, 6L), class =
> "data.frame")
>     >
>     > myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7",
> "J931/6",
>     > "J609/1", "J941/3"), pf_mcl = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names =
> c("studyno",
>     > "pf_mcl"), row.names = c(NA, 6L), class = "data.frame")
>     >
>     > myDF2 is a well curated subset of myDF1. Some rows in the two
> datasets match based on "studyno", one may find that values are missing in
> myDF1$pf_mcl or the values are wrong.
>     >
>     > All I want to do is identify a matching row in myDF2 and populate
> myDF1$pf_mcl with the value in myDF2$pf_mcl. If a row does not match based
> on “studyno”, the value should remain the same.
>     >
>     > It's probably worth mentioning, the two data frames have other
> columns...I have selected a few for example purposes.
>     >
>     >
>     >
>     > ____________________________________________________________
> __________
>     >
>     > This e-mail contains information which is confidential. It is
> intended only for the use of the named recipient. If you have received this
> e-mail in error, please let us know by replying to the sender, and
> immediately delete it from your system.  Please note, that in these
> circumstances, the use, disclosure, distribution or copying of this
> information is strictly prohibited. KEMRI-Wellcome Trust Programme cannot
> accept any responsibility for the  accuracy or completeness of this message
> as it has been transmitted over a public network. Although the Programme
> has taken reasonable precautions to ensure no viruses are present in
> emails, it cannot accept responsibility for any loss or damage arising from
> the use of the email or attachments. Any views expressed in this message
> are those of the individual sender, except where the sender specifically
> states them to be the views of KEMRI-Wellcome Trust Programme.
>     > ____________________________________________________________
> __________
>     >
>     >   [[alternative HTML version deleted]]
>     >
>     > ______________________________________________
>     > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>     > https://stat.ethz.ch/mailman/listinfo/r-help
>     > PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
>     > and provide commented, minimal, self-contained, reproducible code.
>     >
>
>
>
> ______________________________________________________________________
>
> This e-mail contains information which is confidential. It is intended
> only for the use of the named recipient. If you have received this e-mail
> in error, please let us know by replying to the sender, and immediately
> delete it from your system.  Please note, that in these circumstances, the
> use, disclosure, distribution or copying of this information is strictly
> prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility
> for the  accuracy or completeness of this message as it has been
> transmitted over a public network. Although the Programme has taken
> reasonable precautions to ensure no viruses are present in emails, it
> cannot accept responsibility for any loss or damage arising from the use of
> the email or attachments. Any views expressed in this message are those of
> the individual sender, except where the sender specifically states them to
> be the views of KEMRI-Wellcome Trust Programme.
> ______________________________________________________________________
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Populate one data frame with values from another dataframe for rows that match

kevinkariuki
Dear @Eric<mailto:[hidden email]>, thank you so very much for noticing that. When I tested @Rui<mailto:[hidden email]>’s solution, it was on a smaller dataset that had purely matching rows. I had considered including non-matching rows to evaluate what the alternative would be.

Also, I hadn’t even tested it on the larger dataset. I have now and noticed that it went further to omit rows that did not match, just like you said.

Your proposed solution works well.

Much appreciated…I’ll get in touch in case I encounter any problems


From: Eric Berger <[hidden email]>
Date: Saturday, 14 October 2017 at 12:43
To: Kevin Wamae <[hidden email]>
Cc: Rui Barradas <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [R] Populate one data frame with values from another dataframe for rows that match

Hi Kevin,
I think there are issues with Rui's proposed solution. For example, if there are rows in myDF1 which have a studyno
which does not match any row in myDF2, then you will lose those rows. In your original request you said that you wanted to keep those rows.

To demonstrate my point I need to modify your sample data. Specifically, I changed some studyno settings in myDF1, and also the entries of pf_mcl in myDF1.

myDF1 <- structure(list(studyno = c("J1000/8", "J1000/9", "J1000/9", "J1000/9",
"J1000/5", "J1000/6"), date = structure(c(17123, 17127, 17135,
17144, 17148, 17155), class = "Date"), pf_mcl = c(1:6
), year = c(2016, 2016, 2016, 2016, 2016, 2016)), .Names = c("studyno",
"date", "pf_mcl", "year"), row.names = c(NA, 6L), class = "data.frame")

myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7", "J931/6",
"J609/1", "J941/3"), pf_mcl = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("studyno",
"pf_mcl"), row.names = c(NA, 6L), class = "data.frame")

#Rui's proposal gives the following result
# studyno       date year pf_mcl
# 1 J1000/9 2016-11-22 2016      0
# 2 J1000/9 2016-11-30 2016      0
# 3 J1000/9 2016-12-09 2016      0

My proposal

library(dplyr)

myDF1$studyno <- as.character(myDF1$studyno)
myDF2$studyno <- as.character(myDF2$studyno)
myDF3 <- merge(myDF1, myDF2, by="studyno", all.x=TRUE ) %>%
                dplyr::mutate( pf_mcl = ifelse( is.na<http://is.na>(pf_mcl.y), pf_mcl.x, pf_mcl.y ) ) %>%
                dplyr::select( studyno, date, pf_mcl )

# The results of this approach
#  studyno       date pf_mcl
# 1 J1000/5 2016-12-13      5
# 2 J1000/6 2016-12-20      6
# 3 J1000/8 2016-11-18      1
# 4 J1000/9 2016-11-22      0
# 5 J1000/9 2016-11-30      0
# 6 J1000/9 2016-12-09      0

Comparing the two results you see that no rows have been dropped in my approach.

HTH,

Eric





On Sat, Oct 14, 2017 at 8:49 AM, Kevin Wamae <[hidden email]<mailto:[hidden email]>> wrote:
Dear @Rui Barradas, thank you for the solution. It works perfectly.


On 13/10/2017, 23:35, "Rui Barradas" <[hidden email]<mailto:[hidden email]>> wrote:

    Hello,

    Try the following.


    myDF1$studyno <- as.character(myDF1$studyno)
    myDF2$studyno <- as.character(myDF2$studyno)
    i1 <- which(names(myDF1) == "pf_mcl")

    merge(myDF1[-i1], myDF2, by = "studyno")


    Hope this helps,

    Rui Barradas

    Em 13-10-2017 20:09, Kevin Wamae escreveu:
    > I'm trying to populate the column “pf_mcl” in myDF1 with values from myDF2, where rows match based on column "studyno" but the solutions I have found so far don't seem to be giving me the desired output.
    >
    > Below is a snapshot of the data.frames.
    >
    > myDF1 <- structure(list(studyno = c("J1000/9", "J1000/9", "J1000/9", "J1000/9",
    > "J1000/9", "J1000/9"), date = structure(c(17123, 17127, 17135,
    > 17144, 17148, 17155), class = "Date"), pf_mcl = c(NA_integer_,
    > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
    > ), year = c(2016, 2016, 2016, 2016, 2016, 2016)), .Names = c("studyno",
    > "date", "pf_mcl", "year"), row.names = c(NA, 6L), class = "data.frame")
    >
    > myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7", "J931/6",
    > "J609/1", "J941/3"), pf_mcl = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("studyno",
    > "pf_mcl"), row.names = c(NA, 6L), class = "data.frame")
    >
    > myDF2 is a well curated subset of myDF1. Some rows in the two datasets match based on "studyno", one may find that values are missing in myDF1$pf_mcl or the values are wrong.
    >
    > All I want to do is identify a matching row in myDF2 and populate myDF1$pf_mcl with the value in myDF2$pf_mcl. If a row does not match based on “studyno”, the value should remain the same.
    >
    > It's probably worth mentioning, the two data frames have other columns...I have selected a few for example purposes.
    >
    >
    >
    > ______________________________________________________________________
    >
    > This e-mail contains information which is confidential. It is intended only for the use of the named recipient. If you have received this e-mail in error, please let us know by replying to the sender, and immediately delete it from your system.  Please note, that in these circumstances, the use, disclosure, distribution or copying of this information is strictly prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility for the  accuracy or completeness of this message as it has been transmitted over a public network. Although the Programme has taken reasonable precautions to ensure no viruses are present in emails, it cannot accept responsibility for any loss or damage arising from the use of the email or attachments. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of KEMRI-Wellcome Trust Programme.
    > ______________________________________________________________________
    >
    >   [[alternative HTML version deleted]]
    >
    > ______________________________________________
    > [hidden email]<mailto:[hidden email]> mailing list -- To UNSUBSCRIBE and more, see
    > https://stat.ethz.ch/mailman/listinfo/r-help
    > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    > and provide commented, minimal, self-contained, reproducible code.
    >



______________________________________________________________________

This e-mail contains information which is confidential. It is intended only for the use of the named recipient. If you have received this e-mail in error, please let us know by replying to the sender, and immediately delete it from your system.  Please note, that in these circumstances, the use, disclosure, distribution or copying of this information is strictly prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility for the  accuracy or completeness of this message as it has been transmitted over a public network. Although the Programme has taken reasonable precautions to ensure no viruses are present in emails, it cannot accept responsibility for any loss or damage arising from the use of the email or attachments. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of KEMRI-Wellcome Trust Programme.
______________________________________________________________________
______________________________________________
[hidden email]<mailto:[hidden email]> mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


______________________________________________________________________

This e-mail contains information which is confidential. It is intended only for the use of the named recipient. If you have received this e-mail in error, please let us know by replying to the sender, and immediately delete it from your system.  Please note, that in these circumstances, the use, disclosure, distribution or copying of this information is strictly prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility for the  accuracy or completeness of this message as it has been transmitted over a public network. Although the Programme has taken reasonable precautions to ensure no viruses are present in emails, it cannot accept responsibility for any loss or damage arising from the use of the email or attachments. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of KEMRI-Wellcome Trust Programme.
______________________________________________________________________

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Populate one data frame with values from another dataframe for rows that match

R help mailing list-2
In reply to this post by kevinkariuki
Your example used one distinct studyno in DF1 and one distinct pf_mcl in
DF2.  I think that makes it hard to see what is going on, but maybe I
completely misunderstand the problem.  In any case, let's redefine myDF1
and myDF2.  Note that myDF1 contains a studyno not in myDF2 and vice versa.

myDF1 <- structure(list(studyno = c("J1000/9", "J895/7", "J931/6", "J666/6",
"J1000/9", "J1000/9"), date = structure(c(17123, 17127, 17135,
17144, 17148, 17155), class = "Date"), pf_mcl = c(NA_integer_,
2L, 3L, 4L, 5L, NA_integer_
), year = c(2016, 2016, 2016, 2016, 2016, 2016)), .Names = c("studyno",
"date", "pf_mcl", "year"), row.names = c(NA, 6L), class = "data.frame")

myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7", "J931/6",
"J609/1", "J941/3"), pf_mcl = c(101L, 102L, 103L, 104L, 105L, 106L)),
.Names = c("studyno", "pf_mcl"), row.names = c(NA, 6L), class =
"data.frame")

m <- merge(myDF1, myDF2, by="studyno", all.x=TRUE, all.y=FALSE,
suffixes=c(".raw", ".curated"))


The results are:

> myDF1
  studyno       date pf_mcl year
1 J1000/9 2016-11-18     NA 2016
2  J895/7 2016-11-22      2 2016
3  J931/6 2016-11-30      3 2016
4  J666/6 2016-12-09      4 2016
5 J1000/9 2016-12-13      5 2016
6 J1000/9 2016-12-20     NA 2016
> myDF2
  studyno pf_mcl
1  J740/4    101
2 J1000/9    102
3  J895/7    103
4  J931/6    104
5  J609/1    105
6  J941/3    106
> m
  studyno       date pf_mcl.raw year pf_mcl.curated
1 J1000/9 2016-11-18         NA 2016            102
2 J1000/9 2016-12-13          5 2016            102
3 J1000/9 2016-12-20         NA 2016            102
4  J666/6 2016-12-09          4 2016             NA
5  J895/7 2016-11-22          2 2016            103
6  J931/6 2016-11-30          3 2016            104


Now your problem is to combine the columns pf_mcl.raw and pf_mcl.curated in
the way you want.  ifelse() may be useful for that.


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Oct 13, 2017 at 10:48 PM, Kevin Wamae <[hidden email]>
wrote:

> Dear @Bert Gunter<mailto:[hidden email]>, I tried merge and I
> faced many challenges. @Rui Barradas<mailto:[hidden email]>
> solution is working.
>
> From: Bert Gunter <[hidden email]>
> Date: Friday, 13 October 2017 at 22:44
> To: Kevin Wamae <[hidden email]>
> Cc: R-help <[hidden email]>
> Subject: Re: [R] Populate one data frame with values from another
> dataframe for rows that match
>
> ?merge
>
> Bert
>
> On Oct 13, 2017 12:09 PM, "Kevin Wamae" <[hidden email]<mailto:
> [hidden email]>> wrote:
> I'm trying to populate the column “pf_mcl” in myDF1 with values from
> myDF2, where rows match based on column "studyno" but the solutions I have
> found so far don't seem to be giving me the desired output.
>
> Below is a snapshot of the data.frames.
>
> myDF1 <- structure(list(studyno = c("J1000/9", "J1000/9", "J1000/9",
> "J1000/9",
> "J1000/9", "J1000/9"), date = structure(c(17123, 17127, 17135,
> 17144, 17148, 17155), class = "Date"), pf_mcl = c(NA_integer_,
> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
> ), year = c(2016, 2016, 2016, 2016, 2016, 2016)), .Names = c("studyno",
> "date", "pf_mcl", "year"), row.names = c(NA, 6L), class = "data.frame")
>
> myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7",
> "J931/6",
> "J609/1", "J941/3"), pf_mcl = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names =
> c("studyno",
> "pf_mcl"), row.names = c(NA, 6L), class = "data.frame")
>
> myDF2 is a well curated subset of myDF1. Some rows in the two datasets
> match based on "studyno", one may find that values are missing in
> myDF1$pf_mcl or the values are wrong.
>
> All I want to do is identify a matching row in myDF2 and populate
> myDF1$pf_mcl with the value in myDF2$pf_mcl. If a row does not match based
> on “studyno”, the value should remain the same.
>
> It's probably worth mentioning, the two data frames have other columns...I
> have selected a few for example purposes.
>
>
>
> ______________________________________________________________________
>
> This e-mail contains information which is confidential. It is intended
> only for the use of the named recipient. If you have received this e-mail
> in error, please let us know by replying to the sender, and immediately
> delete it from your system.  Please note, that in these circumstances, the
> use, disclosure, distribution or copying of this information is strictly
> prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility
> for the  accuracy or completeness of this message as it has been
> transmitted over a public network. Although the Programme has taken
> reasonable precautions to ensure no viruses are present in emails, it
> cannot accept responsibility for any loss or damage arising from the use of
> the email or attachments. Any views expressed in this message are those of
> the individual sender, except where the sender specifically states them to
> be the views of KEMRI-Wellcome Trust Programme.
> ______________________________________________________________________
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email]<mailto:[hidden email]> mailing list -- To
> UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________________________________
>
> This e-mail contains information which is confidential. It is intended
> only for the use of the named recipient. If you have received this e-mail
> in error, please let us know by replying to the sender, and immediately
> delete it from your system.  Please note, that in these circumstances, the
> use, disclosure, distribution or copying of this information is strictly
> prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility
> for the  accuracy or completeness of this message as it has been
> transmitted over a public network. Although the Programme has taken
> reasonable precautions to ensure no viruses are present in emails, it
> cannot accept responsibility for any loss or damage arising from the use of
> the email or attachments. Any views expressed in this message are those of
> the individual sender, except where the sender specifically states them to
> be the views of KEMRI-Wellcome Trust Programme.
> ______________________________________________________________________
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Populate one data frame with values from another dataframe for rows that match

kevinkariuki
Dear @William<mailto:[hidden email]>, thanks for the feedback. I have tested it on the larger dataset and noticed that it created two variables, pf_raw and pf_curated.

The output we were looking for, was one that takes the variable pf_mcl in curated dataset and replaces pf_mcl in matching rows within the raw dataset.

@Eric<mailto:[hidden email]>’s solution was able to achieve that. Nonetheless, we do appreciate your solution.

Regards
------------------
Kevin Wamae

From: William Dunlap <[hidden email]>
Date: Saturday, 14 October 2017 at 20:21
To: Kevin Wamae <[hidden email]>
Cc: Bert Gunter <[hidden email]>, Rui Barradas <[hidden email]>, R-help <[hidden email]>
Subject: Re: [R] Populate one data frame with values from another dataframe for rows that match

Your example used one distinct studyno in DF1 and one distinct pf_mcl in DF2.  I think that makes it hard to see what is going on, but maybe I completely misunderstand the problem.  In any case, let's redefine myDF1 and myDF2.  Note that myDF1 contains a studyno not in myDF2 and vice versa.

myDF1 <- structure(list(studyno = c("J1000/9", "J895/7", "J931/6", "J666/6",
"J1000/9", "J1000/9"), date = structure(c(17123, 17127, 17135,
17144, 17148, 17155), class = "Date"), pf_mcl = c(NA_integer_,
2L, 3L, 4L, 5L, NA_integer_
), year = c(2016, 2016, 2016, 2016, 2016, 2016)), .Names = c("studyno",
"date", "pf_mcl", "year"), row.names = c(NA, 6L), class = "data.frame")

myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7", "J931/6",
"J609/1", "J941/3"), pf_mcl = c(101L, 102L, 103L, 104L, 105L, 106L)),
.Names = c("studyno", "pf_mcl"), row.names = c(NA, 6L), class = "data.frame")

m <- merge(myDF1, myDF2, by="studyno", all.x=TRUE, all.y=FALSE, suffixes=c(".raw", ".curated"))


The results are:

> myDF1
  studyno       date pf_mcl year
1 J1000/9 2016-11-18     NA 2016
2  J895/7 2016-11-22      2 2016
3  J931/6 2016-11-30      3 2016
4  J666/6 2016-12-09      4 2016
5 J1000/9 2016-12-13      5 2016
6 J1000/9 2016-12-20     NA 2016
> myDF2
  studyno pf_mcl
1  J740/4    101
2 J1000/9    102
3  J895/7    103
4  J931/6    104
5  J609/1    105
6  J941/3    106
> m
  studyno       date pf_mcl.raw year pf_mcl.curated
1 J1000/9 2016-11-18         NA 2016            102
2 J1000/9 2016-12-13          5 2016            102
3 J1000/9 2016-12-20         NA 2016            102
4  J666/6 2016-12-09          4 2016             NA
5  J895/7 2016-11-22          2 2016            103
6  J931/6 2016-11-30          3 2016            104


Now your problem is to combine the columns pf_mcl.raw and pf_mcl.curated in the way you want.  ifelse() may be useful for that.


Bill Dunlap
TIBCO Software
wdunlap tibco.com<http://tibco.com>

On Fri, Oct 13, 2017 at 10:48 PM, Kevin Wamae <[hidden email]<mailto:[hidden email]>> wrote:
Dear @Bert Gunter<mailto:[hidden email]<mailto:[hidden email]>>, I tried merge and I faced many challenges. @Rui Barradas<mailto:[hidden email]<mailto:[hidden email]>> solution is working.

From: Bert Gunter <[hidden email]<mailto:[hidden email]>>
Date: Friday, 13 October 2017 at 22:44
To: Kevin Wamae <[hidden email]<mailto:[hidden email]>>
Cc: R-help <[hidden email]<mailto:[hidden email]>>
Subject: Re: [R] Populate one data frame with values from another dataframe for rows that match

?merge

Bert

On Oct 13, 2017 12:09 PM, "Kevin Wamae" <[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>>> wrote:
I'm trying to populate the column “pf_mcl” in myDF1 with values from myDF2, where rows match based on column "studyno" but the solutions I have found so far don't seem to be giving me the desired output.

Below is a snapshot of the data.frames.

myDF1 <- structure(list(studyno = c("J1000/9", "J1000/9", "J1000/9", "J1000/9",
"J1000/9", "J1000/9"), date = structure(c(17123, 17127, 17135,
17144, 17148, 17155), class = "Date"), pf_mcl = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), year = c(2016, 2016, 2016, 2016, 2016, 2016)), .Names = c("studyno",
"date", "pf_mcl", "year"), row.names = c(NA, 6L), class = "data.frame")

myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7", "J931/6",
"J609/1", "J941/3"), pf_mcl = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("studyno",
"pf_mcl"), row.names = c(NA, 6L), class = "data.frame")

myDF2 is a well curated subset of myDF1. Some rows in the two datasets match based on "studyno", one may find that values are missing in myDF1$pf_mcl or the values are wrong.

All I want to do is identify a matching row in myDF2 and populate myDF1$pf_mcl with the value in myDF2$pf_mcl. If a row does not match based on “studyno”, the value should remain the same.

It's probably worth mentioning, the two data frames have other columns...I have selected a few for example purposes.



______________________________________________________________________

This e-mail contains information which is confidential. It is intended only for the use of the named recipient. If you have received this e-mail in error, please let us know by replying to the sender, and immediately delete it from your system.  Please note, that in these circumstances, the use, disclosure, distribution or copying of this information is strictly prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility for the  accuracy or completeness of this message as it has been transmitted over a public network. Although the Programme has taken reasonable precautions to ensure no viruses are present in emails, it cannot accept responsibility for any loss or damage arising from the use of the email or attachments. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of KEMRI-Wellcome Trust Programme.
______________________________________________________________________

        [[alternative HTML version deleted]]

______________________________________________
[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>> mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________________________________

This e-mail contains information which is confidential. It is intended only for the use of the named recipient. If you have received this e-mail in error, please let us know by replying to the sender, and immediately delete it from your system.  Please note, that in these circumstances, the use, disclosure, distribution or copying of this information is strictly prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility for the  accuracy or completeness of this message as it has been transmitted over a public network. Although the Programme has taken reasonable precautions to ensure no viruses are present in emails, it cannot accept responsibility for any loss or damage arising from the use of the email or attachments. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of KEMRI-Wellcome Trust Programme.
______________________________________________________________________

        [[alternative HTML version deleted]]

______________________________________________
[hidden email]<mailto:[hidden email]> mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


______________________________________________________________________

This e-mail contains information which is confidential. It is intended only for the use of the named recipient. If you have received this e-mail in error, please let us know by replying to the sender, and immediately delete it from your system.  Please note, that in these circumstances, the use, disclosure, distribution or copying of this information is strictly prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility for the  accuracy or completeness of this message as it has been transmitted over a public network. Although the Programme has taken reasonable precautions to ensure no viruses are present in emails, it cannot accept responsibility for any loss or damage arising from the use of the email or attachments. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of KEMRI-Wellcome Trust Programme.
______________________________________________________________________

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Populate one data frame with values from another dataframe for rows that match

kevinkariuki
Pardon me, here’s @Eric<mailto:[hidden email]>’s solution…

myDF1$studyno <- as.character(myDF1$studyno)
myDF2$studyno <- as.character(myDF2$studyno)
myDF3 <- merge(myDF1, myDF2, by="studyno", all.x=TRUE ) %>%
                dplyr::mutate( pf_mcl = ifelse( is.na<http://is.na/>(pf_mcl.y), pf_mcl.x, pf_mcl.y ) ) %>%
                dplyr::select( studyno, date, pf_mcl )

Regards
------------------
Kevin Wamae

From: Kevin Wamae <[hidden email]>
Date: Sunday, 15 October 2017 at 14:03
To: William Dunlap <[hidden email]>
Cc: Bert Gunter <[hidden email]>, Rui Barradas <[hidden email]>, Eric Berger <[hidden email]>, R-help <[hidden email]>
Subject: Re: [R] Populate one data frame with values from another dataframe for rows that match

Dear @William<mailto:[hidden email]>, thanks for the feedback. I have tested it on the larger dataset and noticed that it created two variables, pf_raw and pf_curated.

The output we were looking for, was one that takes the variable pf_mcl in curated dataset and replaces pf_mcl in matching rows within the raw dataset.

@Eric<mailto:[hidden email]>’s solution was able to achieve that. Nonetheless, we do appreciate your solution.

Regards
------------------
Kevin Wamae

From: William Dunlap <[hidden email]>
Date: Saturday, 14 October 2017 at 20:21
To: Kevin Wamae <[hidden email]>
Cc: Bert Gunter <[hidden email]>, Rui Barradas <[hidden email]>, R-help <[hidden email]>
Subject: Re: [R] Populate one data frame with values from another dataframe for rows that match

Your example used one distinct studyno in DF1 and one distinct pf_mcl in DF2.  I think that makes it hard to see what is going on, but maybe I completely misunderstand the problem.  In any case, let's redefine myDF1 and myDF2.  Note that myDF1 contains a studyno not in myDF2 and vice versa.

myDF1 <- structure(list(studyno = c("J1000/9", "J895/7", "J931/6", "J666/6",
"J1000/9", "J1000/9"), date = structure(c(17123, 17127, 17135,
17144, 17148, 17155), class = "Date"), pf_mcl = c(NA_integer_,
2L, 3L, 4L, 5L, NA_integer_
), year = c(2016, 2016, 2016, 2016, 2016, 2016)), .Names = c("studyno",
"date", "pf_mcl", "year"), row.names = c(NA, 6L), class = "data.frame")

myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7", "J931/6",
"J609/1", "J941/3"), pf_mcl = c(101L, 102L, 103L, 104L, 105L, 106L)),
.Names = c("studyno", "pf_mcl"), row.names = c(NA, 6L), class = "data.frame")

m <- merge(myDF1, myDF2, by="studyno", all.x=TRUE, all.y=FALSE, suffixes=c(".raw", ".curated"))


The results are:

> myDF1
  studyno       date pf_mcl year
1 J1000/9 2016-11-18     NA 2016
2  J895/7 2016-11-22      2 2016
3  J931/6 2016-11-30      3 2016
4  J666/6 2016-12-09      4 2016
5 J1000/9 2016-12-13      5 2016
6 J1000/9 2016-12-20     NA 2016
> myDF2
  studyno pf_mcl
1  J740/4    101
2 J1000/9    102
3  J895/7    103
4  J931/6    104
5  J609/1    105
6  J941/3    106
> m
  studyno       date pf_mcl.raw year pf_mcl.curated
1 J1000/9 2016-11-18         NA 2016            102
2 J1000/9 2016-12-13          5 2016            102
3 J1000/9 2016-12-20         NA 2016            102
4  J666/6 2016-12-09          4 2016             NA
5  J895/7 2016-11-22          2 2016            103
6  J931/6 2016-11-30          3 2016            104


Now your problem is to combine the columns pf_mcl.raw and pf_mcl.curated in the way you want.  ifelse() may be useful for that.


Bill Dunlap
TIBCO Software
wdunlap tibco.com<http://tibco.com>

On Fri, Oct 13, 2017 at 10:48 PM, Kevin Wamae <[hidden email]<mailto:[hidden email]>> wrote:
Dear @Bert Gunter<mailto:[hidden email]<mailto:[hidden email]>>, I tried merge and I faced many challenges. @Rui Barradas<mailto:[hidden email]<mailto:[hidden email]>> solution is working.

From: Bert Gunter <[hidden email]<mailto:[hidden email]>>
Date: Friday, 13 October 2017 at 22:44
To: Kevin Wamae <[hidden email]<mailto:[hidden email]>>
Cc: R-help <[hidden email]<mailto:[hidden email]>>
Subject: Re: [R] Populate one data frame with values from another dataframe for rows that match

?merge

Bert

On Oct 13, 2017 12:09 PM, "Kevin Wamae" <[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>>> wrote:
I'm trying to populate the column “pf_mcl” in myDF1 with values from myDF2, where rows match based on column "studyno" but the solutions I have found so far don't seem to be giving me the desired output.

Below is a snapshot of the data.frames.

myDF1 <- structure(list(studyno = c("J1000/9", "J1000/9", "J1000/9", "J1000/9",
"J1000/9", "J1000/9"), date = structure(c(17123, 17127, 17135,
17144, 17148, 17155), class = "Date"), pf_mcl = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), year = c(2016, 2016, 2016, 2016, 2016, 2016)), .Names = c("studyno",
"date", "pf_mcl", "year"), row.names = c(NA, 6L), class = "data.frame")

myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7", "J931/6",
"J609/1", "J941/3"), pf_mcl = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("studyno",
"pf_mcl"), row.names = c(NA, 6L), class = "data.frame")

myDF2 is a well curated subset of myDF1. Some rows in the two datasets match based on "studyno", one may find that values are missing in myDF1$pf_mcl or the values are wrong.

All I want to do is identify a matching row in myDF2 and populate myDF1$pf_mcl with the value in myDF2$pf_mcl. If a row does not match based on “studyno”, the value should remain the same.

It's probably worth mentioning, the two data frames have other columns...I have selected a few for example purposes.



______________________________________________________________________

This e-mail contains information which is confidential. It is intended only for the use of the named recipient. If you have received this e-mail in error, please let us know by replying to the sender, and immediately delete it from your system.  Please note, that in these circumstances, the use, disclosure, distribution or copying of this information is strictly prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility for the  accuracy or completeness of this message as it has been transmitted over a public network. Although the Programme has taken reasonable precautions to ensure no viruses are present in emails, it cannot accept responsibility for any loss or damage arising from the use of the email or attachments. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of KEMRI-Wellcome Trust Programme.
______________________________________________________________________

        [[alternative HTML version deleted]]

______________________________________________
[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>> mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________________________________

This e-mail contains information which is confidential. It is intended only for the use of the named recipient. If you have received this e-mail in error, please let us know by replying to the sender, and immediately delete it from your system.  Please note, that in these circumstances, the use, disclosure, distribution or copying of this information is strictly prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility for the  accuracy or completeness of this message as it has been transmitted over a public network. Although the Programme has taken reasonable precautions to ensure no viruses are present in emails, it cannot accept responsibility for any loss or damage arising from the use of the email or attachments. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of KEMRI-Wellcome Trust Programme.
______________________________________________________________________

        [[alternative HTML version deleted]]

______________________________________________
[hidden email]<mailto:[hidden email]> mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


______________________________________________________________________

This e-mail contains information which is confidential. It is intended only for the use of the named recipient. If you have received this e-mail in error, please let us know by replying to the sender, and immediately delete it from your system.  Please note, that in these circumstances, the use, disclosure, distribution or copying of this information is strictly prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility for the  accuracy or completeness of this message as it has been transmitted over a public network. Although the Programme has taken reasonable precautions to ensure no viruses are present in emails, it cannot accept responsibility for any loss or damage arising from the use of the email or attachments. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of KEMRI-Wellcome Trust Programme.
______________________________________________________________________

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.