Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Muhuri, Pradip (SAMHSA/CBHSQ)
Hello,



The example data frame in the reproducible code below has 5 columns (1 column for id and 4 columns for dates), and there are 7 observations.  I would like to insert the most recent date from those 4 date columns into a new column (oiddate) using the mutate() function in the dplyr package.   I am getting correct results (NA in the new column) if a given row has all NA's in the four columns.  However, the issue is that the date value inserted into the new column (oidflag) is incorrect for 5 of the remaining 6 rows (with a non-NA value in at least 1 of the four columns).



I would appreciate receiving your help toward resolving the issue.  Please see the R console and the R script (reproducible example)below.



Thanks in advance.



Pradip





######  from the console ########

print (data2)

  id    mrjdate    cocdate    inhdate    haldate    oidflag

1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2011-11-04

2  2       <NA>       <NA>       <NA>       <NA>       <NA>

3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-11-04

4  4 2007-10-10       <NA>       <NA>       <NA> 2011-11-04

5  5 2006-09-01 2005-08-10       <NA>       <NA> 2011-11-04

6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-11-04

7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04





##################  Reproducible code and data #####################################



library(dplyr)

library(lubridate)

library(zoo)

# data object - description of the



temp <- "id  mrjdate cocdate inhdate haldate

1     2004-11-04 2008-07-18 2005-07-07 2007-11-07

2             NA         NA         NA         NA

3     2009-10-24         NA 2011-10-13         NA

4     2007-10-10         NA         NA         NA

5     2006-09-01 2005-08-10         NA         NA

6     2007-09-04 2011-10-05         NA         NA

7     2005-10-25         NA         NA 2011-11-04"



# read the data object



data1 <- read.table(textConnection(temp),

                    colClasses=c("character", "Date", "Date", "Date", "Date"),

                    header=TRUE, as.is=TRUE

                    )

# create a new column



data2 <- mutate(data1,

                oidflag= ifelse(is.na(mrjdate) & is.na(cocdate) & is.na(inhdate)  & is.na(haldate), NA,

                                  max(mrjdate, cocdate, inhdate, haldate,na.rm=TRUE )

                                )

                )



# convert to date

data2$oidflag = as.Date(data2$oidflag, origin="1970-01-01")



# print records



print (data2)





Pradip K. Muhuri, PhD

SAMHSA/CBHSQ

1 Choke Cherry Road, Room 2-1071

Rockville, MD 20857

Tel: 240-276-1070

Fax: 240-276-1260





        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

R. Mark Sharp
Pradip,

mutate() works on the entire column as a vector so that you find the maximum of the entire data set.

I am almost certain there is some nice way to handle this, but the sapply() function is a standard approach.

max() does not want a dataframe thus the use of unlist().

Using your definition of data1:

data3 <- data1
data3$oidflag <- as.Date(sapply(seq_along(data3$id), function(row) {
  if (all(is.na(unlist(data1[row, -1])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

data3
  id    mrjdate    cocdate    inhdate    haldate    oidflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04



R. Mark Sharp, Ph.D.
Director of Primate Records Database
Southwest National Primate Research Center
Texas Biomedical Research Institute
P.O. Box 760549
San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: [hidden email]





NOTICE:  This E-Mail (including attachments) is confidential and may be legally privileged.  It is covered by the Electronic Communications Privacy Act, 18 U.S.C.2510-2521.  If you are not the intended recipient, you are hereby notified that any retention, dissemination, distribution or copying of this communication is strictly prohibited.  Please reply to the sender that you have received this message in error, then delete it.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Daniel Nordlund-4
In reply to this post by Muhuri, Pradip (SAMHSA/CBHSQ)
On 11/8/2014 8:40 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hello,
>
>
>
> The example data frame in the reproducible code below has 5 columns (1 column for id and 4 columns for dates), and there are 7 observations.  I would like to insert the most recent date from those 4 date columns into a new column (oiddate) using the mutate() function in the dplyr package.   I am getting correct results (NA in the new column) if a given row has all NA's in the four columns.  However, the issue is that the date value inserted into the new column (oidflag) is incorrect for 5 of the remaining 6 rows (with a non-NA value in at least 1 of the four columns).
>
>
>
> I would appreciate receiving your help toward resolving the issue.  Please see the R console and the R script (reproducible example)below.
>
>
>
> Thanks in advance.
>
>
>
> Pradip
>
>
>
>
>
> ######  from the console ########
>
> print (data2)
>
>    id    mrjdate    cocdate    inhdate    haldate    oidflag
>
> 1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2011-11-04
>
> 2  2       <NA>       <NA>       <NA>       <NA>       <NA>
>
> 3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-11-04
>
> 4  4 2007-10-10       <NA>       <NA>       <NA> 2011-11-04
>
> 5  5 2006-09-01 2005-08-10       <NA>       <NA> 2011-11-04
>
> 6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-11-04
>
> 7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>
>
>
>
>
> ##################  Reproducible code and data #####################################
>
>
>
> library(dplyr)
>
> library(lubridate)
>
> library(zoo)
>
> # data object - description of the
>
>
>
> temp <- "id  mrjdate cocdate inhdate haldate
>
> 1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
>
> 2             NA         NA         NA         NA
>
> 3     2009-10-24         NA 2011-10-13         NA
>
> 4     2007-10-10         NA         NA         NA
>
> 5     2006-09-01 2005-08-10         NA         NA
>
> 6     2007-09-04 2011-10-05         NA         NA
>
> 7     2005-10-25         NA         NA 2011-11-04"
>
>
>
> # read the data object
>
>
>
> data1 <- read.table(textConnection(temp),
>
>                      colClasses=c("character", "Date", "Date", "Date", "Date"),
>
>                      header=TRUE, as.is=TRUE
>
>                      )
>
> # create a new column
>
>
>
> data2 <- mutate(data1,
>
>                  oidflag= ifelse(is.na(mrjdate) & is.na(cocdate) & is.na(inhdate)  & is.na(haldate), NA,
>
>                                    max(mrjdate, cocdate, inhdate, haldate,na.rm=TRUE )
>
>                                  )
>
>                  )
>
>
>
> # convert to date
>
> data2$oidflag = as.Date(data2$oidflag, origin="1970-01-01")
>
>
>
> # print records
>
>
>
> print (data2)
>
>
>
>
>
> Pradip K. Muhuri, PhD
>
> SAMHSA/CBHSQ
>
> 1 Choke Cherry Road, Room 2-1071
>
> Rockville, MD 20857
>
> Tel: 240-276-1070
>
> Fax: 240-276-1260
>
>
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

I am not familiar with the mutate() function from dplyr, but you can get
your wanted results as follows:

data2 <- within(data1, oidflag <- apply(data1[,-1], 1, max, na.rm=TRUE))


Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Muhuri, Pradip (SAMHSA/CBHSQ)
Hi Dan,

Thank you so much for sending me your code that provides me desired results. But, I don't understand  why I am getting the follow warning message, In FUN(newX[, i], ...) : no non-missing arguments, returning NA. Any thoughts?

Regards,

Pradip



data2x <- within(data1, oidflag <- apply(data1[,-1], 1, max, na.rm=TRUE))

Warning message:
In FUN(newX[, i], ...) : no non-missing arguments, returning NA
> data2x
  id    mrjdate    cocdate    inhdate    haldate    oidflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04


Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Daniel Nordlund
Sent: Sunday, November 09, 2014 5:33 AM
To: [hidden email]
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

On 11/8/2014 8:40 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hello,
>
>
>
> The example data frame in the reproducible code below has 5 columns (1 column for id and 4 columns for dates), and there are 7 observations.  I would like to insert the most recent date from those 4 date columns into a new column (oiddate) using the mutate() function in the dplyr package.   I am getting correct results (NA in the new column) if a given row has all NA's in the four columns.  However, the issue is that the date value inserted into the new column (oidflag) is incorrect for 5 of the remaining 6 rows (with a non-NA value in at least 1 of the four columns).
>
>
>
> I would appreciate receiving your help toward resolving the issue.  Please see the R console and the R script (reproducible example)below.
>
>
>
> Thanks in advance.
>
>
>
> Pradip
>
>
>
>
>
> ######  from the console ########
>
> print (data2)
>
>    id    mrjdate    cocdate    inhdate    haldate    oidflag
>
> 1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2011-11-04
>
> 2  2       <NA>       <NA>       <NA>       <NA>       <NA>
>
> 3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-11-04
>
> 4  4 2007-10-10       <NA>       <NA>       <NA> 2011-11-04
>
> 5  5 2006-09-01 2005-08-10       <NA>       <NA> 2011-11-04
>
> 6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-11-04
>
> 7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>
>
>
>
>
> ##################  Reproducible code and data
> #####################################
>
>
>
> library(dplyr)
>
> library(lubridate)
>
> library(zoo)
>
> # data object - description of the
>
>
>
> temp <- "id  mrjdate cocdate inhdate haldate
>
> 1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
>
> 2             NA         NA         NA         NA
>
> 3     2009-10-24         NA 2011-10-13         NA
>
> 4     2007-10-10         NA         NA         NA
>
> 5     2006-09-01 2005-08-10         NA         NA
>
> 6     2007-09-04 2011-10-05         NA         NA
>
> 7     2005-10-25         NA         NA 2011-11-04"
>
>
>
> # read the data object
>
>
>
> data1 <- read.table(textConnection(temp),
>
>                      colClasses=c("character", "Date", "Date", "Date",
> "Date"),
>
>                      header=TRUE, as.is=TRUE
>
>                      )
>
> # create a new column
>
>
>
> data2 <- mutate(data1,
>
>                  oidflag= ifelse(is.na(mrjdate) & is.na(cocdate) &
> is.na(inhdate)  & is.na(haldate), NA,
>
>                                    max(mrjdate, cocdate, inhdate,
> haldate,na.rm=TRUE )
>
>                                  )
>
>                  )
>
>
>
> # convert to date
>
> data2$oidflag = as.Date(data2$oidflag, origin="1970-01-01")
>
>
>
> # print records
>
>
>
> print (data2)
>
>
>
>
>
> Pradip K. Muhuri, PhD
>
> SAMHSA/CBHSQ
>
> 1 Choke Cherry Road, Room 2-1071
>
> Rockville, MD 20857
>
> Tel: 240-276-1070
>
> Fax: 240-276-1260
>
>
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

I am not familiar with the mutate() function from dplyr, but you can get your wanted results as follows:

data2 <- within(data1, oidflag <- apply(data1[,-1], 1, max, na.rm=TRUE))


Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Muhuri, Pradip (SAMHSA/CBHSQ)
In reply to this post by R. Mark Sharp
Hi Mark,

Your code has also given me the results I expected.  Thank you so much for your help.

Regards,

Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


-----Original Message-----
From: Mark Sharp [mailto:[hidden email]]
Sent: Sunday, November 09, 2014 3:01 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: [hidden email]
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Pradip,

mutate() works on the entire column as a vector so that you find the maximum of the entire data set.

I am almost certain there is some nice way to handle this, but the sapply() function is a standard approach.

max() does not want a dataframe thus the use of unlist().

Using your definition of data1:

data3 <- data1
data3$oidflag <- as.Date(sapply(seq_along(data3$id), function(row) {
  if (all(is.na(unlist(data1[row, -1])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

data3
  id    mrjdate    cocdate    inhdate    haldate    oidflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04



R. Mark Sharp, Ph.D.
Director of Primate Records Database
Southwest National Primate Research Center Texas Biomedical Research Institute P.O. Box 760549 San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: [hidden email]





NOTICE:  This E-Mail (including attachments) is confidential and may be legally privileged.  It is covered by the Electronic Communications Privacy Act, 18 U.S.C.2510-2521.  If you are not the intended recipient, you are hereby notified that any retention, dissemination, distribution or copying of this communication is strictly prohibited.  Please reply to the sender that you have received this message in error, then delete it.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

arun kirshna
In reply to this post by Muhuri, Pradip (SAMHSA/CBHSQ)
You could try

library(dplyr)
data1 %>%

      rowwise() %>%
       mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
                                       na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups: <by row>

id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04

A.K.


On Saturday, November 8, 2014 11:42 PM, "Muhuri, Pradip (SAMHSA/CBHSQ)" <[hidden email]> wrote:
Hello,



The example data frame in the reproducible code below has 5 columns (1 column for id and 4 columns for dates), and there are 7 observations.  I would like to insert the most recent date from those 4 date columns into a new column (oiddate) using the mutate() function in the dplyr package.   I am getting correct results (NA in the new column) if a given row has all NA's in the four columns.  However, the issue is that the date value inserted into the new column (oidflag) is incorrect for 5 of the remaining 6 rows (with a non-NA value in at least 1 of the four columns).



I would appreciate receiving your help toward resolving the issue.  Please see the R console and the R script (reproducible example)below.



Thanks in advance.



Pradip





######  from the console ########

print (data2)

  id    mrjdate    cocdate    inhdate    haldate    oidflag

1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2011-11-04

2  2       <NA>       <NA>       <NA>       <NA>       <NA>

3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-11-04

4  4 2007-10-10       <NA>       <NA>       <NA> 2011-11-04

5  5 2006-09-01 2005-08-10       <NA>       <NA> 2011-11-04

6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-11-04

7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04





##################  Reproducible code and data #####################################



library(dplyr)

library(lubridate)

library(zoo)

# data object - description of the



temp <- "id  mrjdate cocdate inhdate haldate

1     2004-11-04 2008-07-18 2005-07-07 2007-11-07

2             NA         NA         NA         NA

3     2009-10-24         NA 2011-10-13         NA

4     2007-10-10         NA         NA         NA

5     2006-09-01 2005-08-10         NA         NA

6     2007-09-04 2011-10-05         NA         NA

7     2005-10-25         NA         NA 2011-11-04"



# read the data object



data1 <- read.table(textConnection(temp),

                    colClasses=c("character", "Date", "Date", "Date", "Date"),

                    header=TRUE, as.is=TRUE

                    )

# create a new column



data2 <- mutate(data1,

                oidflag= ifelse(is.na(mrjdate) & is.na(cocdate) & is.na(inhdate)  & is.na(haldate), NA,

                                  max(mrjdate, cocdate, inhdate, haldate,na.rm=TRUE )

                                )

                )



# convert to date

data2$oidflag = as.Date(data2$oidflag, origin="1970-01-01")



# print records



print (data2)





Pradip K. Muhuri, PhD

SAMHSA/CBHSQ

1 Choke Cherry Road, Room 2-1071

Rockville, MD 20857

Tel: 240-276-1070

Fax: 240-276-1260





    [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Muhuri, Pradip (SAMHSA/CBHSQ)
Dear Arun,

Thank you so much for sending me the dplyr/mutate() solution to my code.    But,  I am getting the following warning message.  Any suggestions on how to avoid this message?

Pradip

Warning message:
In max(13081, NA_real_, NA_real_, 15282, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf


#################################################################
data1 %>%
+  
+   rowwise() %>%
+   mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
+                              na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups: <by row>

  id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
Warning message:
In max(13081, NA_real_, NA_real_, 15282, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf


Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-----Original Message-----
From: arun [mailto:[hidden email]]
Sent: Sunday, November 09, 2014 7:00 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ); [hidden email]
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

You could try

library(dplyr)
data1 %>%

      rowwise() %>%
       mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
                                       na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups: <by row>

id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04

A.K.


On Saturday, November 8, 2014 11:42 PM, "Muhuri, Pradip (SAMHSA/CBHSQ)" <[hidden email]> wrote:
Hello,



The example data frame in the reproducible code below has 5 columns (1 column for id and 4 columns for dates), and there are 7 observations.  I would like to insert the most recent date from those 4 date columns into a new column (oiddate) using the mutate() function in the dplyr package.   I am getting correct results (NA in the new column) if a given row has all NA's in the four columns.  However, the issue is that the date value inserted into the new column (oidflag) is incorrect for 5 of the remaining 6 rows (with a non-NA value in at least 1 of the four columns).



I would appreciate receiving your help toward resolving the issue.  Please see the R console and the R script (reproducible example)below.



Thanks in advance.



Pradip





######  from the console ########

print (data2)

  id    mrjdate    cocdate    inhdate    haldate    oidflag

1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2011-11-04

2  2       <NA>       <NA>       <NA>       <NA>       <NA>

3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-11-04

4  4 2007-10-10       <NA>       <NA>       <NA> 2011-11-04

5  5 2006-09-01 2005-08-10       <NA>       <NA> 2011-11-04

6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-11-04

7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04





##################  Reproducible code and data #####################################



library(dplyr)

library(lubridate)

library(zoo)

# data object - description of the



temp <- "id  mrjdate cocdate inhdate haldate

1     2004-11-04 2008-07-18 2005-07-07 2007-11-07

2             NA         NA         NA         NA

3     2009-10-24         NA 2011-10-13         NA

4     2007-10-10         NA         NA         NA

5     2006-09-01 2005-08-10         NA         NA

6     2007-09-04 2011-10-05         NA         NA

7     2005-10-25         NA         NA 2011-11-04"



# read the data object



data1 <- read.table(textConnection(temp),

                    colClasses=c("character", "Date", "Date", "Date", "Date"),

                    header=TRUE, as.is=TRUE

                    )

# create a new column



data2 <- mutate(data1,

                oidflag= ifelse(is.na(mrjdate) & is.na(cocdate) & is.na(inhdate)  & is.na(haldate), NA,

                                  max(mrjdate, cocdate, inhdate, haldate,na.rm=TRUE )

                                )

                )



# convert to date

data2$oidflag = as.Date(data2$oidflag, origin="1970-01-01")



# print records



print (data2)





Pradip K. Muhuri, PhD

SAMHSA/CBHSQ

1 Choke Cherry Road, Room 2-1071

Rockville, MD 20857

Tel: 240-276-1070

Fax: 240-276-1260





    [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

arun kirshna


Dear Pradip,

>From the documentation of ?max:


   The minimum and maximum of a numeric empty set are ‘+Inf’ and
        ‘-Inf’

One of the rows in your dataset is all `NAs.`  I am not sure you want to keep that row with all NAs.  You could remove it and run the code or keep it and run with that warning.

data1 <- data1[rowSums(is.na(data1[,-1]))!=4,]

data1 %>%

      rowwise()%>%
      mutate(oldflag= as.Date(max(mrjdate, cocdate, inhdate, haldate, na.rm=TRUE), origin='1970-01-01')


A.K.
On Sunday, November 9, 2014 9:16 AM, "Muhuri, Pradip (SAMHSA/CBHSQ)" <[hidden email]> wrote:



Dear Arun,

Thank you so much for sending me the dplyr/mutate() solution to my code.    But,  I am getting the following warning message.  Any suggestions on how to avoid this message?

Pradip

Warning message:
In max(13081, NA_real_, NA_real_, 15282, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf


#################################################################
data1 %>%
+  
+   rowwise() %>%
+   mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
+                              na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups: <by row>

  id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
Warning message:
In max(13081, NA_real_, NA_real_, 15282, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf


Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


-----Original Message-----

Sent: Sunday, November 09, 2014 7:00 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ); [hidden email]
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

You could try

library(dplyr)
data1 %>%

      rowwise() %>%
       mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
                                       na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups: <by row>

id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04

A.K.


On Saturday, November 8, 2014 11:42 PM, "Muhuri, Pradip (SAMHSA/CBHSQ)" <[hidden email]> wrote:
Hello,



The example data frame in the reproducible code below has 5 columns (1 column for id and 4 columns for dates), and there are 7 observations.  I would like to insert the most recent date from those 4 date columns into a new column (oiddate) using the mutate() function in the dplyr package.   I am getting correct results (NA in the new column) if a given row has all NA's in the four columns.  However, the issue is that the date value inserted into the new column (oidflag) is incorrect for 5 of the remaining 6 rows (with a non-NA value in at least 1 of the four columns).



I would appreciate receiving your help toward resolving the issue.  Please see the R console and the R script (reproducible example)below.



Thanks in advance.



Pradip





######  from the console ########

print (data2)

  id    mrjdate    cocdate    inhdate    haldate    oidflag

1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2011-11-04

2  2       <NA>       <NA>       <NA>       <NA>       <NA>

3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-11-04

4  4 2007-10-10       <NA>       <NA>       <NA> 2011-11-04

5  5 2006-09-01 2005-08-10       <NA>       <NA> 2011-11-04

6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-11-04

7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04





##################  Reproducible code and data #####################################



library(dplyr)

library(lubridate)

library(zoo)

# data object - description of the



temp <- "id  mrjdate cocdate inhdate haldate

1     2004-11-04 2008-07-18 2005-07-07 2007-11-07

2             NA         NA         NA         NA

3     2009-10-24         NA 2011-10-13         NA

4     2007-10-10         NA         NA         NA

5     2006-09-01 2005-08-10         NA         NA

6     2007-09-04 2011-10-05         NA         NA

7     2005-10-25         NA         NA 2011-11-04"



# read the data object



data1 <- read.table(textConnection(temp),

                    colClasses=c("character", "Date", "Date", "Date", "Date"),

                    header=TRUE, as.is=TRUE

                    )

# create a new column



data2 <- mutate(data1,

                oidflag= ifelse(is.na(mrjdate) & is.na(cocdate) & is.na(inhdate)  & is.na(haldate), NA,

                                  max(mrjdate, cocdate, inhdate, haldate,na.rm=TRUE )

                                )

                )



# convert to date

data2$oidflag = as.Date(data2$oidflag, origin="1970-01-01")



# print records



print (data2)





Pradip K. Muhuri, PhD

SAMHSA/CBHSQ

1 Choke Cherry Road, Room 2-1071

Rockville, MD 20857

Tel: 240-276-1070

Fax: 240-276-1260





    [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Daniel Nordlund-4
In reply to this post by Muhuri, Pradip (SAMHSA/CBHSQ)
On 11/9/2014 3:05 AM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hi Dan,
>
> Thank you so much for sending me your code that provides me desired results. But, I don't understand  why I am getting the follow warning message, In FUN(newX[, i], ...) : no non-missing arguments, returning NA. Any thoughts?
>
> Regards,
>
> Pradip
>
>
>
> data2x <- within(data1, oidflag <- apply(data1[,-1], 1, max, na.rm=TRUE))
>
> Warning message:
> In FUN(newX[, i], ...) : no non-missing arguments, returning NA
>> data2x
>    id    mrjdate    cocdate    inhdate    haldate    oidflag
> 1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
> 2  2       <NA>       <NA>       <NA>       <NA>       <NA>
> 3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
> 4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
> 5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
> 6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
> 7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>
>
> Pradip K. Muhuri, PhD
> SAMHSA/CBHSQ
> 1 Choke Cherry Road, Room 2-1071
> Rockville, MD 20857
> Tel: 240-276-1070
> Fax: 240-276-1260
>
> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of Daniel Nordlund
> Sent: Sunday, November 09, 2014 5:33 AM
> To: [hidden email]
> Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
>
> On 11/8/2014 8:40 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
>> Hello,
>>
>>
>>
>> The example data frame in the reproducible code below has 5 columns (1 column for id and 4 columns for dates), and there are 7 observations.  I would like to insert the most recent date from those 4 date columns into a new column (oiddate) using the mutate() function in the dplyr package.   I am getting correct results (NA in the new column) if a given row has all NA's in the four columns.  However, the issue is that the date value inserted into the new column (oidflag) is incorrect for 5 of the remaining 6 rows (with a non-NA value in at least 1 of the four columns).
>>
>>
>>
>> I would appreciate receiving your help toward resolving the issue.  Please see the R console and the R script (reproducible example)below.
>>
>>
>>
>> Thanks in advance.
>>
>>
>>
>> Pradip
>>
>>
>>
>>
>>
>> ######  from the console ########
>>
>> print (data2)
>>
>>     id    mrjdate    cocdate    inhdate    haldate    oidflag
>>
>> 1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2011-11-04
>>
>> 2  2       <NA>       <NA>       <NA>       <NA>       <NA>
>>
>> 3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-11-04
>>
>> 4  4 2007-10-10       <NA>       <NA>       <NA> 2011-11-04
>>
>> 5  5 2006-09-01 2005-08-10       <NA>       <NA> 2011-11-04
>>
>> 6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-11-04
>>
>> 7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>>
>>
>>
>>
>>
>> ##################  Reproducible code and data
>> #####################################
>>
>>
>>
>> library(dplyr)
>>
>> library(lubridate)
>>
>> library(zoo)
>>
>> # data object - description of the
>>
>>
>>
>> temp <- "id  mrjdate cocdate inhdate haldate
>>
>> 1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
>>
>> 2             NA         NA         NA         NA
>>
>> 3     2009-10-24         NA 2011-10-13         NA
>>
>> 4     2007-10-10         NA         NA         NA
>>
>> 5     2006-09-01 2005-08-10         NA         NA
>>
>> 6     2007-09-04 2011-10-05         NA         NA
>>
>> 7     2005-10-25         NA         NA 2011-11-04"
>>
>>
>>
>> # read the data object
>>
>>
>>
>> data1 <- read.table(textConnection(temp),
>>
>>                       colClasses=c("character", "Date", "Date", "Date",
>> "Date"),
>>
>>                       header=TRUE, as.is=TRUE
>>
>>                       )
>>
>> # create a new column
>>
>>
>>
>> data2 <- mutate(data1,
>>
>>                   oidflag= ifelse(is.na(mrjdate) & is.na(cocdate) &
>> is.na(inhdate)  & is.na(haldate), NA,
>>
>>                                     max(mrjdate, cocdate, inhdate,
>> haldate,na.rm=TRUE )
>>
>>                                   )
>>
>>                   )
>>
>>
>>
>> # convert to date
>>
>> data2$oidflag = as.Date(data2$oidflag, origin="1970-01-01")
>>
>>
>>
>> # print records
>>
>>
>>
>> print (data2)
>>
>>
>>
>>
>>
>> Pradip K. Muhuri, PhD
>>
>> SAMHSA/CBHSQ
>>
>> 1 Choke Cherry Road, Room 2-1071
>>
>> Rockville, MD 20857
>>
>> Tel: 240-276-1070
>>
>> Fax: 240-276-1260
>>
>>
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> I am not familiar with the mutate() function from dplyr, but you can get your wanted results as follows:
>
> data2 <- within(data1, oidflag <- apply(data1[,-1], 1, max, na.rm=TRUE))
>
>
> Hope this is helpful,
>
> Dan
>
> Daniel Nordlund
> Bothell, WA USA
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

It means what it says.  In this case, for id=2 there are no non-missing
values.  Since, na.rm was set to true, it is just warning you that since
there was nothing left to get the max of, it is returning NA.

Dan

--
Daniel Nordlund
Bothell, WA USA

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Muhuri, Pradip (SAMHSA/CBHSQ)
In reply to this post by arun kirshna
Hi Arun and Dennis,

This is just an FYI.

You're right - In one row, there are all NA's in  the four  "date" columns.  I have tested below the "TRUEness" of the condition Arun has set.

is.logical(data1[rowSums(is.na(data1[,-1]))!=4,])
[1] FALSE

All these 3 approaches below provide the exact same results.

# Approach 1 (suggested by Arun): The code gives the expected results, but with a warning message.
data1 %>%

   rowwise() %>%
   mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
                             na.rm=TRUE), origin='1970-01-01'))

# Approach 2: This code (suggested by Dan) does not provide now a warning message although it provided such message earlier.
data2x <- within(data1, oidflag <- apply(data1[,-1], 1, max, na.rm=TRUE))


# Approach 2: This code (suggested by Mark) does not provide a warning message
data2 <- data1
data2$oidflag <- as.Date(sapply(seq_along(data2$id), function(row) {
  if (all(is.na(unlist(data1[row, -1])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")


##########################  ends here ################

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-----Original Message-----
From: arun [mailto:[hidden email]]
Sent: Sunday, November 09, 2014 10:18 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ); [hidden email]
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)



Dear Pradip,

From the documentation of ?max:


   The minimum and maximum of a numeric empty set are ‘+Inf’ and
        ‘-Inf’

One of the rows in your dataset is all `NAs.`  I am not sure you want to keep that row with all NAs.  You could remove it and run the code or keep it and run with that warning.

data1 <- data1[rowSums(is.na(data1[,-1]))!=4,]

data1 %>%

      rowwise()%>%
      mutate(oldflag= as.Date(max(mrjdate, cocdate, inhdate, haldate, na.rm=TRUE), origin='1970-01-01')


A.K.
On Sunday, November 9, 2014 9:16 AM, "Muhuri, Pradip (SAMHSA/CBHSQ)" <[hidden email]> wrote:



Dear Arun,

Thank you so much for sending me the dplyr/mutate() solution to my code.    But,  I am getting the following warning message.  Any suggestions on how to avoid this message?

Pradip

Warning message:
In max(13081, NA_real_, NA_real_, 15282, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf


#################################################################
data1 %>%
+  
+   rowwise() %>%
+   mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
+                              na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups: <by row>

  id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
Warning message:
In max(13081, NA_real_, NA_real_, 15282, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf


Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


-----Original Message-----
From: arun [mailto:[hidden email]]
Sent: Sunday, November 09, 2014 7:00 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ); [hidden email]
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

You could try

library(dplyr)
data1 %>%

      rowwise() %>%
       mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
                                       na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups: <by row>

id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04

A.K.


On Saturday, November 8, 2014 11:42 PM, "Muhuri, Pradip (SAMHSA/CBHSQ)" <[hidden email]> wrote:
Hello,



The example data frame in the reproducible code below has 5 columns (1 column for id and 4 columns for dates), and there are 7 observations.  I would like to insert the most recent date from those 4 date columns into a new column (oiddate) using the mutate() function in the dplyr package.   I am getting correct results (NA in the new column) if a given row has all NA's in the four columns.  However, the issue is that the date value inserted into the new column (oidflag) is incorrect for 5 of the remaining 6 rows (with a non-NA value in at least 1 of the four columns).



I would appreciate receiving your help toward resolving the issue.  Please see the R console and the R script (reproducible example)below.



Thanks in advance.



Pradip





######  from the console ########

print (data2)

  id    mrjdate    cocdate    inhdate    haldate    oidflag

1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2011-11-04

2  2       <NA>       <NA>       <NA>       <NA>       <NA>

3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-11-04

4  4 2007-10-10       <NA>       <NA>       <NA> 2011-11-04

5  5 2006-09-01 2005-08-10       <NA>       <NA> 2011-11-04

6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-11-04

7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04





##################  Reproducible code and data #####################################



library(dplyr)

library(lubridate)

library(zoo)

# data object - description of the



temp <- "id  mrjdate cocdate inhdate haldate

1     2004-11-04 2008-07-18 2005-07-07 2007-11-07

2             NA         NA         NA         NA

3     2009-10-24         NA 2011-10-13         NA

4     2007-10-10         NA         NA         NA

5     2006-09-01 2005-08-10         NA         NA

6     2007-09-04 2011-10-05         NA         NA

7     2005-10-25         NA         NA 2011-11-04"



# read the data object



data1 <- read.table(textConnection(temp),

                    colClasses=c("character", "Date", "Date", "Date", "Date"),

                    header=TRUE, as.is=TRUE

                    )

# create a new column



data2 <- mutate(data1,

                oidflag= ifelse(is.na(mrjdate) & is.na(cocdate) & is.na(inhdate)  & is.na(haldate), NA,

                                  max(mrjdate, cocdate, inhdate, haldate,na.rm=TRUE )

                                )

                )



# convert to date

data2$oidflag = as.Date(data2$oidflag, origin="1970-01-01")



# print records



print (data2)





Pradip K. Muhuri, PhD

SAMHSA/CBHSQ

1 Choke Cherry Road, Room 2-1071

Rockville, MD 20857

Tel: 240-276-1070

Fax: 240-276-1260





    [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Muhuri, Pradip (SAMHSA/CBHSQ)
In reply to this post by Muhuri, Pradip (SAMHSA/CBHSQ)
Hello,

Two alternative approaches - mutate() vs. sapply() - were used to get the desired results (i.e., creating a new column of the most recent date  from 4 dates ) with help from Arun and Mark on this forum.  I now find that the two data objects (created using two different approaches) are not identical although results are exactly the same.  
 
identical(new1, new2)
[1] FALSE
 
Please see the reproducible example below.

I don't understand why the code returns FALSE here.  Any hints/comments  will be  appreciated.

Thanks,

Pradip

#############################################  reproducible example ########################################
library(dplyr)
# data object - description

temp <- "id  mrjdate cocdate inhdate haldate
1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
2             NA         NA         NA         NA    
3     2009-10-24         NA 2011-10-13         NA
4     2007-10-10         NA         NA         NA
5     2006-09-01 2005-08-10         NA         NA
6     2007-09-04 2011-10-05         NA         NA
7     2005-10-25         NA         NA 2011-11-04"

# read the data object

example.data <- read.table(textConnection(temp),
                    colClasses=c("character", "Date", "Date", "Date", "Date"),  
                    header=TRUE, as.is=TRUE
                    )


# create a new column -dplyr solution (Acknowledgement: Arun)

new1 <- example.data %>%
     rowwise() %>%
      mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
                                                               na.rm=TRUE), origin='1970-01-01'))

# create a new column - Base R solution (Acknowlegement: Mark Sharp)

new2 <- example.data
new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) {
  if (all(is.na(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 'haldate')])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 'haldate')]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

identical(new1, new2)

# print records

print (new1); print(new2)

Pradip K. Muhuri
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Muhuri, Pradip (SAMHSA/CBHSQ)
Sent: Sunday, November 09, 2014 6:11 AM
To: 'Mark Sharp'
Cc: [hidden email]
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Hi Mark,

Your code has also given me the results I expected.  Thank you so much for your help.

Regards,

Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


-----Original Message-----
From: Mark Sharp [mailto:[hidden email]]
Sent: Sunday, November 09, 2014 3:01 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: [hidden email]
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Pradip,

mutate() works on the entire column as a vector so that you find the maximum of the entire data set.

I am almost certain there is some nice way to handle this, but the sapply() function is a standard approach.

max() does not want a dataframe thus the use of unlist().

Using your definition of data1:

data3 <- data1
data3$oidflag <- as.Date(sapply(seq_along(data3$id), function(row) {
  if (all(is.na(unlist(data1[row, -1])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

data3
  id    mrjdate    cocdate    inhdate    haldate    oidflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04



R. Mark Sharp, Ph.D.
Director of Primate Records Database
Southwest National Primate Research Center Texas Biomedical Research Institute P.O. Box 760549 San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: [hidden email]





NOTICE:  This E-Mail (including attachments) is confidential and may be legally privileged.  It is covered by the Electronic Communications Privacy Act, 18 U.S.C.2510-2521.  If you are not the intended recipient, you are hereby notified that any retention, dissemination, distribution or copying of this communication is strictly prohibited.  Please reply to the sender that you have received this message in error, then delete it.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

David Winsemius

On Dec 3, 2014, at 2:10 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hello,
>
> Two alternative approaches - mutate() vs. sapply() - were used to get the desired results (i.e., creating a new column of the most recent date  from 4 dates ) with help from Arun and Mark on this forum.  I now find that the two data objects (created using two different approaches) are not identical although results are exactly the same.  
>
> identical(new1, new2)
> [1] FALSE
>

You should have examined the output from dput() on both objects. I think you will find that dplyr is adding new attributes.

Notice the the "mutate()-ed" object now has this class:

class = c("rowwise_df", "tbl_df", "tbl", "data.frame")

Moral: Never rely on the the print representation.

--
David.


> Please see the reproducible example below.
>
> I don't understand why the code returns FALSE here.  Any hints/comments  will be  appreciated.
>
> Thanks,
>
> Pradip
>
> #############################################  reproducible example ########################################
> library(dplyr)
> # data object - description
>
> temp <- "id  mrjdate cocdate inhdate haldate
> 1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
> 2             NA         NA         NA         NA    
> 3     2009-10-24         NA 2011-10-13         NA
> 4     2007-10-10         NA         NA         NA
> 5     2006-09-01 2005-08-10         NA         NA
> 6     2007-09-04 2011-10-05         NA         NA
> 7     2005-10-25         NA         NA 2011-11-04"
>
> # read the data object
>
> example.data <- read.table(textConnection(temp),
>                    colClasses=c("character", "Date", "Date", "Date", "Date"),  
>                    header=TRUE, as.is=TRUE
>                    )
>
>
> # create a new column -dplyr solution (Acknowledgement: Arun)
>
> new1 <- example.data %>%
>     rowwise() %>%
>      mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
>                                                               na.rm=TRUE), origin='1970-01-01'))
>
> # create a new column - Base R solution (Acknowlegement: Mark Sharp)
>
> new2 <- example.data
> new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) {
>  if (all(is.na(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 'haldate')])))) {
>    max_d <- NA
>  } else {
>    max_d <- max(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 'haldate')]), na.rm = TRUE)
>  }
>  max_d}),
>  origin = "1970-01-01")
>
> identical(new1, new2)
>
> # print records
>
> print (new1); print(new2)
>
> Pradip K. Muhuri
> SAMHSA/CBHSQ
> 1 Choke Cherry Road, Room 2-1071
> Rockville, MD 20857
> Tel: 240-276-1070
> Fax: 240-276-1260
>
> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of Muhuri, Pradip (SAMHSA/CBHSQ)
> Sent: Sunday, November 09, 2014 6:11 AM
> To: 'Mark Sharp'
> Cc: [hidden email]
> Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
>
> Hi Mark,
>
> Your code has also given me the results I expected.  Thank you so much for your help.
>
> Regards,
>
> Pradip
>
> Pradip K. Muhuri, PhD
> SAMHSA/CBHSQ
> 1 Choke Cherry Road, Room 2-1071
> Rockville, MD 20857
> Tel: 240-276-1070
> Fax: 240-276-1260
>
>
> -----Original Message-----
> From: Mark Sharp [mailto:[hidden email]]
> Sent: Sunday, November 09, 2014 3:01 AM
> To: Muhuri, Pradip (SAMHSA/CBHSQ)
> Cc: [hidden email]
> Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
>
> Pradip,
>
> mutate() works on the entire column as a vector so that you find the maximum of the entire data set.
>
> I am almost certain there is some nice way to handle this, but the sapply() function is a standard approach.
>
> max() does not want a dataframe thus the use of unlist().
>
> Using your definition of data1:
>
> data3 <- data1
> data3$oidflag <- as.Date(sapply(seq_along(data3$id), function(row) {
>  if (all(is.na(unlist(data1[row, -1])))) {
>    max_d <- NA
>  } else {
>    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
>  }
>  max_d}),
>  origin = "1970-01-01")
>
> data3
>  id    mrjdate    cocdate    inhdate    haldate    oidflag
> 1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
> 2  2       <NA>       <NA>       <NA>       <NA>       <NA>
> 3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
> 4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
> 5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
> 6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
> 7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>
>
>
> R. Mark Sharp, Ph.D.
> Director of Primate Records Database
> Southwest National Primate Research Center Texas Biomedical Research Institute P.O. Box 760549 San Antonio, TX 78245-0549
> Telephone: (210)258-9476
> e-mail: [hidden email]
>
>
>
>
>
> NOTICE:  This E-Mail (including attachments) is confidential and may be legally privileged.  It is covered by the Electronic Communications Privacy Act, 18 U.S.C.2510-2521.  If you are not the intended recipient, you are hereby notified that any retention, dissemination, distribution or copying of this communication is strictly prohibited.  Please reply to the sender that you have received this message in error, then delete it.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Chel Hee Lee
In reply to this post by Muhuri, Pradip (SAMHSA/CBHSQ)
The output in the object 'new1' are apparently same the output in the
object 'new2'.  Are you trying to compare the entries of two outputs
'new1' and 'new2'?  If so, the function 'all()' would be useful:

 > all(new1 == new2, na.rm=TRUE)
[1] TRUE

If you are interested in the comparison of two objects in terms of
class, then the function 'identical()' is useful:

 > attributes(new1)
$names
[1] "id"      "mrjdate" "cocdate" "inhdate" "haldate" "oldflag"

$class
[1] "rowwise_df" "tbl_df"     "tbl"        "data.frame"

$row.names
[1] 1 2 3 4 5 6 7

 > attributes(new2)
$names
[1] "id"      "mrjdate" "cocdate" "inhdate" "haldate" "oiddate"

$row.names
[1] 1 2 3 4 5 6 7

$class
[1] "data.frame"

I hope this helps.

Chel Hee Lee

On 12/03/2014 04:10 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hello,
>
> Two alternative approaches - mutate() vs. sapply() - were used to get the desired results (i.e., creating a new column of the most recent date  from 4 dates ) with help from Arun and Mark on this forum.  I now find that the two data objects (created using two different approaches) are not identical although results are exactly the same.
>
> identical(new1, new2)
> [1] FALSE
>
> Please see the reproducible example below.
>
> I don't understand why the code returns FALSE here.  Any hints/comments  will be  appreciated.
>
> Thanks,
>
> Pradip
>
> #############################################  reproducible example ########################################
> library(dplyr)
> # data object - description
>
> temp <- "id  mrjdate cocdate inhdate haldate
> 1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
> 2             NA         NA         NA         NA
> 3     2009-10-24         NA 2011-10-13         NA
> 4     2007-10-10         NA         NA         NA
> 5     2006-09-01 2005-08-10         NA         NA
> 6     2007-09-04 2011-10-05         NA         NA
> 7     2005-10-25         NA         NA 2011-11-04"
>
> # read the data object
>
> example.data <- read.table(textConnection(temp),
>                      colClasses=c("character", "Date", "Date", "Date", "Date"),
>                      header=TRUE, as.is=TRUE
>                      )
>
>
> # create a new column -dplyr solution (Acknowledgement: Arun)
>
> new1 <- example.data %>%
>       rowwise() %>%
>        mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
>                                                                 na.rm=TRUE), origin='1970-01-01'))
>
> # create a new column - Base R solution (Acknowlegement: Mark Sharp)
>
> new2 <- example.data
> new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) {
>    if (all(is.na(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 'haldate')])))) {
>      max_d <- NA
>    } else {
>      max_d <- max(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 'haldate')]), na.rm = TRUE)
>    }
>    max_d}),
>    origin = "1970-01-01")
>
> identical(new1, new2)
>
> # print records
>
> print (new1); print(new2)
>
> Pradip K. Muhuri
> SAMHSA/CBHSQ
> 1 Choke Cherry Road, Room 2-1071
> Rockville, MD 20857
> Tel: 240-276-1070
> Fax: 240-276-1260
>
> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of Muhuri, Pradip (SAMHSA/CBHSQ)
> Sent: Sunday, November 09, 2014 6:11 AM
> To: 'Mark Sharp'
> Cc: [hidden email]
> Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
>
> Hi Mark,
>
> Your code has also given me the results I expected.  Thank you so much for your help.
>
> Regards,
>
> Pradip
>
> Pradip K. Muhuri, PhD
> SAMHSA/CBHSQ
> 1 Choke Cherry Road, Room 2-1071
> Rockville, MD 20857
> Tel: 240-276-1070
> Fax: 240-276-1260
>
>
> -----Original Message-----
> From: Mark Sharp [mailto:[hidden email]]
> Sent: Sunday, November 09, 2014 3:01 AM
> To: Muhuri, Pradip (SAMHSA/CBHSQ)
> Cc: [hidden email]
> Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
>
> Pradip,
>
> mutate() works on the entire column as a vector so that you find the maximum of the entire data set.
>
> I am almost certain there is some nice way to handle this, but the sapply() function is a standard approach.
>
> max() does not want a dataframe thus the use of unlist().
>
> Using your definition of data1:
>
> data3 <- data1
> data3$oidflag <- as.Date(sapply(seq_along(data3$id), function(row) {
>    if (all(is.na(unlist(data1[row, -1])))) {
>      max_d <- NA
>    } else {
>      max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
>    }
>    max_d}),
>    origin = "1970-01-01")
>
> data3
>    id    mrjdate    cocdate    inhdate    haldate    oidflag
> 1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
> 2  2       <NA>       <NA>       <NA>       <NA>       <NA>
> 3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
> 4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
> 5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
> 6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
> 7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>
>
>
> R. Mark Sharp, Ph.D.
> Director of Primate Records Database
> Southwest National Primate Research Center Texas Biomedical Research Institute P.O. Box 760549 San Antonio, TX 78245-0549
> Telephone: (210)258-9476
> e-mail: [hidden email]
>
>
>
>
>
> NOTICE:  This E-Mail (including attachments) is confidential and may be legally privileged.  It is covered by the Electronic Communications Privacy Act, 18 U.S.C.2510-2521.  If you are not the intended recipient, you are hereby notified that any retention, dissemination, distribution or copying of this communication is strictly prohibited.  Please reply to the sender that you have received this message in error, then delete it.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Chel Hee Lee
Reply | Threaded
Open this post in threaded view
|

Re: Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Muhuri, Pradip (SAMHSA/CBHSQ)
Hello Chel and David,

Thank you very much for providing new insights into this issue.  Here is one more question.  Why  does the mutate () give incorrect results here?

# The following gives INCORRECT results - mutated()ed object
na.date.cases = ifelse(!is.na(oiddate),1,0)

# The following gives CORRECT results
new2$na.date.cases = ifelse(!is.na(new2$oiddate),1,0)

###############################  reproducible example - slightly revised/modified  ###############
library(dplyr)
# data object - description

temp <- "id  mrjdate cocdate inhdate haldate
1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
2             NA         NA         NA         NA    
3     2009-10-24         NA 2011-10-13         NA
4     2007-10-10         NA         NA         NA
5     2006-09-01 2005-08-10         NA         NA
6     2007-09-04 2011-10-05         NA         NA
7     2005-10-25         NA         NA 2011-11-04"

# read the data object

example.data <- read.table(textConnection(temp),
                    colClasses=c("character", "Date", "Date", "Date", "Date"),  
                    header=TRUE, as.is=TRUE
                    )


# create a new column -dplyr solution (Acknowledgement: Arun)

new1 <- example.data %>%
     rowwise() %>%
      mutate(oiddate=as.Date(max(mrjdate,cocdate, inhdate, haldate, na.rm=TRUE), origin='1970-01-01'),
             na.date.cases = ifelse(!is.na(oiddate),1,0)
             )

# create a new column - Base R solution (Acknowlegement: Mark Sharp)

new2 <- example.data
new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) {
  if (all(is.na(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 'haldate')])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 'haldate')]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

new2$na.date.cases = ifelse(!is.na(new2$oiddate),1,0)


identical(new1, new2)

table(new1$oiddate)
table(new2$oiddate)

# print records

print (new1); print(new2)

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-----Original Message-----
From: Chel Hee Lee [mailto:[hidden email]]
Sent: Wednesday, December 03, 2014 8:48 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ); [hidden email]
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

The output in the object 'new1' are apparently same the output in the object 'new2'.  Are you trying to compare the entries of two outputs 'new1' and 'new2'?  If so, the function 'all()' would be useful:

 > all(new1 == new2, na.rm=TRUE)
[1] TRUE

If you are interested in the comparison of two objects in terms of class, then the function 'identical()' is useful:

 > attributes(new1)
$names
[1] "id"      "mrjdate" "cocdate" "inhdate" "haldate" "oldflag"

$class
[1] "rowwise_df" "tbl_df"     "tbl"        "data.frame"

$row.names
[1] 1 2 3 4 5 6 7

 > attributes(new2)
$names
[1] "id"      "mrjdate" "cocdate" "inhdate" "haldate" "oiddate"

$row.names
[1] 1 2 3 4 5 6 7

$class
[1] "data.frame"

I hope this helps.

Chel Hee Lee

On 12/03/2014 04:10 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hello,
>
> Two alternative approaches - mutate() vs. sapply() - were used to get the desired results (i.e., creating a new column of the most recent date  from 4 dates ) with help from Arun and Mark on this forum.  I now find that the two data objects (created using two different approaches) are not identical although results are exactly the same.
>
> identical(new1, new2)
> [1] FALSE
>
> Please see the reproducible example below.
>
> I don't understand why the code returns FALSE here.  Any hints/comments  will be  appreciated.
>
> Thanks,
>
> Pradip
>
> #############################################  reproducible example
> ########################################
> library(dplyr)
> # data object - description
>
> temp <- "id  mrjdate cocdate inhdate haldate
> 1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
> 2             NA         NA         NA         NA
> 3     2009-10-24         NA 2011-10-13         NA
> 4     2007-10-10         NA         NA         NA
> 5     2006-09-01 2005-08-10         NA         NA
> 6     2007-09-04 2011-10-05         NA         NA
> 7     2005-10-25         NA         NA 2011-11-04"
>
> # read the data object
>
> example.data <- read.table(textConnection(temp),
>                      colClasses=c("character", "Date", "Date", "Date", "Date"),
>                      header=TRUE, as.is=TRUE
>                      )
>
>
> # create a new column -dplyr solution (Acknowledgement: Arun)
>
> new1 <- example.data %>%
>       rowwise() %>%
>        mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
>                                                                
> na.rm=TRUE), origin='1970-01-01'))
>
> # create a new column - Base R solution (Acknowlegement: Mark Sharp)
>
> new2 <- example.data
> new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) {
>    if (all(is.na(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 'haldate')])))) {
>      max_d <- NA
>    } else {
>      max_d <- max(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 'haldate')]), na.rm = TRUE)
>    }
>    max_d}),
>    origin = "1970-01-01")
>
> identical(new1, new2)
>
> # print records
>
> print (new1); print(new2)
>
> Pradip K. Muhuri
> SAMHSA/CBHSQ
> 1 Choke Cherry Road, Room 2-1071
> Rockville, MD 20857
> Tel: 240-276-1070
> Fax: 240-276-1260
>
> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Muhuri, Pradip
> (SAMHSA/CBHSQ)
> Sent: Sunday, November 09, 2014 6:11 AM
> To: 'Mark Sharp'
> Cc: [hidden email]
> Subject: Re: [R] Getting the most recent dates in a new column from
> dates in four columns using the dplyr package (mutate verb)
>
> Hi Mark,
>
> Your code has also given me the results I expected.  Thank you so much for your help.
>
> Regards,
>
> Pradip
>
> Pradip K. Muhuri, PhD
> SAMHSA/CBHSQ
> 1 Choke Cherry Road, Room 2-1071
> Rockville, MD 20857
> Tel: 240-276-1070
> Fax: 240-276-1260
>
>
> -----Original Message-----
> From: Mark Sharp [mailto:[hidden email]]
> Sent: Sunday, November 09, 2014 3:01 AM
> To: Muhuri, Pradip (SAMHSA/CBHSQ)
> Cc: [hidden email]
> Subject: Re: [R] Getting the most recent dates in a new column from
> dates in four columns using the dplyr package (mutate verb)
>
> Pradip,
>
> mutate() works on the entire column as a vector so that you find the maximum of the entire data set.
>
> I am almost certain there is some nice way to handle this, but the sapply() function is a standard approach.
>
> max() does not want a dataframe thus the use of unlist().
>
> Using your definition of data1:
>
> data3 <- data1
> data3$oidflag <- as.Date(sapply(seq_along(data3$id), function(row) {
>    if (all(is.na(unlist(data1[row, -1])))) {
>      max_d <- NA
>    } else {
>      max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
>    }
>    max_d}),
>    origin = "1970-01-01")
>
> data3
>    id    mrjdate    cocdate    inhdate    haldate    oidflag
> 1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
> 2  2       <NA>       <NA>       <NA>       <NA>       <NA>
> 3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
> 4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
> 5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
> 6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
> 7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>
>
>
> R. Mark Sharp, Ph.D.
> Director of Primate Records Database
> Southwest National Primate Research Center Texas Biomedical Research
> Institute P.O. Box 760549 San Antonio, TX 78245-0549
> Telephone: (210)258-9476
> e-mail: [hidden email]
>
>
>
>
>
> NOTICE:  This E-Mail (including attachments) is confidential and may be legally privileged.  It is covered by the Electronic Communications Privacy Act, 18 U.S.C.2510-2521.  If you are not the intended recipient, you are hereby notified that any retention, dissemination, distribution or copying of this communication is strictly prohibited.  Please reply to the sender that you have received this message in error, then delete it.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

David Winsemius

On Dec 3, 2014, at 7:43 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hello Chel and David,
>
> Thank you very much for providing new insights into this issue.  Here is one more question.  Why  does the mutate () give incorrect results here?
>
> # The following gives INCORRECT results - mutated()ed object
> na.date.cases = ifelse(!is.na(oiddate),1,0)
>
> # The following gives CORRECT results
> new2$na.date.cases = ifelse(!is.na(new2$oiddate),1,0)
>
> ###############################  reproducible example - slightly revised/modified  ###############
> library(dplyr)
> # data object - description
>
> temp <- "id  mrjdate cocdate inhdate haldate
> 1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
> 2             NA         NA         NA         NA    
> 3     2009-10-24         NA 2011-10-13         NA
> 4     2007-10-10         NA         NA         NA
> 5     2006-09-01 2005-08-10         NA         NA
> 6     2007-09-04 2011-10-05         NA         NA
> 7     2005-10-25         NA         NA 2011-11-04"
>
> # read the data object
>
> example.data <- read.table(textConnection(temp),
>                    colClasses=c("character", "Date", "Date", "Date", "Date"),  
>                    header=TRUE, as.is=TRUE
>                    )
>
>
> # create a new column -dplyr solution (Acknowledgement: Arun)
>
> new1 <- example.data %>%
>     rowwise() %>%
>      mutate(oiddate=as.Date(max(mrjdate,cocdate, inhdate, haldate, na.rm=TRUE), origin='1970-01-01'),
>             na.date.cases = ifelse(!is.na(oiddate),1,0)
>             )
>

It would have been polite to include the warning printed to the console after this line of code. It seems to me that this highlights the fact that you used different logic in the two methods and got, therefore, different answers.

--
David.

> # create a new column - Base R solution (Acknowlegement: Mark Sharp)
>
> new2 <- example.data
> new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) {
>  if (all(is.na(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 'haldate')])))) {
>    max_d <- NA
>  } else {
>    max_d <- max(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 'haldate')]), na.rm = TRUE)
>  }
>  max_d}),
>  origin = "1970-01-01")
>
> new2$na.date.cases = ifelse(!is.na(new2$oiddate),1,0)
>
>
> identical(new1, new2)
>
> table(new1$oiddate)
> table(new2$oiddate)
>
> # print records
>
> print (new1); print(new2)
>
> Pradip K. Muhuri, PhD
> SAMHSA/CBHSQ
> 1 Choke Cherry Road, Room 2-1071
> Rockville, MD 20857
> Tel: 240-276-1070
> Fax: 240-276-1260
>
> -----Original Message-----
> From: Chel Hee Lee [mailto:[hidden email]]
> Sent: Wednesday, December 03, 2014 8:48 PM
> To: Muhuri, Pradip (SAMHSA/CBHSQ); [hidden email]
> Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
>
> The output in the object 'new1' are apparently same the output in the object 'new2'.  Are you trying to compare the entries of two outputs 'new1' and 'new2'?  If so, the function 'all()' would be useful:
>
>> all(new1 == new2, na.rm=TRUE)
> [1] TRUE
>
> If you are interested in the comparison of two objects in terms of class, then the function 'identical()' is useful:
>
>> attributes(new1)
> $names
> [1] "id"      "mrjdate" "cocdate" "inhdate" "haldate" "oldflag"
>
> $class
> [1] "rowwise_df" "tbl_df"     "tbl"        "data.frame"
>
> $row.names
> [1] 1 2 3 4 5 6 7
>
>> attributes(new2)
> $names
> [1] "id"      "mrjdate" "cocdate" "inhdate" "haldate" "oiddate"
>
> $row.names
> [1] 1 2 3 4 5 6 7
>
> $class
> [1] "data.frame"
>
> I hope this helps.
>
> Chel Hee Lee
>
> On 12/03/2014 04:10 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
>> Hello,
>>
>> Two alternative approaches - mutate() vs. sapply() - were used to get the desired results (i.e., creating a new column of the most recent date  from 4 dates ) with help from Arun and Mark on this forum.  I now find that the two data objects (created using two different approaches) are not identical although results are exactly the same.
>>
>> identical(new1, new2)
>> [1] FALSE
>>
>> Please see the reproducible example below.
>>
>> I don't understand why the code returns FALSE here.  Any hints/comments  will be  appreciated.
>>
>> Thanks,
>>
>> Pradip
>>
>> #############################################  reproducible example
>> ########################################
>> library(dplyr)
>> # data object - description
>>
>> temp <- "id  mrjdate cocdate inhdate haldate
>> 1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
>> 2             NA         NA         NA         NA
>> 3     2009-10-24         NA 2011-10-13         NA
>> 4     2007-10-10         NA         NA         NA
>> 5     2006-09-01 2005-08-10         NA         NA
>> 6     2007-09-04 2011-10-05         NA         NA
>> 7     2005-10-25         NA         NA 2011-11-04"
>>
>> # read the data object
>>
>> example.data <- read.table(textConnection(temp),
>>                     colClasses=c("character", "Date", "Date", "Date", "Date"),
>>                     header=TRUE, as.is=TRUE
>>                     )
>>
>>
>> # create a new column -dplyr solution (Acknowledgement: Arun)
>>
>> new1 <- example.data %>%
>>      rowwise() %>%
>>       mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
>>
>> na.rm=TRUE), origin='1970-01-01'))
>>
>> # create a new column - Base R solution (Acknowlegement: Mark Sharp)
>>
>> new2 <- example.data
>> new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) {
>>   if (all(is.na(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 'haldate')])))) {
>>     max_d <- NA
>>   } else {
>>     max_d <- max(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 'haldate')]), na.rm = TRUE)
>>   }
>>   max_d}),
>>   origin = "1970-01-01")
>>
>> identical(new1, new2)
>>
>> # print records
>>
>> print (new1); print(new2)
>>
>> Pradip K. Muhuri
>> SAMHSA/CBHSQ
>> 1 Choke Cherry Road, Room 2-1071
>> Rockville, MD 20857
>> Tel: 240-276-1070
>> Fax: 240-276-1260
>>
>> -----Original Message-----
>> From: [hidden email]
>> [mailto:[hidden email]] On Behalf Of Muhuri, Pradip
>> (SAMHSA/CBHSQ)
>> Sent: Sunday, November 09, 2014 6:11 AM
>> To: 'Mark Sharp'
>> Cc: [hidden email]
>> Subject: Re: [R] Getting the most recent dates in a new column from
>> dates in four columns using the dplyr package (mutate verb)
>>
>> Hi Mark,
>>
>> Your code has also given me the results I expected.  Thank you so much for your help.
>>
>> Regards,
>>
>> Pradip
>>
>> Pradip K. Muhuri, PhD
>> SAMHSA/CBHSQ
>> 1 Choke Cherry Road, Room 2-1071
>> Rockville, MD 20857
>> Tel: 240-276-1070
>> Fax: 240-276-1260
>>
>>
>> -----Original Message-----
>> From: Mark Sharp [mailto:[hidden email]]
>> Sent: Sunday, November 09, 2014 3:01 AM
>> To: Muhuri, Pradip (SAMHSA/CBHSQ)
>> Cc: [hidden email]
>> Subject: Re: [R] Getting the most recent dates in a new column from
>> dates in four columns using the dplyr package (mutate verb)
>>
>> Pradip,
>>
>> mutate() works on the entire column as a vector so that you find the maximum of the entire data set.
>>
>> I am almost certain there is some nice way to handle this, but the sapply() function is a standard approach.
>>
>> max() does not want a dataframe thus the use of unlist().
>>
>> Using your definition of data1:
>>
>> data3 <- data1
>> data3$oidflag <- as.Date(sapply(seq_along(data3$id), function(row) {
>>   if (all(is.na(unlist(data1[row, -1])))) {
>>     max_d <- NA
>>   } else {
>>     max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
>>   }
>>   max_d}),
>>   origin = "1970-01-01")
>>
>> data3
>>   id    mrjdate    cocdate    inhdate    haldate    oidflag
>> 1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
>> 2  2       <NA>       <NA>       <NA>       <NA>       <NA>
>> 3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
>> 4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
>> 5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
>> 6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
>> 7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>>
>>
>>
>> R. Mark Sharp, Ph.D.
>> Director of Primate Records Database
>> Southwest National Primate Research Center Texas Biomedical Research
>> Institute P.O. Box 760549 San Antonio, TX 78245-0549
>> Telephone: (210)258-9476
>> e-mail: [hidden email]
>>
>>
>>
>>
>>
>> NOTICE:  This E-Mail (including attachments) is confidential and may be legally privileged.  It is covered by the Electronic Communications Privacy Act, 18 U.S.C.2510-2521.  If you are not the intended recipient, you are hereby notified that any retention, dissemination, distribution or copying of this communication is strictly prohibited.  Please reply to the sender that you have received this message in error, then delete it.
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>

David Winsemius
Alameda, CA, USA

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Jeff Newmiller
In reply to this post by Muhuri, Pradip (SAMHSA/CBHSQ)
There is something weird going on with mutate's interaction with the scalar Date objects. It seems to be passing them to max as constants of mode double.

Regardless, use of rowwise should be very rare, and you are definitely abusing it. Learn to work with vectors of values rather than one value at a time.

new3 <- example.data %>%
      mutate( oiddate = pmax( mrjdate, cocdate, inhdate, haldate, na.rm=TRUE)
                   , na.date.cases= as.numeric( !is.na( oiddate ) ) )

You might find it more useful to not convert the result of is.na to numeric... logical indexing can use that more efficiently than testing which rows have na.date.cases==1.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

On December 3, 2014 7:43:37 PM PST, "Muhuri, Pradip (SAMHSA/CBHSQ)" <[hidden email]> wrote:

>Hello Chel and David,
>
>Thank you very much for providing new insights into this issue.  Here
>is one more question.  Why  does the mutate () give incorrect results
>here?
>
># The following gives INCORRECT results - mutated()ed object
>na.date.cases = ifelse(!is.na(oiddate),1,0)
>
># The following gives CORRECT results
>new2$na.date.cases = ifelse(!is.na(new2$oiddate),1,0)
>
>###############################  reproducible example - slightly
>revised/modified  ###############
>library(dplyr)
># data object - description
>
>temp <- "id  mrjdate cocdate inhdate haldate
>1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
>2             NA         NA         NA         NA    
>3     2009-10-24         NA 2011-10-13         NA
>4     2007-10-10         NA         NA         NA
>5     2006-09-01 2005-08-10         NA         NA
>6     2007-09-04 2011-10-05         NA         NA
>7     2005-10-25         NA         NA 2011-11-04"
>
># read the data object
>
>example.data <- read.table(textConnection(temp),
>           colClasses=c("character", "Date", "Date", "Date", "Date"),  
>                    header=TRUE, as.is=TRUE
>                    )
>
>
># create a new column -dplyr solution (Acknowledgement: Arun)
>
>new1 <- example.data %>%
>     rowwise() %>%
>mutate(oiddate=as.Date(max(mrjdate,cocdate, inhdate, haldate,
>na.rm=TRUE), origin='1970-01-01'),
>             na.date.cases = ifelse(!is.na(oiddate),1,0)
>             )
>
># create a new column - Base R solution (Acknowlegement: Mark Sharp)
>
>new2 <- example.data
>new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) {
>if (all(is.na(unlist(example.data[row, c('mrjdate','cocdate',
>'inhdate', 'haldate')])))) {
>    max_d <- NA
>  } else {
>max_d <- max(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate',
>'haldate')]), na.rm = TRUE)
>  }
>  max_d}),
>  origin = "1970-01-01")
>
>new2$na.date.cases = ifelse(!is.na(new2$oiddate),1,0)
>
>
>identical(new1, new2)
>
>table(new1$oiddate)
>table(new2$oiddate)
>
># print records
>
>print (new1); print(new2)
>
>Pradip K. Muhuri, PhD
>SAMHSA/CBHSQ
>1 Choke Cherry Road, Room 2-1071
>Rockville, MD 20857
>Tel: 240-276-1070
>Fax: 240-276-1260
>
>-----Original Message-----
>From: Chel Hee Lee [mailto:[hidden email]]
>Sent: Wednesday, December 03, 2014 8:48 PM
>To: Muhuri, Pradip (SAMHSA/CBHSQ); [hidden email]
>Subject: Re: [R] Getting the most recent dates in a new column from
>dates in four columns using the dplyr package (mutate verb)
>
>The output in the object 'new1' are apparently same the output in the
>object 'new2'.  Are you trying to compare the entries of two outputs
>'new1' and 'new2'?  If so, the function 'all()' would be useful:
>
> > all(new1 == new2, na.rm=TRUE)
>[1] TRUE
>
>If you are interested in the comparison of two objects in terms of
>class, then the function 'identical()' is useful:
>
> > attributes(new1)
>$names
>[1] "id"      "mrjdate" "cocdate" "inhdate" "haldate" "oldflag"
>
>$class
>[1] "rowwise_df" "tbl_df"     "tbl"        "data.frame"
>
>$row.names
>[1] 1 2 3 4 5 6 7
>
> > attributes(new2)
>$names
>[1] "id"      "mrjdate" "cocdate" "inhdate" "haldate" "oiddate"
>
>$row.names
>[1] 1 2 3 4 5 6 7
>
>$class
>[1] "data.frame"
>
>I hope this helps.
>
>Chel Hee Lee
>
>On 12/03/2014 04:10 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
>> Hello,
>>
>> Two alternative approaches - mutate() vs. sapply() - were used to get
>the desired results (i.e., creating a new column of the most recent
>date  from 4 dates ) with help from Arun and Mark on this forum.  I now
>find that the two data objects (created using two different approaches)
>are not identical although results are exactly the same.
>>
>> identical(new1, new2)
>> [1] FALSE
>>
>> Please see the reproducible example below.
>>
>> I don't understand why the code returns FALSE here.  Any
>hints/comments  will be  appreciated.
>>
>> Thanks,
>>
>> Pradip
>>
>> #############################################  reproducible example
>> ########################################
>> library(dplyr)
>> # data object - description
>>
>> temp <- "id  mrjdate cocdate inhdate haldate
>> 1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
>> 2             NA         NA         NA         NA
>> 3     2009-10-24         NA 2011-10-13         NA
>> 4     2007-10-10         NA         NA         NA
>> 5     2006-09-01 2005-08-10         NA         NA
>> 6     2007-09-04 2011-10-05         NA         NA
>> 7     2005-10-25         NA         NA 2011-11-04"
>>
>> # read the data object
>>
>> example.data <- read.table(textConnection(temp),
>>                      colClasses=c("character", "Date", "Date",
>"Date", "Date"),
>>                      header=TRUE, as.is=TRUE
>>                      )
>>
>>
>> # create a new column -dplyr solution (Acknowledgement: Arun)
>>
>> new1 <- example.data %>%
>>       rowwise() %>%
>>        mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
>>                                                                
>> na.rm=TRUE), origin='1970-01-01'))
>>
>> # create a new column - Base R solution (Acknowlegement: Mark Sharp)
>>
>> new2 <- example.data
>> new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) {
>>    if (all(is.na(unlist(example.data[row, c('mrjdate','cocdate',
>'inhdate', 'haldate')])))) {
>>      max_d <- NA
>>    } else {
>>      max_d <- max(unlist(example.data[row, c('mrjdate','cocdate',
>'inhdate', 'haldate')]), na.rm = TRUE)
>>    }
>>    max_d}),
>>    origin = "1970-01-01")
>>
>> identical(new1, new2)
>>
>> # print records
>>
>> print (new1); print(new2)
>>
>> Pradip K. Muhuri
>> SAMHSA/CBHSQ
>> 1 Choke Cherry Road, Room 2-1071
>> Rockville, MD 20857
>> Tel: 240-276-1070
>> Fax: 240-276-1260
>>
>> -----Original Message-----
>> From: [hidden email]
>> [mailto:[hidden email]] On Behalf Of Muhuri, Pradip
>> (SAMHSA/CBHSQ)
>> Sent: Sunday, November 09, 2014 6:11 AM
>> To: 'Mark Sharp'
>> Cc: [hidden email]
>> Subject: Re: [R] Getting the most recent dates in a new column from
>> dates in four columns using the dplyr package (mutate verb)
>>
>> Hi Mark,
>>
>> Your code has also given me the results I expected.  Thank you so
>much for your help.
>>
>> Regards,
>>
>> Pradip
>>
>> Pradip K. Muhuri, PhD
>> SAMHSA/CBHSQ
>> 1 Choke Cherry Road, Room 2-1071
>> Rockville, MD 20857
>> Tel: 240-276-1070
>> Fax: 240-276-1260
>>
>>
>> -----Original Message-----
>> From: Mark Sharp [mailto:[hidden email]]
>> Sent: Sunday, November 09, 2014 3:01 AM
>> To: Muhuri, Pradip (SAMHSA/CBHSQ)
>> Cc: [hidden email]
>> Subject: Re: [R] Getting the most recent dates in a new column from
>> dates in four columns using the dplyr package (mutate verb)
>>
>> Pradip,
>>
>> mutate() works on the entire column as a vector so that you find the
>maximum of the entire data set.
>>
>> I am almost certain there is some nice way to handle this, but the
>sapply() function is a standard approach.
>>
>> max() does not want a dataframe thus the use of unlist().
>>
>> Using your definition of data1:
>>
>> data3 <- data1
>> data3$oidflag <- as.Date(sapply(seq_along(data3$id), function(row) {
>>    if (all(is.na(unlist(data1[row, -1])))) {
>>      max_d <- NA
>>    } else {
>>      max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
>>    }
>>    max_d}),
>>    origin = "1970-01-01")
>>
>> data3
>>    id    mrjdate    cocdate    inhdate    haldate    oidflag
>> 1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
>> 2  2       <NA>       <NA>       <NA>       <NA>       <NA>
>> 3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
>> 4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
>> 5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
>> 6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
>> 7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>>
>>
>>
>> R. Mark Sharp, Ph.D.
>> Director of Primate Records Database
>> Southwest National Primate Research Center Texas Biomedical Research
>> Institute P.O. Box 760549 San Antonio, TX 78245-0549
>> Telephone: (210)258-9476
>> e-mail: [hidden email]
>>
>>
>>
>>
>>
>> NOTICE:  This E-Mail (including attachments) is confidential and may
>be legally privileged.  It is covered by the Electronic Communications
>Privacy Act, 18 U.S.C.2510-2521.  If you are not the intended
>recipient, you are hereby notified that any retention, dissemination,
>distribution or copying of this communication is strictly prohibited.
>Please reply to the sender that you have received this message in
>error, then delete it.
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>______________________________________________
>[hidden email] mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Muhuri, Pradip (SAMHSA/CBHSQ)
Hello Jeff,

Your code has given me desired results, and your advice is well taken.  I agree with you regarding the use of logical indexing for testing conditions.   Thank you so much for your time and advice.

Pradip

Pradip K. Muhuri
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


-----Original Message-----
From: Jeff Newmiller [mailto:[hidden email]]
Sent: Thursday, December 04, 2014 1:20 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ); [hidden email]
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

There is something weird going on with mutate's interaction with the scalar Date objects. It seems to be passing them to max as constants of mode double.

Regardless, use of rowwise should be very rare, and you are definitely abusing it. Learn to work with vectors of values rather than one value at a time.

new3 <- example.data %>%
      mutate( oiddate = pmax( mrjdate, cocdate, inhdate, haldate, na.rm=TRUE)
                   , na.date.cases= as.numeric( !is.na( oiddate ) ) )

You might find it more useful to not convert the result of is.na to numeric... logical indexing can use that more efficiently than testing which rows have na.date.cases==1.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

On December 3, 2014 7:43:37 PM PST, "Muhuri, Pradip (SAMHSA/CBHSQ)" <[hidden email]> wrote:

>Hello Chel and David,
>
>Thank you very much for providing new insights into this issue.  Here
>is one more question.  Why  does the mutate () give incorrect results
>here?
>
># The following gives INCORRECT results - mutated()ed object
>na.date.cases = ifelse(!is.na(oiddate),1,0)
>
># The following gives CORRECT results
>new2$na.date.cases = ifelse(!is.na(new2$oiddate),1,0)
>
>###############################  reproducible example - slightly
>revised/modified  ###############
>library(dplyr)
># data object - description
>
>temp <- "id  mrjdate cocdate inhdate haldate
>1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
>2             NA         NA         NA         NA    
>3     2009-10-24         NA 2011-10-13         NA
>4     2007-10-10         NA         NA         NA
>5     2006-09-01 2005-08-10         NA         NA
>6     2007-09-04 2011-10-05         NA         NA
>7     2005-10-25         NA         NA 2011-11-04"
>
># read the data object
>
>example.data <- read.table(textConnection(temp),
>           colClasses=c("character", "Date", "Date", "Date", "Date"),  
>                    header=TRUE, as.is=TRUE
>                    )
>
>
># create a new column -dplyr solution (Acknowledgement: Arun)
>
>new1 <- example.data %>%
>     rowwise() %>%
>mutate(oiddate=as.Date(max(mrjdate,cocdate, inhdate, haldate,
>na.rm=TRUE), origin='1970-01-01'),
>             na.date.cases = ifelse(!is.na(oiddate),1,0)
>             )
>
># create a new column - Base R solution (Acknowlegement: Mark Sharp)
>
>new2 <- example.data
>new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) { if
>(all(is.na(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate',
>'haldate')])))) {
>    max_d <- NA
>  } else {
>max_d <- max(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate',
>'haldate')]), na.rm = TRUE)
>  }
>  max_d}),
>  origin = "1970-01-01")
>
>new2$na.date.cases = ifelse(!is.na(new2$oiddate),1,0)
>
>
>identical(new1, new2)
>
>table(new1$oiddate)
>table(new2$oiddate)
>
># print records
>
>print (new1); print(new2)
>
>Pradip K. Muhuri, PhD
>SAMHSA/CBHSQ
>1 Choke Cherry Road, Room 2-1071
>Rockville, MD 20857
>Tel: 240-276-1070
>Fax: 240-276-1260
>
>-----Original Message-----
>From: Chel Hee Lee [mailto:[hidden email]]
>Sent: Wednesday, December 03, 2014 8:48 PM
>To: Muhuri, Pradip (SAMHSA/CBHSQ); [hidden email]
>Subject: Re: [R] Getting the most recent dates in a new column from
>dates in four columns using the dplyr package (mutate verb)
>
>The output in the object 'new1' are apparently same the output in the
>object 'new2'.  Are you trying to compare the entries of two outputs
>'new1' and 'new2'?  If so, the function 'all()' would be useful:
>
> > all(new1 == new2, na.rm=TRUE)
>[1] TRUE
>
>If you are interested in the comparison of two objects in terms of
>class, then the function 'identical()' is useful:
>
> > attributes(new1)
>$names
>[1] "id"      "mrjdate" "cocdate" "inhdate" "haldate" "oldflag"
>
>$class
>[1] "rowwise_df" "tbl_df"     "tbl"        "data.frame"
>
>$row.names
>[1] 1 2 3 4 5 6 7
>
> > attributes(new2)
>$names
>[1] "id"      "mrjdate" "cocdate" "inhdate" "haldate" "oiddate"
>
>$row.names
>[1] 1 2 3 4 5 6 7
>
>$class
>[1] "data.frame"
>
>I hope this helps.
>
>Chel Hee Lee
>
>On 12/03/2014 04:10 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
>> Hello,
>>
>> Two alternative approaches - mutate() vs. sapply() - were used to get
>the desired results (i.e., creating a new column of the most recent
>date  from 4 dates ) with help from Arun and Mark on this forum.  I now
>find that the two data objects (created using two different approaches)
>are not identical although results are exactly the same.
>>
>> identical(new1, new2)
>> [1] FALSE
>>
>> Please see the reproducible example below.
>>
>> I don't understand why the code returns FALSE here.  Any
>hints/comments  will be  appreciated.
>>
>> Thanks,
>>
>> Pradip
>>
>> #############################################  reproducible example
>> ########################################
>> library(dplyr)
>> # data object - description
>>
>> temp <- "id  mrjdate cocdate inhdate haldate
>> 1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
>> 2             NA         NA         NA         NA
>> 3     2009-10-24         NA 2011-10-13         NA
>> 4     2007-10-10         NA         NA         NA
>> 5     2006-09-01 2005-08-10         NA         NA
>> 6     2007-09-04 2011-10-05         NA         NA
>> 7     2005-10-25         NA         NA 2011-11-04"
>>
>> # read the data object
>>
>> example.data <- read.table(textConnection(temp),
>>                      colClasses=c("character", "Date", "Date",
>"Date", "Date"),
>>                      header=TRUE, as.is=TRUE
>>                      )
>>
>>
>> # create a new column -dplyr solution (Acknowledgement: Arun)
>>
>> new1 <- example.data %>%
>>       rowwise() %>%
>>        mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
>>                                                                
>> na.rm=TRUE), origin='1970-01-01'))
>>
>> # create a new column - Base R solution (Acknowlegement: Mark Sharp)
>>
>> new2 <- example.data
>> new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) {
>>    if (all(is.na(unlist(example.data[row, c('mrjdate','cocdate',
>'inhdate', 'haldate')])))) {
>>      max_d <- NA
>>    } else {
>>      max_d <- max(unlist(example.data[row, c('mrjdate','cocdate',
>'inhdate', 'haldate')]), na.rm = TRUE)
>>    }
>>    max_d}),
>>    origin = "1970-01-01")
>>
>> identical(new1, new2)
>>
>> # print records
>>
>> print (new1); print(new2)
>>
>> Pradip K. Muhuri
>> SAMHSA/CBHSQ
>> 1 Choke Cherry Road, Room 2-1071
>> Rockville, MD 20857
>> Tel: 240-276-1070
>> Fax: 240-276-1260
>>
>> -----Original Message-----
>> From: [hidden email]
>> [mailto:[hidden email]] On Behalf Of Muhuri, Pradip
>> (SAMHSA/CBHSQ)
>> Sent: Sunday, November 09, 2014 6:11 AM
>> To: 'Mark Sharp'
>> Cc: [hidden email]
>> Subject: Re: [R] Getting the most recent dates in a new column from
>> dates in four columns using the dplyr package (mutate verb)
>>
>> Hi Mark,
>>
>> Your code has also given me the results I expected.  Thank you so
>much for your help.
>>
>> Regards,
>>
>> Pradip
>>
>> Pradip K. Muhuri, PhD
>> SAMHSA/CBHSQ
>> 1 Choke Cherry Road, Room 2-1071
>> Rockville, MD 20857
>> Tel: 240-276-1070
>> Fax: 240-276-1260
>>
>>
>> -----Original Message-----
>> From: Mark Sharp [mailto:[hidden email]]
>> Sent: Sunday, November 09, 2014 3:01 AM
>> To: Muhuri, Pradip (SAMHSA/CBHSQ)
>> Cc: [hidden email]
>> Subject: Re: [R] Getting the most recent dates in a new column from
>> dates in four columns using the dplyr package (mutate verb)
>>
>> Pradip,
>>
>> mutate() works on the entire column as a vector so that you find the
>maximum of the entire data set.
>>
>> I am almost certain there is some nice way to handle this, but the
>sapply() function is a standard approach.
>>
>> max() does not want a dataframe thus the use of unlist().
>>
>> Using your definition of data1:
>>
>> data3 <- data1
>> data3$oidflag <- as.Date(sapply(seq_along(data3$id), function(row) {
>>    if (all(is.na(unlist(data1[row, -1])))) {
>>      max_d <- NA
>>    } else {
>>      max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
>>    }
>>    max_d}),
>>    origin = "1970-01-01")
>>
>> data3
>>    id    mrjdate    cocdate    inhdate    haldate    oidflag
>> 1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
>> 2  2       <NA>       <NA>       <NA>       <NA>       <NA>
>> 3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
>> 4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
>> 5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
>> 6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
>> 7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>>
>>
>>
>> R. Mark Sharp, Ph.D.
>> Director of Primate Records Database
>> Southwest National Primate Research Center Texas Biomedical Research
>> Institute P.O. Box 760549 San Antonio, TX 78245-0549
>> Telephone: (210)258-9476
>> e-mail: [hidden email]
>>
>>
>>
>>
>>
>> NOTICE:  This E-Mail (including attachments) is confidential and may
>be legally privileged.  It is covered by the Electronic Communications
>Privacy Act, 18 U.S.C.2510-2521.  If you are not the intended
>recipient, you are hereby notified that any retention, dissemination,
>distribution or copying of this communication is strictly prohibited.
>Please reply to the sender that you have received this message in
>error, then delete it.
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>______________________________________________
>[hidden email] mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.