range () does not remove NA's with complete.cases() for dates (dplyr/mutate)

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

range () does not remove NA's with complete.cases() for dates (dplyr/mutate)

Muhuri, Pradip (SAMHSA/CBHSQ)
Hello,

The range() with complete.cases() removes NA's for the date variables that are read from a data frame.  However, the issue is that the same function does not remove NA's for the other date variable that is created using the dplyr/mutate().  The console and the reproducible example are given below. Any advice how to resolve this issue would be appreciated.

Thanks,

Pradip Muhuri


#################  cut and pasted from the R console ####################

id    mrjdate    cocdate    inhdate    haldate    oiddate
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>
> # range of dates
>
> range(data2$mrjdate[complete.cases(data2$mrjdate)])
[1] "2004-11-04" "2009-10-24"
> range(data2$cocdate[complete.cases(data2$cocdate)])
[1] "2005-08-10" "2011-10-05"
> range(data2$inhdate[complete.cases(data2$inhdate)])
[1] "2005-07-07" "2011-10-13"
> range(data2$haldate[complete.cases(data2$haldate)])
[1] "2007-11-07" "2011-11-04"
> range(data2$oiddate[complete.cases(data2$oiddate)])
[1] NA           "2011-11-04"


################  reproducible code #############################

library(dplyr)
library(lubridate)
library(zoo)
# data object - description of the

temp <- "id  mrjdate cocdate inhdate haldate
1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
2             NA         NA         NA         NA
3     2009-10-24         NA 2011-10-13         NA
4     2007-10-10         NA         NA         NA
5     2006-09-01 2005-08-10         NA         NA
6     2007-09-04 2011-10-05         NA         NA
7     2005-10-25         NA         NA 2011-11-04"

# read the data object

data1 <- read.table(textConnection(temp),
                    colClasses=c("character", "Date", "Date", "Date", "Date"),
                    header=TRUE, as.is=TRUE
                    )


# create a new column

data2 <- data1 %>%
     rowwise() %>%
      mutate(oiddate=as.Date(max(mrjdate,cocdate, inhdate, haldate,
                                                               na.rm=TRUE), origin='1970-01-01'))

# print records

print (data2)

# range of dates

range(data2$mrjdate[complete.cases(data2$mrjdate)])
range(data2$cocdate[complete.cases(data2$cocdate)])
range(data2$inhdate[complete.cases(data2$inhdate)])
range(data2$haldate[complete.cases(data2$haldate)])
range(data2$oiddate[complete.cases(data2$oiddate)])





Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260



        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: range () does not remove NA's with complete.cases() for dates (dplyr/mutate)

arun kirshna
Try

range(data2$oiddate[complete.cases(data2$oiddate) & is.finite(data2$oiddate)])
#[1] "2006-09-01" "2011-11-04"



If you look at the `dput` output, it is `Inf` for oiddate
dput(data2$oiddate)
structure(c(14078, -Inf, 15260, 13796, 13392, 15252, 15282), class = "Date")

   

A.K.

On Monday, November 10, 2014 11:15 AM, "Muhuri, Pradip (SAMHSA/CBHSQ)" <[hidden email]> wrote:
Hello,

The range() with complete.cases() removes NA's for the date variables that are read from a data frame.  However, the issue is that the same function does not remove NA's for the other date variable that is created using the dplyr/mutate().  The console and the reproducible example are given below. Any advice how to resolve this issue would be appreciated.

Thanks,

Pradip Muhuri


#################  cut and pasted from the R console ####################

id    mrjdate    cocdate    inhdate    haldate    oiddate
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>
> # range of dates
>
> range(data2$mrjdate[complete.cases(data2$mrjdate)])
[1] "2004-11-04" "2009-10-24"
> range(data2$cocdate[complete.cases(data2$cocdate)])
[1] "2005-08-10" "2011-10-05"
> range(data2$inhdate[complete.cases(data2$inhdate)])
[1] "2005-07-07" "2011-10-13"
> range(data2$haldate[complete.cases(data2$haldate)])
[1] "2007-11-07" "2011-11-04"
> range(data2$oiddate[complete.cases(data2$oiddate)])
[1] NA           "2011-11-04"


################  reproducible code #############################

library(dplyr)
library(lubridate)
library(zoo)
# data object - description of the

temp <- "id  mrjdate cocdate inhdate haldate
1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
2             NA         NA         NA         NA
3     2009-10-24         NA 2011-10-13         NA
4     2007-10-10         NA         NA         NA
5     2006-09-01 2005-08-10         NA         NA
6     2007-09-04 2011-10-05         NA         NA
7     2005-10-25         NA         NA 2011-11-04"

# read the data object

data1 <- read.table(textConnection(temp),
                    colClasses=c("character", "Date", "Date", "Date", "Date"),
                    header=TRUE, as.is=TRUE
                    )


# create a new column

data2 <- data1 %>%
     rowwise() %>%
      mutate(oiddate=as.Date(max(mrjdate,cocdate, inhdate, haldate,
                                                               na.rm=TRUE), origin='1970-01-01'))

# print records

print (data2)

# range of dates

range(data2$mrjdate[complete.cases(data2$mrjdate)])
range(data2$cocdate[complete.cases(data2$cocdate)])
range(data2$inhdate[complete.cases(data2$inhdate)])
range(data2$haldate[complete.cases(data2$haldate)])
range(data2$oiddate[complete.cases(data2$oiddate)])





Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260



    [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: range () does not remove NA's with complete.cases() for dates (dplyr/mutate)

Muhuri, Pradip (SAMHSA/CBHSQ)
Hello Arun,

Thank you so much for your help.

Regards,

Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


-----Original Message-----
From: arun [mailto:[hidden email]]
Sent: Monday, November 10, 2014 11:30 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ); [hidden email]
Subject: Re: [R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)

Try

range(data2$oiddate[complete.cases(data2$oiddate) & is.finite(data2$oiddate)]) #[1] "2006-09-01" "2011-11-04"



If you look at the `dput` output, it is `Inf` for oiddate
dput(data2$oiddate)
structure(c(14078, -Inf, 15260, 13796, 13392, 15252, 15282), class = "Date")

   

A.K.

On Monday, November 10, 2014 11:15 AM, "Muhuri, Pradip (SAMHSA/CBHSQ)" <[hidden email]> wrote:
Hello,

The range() with complete.cases() removes NA's for the date variables that are read from a data frame.  However, the issue is that the same function does not remove NA's for the other date variable that is created using the dplyr/mutate().  The console and the reproducible example are given below. Any advice how to resolve this issue would be appreciated.

Thanks,

Pradip Muhuri


#################  cut and pasted from the R console ####################

id    mrjdate    cocdate    inhdate    haldate    oiddate
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>
> # range of dates
>
> range(data2$mrjdate[complete.cases(data2$mrjdate)])
[1] "2004-11-04" "2009-10-24"
> range(data2$cocdate[complete.cases(data2$cocdate)])
[1] "2005-08-10" "2011-10-05"
> range(data2$inhdate[complete.cases(data2$inhdate)])
[1] "2005-07-07" "2011-10-13"
> range(data2$haldate[complete.cases(data2$haldate)])
[1] "2007-11-07" "2011-11-04"
> range(data2$oiddate[complete.cases(data2$oiddate)])
[1] NA           "2011-11-04"


################  reproducible code #############################

library(dplyr)
library(lubridate)
library(zoo)
# data object - description of the

temp <- "id  mrjdate cocdate inhdate haldate
1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
2             NA         NA         NA         NA
3     2009-10-24         NA 2011-10-13         NA
4     2007-10-10         NA         NA         NA
5     2006-09-01 2005-08-10         NA         NA
6     2007-09-04 2011-10-05         NA         NA
7     2005-10-25         NA         NA 2011-11-04"

# read the data object

data1 <- read.table(textConnection(temp),
                    colClasses=c("character", "Date", "Date", "Date", "Date"),
                    header=TRUE, as.is=TRUE
                    )


# create a new column

data2 <- data1 %>%
     rowwise() %>%
      mutate(oiddate=as.Date(max(mrjdate,cocdate, inhdate, haldate,
                                                               na.rm=TRUE), origin='1970-01-01'))

# print records

print (data2)

# range of dates

range(data2$mrjdate[complete.cases(data2$mrjdate)])
range(data2$cocdate[complete.cases(data2$cocdate)])
range(data2$inhdate[complete.cases(data2$inhdate)])
range(data2$haldate[complete.cases(data2$haldate)])
range(data2$oiddate[complete.cases(data2$oiddate)])





Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260



    [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: range () does not remove NA's with complete.cases() for dates (dplyr/mutate)

R. Mark Sharp
In reply to this post by Muhuri, Pradip (SAMHSA/CBHSQ)
Pradip,

For some reason mutate is not setting the is.NA value for the new column. Note the output below using your data structures.

> ## It looks at first as if the second element of both columns are NA.
> data2$mrjdate[2]
[1] NA
> data2$oiddate[2]
[1] NA
> ## for convenience
> mrj <- data2$mrjdate[2]
> oid <- data2$oiddate[2]
> mode(mrj)
[1] "numeric"
> mode(oid)
[1] "numeric"
> str(mrj)
 Date[1:1], format: NA
> str(oid)
 Date[1:1], format: NA
> class(mrj)
[1] "Date"
> class(oid)
[1] "Date"
> ## But note:
> identical(mrj, oid)
[1] FALSE
> all.equal(mrj, oid)
[1] "'is.NA' value mismatch: 0 in current 1 in target"
## functioning code
data2$mrjdate[2]
data2$oiddate[2]
mrj <- data2$mrjdate[2]
oid <- data2$oiddate[2]
mode(mrj)
mode(oid)
str(mrj)
str(oid)
class(mrj)
class(oid)
# But note:
identical(mrj, oid)
all.equal(mrj, oid)

## This ugly solution does not have the problem.
> data3 <- data1
> data3$oiddate <- as.Date(sapply(seq_along(data3$id), function(row) {
+   if (all(is.na(unlist(data1[row, -1])))) {
+     max_d <- NA
+   } else {
+     max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
+   }
+   max_d}),
+   origin = "1970-01-01")
>
> range(data3$mrjdate[complete.cases(data3$mrjdate)])
[1] "2004-11-04" "2009-10-24"
> range(data3$cocdate[complete.cases(data3$cocdate)])
[1] "2005-08-10" "2011-10-05"
> range(data3$inhdate[complete.cases(data3$inhdate)])
[1] "2005-07-07" "2011-10-13"
> range(data3$haldate[complete.cases(data3$haldate)])
[1] "2007-11-07" "2011-11-04"
> range(data3$oiddate[complete.cases(data3$oiddate)])
[1] "2006-09-01" "2011-11-04"
>
Working code below.

data3 <- data1
data3$oiddate <- as.Date(sapply(seq_along(data3$id), function(row) {
  if (all(is.na(unlist(data1[row, -1])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

range(data3$mrjdate[complete.cases(data3$mrjdate)])
range(data3$cocdate[complete.cases(data3$cocdate)])
range(data3$inhdate[complete.cases(data3$inhdate)])
range(data3$haldate[complete.cases(data3$haldate)])
range(data3$oiddate[complete.cases(data3$oiddate)])


On Nov 10, 2014, at 10:10 AM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hello,
>
> The range() with complete.cases() removes NA's for the date variables that are read from a data frame.  However, the issue is that the same function does not remove NA's for the other date variable that is created using the dplyr/mutate().  The console and the reproducible example are given below. Any advice how to resolve this issue would be appreciated.
>
> Thanks,
>
> Pradip Muhuri
>
>
> #################  cut and pasted from the R console ####################
>
> id    mrjdate    cocdate    inhdate    haldate    oiddate
> 1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
> 2  2       <NA>       <NA>       <NA>       <NA>       <NA>
> 3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
> 4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
> 5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
> 6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
> 7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>>
>> # range of dates
>>
>> range(data2$mrjdate[complete.cases(data2$mrjdate)])
> [1] "2004-11-04" "2009-10-24"
>> range(data2$cocdate[complete.cases(data2$cocdate)])
> [1] "2005-08-10" "2011-10-05"
>> range(data2$inhdate[complete.cases(data2$inhdate)])
> [1] "2005-07-07" "2011-10-13"
>> range(data2$haldate[complete.cases(data2$haldate)])
> [1] "2007-11-07" "2011-11-04"
>> range(data2$oiddate[complete.cases(data2$oiddate)])
> [1] NA           "2011-11-04"
>
>
> ################  reproducible code #############################
>
> library(dplyr)
> library(lubridate)
> library(zoo)
> # data object - description of the
>
> temp <- "id  mrjdate cocdate inhdate haldate
> 1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
> 2             NA         NA         NA         NA
> 3     2009-10-24         NA 2011-10-13         NA
> 4     2007-10-10         NA         NA         NA
> 5     2006-09-01 2005-08-10         NA         NA
> 6     2007-09-04 2011-10-05         NA         NA
> 7     2005-10-25         NA         NA 2011-11-04"
>
> # read the data object
>
> data1 <- read.table(textConnection(temp),
>                    colClasses=c("character", "Date", "Date", "Date", "Date"),
>                    header=TRUE, as.is=TRUE
>                    )
>
>
> # create a new column
>
> data2 <- data1 %>%
>     rowwise() %>%
>      mutate(oiddate=as.Date(max(mrjdate,cocdate, inhdate, haldate,
>                                                               na.rm=TRUE), origin='1970-01-01'))
>
> # print records
>
> print (data2)
>
> # range of dates
>
> range(data2$mrjdate[complete.cases(data2$mrjdate)])
> range(data2$cocdate[complete.cases(data2$cocdate)])
> range(data2$inhdate[complete.cases(data2$inhdate)])
> range(data2$haldate[complete.cases(data2$haldate)])
> range(data2$oiddate[complete.cases(data2$oiddate)])
>
>
>
>
>
> Pradip K. Muhuri, PhD
> SAMHSA/CBHSQ
> 1 Choke Cherry Road, Room 2-1071
> Rockville, MD 20857
> Tel: 240-276-1070
> Fax: 240-276-1260
>
>
>
>       [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


NOTICE:  This E-Mail (including attachments) is confidential and may be legally privileged.  It is covered by the Electronic Communications Privacy Act, 18 U.S.C.2510-2521.  If you are not the intended recipient, you are hereby notified that any retention, dissemination, distribution or copying of this communication is strictly prohibited.  Please reply to the sender that you have received this message in error, then delete it.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: range () does not remove NA's with complete.cases() for dates (dplyr/mutate)

Muhuri, Pradip (SAMHSA/CBHSQ)
Mark,

Thank you very much for further looking into this issue.  So, the "ugly" solution is better!  Would you like to bring to Hadley's attention that mutate does set the NA value for the new column?

Regards,

Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


-----Original Message-----
From: Mark Sharp [mailto:[hidden email]]
Sent: Monday, November 10, 2014 12:23 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: [hidden email]
Subject: Re: [R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)

Pradip,

For some reason mutate is not setting the is.NA value for the new column. Note the output below using your data structures.

> ## It looks at first as if the second element of both columns are NA.
> data2$mrjdate[2]
[1] NA
> data2$oiddate[2]
[1] NA
> ## for convenience
> mrj <- data2$mrjdate[2]
> oid <- data2$oiddate[2]
> mode(mrj)
[1] "numeric"
> mode(oid)
[1] "numeric"
> str(mrj)
 Date[1:1], format: NA
> str(oid)
 Date[1:1], format: NA
> class(mrj)
[1] "Date"
> class(oid)
[1] "Date"
> ## But note:
> identical(mrj, oid)
[1] FALSE
> all.equal(mrj, oid)
[1] "'is.NA' value mismatch: 0 in current 1 in target"
## functioning code
data2$mrjdate[2]
data2$oiddate[2]
mrj <- data2$mrjdate[2]
oid <- data2$oiddate[2]
mode(mrj)
mode(oid)
str(mrj)
str(oid)
class(mrj)
class(oid)
# But note:
identical(mrj, oid)
all.equal(mrj, oid)

## This ugly solution does not have the problem.
> data3 <- data1
> data3$oiddate <- as.Date(sapply(seq_along(data3$id), function(row) {
+   if (all(is.na(unlist(data1[row, -1])))) {
+     max_d <- NA
+   } else {
+     max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
+   }
+   max_d}),
+   origin = "1970-01-01")
>
> range(data3$mrjdate[complete.cases(data3$mrjdate)])
[1] "2004-11-04" "2009-10-24"
> range(data3$cocdate[complete.cases(data3$cocdate)])
[1] "2005-08-10" "2011-10-05"
> range(data3$inhdate[complete.cases(data3$inhdate)])
[1] "2005-07-07" "2011-10-13"
> range(data3$haldate[complete.cases(data3$haldate)])
[1] "2007-11-07" "2011-11-04"
> range(data3$oiddate[complete.cases(data3$oiddate)])
[1] "2006-09-01" "2011-11-04"
>
Working code below.

data3 <- data1
data3$oiddate <- as.Date(sapply(seq_along(data3$id), function(row) {
  if (all(is.na(unlist(data1[row, -1])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

range(data3$mrjdate[complete.cases(data3$mrjdate)])
range(data3$cocdate[complete.cases(data3$cocdate)])
range(data3$inhdate[complete.cases(data3$inhdate)])
range(data3$haldate[complete.cases(data3$haldate)])
range(data3$oiddate[complete.cases(data3$oiddate)])


On Nov 10, 2014, at 10:10 AM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hello,
>
> The range() with complete.cases() removes NA's for the date variables that are read from a data frame.  However, the issue is that the same function does not remove NA's for the other date variable that is created using the dplyr/mutate().  The console and the reproducible example are given below. Any advice how to resolve this issue would be appreciated.
>
> Thanks,
>
> Pradip Muhuri
>
>
> #################  cut and pasted from the R console ####################
>
> id    mrjdate    cocdate    inhdate    haldate    oiddate
> 1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
> 2  2       <NA>       <NA>       <NA>       <NA>       <NA>
> 3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
> 4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
> 5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
> 6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
> 7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>>
>> # range of dates
>>
>> range(data2$mrjdate[complete.cases(data2$mrjdate)])
> [1] "2004-11-04" "2009-10-24"
>> range(data2$cocdate[complete.cases(data2$cocdate)])
> [1] "2005-08-10" "2011-10-05"
>> range(data2$inhdate[complete.cases(data2$inhdate)])
> [1] "2005-07-07" "2011-10-13"
>> range(data2$haldate[complete.cases(data2$haldate)])
> [1] "2007-11-07" "2011-11-04"
>> range(data2$oiddate[complete.cases(data2$oiddate)])
> [1] NA           "2011-11-04"
>
>
> ################  reproducible code #############################
>
> library(dplyr)
> library(lubridate)
> library(zoo)
> # data object - description of the
>
> temp <- "id  mrjdate cocdate inhdate haldate
> 1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
> 2             NA         NA         NA         NA
> 3     2009-10-24         NA 2011-10-13         NA
> 4     2007-10-10         NA         NA         NA
> 5     2006-09-01 2005-08-10         NA         NA
> 6     2007-09-04 2011-10-05         NA         NA
> 7     2005-10-25         NA         NA 2011-11-04"
>
> # read the data object
>
> data1 <- read.table(textConnection(temp),
>                    colClasses=c("character", "Date", "Date", "Date", "Date"),
>                    header=TRUE, as.is=TRUE
>                    )
>
>
> # create a new column
>
> data2 <- data1 %>%
>     rowwise() %>%
>      mutate(oiddate=as.Date(max(mrjdate,cocdate, inhdate, haldate,
>                                                               na.rm=TRUE), origin='1970-01-01'))
>
> # print records
>
> print (data2)
>
> # range of dates
>
> range(data2$mrjdate[complete.cases(data2$mrjdate)])
> range(data2$cocdate[complete.cases(data2$cocdate)])
> range(data2$inhdate[complete.cases(data2$inhdate)])
> range(data2$haldate[complete.cases(data2$haldate)])
> range(data2$oiddate[complete.cases(data2$oiddate)])
>
>
>
>
>
> Pradip K. Muhuri, PhD
> SAMHSA/CBHSQ
> 1 Choke Cherry Road, Room 2-1071
> Rockville, MD 20857
> Tel: 240-276-1070
> Fax: 240-276-1260
>
>
>
>       [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


NOTICE:  This E-Mail (including attachments) is confidential and may be legally privileged.  It is covered by the Electronic Communications Privacy Act, 18 U.S.C.2510-2521.  If you are not the intended recipient, you are hereby notified that any retention, dissemination, distribution or copying of this communication is strictly prohibited.  Please reply to the sender that you have received this message in error, then delete it.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: range () does not remove NA's with complete.cases() for dates (dplyr/mutate)

William Dunlap
> Would you like to bring to Hadley's attention that mutate does
> set the NA value for the new column?

This may not be mutate()'s problem.

The Date class is messed up with regard to NA's and Inf's.  E.g., what gets
printed as NA does not correspond to what is.na() returns and its range()
method does not appear to pass the finite=TRUE argument to range.default:
  > d <- as.Date(c("2014-10-31", c("2014-11-10")))
  > d1 <- range(d[0], finite=TRUE)
  Warning messages:
  1: In min.default(numeric(0), na.rm = FALSE) :
    no non-missing arguments to min; returning Inf
  2: In max.default(numeric(0), na.rm = FALSE) :
    no non-missing arguments to max; returning -Inf
  > d1
  [1] NA NA
  > is.na(d1)
  [1] FALSE FALSE
  > dput(d1)
  structure(c(Inf, -Inf), class = "Date")
  > range(c(d1, d), finite=TRUE)
  [1] NA NA
  > range(c(d1, d), finite=TRUE, na.rm=TRUE)
  [1] NA NA



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Nov 10, 2014 at 10:09 AM, Muhuri, Pradip (SAMHSA/CBHSQ) <
[hidden email]> wrote:

> Mark,
>
> Thank you very much for further looking into this issue.  So, the "ugly"
> solution is better!  Would you like to bring to Hadley's attention that
> mutate does set the NA value for the new column?
>
> Regards,
>
> Pradip
>
> Pradip K. Muhuri, PhD
> SAMHSA/CBHSQ
> 1 Choke Cherry Road, Room 2-1071
> Rockville, MD 20857
> Tel: 240-276-1070
> Fax: 240-276-1260
>
>
> -----Original Message-----
> From: Mark Sharp [mailto:[hidden email]]
> Sent: Monday, November 10, 2014 12:23 PM
> To: Muhuri, Pradip (SAMHSA/CBHSQ)
> Cc: [hidden email]
> Subject: Re: [R] range () does not remove NA's with complete.cases() for
> dates (dplyr/mutate)
>
> Pradip,
>
> For some reason mutate is not setting the is.NA value for the new column.
> Note the output below using your data structures.
>
> > ## It looks at first as if the second element of both columns are NA.
> > data2$mrjdate[2]
> [1] NA
> > data2$oiddate[2]
> [1] NA
> > ## for convenience
> > mrj <- data2$mrjdate[2]
> > oid <- data2$oiddate[2]
> > mode(mrj)
> [1] "numeric"
> > mode(oid)
> [1] "numeric"
> > str(mrj)
>  Date[1:1], format: NA
> > str(oid)
>  Date[1:1], format: NA
> > class(mrj)
> [1] "Date"
> > class(oid)
> [1] "Date"
> > ## But note:
> > identical(mrj, oid)
> [1] FALSE
> > all.equal(mrj, oid)
> [1] "'is.NA' value mismatch: 0 in current 1 in target"
> ## functioning code
> data2$mrjdate[2]
> data2$oiddate[2]
> mrj <- data2$mrjdate[2]
> oid <- data2$oiddate[2]
> mode(mrj)
> mode(oid)
> str(mrj)
> str(oid)
> class(mrj)
> class(oid)
> # But note:
> identical(mrj, oid)
> all.equal(mrj, oid)
>
> ## This ugly solution does not have the problem.
> > data3 <- data1
> > data3$oiddate <- as.Date(sapply(seq_along(data3$id), function(row) {
> +   if (all(is.na(unlist(data1[row, -1])))) {
> +     max_d <- NA
> +   } else {
> +     max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
> +   }
> +   max_d}),
> +   origin = "1970-01-01")
> >
> > range(data3$mrjdate[complete.cases(data3$mrjdate)])
> [1] "2004-11-04" "2009-10-24"
> > range(data3$cocdate[complete.cases(data3$cocdate)])
> [1] "2005-08-10" "2011-10-05"
> > range(data3$inhdate[complete.cases(data3$inhdate)])
> [1] "2005-07-07" "2011-10-13"
> > range(data3$haldate[complete.cases(data3$haldate)])
> [1] "2007-11-07" "2011-11-04"
> > range(data3$oiddate[complete.cases(data3$oiddate)])
> [1] "2006-09-01" "2011-11-04"
> >
> Working code below.
>
> data3 <- data1
> data3$oiddate <- as.Date(sapply(seq_along(data3$id), function(row) {
>   if (all(is.na(unlist(data1[row, -1])))) {
>     max_d <- NA
>   } else {
>     max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
>   }
>   max_d}),
>   origin = "1970-01-01")
>
> range(data3$mrjdate[complete.cases(data3$mrjdate)])
> range(data3$cocdate[complete.cases(data3$cocdate)])
> range(data3$inhdate[complete.cases(data3$inhdate)])
> range(data3$haldate[complete.cases(data3$haldate)])
> range(data3$oiddate[complete.cases(data3$oiddate)])
>
>
> On Nov 10, 2014, at 10:10 AM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
>
> > Hello,
> >
> > The range() with complete.cases() removes NA's for the date variables
> that are read from a data frame.  However, the issue is that the same
> function does not remove NA's for the other date variable that is created
> using the dplyr/mutate().  The console and the reproducible example are
> given below. Any advice how to resolve this issue would be appreciated.
> >
> > Thanks,
> >
> > Pradip Muhuri
> >
> >
> > #################  cut and pasted from the R console ####################
> >
> > id    mrjdate    cocdate    inhdate    haldate    oiddate
> > 1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
> > 2  2       <NA>       <NA>       <NA>       <NA>       <NA>
> > 3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
> > 4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
> > 5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
> > 6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
> > 7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
> >>
> >> # range of dates
> >>
> >> range(data2$mrjdate[complete.cases(data2$mrjdate)])
> > [1] "2004-11-04" "2009-10-24"
> >> range(data2$cocdate[complete.cases(data2$cocdate)])
> > [1] "2005-08-10" "2011-10-05"
> >> range(data2$inhdate[complete.cases(data2$inhdate)])
> > [1] "2005-07-07" "2011-10-13"
> >> range(data2$haldate[complete.cases(data2$haldate)])
> > [1] "2007-11-07" "2011-11-04"
> >> range(data2$oiddate[complete.cases(data2$oiddate)])
> > [1] NA           "2011-11-04"
> >
> >
> > ################  reproducible code #############################
> >
> > library(dplyr)
> > library(lubridate)
> > library(zoo)
> > # data object - description of the
> >
> > temp <- "id  mrjdate cocdate inhdate haldate
> > 1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
> > 2             NA         NA         NA         NA
> > 3     2009-10-24         NA 2011-10-13         NA
> > 4     2007-10-10         NA         NA         NA
> > 5     2006-09-01 2005-08-10         NA         NA
> > 6     2007-09-04 2011-10-05         NA         NA
> > 7     2005-10-25         NA         NA 2011-11-04"
> >
> > # read the data object
> >
> > data1 <- read.table(textConnection(temp),
> >                    colClasses=c("character", "Date", "Date", "Date",
> "Date"),
> >                    header=TRUE, as.is=TRUE
> >                    )
> >
> >
> > # create a new column
> >
> > data2 <- data1 %>%
> >     rowwise() %>%
> >      mutate(oiddate=as.Date(max(mrjdate,cocdate, inhdate, haldate,
> >
>  na.rm=TRUE), origin='1970-01-01'))
> >
> > # print records
> >
> > print (data2)
> >
> > # range of dates
> >
> > range(data2$mrjdate[complete.cases(data2$mrjdate)])
> > range(data2$cocdate[complete.cases(data2$cocdate)])
> > range(data2$inhdate[complete.cases(data2$inhdate)])
> > range(data2$haldate[complete.cases(data2$haldate)])
> > range(data2$oiddate[complete.cases(data2$oiddate)])
> >
> >
> >
> >
> >
> > Pradip K. Muhuri, PhD
> > SAMHSA/CBHSQ
> > 1 Choke Cherry Road, Room 2-1071
> > Rockville, MD 20857
> > Tel: 240-276-1070
> > Fax: 240-276-1260
> >
> >
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
> NOTICE:  This E-Mail (including attachments) is confidential and may be
> legally privileged.  It is covered by the Electronic Communications Privacy
> Act, 18 U.S.C.2510-2521.  If you are not the intended recipient, you are
> hereby notified that any retention, dissemination, distribution or copying
> of this communication is strictly prohibited.  Please reply to the sender
> that you have received this message in error, then delete it.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: range () does not remove NA's with complete.cases() for dates (dplyr/mutate)

Muhuri, Pradip (SAMHSA/CBHSQ)
Hi Bill and mark,

I meant the mutate does NOT set the NA value – sorry for the confusion.  Thank you for your clarifications that this may not be mutate()’s problem.  This thread is now closed from my end.

Thanks,

Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

From: William Dunlap [mailto:[hidden email]]
Sent: Monday, November 10, 2014 1:30 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: Mark Sharp; [hidden email]
Subject: Re: [R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)

> Would you like to bring to Hadley's attention that mutate does
> set the NA value for the new column?

This may not be mutate()'s problem.

The Date class is messed up with regard to NA's and Inf's.  E.g., what gets printed as NA does not correspond to what is.na<http://is.na>() returns and its range() method does not appear to pass the finite=TRUE argument to range.default:
  > d <- as.Date(c("2014-10-31", c("2014-11-10")))
  > d1 <- range(d[0], finite=TRUE)
  Warning messages:
  1: In min.default(numeric(0), na.rm = FALSE) :
    no non-missing arguments to min; returning Inf
  2: In max.default(numeric(0), na.rm = FALSE) :
    no non-missing arguments to max; returning -Inf
  > d1
  [1] NA NA
  > is.na<http://is.na>(d1)
  [1] FALSE FALSE
  > dput(d1)
  structure(c(Inf, -Inf), class = "Date")
  > range(c(d1, d), finite=TRUE)
  [1] NA NA
  > range(c(d1, d), finite=TRUE, na.rm=TRUE)
  [1] NA NA



Bill Dunlap
TIBCO Software
wdunlap tibco.com<http://tibco.com>

On Mon, Nov 10, 2014 at 10:09 AM, Muhuri, Pradip (SAMHSA/CBHSQ) <[hidden email]<mailto:[hidden email]>> wrote:
Mark,

Thank you very much for further looking into this issue.  So, the "ugly" solution is better!  Would you like to bring to Hadley's attention that mutate does set the NA value for the new column?

Regards,

Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070<tel:240-276-1070>
Fax: 240-276-1260<tel:240-276-1260>


-----Original Message-----
From: Mark Sharp [mailto:[hidden email]<mailto:[hidden email]>]
Sent: Monday, November 10, 2014 12:23 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: [hidden email]<mailto:[hidden email]>
Subject: Re: [R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)
Pradip,

For some reason mutate is not setting the is.NA value for the new column. Note the output below using your data structures.

> ## It looks at first as if the second element of both columns are NA.
> data2$mrjdate[2]
[1] NA
> data2$oiddate[2]
[1] NA
> ## for convenience
> mrj <- data2$mrjdate[2]
> oid <- data2$oiddate[2]
> mode(mrj)
[1] "numeric"
> mode(oid)
[1] "numeric"
> str(mrj)
 Date[1:1], format: NA
> str(oid)
 Date[1:1], format: NA
> class(mrj)
[1] "Date"
> class(oid)
[1] "Date"
> ## But note:
> identical(mrj, oid)
[1] FALSE
> all.equal(mrj, oid)
[1] "'is.NA' value mismatch: 0 in current 1 in target"
## functioning code
data2$mrjdate[2]
data2$oiddate[2]
mrj <- data2$mrjdate[2]
oid <- data2$oiddate[2]
mode(mrj)
mode(oid)
str(mrj)
str(oid)
class(mrj)
class(oid)
# But note:
identical(mrj, oid)
all.equal(mrj, oid)

## This ugly solution does not have the problem.
> data3 <- data1
> data3$oiddate <- as.Date(sapply(seq_along(data3$id), function(row) {
+   if (all(is.na<http://is.na>(unlist(data1[row, -1])))) {
+     max_d <- NA
+   } else {
+     max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
+   }
+   max_d}),
+   origin = "1970-01-01")
>
> range(data3$mrjdate[complete.cases(data3$mrjdate)])
[1] "2004-11-04" "2009-10-24"
> range(data3$cocdate[complete.cases(data3$cocdate)])
[1] "2005-08-10" "2011-10-05"
> range(data3$inhdate[complete.cases(data3$inhdate)])
[1] "2005-07-07" "2011-10-13"
> range(data3$haldate[complete.cases(data3$haldate)])
[1] "2007-11-07" "2011-11-04"
> range(data3$oiddate[complete.cases(data3$oiddate)])
[1] "2006-09-01" "2011-11-04"
>
Working code below.

data3 <- data1
data3$oiddate <- as.Date(sapply(seq_along(data3$id), function(row) {
  if (all(is.na<http://is.na>(unlist(data1[row, -1])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

range(data3$mrjdate[complete.cases(data3$mrjdate)])
range(data3$cocdate[complete.cases(data3$cocdate)])
range(data3$inhdate[complete.cases(data3$inhdate)])
range(data3$haldate[complete.cases(data3$haldate)])
range(data3$oiddate[complete.cases(data3$oiddate)])


On Nov 10, 2014, at 10:10 AM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hello,
>
> The range() with complete.cases() removes NA's for the date variables that are read from a data frame.  However, the issue is that the same function does not remove NA's for the other date variable that is created using the dplyr/mutate().  The console and the reproducible example are given below. Any advice how to resolve this issue would be appreciated.
>
> Thanks,
>
> Pradip Muhuri
>
>
> #################  cut and pasted from the R console ####################
>
> id    mrjdate    cocdate    inhdate    haldate    oiddate
> 1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
> 2  2       <NA>       <NA>       <NA>       <NA>       <NA>
> 3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
> 4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
> 5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
> 6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
> 7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>>
>> # range of dates
>>
>> range(data2$mrjdate[complete.cases(data2$mrjdate)])
> [1] "2004-11-04" "2009-10-24"
>> range(data2$cocdate[complete.cases(data2$cocdate)])
> [1] "2005-08-10" "2011-10-05"
>> range(data2$inhdate[complete.cases(data2$inhdate)])
> [1] "2005-07-07" "2011-10-13"
>> range(data2$haldate[complete.cases(data2$haldate)])
> [1] "2007-11-07" "2011-11-04"
>> range(data2$oiddate[complete.cases(data2$oiddate)])
> [1] NA           "2011-11-04"
>
>
> ################  reproducible code #############################
>
> library(dplyr)
> library(lubridate)
> library(zoo)
> # data object - description of the
>
> temp <- "id  mrjdate cocdate inhdate haldate
> 1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
> 2             NA         NA         NA         NA
> 3     2009-10-24         NA 2011-10-13         NA
> 4     2007-10-10         NA         NA         NA
> 5     2006-09-01 2005-08-10         NA         NA
> 6     2007-09-04 2011-10-05         NA         NA
> 7     2005-10-25         NA         NA 2011-11-04"
>
> # read the data object
>
> data1 <- read.table(textConnection(temp),
>                    colClasses=c("character", "Date", "Date", "Date", "Date"),
>                    header=TRUE, as.is<http://as.is>=TRUE
>                    )
>
>
> # create a new column
>
> data2 <- data1 %>%
>     rowwise() %>%
>      mutate(oiddate=as.Date(max(mrjdate,cocdate, inhdate, haldate,
>                                                               na.rm=TRUE), origin='1970-01-01'))
>
> # print records
>
> print (data2)
>
> # range of dates
>
> range(data2$mrjdate[complete.cases(data2$mrjdate)])
> range(data2$cocdate[complete.cases(data2$cocdate)])
> range(data2$inhdate[complete.cases(data2$inhdate)])
> range(data2$haldate[complete.cases(data2$haldate)])
> range(data2$oiddate[complete.cases(data2$oiddate)])
>
>
>
>
>
> Pradip K. Muhuri, PhD
> SAMHSA/CBHSQ
> 1 Choke Cherry Road, Room 2-1071
> Rockville, MD 20857
> Tel: 240-276-1070
> Fax: 240-276-1260
>
>
>
>       [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email]<mailto:[hidden email]> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


NOTICE:  This E-Mail (including attachments) is confidential and may be legally privileged.  It is covered by the Electronic Communications Privacy Act, 18 U.S.C.2510-2521.  If you are not the intended recipient, you are hereby notified that any retention, dissemination, distribution or copying of this communication is strictly prohibited.  Please reply to the sender that you have received this message in error, then delete it.

______________________________________________
[hidden email]<mailto:[hidden email]> mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.