as.Date(Inf) displays as 'NA' but is actually 'Inf'

Richard White-2
Hi,

I think I've discovered a bug in base R.

Basically, when 'Inf' is used as a 'Date', it is visually displayed as
'NA', but R still treats it as 'Inf'. So it is very confusing to work
with, and can easily lead to errors:

# Visually displays as NA
 > as.Date(Inf, origin="2018-01-01")
[1] NA

# Visually displays as NA
 > str(as.Date(Inf, origin="2018-01-01"))
Date[1:1], format: NA

# Is NOT NA
 > is.na(as.Date(Inf, origin="2018-01-01"))
[1] FALSE

# Is still Inf
 > is.infinite(as.Date(Inf, origin="2018-01-01"))
[1] TRUE

This gets really problematic when you are collapsing dates over groups
and you want to find the first date of a group, because min() returns
Inf if there is no data:

# Visually displays as NA
 > as.Date(min(), origin="2018-01-01")
[1] NA
Warning message: In min() : no non-missing arguments to min; returning Inf

# Visually displays as NA
 > str(as.Date(min(), origin="2018-01-01"))
Date[1:1], format: NA
Warning message: In min() : no non-missing arguments to min; returning Inf

# Is not NA
 > is.na(as.Date(min(), origin="2018-01-01"))
[1] FALSE
Warning message: In min() : no non-missing arguments to min; returning Inf

# This is bad!
 > as.Date(min(), origin="2018-01-01") > "2018-01-01"
[1] TRUE
Warning message: In min() : no non-missing arguments to min; returning Inf
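
For the grouped-minimum case above, one way to sidestep the infinite Date is to handle empty groups explicitly. The following is a minimal sketch, not something proposed in this thread; `first_date()` is a hypothetical helper name, and it assumes that an empty group should yield a genuine missing date.

```r
# Hypothetical helper: earliest date of a group, with a real NA for empty groups.
first_date <- function(d) {
  stopifnot(inherits(d, "Date"))
  if (length(d) == 0L || all(is.na(d))) {
    return(as.Date(NA))          # stored value is NA, so is.na() is TRUE
  }
  min(d, na.rm = TRUE)
}

groups <- list(
  a = as.Date(c("2018-03-01", "2018-01-15")),
  b = as.Date(character(0))      # a group with no data
)

res <- lapply(groups, first_date)
is.na(res$b)                     # TRUE: displayed NA and stored NA now agree
format(res$a)                    # "2018-01-15"
```

With this guard min() is never called on an empty vector, so the warning and the infinite value do not arise in the first place.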

Here is my sessionInfo():

 > sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8
LC_MONETARY=C.UTF-8
[6] LC_MESSAGES=C LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.5.0 tools_3.5.0 yaml_2.1.19

 > Sys.getlocale()
[1]
"LC_CTYPE=C.UTF-8;LC_NUMERIC=C;LC_TIME=C.UTF-8;LC_COLLATE=C.UTF-8;LC_MONETARY=C.UTF-8;LC_MESSAGES=C;LC_PAPER=C.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C.UTF-8;LC_IDENTIFICATION=C"

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: as.Date(Inf) displays as 'NA' but is actually 'Inf'

Gabriel Becker-2
Richard,

Well others may chime in here, but from a mathematical point of view, the
concept of "infinite days from right now" is well-defined, so it may be a
"valid" date in that sense, but what day and month it will be (year will be
Inf) are indeterminate/not well defined. Those are rightfully NA, it
seems?

I mean you could disallow dates to take Inf at all, ever. I don't feel
strongly one way or the other about that, personally. That said, if Inf
dates are allowed, it's not clear to me that displaying the "formatted" date
string as NA, even if the value isn't, is wrong, given that it can't be
determined what that "date" is. It could be displayed differently, I
suppose, but all the options I can think of off the top of my head would be
problematic and probably break lots of formatted-date parsing code out
there in the wild (and in R, I would guess). Things like displaying
"Inf-NA-NA", or just "Inf". Neither of those is going to handle a
read-write round-trip well, I think.

So my personal don't-really-have-a-hat-in-the-ring opinion would be to
either leave it as is, or force as.Date(Inf, bla) to actually be NA.

Best,
~G

On Tue, Mar 5, 2019 at 12:06 PM Richard White <[hidden email]> wrote:

> [...]

Re: as.Date(Inf) displays as 'NA' but is actually 'Inf'

R devel mailing list
format.Date runs into trouble long before Inf:
  > as.Date("2018-03-05") + c(2147466052, 2147466053)
  [1] "5881580-07-11"  "-5877641-06-23"

Bill Dunlap
TIBCO Software
wdunlap tibco.com
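
A quick check, offered as a sketch rather than a statement about R's internals, suggests why the wrap-around happens at exactly that offset: the first offset brings the day count since 1970-01-01 to the largest 32-bit signed integer, and the second goes one past it. That an integer day count is what overflows is an inference from this arithmetic, not something stated in the message.

```r
base_days <- as.numeric(as.Date("2018-03-05"))   # days since 1970-01-01
base_days                                        # 17595

offsets <- c(2147466052, 2147466053)
base_days + offsets == .Machine$integer.max      # TRUE FALSE
base_days + offsets >  .Machine$integer.max      # FALSE TRUE
```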


On Tue, Mar 5, 2019 at 2:33 PM Gabriel Becker <[hidden email]> wrote:

> [...]

Re: as.Date(Inf) displays as 'NA' but is actually 'Inf'

Richard White-2
In reply to this post by Gabriel Becker-2
Hi Gabriel,

The point is that it *visually* displays as NA, but is.na() still
responds as FALSE.

When I (and I am sure many people) see an NA, we then use is.na(). If we
see Inf displayed, we then use is.infinite(). With as.Date() this breaks
down.

I'm not arguing that as.Date(Inf) should be coerced to NA. I'm arguing
that as.Date(Inf) should be *visually* displayed as Inf (i.e. the
truth!). I doubt this would break any existing code, because
as.Date(Inf) acts as Inf in every way possible, except for when you
visually look at the output printed on the screen.

William - For all the other Date bugs, they don't visually display false
information about the variable's contents. They might give wrong output,
but the output displayed is what exists inside the variable.

If we can't trust the R console to display the truth, then we are in a
lot of trouble.

 > a <- as.Date(Inf, origin="2018-01-01")
 > a
[1] NA
 > is.na(a)
[1] FALSE

Richard


Re: as.Date(Inf) displays as 'NA' but is actually 'Inf'

Gabriel Becker-2
On Tue, Mar 5, 2019 at 9:54 PM Richard White <[hidden email]> wrote:

> Hi Gabriel,
>
> The point is that it *visually* displays as NA, but is.na() still
> responds as FALSE.
>
> When I (and I am sure many people) see an NA, we then use is.na(). If we
> see Inf displayed, we then use is.infinite(). With as.Date() this breaks
> down.
>
> I'm not arguing that as.Date(Inf) should be coerced to NA. I'm arguing
> that as.Date(Inf) should be *visually* displayed as Inf (i.e. the truth!).
> I doubt this would break any existing code, because as.Date(Inf) acts as
> Inf in every way possible, except for when you visually look at the output
> printed on the screen.
>
> William - For all the other Date bugs, they don't visually display false
> information about the variable's contents. They might give wrong output,
> but the output displayed is what exists inside the variable.
>
> If we can't trust the R console to display the truth, then we are in a lot
> of trouble.
>

Well, I think it (subtly) actually is the truth though. What is displayed
when you print a date is the *formatted date string, not the numeric value
stored within the date*. The formatted date string of the infinite date is
actually, correctly, NA, because, for the reasons I pointed out in my last
post, it is indeterminate.

> x = as.Date(Inf, origin = "2018-01-01")
> format(x)
[1] NA


So that is what is happening, both technically and conceptually. For
the record, I'd be surprised by that too, but I think it's a situation of
pieces working correctly individually, but together having a correct but
unintuitive behavior.

Others may feel differently though, that's just my read on it.
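
The two layers can be inspected separately. A small sketch using only base R: `unclass()` exposes the stored number of days, while `format()` produces the string that printing shows. The formatted result for the infinite date is deliberately left unasserted, since it may differ between R versions.

```r
d <- as.Date("2018-01-01")
unclass(d)      # 17532, the stored number of days since 1970-01-01
format(d)       # "2018-01-01", the string that printing displays

x <- as.Date(Inf, origin = "2018-01-01")
unclass(x)      # Inf: the stored value is untouched
is.na(x)        # FALSE: is.na() looks at the stored value
format(x)       # NA under R 3.5.0 per this thread; other versions may differ
```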

Best,
~G




Re: as.Date(Inf) displays as 'NA' but is actually 'Inf'

Martin Maechler
>>>>> Gabriel Becker
>>>>>     on Tue, 5 Mar 2019 22:01:37 -0800 writes:

    > [...]

Thank you Richard and Gabe and Bill (Dunlap),
I agree with both of you that the behavior is surprising (to > 99.9% of useRs).

Gabe very nicely explains how it happens and also why it does
make some sense *and* that a change may be problematic.

However, the "principle of least surprise" I learned very long ago
from Doug Bates is a good "guiding" principle for software design
(if you allow it to be weighed against other principles, etc).

Here is a bit of slightly more principled code to show the
phenomenon, including the fact noticed by Bill that both
as.Date() and format.Date() should probably be tweaked so as
to signal warnings (e.g. on integer overflow for too large numbers).

## -------------------------------------------------------------------------
xDates <- lapply(c(-Inf, Inf, NA, NaN,
                   1e9, 4e9, 1e100, .Machine$double.xmax),
                 as.Date, origin = "2000-01-01")
str(xDates) # --> first 4 *all* show as  NA
sapply(xDates, is.na) # the two +-Inf are not NA
(f.D <- sapply(xDates, format))# 1..4: NA, then "negative" but all the same (?!)
stopifnot(is.na(f.D)[1:4]) # the formats (of 1..4) *are* all NA !!
## show their true internals -- still contain what was put there :
for(d in xDates) dput(d)
## -------------------------------------------------------------------------

produces

> xDates <- lapply(c(-Inf, Inf, NA, NaN,
+                    1e9, 4e9, 1e100, .Machine$double.xmax),
+                  as.Date, origin = "2000-01-01")
> str(xDates) # --> first 4 *all* show as  NA
List of 8
 $ : Date[1:1], format: NA
 $ : Date[1:1], format: NA
 $ : Date[1:1], format: NA
 $ : Date[1:1], format: NA
 $ : Date[1:1], format: "2739907-01-04"
 $ : Date[1:1], format: "-5877641-06-23"
 $ : Date[1:1], format: "-5877641-06-23"
 $ : Date[1:1], format: "-5877641-06-23"
> sapply(xDates, is.na) # the two +-Inf are not NA
[1] FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE
> (f.D <- sapply(xDates, format))# 1..4: NA, then "negative" but all the same (?!)
[1] NA               NA               NA               NA               "2739907-01-04"  "-5877641-06-23"
[7] "-5877641-06-23" "-5877641-06-23"
> stopifnot(is.na(f.D)[1:4]) # the formats (of 1..4) *are* all NA !!
> ## show their true internals -- still contain what was put there :
> for(d in xDates) dput(d)
structure(-Inf, class = "Date")
structure(Inf, class = "Date")
structure(NA_real_, class = "Date")
structure(NaN, class = "Date")
structure(1000010957, class = "Date")
structure(4000010957, class = "Date")
structure(1e+100, class = "Date")
structure(1.79769313486232e+308, class = "Date")
>

---------

What if we left NA (NA_character_ specifically) as the result of format(),
but changed the print() method so it gives better information
here?

I would argue that -Inf and Inf should show differently than
true NA's or NaN's, not least because infinitely past and
infinitely into the future are different concepts.

Martin Maechler
ETH Zurich (and R Core team)
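
One way to picture this suggestion, as a rough sketch only: leave `format()` alone and post-process its result for display. `show_date()` is a hypothetical helper, not an R function and not a change made to R.

```r
# Hypothetical display helper: keep format()'s result for ordinary dates,
# but label infinite and NaN values by their stored number.
show_date <- function(x) {
  stopifnot(inherits(x, "Date"))
  raw <- unclass(x)
  out <- format(x)
  special <- is.infinite(raw) | is.nan(raw)
  out[special] <- as.character(raw[special])   # "-Inf", "Inf" or "NaN"
  out
}

x <- as.Date(c(-Inf, Inf, NA, NaN, 0), origin = "2000-01-01")
show_date(x)    # "-Inf" "Inf" NA "NaN" "2000-01-01"
```

A true NA stays NA, so is.na() on the displayed strings still matches is.na() on the dates, while the two infinities become distinguishable from each other and from missing values.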

Re: as.Date(Inf) displays as 'NA' but is actually 'Inf'

Martin Maechler
>>>>> Martin Maechler
>>>>>     on Wed, 6 Mar 2019 11:51:33 +0100 writes:

    > [...]

One change that would solve these problems would be to allow
<POSIXlt>[["year"]] to become "double" instead of "integer".
Then as.POSIXlt() would return different things, with no integer
overflow, and would still contain the correct numbers, which it
currently cannot (but it should at least warn about integer overflow!).

Martin
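
The component in question can be looked at directly. A brief sketch in base R: `as.POSIXlt()` breaks a date into fields, with `year` counted from 1900. Its storage type is printed rather than assumed, since it may not be the same in every R version.

```r
lt <- as.POSIXlt(as.Date("2018-01-01"))
lt$year            # 118, i.e. years since 1900
lt$year + 1900     # 2018
typeof(lt$year)    # "integer" at the time of this thread, per the message above
```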
