Date class shows Inf as NA; this confuses the use of is.na()

Date class shows Inf as NA; this confuses the use of is.na()

Werner Grundlingh
In the following example, the date class shows Inf as NA

> as_date(Inf, origin = '1970-01-01')
[1] NA

This is misleading as is.na() reports incorrectly

> is.na(as_date(Inf, origin = '1970-01-01'))
[1] FALSE

The correct approach here would probably be to have Inf (and -Inf)
*displayed* rather than NA.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Date class shows Inf as NA; this confuses the use of is.na()

R devel mailing list
> as_date
Error: object 'as_date' not found

Must be from some not-named package...

But don't confuse the format of an object when printed with its underlying value:

> as.Date(Inf,origin = '1970-01-01')
[1] NA

> str(as.Date(Inf,origin = '1970-01-01'))
 Date[1:1], format: NA

> as.numeric(as.Date(Inf,origin = '1970-01-01'))
[1] Inf

> is.na(Inf)
[1] FALSE

> is.na(as.Date(Inf,origin = '1970-01-01'))
[1] FALSE

> str(as.Date(27,origin = '1970-01-01'))
 Date[1:1], format: "1970-01-28"

> as.numeric(as.Date(27,origin = '1970-01-01'))
[1] 27
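
The distinction between printed form and underlying value can be checked directly. The following is a minimal sketch using only base R; `date_state` is a hypothetical helper name introduced here for illustration, not an existing function. It classifies each element by its stored day count rather than by what print() shows.

```r
# Classify each element of a Date vector by its underlying day count.
# `date_state` is a hypothetical helper name, not part of base R.
date_state <- function(x) {
  stopifnot(inherits(x, "Date"))
  days <- unclass(x)            # days since 1970-01-01, as plain numbers
  ifelse(is.na(days), "missing",
         ifelse(is.infinite(days), "infinite", "finite"))
}

x <- as.Date(c(27, Inf, -Inf, NA), origin = "1970-01-01")
date_state(x)
# finite, infinite, infinite, missing
```

Only the last element is missing in the is.na() sense; the two infinite ones merely cannot be shown as a calendar date.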

--
Don MacQueen
Lawrence Livermore National Laboratory

Re: Date class shows Inf as NA; this confuses the use of is.na()

Werner Grundlingh
Indeed. as_date is from lubridate, but the same holds for as.Date.

The output and its interpretation should be consistent, otherwise it leads
to confusion when programming. I understand that the difference exists
after asking a question on Stack Overflow:
  https://stackoverflow.com/q/50766089/914686
This understanding is never mentioned in the documentation - that an Inf
date is actually represented as NA:
  https://www.rdocumentation.org/packages/base/versions/3.5.0/topics/as.Date
So I'm of the impression that the display should be fixed as a first option
(thereby providing clarity/transparency in terms of back-end and output),
or the documentation amended (to highlight this) as a second option.


Re: Date class shows Inf as NA; this confuses the use of is.na()

Joris FA Meys
Hi Werner,

on ?is.na it says:

> The default method for anyNA handles atomic vectors without a class and
> NULL.

I hear you, and it is confusing to say the least. Looking deeper, the
culprit seems to be in the conversion of a Date to POSIXlt prior to the
formatting:

> x <- as.Date(Inf,origin = '1970-01-01')
> is.na(as.POSIXlt(x))
[1] TRUE

Given this implicit conversion, I'd argue that as.Date should really return
NA as well when passed an infinite value. The other option is to provide an
is.na method for the Date class, which is -given is.na is an internal
generic- rather trivial:

> is.na.Date <- function(x) is.na(as.POSIXlt(x))
> is.na(x)
[1] TRUE

This might be a workaround for your current problem without needing changes
to R itself. But this will give a "wrong" answer in the sense that this
still works:

> Sys.Date() - x
Time difference of -Inf days

I personally would go for NA as the "correct" date for an infinite value,
but given that this will have implications in other areas, there is a
possibility of breaking code and it should be investigated a bit further
imho.
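
A less invasive variant of the workaround, sketched under the assumption that one does not want to change is.na() for every Date in the session: put the stricter method on a subclass. The class name "strict_date" and the constructor `as_strict_date` are hypothetical and exist only for this illustration.

```r
# "strict_date" is a hypothetical subclass used only for this sketch.
as_strict_date <- function(x) {
  stopifnot(inherits(x, "Date"))
  class(x) <- c("strict_date", "Date")
  x
}

# Treat every non-finite day count (NA, NaN, Inf, -Inf) as missing.
is.na.strict_date <- function(x) !is.finite(unclass(x))

plain  <- as.Date(Inf, origin = "1970-01-01")
strict <- as_strict_date(plain)

is.na(plain)    # FALSE: base behaviour is unchanged
is.na(strict)   # TRUE: the subclass method is used

# Arithmetic still sees the underlying Inf, as noted above.
as.numeric(Sys.Date() - strict)   # -Inf
```

The same caveat applies as for is.na.Date: the method changes what is.na() reports, not what is stored.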
Cheers
Joris




--
Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University

Re: Date class shows Inf as NA; this confuses the use of is.na()

Joris FA Meys
And now I've seen I copied the wrong part of ?is.na

> The default method for is.na applied to an atomic vector returns a
> logical vector of the same length as its argument x, containing TRUE for
> those elements marked NA or, for numeric or complex vectors, NaN, and FALSE
> otherwise.

Key point being "atomic vector" here.



Re: Date class shows Inf as NA; this confuses the use of is.na()

Emil
I don't think there's much wrong with is.na(as_date(Inf, origin='1970-01-01')) being FALSE, as there still is some "non-NA-ness" about the value (as difftime shows); it is the output when printing that is confusing. The way cat treats it is clearer: it does print Inf.

So would this be a solution?

format.Date <- function (x, ...)
{
  xx <- format(as.POSIXlt(x), ...)
  names(xx) <- names(x)
  xx[is.na(xx) & !is.na(x)] <- paste('Invalid date:',as.numeric(x[is.na(xx) & !is.na(x)]))
  xx
}

Which causes this behaviour, which I think is clearer:

environment(print.Date) <- .GlobalEnv
x <- as_date(Inf, origin='1970-01-01')
print(x)
# [1] "Invalid date: Inf"

Best regards,
Emil Bode
 
Data-analyst
 
DANS: Netherlands Institute for Permanent Access to Digital Research Resources

Re: Date class shows Inf as NA; this confuses the use of is.na()

Martin Maechler
In reply to this post by Joris FA Meys
>>>>> Joris Meys
>>>>>     on Sat, 9 Jun 2018 13:45:21 +0200 writes:

    > And now I've seen I copied the wrong part of ?is.na
    >> The default method for is.na applied to an atomic vector
    >> returns a logical vector of the same length as its argument x,
    >> containing TRUE for those elements marked NA or, for
    >> numeric or complex vectors, NaN, and FALSE otherwise.

    > Key point being "atomic vector" here.

and a Date vector *is* atomic ... (so I'm confused about what
that issue is ... but read on).


    > On Sat, Jun 9, 2018 at 1:41 PM, Joris Meys
    > <[hidden email]> wrote:

    >> I personally would go for NA as the "correct" date for an
    >> infinite value, but given that this will have
    >> implications in other areas, there is a possibility of
    >> breaking code and it should be investigated a bit further
    >> imho.
    >> Cheers
    >> Joris

Indeed.  I could argue it is wrong to treat '+/- Inf' as NA for
dates (as well as for date times), because the Inf *does*
contain information in some sense:

     Infinitely far in the future
vs   Infinitely far in the past

which may make sense in some cases ... in the same sense that +Inf and
-Inf do make sense for numbers in some cases.

Martin


Re: Date class shows Inf as NA; this confuses the use of is.na()

Joris FA Meys
On Mon, Jun 11, 2018 at 11:12 AM, Martin Maechler <
[hidden email]> wrote:

>
> and a Date vector *is* atomic ... (so I'm confused about what
> that issue is ... but read on).
>

Indeed. I tend to exclude everything with a formal class from "atomic" (e.g.
factors et al.) because they do behave differently sometimes, but
technically that's not correct. Thank you for pointing that out.


> Indeed.  I could argue it is wrong to treat '+/- Inf' as NA for
> dates (as well as for date times), because the Inf *does*
> contain information in some sense:
>
>      Infinitely far in the future
> vs   Infinitely far in the past
>
> which may make sense in some case ... in the same sense +Inf and
> -Inf do make sense for numbers in some cases.
>
> Martin
>

I considered that too. But as shown in the code above, anything that relies
on POSIXlt to process the date will actually convert the Inf value to NA.

The problem becomes a bit more confusing, as as.POSIXct() does not convert
to NA.

> x <-  as.Date(Inf, origin = '1970-01-01')
> is.na(x)
[1] FALSE
> is.na(as.POSIXct(x))
[1] FALSE
> is.na(as.POSIXlt(x))
[1] TRUE

I can guess why this happens. For a date that's infinitely far in the
future, it is impossible to determine an exact hour, minute, second, day,
month, ... So these values in the POSIXlt "list" format can't be anything
but NA.
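
The structural difference behind this can be inspected directly. A minimal sketch using only base R calls: POSIXct stores one number per element, so an infinite value can pass through the conversion, whereas POSIXlt stores separate calendar fields that have to be computed from that number.

```r
x <- as.Date(Inf, origin = "1970-01-01")

# POSIXct is a single number (seconds since the epoch), so Inf carries over.
ct <- as.POSIXct(x)
is.infinite(as.numeric(ct))      # TRUE

# POSIXlt is a list of calendar fields (sec, min, hour, mday, mon, year, ...).
lt <- as.POSIXlt(Sys.Date())
is.list(unclass(lt))             # TRUE
names(unclass(lt))
```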

So I totally understand the value of having Inf dates. The trade-off to
consider here is whether we strive for consistency among the different
datetime classes, or strive for correct representation of the actual value
of the date.

Cheers
Joris

Re: Date class shows Inf as NA; this confuses the use of is.na()

Gabe Becker
In reply to this post by Emil
Emil et al.,



In my opinion, it's either invalid or it isn't. If it's actually invalid,
as_date (and the equivalent core function which is actually relevant on
this list) should fail, because it's an invalid date.

If it *isn't* invalid, having the print method tell users it is seems
problematic.

And I think people seem to be leaning towards it not being invalid. A bit
surprising to me, as my personal first thought was that infinite dates
don't make any sense, but I don't really have a horse in this race and so
defer to the cooler heads that are saying having an infinite date perhaps
should not be disallowed explicitly. If it's not, though, it's not invalid
and we shouldn't confuse users by saying it is, imho.

Best,
~G


--
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research


Re: Date class shows Inf as NA; this confuses the use of is.na()

Emil
I agree that calling it invalid is a bit confusing, but I’m not sure what the wording should be, as the problem is that the conversion to POSIXlt is failing.
The best solution would be to extend the whole POSIXlt-class, but that’s too much work.
I’ve done some experiments, and it also seems that the Date class can store larger values than POSIXlt:
> as.Date(8e9, origin='1970-01-01')==as.Date(9e9, origin='1970-01-01')
[1] FALSE
> as.POSIXlt(as.Date(8e9, origin='1970-01-01'))==as.POSIXlt(as.Date(9e9, origin='1970-01-01'))
[1] TRUE
> as.POSIXlt(as.Date(8e9, origin='1970-01-01'))
[1] "-5877641-06-23 UTC"
# Same for 9e9
> as.Date(8e9, origin='1970-01-01')>Sys.Date()
[1] TRUE
> as.POSIXlt(as.Date(8e9, origin='1970-01-01'))>as.POSIXlt(Sys.Date())
[1] FALSE

So the situation as I see it now:

  *   Having an infinite date may convey some information, so we shouldn’t prohibit it
  *   The same goes for very large values (positive or negative)
  *   But we should warn users that their dates may not be neatly representable, and that the default print method cannot show them
  *   So for values where the POSIXlt-print fails, I think it’s best to print the numerical value, along with some text warning the user
So I’ve adapted the format-function a bit more, with behaviour below.
The details can be adapted of course, but I feel it’s best to print some variant of as.numeric(x) if as.POSIXlt(x) turns out to be unreliable, and otherwise leave is.na() as it is.


format.Date <- function (x, ...)
{
  xx <- format(as.POSIXlt(x), ...)
  names(xx) <- names(x)
  # Day counts outside 0001-01-01 .. 9999-12-31 (days since 1970-01-01)
  days <- as.numeric(x)
  out_of_range <- !is.na(x) & (days < -719162 | days > 2932896)
  if(any(out_of_range)) {
    xx[out_of_range] <- paste('Date with numerical value', days[out_of_range])
    warning('Some dates are not in the interval 01-01-01 and 9999-12-31, showing numerical value.')
  }
  xx
}

With the following results:

> environment(print.Date) <- .GlobalEnv
> as.Date(Inf, origin='1970-01-01')
[1] "Date with numerical value Inf"
Warning message:
In format.Date(x) :
  Some dates are not in the interval 01-01-01 and 9999-12-31, showing numerical value.
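
The same idea can be tried without masking base R's format.Date. The following is a sketch only: `format_date_or_number` is a hypothetical helper name, the range limits are the ones used in the function above, and the "<Date ...>" label is an arbitrary choice for illustration.

```r
# Hypothetical helper; it does not override base::format.Date.
format_date_or_number <- function(x, ...) {
  stopifnot(inherits(x, "Date"))
  days <- unclass(x)                       # days since 1970-01-01
  out <- rep(NA_character_, length(x))
  # Limits taken from the post above: 0001-01-01 .. 9999-12-31
  ok <- !is.na(days) & days >= -719162 & days <= 2932896
  out[ok] <- format(x[ok], ...)            # ordinary formatting for in-range dates
  odd <- !is.na(days) & !ok                # non-missing but not printable as a date
  out[odd] <- paste0("<Date ", days[odd], ">")
  names(out) <- names(x)
  out
}

format_date_or_number(as.Date(c(0, Inf, NA), origin = "1970-01-01"))
# "1970-01-01", "<Date Inf>", NA
```

Genuinely missing values stay NA, so the result agrees with is.na() on the original vector.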




Re: Date class shows Inf as NA; this confuses the use of is.na()

Martin Maechler
>>>>> Emil Bode
>>>>>     on Tue, 12 Jun 2018 12:00:42 +0000 writes:

> So the situation as I see it now:
>
>   *   Having an infinite date may convey some information, so
>       we shouldn’t prohibit it anyway

>   *   Idem for very large values (positive or negative)

Indeed -- good that you found that you don't have to go all the way to Inf
... and that is typical (and the reason why one has to solve the
problem anyway, and why Inf is not really a special case in that
sense, but nicely in another sense)!

>   *   But we should warn users that their dates may not be neatly representable and that the default print cannot be used for them
>   *   So for values where the POSIXlt-print fails, I think it’s best to print the numerical value, along with some text warning the user

> So I’ve adapted the format-function a bit more, with behaviour below.
> The details can be adapted of course, but I feel it’s best to print some variant of as.numeric(x) if as.POSIXlt(x) turns out to be unreliable, and further leave is.na()

>
> format.Date <- function (x, ...)
> {
>   xx <- format(as.POSIXlt(x), ...)
>   names(xx) <- names(x)
>   # Day counts outside 0001-01-01 .. 9999-12-31 are shown as plain numbers
>   out_of_range <- !is.na(x) & (-719162>as.numeric(x) | as.numeric(x)>2932896)
>   if(any(out_of_range)) {
>     xx[out_of_range] <- paste('Date with numerical value',as.numeric(x[out_of_range]))
>     warning('Some dates are not in the interval 01-01-01 and 9999-12-31, showing numerical value.')
>   }
>   xx
> }
>
> With the following results:
>
> > environment(print.Date) <- .GlobalEnv
> > as.Date(Inf, origin='1970-01-01')
> [1] "Date with numerical value Inf"
> Warning message:
> In format.Date(x) :
>   Some dates are not in the interval 01-01-01 and 9999-12-31, showing numerical value.
>
This looks somewhat reasonable as a workaround for you and for now.
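
For readers who would rather test their data than override format.Date, the same range check can be pulled out as a stand-alone predicate. This is only a minimal sketch: date_out_of_range() is a hypothetical helper name, not a base R function, and the two bounds are simply the day counts used in the quoted workaround.

```r
# Hypothetical helper (not part of base R): flags Date values whose
# day count lies outside the bounds used in the quoted format.Date.
date_out_of_range <- function(x, lower = -719162, upper = 2932896) {
  days <- as.numeric(x)   # a Date is stored as days since 1970-01-01
  # NA dates are reported as FALSE, so is.na() keeps its usual meaning
  !is.na(days) & (days < lower | days > upper)
}

d <- as.Date(c(0, 27, Inf, -Inf, NA, 8e9), origin = "1970-01-01")
flags <- date_out_of_range(d)
flags
```

The result marks Inf, -Inf and 8e9 as out of range, while the NA element stays FALSE and can still be found with is.na().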

However, I'd propose another route to go for "the next version of R":
When I consider

  > str(unclass(as.POSIXlt.Date(Sys.time() + 1e50)))
  List of 9
   $ sec  : num 0
   $ min  : int 0
   $ hour : int 0
   $ mday : int 23
   $ mon  : int 5
   $ year : int -5879541
   $ wday : int 2
   $ yday : int 173
   $ isdst: int 0
   - attr(*, "tzone")= chr "UTC"
  >

we see the integer overflow (to negative here) and that all
components but 'sec' (because it allows fractions!) are integer.

I think we should allow 'year' to be "double" instead, and so it
could also be +Inf or -Inf and we'd nicely cover
the conversions from and to 'numeric' -- which is really used
internally for dates and date-times in  POSIXct.
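
The difference between the two storage types can be seen at the R level. This is only an illustration under the assumption that R-level coercion is a fair stand-in: as.integer() returns NA for values that do not fit, whereas the C code behind as.POSIXlt wrapped around to a negative year in the output above.

```r
# R integers are 32-bit: there is no integer Inf and large values do not fit.
int_max  <- .Machine$integer.max              # 2147483647
int_inf  <- suppressWarnings(as.integer(Inf)) # NA (a warning is suppressed here)
int_huge <- suppressWarnings(as.integer(1e50))# NA as well

# A double keeps +Inf, and a Date built from it round-trips its value.
d <- as.Date(Inf, origin = "1970-01-01")
num <- as.numeric(d)                          # Inf
c(is_na = is.na(d), is_finite = is.finite(num))
```

With a double 'year', is.finite() rather than is.na() would be the natural test for such dates.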

Martin



Re: Date class shows Inf as NA; this confuses the use of is.na()

Joris FA Meys
On Tue, Jun 12, 2018 at 6:28 PM, Martin Maechler <[hidden email]> wrote:

>
> I think we should allow 'year' to be "double" instead, and so it
> could also be +Inf or -Inf and we'd nicely cover
> the conversions from and to 'numeric' -- which is really used
> internally for dates and date-times in  POSIXct.
>
> Martin
>
>
That would be perfect: it would tackle both the consistency with other
formats and the confusing print() output. I'm all for it.
Cheers
Joris



--
Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)

-----------
Biowiskundedagen 2017-2018
http://www.biowiskundedagen.ugent.be/

-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


Re: Date class shows Inf as NA; this confuses the use of is.na()

Greg Minshall-2
In reply to this post by Martin Maechler
Martin, et al.,

> I think we should allow 'year' to be "double" instead, and so it
> could also be +Inf or -Inf and we'd nicely cover
> the conversions from and to 'numeric' -- which is really used
> internally for dates and date-times in  POSIXct.

storing years as a double makes me worry slightly about
----
> year <- 1e50
> (year+1)-year
[1] 0
----
which is not how one thinks of years (or integers) as behaving.
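
To put a number on where this starts: an IEEE 754 double represents every integer exactly up to 2^53 (about 9e15), so adding one to a year only stops working beyond that point. A small sketch, with variable names chosen purely for illustration:

```r
# Below 2^53 every integer is exactly representable as a double.
small <- 2^53 - 1
step_small <- (small + 1) - small   # 1

# At 2^53 the spacing between adjacent doubles becomes 2,
# so adding 1 is rounded away.
big <- 2^53
step_big <- (big + 1) - big         # 0

c(step_small = step_small, step_big = step_big)
```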

cheers, Greg

ps -- sorry for the ">" overloading!


Re: Date class shows Inf as NA; this confuses the use of is.na()

Gabe Becker
Greg,

I see what you mean, but on the other hand, that's not how we think about
real numbers working either, and doubles have that behavior generally. It
might be possible to put checks in (with a potentially non-trivial overhead
cost) to disallow that kind of thing, but again R (and everyone else, I
think?) doesn't do so for regular doubles.

Also, I would expect the year 1e50 and the "year" Inf to be functionally
equivalent in meaning (and largely meaningless) in context.

Best,
~G



--
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research


Re: Date class shows Inf as NA; this confuses the use of is.na()

Greg Minshall-2
Gabe,

> Also, I would expect the year 1e50 and the "year" Inf to be functionally
> equivalent in meaning (and largely meaningless) in context.

indeed.

thanks, Greg
