base::mean not consistent about NA/NaN

Jan Gorecki
Hi,
base::mean is not consistent in how it handles NA/NaN.
The mean should not depend on the order of its input elements, but currently it does.

    mean(c(NA, NaN))
    #[1] NA
    mean(c(NaN, NA))
    #[1] NaN

I filed an issue, so if there are no replies here, its status can be followed
at:
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17441

Best,
Jan

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: base::mean not consistent about NA/NaN

Ista Zahn
The current behavior is as documented. See ?NA, which says

"Numerical computations using ‘NA’ will normally result in ‘NA’: a
     possible exception is where ‘NaN’ is also involved, in which case
     either might result"

--Ista

Re: base::mean not consistent about NA/NaN

Duncan Murdoch-2
In reply to this post by Jan Gorecki
On 02/07/2018 11:25 AM, Jan Gorecki wrote:
> Hi,
> base::mean is not consistent in terms of handling NA/NaN.
> Mean should not depend on order of its arguments while currently it is.

The result of mean() can depend on the order even with regular numbers.
For example,

 > x <- rep(c(1, 10^(-15)), 1000000)
 > mean(sort(x)) - 0.5
[1] 5.551115e-16
 > mean(rev(sort(x))) - 0.5
[1] 0
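
A smaller illustration of the same effect, as a minimal sketch. It assumes IEEE 754 double arithmetic and uses Reduce() to force a plain left-to-right sum, which is not how mean() accumulates internally. The names `small`, `forward`, and `backward` are chosen here for illustration only.

```r
# Ten values, each too small to change 1 when added to it one at a time
small <- rep(1e-16, 10)

# Large value first: every later addition is rounded away
forward <- Reduce(`+`, c(1, small))

# Small values first: they accumulate to about 1e-15 before meeting the 1
backward <- Reduce(`+`, c(small, 1))

forward == 1        # the small terms were lost along the way
backward > forward  # the small terms survived in this order
```

Both sums contain exactly the same elements; only the order of the additions differs.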


>
>      mean(c(NA, NaN))
>      #[1] NA
>      mean(c(NaN, NA))
>      #[1] NaN
>
> I created issue so in case of no replies here status of it can be looked up
> at:
> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17441

The help page for ?NaN says,

"Computations involving NaN will return NaN or perhaps NA: which of
those two is not guaranteed and may depend on the R platform (since
compilers may re-order computations)."

And ?NA says,

"Numerical computations using NA will normally result in NA: a possible
exception is where NaN is also involved, in which case either might
result (which may depend on the R platform)."

So I doubt that this inconsistency will be fixed.

Duncan Murdoch

Re: base::mean not consistent about NA/NaN

barry rowlingson
In reply to this post by Jan Gorecki
And for a starker example of this (documented) inconsistency,
arithmetic addition is not commutative:

 > NA + NaN
 [1] NA
 > NaN + NA
 [1] NaN



Re: base::mean not consistent about NA/NaN

Jan Gorecki
Thank you for the interesting examples.
I would find it useful to document this behavior in `?mean` as well. While the `+`
operator is also affected, the `sum` function is not.
For mean, NA/NaN could be handled in the loop in summary.c. I assume that
the performance penalty of a fix is the reason why this inconsistency still
exists.
Jan
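
The kind of explicit check suggested above can be sketched at the R level (not in summary.c). This is only an illustration: `mean_na_first` is a hypothetical helper, not part of base R, and the rule that NA takes priority over NaN is an assumption made for the example.

```r
# Hypothetical helper (not in base R): NA takes priority over NaN,
# so the result no longer depends on the order of the elements.
mean_na_first <- function(x) {
  stopifnot(is.numeric(x))
  if (length(x) == 0L) return(NaN)  # same result as mean(numeric(0))
  # is.na() is TRUE for both NA and NaN; is.nan() is TRUE only for NaN
  if (any(is.na(x) & !is.nan(x))) return(NA_real_)
  if (any(is.nan(x))) return(NaN)
  mean(x)
}

mean_na_first(c(NA, NaN))
mean_na_first(c(NaN, NA))
```

The extra scans over `x` are the price of the guarantee, which is the performance trade-off mentioned above.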

Re: base::mean not consistent about NA/NaN

barry rowlingson
In reply to this post by barry rowlingson
On Tue, Jul 3, 2018 at 10:12 AM, Jan Gorecki <[hidden email]> wrote:
> Thank you for interesting examples.
> I would find useful to document this behavior also in `?mean`, while `+`
> operator is also affected, the `sum` function is not.

`sum` is "affected" on my system, if you mean:

    > sum(c(NA,NaN))
    [1] NA
    > sum(c(NaN,NA))
    [1] NaN

oh, maybe you mean:

    > sum(NaN, NA)
    [1] NA
    > sum(NA, NaN)
    [1] NA

But whatever, no money back guarantee:

     Computations involving ‘NaN’ will return ‘NaN’ or perhaps ‘NA’:
     which of those two is not guaranteed and may depend on the R
     platform (since compilers may re-order computations).

Re: base::mean not consistent about NA/NaN

Tomas Kalibera
In reply to this post by Jan Gorecki
Yes, the performance overhead of fixing this at the R level would be too
large, and it would complicate the code significantly. The result of
binary operations involving NA and NaN is hardware-dependent (it comes down
to the propagation of the NaN payload): on some hardware it actually works the
way we would like, and NA is returned, but on other hardware you get NaN, or
sometimes NA and sometimes NaN. There are also C compiler optimizations that
re-order code, as mentioned in ?NaN, and external
numerical libraries that do not distinguish NA from NaN (NA is an R
concept). So I am afraid this is unfixable. The disclaimer mentioned by
Duncan is in ?NaN and ?NA, which I think is OK: there are so many numerical
functions through which one might run into these problems that it would
be infeasible to document them all. Some functions will in fact preserve
NA, and we would not let NA turn into NaN unnecessarily, but the
disclaimer says it is something not to depend on.

Tomas
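
In practice, "not to depend on" means testing results with is.na() rather than is.nan() when either missing value might come back. A minimal sketch, with `res1`, `res2`, and `either_missing` as names chosen for illustration:

```r
# Which of NA or NaN comes back is not guaranteed and may differ by platform
res1 <- mean(c(NA, NaN))
res2 <- mean(c(NaN, NA))

# is.na() is TRUE for NA and for NaN, so this check is portable
either_missing <- is.na(res1) && is.na(res2)
either_missing

# is.nan(res1) or is.nan(res2), by contrast, could differ between
# platforms, so code should not branch on them here
```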

Re: base::mean not consistent about NA/NaN

Jan Gorecki
Thank you, Tomas, for the detailed explanation. Such a nice description deserves
to be included somewhere in the documentation, maybe in R-internals.
Regards,
Jan
