Precision of function mean,bug?

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Precision of function mean,bug?

MorganMorgan
Hello R-dev,

Yesterday, while I was testing the newly implemented function pmean in
package kit, I noticed a mismatch in the output of the below R expressions.

set.seed(123)
n=1e3L
idx=5
x=rnorm(n)
y=rnorm(n)
z=rnorm(n)
a=(x[idx]+y[idx]+z[idx])/3
b=mean(c(x[idx],y[idx],z[idx]))
a==b
# [1] FALSE

For idx= 1, 2, 3, 4 the last line is equal to TRUE. For 5, 6 and many
others the difference is small but still.
Is that expected or is it a bug?

Thank you
Best Regards
Morgan Jacob

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Precision of function mean,bug?

Peter Dalgaard-2
Expected, see FAQ 7.31.

You just can't trust == on FP operations. Notice also

> a2=(z[idx]+x[idx]+y[idx])/3
> a2==a
[1] FALSE
> a2==b
[1] TRUE

-pd

> On 20 May 2020, at 12:40 , Morgan Morgan <[hidden email]> wrote:
>
> Hello R-dev,
>
> Yesterday, while I was testing the newly implemented function pmean in
> package kit, I noticed a mismatch in the output of the below R expressions.
>
> set.seed(123)
> n=1e3L
> idx=5
> x=rnorm(n)
> y=rnorm(n)
> z=rnorm(n)
> a=(x[idx]+y[idx]+z[idx])/3
> b=mean(c(x[idx],y[idx],z[idx]))
> a==b
> # [1] FALSE
>
> For idx= 1, 2, 3, 4 the last line is equal to TRUE. For 5, 6 and many
> others the difference is small but still.
> Is that expected or is it a bug?
>
> Thank you
> Best Regards
> Morgan Jacob
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Precision of function mean,bug?

R devel mailing list
 > On Wednesday, May 20, 2020, 7:00:09 AM EDT, peter dalgaard <[hidden email]> wrote:
>
> Expected, see FAQ 7.31.
>
> You just can't trust == on FP operations. Notice also

Additionally, since you're implementing a "mean" function you are testing
against R's mean, you might want to consider that R uses a two-pass
calculation[1] to reduce floating point precision error.

Best,

Brodie.

[1] https://github.com/wch/r-source/blob/tags/R-4-0-0/src/main/summary.c#L482

> > a2=(z[idx]+x[idx]+y[idx])/3
> > a2==a
> [1] FALSE
> > a2==b
> [1] TRUE
>
> -pd
>
> > On 20 May 2020, at 12:40 , Morgan Morgan <[hidden email]> wrote:
> >
> > Hello R-dev,
> >
> > Yesterday, while I was testing the newly implemented function pmean in
> > package kit, I noticed a mismatch in the output of the below R expressions.
> >
> > set.seed(123)
> > n=1e3L
> > idx=5
> > x=rnorm(n)
> > y=rnorm(n)
> > z=rnorm(n)
> > a=(x[idx]+y[idx]+z[idx])/3
> > b=mean(c(x[idx],y[idx],z[idx]))
> > a==b
> > # [1] FALSE
> >
> > For idx= 1, 2, 3, 4 the last line is equal to TRUE. For 5, 6 and many
> > others the difference is small but still.
> > Is that expected or is it a bug?

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Precision of function mean,bug?

Henrik Bengtsson-5
On Wed, May 20, 2020 at 11:10 AM brodie gaslam via R-devel
<[hidden email]> wrote:

>
>  > On Wednesday, May 20, 2020, 7:00:09 AM EDT, peter dalgaard <[hidden email]> wrote:
> >
> > Expected, see FAQ 7.31.
> >
> > You just can't trust == on FP operations. Notice also
>
> Additionally, since you're implementing a "mean" function you are testing
> against R's mean, you might want to consider that R uses a two-pass
> calculation[1] to reduce floating point precision error.

This one is important.

FWIW, matrixStats::mean2() provides argument refine=TRUE/FALSE to
calculate mean with and without this two-pass calculation;

> a <- c(x[idx],y[idx],z[idx]) / 3
> b <- mean(c(x[idx],y[idx],z[idx]))
> b == a
[1] FALSE
> b - a
[1] 2.220446e-16

> c <- matrixStats::mean2(c(x[idx],y[idx],z[idx]))  ## default to refine=TRUE
> b == c
[1] TRUE
> b - c
[1] 0

> d <- matrixStats::mean2(c(x[idx],y[idx],z[idx]), refine=FALSE)
> a == d
[1] TRUE
> a - d
[1] 0
> c == d
[1] FALSE
> c - d
[1] 2.220446e-16

Not surprisingly, the two-pass higher-precision version (refine=TRUE)
takes roughly twice as long as the one-pass quick version
(refine=FALSE).

/Henrik

>
> Best,
>
> Brodie.
>
> [1] https://github.com/wch/r-source/blob/tags/R-4-0-0/src/main/summary.c#L482
>
> > > a2=(z[idx]+x[idx]+y[idx])/3
> > > a2==a
> > [1] FALSE
> > > a2==b
> > [1] TRUE
> >
> > -pd
> >
> > > On 20 May 2020, at 12:40 , Morgan Morgan <[hidden email]> wrote:
> > >
> > > Hello R-dev,
> > >
> > > Yesterday, while I was testing the newly implemented function pmean in
> > > package kit, I noticed a mismatch in the output of the below R expressions.
> > >
> > > set.seed(123)
> > > n=1e3L
> > > idx=5
> > > x=rnorm(n)
> > > y=rnorm(n)
> > > z=rnorm(n)
> > > a=(x[idx]+y[idx]+z[idx])/3
> > > b=mean(c(x[idx],y[idx],z[idx]))
> > > a==b
> > > # [1] FALSE
> > >
> > > For idx= 1, 2, 3, 4 the last line is equal to TRUE. For 5, 6 and many
> > > others the difference is small but still.
> > > Is that expected or is it a bug?
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Precision of function mean,bug?

MorganMorgan
Sorry, posting back to the list.
Thank you all.
Morgan

On Thu, 21 May 2020, 16:33 Henrik Bengtsson, <[hidden email]>
wrote:

> Hi.
>
> Good point and a good example. Feel free to post to the list. The purpose
> of my reply wasn't to take away Peter's point but to emphasize that
> base::mean() does a two-pass scan over the elements too lower the impact of
> addition of values with widely different values (classical problem in
> numerical analysis). But I can see how it may look like that.
>
> Cheers,
>
> Henrik
>
>
> On Thu, May 21, 2020, 03:21 Morgan Morgan <[hidden email]>
> wrote:
>
>> Thank you Henrik for the feedback.
>> Note that for idx=4 and refine = TRUE,  your equality b==c is FALSE. I
>> think that as Peter said == can't be trusted with FP.
>> His example is good. Here is an even more shocking one.
>> a=0.786546798
>> b=a+ 1e6 -1e6
>> a==b
>> # [1] FALSE
>>
>> Best regards
>> Morgan Jacob
>>
>> On Wed, 20 May 2020, 20:18 Henrik Bengtsson, <[hidden email]>
>> wrote:
>>
>>> On Wed, May 20, 2020 at 11:10 AM brodie gaslam via R-devel
>>> <[hidden email]> wrote:
>>> >
>>> >  > On Wednesday, May 20, 2020, 7:00:09 AM EDT, peter dalgaard <
>>> [hidden email]> wrote:
>>> > >
>>> > > Expected, see FAQ 7.31.
>>> > >
>>> > > You just can't trust == on FP operations. Notice also
>>> >
>>> > Additionally, since you're implementing a "mean" function you are
>>> testing
>>> > against R's mean, you might want to consider that R uses a two-pass
>>> > calculation[1] to reduce floating point precision error.
>>>
>>> This one is important.
>>>
>>> FWIW, matrixStats::mean2() provides argument refine=TRUE/FALSE to
>>> calculate mean with and without this two-pass calculation;
>>>
>>> > a <- c(x[idx],y[idx],z[idx]) / 3
>>> > b <- mean(c(x[idx],y[idx],z[idx]))
>>> > b == a
>>> [1] FALSE
>>> > b - a
>>> [1] 2.220446e-16
>>>
>>> > c <- matrixStats::mean2(c(x[idx],y[idx],z[idx]))  ## default to
>>> refine=TRUE
>>> > b == c
>>> [1] TRUE
>>> > b - c
>>> [1] 0
>>>
>>> > d <- matrixStats::mean2(c(x[idx],y[idx],z[idx]), refine=FALSE)
>>> > a == d
>>> [1] TRUE
>>> > a - d
>>> [1] 0
>>> > c == d
>>> [1] FALSE
>>> > c - d
>>> [1] 2.220446e-16
>>>
>>> Not surprisingly, the two-pass higher-precision version (refine=TRUE)
>>> takes roughly twice as long as the one-pass quick version
>>> (refine=FALSE).
>>>
>>> /Henrik
>>>
>>> >
>>> > Best,
>>> >
>>> > Brodie.
>>> >
>>> > [1]
>>> https://github.com/wch/r-source/blob/tags/R-4-0-0/src/main/summary.c#L482
>>> >
>>> > > > a2=(z[idx]+x[idx]+y[idx])/3
>>> > > > a2==a
>>> > > [1] FALSE
>>> > > > a2==b
>>> > > [1] TRUE
>>> > >
>>> > > -pd
>>> > >
>>> > > > On 20 May 2020, at 12:40 , Morgan Morgan <
>>> [hidden email]> wrote:
>>> > > >
>>> > > > Hello R-dev,
>>> > > >
>>> > > > Yesterday, while I was testing the newly implemented function
>>> pmean in
>>> > > > package kit, I noticed a mismatch in the output of the below R
>>> expressions.
>>> > > >
>>> > > > set.seed(123)
>>> > > > n=1e3L
>>> > > > idx=5
>>> > > > x=rnorm(n)
>>> > > > y=rnorm(n)
>>> > > > z=rnorm(n)
>>> > > > a=(x[idx]+y[idx]+z[idx])/3
>>> > > > b=mean(c(x[idx],y[idx],z[idx]))
>>> > > > a==b
>>> > > > # [1] FALSE
>>> > > >
>>> > > > For idx= 1, 2, 3, 4 the last line is equal to TRUE. For 5, 6 and
>>> many
>>> > > > others the difference is small but still.
>>> > > > Is that expected or is it a bug?
>>> >
>>> > ______________________________________________
>>> > [hidden email] mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel