Sorry, posting back to the list.

Thank you all.

> Hi.

>

> Good point and a good example. Feel free to post to the list. The purpose

> of my reply wasn't to take away Peter's point but to emphasize that

> base::mean() does a two-pass scan over the elements too lower the impact of

> addition of values with widely different values (classical problem in

> numerical analysis). But I can see how it may look like that.

>

> Cheers,

>

> Henrik

>

>

> On Thu, May 21, 2020, 03:21 Morgan Morgan <

[hidden email]>

> wrote:

>

>> Thank you Henrik for the feedback.

>> Note that for idx=4 and refine = TRUE, your equality b==c is FALSE. I

>> think that as Peter said == can't be trusted with FP.

>> His example is good. Here is an even more shocking one.

>> a=0.786546798

>> b=a+ 1e6 -1e6

>> a==b

>> # [1] FALSE

>>

>> Best regards

>> Morgan Jacob

>>

>> On Wed, 20 May 2020, 20:18 Henrik Bengtsson, <

[hidden email]>

>> wrote:

>>

>>> On Wed, May 20, 2020 at 11:10 AM brodie gaslam via R-devel

>>> <

[hidden email]> wrote:

>>> >

>>> > > On Wednesday, May 20, 2020, 7:00:09 AM EDT, peter dalgaard <

>>>

[hidden email]> wrote:

>>> > >

>>> > > Expected, see FAQ 7.31.

>>> > >

>>> > > You just can't trust == on FP operations. Notice also

>>> >

>>> > Additionally, since you're implementing a "mean" function you are

>>> testing

>>> > against R's mean, you might want to consider that R uses a two-pass

>>> > calculation[1] to reduce floating point precision error.

>>>

>>> This one is important.

>>>

>>> FWIW, matrixStats::mean2() provides argument refine=TRUE/FALSE to

>>> calculate mean with and without this two-pass calculation;

>>>

>>> > a <- c(x[idx],y[idx],z[idx]) / 3

>>> > b <- mean(c(x[idx],y[idx],z[idx]))

>>> > b == a

>>> [1] FALSE

>>> > b - a

>>> [1] 2.220446e-16

>>>

>>> > c <- matrixStats::mean2(c(x[idx],y[idx],z[idx])) ## default to

>>> refine=TRUE

>>> > b == c

>>> [1] TRUE

>>> > b - c

>>> [1] 0

>>>

>>> > d <- matrixStats::mean2(c(x[idx],y[idx],z[idx]), refine=FALSE)

>>> > a == d

>>> [1] TRUE

>>> > a - d

>>> [1] 0

>>> > c == d

>>> [1] FALSE

>>> > c - d

>>> [1] 2.220446e-16

>>>

>>> Not surprisingly, the two-pass higher-precision version (refine=TRUE)

>>> takes roughly twice as long as the one-pass quick version

>>> (refine=FALSE).

>>>

>>> /Henrik

>>>

>>> >

>>> > Best,

>>> >

>>> > Brodie.

>>> >

>>> > [1]

>>>

https://github.com/wch/r-source/blob/tags/R-4-0-0/src/main/summary.c#L482>>> >

>>> > > > a2=(z[idx]+x[idx]+y[idx])/3

>>> > > > a2==a

>>> > > [1] FALSE

>>> > > > a2==b

>>> > > [1] TRUE

>>> > >

>>> > > -pd

>>> > >

>>> > > > On 20 May 2020, at 12:40 , Morgan Morgan <

>>>

[hidden email]> wrote:

>>> > > >

>>> > > > Hello R-dev,

>>> > > >

>>> > > > Yesterday, while I was testing the newly implemented function

>>> pmean in

>>> > > > package kit, I noticed a mismatch in the output of the below R

>>> expressions.

>>> > > >

>>> > > > set.seed(123)

>>> > > > n=1e3L

>>> > > > idx=5

>>> > > > x=rnorm(n)

>>> > > > y=rnorm(n)

>>> > > > z=rnorm(n)

>>> > > > a=(x[idx]+y[idx]+z[idx])/3

>>> > > > b=mean(c(x[idx],y[idx],z[idx]))

>>> > > > a==b

>>> > > > # [1] FALSE

>>> > > >

>>> > > > For idx= 1, 2, 3, 4 the last line is equal to TRUE. For 5, 6 and

>>> many

>>> > > > others the difference is small but still.

>>> > > > Is that expected or is it a bug?

>>> >

>>> > ______________________________________________

>>> >

[hidden email] mailing list

>>> >

https://stat.ethz.ch/mailman/listinfo/r-devel>>>

>>