On Sun, 25 Feb 2018, Iuri Gavronski wrote:

> Hi,
>
> Why sum() on a 10-item vector produces a different value than its
> counterpart on a 2-item vector? I understand the problems related to
> the arithmetic precision in storing decimal numbers in binary format,
> but shouldn't the errors be equal regardless of the method used?

Since you understand the problems, why are you asking? More to the point,
why are you asking as if this were a property of R rather than a property
of all typical IEEE 754 floating point implementations?

For others Googling this: the short answer is don't make your program
behavior depend on the least significant digits... they are not reliable.
Read about the difference between double precision (53-bit significand, or
about 15-17 decimal digits [1]) and extended precision (64-bit significand,
or about 18-19 decimal digits, for Intel math coprocessors [2]) and ask
questions about this topic in an appropriate forum (e.g. [3]), since this
issue can crop up in practically any programming language.
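
To see concretely what "not reliable" means, here is a quick sketch (in
Python, but any IEEE 754 language stores the same bits): the value actually
stored for the literal 0.1 is not 0.1.

```python
from decimal import Decimal

# Decimal(float) converts the binary64 value exactly, exposing the
# double that is actually stored for the literal 0.1
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625
```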

> See my example:

Examples are good. Diving into numerical analysis theory not so much.

>> options(digits=22)

Don't be misleading... you are not going to get 22 digits out of standard
numeric types. Note that you get 17 digits below; the last couple of those
go beyond the roughly 15 reliably-significant decimal digits of a double,
so the numbers in memory are not that precise. The best that can be said
for 17-digit printed values is that 17 digits are exactly enough to pin
down the underlying double, so you have the best shot at reproducing the
value bit-for-bit in a different program or instance of R if you keep them.
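
A sketch of that round-trip property (Python's ".17g" format plays the
same role as printing 17 significant digits in R): the printed string
parses back to the identical double.

```python
x = 0.1
s = format(x, ".17g")    # print with 17 significant digits
print(s)                 # 0.10000000000000001

# 17 digits round-trip: parsing the printed value recovers the
# identical double, which is why keeping them is worthwhile
assert float(s) == x
```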

>> x=rep(.1,10)
>> x
> [1] 0.10000000000000001 0.10000000000000001 0.10000000000000001
> [4] 0.10000000000000001 0.10000000000000001 0.10000000000000001
> [7] 0.10000000000000001 0.10000000000000001 0.10000000000000001
> [10] 0.10000000000000001

Sitting in memory (not extended precision!)

>> sum(x)
> [1] 1

sum() is vectorized (compact compiled C code)... the intermediate values
can stay in the coprocessor at extended precision. Only the result is
trimmed back to double precision.

>> y=0
>> for (i in 1:10) {y=sum(y+x[i])}

Why are you calling the sum function on a scalar value? The addition has
already taken place...

The actual intermediate values in this for loop get stored in RAM at
double precision and get re-expanded to extended precision each time an
addition occurs, then rounded again when the addition is done. That is a
lot of rounding compared to the sum function.
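
The contrast can be sketched outside R too. Here a plain Python loop
rounds to double after every addition, while math.fsum compensates for
the rounding errors (analogous in spirit, though not in mechanism, to
sum() keeping a wider accumulator):

```python
import math

x = [0.1] * 10

total = 0.0
for xi in x:
    total += xi          # result rounded to double after every addition

print(total)             # 0.9999999999999999
print(math.fsum(x))      # exactly 1.0 (compensated summation)
```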

>> y
> [1] 0.99999999999999989
>> y*10^6
> [1] 999999.99999999988

Note that 10 = 2*5 is not a pure power of 2, so multiplying by a power of
10 modifies the binary mantissa as well as the binary exponent, and the
rounding may come out differently for different powers of 10.
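
A small sketch of the difference (Python again; the arithmetic is the same
in any IEEE 754 language): scaling by a power of 2 only adjusts the
exponent, introducing no new rounding, while scaling by a power of 10
reworks the mantissa and can round again.

```python
y = 0.9999999999999999   # the for-loop result, one ulp below 1.0

# power of 2: exact rescaling, the error is simply doubled
print(y * 2)             # 1.9999999999999998

# power of 10: mantissa and exponent both change, with fresh rounding,
# so the result is still not exactly 1e6
print(y * 10**6 == 1e6)  # False
```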

>> sum(x)*10^6
> [1] 1e+06

>> z=.1+.1+.1+.1+.1+.1+.1+.1+.1+.1

More double precision arithmetic, with a rounding after each addition.

>> z
> [1] 0.99999999999999989

Similar to the for loop.

Don't rely on the last few digits to be stable. Take a course in numerical
analysis if you just cannot take my word for it that those last few digits
are going to wander off a bit no matter what you do.

[1] https://en.wikipedia.org/wiki/IEEE_754
[2] https://en.wikipedia.org/wiki/Extended_precision
[3] https://cs.stackexchange.com/search?q=ieee+754
---------------------------------------------------------------------------

Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>                  Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.