Precision in R

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Precision in R

Iuri Gavronski-2
Hi,

Why sum() on a 10-item vector produces a different value than its
counterpart on a 2-item vector? I understand the problems related to
the arithmetic precision in storing decimal numbers in binary format,
but shouldn't the errors be equal regardless of the method used?

See my example:

> options(digits=22)
> x=rep(.1,10)
> x
 [1] 0.10000000000000001 0.10000000000000001 0.10000000000000001
 [4] 0.10000000000000001 0.10000000000000001 0.10000000000000001
 [7] 0.10000000000000001 0.10000000000000001 0.10000000000000001
[10] 0.10000000000000001
> sum(x)
[1] 1
> y=0
> for (i in 1:10) {y=sum(y+x[i])}
> y
[1] 0.99999999999999989
> y*10^6
[1] 999999.99999999988
> sum(x)*10^6
[1] 1e+06
> z=.1+.1+.1+.1+.1+.1+.1+.1+.1+.1
> z
[1] 0.99999999999999989
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Precision in R

Thierry Onkelinx
This is described in R FAQ 7.31


ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
[hidden email]
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

<https://www.inbo.be>

2018-02-26 2:34 GMT+01:00 Iuri Gavronski <[hidden email]>:

> Hi,
>
> Why sum() on a 10-item vector produces a different value than its
> counterpart on a 2-item vector? I understand the problems related to
> the arithmetic precision in storing decimal numbers in binary format,
> but shouldn't the errors be equal regardless of the method used?
>
> See my example:
>
> > options(digits=22)
> > x=rep(.1,10)
> > x
>  [1] 0.10000000000000001 0.10000000000000001 0.10000000000000001
>  [4] 0.10000000000000001 0.10000000000000001 0.10000000000000001
>  [7] 0.10000000000000001 0.10000000000000001 0.10000000000000001
> [10] 0.10000000000000001
> > sum(x)
> [1] 1
> > y=0
> > for (i in 1:10) {y=sum(y+x[i])}
> > y
> [1] 0.99999999999999989
> > y*10^6
> [1] 999999.99999999988
> > sum(x)*10^6
> [1] 1e+06
> > z=.1+.1+.1+.1+.1+.1+.1+.1+.1+.1
> > z
> [1] 0.99999999999999989
> >
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Precision in R

Jeff Newmiller
In reply to this post by Iuri Gavronski-2
On Sun, 25 Feb 2018, Iuri Gavronski wrote:

> Hi,
>
> Why sum() on a 10-item vector produces a different value than its
> counterpart on a 2-item vector? I understand the problems related to
> the arithmetic precision in storing decimal numbers in binary format,
> but shouldn't the errors be equal regardless of the method used?

Since you understand the problems, why are you asking? More to the point,
why are you asking as if this was a property of R rather than a property
of all typical IEEE754 floating point implementations?

For others Googling this: short answer is don't make your program behavior
depend on the least significant digits... they are not reliable. Read
about the difference between double precision (53 bits or about 15 digits
[1]) and extended precision (63 bits or about 18 digits for Intel math
coprocessors [2]) and ask questions about this topic in an appropriate
forum (e.g. [3]) since this issue can crop up in practically any
programming language.

> See my example:

Examples are good. Diving into numerical analysis theory not so much.

>> options(digits=22)

Don't be misleading... you are not going to get 22digits with standard
numeric types. Note that you get 17 digits below, and two of those are
artifacts pulled from the artificially-expanded extended-precision values
in the coprocessor... the numbers in memory are not that precise. The best
that can be said for using 17digit printed values is that you have the
best shot at being able to reproduce the actual 15 digits of double
precision in a different program or instance of R if you keep them.

>> x=rep(.1,10)
>> x
> [1] 0.10000000000000001 0.10000000000000001 0.10000000000000001
> [4] 0.10000000000000001 0.10000000000000001 0.10000000000000001
> [7] 0.10000000000000001 0.10000000000000001 0.10000000000000001
> [10] 0.10000000000000001

Sitting in memory (not extended precision!)

>> sum(x)
> [1] 1

sum() is vectorized (compact compiled C code)... the intermediate values
can stay in the coprocessor at extended precision. Only the result is
trimmed back to double precision.

>> y=0
>> for (i in 1:10) {y=sum(y+x[i])}

Why are you calling the sum function on a scalar value? The addition has
already taken place...

The actual intermediate values in this for loop get stored in RAM at
double precision and get re-expanded to extended precision each time
an addition occurs and then rounded again when the addition is done.
That is a lot of rounding compared to the sum function.

>> y
> [1] 0.99999999999999989
>> y*10^6
> [1] 999999.99999999988

Note that 10 = 2*5 = b'0010' * b'0101' is not just a power of 2 so both
the binary mantissa and binary exponent are getting modified when you
multiply by powers of 10, so the rounding may be different for various
powers of 10.

>> sum(x)*10^6
> [1] 1e+06
>> z=.1+.1+.1+.1+.1+.1+.1+.1+.1+.1

More double precision.

>> z
> [1] 0.99999999999999989

Similar to for loop.

Don't rely on the last few digits to be stable. Take a course in numerical
analysis if you just cannot take my word for it that those last few digits
are going wander off a bit no matter what you do.

[1] https://en.wikipedia.org/wiki/IEEE_754
[2] https://en.wikipedia.org/wiki/Extended_precision
[3] https://cs.stackexchange.com/search?q=ieee+754

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Precision in R

R help mailing list-2
In reply to this post by Iuri Gavronski-2
In the R expression
   x[1] + x[2]
the result must be stored as a double precision number,
because that is what R "numerics" are.  sum() does not
have to keep its intermediate results as doubles, but
can use quad precision or Kahan's summation algorithm
(both methods involve more than a simple double to
keep track of the running sum) to give more accurate results
for summing vectors.


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sun, Feb 25, 2018 at 5:34 PM, Iuri Gavronski <[hidden email]> wrote:

> Hi,
>
> Why sum() on a 10-item vector produces a different value than its
> counterpart on a 2-item vector? I understand the problems related to
> the arithmetic precision in storing decimal numbers in binary format,
> but shouldn't the errors be equal regardless of the method used?
>
> See my example:
>
> > options(digits=22)
> > x=rep(.1,10)
> > x
>  [1] 0.10000000000000001 0.10000000000000001 0.10000000000000001
>  [4] 0.10000000000000001 0.10000000000000001 0.10000000000000001
>  [7] 0.10000000000000001 0.10000000000000001 0.10000000000000001
> [10] 0.10000000000000001
> > sum(x)
> [1] 1
> > y=0
> > for (i in 1:10) {y=sum(y+x[i])}
> > y
> [1] 0.99999999999999989
> > y*10^6
> [1] 999999.99999999988
> > sum(x)*10^6
> [1] 1e+06
> > z=.1+.1+.1+.1+.1+.1+.1+.1+.1+.1
> > z
> [1] 0.99999999999999989
> >
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.