summary.default rounding on numeric seems inconsistent with other R behaviors

summary.default rounding on numeric seems inconsistent with other R behaviors

John Mount
I was wondering if it would make sense to change the default behavior of the following:

summary(15555L)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   15560   15560   15560   15560   15560   15560

summary.default on numeric values rounds the values themselves (not just their presentation) to max(3L, getOption("digits") - 3L) significant digits (four with the default digits = 7), making those values surprising and less suitable for further calculation.  summary() on a matrix or data.frame does not do so.

It seems it would be nice to have x=15555L; summary(x)[['Min.']] == min(x) evaluate to TRUE.  I know one can alter behavior by changing the global “digits” option, but I don’t know what other impacts that might have.  Ideally I would think summary.default would not round its values at all, but use digits to control presentation (by overriding print and such).  Even in presentation the rounding without switching to scientific notation (such as 1.556e+4) is a bit surprising (I understand rounding and scientific notation are two different presentation issues, but new users are very confused that something that appears to be an integer has been rounded).

Example:

summary(data.frame(x=15555))
##        x        
##  Min.   :15555  
##  1st Qu.:15555  
##  Median :15555  
##  Mean   :15555  
##  3rd Qu.:15555  
##  Max.   :15555  
summary(15555)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   15560   15560   15560   15560   15560   15560
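A minimal R sketch of the rounding described above. It assumes that the pre-change summary.default() computed the quartiles plus the mean and then applied signif() with max(3L, getOption("digits") - 3L) digits. old_summary_default() is a hypothetical name used only for illustration; it is not the actual base R source.

```r
# Hypothetical emulation of the old summary.default(): the statistics are
# rounded with signif() *before* being returned, so the stored values change.
old_summary_default <- function(x, digits = max(3L, getOption("digits") - 3L)) {
  qq <- stats::quantile(x, names = FALSE)   # min, 1st quartile, median, 3rd quartile, max
  qq <- c(qq[1:3], mean(x), qq[4:5])        # insert the mean in the middle
  names(qq) <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max.")
  signif(qq, digits)                        # rounds the data, not just the display
}

x <- 15555L
s <- old_summary_default(x)   # default digits = 7, so 4 significant digits are kept
s[["Min."]] == min(x)         # FALSE: 15555 cannot be stored with 4 significant digits
s[["Min."]] - min(x)          # the rounding error that leaks into later calculations
```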

I have a (somewhat whiny) polemic trying to explain the pain point here: http://www.win-vector.com/blog/2016/08/my-criticism-of-r-numeric-summary/ (I am not trying to be rude; rather, I am trying to emphasize why this can be confusing to new users).



---------------
John Mount
http://www.win-vector.com/ <http://www.win-vector.com/>
Our book: Practical Data Science with R http://www.manning.com/zumel/ <http://www.manning.com/zumel/>




______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: summary.default rounding on numeric seems inconsistent with other R behaviors

Jim Porzak
Concur.
I would argue the issue is more critical when sharing results (say,
summary() output in an R Markdown report) with our business partners.



On Fri, Aug 19, 2016 at 8:04 AM, John Mount <[hidden email]> wrote:

> I was wondering if it would make sense to change the default behavior of
> the following:
>
> summary(15555L)
> ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> ##   15560   15560   15560   15560   15560   15560




--

Best,
Jim Porzak
DS4CI.org  <http://www.ds4ci.org/>
LinkedIn.com/in/JimPorzak <http://www.linkedin.com/in/jimporzak>
use R! Group SF: meetup.com/R-Users/ <http://www.meetup.com/R-Users/>
R Beginners, Berkeley: meetup.com/r-enthusiasts/
<http://www.meetup.com/r-enthusiasts/>


Re: summary.default rounding on numeric seems inconsistent with other R behaviors

Simone Giannerini
John,

I had raised the matter ten years ago, and I was told that the topic was
already very^3 old

https://stat.ethz.ch/pipermail/r-devel/2006-September/042684.html

there is some discussion on its origin and also a declaration of intent to
change the default behaviour, which, unfortunately, remained a declaration.
I agree that R could do better here, let's hope in less than ten years
though. ;-)

Kind regards,

Simone

On Fri, Aug 19, 2016 at 5:04 PM, John Mount <[hidden email]> wrote:

> I was wondering if it would make sense to change the default behavior of
> the following:
>
> summary(15555L)
> ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> ##   15560   15560   15560   15560   15560   15560




--
___________________________________________________

Simone Giannerini
Dipartimento di Scienze Statistiche "Paolo Fortunati"
Universita' di Bologna
Via delle belle arti 41 - 40126  Bologna,  ITALY
Tel: +39 051 2098262  Fax: +39 051 232153
http://www2.stat.unibo.it/giannerini/
___________________________________________________


Re: summary.default rounding on numeric seems inconsistent with other R behaviors

Dirk Eddelbuettel

It is the old story of defined behaviour and expected outcomes. Hard to
change now.

So I would suggest you do something like this in your ~/.Rprofile:

R> smry <- function(...) summary(..., digits=6)
R> smry(155555L)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 155555  155555  155555  155555  155555  155555
R>

Maybe call it Summary() instead.
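A wrapper that passes digits = 6 still rounds with signif() under the behaviour described in this thread, so values with more than six significant digits are still altered. A hypothetical ~/.Rprofile alternative computes the same six statistics directly and never rounds them. Summary() here is a user-defined name, not a base R function, and this sketch only handles numeric vectors.

```r
# Hypothetical ~/.Rprofile helper: same six statistics as summary(), never rounded.
Summary <- function(x) {
  stopifnot(is.numeric(x))
  x <- x[!is.na(x)]                         # drop NAs (summary() reports them separately)
  qq <- stats::quantile(x, names = FALSE)   # min, 1st quartile, median, 3rd quartile, max
  c(Min. = qq[1], `1st Qu.` = qq[2], Median = qq[3],
    Mean = mean(x), `3rd Qu.` = qq[4], Max. = qq[5])
}

Summary(155555L)
Summary(1234567L)[["Max."]] == 1234567L   # TRUE: no significant digits are dropped
```

Because the result is a plain named numeric vector, it prints through print.default() with the usual getOption("digits") significant digits. Only the display is affected, and the stored values stay exact.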

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | [hidden email]


Re: summary.default rounding on numeric seems inconsistent with other R behaviors

Martin Maechler
>>>>> Dirk Eddelbuettel <[hidden email]>
>>>>>     on Fri, 19 Aug 2016 11:40:05 -0500 writes:

    > It is the old story of defined behaviour and expected outcomes. Hard to
    > change now.

yes...  not impossible though... see below

    > So I would suggest you do something like this in your ~/.Rprofile:

    R> smry <- function(...) summary(..., digits=6)
    R> smry(155555L)
    > Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    > 155555  155555  155555  155555  155555  155555
    R>

    > Maybe call it Summary() instead.

yes, do use a different name.   There are other such functions, e.g. 'summarize()'.

Simone wrote

> I had raised the matter ten years ago, and I was told that the topic was
> already very^3 old
>
> https://stat.ethz.ch/pipermail/r-devel/2006-September/042684.html
>
> there is some discussion on its origin and also a declaration of intents to
> change the default behaviour, which, unfortunately, remained a declaration.
> I agree that R could do better here, let's hope in less than ten years
> though. ;-)

and the 2006 thread he mentions is basically a similar question
and a reply by me that I agreed to some extent that a change was
desirable ... originally we had adhered to the S "standard"
which became the S+ one and at that time I did still have access
to a running instance of S-PLUS 6.2 where I had seen that
Insightful (the company curating and selling S-PLUS)
also had decided to change the ~15 year old S "standard"... and
indeed I was implicitly *asking* for proposals of such a change,
but I think I never saw a (careful) proposal.

In the spirit of probably 99% of other "base R" code, a change
should really *not* round __at all__ in the summary() methods,
but *only* in the print() methods of such summary() results.

OTOH, for back compatibility, if a user does use  summary(.., digits=.)
explicitly, these digits should be 'obeyed' of course.
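The design in the two paragraphs above can be sketched with a small S3 class: compute at full precision, round only when printing, but honour an explicitly supplied digits. The names summarize_numeric() and print.summarize_numeric() are hypothetical and only illustrate the idea. They are not the code later committed to R.

```r
# Hypothetical sketch: keep full precision in the object, round only on print.
summarize_numeric <- function(x, digits) {
  qq <- stats::quantile(x, names = FALSE)
  qq <- c(qq[1:3], mean(x), qq[4:5])
  names(qq) <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max.")
  # Back-compatibility: an explicitly supplied 'digits' still rounds the values.
  if (!missing(digits)) qq <- signif(qq, digits)
  class(qq) <- "summarize_numeric"
  qq
}

print.summarize_numeric <- function(x, digits = max(3L, getOption("digits") - 3L), ...) {
  # format() uses 'digits' only to choose how many significant digits to show;
  # in fixed notation it does not drop digits before the decimal point.
  print(format(unclass(x), digits = digits), quote = FALSE)
  invisible(x)
}

s <- summarize_numeric(15555L)
s[["Min."]] == 15555   # TRUE: the stored value is unrounded
s                      # displayed via print.summarize_numeric()
```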

I think summary(<1-variable>)  could easily, and relatively "back-compatibly",
be changed in the above vein.

One "real problem" is the wrong decision (also from S and S-PLUS
times IIRC) to return a "character" matrix for
   summary(<data.frame>, ..)
or summary(<matrix>, ..)
(For a data frame, I think it should return a list() of
 single-variable summary()es, or then a numeric matrix .. in
 both cases have a good print() method)

because when you return a character matrix, all the numbers are
already rounded, ... and if we follow the above approach they
would have to be rounded further... ``the horror''
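To make the contrast concrete: summary() on a data frame returns a "table" of pre-formatted character strings, so the numbers cannot be reused without parsing them back. A minimal sketch of the list-of-numeric-summaries alternative might look like the following. col_summary() and num_summaries are hypothetical names, and only numeric columns are handled.

```r
df <- data.frame(x = c(15555, 1.23456789), y = c(2, 3))

# Current structure: a character matrix with class "table".
tab <- summary(df)
is.character(tab)   # TRUE: each cell is a formatted string such as "Min.   : ..."

# Hypothetical alternative: one unrounded numeric summary per numeric column.
col_summary <- function(col) {
  qq <- stats::quantile(col, names = FALSE)
  stats::setNames(c(qq[1:3], mean(col), qq[4:5]),
                  c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max."))
}
num_summaries <- lapply(df[vapply(df, is.numeric, logical(1))], col_summary)

num_summaries$x[["Min."]] == min(df$x)   # TRUE: still usable for calculation
num_summaries$y[["Mean"]]                # 2.5
```

A dedicated print() method for such a list could then do all the rounding at display time.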

I wonder how much code out there is relying on the internal
structure of  summary(<data.frame>).. because that is the one
part I'd definitely want to change, too.


Martin


Re: summary.default rounding on numeric seems inconsistent with other R behaviors

Martin Maechler
>>>>> Martin Maechler <[hidden email]>
>>>>>     on Tue, 23 Aug 2016 14:33:58 +0200 writes:

    > I wonder how much code out there is relying on the internal
    > structure of  summary(<data.frame>).. because that is the one
    > part I'd definitely want to change, too.

[Talking to myself .. ;-)]
Yes, but that's the tough part to change.

This thread's topic is really only about changing summary.default(),
and I have started testing such a change now, and that does seem
very sensible:

- No rounding in summary.default(),  but
- (almost) back-compatible rounding in its print() method.
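The "(almost)" can be illustrated by comparing the old approach (signif() applied to the value) with display-only rounding via format(). This minimal sketch assumes 4 significant digits (the old default with digits = 7) and plain format() for display. old_display() and new_display() are hypothetical helpers, not the committed print method.

```r
d <- 4L  # max(3L, getOption("digits") - 3L) with the default digits = 7

old_display <- function(v) format(signif(v, d))   # old: round the value, then print it
new_display <- function(v) format(v, digits = d)  # new: keep the value, round only the text

old_display(1.234567); new_display(1.234567)  # both "1.235": output unchanged
old_display(15555);    new_display(15555)     # differ: only the new one shows "15555"
```

In this sketch the two typically agree when the integer part has at most d digits. They differ mainly for larger numbers, such as those in the ten thousands.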

My current plan is to commit this to R-devel in a day or so,
unless unforeseen issues emerge.

Martin


Re: summary.default rounding on numeric seems inconsistent with other R behaviors

John Mount

> On Aug 24, 2016, at 2:36 AM, Martin Maechler <[hidden email]> wrote:
> My current plan is to commit this to R-devel in a day or so,
> unless unforeseen issues emerge.


That is potentially a very good outcome.  Thank you so much for producing and testing a patch.






Re: summary.default rounding on numeric seems inconsistent with other R behaviors

Martin Maechler
>>>>> John Mount <[hidden email]>
>>>>>     on Wed, 24 Aug 2016 07:25:50 -0700 writes:

    > That is potentially a very good outcome.  Thank you so
    > much for producing and testing a patch.

I have now committed such a change to R-devel:

------------------------------------------------------------------------
r71150 | maechler | 2016-08-25 21:57:19 +0200 (Thu, 25 Aug 2016) | 1 line
Changed paths:
   M /trunk/doc/NEWS.Rd
   M /trunk/src/library/base/R/summary.R
   M /trunk/src/library/base/man/summary.Rd
   M /trunk/src/library/stats/R/ecdf.R
   M /trunk/tests/Examples/stats-Ex.Rout.save
   M /trunk/tests/reg-tests-2.Rout.save

summary.default() no longer rounds by default; just *prints* rounded
------------------------------------------------------------------------


I do expect quite a few packages to give slightly changed output,
typically output that is uniformly not worse, but only "typically".

Note that I did also have to patch   stats:::print.summary.ecdf()
because that had relied on the fact that summary(<numeric>) did
round itself already.
Other useRs' code may need similar changes... and so this *is* a
user-visible change, listed accordingly in NEWS (the above doc/NEWS.Rd in
the sources).

I hope very much that the overall and longer term benefit will
vastly outweigh the nuisance (to people publishing, e.g.) that
quite a few "basic" outputs will slightly change.

The benefit for maintainers and old timers like me will be that
we will not need to answer this (non-official) FAQ nor excuse a
peculiar behavior in the future .....
But yes, I expect a flurry of questions starting in April 2017,
and hope that the smart readers of this list will share the load
answering them .. ;-)


Martin Maechler
ETH Zurich
