Max vs summary inconsistency

Adam D. I. Kramer-2
Hello,

I'm having the following questionable behavior:

> summary(m)
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
       1   13000   26280   25890   38550   50910
> max(m)
[1] 50912

> typeof(m)
[1] "integer"
> class(m)
[1] "integer"

...it seems to me like max() and summary(m)[6] ought to return the same
number. Am I doing something wrong?

I'm running R 2.5.1 (2007-06-27), installed on MacOSX from the dmg file
found on CRAN.

--
Adam D. I. Kramer
Ph.D. Student, University of Oregon
[hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Max vs summary inconsistency

Thomas Lumley
On Mon, 27 Aug 2007, Adam D. I. Kramer wrote:

> Hello,
>
> I'm having the following questionable behavior:
>
>> summary(m)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>       1   13000   26280   25890   38550   50910
>> max(m)
> [1] 50912
>
>> typeof(m)
> [1] "integer"
>> class(m)
> [1] "integer"
>
> ...it seems to me like max() and summary(m)[6] ought to return the same
> number. Am I doing something wrong?
>

They do return the same number, they just print it differently. summary() prints four significant digits by default.
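
A minimal sketch of the rounding involved, under some assumptions: the vector below is a hypothetical stand-in (the original m was not posted), and digits is passed explicitly because, as far as I know, whether summary() rounds when it computes or only when it prints has differed between R versions. With an explicit digits argument, summary.default() rounds the stored values with signif().

```r
# Hypothetical stand-in for the poster's integer vector; only the
# minimum and maximum match the original output.
m <- 1:50912

exact <- max(m)                            # 50912, the exact maximum
rounded <- signif(as.numeric(exact), 4)    # 50910, kept to 4 significant digits

# unclass() drops the summary class so the elements are plain named numbers.
s4 <- unclass(summary(m, digits = 4))
s7 <- unclass(summary(m, digits = 7))

s4[["Max."]]                               # 50910, matches `rounded`
s7[["Max."]]                               # 50912, matches `exact`
```

Asking for more digits, or calling max() and quantile() directly, avoids the discrepancy.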

      -thomas

Thomas Lumley Assoc. Professor, Biostatistics
[hidden email] University of Washington, Seattle


Re: Max vs summary inconsistency

François Pinard
In reply to this post by Adam D. I. Kramer-2
[Adam D. I. Kramer]

>I'm having the following questionable behavior:

>> summary(m)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>       1   13000   26280   25890   38550   50910
>> max(m)
>[1] 50912

>...it seems to me like max() and summary(m)[6] ought to return the same
>number.  Am I doing something wrong?

Some may say that you did not scrutinize the documentation enough, as
"summary" artificially limits the number of significant digits.

However, this question recurs regularly on these mailing lists, so
maybe something should at last be done about it, beyond documenting how
it works.  Too many users have been misled for one to bluntly assert
that they are all wrong.

For example, summary() could resort to scientific notation whenever
non-significant zero digits would otherwise be printed.  This would
make it clearer that the printing precision has been artificially limited.
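
To illustrate the idea, here is a rough sketch of what such output could look like. The helper format_sig() is hypothetical and not part of R; it only combines the existing signif() and formatC() functions, and it is not a proposal for how summary() itself should be changed.

```r
# Hypothetical helper: round to `digits` significant digits, then print in
# scientific notation showing exactly that many digits.
format_sig <- function(x, digits = 4) {
  formatC(signif(x, digits), digits = digits - 1, format = "e")
}

shown <- format_sig(50912)      # mantissa 5.091, exponent 4
back <- as.numeric(shown)       # 50910: reading it back gives the rounded value
```

Printed this way, the maximum carries no trailing zero that could be mistaken for a real digit.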

--
François Pinard   http://pinard.progiciels-bpi.ca


Re: Max vs summary inconsistency

Adam D. I. Kramer-2

On Mon, 27 Aug 2007, François Pinard wrote:

>>> summary(m)
>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>       1   13000   26280   25890   38550   50910
>>> max(m)
>> [1] 50912
>
>> ...it seems to me like max() and summary(m)[6] ought to return the same
>> number.  Am I doing something wrong?
>
> Some may say that you did not scrutinize the documentation enough, as
> "summary" artificially limits the number of significant digits.

Indeed, several have said so in private email as well as email to the list.
Thanks to all, and apologies for my lack of scrutiny.

> However, this question recurs regularly on these mailing lists, so
> maybe something should at last be done about it, beyond documenting how
> it works.  Too many users have been misled for one to bluntly assert
> that they are all wrong.

I would agree, and not only because I was misled: several people are
scrutinizing summary()'s output and noticing that it is incorrect.

However, it is very VERY likely that many more are NOT scrutinizing it, and
as such are forming false beliefs about their data sets, which may be
subsequently published or used in further analyses.

Taking a small step in the implementation of summary() to potentially
prevent the publication of incorrect data seems worthwhile. Certainly, any
researcher should check their output in many ways, but it makes no sense to
me that summary() would round its output to 4 significant digits by default.

> For example, summary() could resort to scientific notation whenever
> non-significant zero digits would otherwise be printed.  This would
> make it clearer that the printing precision has been artificially limited.

I think this is a great solution, though I'm not sure whether scripts that
use summary() would break if passed a number in scientific notation.

That said, scripts that use summary() are probably assuming that the number
reported is maximally precise, and thus are making the same mistake I
did...and thus should indeed break!

--
Adam Kramer
Ph.D. Student, Social Psychology
University of Oregon
[hidden email]