Precision of summary() when summarizing variables in a data frame

Precision of summary() when summarizing variables in a data frame

Daniel Malter
Hi,

I called summary() on a variable with 409908 numeric observations. The variable is part of a data.frame. The problem is that the min and max returned by summary() do not equal the ones returned by min() and max(). Does anybody know why that is?

> min(data$vc)
[1] 15452
> max(data$vc)
[1] 316148
> summary(data$vc)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  15450   21670   40980   55500   63880  316100


> sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] sqldf_0.3-5           chron_2.3-39          gsubfn_0.5-5        
[4] proto_0.3-8           RSQLite.extfuns_0.0.1 RSQLite_0.9-4        
[7] DBI_0.2-5    

Thanks much,
Daniel  

Re: Precision of summary() when summarizing variables in a data frame

jholtman
They are probably the same.  It is just that summary() is printing only 4
significant digits.  Try:

options(digits = 20)
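
To illustrate the point, here is a minimal sketch showing that rounding the reported min and max to 4 significant digits reproduces the values summary() printed. The vector vc is a made-up stand-in for data$vc (only its smallest and largest values come from the original post), and the formula for the default number of digits is an assumption about summary()'s default. Newer R versions may handle the rounding differently, so the summary() output itself can vary.

```r
# Hypothetical stand-in for data$vc: only the min and max are taken from the
# original post; the middle values are made up for illustration.
vc <- c(15452, 21670, 40980, 63880, 316148)

# Assumed default number of significant digits used by summary():
# max(3, getOption("digits") - 3), which is 4 when options(digits = 7).
options(digits = 7)
d <- max(3, getOption("digits") - 3)

# signif() rounds to d significant digits; with d = 4 this reproduces the
# min and max that summary() printed in the original post.
rounded_min <- signif(min(vc), d)
rounded_max <- signif(max(vc), d)
print(c(min = rounded_min, max = rounded_max))
```

So the underlying data are unchanged; only the displayed summary is rounded.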



On Tue, Apr 5, 2011 at 12:38 PM, Daniel Malter <[hidden email]> wrote:

> Hi,
>
> I summary() a variable with 409908 numeric observations. The variable is
> part of a data.frame. The problem is that the min and max returned by
> summary() do not equal the ones returned by min() and max(). Does anybody
> know why that is?
>
>> min(data$vc)
> [1] 15452
>> max(data$vc)
> [1] 316148
>> summary(data$vc)
>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>  15450   21670   40980   55500   63880  316100
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Precision-of-summary-when-summarizing-variables-in-a-data-frame-tp3428570p3428570.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


Re: Precision of summary() when summarizing variables in a data frame

Erik Iverson-3


jim holtman wrote:
> They are probably the same.  It is just that summary is printing out 4
> significant digits.  Try:
>
> options(digits = 20)

FYI, the default summary method also has its own digits argument.
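
A minimal sketch of using that argument, so the global option does not need to change. The vector vc is again a made-up stand-in for data$vc (only the min and max come from the original post), and digits = 10 is an arbitrary choice that is simply large enough for these values.

```r
# Hypothetical stand-in for data$vc: only the min and max are taken from the
# original post; the middle values are made up for illustration.
vc <- c(15452, 21670, 40980, 63880, 316148)

# Ask summary() for more significant digits for this one call only,
# leaving options("digits") untouched.
s <- summary(vc, digits = 10)
print(s)

# With enough digits, the summary's extremes agree with min() and max().
same_min <- unclass(s)[["Min."]] == min(vc)
same_max <- unclass(s)[["Max."]] == max(vc)
print(c(same_min = same_min, same_max = same_max))
```

This keeps the change local to the call, which may be preferable to setting options(digits = 20) for the whole session.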


Re: Precision of summary() when summarizing variables in a data frame

Daniel Malter
Thanks all. No, I wasn't aware that summary() rounds its output in this case.

Da.