'split-lapply' vs. 'aggregate'


Massimo Bressan
This might be a trivial question (sorry in advance if so!), but I definitely cannot see the problem here...

Please consider the following reproducible example: why do 'split-lapply' and 'aggregate' give different results?
I have also checked against other methods (e.g. data.table, dplyr), and the results were always consistent with 'split-lapply' but apparently not with 'aggregate'.

I must certainly be wrong!
Could someone point me in the right direction?

Thanks

##

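# split-lapply: per-month means of Ozone, Solar.R and Wind
s <- split(airquality, airquality$Month)
ls <- lapply(s, function(x) colMeans(x[c("Ozone", "Solar.R", "Wind")], na.rm = TRUE))
do.call(rbind, ls)

# aggregate: slightly different results
# (airquality[-c(4, 6)] drops the Temp and Day columns)
aggregate(. ~ Month, airquality[-c(4, 6)], mean, na.rm = TRUE)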

##
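A minimal sketch for locating the discrepancy, assuming only base R and the built-in airquality data: it counts, per month, how many non-missing values each column has, and how many rows have no NA in any of the three columns. The variable names `vars`, `n_per_column` and `n_complete` are illustrative.

```r
vars <- c("Ozone", "Solar.R", "Wind")

# Non-missing values per column and month
# (the values that colMeans(..., na.rm = TRUE) averages)
n_per_column <- do.call(rbind, lapply(split(airquality[vars], airquality$Month),
                                      function(x) colSums(!is.na(x))))

# Rows per month with no NA in any of the three columns
n_complete <- tapply(complete.cases(airquality[vars]), airquality$Month, sum)

n_per_column
n_complete
```

If the two counts differ for a month, the two approaches are averaging different sets of rows for that month.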

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: 'split-lapply' vs. 'aggregate'

Fox, John
Dear Massimo,

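The difference is in the handling of NAs. The formula method of aggregate() uses na.action = na.omit by default, so any row with an NA in any of the variables (here Ozone or Solar.R) is dropped before mean() is applied. colMeans(..., na.rm = TRUE), by contrast, removes NAs column by column. Try, e.g., airquality <- na.omit(airquality) and compare again.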
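A minimal sketch of both ways of making the two approaches agree, assuming base R only; the object names (`aq`, `by_split`, `by_agg`, etc.) are illustrative. The first comparison passes na.action = na.pass so that aggregate() keeps incomplete rows and leaves NA removal to mean(na.rm = TRUE). The second follows the suggestion above and drops incomplete rows before either method runs.

```r
vars <- c("Ozone", "Solar.R", "Wind")
aq <- airquality[c(vars, "Month")]

# split-lapply: colMeans() drops NAs column by column
by_split <- do.call(rbind, lapply(split(aq[vars], aq$Month),
                                  colMeans, na.rm = TRUE))

# aggregate(): na.pass keeps incomplete rows, so mean(na.rm = TRUE)
# drops NAs per column, as colMeans() does above
by_agg <- aggregate(. ~ Month, data = aq, FUN = mean,
                    na.rm = TRUE, na.action = na.pass)
all.equal(unname(by_split), unname(as.matrix(by_agg[vars])))

# Drop incomplete rows first; then both methods average the same rows
aq_cc <- na.omit(aq)
by_split_cc <- do.call(rbind, lapply(split(aq_cc[vars], aq_cc$Month), colMeans))
by_agg_cc <- aggregate(. ~ Month, data = aq_cc, FUN = mean)
all.equal(unname(by_split_cc), unname(as.matrix(by_agg_cc[vars])))
```

all.equal() is used rather than identical() because mean() and colMeans() accumulate sums differently and may disagree in the last bits. Which variant is "right" depends on whether each mean should use all available values of its own column (na.pass) or only rows observed on every variable (na.omit).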

Best,
 John

-----------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario
Canada L8S 4M4
web: socserv.mcmaster.ca/jfox


________________________________________
From: R-help [[hidden email]] on behalf of Massimo Bressan [[hidden email]]
Sent: March 27, 2016 5:45 PM
To: [hidden email]
Subject: [R] 'split-lapply' vs. 'aggregate'