Quantiles of a subset of data

bradleyd
This post has NOT been accepted by the mailing list yet.
Excuse the request from an R novice! I have a data frame (DATA) that has two numeric columns (YEAR and DAY) and 4000 rows. For each YEAR I need to determine the 10% and 90% quantiles of DAY. I'm sure this is easy enough, but I am new to this.

> quantile(DATA$DAY,c(0.1,0.9))
10% 90%
 12  29

But this is for the entire 4000 rows, when I need it to be for each YEAR. Is there no way to use a "by" argument in the quantile function?

Thanks for any help you can provide.
David

Re: Quantiles of a subset of data

Pete Brecknock
Check out ?aggregate or ?by; either should be of help.

HTH

Pete

Re: Quantiles of a subset of data

bradleyd
Thanks for your help, Pete. I can almost get it to work with:

> by(day,year,quantile)

but this only gives me 0%, 25%, 50%, 75% and 100%, not the ones I'm looking for, 10% and 90%.

I have tried:

> by(day,year,quantile(c(0.1, 0.9)))

but this is rejected with:

Error in FUN(X[[1L]], ...) : could not find function "FUN"

Re: Quantiles of a subset of data

Pete Brecknock
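You need to pass the quantiles of interest as a further argument to by(), which hands them on to quantile(). In your attempt, quantile(c(0.1, 0.9)) is evaluated first and returns numbers rather than a function, hence the error.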

# Dummy Data
d <- data.frame(year=c(rep(2010,10),rep(2011,10),rep(2012,10)), quantity = c(1:30))

# Quantiles by Year
by(d$quantity,d$year,quantile,c(0.1,0.9))

HTH

Pete
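
For comparison, here is a minimal sketch of a few equivalent ways to get the per-year quantiles in base R. It reuses the dummy data frame d from the post above; the object names q_by, q_tapply and q_agg are only illustrative and not from the thread.

```r
# Dummy data, as in the post above
d <- data.frame(year = c(rep(2010, 10), rep(2011, 10), rep(2012, 10)),
                quantity = 1:30)

# Arguments given after FUN are forwarded to quantile()
q_by <- by(d$quantity, d$year, quantile, probs = c(0.1, 0.9))

# Same idea with tapply() and an explicit anonymous function
q_tapply <- tapply(d$quantity, d$year,
                   function(x) quantile(x, probs = c(0.1, 0.9)))

# aggregate() returns a data frame with one row per year;
# the quantiles sit in a two-column matrix named "quantity"
q_agg <- aggregate(quantity ~ year, data = d,
                   FUN = quantile, probs = c(0.1, 0.9))
```

Which form to prefer is largely a matter of what you want back: by() and tapply() give one named vector per year, while aggregate() gives a data frame that is easier to merge with other results.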

Re: Quantiles of a subset of data

bradleyd
That does it, thanks. Do you think you could help me a little bit further?

I actually have 4 columns: YEAR, DAY, TEMP, and IBI. They are all numeric. I need to calculate the average TEMP and IBI values between the 10% and 90% quantiles for each YEAR.

The code by(data$day,data$year,quantile,c(0.1,0.9)) was correct in that it calculated the quantile values as intended, but I don't know how to then calculate the mean TEMP and IBI values encompassed within those quantiles.

Thanks again,
David

Re: Quantiles of a subset of data

Pete Brecknock
Have a look at the trim argument of the mean function:

?mean

Pete
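
As a small illustration of what trim does, the sketch below uses a made-up vector x (not data from the thread). The trim argument drops a fraction of the observations from each end of the sorted values before averaging; it has nothing to do with decimal places.

```r
# Made-up values with one large outlier
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# Ordinary mean, pulled up by the outlier
plain_mean <- mean(x)                # 14.5

# Drops the lowest 10% and highest 10% of values (here 1 and 100),
# then averages the remaining 2..9
trimmed_mean <- mean(x, trim = 0.1)  # 5.5
```

Applied per year, something like tapply(DATA$DAY, DATA$YEAR, mean, trim = 0.1) would give a trimmed mean of DAY. Two caveats: trimming drops a fixed fraction of observations from each end, which is close to but not identical to cutting at the 10% and 90% quantiles, and it ranks a variable by its own values, so mean(TEMP, trim = 0.1) would trim by TEMP rather than by DAY.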

Re: Quantiles of a subset of data

bradleyd
Thanks Pete. The TRIM argument in the MEAN function tells me how to trim off decimal points, but I am lost as to how to append the mean values of TEMP and IBI between the 10% and 90% quantiles of DAY in each YEAR.

DAY is the Julian date on which an event occurred in certain years. The events occurred numerous times in each year, and I want to be able to say what the mean day was in each year, excluding those days greater than the 90% quantile and less than the 10% quantile in that year.

Re: Quantiles of a subset of data

Pete Brecknock
I would suggest that you post a small, reproducible example, along with what you expect the results to look like.

Pete

Re: Quantiles of a subset of data

bradleyd
This is part of the file.

data.xlsx

I was hoping for a data output that looks something like:

YEAR      MEAN_TEMP
1987        14.2
1988        16.2
1989        12.0
1990        5.6
1991        21.1
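
One possible way to get output of that shape is sketched below in base R. The attached file is not reproduced here, so the data frame is randomly generated and purely hypothetical; only the column names YEAR, DAY, TEMP and IBI come from the thread. The sketch also assumes that "between the 10% and 90% quantiles" means DAY values inside that range, endpoints included, with the quantiles computed separately for each YEAR.

```r
# Hypothetical stand-in for the real data (data.xlsx is not available here)
set.seed(1)
DATA <- data.frame(
  YEAR = rep(1987:1989, each = 50),
  DAY  = sample(1:60, 150, replace = TRUE),
  TEMP = round(rnorm(150, mean = 15, sd = 4), 1),
  IBI  = round(runif(150, min = 0, max = 10), 1)
)

# 1 if a DAY lies within its own year's 10%-90% quantile range, else 0
within_range <- function(day) {
  q <- quantile(day, probs = c(0.1, 0.9))
  as.numeric(day >= q[[1]] & day <= q[[2]])
}

# ave() applies the function within each YEAR and keeps the row order
keep <- ave(DATA$DAY, DATA$YEAR, FUN = within_range) == 1

# Mean TEMP and IBI per YEAR over the retained rows only
result <- aggregate(cbind(TEMP, IBI) ~ YEAR, data = DATA[keep, ], FUN = mean)
names(result) <- c("YEAR", "MEAN_TEMP", "MEAN_IBI")
result
```

The same keep vector could be reused to average DAY itself, for example with aggregate(DAY ~ YEAR, data = DATA[keep, ], FUN = mean), if the mean event day per year is what is wanted.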