tapply help

tapply help

Mark Ebbert
Dear R gurus,

I am trying to perform what I believe should be a pretty simple task, but I'm struggling to figure out how to do it. I have two vectors of the same length: the first is numeric and the second is a factor. I understand that tapply is well suited to applying a function to the numeric vector by subsets defined by the factor levels in the second vector. My issue is making use of two other vectors within the custom function I've written for tapply. The two other vectors hold a low and a high value for each subset I am breaking my data into, and I want to calculate the percentage of data points in each subset that fall within that subset's range. I will attempt to provide a coherent example:

# create range for each possible class
lows<-c(1,2,3,4,5)
highs<-c(5,6,7,8,9)

# data values
vals<-sample(1:10,100,replace=T)

#classes
classes<-sample(letters[1:5],100,replace=T)

# Try to calculate percentage of values that fall
# into the respective range for the given class.
percentages<-tapply(vals,classes,
        function(i){
                # I don't know how to actually keep an index count in tapply,
                # but I'm guessing there's a better way.
                length(i[i>=lows[index] & i<=highs[index]])/length(i)
        })

I really appreciate any help.

ME
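
For readers following along, here is a rough sketch of the "index" idea in the snippet above, offered as one possible reading rather than the poster's own code. Instead of tracking a counter inside tapply, it splits the data by class and walks the groups in parallel with lows and highs using mapply. It assumes lows and highs are listed in the same order as the factor levels a through e; the names groups and fractions are made up for illustration.

```r
# Ranges, assumed to be in the same order as the class levels a..e
lows  <- c(1, 2, 3, 4, 5)
highs <- c(5, 6, 7, 8, 9)

set.seed(1)
vals <- sample(1:10, 100, replace = TRUE)

# Fix the level order explicitly so position k in lows/highs matches level k
classes <- factor(sample(letters[1:5], 100, replace = TRUE),
                  levels = letters[1:5])

# split() returns one numeric vector per level, in level order
groups <- split(vals, classes)

# mapply() passes the k-th group together with the k-th low and high,
# which plays the role of the missing 'index' variable
fractions <- mapply(function(v, lo, hi) mean(v >= lo & v <= hi),
                    groups, lows, highs)
fractions
```

The result is a fraction between 0 and 1 per class, matching the length(...)/length(i) calculation in the original attempt. A class with no observations would yield NaN.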
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: tapply help

jholtman
Is this what you are looking for?

> set.seed(1)
> # create range for each possible class
> # 'name' the values so you can use them in the 'sapply' function
> lows<-c(a=1, b=2, c=3, d=4, e=5)
> highs<-c(a=5, b=6, c=7, d=8, e=9)
>
> # data values
> vals<-sample(1:10,100,replace=T)
>
> #classes
> classes<-sample(letters[1:5],100,replace=T)
>
> # split the data so that you retain the 'classes' name
> x.split <- split(vals, classes)
> percentage <- sapply(names(x.split), function(.class){
+     # compute the percentage based on 'class'
+     sum((x.split[[.class]] >= lows[.class]) &
+         (x.split[[.class]] <= highs[.class])) /
+         length(x.split[[.class]]) * 100
+ })
> percentage
       a        b        c        d        e
50.00000 45.00000 62.50000 54.54545 55.55556
>
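
A possible variation on the approach above, sketched under the same setup: because lows and highs are named by class, the per-observation bounds can be looked up in one step, and tapply can then average a logical vector. This is only an illustrative alternative, not part of the original reply. The name in_range is made up here, and as.character() is an added guard, since indexing a named vector with a factor would use the integer codes rather than the labels.

```r
set.seed(1)
# Named ranges, as in the reply above
lows  <- c(a = 1, b = 2, c = 3, d = 4, e = 5)
highs <- c(a = 5, b = 6, c = 7, d = 8, e = 9)

vals    <- sample(1:10, 100, replace = TRUE)
classes <- sample(letters[1:5], 100, replace = TRUE)

# Look up each observation's bounds by class name.
# as.character() keeps the lookup by label even if classes is a factor.
key      <- as.character(classes)
in_range <- vals >= lows[key] & vals <= highs[key]

# The mean of a logical vector is the proportion of TRUE values
percentage <- tapply(in_range, classes, mean) * 100
percentage
```

This avoids the explicit split and the anonymous function. Note that sample() changed its default algorithm in R 3.6.0, so the numbers printed by a current R may differ from the output shown above even with the same seed.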


On Fri, Jun 4, 2010 at 4:02 PM, Mark Ebbert <[hidden email]> wrote:

> [...]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: tapply help

Mark Ebbert
That was very clever. Worked perfectly, thanks!

And thanks to everyone else who provided feedback.

On Jun 5, 2010, at 5:46 AM, jim holtman wrote:

> [...]
