# Calculating Summaries for each level of a Categorical variable

## Calculating Summaries for each level of a Categorical variable

Hi,

I have a dataset which has a categorical variable "R", a count variable C (integer) and 4 or more numeric variables (A, T, W, H - integers) containing measures for "R". I would like to summarize each level of the variable R by the average for A, T, W and H.

I have written a function to calculate weighted averages using C as the weight and this is given below. The function works perfectly, but how do I add the additional dimension I require to this function?

Dataset: RT=

    R     A  T   W   H
    R1   10 20 20  10
    R2   60 20 50  10
    R3   45 10 20  50
    R4   68 50 20  10
    R1   73 20 40  46
    R3   25 30 10  54
    R3   36 90 20  10
    R2   29 10 30  30

    # FUNCTION TO CALCULATE THE WEIGHTED AVERAGE FOR A WEIGHTED BY C
    WA<-function(A,C) {
             sp_A<-c(A %*% C)
             sum_C<-sum(C)
             WA<-sp_A/sum_C
             return(WA)
             }

I am trying to incorporate the additional step of calculating the weighted average of A, T, W and H for each level of R. Need help with this.

Thanks in advance!
Raoul

## Re: Calculating Summaries for each level of a Categorical variable

 Did you try tapply?

?tapply

tapply(RT, RT$R, fun=WA)

or something like that
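
For reference, tapply expects a single vector as its first argument (not a whole data frame), and the function argument is spelled FUN. A minimal sketch of the working form, using the data from the original post. It computes plain unweighted means only, because the posted table does not include the C column:

```r
# Data copied from the original post (the C column was not included there)
RT <- data.frame(
  R = c("R1", "R2", "R3", "R4", "R1", "R3", "R3", "R2"),
  A = c(10, 60, 45, 68, 73, 25, 36, 29),
  T = c(20, 20, 10, 50, 20, 30, 90, 10),
  W = c(20, 50, 20, 20, 40, 10, 20, 30),
  H = c(10, 10, 50, 10, 46, 54, 10, 30)
)

# Unweighted mean of A for each level of R
mean_A <- tapply(RT$A, RT$R, FUN = mean)
print(mean_A)

# The same for all four measure columns at once:
# one tapply call per column, collected into a levels-by-measures matrix
means_by_R <- sapply(RT[c("A", "T", "W", "H")],
                     function(col) tapply(col, RT$R, FUN = mean))
print(means_by_R)
```

Each column is the same length as RT$R, which is what tapply needs.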

Corey Sparks, PhD
Associate Professor
Department of Demography
University of Texas at San Antonio
## Re: Calculating Summaries for each level of a Categorical variable

 Look at the summary.formula function inside package Hmisc

Christos

> Date: Sat, 26 Jun 2010 05:17:34 -0700
> From: [hidden email]
> To: [hidden email]
> Subject: [R] Calculating Summaries for each level of a Categorical variable
## Re: Calculating Summaries for each level of a Categorical variable

In reply to this post by Corey Sparks

Hi Corey,

Thanks so much for this. However, I get this error for tapply: "Error in tapply(RT, RT$R, fun=WA): arguments must have same length". Any idea how to get around this?

Thanks again,
Raoul

## Re: Calculating Summaries for each level of a Categorical variable

In reply to this post by Christos Argyropoulos

Hi Christos,

Thanks for this. I had a look at summary.formula in the Hmisc package and it is extremely complicated for me. I am still trying to decipher how I could use it.

Regards,
Raoul

## Re: Calculating Summaries for each level of a Categorical variable

You could try the remix function in the remix package.

David

On 27 June 2010 at 06:48, RaoulD <[hidden email]> wrote:

>
> Hi Christos,
>
> Thanks for this. I had a look at summary.formula in the Hmisc package
> and it is extremely complicated for me. Still trying to decipher how I
> could use it.
>
> Regards,
> Raoul
## Re: Calculating Summaries for each level of a Categorical variable

 Hi Raoul,

I presume you need these summaries for a table of descriptive statistics for a thesis/report/paper ("Table 1", as it is known informally among medical researchers). If this is the case, then specify method="reverse" to summary.formula. In the following small example, I create 4 groups of patients, specify 2 characteristics per patient (age and gender), and use summary.formula to summarize the characteristics by group. Running the statistical tests on patient characteristics by group is optional but is included for completeness.

If you are looking for something like this, I strongly advise you to spend some time experimenting with summary.formula and to read:
Harrell FE (2004): Statistical tables and plots using S and LaTeX (available from http://biostat.mc.vanderbilt.edu/twiki/pub/Main/StatReport/summary.pdf)

The 2-3 hours you will need to familiarize yourself with this package are well worth it (especially if you are going to call LaTeX on the output). If you are a Windows user, copy and paste the output of the print function into Excel or OpenOffice and use the Text to Columns facility of either program to format the output into a table that can be used inside a manuscript.

Christos

## R-code follows
library(Hmisc)
## One baseline factor (e.g. patient group); levels are fixed so the labels always match
grp<-round(runif(20,1,4))
grp<-factor(grp,levels=1:4,labels=paste("Group",1:4))
## Another factor (e.g. sex)
sex<-round(runif(20,1,2))
sex<-factor(sex,levels=1:2,labels=c("Male","Female"))
## A continuous variable (e.g. age)
age<-rlnorm(20,4,.1)
## A data frame
data<-data.frame(age=age,grp=grp,sex=sex)
## Table 1: summarize sex and age by group, with an overall column and tests
sm<-summary(grp~sex+age,data=data,method="reverse",overall=TRUE,test=TRUE)
print(sm,digits=2,exclude1=FALSE)

Descriptive Statistics by grp

+----------+------------------+------------------+------------------+------------------+------------------+----------------------------+
|          |Group 1           |Group 2           |Group 3           |Group 4           |Combined          |  Test                      |
|          |(N=3)             |(N=6)             |(N=8)             |(N=3)             |(N=20)            |Statistic                   |
+----------+------------------+------------------+------------------+------------------+------------------+----------------------------+
|sex : Male|          67% ( 2)|          67% ( 4)|          25% ( 2)|          67% ( 2)|          50% (10)|Chi-square=3.3 d.f.=3 P=0.34|
+----------+------------------+------------------+------------------+------------------+------------------+----------------------------+
|    Female|          33% ( 1)|          33% ( 2)|          75% ( 6)|          33% ( 1)|          50% (10)|                            |
+----------+------------------+------------------+------------------+------------------+------------------+----------------------------+
|age       |          60/62/65|          51/55/60|          46/51/57|          46/48/52|          49/54/60|   F=2.9 d.f.=3,16 P=0.068  |
+----------+------------------+------------------+------------------+------------------+------------------+----------------------------+
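
In the age row, the three numbers per group (e.g. 60/62/65) are the lower quartile, median and upper quartile. A rough base-R sketch of the same per-group calculation follows. The data here are simulated with evenly sized groups purely for illustration, and Hmisc may use a slightly different quantile definition, so the numbers will not match the table above:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical data: 4 groups of 5 patients each, log-normal ages
grp <- factor(rep(paste("Group", 1:4), times = 5))
age <- rlnorm(20, 4, 0.1)

# Lower quartile, median and upper quartile of age within each group;
# sapply returns quantiles in rows, so transpose to get one row per group
q_by_grp <- t(sapply(split(age, grp), quantile,
                     probs = c(0.25, 0.5, 0.75)))
print(round(q_by_grp))
```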
## Re: Calculating Summaries for each level of a Categorical variable

The variable you want to analyze (the first argument to tapply) and the variable you want to analyze by (the factor, the second argument to tapply) must both have the same length. That's how I read this error: here the first argument is the whole data frame RT rather than a single column.
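
One possible way around it, sketched here under assumptions: split the data frame by R and apply WA to each measure column within each piece. The C values below are hypothetical, since the posted table did not include that column:

```r
# Weighted average as defined in the original post
WA <- function(A, C) {
  sp_A <- c(A %*% C)  # sum of A weighted by C
  sum_C <- sum(C)
  sp_A / sum_C
}

# Data from the original post; C is made up for illustration only
RT <- data.frame(
  R = c("R1", "R2", "R3", "R4", "R1", "R3", "R3", "R2"),
  C = c(1, 2, 3, 1, 3, 1, 2, 2),  # hypothetical weights, not from the post
  A = c(10, 60, 45, 68, 73, 25, 36, 29),
  T = c(20, 20, 10, 50, 20, 30, 90, 10),
  W = c(20, 50, 20, 20, 40, 10, 20, 30),
  H = c(10, 10, 50, 10, 46, 54, 10, 30)
)

measures <- c("A", "T", "W", "H")

# Split the rows by level of R, then apply WA to every measure in each piece.
# The result is transposed so that rows are levels of R and columns are measures.
wa_by_R <- t(sapply(split(RT, RT$R), function(d) {
  sapply(d[measures], WA, C = d$C)
}))
print(wa_by_R)
```

Base R's weighted.mean(x, w) computes the same quantity as WA, so it could be used in place of the custom function.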

CS

Corey Sparks
Assistant Professor
Department of Demography and Organization Studies
College of Public Policy
501 West Durango Blvd
Monterrey Building 2.270C
San Antonio, TX 78207

On Jun 26, 2010, at 11:46 PM, RaoulD [via R] wrote:

> Hi Corey,
>
> Thanks so much for this. However, I get this error for tapply - "Error in tapply(RT, RT$R, fun=WA):
>   arguments must have same length". Any idea how to get around this?
>
> Thanks again,
> Raoul