

Hi. In a data set I have a variable that takes values from 1 to 14. For each subgroup defined by a value of this variable, I would like to obtain some descriptive statistics of other variables in the data set. I've been trying with a "for" loop, but I couldn't get anything. Could you please suggest some lines of code?


On 12/13/2010 09:04 PM, effeesse wrote:
>
> Hi. In a data set I have a variable that takes values from 1 to 14. For each
> subgroup of values of this variable, I would like to obtain some descriptive
> statistics of other variables present in the data set. I've been trying with
> a "for" loop but I couldn't get nothing. Could you please suggest me some
> lines?
Hi effeesse,
Sure:
testmat<-data.frame(sample(1:14,50,TRUE),rnorm(50),runif(50))
# colMeans rather than mean: current R no longer computes mean() of a data frame
by(testmat[,-1],testmat[,1],colMeans)
Jim
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


A nice way to obtain summaries of the data is to use summary.formula in the
Hmisc package.
Justin BEM
BP 1917 Yaoundé
Tél (237) 76043774


Another way is the remix function of the remix package.


I am sorry, but I cannot understand how to use the "summary" function. Maybe, if I describe my needs, you could sketch a line that would work.
In the data set, the variable "V" can take the values 1 to 14. For the subgroup of individuals where "V" takes the value 1, I want the mean and variance of a certain set of other variables (V1, V2, V3, V4, V5). And the same for all the other subgroups, for the values 2 to 14.
What do you suggest?


I would suggest what we already suggested to you:
?aggregate
?by
?doBy::summaryBy
We could help you more precisely if you provided a reproducible
example, as explained in the posting guide (see the link at the end of every
email from this list).
Ivan

Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
[hidden email]


With summary (summary.formula from the Hmisc package), do this, where df is your data frame:
library(Hmisc)
# mean and variance of one variable within a group
my.summary<-function(x) c(mean=mean(x),var=var(x))
summary(V1~V, fun=my.summary,data=df)
summary(V2~V, fun=my.summary,data=df)
summary(V3~V, fun=my.summary,data=df)
summary(V4~V, fun=my.summary,data=df)
summary(V5~V, fun=my.summary,data=df)
If you want the means of all the variables together in one table:
my.summary<-function(x)
c(mean(x[,1]),mean(x[,2]),mean(x[,3]),mean(x[,4]),mean(x[,5]))
summary(cbind(V1,V2,V3,V4,V5)~V, fun=my.summary,data=df)
Justin BEM


What am I supposed to put into function(x)? The indicator for extracting the subgroups?
"data" is the data frame, and cluster takes the values 1,...,14.
This is what I was trying:
"for (i in 1:14) {
my.summary<-data$cluster==i c(mean(?),var(?))
summary(var_A~cluster, fun=my.summary,data=data)
summary(var_B~cluster, fun=my.summary,data=data)
summary(var_C~cluster, fun=my.summary,data=data)
summary(var_D~cluster, fun=my.summary,data=data)
summary(var_E~cluster, fun=my.summary,data=data)
summary(var_F~cluster, fun=my.summary,data=data)
summary(var_G~cluster, fun=my.summary,data=data)
}"
Thanks for your patience.
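
For reference, a working version of the "for" loop idea could look like the sketch below. It is only an illustration in base R: the data are made up, the column names var_A to var_C and the object name results are assumptions, and nothing here comes from the Hmisc package.

```r
# Made-up data: 'cluster' takes the values 1..14, the other columns are numeric
set.seed(1)
data <- data.frame(cluster = rep(1:14, each = 5),
                   var_A = rnorm(70),
                   var_B = runif(70),
                   var_C = rpois(70, 3))
vars <- c("var_A", "var_B", "var_C")

# One list element per cluster, each holding a 2 x 3 matrix (mean and variance)
results <- vector("list", 14)
for (i in 1:14) {
  sub <- data[data$cluster == i, vars]   # rows belonging to cluster i
  results[[i]] <- rbind(mean = sapply(sub, mean),
                        var  = sapply(sub, var))
}
names(results) <- 1:14

results[["1"]]   # statistics for the subgroup where cluster == 1
```

The subsetting step (data$cluster == i) is what selects the subgroup; the statistics are then computed column by column on that subset.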


An alternative way of getting summary statistics by a grouping
variable is to use describe.by in the psych package (called describeBy
in current versions of psych).
Using Jim Lemon's example:
library(psych)
testmat<-data.frame(sample(1:14,50,TRUE),rnorm(50),runif(50)) #make up the data
describeBy(testmat,testmat[,1]) #get descriptive statistics


Do it with aggregate(); something like this should do:
aggregate(.~cluster, FUN=summary, data=data)
Now if you don't want to run summary(), replace it with the function
you'd like.
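
As an illustration of replacing summary() with another function, here is a minimal sketch that asks aggregate() for the mean and variance of every other column. The data and the names stats and flat are made up for the example.

```r
# Made-up data with a grouping column 'cluster' and two numeric variables
set.seed(1)
data <- data.frame(cluster = rep(1:14, each = 5),
                   var_A = rnorm(70),
                   var_B = runif(70))

# FUN returns two values, so each variable becomes a two-column matrix
stats <- aggregate(. ~ cluster, data = data,
                   FUN = function(x) c(mean = mean(x), var = var(x)))

# Flatten the matrix columns into ordinary columns (var_A.mean, var_A.var, ...)
flat <- do.call(data.frame, stats)
flat
```

The result has one row per value of cluster, which is usually easier to read than one printed block per group.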
HTH,
Ivan


On 12/14/2010 01:14 AM, effeesse wrote:
>
> I am sorry, but I cannot understand how to use the "summary" function. Maybe,
> if I describe my needs, you could sketch a line that could work.
> In the data set variable "V" can take values 1 to 14. For the subgroup of
> individuals where "V" takes value =1 I want the mean and variance of a
> certain set of other variables (V1, V2, V3, V4, V5). And this for all the
> other subgroups for values 2 to 14.
> What do you suggest?
Step 1 - In a "reproducible example" one makes up some data and does
something to it to show how it is or isn't working. Clearly, you don't
know how to do that yet, so here's how.
mydataframe<-data.frame(V=sample(1:14,100,TRUE),
 V1=rnorm(100),V2=runif(100),V3=sample(-3:3,100,TRUE),
 V4=sample(0:1,100,TRUE),V5=rpois(100,3))
If you run this code, you will then have a data frame that may not look
like what you want, but it will serve as an example. In my initial post,
I assumed that you wanted some summary statistic for each of the
variables V1 to V5, broken down by V. That's easy:
# colMeans rather than mean: current R no longer computes mean() of a data frame
by(mydataframe[c("V1","V2","V3","V4","V5")],
 mydataframe$V,colMeans)
If you run that code, you will get a big array of all of the means of
all of the V1-V5 columns broken down by the V column as you asked. Now
maybe you want both the mean and variance in one shot:
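by_many<-function(x,by,stats) {
 nfun<-length(stats)
 myoutputlist<-vector("list",nfun)
 # call by() once for each function named in stats
 for(fun in 1:nfun)
  myoutputlist[[fun]]<-by(x,by,get(stats[fun]))
 names(myoutputlist)<-stats
 return(myoutputlist)
}
# "colMeans" stands in for "mean", which no longer accepts a data frame in current R
by_many(mydataframe[c("V1","V2","V3","V4","V5")],
 mydataframe$V,stats=c("colMeans","var"))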
The first part defines a function that will call "by" for each statistic
that you pass in "stats", which now has to be the name of the function
rather than the function. You will have to pick your variances out of
the diagonal of the matrices due to the way "var" works.
So have a look at these and work out if they come close to doing what
you want.
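
Picking the variances out of the diagonals, as described above, might be sketched as follows. The data are made up again (with fewer columns to keep the output short), and the names covs and variances are assumptions for this example.

```r
# Made-up data: every value of V occurs 8 times, so each variance is defined
set.seed(1)
mydataframe <- data.frame(V  = rep(1:14, each = 8),
                          V1 = rnorm(112),
                          V2 = runif(112),
                          V3 = sample(-3:3, 112, TRUE))

# var() on a data frame returns a covariance matrix, one per value of V
covs <- by(mydataframe[c("V1", "V2", "V3")], mydataframe$V, var)

# The variances sit on the diagonal of each matrix; collect them row by row
variances <- t(sapply(covs, diag))
variances   # 14 rows (values of V) by 3 columns (V1, V2, V3)
```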
Jim


Thanks! I got the results! Now I would like to put them in a nice and readable plot to see if there are outlier means or variances. Using "plot(by_many$mean)" I got "Error in by_many$mean : object of type 'closure' is not subsettable".
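
A note on the error above: by_many is the function itself (a "closure"), so by_many$mean cannot work. The value the function returns has to be assigned to a name first (say res, a hypothetical name) and that object is what gets subsetted or plotted. The sketch below sidesteps the nested output and plots group means computed with aggregate(); the data, the name means and the helper plot_group_means are all made up for illustration.

```r
# Made-up data: V is the grouping variable
set.seed(1)
mydataframe <- data.frame(V  = rep(1:14, each = 8),
                          V1 = rnorm(112),
                          V2 = runif(112))

# One row per value of V, one column of means per variable
means <- aggregate(cbind(V1, V2) ~ V, data = mydataframe, FUN = mean)

# Hypothetical helper: one line per variable, group value on the x axis
plot_group_means <- function(m) {
  y <- as.matrix(m[setdiff(names(m), "V")])
  k <- seq_len(ncol(y))
  matplot(m$V, y, type = "b", pch = k, col = k, lty = 1,
          xlab = "V", ylab = "group mean")
  legend("topright", legend = colnames(y), pch = k, col = k, lty = 1)
}
```

Calling plot_group_means(means) draws the plot; a group whose point sits far from the others is a candidate outlier. The same helper can be given a table of variances instead of means.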
Another question: I have run a principal components analysis on the same group of variables (V1-V5) that I used before. I'd like to get a similar descriptive analysis of these principal components by variable V. Correct me if I am wrong: should I run it on the scores of the 2 or 3 principal components I obtained from the PCA?
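
One plausible reading is yes: the component scores are ordinary numeric columns, so they can be summarised by V like any other variable. The sketch below assumes the PCA was done with prcomp() and that the first three components are kept; the data and the names pca, scores and pc_stats are made up for the example.

```r
# Made-up data with the grouping variable V and five numeric variables
set.seed(1)
mydataframe <- data.frame(V  = rep(1:14, each = 8),
                          V1 = rnorm(112), V2 = runif(112),
                          V3 = rnorm(112), V4 = rnorm(112),
                          V5 = rpois(112, 3))

# PCA on V1..V5; the scores are in pca$x, one column per component
pca <- prcomp(mydataframe[paste0("V", 1:5)], scale. = TRUE)

# Keep the first three components next to the grouping variable
scores <- data.frame(V = mydataframe$V, pca$x[, 1:3])

# Mean and variance of each retained component within each value of V
pc_stats <- aggregate(. ~ V, data = scores,
                      FUN = function(x) c(mean = mean(x), var = var(x)))
pc_stats
```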

