descriptive statistics

descriptive statistics

effeesse
Hi. In a data set I have a variable that takes values from 1 to 14. For each subgroup of values of this variable, I would like to obtain some descriptive statistics of other variables present in the data set. I've been trying with a "for" loop but I couldn't get anything. Could you please suggest some lines of code?

Re: descriptive statistics

Ivan Calandra
?aggregate
?doBy::summaryBy
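
To make the pointers above concrete, here is a minimal sketch of the aggregate() approach. The data frame dat and its columns g, x and y are made up for illustration; they are not the original poster's data.

```r
# Hypothetical example data: g is the grouping variable (values 1 to 14),
# x and y are the variables to summarise.
set.seed(1)
dat <- data.frame(g = rep(1:14, each = 5), x = rnorm(70), y = runif(70))

# One row per value of g; FUN is applied to each column within each group.
group_means <- aggregate(cbind(x, y) ~ g, data = dat, FUN = mean)
group_vars  <- aggregate(cbind(x, y) ~ g, data = dat, FUN = var)
head(group_means)
```

Each result is an ordinary data frame, so it can be printed, merged or plotted like any other.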

On 12/13/2010 11:04, effeesse wrote:
> Hi. In a data set I have a variable that takes values from 1 to 14. For each
> subgroup of values of this variable, I would like to obtain some descriptive
> statistics of other variables present in the data set. I've been trying with
> a "for" loop but I couldn't get anything. Could you please suggest some
> lines of code?

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
[hidden email]

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: descriptive statistics

Jim Lemon
In reply to this post by effeesse

Hi effeesse,
Sure:

testmat<-data.frame(sample(1:14,50,TRUE),rnorm(50),runif(50))
# colMeans, not mean: mean() on a whole data frame returns NA in R >= 3.0.0
by(testmat[,-1],testmat[,1],colMeans)
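
A slightly expanded sketch of the same idea, with named columns so the output is easier to read. The names testdf, group, a and b are assumptions made for this example only.

```r
set.seed(2)
testdf <- data.frame(group = rep(1:14, length.out = 56),
                     a = rnorm(56), b = runif(56))

# by() hands each group's rows (a data frame) to the function,
# so the function has to work on a data frame, as colMeans() does.
res <- by(testdf[, c("a", "b")], testdf$group, colMeans)

# Bind the per-group vectors into a 14 x 2 matrix, one row per group.
means_by_group <- do.call(rbind, res)
means_by_group
```

Collecting the pieces into a matrix is optional; printing res directly shows the same numbers group by group.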

Jim


Re : descriptive statistics

justin bem
A nice way to obtain summaries of data is to use summary.formula in the Hmisc
package.

 Justin BEM
BP 1917 Yaoundé
Tél (237) 76043774





Re: descriptive statistics

David Hajage-2
Another way is the remix function of the remix package.


Re: Re : descriptive statistics

effeesse
In reply to this post by justin bem
I am sorry, but I cannot understand how to use the "summary" function. Maybe, if I describe my needs, you could sketch a line that could work.
In the data set, the variable "V" can take values 1 to 14. For the subgroup of individuals where "V" equals 1, I want the mean and variance of a certain set of other variables (V1, V2, V3, V4, V5), and the same for all the other subgroups, for values 2 to 14.
What do you suggest?

Re: Re : descriptive statistics

Ivan Calandra
I would suggest what we already suggested to you:
?aggregate
?by
?doBy::summaryBy

We could help you more precisely if you could provide a reproducible
example, as explained in the posting guide (see link at the end of every
email from this list)

Ivan

On 12/13/2010 15:14, effeesse wrote:
> I am sorry, but I cannot understand how to use the "summary" function. Maybe,
> if I describe my needs, you could sketch a line that could work.
> In the data set, the variable "V" can take values 1 to 14. For the subgroup of
> individuals where "V" equals 1, I want the mean and variance of a
> certain set of other variables (V1, V2, V3, V4, V5), and the same for all the
> other subgroups, for values 2 to 14.
> What do you suggest?


Re : Re : descriptive statistics

justin bem
In reply to this post by effeesse
With summary (summary.formula from the Hmisc package), do this:

library(Hmisc)
# fun is applied to the values of the response within each group of V
my.summary<-function(x) c(mean=mean(x),var=var(x))

summary(V1~V, fun=my.summary,data=df)
summary(V2~V, fun=my.summary,data=df)
summary(V3~V, fun=my.summary,data=df)
summary(V4~V, fun=my.summary,data=df)
summary(V5~V, fun=my.summary,data=df)

If you want the means of all the variables together in one table:

# here x is a matrix with one column per variable
my.summary<-function(x) colMeans(x)
summary(cbind(V1,V2,V3,V4,V5)~V, fun=my.summary,data=df)


Re: Re : Re : descriptive statistics

effeesse
What am I supposed to put into function(x)? The indicator for extracting the subgroups?
My data frame is called data, and cluster takes the values 1,...,14.

This is what I was trying:

"for (i in 1:14) {
my.summary<-data$cluster==i c(mean(?),var(?))

summary(var_A~cluster, fun=my.summary,data=data)
summary(var_B~cluster, fun=my.summary,data=data)
summary(var_C~cluster, fun=my.summary,data=data)
summary(var_D~cluster, fun=my.summary,data=data)
summary(var_E~cluster, fun=my.summary,data=data)
summary(var_F~cluster, fun=my.summary,data=data)
summary(var_G~cluster, fun=my.summary,data=data)
}"

thanks for your patience.
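
On the function(x) question: in the grouped-summary calls suggested earlier in the thread, x is the set of values belonging to one subgroup, and the grouping function does the subsetting itself. If an explicit loop is still wanted, a base R version might look like the sketch below. The data frame dat and the columns var_A and var_B are invented stand-ins for the poster's data.

```r
# Made-up stand-in for the poster's data: cluster takes values 1 to 14.
set.seed(3)
dat <- data.frame(cluster = rep(1:14, each = 4),
                  var_A = rnorm(56), var_B = runif(56))
vars <- c("var_A", "var_B")

# x is simply the vector of values of one variable within one subgroup.
my.summary <- function(x) c(mean = mean(x), var = var(x))

results <- vector("list", 14)
for (i in 1:14) {
  rows <- dat$cluster == i                 # logical indicator of subgroup i
  results[[i]] <- sapply(dat[rows, vars], my.summary)
}
names(results) <- 1:14
results[["1"]]   # 2 x 2 matrix: rows mean/var, columns var_A/var_B
```

The loop stores its results in a list; a summary computed inside a for loop is otherwise discarded unless it is assigned or printed explicitly.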

Re: Re : Re : descriptive statistics

William Revelle
An alternative way of getting summary statistics by a grouping
variable is to use describeBy (named describe.by in older releases)
in the psych package.

Using Jim Lemon's example:

library(psych)
testmat<-data.frame(sample(1:14,50,TRUE),rnorm(50),runif(50))  #make up the data
describeBy(testmat,group=testmat[,1])    #get descriptive statistics for each group

At 8:17 AM -0800 12/13/10, effeesse wrote:

>what am I supposed to put into function(x)? The indicator for extracting the
>subgroups?
>data is the df. cluster={1,...,14}.
>
>This is how I was compiling:
>
>"for (i in 1:14) {
>my.summary<-data$cluster==i c(mean(?),var(?))
>
>summary(var_A~cluster, fun=my.summary,data=data)
>summary(var_B~cluster, fun=my.summary,data=data)
>summary(var_C~cluster, fun=my.summary,data=data)
>summary(var_D~cluster, fun=my.summary,data=data)
>summary(var_E~cluster, fun=my.summary,data=data)
>summary(var_F~cluster, fun=my.summary,data=data)
>summary(var_G~cluster, fun=my.summary,data=data)
>}"
>
>thanks for your patience.

Re: Re : Re : descriptive statistics

Ivan Calandra
In reply to this post by effeesse
Do it with aggregate(), something like this should do:
aggregate(.~cluster, FUN=summary, data=data)

Now if you don't want to run summary(), replace it with the function
you'd like.
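
For instance, a function returning both the mean and the variance could be plugged in as in the sketch below. The data frame dat, its columns and the helper mean_var are hypothetical names chosen for this example.

```r
set.seed(4)
dat <- data.frame(cluster = rep(1:14, each = 6),
                  var_A = rnorm(84), var_B = rpois(84, 3))

mean_var <- function(x) c(mean = mean(x), var = var(x))

# "." on the left-hand side means "all columns other than cluster".
out <- aggregate(. ~ cluster, data = dat, FUN = mean_var)

# Each summarised variable comes back as a two-column matrix inside the
# data frame; do.call(data.frame, ...) flattens it into ordinary columns.
flat <- do.call(data.frame, out)
names(flat)
```

The flattening step is only needed when FUN returns more than one value per group.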

HTH,
Ivan


Re: Re : Re: descriptive statistics

Jim Lemon
In reply to this post by effeesse

Step 1 - In a "reproducible example" one makes up some data and does
something to it to show how it is or isn't working. Clearly, you don't
know how to do that yet, so here's how.

mydataframe<-data.frame(V=sample(1:14,100,TRUE),
  V1=rnorm(100),V2=runif(100),V3=sample(-3:3,100,TRUE),
  V4=sample(0:1,100,TRUE),V5=rpois(100,3))

If you run this code, you will then have a data frame that may not look
like what you want, but it will serve as an example. In my initial post,
I assumed that you wanted some summary statistic for each of the
variables V1 to V5, broken down by V. That's easy:

# colMeans is used because mean() no longer accepts a whole data frame
by(mydataframe[c("V1","V2","V3","V4","V5")],
  mydataframe$V,colMeans)

If you run that code, you will get a big array of all of the means of
all of the V1-V5 columns broken down by the V column as you asked. Now
maybe you want both the mean and variance in one shot:

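by_many<-function(x,by,stats) {
  # stats: character vector of function names; each function is
  # called on the data frame of rows belonging to one group
  nfun<-length(stats)
  myoutputlist<-vector("list",nfun)
  for(fun in seq_len(nfun))
   myoutputlist[[fun]]<-by(x,by,get(stats[fun]))
  names(myoutputlist)<-stats
  return(myoutputlist)
}
# "colMeans" rather than "mean": mean() returns NA for a data frame in R >= 3.0.0
by_many(mydataframe[c("V1","V2","V3","V4","V5")],
  mydataframe$V,stats=c("colMeans","var"))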

The first part defines a function that will call "by" for each statistic
that you pass in "stats", which now has to be the name of the function
rather than the function. You will have to pick your variances out of
the diagonal of the matrices due to the way "var" works.
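
Picking the variances out of the diagonal could be sketched as follows. The data frame mydf is a smaller made-up example with three variables, and the names cov_by_group and var_by_group are chosen here for illustration.

```r
set.seed(5)
mydf <- data.frame(V = rep(1:14, each = 5),
                   V1 = rnorm(70), V2 = runif(70), V3 = rpois(70, 3))

# var() on a data frame returns the full covariance matrix for each group.
cov_by_group <- by(mydf[c("V1", "V2", "V3")], mydf$V, var)

# The per-variable variances sit on the diagonal of each matrix.
var_by_group <- t(sapply(cov_by_group, diag))
var_by_group   # 14 rows (groups) by 3 columns (variables)
```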

So have a look at these and work out if they come close to doing what
you want.

Jim


Re: Re : Re: descriptive statistics

effeesse
Thanks! I got the results! Now I would like to put them in a nice and readable plot to see if there are outlier means or variances. Using "plot(by_many$mean)" I got "Error in by_many$mean : object of type 'closure' is not subsettable".
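
The error message suggests that by_many is still the function itself (a "closure"), so there is nothing for $mean to extract; the result of the call has to be assigned to a variable first. A minimal sketch under that assumption, with made-up data and the hypothetical names dat, res and group_means:

```r
set.seed(6)
dat <- data.frame(V = rep(1:14, each = 5), V1 = rnorm(70), V2 = runif(70))

# Store the result of the grouping call rather than indexing the function.
res <- by(dat[c("V1", "V2")], dat$V, colMeans)
group_means <- do.call(rbind, res)   # 14 x 2 matrix, one row per value of V

# One line per variable across the 14 groups; unusual group means stand out.
matplot(1:14, group_means, type = "b", pch = 1:2,
        xlab = "V", ylab = "group mean")
legend("topright", legend = colnames(group_means), pch = 1:2, col = 1:2)
```

The same kind of plot can be drawn for the variances once they are collected into a matrix in the same way.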

Another question: I have done the principal components analysis on the same group of variables (V1--V5) I used before. I'd like to get a similar descriptive analysis of these principal components by variable V. Correct me if I am wrong; should I run it on the scores of the 2 or 3 principal components I obtained in the PCA?
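
One plausible way to do that is to summarise the component scores by group, as in the sketch below. It assumes the PCA was run with prcomp() on standardised variables, which the thread does not state; the data and the names dat, scores, pc_means and pc_vars are made up.

```r
set.seed(7)
dat <- data.frame(V = rep(1:14, each = 5),
                  V1 = rnorm(70), V2 = runif(70), V3 = rnorm(70),
                  V4 = rnorm(70), V5 = rpois(70, 3))

# Assumed PCA setup: prcomp on V1 to V5, scaled to unit variance.
pca <- prcomp(dat[paste0("V", 1:5)], scale. = TRUE)

scores <- as.data.frame(pca$x[, 1:2])   # scores on the first two components
scores$V <- dat$V

# Mean and variance of each component score within each value of V.
pc_means <- aggregate(cbind(PC1, PC2) ~ V, data = scores, FUN = mean)
pc_vars  <- aggregate(cbind(PC1, PC2) ~ V, data = scores, FUN = var)
```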