How to perform a grouped shapiro wilk test on dataframe

ramoss
Hello,

I was wondering whether it is possible to run a Shapiro-Wilk normality test on COUNTS, grouped by the variable ACTIVITY, in a dataframe called 'all'. Could it be done using plyr? I saw an example that applies to an array but not to a dataframe:

lapply(split(dataset1$Height,dataset1$Group),shapiro.test)

Any thoughts would be much appreciated.

My dataframe has this shape:

dat      ACTIVITY  COUNTS
1/1/13   XXXX      43
..
..
1/31/13  XXXX      60
1/1/13   YYYY      40
..
..
1/31/13  YYYY      10
etc., going on for 3 months.
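
For reference, the `lapply(split(...))` idiom quoted in the question also works on data frame columns, because `dataset1$Height` is itself a column of a data frame. Below is a minimal sketch under stated assumptions: the data are made up to match the shape described, and the name `all_df` is hypothetical (it is used instead of `all` because `all` is also a base R function).

```r
# Hypothetical toy data in the shape described above (values are made up).
set.seed(1)
all_df <- data.frame(
  dat      = rep(seq(as.Date("2013-01-01"), by = "day", length.out = 31), times = 2),
  ACTIVITY = rep(c("XXXX", "YYYY"), each = 31),
  COUNTS   = c(rpois(31, lambda = 40), rpois(31, lambda = 20)),
  stringsAsFactors = FALSE
)

# split() accepts a data frame column like any other vector,
# so the lapply(split(...)) idiom carries over unchanged.
tests <- lapply(split(all_df$COUNTS, all_df$ACTIVITY), shapiro.test)

# Each element is an "htest" object; pull out one p-value per group.
pvals <- sapply(tests, function(tst) tst$p.value)
pvals
```

The result is a named numeric vector with one p-value per ACTIVITY level.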
Re: How to perform a grouped shapiro wilk test on dataframe

arun kirshna
Hi,
Try this:
dat1 <- read.csv("sample.csv", sep="\t", stringsAsFactors=FALSE)
# shapiro.test() fails when all values in a group are identical, so return NA for those groups
with(dat1, tapply(COUNTS, list(ACTIVITY), function(x) if (length(unique(x))==1) NA else shapiro.test(x)))
#$activity1
#[1] NA
#
#$activity2
#
#    Shapiro-Wilk normality test
#
#data:  x
#W = 0.4588, p-value = 8.025e-11
#
#
#$activity3
#
#    Shapiro-Wilk normality test
#
#data:  x
#W = 0.7283, p-value = 3.76e-07

# The same test with plyr, keeping only the p-value for each group
library(plyr)
ddply(dat1, .(ACTIVITY), summarise, Pval=if(length(unique(COUNTS))==1) NA else shapiro.test(COUNTS)$p.value)
#   ACTIVITY         Pval
#1 activity1           NA
#2 activity2 8.025059e-11
#3 activity3 3.760396e-07

A.K.
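
For readers without plyr, the same per-group p-values can probably be obtained in base R. This is only a sketch: `safe_shapiro_p` is a hypothetical helper, and the data below are made up to stand in for `dat1` (the real `sample.csv` is not available). Besides the constant-group case handled above, the helper also guards the sample-size limits of `shapiro.test()` (3 to 5000 non-missing values).

```r
# Hypothetical helper: returns NA when shapiro.test() cannot be applied.
safe_shapiro_p <- function(x) {
  x <- x[!is.na(x)]
  # shapiro.test() needs 3 to 5000 non-missing values that are not all identical
  if (length(x) < 3 || length(x) > 5000 || length(unique(x)) == 1) {
    return(NA_real_)
  }
  shapiro.test(x)$p.value
}

# Made-up data standing in for dat1; activity1 is constant on purpose.
set.seed(42)
dat1 <- data.frame(
  ACTIVITY = rep(c("activity1", "activity2", "activity3"), each = 20),
  COUNTS   = c(rep(5, 20), rexp(20), rnorm(20, mean = 50, sd = 5)),
  stringsAsFactors = FALSE
)

# One p-value per group, then back into a data frame
pv  <- tapply(dat1$COUNTS, dat1$ACTIVITY, safe_shapiro_p)
res <- data.frame(ACTIVITY = names(pv), Pval = as.vector(pv),
                  stringsAsFactors = FALSE)
res
```

Returning `NA_real_` rather than a plain `NA` keeps the `Pval` column numeric even if every group happens to be skipped.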



----- Original Message -----
From: ramoss <[hidden email]>
To: [hidden email]
Cc:
Sent: Friday, April 5, 2013 10:50 AM
Subject: [R] How to perform a grouped shapiro wilk test on dataframe

--
View this message in context: http://r.789695.n4.nabble.com/How-to-perform-a-grouped-shapiro-wilk-test-on-dataframe-tp4663438.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: How to perform a grouped shapiro wilk test on dataframe

arun kirshna
Hi,
library(plyr)
# Collect the p-value and the W statistic for each group (NA when all values are identical)
res <- ddply(dat1, .(ACTIVITY), summarise, cbind(if(length(unique(COUNTS))==1) NA else shapiro.test(COUNTS)$p.value, if(length(unique(COUNTS))==1) NA else shapiro.test(COUNTS)$statistic))
# res[,2] holds both values side by side; expand it into two ordinary columns
res1 <- data.frame(ACTIVITY=res[,1], as.data.frame(res[,2]), stringsAsFactors=FALSE)
names(res1)[2:3] <- c("Pvalue","stats")
res1
#   ACTIVITY       Pvalue     stats
#1 activity1           NA        NA
#2 activity2 8.025059e-11 0.4588439
#3 activity3 3.760396e-07 0.7282838
str(res1)
#'data.frame':    3 obs. of  3 variables:
# $ ACTIVITY: chr  "activity1" "activity2" "activity3"
# $ Pvalue  : num  NA 8.03e-11 3.76e-07
# $ stats   : num  NA 0.459 0.728

#or
# Return both values in long format (two rows per group), then reshape to wide
res2 <- ddply(dat1, .(ACTIVITY), summarise, value=c(if(length(unique(COUNTS))==1) NA else shapiro.test(COUNTS)$p.value, if(length(unique(COUNTS))==1) NA else shapiro.test(COUNTS)$statistic))
res2$newCol <- rep(c("Pvalue","stats"), times=nrow(res2)/2)
library(reshape2)
res3 <- dcast(res2, ACTIVITY~newCol, value.var="value")
res3
#   ACTIVITY       Pvalue     stats
#1 activity1           NA        NA
#2 activity2 8.025059e-11 0.4588439
#3 activity3 3.760396e-07 0.7282838

A.K.
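
Both variants above call `shapiro.test()` twice per group. A possible alternative, sketched here in base R under stated assumptions, runs the test once per group and returns a one-row data frame. `shapiro_row` is a hypothetical helper, and the data are made up to stand in for `dat1`.

```r
# Hypothetical helper: one-row data frame with the W statistic and the p-value.
shapiro_row <- function(x) {
  # Skip groups that shapiro.test() would reject (too few or all-identical values)
  if (length(x) < 3 || length(unique(x)) == 1) {
    return(data.frame(stats = NA_real_, Pvalue = NA_real_))
  }
  tst <- shapiro.test(x)  # run the test only once per group
  data.frame(stats = unname(tst$statistic), Pvalue = tst$p.value)
}

# Made-up data standing in for dat1; activity1 is constant on purpose.
set.seed(7)
dat1 <- data.frame(
  ACTIVITY = rep(c("activity1", "activity2", "activity3"), each = 20),
  COUNTS   = c(rep(5, 20), rexp(20), rnorm(20, mean = 50, sd = 5)),
  stringsAsFactors = FALSE
)

# One row per group, stacked into a single data frame
by_group <- lapply(split(dat1$COUNTS, dat1$ACTIVITY), shapiro_row)
out <- data.frame(ACTIVITY = names(by_group), do.call(rbind, by_group),
                  stringsAsFactors = FALSE)
rownames(out) <- NULL
out
```

The column names `stats` and `Pvalue` follow the ones used in the reply above.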





----- Original Message -----
From: "Mossadegh, Ramine N." <[hidden email]>
To: arun <[hidden email]>
Cc:
Sent: Friday, April 5, 2013 4:17 PM
Subject: RE: [R] How to perform a grouped shapiro wilk test on dataframe

The statistic and the p-value. When I do all2 <- as.data.frame(stats), I get lots of garbage in the 2nd column, like:

list(statistic = 0.0889037906739691, p.value = 6.41197341678277e-20, method = "Shapiro-Wilk normality test", data.name = "x")

Thanks
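
The output quoted above is what one would expect if `stats` is a list of test results: each element returned by `shapiro.test()` is an object of class "htest" (a list with `statistic`, `p.value`, `method` and `data.name`), so coercing the whole list to a data frame keeps the nested lists. A minimal sketch of extracting just the two numbers, assuming `stats` was built per group as in the earlier reply (the data here are made up):

```r
# Made-up data; stats stands in for the per-group list of test results.
set.seed(3)
counts <- c(rexp(15), rnorm(15, mean = 10))
groups <- rep(c("g1", "g2"), each = 15)
stats  <- lapply(split(counts, groups), shapiro.test)

# Each element is an "htest" object, not a number
class(stats[["g1"]])

# Extract the numeric fields one at a time instead of coercing the whole list
all2 <- data.frame(
  ACTIVITY  = names(stats),
  statistic = sapply(stats, function(tst) unname(tst$statistic)),
  p.value   = sapply(stats, function(tst) tst$p.value),
  stringsAsFactors = FALSE
)
rownames(all2) <- NULL
all2
```

If some groups hold `NA` instead of a test result, as in the earlier reply, the extraction functions would need an extra check for that case.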

-----Original Message-----
From: arun [mailto:[hidden email]]
Sent: Friday, April 05, 2013 4:15 PM
To: Mossadegh, Ramine N.
Subject: Re: [R] How to perform a grouped shapiro wilk test on dataframe

It depends upon what results you want to put into dataframe.





----- Original Message -----
From: "Mossadegh, Ramine N." <[hidden email]>
To: arun <[hidden email]>
Cc:
Sent: Friday, April 5, 2013 3:12 PM
Subject: RE: [R] How to perform a grouped shapiro wilk test on dataframe

Thanks it now works but how can I put the results back in a data frame?

-----Original Message-----
From: arun [mailto:[hidden email]]
Sent: Friday, April 05, 2013 2:35 PM
To: Mossadegh, Ramine N.
Cc: R help
Subject: Re: [R] How to perform a grouped shapiro wilk test on dataframe



