# Calculating SD according to groups of rows

9 messages

## Calculating SD according to groups of rows

Hi all,

I know this is probably basic, but I have proven to be a slow learner in any programming language. Anyhow, how can I calculate the SD for each person in my table? I have two patients in this R data.frame, 7200 and 23955. I extracted this from a relational database, but am I better off attempting to compute SD in SQL, or is this easily accomplished in R?

       SUBJECT_ID  HR
    1        7200 158
    2        7200 165
    3        7200 138
    4        7200 152
    5        7200 139
    6        7200 157
    7        7200 186
    8       23955 167
    9       23955 162
    10      23955 171
    11      23955 139
    12      23955 170
    13      23955 177
    14      23955 180
    15      23955 176
    16      23955 172
    17      23955 179
    18      23955 181
    19      23955 169
    20      23955 168
    21      23955 185
    22      23955 181
    23      23955 191
    24      23955 179
    25      23955 178
    26      23955 184
    27      23955 179
    28      23955 172
    29      23955 173
    30      23955 182
    31      23955 174

So, what I would want is a table of 800 patients with a SD for their heart rates:

    subject id    Heart Rate SD
    7200          20 (for example)
    23955         18 (for example)

Thank you!

[[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

## Re: Calculating SD according to groups of rows

Dear pufftissue,

If your data set is a data.frame called 'x', one approach could be:

    # Data set
    x = read.table('clipboard', header = TRUE)

    # Calculations
    tapply(x$HR, x$SUBJECT_ID, sd, na.rm = TRUE)
        7200    23955
    16.39977 10.03896

See ?tapply and/or ?ave for more information.

HTH,

Jorge
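For readers who do not have the table on their clipboard, the tapply() call above can be reproduced by typing in the rows from the original question. This is only a minimal sketch: the data frame name `x` follows the reply, the HR values are copied from the posted table, and the `ave()` line (with the made-up column name `HR_SD`) is just one possible use of the function the reply points to.

```r
# Rebuild the posted table (values copied from the original question).
x <- data.frame(
  SUBJECT_ID = rep(c(7200, 23955), times = c(7, 24)),
  HR = c(158, 165, 138, 152, 139, 157, 186,
         167, 162, 171, 139, 170, 177, 180, 176, 172, 179, 181, 169,
         168, 185, 181, 191, 179, 178, 184, 179, 172, 173, 182, 174)
)

# One SD per patient, returned as a named 1-d array.
sd_by_subject <- tapply(x$HR, x$SUBJECT_ID, sd, na.rm = TRUE)
print(sd_by_subject)

# ave() instead returns one value per row: every row gets its group's SD.
x$HR_SD <- ave(x$HR, x$SUBJECT_ID, FUN = function(v) sd(v, na.rm = TRUE))
head(x)
```

tapply() is the natural choice when one summary row per patient is wanted; ave() is handy when the group SD has to sit next to each original observation.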

## Re: Calculating SD according to groups of rows

pufftissue pufftissue <[hidden email]> writes:

> What I am getting is indeed:
>
>     7200    23955    34563     8934
> 16.39977 10.03896   11.234    14.02
>
> I'd like the final output to be:
>
> subject_id    hr_Stand_Deviation
> 7200          16.39977
> 23955         10.03896
> 34563         11.234
> 8934          14.02

The hard way could go like this; I personally got used to it, but I admit it is one of the things that are unusually difficult in R.

    dat = data.frame(SUBJECT_ID=sample(letters[1:5],100,TRUE),HR=rnorm(100))
    sd.list = with(dat, tapply(HR, SUBJECT_ID, sd))
    data.frame(SUBJECT_ID=rownames(sd.list),sd=sd.list)

I think Hadley Wickham tried to make life easier with the plyr package, so I thought something like the below would work out of the box. However, there must be something wrong with the syntax; the result is only "approximately" correct.

    library(plyr)
    daply(dat,.(SUBJECT_ID),sd)
    ddply(dat,.(SUBJECT_ID),sd)

Dieter
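The step from tapply()'s named result to the two-column layout requested in the quote can also be written with names(). The following is a minimal base-R sketch, assuming a small data frame `dat` typed in from the first rows of the table in the original question; the column name `HR_SD` is made up for illustration.

```r
# Illustrative subset of the posted data: all 7 rows for patient 7200
# and the first 5 rows for patient 23955.
dat <- data.frame(
  SUBJECT_ID = rep(c(7200, 23955), times = c(7, 5)),
  HR = c(158, 165, 138, 152, 139, 157, 186,
         167, 162, 171, 139, 170)
)

sd_by_subject <- tapply(dat$HR, dat$SUBJECT_ID, sd)

# names() holds the group labels; as.vector() strips them from the values
# so the data frame gets plain 1..n row names.
result <- data.frame(
  SUBJECT_ID = names(sd_by_subject),
  HR_SD = as.vector(sd_by_subject)
)
print(result)
```

Note that the labels come back as character strings here; wrap them in as.numeric() if the IDs need to stay numeric.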

## Re: Calculating SD according to groups of rows

What about aggregate?

    with(dat, aggregate(HR, list(sub_id=SUBJECT_ID), sd))

should give the required final output form.

Regards
Petr
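aggregate() also has a formula interface, which some find easier to read than the with() form. A minimal sketch, assuming the same kind of two-column data frame as in the question; the subset of rows and the output column name `HR_SD` are chosen here only for illustration.

```r
# Illustrative subset of the posted data: all 7 rows for patient 7200
# and the first 5 rows for patient 23955.
dat <- data.frame(
  SUBJECT_ID = rep(c(7200, 23955), times = c(7, 5)),
  HR = c(158, 165, 138, 152, 139, 157, 186,
         167, 162, 171, 139, 170)
)

# Formula interface: summarise HR within each SUBJECT_ID.
hr_sd <- aggregate(HR ~ SUBJECT_ID, data = dat, FUN = sd)

# The summary column keeps the name HR; rename it to say what it holds.
names(hr_sd)[2] <- "HR_SD"
print(hr_sd)
```

The result is already a data frame with one row per patient, so no further reshaping is needed.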

## Re: Calculating SD according to groups of rows

On Thu, Nov 20, 2008 at 2:20 AM, Dieter Menne <[hidden email]> wrote:
> library(plyr)
> daply(dat,.(SUBJECT_ID),sd)
> ddply(dat,.(SUBJECT_ID),sd)

Well, that calculates sd on the whole data frame (like sd(dat)). You probably want:

    ddply(dat,.(SUBJECT_ID), numcolwise(sd))

which calculates sd for numeric columns only, or

    ddply(dat,.(SUBJECT_ID), function(df) sd(df$HR))

which calculates it for HR explicitly.

Hadley

--
http://had.co.nz/
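The point of the second form is that the function receives one data frame per patient and must pick out the HR column itself. The same split, apply, combine idea can be sketched in base R without plyr. This is an illustration of the pattern only, not plyr's actual implementation; the data subset and the names `pieces`, `per_piece` and `HR_SD` are assumptions made for the example.

```r
# Illustrative subset of the posted data: all 7 rows for patient 7200
# and the first 5 rows for patient 23955.
dat <- data.frame(
  SUBJECT_ID = rep(c(7200, 23955), times = c(7, 5)),
  HR = c(158, 165, 138, 152, 139, 157, 186,
         167, 162, 171, 139, 170)
)

# Split: one data frame per patient.
pieces <- split(dat, dat$SUBJECT_ID)

# Apply: summarise the HR column of each piece explicitly.
per_piece <- lapply(pieces, function(df) {
  data.frame(SUBJECT_ID = df$SUBJECT_ID[1], HR_SD = sd(df$HR))
})

# Combine: stack the one-row summaries into a single data frame.
result <- do.call(rbind, per_piece)
rownames(result) <- NULL
print(result)
```

Passing sd directly would hand it the whole piece, ID column included, which is why the summary function has to name the column it wants.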