|
Hi,

I think/hope there will be a simple solution to this, but googling has provided no answers (I am probably not using the right words).

I have a long data frame of >2 000 000 rows and 6 columns. Across this there are 24 000 combinations of gene in one column (n = 12 000) and gender in another column (n = 2... obviously). I want to create two new columns in the data frame that, on each row, give in one column the mean value (of gene expression, in the column called "value") for that row's gene & gender combination, and in the other column the standard deviation for that gene & gender combination.

Any suggestions?

Rob

Example of the top of the data frame:

     gene      variable    value gender line rep
1 CG10000 X208.F1.30456 4.758010 Female  208   1
2 CG10000 X365.F2.30478 4.915395 Female  365   2
3 CG10000 X799.F2.30509 4.641636 Female  799   2
4 CG10000 X306.M2.32650 4.550676   Male  306   2
5 CG10000 X712.M2.30830 4.633811   Male  712   2
6 CG10000 X732.M2.30504 4.857564   Male  732   2
7 CG10000 X707.F1.31120 5.104165 Female  707   1
8 CG10000 X514.F2.30493 4.730814 Female  514   2 |
|
Hello,

All problems should be easy.

d <- read.table(text="
gene variable value gender line rep
1 CG10000 X208.F1.30456 4.758010 Female 208 1
2 CG10000 X365.F2.30478 4.915395 Female 365 2
3 CG10000 X799.F2.30509 4.641636 Female 799 2
4 CG10000 X306.M2.32650 4.550676 Male 306 2
5 CG10000 X712.M2.30830 4.633811 Male 712 2
6 CG10000 X732.M2.30504 4.857564 Male 732 2
7 CG10000 X707.F1.31120 5.104165 Female 707 1
8 CG10000 X514.F2.30493 4.730814 Female 514 2
", header=TRUE)

# See what we have
str(d)

# or put function(x) ...etc... in the aggregate
f <- function(x) c(mean=mean(x), sd=sd(x))
aggregate(value ~ gene + gender, data = d, f)

Hope this helps,

Rui Barradas

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code. |
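One detail about the aggregate() call above is easy to miss: because f returns two values, the result stores them in a single matrix column called value rather than in two ordinary columns. The following is a minimal sketch of one way to flatten it, assuming the eight sample rows are available as d. The names agg and flat are illustrative and do not appear in the thread.

```r
# Sample rows from the original post (gene, value and gender only).
d <- data.frame(
  gene   = "CG10000",
  value  = c(4.758010, 4.915395, 4.641636, 4.550676,
             4.633811, 4.857564, 5.104165, 4.730814),
  gender = c("Female", "Female", "Female", "Male",
             "Male", "Male", "Female", "Female")
)

# Summary function returning both statistics at once.
f <- function(x) c(mean = mean(x), sd = sd(x))
agg <- aggregate(value ~ gene + gender, data = d, FUN = f)

# agg has three columns; agg$value is a matrix with columns "mean" and "sd".
str(agg)

# Rebuilding the data frame expands the matrix column into
# two plain columns, value.mean and value.sd.
flat <- do.call(data.frame, agg)
names(flat)
```

After flattening, flat can be merged or subset like any ordinary data frame.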
|
Thanks,

In a way this has worked... with a slight modification:

narrow3<-aggregate(narrow2$value~narrow2$gene+narrow2$gender,data=narrow2,mean)
narrow4<-aggregate(narrow2$value~narrow2$gene+narrow2$gender,data=narrow2,sd)

which gives a table of the 24 000 gene & gender means (narrow3) and the standard deviations (narrow4), which I then merge into one df using

narrow5<-merge(narrow3,narrow4,by=c("narrow2$gene","narrow2$gender"))
colnames(narrow5)<-c("gene","gender","mean","sd")

Is there a way I can lift the mean and std. dev. values from data frame narrow5 and paste them into the original narrow2 df? In effect, R would read which gene and gender each row of narrow2 has and then paste the corresponding mean value into a new column, then do the same for a new sd column. Each mean/sd value would occur in the new column 80 times (there are 80 occurrences of each gene & gender combination).

Rob |
|
Got it... another merge did the trick:

narrow6<-merge(narrow2,narrow5,by=c("gene","gender"))

Thanks for the help, Rui |
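For reference, the aggregate-then-merge route can be sketched end to end on the eight sample rows. This version uses bare column names in the formula, which is an assumption about what reads more simply and not what was run in the thread, so the key columns come out already named gene and gender. The names means, sds and stats are illustrative. Note that merge() sorts on the key columns by default, so the rows of the result may come back in a different order from narrow2.

```r
# Sample rows from the original post, standing in for the full narrow2.
narrow2 <- data.frame(
  gene   = "CG10000",
  value  = c(4.758010, 4.915395, 4.641636, 4.550676,
             4.633811, 4.857564, 5.104165, 4.730814),
  gender = c("Female", "Female", "Female", "Male",
             "Male", "Male", "Female", "Female")
)

# One row per gene & gender combination for each statistic.
means <- aggregate(value ~ gene + gender, data = narrow2, FUN = mean)
sds   <- aggregate(value ~ gene + gender, data = narrow2, FUN = sd)

# Rename the statistic columns so they do not collide in the merge.
names(means)[names(means) == "value"] <- "mean"
names(sds)[names(sds) == "value"]     <- "sd"

# Combine the two summaries, then attach them to every original row.
stats   <- merge(means, sds, by = c("gene", "gender"))
narrow6 <- merge(narrow2, stats, by = c("gene", "gender"))
narrow6
```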
|
In reply to this post by Rui Barradas
On 2012-07-18 04:27, Rui Barradas wrote:
> Hello,
>
> All problems should be easy.
>
> d <- read.table(text="
> gene variable value gender line rep
> 1 CG10000 X208.F1.30456 4.758010 Female 208 1
> 2 CG10000 X365.F2.30478 4.915395 Female 365 2
> 3 CG10000 X799.F2.30509 4.641636 Female 799 2
> 4 CG10000 X306.M2.32650 4.550676 Male 306 2
> 5 CG10000 X712.M2.30830 4.633811 Male 712 2
> 6 CG10000 X732.M2.30504 4.857564 Male 732 2
> 7 CG10000 X707.F1.31120 5.104165 Female 707 1
> 8 CG10000 X514.F2.30493 4.730814 Female 514 2
> ", header=TRUE)
>
> # See what we have
> str(d)
>
> # or put function(x) ...etc... in the aggregate
> f <- function(x) c(mean=mean(x), sd=sd(x))
> aggregate(value ~ gene + gender, data = d, f)
>
> Hope this helps,
>
> Rui Barradas

I read the request a bit differently; we can use ave() to generate the requested new variables:

d1 <- transform(d, MN = ave(value, gene, gender),
                   SD = ave(value, gene, gender, FUN = sd))

Or use within() instead of transform().

Peter Ehlers |
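The within() variant mentioned in the reply above might look like the sketch below, run on the eight sample rows. ave() returns a vector as long as its input, so each group statistic lands on the original rows in their original order and no merge is needed. The column names MN and SD follow the reply; d2 is an illustrative name.

```r
# Sample rows from the original post (gene, value and gender only).
d <- data.frame(
  gene   = "CG10000",
  value  = c(4.758010, 4.915395, 4.641636, 4.550676,
             4.633811, 4.857564, 5.104165, 4.730814),
  gender = c("Female", "Female", "Female", "Male",
             "Male", "Male", "Female", "Female")
)

d2 <- within(d, {
  # FUN defaults to mean when it is not supplied.
  MN <- ave(value, gene, gender)
  # FUN has to be passed by name, otherwise sd would be
  # treated as another grouping variable.
  SD <- ave(value, gene, gender, FUN = sd)
})
d2
```

Because the row order is untouched, d2 lines up row for row with d, which is convenient when other objects are indexed by position.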
