How to do covariate adjustment in R

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

How to do covariate adjustment in R

karena
Hi, I have a question about how to do covariate adjustment.

I have two sets of 'gene expression' data. They are from two different tissue types, 'liver' and 'brain', respectively.
The purpose of my analysis is to compare the pattern of the whole genome 'gene expression' between the two tissue types.
I have 'age' and 'sex' as covariates. Since 'age' and 'sex' definitely have influence on gene expression, I need to first filter out the proporation of 'gene expression' attributable to 'age' and 'sex', then compare the 'remaining gene expression' value between 'liver' and 'brain'.
How to do the covariate adjustment to remove the effects of these two covariates?
Should I do a 'step-wise' regression or something?
Which function in R should I use?

I am new to this field, and really appreciate your help!

thank you very much,

karena
Reply | Threaded
Open this post in threaded view
|

Re: How to do covariate adjustment in R

plangfelder
On Thu, May 19, 2011 at 7:21 PM, karena <[hidden email]> wrote:

> Hi, I have a question about how to do covariate adjustment.
>
> I have two sets of 'gene expression' data. They are from two different
> tissue types, 'liver' and 'brain', respectively.
> The purpose of my analysis is to compare the pattern of the whole genome
> 'gene expression' between the two tissue types.
> I have 'age' and 'sex' as covariates. Since 'age' and 'sex' definitely have
> influence on gene expression, I need to first filter out the proporation of
> 'gene expression' attributable to 'age' and 'sex', then compare the
> 'remaining gene expression' value between 'liver' and 'brain'.
> How to do the covariate adjustment to remove the effects of these two
> covariates?
> Should I do a 'step-wise' regression or something?
> Which function in R should I use?
>
> I am new to this field, and really appreciate your help!

Go to your local library and get an introductory book on linear
regression (or linear models) that also covers basics of multi-variate
regression.

When you learn a little bit about regression, read at least the first
few chapters of Julian Faraway's book on regression in R,
cran.r-project.org/doc/contrib/Faraway-PRA.pdf

When you're done, use roughly the following command, assuming age is
is a numeric variable and expression contains the expression data with
genes in columns.

sex.num = as.numeric(as.factor(sex))
tissue.num = as.numeric(as.factor(tissue))
fit = lm(expression~sex.num + age + tissue.num)

The variable fit will be a list with one element per gene. In each
element, look at the component $coefficients[4,4], which is the
p-value for the hypothesis that the coefficient of tissue is zero.

Since you likely have thousands of genes, you will have to do a
multiple testing correction, which you could do for example using the
qvalue package in which the function qvalue calculates the false
discovery rate.

Peter

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to do covariate adjustment in R

karena
Thank you so much for this reply, Peter. It helps.

I know this is one way to adjust for covariates. However, if what I want is to get the 'remaining values' after adjustment. For example, say, 'gene expression' value is denoted as 'ge', and for each gene,
ge=a*age+b*sex+c*per_se

My question is: how to get the value of 'per_se' for each gene?

thanks again.

Karena
Reply | Threaded
Open this post in threaded view
|

Re: How to do covariate adjustment in R

Timothy Bates
Dear Karena,

x = 1:100
y = rnorm(100)

fit = lm(x~y)
# what properties does a fit have?
names(fit)
 # [1] "coefficients"  "residuals"     "effects"       "rank"          "fitted.values" "assign"        "qr"          
 # [8] "df.residual"   "xlevels"       "call"          "terms"         "model"        

x.resid = fit$residuals
plot(x~x.resid)
cor.test(y,x.resid) # -> r = 3.019388e-17


A gotcha is NAs in your data frame: Look at ?lm to see how to handle these properly
best, t

On 20 May 2011, at 6:58 AM, karena wrote:

> Thank you so much for this reply, Peter. It helps.
>
> I know this is one way to adjust for covariates. However, if what I want is
> to get the 'remaining values' after adjustment. For example, say, 'gene
> expression' value is denoted as 'ge', and for each gene,
> ge=a*age+b*sex+c*per_se
>
> My question is: how to get the value of 'per_se' for each gene?
>
> thanks again.
>
> Karena
>
> --
> View this message in context: http://r.789695.n4.nabble.com/How-to-do-covariate-adjustment-in-R-tp3537463p3537711.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to do covariate adjustment in R

karena
Thanks a lot for the great help, Timothy!

K
Reply | Threaded
Open this post in threaded view
|

Re: How to do covariate adjustment in R

plangfelder
In reply to this post by karena
On Thu, May 19, 2011 at 10:58 PM, karena <[hidden email]> wrote:
> Thank you so much for this reply, Peter. It helps.
>
> I know this is one way to adjust for covariates. However, if what I want is
> to get the 'remaining values' after adjustment. For example, say, 'gene
> expression' value is denoted as 'ge', and for each gene,
> ge=a*age+b*sex+c*per_se
>
> My question is: how to get the value of 'per_se' for each gene?

You could use the residuals after regressing the expression on sex and
age.  I think the call would be

residuals(lm(expression~age+sex))

but try it and see if it makes sense. Just note that the residual
depends on how you define sex and and age; for example, if you shift
age by 10, the residuals shift by a*age. Similarly, the residuals will
depend on whether you use sex values (1,2) or (0,1) or (2,1) etc.  for
(female, male). Thus you need to be careful when interpreting the
residuals. I would suggest standardizing age to mean zero so the
residuals have the interpretation of expression at average age.
Standardization for sex makes statistical but not biological sense
since there are no samples with average sex.

Peter

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.