Logistic regression to select genes and estimate cutoff point?

Ldong
Hi, all,
I am new to R and to statistics in general. I am not sure whether this question has an answer, but I couldn't find a straightforward one in the help mailing list archives.
I need to use microarray data to select several diagnostic genes that distinguish normal samples from tumor samples, and then use these genes to classify unknown samples.
Since the sample size is small and the data do not follow a normal distribution, I am thinking of using logistic regression instead of Student's t-test to select genes. To simplify the problem, I assume the genes are independent of each other, with no interactions.
My question is how I should build the model: one model per gene, or a single multivariable model that includes all genes? Which test should I use to compare the discriminating power of each gene? I am thinking of the Wald statistic for the multivariable model and maximum likelihood for the single-gene models. Am I correct?
To estimate the cutoff point, I guess the answer is the gene expression value at which p = 0.5 in the model. Am I heading in the right direction?
Any suggestion is appreciated!
Thanks a lot.
Lingsheng
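
For the cutoff question above, the expression value at which a single-gene logistic model gives p = 0.5 follows directly from the model's form. A short derivation, assuming one gene with expression x; the symbols \beta_0 (intercept) and \beta_1 (slope) are introduced here for illustration and do not appear in the original post:

```latex
\[
\log\frac{p}{1-p} = \beta_0 + \beta_1 x
\quad\Longrightarrow\quad
p = 0.5 \iff \beta_0 + \beta_1 x = 0 \iff x^{*} = -\frac{\beta_0}{\beta_1},
\qquad \beta_1 \neq 0.
\]
```

This only locates the point where the fitted probability crosses 0.5. Whether 0.5 is a sensible threshold is a separate choice that depends on the class balance in the sample and on the relative cost of the two kinds of misclassification.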

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: Logistic regression to select genes and estimate cutoff point?

Ido Tamir
You could take a look at www.bioconductor.org
limma would be a good starting point.

hth
ido

Re: Logistic regression to select genes and estimate cutoff point?

Frank Harrell
In reply to this post by Ldong

Just a comment: Do you not have a statistician to work with at your
institution?  You are new to statistics and are asking a question that
would be very difficult to deal with for someone with a PhD in
statistics and 20 years of experience.  Some of the issues involved are
multiple comparisons, false discovery rate, shrinkage, array geometry
effects, nonparametric vs. parametric statistics, stability of selected
genes, discovery validation, ...

--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
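
Two of the issues listed above, multiple comparisons and false discovery rate, can be illustrated with a minimal sketch in base R. Everything here is an assumption made for illustration: the data are simulated, the sample sizes and number of genes are arbitrary, and the helper `lrt_p` is hypothetical. It fits one logistic model per gene, as the original question proposes, and is not a recommended analysis pipeline.

```r
# Simulated data only: 40 samples (20 normal, 20 tumor), 200 hypothetical genes.
set.seed(1)
n_genes <- 200
group <- rep(c(0, 1), each = 20)   # 0 = normal, 1 = tumor
expr <- matrix(rnorm(n_genes * length(group)), nrow = n_genes)

# Shift the first 10 genes upward in tumor samples so some true signal exists.
expr[1:10, group == 1] <- expr[1:10, group == 1] + 1

# Hypothetical helper: one logistic model per gene, with a likelihood-ratio
# test of the gene's model against the intercept-only model.
lrt_p <- function(x, y) {
  fit <- glm(y ~ x, family = binomial)
  pchisq(fit$null.deviance - fit$deviance, df = 1, lower.tail = FALSE)
}
p_raw <- apply(expr, 1, lrt_p, y = group)

# Benjamini-Hochberg adjustment, which controls the false discovery rate.
p_bh <- p.adjust(p_raw, method = "BH")

# Number of genes called at 0.05 before and after adjustment.
n_raw <- sum(p_raw < 0.05)
n_bh <- sum(p_bh < 0.05)
```

Comparing `n_raw` with `n_bh` shows how many unadjusted calls survive the correction. With 200 tests, some genes with no true effect are expected to fall below 0.05 by chance alone, which is one reason per-gene p-values should not be read at face value. The sketch does not address the other items in the list, such as shrinkage, stability of the selected genes, or validation.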
