regression analysis

Silvano-7
Hi,

I have to run 10,000 simple linear regressions. The response variable
(RESP) is the same in every model; only the independent variable changes
(there are 10,000 of them).

y ~ x[i]

i = 1, ..., 10000

For each analysis I must extract the p-value and then sort the p-values in
increasing order.

I thought of an analysis of this type:

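pvalue = numeric(10000)                       # one p-value per predictor
for(i in 1:10000){
 mod = lm(RESP ~ x[[i]])                      # assumes x is a data frame (or list) of predictors
 pvalue[i] = summary(mod)$coefficients[2,4]   # row 2 = slope, column 4 = Pr(>|t|)
 }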

Could someone suggest reading material or offer any other suggestions? Thank you.

---------------------------------------------
Silvano Cesar da Costa

Universidade Estadual de Londrina
Centro de Ciências Exatas
Departamento de Estatística

Fone: (43) 3371-4346

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: regression analysis

Michael Weylandt
Something like that should work, though you might need to construct
the formula as a string:

paste("y ~", names(x)[i])

instead.
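
A minimal sketch of that approach, assuming the predictors are columns of a
data frame. The name dat, the simulated data and the helper slope_p are
illustrative and not from the original post; 50 predictors stand in for the
10,000:

```r
set.seed(1)
n <- 30
# Hypothetical data: one response and 50 predictors (X1..X50) in one data frame
dat <- data.frame(RESP = rnorm(n), matrix(rnorm(n * 50), nrow = n))
xnames <- setdiff(names(dat), "RESP")

# Fit RESP ~ <one predictor> and return the p-value of its slope
slope_p <- function(v) {
  f <- as.formula(paste("RESP ~", v))          # formula built from a string
  summary(lm(f, data = dat))$coefficients[2, 4]
}

pvalue <- vapply(xnames, slope_p, numeric(1))  # named vector, one entry per predictor
sorted <- sort(pvalue)                         # increasing order; names are kept
head(sorted)
```

Because the result is a named vector, sorting it keeps track of which
predictor each p-value belongs to.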

More worrisome is the methodology: running 10k regressions against a single
response is almost guaranteed to produce spurious results. This
methodological mistake goes by different names in different fields
(multiple comparisons, data dredging), but it's not too hard to illustrate:

If I have 10 patients with a rare disease and a list of what each of
them had for dinner every night over the last 20 years, it's
practically guaranteed that on some night, perhaps 1562 days ago, they
all had fish tacos for dinner. But to conclude that fish tacos cause
the rare disease with a lag of four and a quarter years strains credibility.

I'll let you work out the details.
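
To make the point concrete, here is a small simulation sketch (not from the
original thread; the sample size, seed and variable names are arbitrary
choices). The response and all 10,000 predictors are independent noise, yet
about 5% of the regressions should come out "significant" at the 0.05 level.
p.adjust() from base R is one standard way to correct for this; whether the
Benjamini-Hochberg method used here suits a given study is a separate
question.

```r
set.seed(42)
n <- 50                                  # observations (arbitrary)
k <- 10000                               # number of candidate predictors
y <- rnorm(n)                            # response: pure noise
X <- matrix(rnorm(n * k), nrow = n)      # predictors: pure noise, unrelated to y

# In a simple regression the slope's p-value equals that of the Pearson
# correlation test, so lm() does not have to be called 10,000 times.
r <- as.vector(cor(X, y))
tstat <- r * sqrt((n - 2) / (1 - r^2))
pvalue <- 2 * pt(-abs(tstat), df = n - 2)

mean(pvalue < 0.05)                          # share of "significant" predictors
sum(p.adjust(pvalue, method = "BH") < 0.05)  # count left after FDR adjustment
```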

Michael

On Wed, Jul 25, 2012 at 4:03 PM, Silvano Cesar da Costa <[hidden email]> wrote:
