How to represent the effect of one covariate on regression results?

anikaM
Hello,

I ran an association analysis using PLINK 2's --glm genotypic
(https://www.cog-genomics.org/plink/2.0/assoc) with these covariates:
sex,age,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10,TD,array,HBA1C. The
result looks like this:

    #CHROM  POS        ID          REF  ALT  A1  TEST           OBS_CT  BETA         SE          Z_OR_F_STAT  P            ERRCODE
    10      135434303  rs11101905  G    A    A   ADD            11863   -0.110733    0.0986981   -1.12193     0.261891     .
    10      135434303  rs11101905  G    A    A   DOMDEV         11863   0.079797     0.111004    0.718868     0.472222     .
    10      135434303  rs11101905  G    A    A   sex=Female     11863   -0.120404    0.0536069   -2.24605     0.0247006    .
    10      135434303  rs11101905  G    A    A   age            11863   0.00524501   0.00391528  1.33963      0.180367     .
    10      135434303  rs11101905  G    A    A   PC1            11863   -0.0191779   0.0166868   -1.14928     0.25044      .
    10      135434303  rs11101905  G    A    A   PC2            11863   -0.0269939   0.0173086   -1.55957     0.118863     .
    10      135434303  rs11101905  G    A    A   PC3            11863   0.0115207    0.0168076   0.685448     0.493061     .
    10      135434303  rs11101905  G    A    A   PC4            11863   9.57832e-05  0.0124607   0.0076868    0.993867     .
    10      135434303  rs11101905  G    A    A   PC5            11863   -0.00191047  0.00543937  -0.35123     0.725416     .
    10      135434303  rs11101905  G    A    A   PC6            11863   -0.0103309   0.0159879   -0.646172    0.518168     .
    10      135434303  rs11101905  G    A    A   PC7            11863   0.00790997   0.0144025   0.549207     0.582863     .
    10      135434303  rs11101905  G    A    A   PC8            11863   -0.00205639  0.0142709   -0.144096    0.885424     .
    10      135434303  rs11101905  G    A    A   PC9            11863   -0.00873771  0.0057239   -1.52653     0.126878     .
    10      135434303  rs11101905  G    A    A   PC10           11863   0.0116197    0.0123826   0.938388     0.348045     .
    10      135434303  rs11101905  G    A    A   TD             11863   -0.670026    0.0962216   -6.96337     3.32228e-12  .
    10      135434303  rs11101905  G    A    A   array=Biobank  11863   0.160666     0.073631    2.18205      0.0291062    .
    10      135434303  rs11101905  G    A    A   HBA1C          11863   0.0265933    0.00168758  15.7583      6.0236e-56   .
    10      135434303  rs11101905  G    A    A   GENO_2DF       11863   NA           NA          0.726514     0.483613     .

These results are for just one ID (rs11101905); there are about 2
million IDs in the resulting file.

My question is how to present or plot the effect of the covariate "TD"
(in the example above its "P" is 3.32228e-12) across all IDs in the
resulting file, so that I can show how much effect "TD" has on the
analysis. Should I run another regression without "TD" and then make a
scatter plot of the P values with and without the "TD" covariate, or
is there a better way to do this from the data I already have?
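
For reference, the "TD" rows can be pulled out of output like the above
in R. This is only a minimal sketch: the second variant (rsEXAMPLE) and
all of its numbers are invented for illustration, and the object names
(glm_text, res, td) are hypothetical.

```r
# Miniature stand-in for a PLINK 2 --glm output file.
# The rs11101905 rows are copied from the table above;
# the rsEXAMPLE rows are invented purely for illustration.
glm_text <- "#CHROM POS ID REF ALT A1 TEST OBS_CT BETA SE Z_OR_F_STAT P ERRCODE
10 135434303 rs11101905 G A A ADD 11863 -0.110733 0.0986981 -1.12193 0.261891 .
10 135434303 rs11101905 G A A TD 11863 -0.670026 0.0962216 -6.96337 3.32228e-12 .
10 135434400 rsEXAMPLE C G G ADD 11863 0.05 0.09 0.5556 0.5785 .
10 135434400 rsEXAMPLE C G G TD 11863 -0.668 0.0962 -6.9439 3.8e-12 ."

# comment.char = "" keeps the header line, which starts with '#'.
res <- read.table(text = glm_text, header = TRUE, comment.char = "",
                  check.names = FALSE, stringsAsFactors = FALSE)

# Keep one row per variant: the coefficient reported for the TD covariate.
td <- res[res$TEST == "TD", c("ID", "BETA", "SE", "P")]
print(td)

# Summarise how the TD coefficient varies from variant to variant.
print(summary(td$BETA))
```

With the real file, the same filter on the TEST column would give one
"TD" row per ID, whose BETA values could then be summarised or plotted
as a histogram.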

Thanks
Ana

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: How to represent the effect of one covariate on regression results?

Abby Spurdle
I'm wondering if you want one of these:
(1) Plots of "Main Effects".
(2) "Partial Residual Plots".

Search for them, and you should be able to tell if they're what you want.
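
As a rough illustration of (2), here is a minimal base-R sketch of a
partial residual plot. The data are simulated and the variable names
(x1, x2, y) are hypothetical; x2 is a binary covariate only loosely
analogous to "TD", and nothing here uses the PLINK results.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)                    # continuous covariate
x2 <- rbinom(n, 1, 0.5)           # binary covariate (assumed, for illustration)
y  <- 1 + 0.5 * x1 - 0.7 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)

# Partial residuals for x1: the model residuals plus the fitted x1 term.
# They show the x1-y relationship after the other terms are accounted for.
pr_x1 <- residuals(fit) + coef(fit)["x1"] * x1

plot(x1, pr_x1, xlab = "x1", ylab = "Partial residual for x1")
# The reference line has the slope estimated for x1 in the full model.
abline(a = 0, b = coef(fit)["x1"])
```

Regressing the partial residuals on x1 alone reproduces the x1
coefficient from the full model, which is why the line fits the cloud.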

But a word of warning:

Many people (including many senior statisticians) misinterpret this
kind of information, because it is always the effect of xj on Y while
holding the other variables *constant*.
That's not as simple as it sounds, and people tend to disregard the
second half of that sentence in their final interpretations.


P.S.
John Fox announced a package with support for regression diagnostics
about 11 days ago:
https://stat.ethz.ch/pipermail/r-help/2020-September/468609.html

I'm not sure how relevant it is to your question, but I just glanced
at the vignette, and it's pretty slick...




Re: How to represent the effect of one covariate on regression results?

David Winsemius
There is a user-group for PLINK, easily found by looking at the page you
cited. This is not the correct place to submit such questions.


https://groups.google.com/g/plink2-users?pli=1


--

David.

Re: How to represent the effect of one covariate on regression results?

anikaM
Hi Abby and David,

Thanks for the useful tips! I will check those.

I completed the regression analysis in PLINK (as R would be very slow
for my sample size), but as I mentioned, I need to determine the
influence of a specific covariate on my results, and PLINK is of no
help there.

I ran a Pearson correlation test on the P values I got from the
regressions with and without my covariate of interest, and got this:

> cor.test(tt$P_TD, tt$P_noTD, method = "pearson", conf.level = 0.95)

    Pearson's product-moment correlation

data:  tt$P_TD and tt$P_noTD
t = 20.17, df = 283, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.7156134 0.8117108
sample estimates:
      cor
0.7679493

I can see that the P values from the two runs are strongly correlated.
Can I conclude that my covariate doesn't have a huge effect, or
what kind of conclusion can I draw from that?
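
For a visual version of the same comparison, P values are usually
easier to read on the -log10 scale. The sketch below is only
descriptive and does not by itself quantify the effect of "TD". The
data frame is a simulated stand-in with the same column names as tt
(P_TD, P_noTD) and 285 rows; none of the values come from my results.

```r
set.seed(42)

# Simulated stand-in for `tt`: P_noTD is P_TD perturbed on the log scale
# and clamped to [1e-12, 1]. This is an assumption made for illustration.
p_td <- runif(285)
tt <- data.frame(
  P_TD   = p_td,
  P_noTD = pmin(1, pmax(1e-12, p_td * exp(rnorm(285, sd = 0.5))))
)

lp_td   <- -log10(tt$P_TD)
lp_notd <- -log10(tt$P_noTD)

plot(lp_notd, lp_td,
     xlab = "-log10(P), model without TD",
     ylab = "-log10(P), model with TD")
abline(0, 1, lty = 2)   # points on this line are unchanged by adding TD

# Rank-based correlation, which is less driven by a few extreme P values.
print(cor(lp_td, lp_notd, method = "spearman"))
```

Points far from the dashed line would be the IDs whose P value changes
most when "TD" is added.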

Thanks for all your help
Ana



Re: How to represent the effect of one covariate on regression results?

Abby Spurdle
> My question is how do I present/plot the effect of covariate "TD" in
> the example it has "P" equal to 3.32228e-12 for all IDs in the
> resulting file so that I show how much effect covariate "TD" has on
> the analysis. Should I run another regression without covariate "TD"

I'll take a second shot in the dark:

There is R^2, and a number of generalizations of it
(the most common of which is probably adjusted R^2).
There are also various goodness-of-fit tests.

https://en.wikipedia.org/wiki/Goodness_of_fit
https://en.wikipedia.org/wiki/Coefficient_of_determination

You could fit two models (one with a particular variable included, and
one without), and compare how the statistic changes.
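
In R, that comparison might look like the minimal sketch below. The
data are simulated, the variable names (y, age, td) are hypothetical,
and a linear model is assumed; whether a linear or logistic model fits
the actual phenotype is not something I can tell from the output.

```r
set.seed(7)
n   <- 500
td  <- rbinom(n, 1, 0.5)              # binary covariate of interest (simulated)
age <- rnorm(n, mean = 50, sd = 10)   # another covariate (simulated)
y   <- 2 + 0.02 * age - 0.7 * td + rnorm(n)
dat <- data.frame(y, age, td)

fit_full    <- lm(y ~ age + td, data = dat)   # with the covariate
fit_reduced <- lm(y ~ age, data = dat)        # without it

# Compare adjusted R^2 between the two fits.
adj_r2 <- c(full    = summary(fit_full)$adj.r.squared,
            reduced = summary(fit_reduced)$adj.r.squared)
print(adj_r2)

# F test for the nested models: does adding td improve the fit?
cmp <- anova(fit_reduced, fit_full)
print(cmp)
```

The difference in adjusted R^2 gives a single number for how much the
extra variable contributes, given the other terms in the model.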

However, I'm probably going to get told off, for going off-topic.
So, unless any further questions are specific to R programming, I
don't think I'm going to contribute further.

Also, I'd recommend you read some notes on statistical modelling, or
consult an expert, or both.
And I suspect there are additional considerations when modelling genetic data.

Re: How to represent the effect of one covariate on regression results?

David Winsemius

On 9/15/20 8:57 AM, Ana Marija wrote:

> I can see the p values are very correlated in those two instances. Can
> I conclude that my covariate then doesn't have a huge effect or what
> kind of conclusion I can draw from that?


I do not think it follows from the correlation of p-values that your
covariate "does not have a huge effect". P-values are not really data,
although they are random values. A simulation study of this would
require a much better description of the original dataset. Again, that
is something that the users of Plink are more likely to be able to
intuit than are we. I still do not see why this question is not being
addressed to the users of the software from which you are deriving your
"data".


--

David.

Re: How to represent the effect of one covariate on regression results?

anikaM
Hi David,

Thanks for the useful insight. I did of course write to the PLINK user
group, but got no answer there. I guess they are more concerned with how
to run commands in PLINK as opposed to interpreting results.

What I can tell you about my cohort is that about 80% of cases had Type 2
diabetes while about 8% had Type 1 (my TD covariate indicates the
type of diabetes). Attached is a description of the data.

Cheers,
Ana

Attachment: data.png (76K)