Remove highly correlated variables from a data frame or matrix

anikaM
Hello,

I have a data frame like this (a matrix):
> head(calc.rho)
            rs9900318 rs8069906 rs9908521 rs9908336 rs9908870 rs9895995
rs56192520      0.903     0.268     0.327     0.327     0.327     0.582
rs3764410       0.928     0.276     0.336     0.336     0.336     0.598
rs145984817     0.975     0.309     0.371     0.371     0.371     0.638
rs1807401       0.975     0.309     0.371     0.371     0.371     0.638
rs1807402       0.975     0.309     0.371     0.371     0.371     0.638
rs35350506      0.975     0.309     0.371     0.371     0.371     0.638

> dim(calc.rho)
[1] 246 246

I would like to remove from this data all highly correlated variables,
those with a correlation of more than 0.8.

I tried this:

> data<- calc.rho[,!apply(calc.rho,2,function(x) any(abs(x) > 0.80))]
> dim(data)
[1] 246   0

But this removes everything.

Can you please advise?

Thanks,
Ana
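
A likely reason the attempt drops every column, sketched under the assumption that calc.rho is a square matrix of pairwise values with 1 on the diagonal (the full head() output later in the thread is consistent with this): each column contains its own self-correlation of 1, so any(abs(x) > 0.80) is TRUE for every column. The toy matrix m below is invented for illustration and is not the poster's data.

```r
# Toy symmetric matrix standing in for calc.rho (values are invented).
m <- matrix(c(1.00, 0.95, 0.10,
              0.95, 1.00, 0.20,
              0.10, 0.20, 1.00),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# The original test flags every column, because the diagonal is 1.
flag_with_diag <- apply(m, 2, function(x) any(abs(x) > 0.80))

# Zero the diagonal of a copy, so only values against *other* variables count.
m0 <- m
diag(m0) <- 0
flag_without_diag <- apply(m0, 2, function(x) any(abs(x) > 0.80))

print(flag_with_diag)     # TRUE for a, b and c
print(flag_without_diag)  # TRUE for a and b only
```

Even with the diagonal ignored, this rule drops both members of a correlated pair (a and b here) rather than keeping one of them, which may or may not be what is wanted.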

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Remove highly correlated variables from a data frame or matrix

Bert Gunter-2
Obvious advice:

DON'T DO THIS!

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Thu, Nov 14, 2019 at 10:50 AM Ana Marija <[hidden email]>
wrote:

> I would like to remove from this data all highly correlated variables,
> with correlation more than 0.8

Re: Remove highly correlated variables from a data frame or matrix

anikaM
I don't understand. I have to keep only pairs of variables with
correlation less than 0.8 in order to proceed with some calculations.
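
One way to read this requirement is: keep a subset of variables in which no pair has a value of 0.8 or more. A minimal greedy sketch of that idea follows. It is not something proposed in the thread; the helper name prune_correlated and the toy matrix m are hypothetical, and the code assumes a square, symmetric matrix whose rows and columns are in the same order.

```r
# Hypothetical helper: walk through the variables in order and keep one
# only if its value against every variable already kept is below the cutoff.
prune_correlated <- function(r, cutoff = 0.8) {
  stopifnot(is.matrix(r), nrow(r) == ncol(r))
  keep <- integer(0)
  for (j in seq_len(ncol(r))) {
    # With nothing kept yet, all() over an empty vector is TRUE,
    # so the first variable is always retained.
    if (all(abs(r[keep, j]) < cutoff)) {
      keep <- c(keep, j)
    }
  }
  r[keep, keep, drop = FALSE]
}

# Invented example: a is strongly related to b and d.
m <- matrix(c(1.00, 0.95, 0.10, 0.85,
              0.95, 1.00, 0.20, 0.30,
              0.10, 0.20, 1.00, 0.40,
              0.85, 0.30, 0.40, 1.00),
            nrow = 4, byrow = TRUE,
            dimnames = list(letters[1:4], letters[1:4]))

pruned <- prune_correlated(m)
print(colnames(pruned))  # "a" "c"
```

The result depends on the order of the variables: here a is kept and b and d are dropped, although keeping b, c and d would also satisfy the cutoff. The caret package's findCorrelation() addresses a similar task and chooses which member of a pair to drop by mean absolute correlation instead.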

On Thu, Nov 14, 2019 at 2:09 PM Bert Gunter <[hidden email]> wrote:

>
> Obvious advice:
>
> DON'T DO THIS!

Re: Remove highly correlated variables from a data frame or matrix

Abby Spurdle
In reply to this post by anikaM
Sorry, but I don't understand your question.

When I first looked at this, I thought it was a correlation (or
covariance) matrix.
e.g.

> cor (quakes)
> cov (quakes)

However, your row and column variables are different, implying two
different data sets.
Also, some of the (correlation?) coefficients are the same, implying
that some of the variables are the same, or very close.
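
The point about row and column variables can be checked directly. A small sketch using the built-in quakes data set mentioned above; the object name r and the three check variables are only for illustration.

```r
# cor() on a single data set returns a square, symmetric matrix with
# identical row and column names and ones on the diagonal.
r <- cor(quakes)

same_names   <- identical(rownames(r), colnames(r))
is_symmetric <- isSymmetric(r)
unit_diag    <- isTRUE(all.equal(unname(diag(r)), rep(1, ncol(r))))

print(c(same_names = same_names,
        is_symmetric = is_symmetric,
        unit_diag = unit_diag))
```

Running the same three checks on calc.rho would show whether it is a full pairwise matrix. Note that head() returns only the first six rows by default, so a pasted excerpt can show row and column names that look unrelated even when the full matrix has the same names on both sides.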

Also, note that a matrix is not a data.frame.
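
A short illustration of that distinction, using a made-up 2 x 2 example (the names m, df and m2 are arbitrary): the two classes can hold the same numbers and can be converted into each other, but they are different kinds of object.

```r
# A numeric matrix and the data frame built from it.
m  <- matrix(c(1, 0.9, 0.9, 1), nrow = 2,
             dimnames = list(c("a", "b"), c("a", "b")))
df <- as.data.frame(m)

print(is.matrix(m));  print(is.data.frame(m))    # TRUE, then FALSE
print(is.matrix(df)); print(is.data.frame(df))   # FALSE, then TRUE

# Converting back gives a numeric matrix with the same values and names.
m2 <- as.matrix(df)
print(all.equal(m, m2))
```

Functions such as isSymmetric() and diag() expect a matrix, so converting a data frame with as.matrix() first is usually the safer route for this kind of work.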


Re: Remove highly correlated variables from a data frame or matrix

anikaM
It can be converted between a data frame and a matrix. I am attaching
the whole file here for examination.

I basically want to remove all entries for pairs which have a value
above 0.8 between them (the correlation was not calculated in R, but it
is a correlation, r2).
So, for example, I would not keep rs883504, because it has r2>0.8 for
all those rs...

            rs8069610 rs883504 rs8072394 rs4280293 rs4465638 rs12602378
rs56192520      0.582    0.903     0.582     0.582     0.811      0.302
rs3764410       0.598    0.928     0.598     0.598     0.836      0.311
rs145984817     0.638    0.975     0.638     0.638     0.879      0.344
rs1807401       0.638    0.975     0.638     0.638     0.879      0.344
rs1807402       0.638    0.975     0.638     0.638     0.879      0.344
rs35350506      0.638    0.975     0.638     0.638     0.879      0.344


On Thu, Nov 14, 2019 at 2:29 PM Abby Spurdle <[hidden email]> wrote:

>
> Sorry, but I don't understand your question.

ro246_matrix.txt (466K) Download Attachment

Re: Remove highly correlated variables from a data frame or matrix

Abby Spurdle
> I basically want to remove all entries for pairs which have value in
> between them (correlation calculated not in R, bit it is correlation,
> r2)
> so for example I would not keep: rs883504 because it has r2>0.8 for
> all those rs...

I'm still not sure what "remove all entries" means.
In your example, rs883504 has all correlation coefficients > 0.8 in
the data returned by head().
However, most of its correlation coefficients are < 0.8 if you
include the entire matrix.

If you remove every variable that has at least one correlation
coefficient > 0.8, you would remove all the variables.
However, if you remove only variables whose correlation coefficients
are all > 0.8, you would (probably) remove no variables.
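
The two readings contrasted above can be compared on a small made-up matrix. The values and the names m, high, drop_if_any and drop_if_all are illustrative only, and a symmetric matrix with 1 on the diagonal is assumed.

```r
m <- matrix(c(1.00, 0.95, 0.10,
              0.95, 1.00, 0.20,
              0.10, 0.20, 1.00),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

high <- abs(m) > 0.8

# "At least one coefficient > 0.8": true for every column, because the
# diagonal entry of each column is 1.
drop_if_any <- apply(high, 2, any)

# "All coefficients > 0.8": false for every column in this example.
drop_if_all <- apply(high, 2, all)

print(sum(drop_if_any))  # 3 variables would be removed
print(sum(drop_if_all))  # 0 variables would be removed
```

Neither extreme is useful here, which is why the criterion needs to be stated more precisely before it can be coded.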


Re: Remove highly correlated variables from a data frame or matrix

Abby Spurdle
That's assuming your data was returned by head().


Re: Remove highly correlated variables from a data frame or matrix

anikaM
What would be the approach to remove a variable that has at least 2
correlation coefficients > 0.8?
This is the whole output of head():

> head(calc.rho)
            rs56192520 rs3764410 rs145984817 rs1807401 rs1807402 rs35350506
rs56192520       1.000     0.976       0.927     0.927     0.927      0.927
rs3764410        0.976     1.000       0.952     0.952     0.952      0.952
rs145984817      0.927     0.952       1.000     1.000     1.000      1.000
rs1807401        0.927     0.952       1.000     1.000     1.000      1.000
rs1807402        0.927     0.952       1.000     1.000     1.000      1.000
rs35350506       0.927     0.952       1.000     1.000     1.000      1.000
            rs2089177 rs12325677 rs62064624 rs62064631 rs2349295 rs2174369
rs56192520      0.927      0.927      0.927      0.927     0.709     0.903
rs3764410       0.952      0.952      0.952      0.952     0.728     0.928
rs145984817     1.000      1.000      1.000      1.000     0.771     0.975
rs1807401       1.000      1.000      1.000      1.000     0.771     0.975
rs1807402       1.000      1.000      1.000      1.000     0.771     0.975
rs35350506      1.000      1.000      1.000      1.000     0.771     0.975
            rs7218554 rs62064634 rs4360974 rs4527060 rs6502526 rs6502527
rs56192520      0.903      0.903     0.903     0.903     0.903     0.903
rs3764410       0.928      0.928     0.928     0.928     0.928     0.928
rs145984817     0.975      0.975     0.975     0.975     0.975     0.975
rs1807401       0.975      0.975     0.975     0.975     0.975     0.975
rs1807402       0.975      0.975     0.975     0.975     0.975     0.975
rs35350506      0.975      0.975     0.975     0.975     0.975     0.975
            rs9900318 rs8069906 rs9908521 rs9908336 rs9908870 rs9895995
rs56192520      0.903     0.268     0.327     0.327     0.327     0.582
rs3764410       0.928     0.276     0.336     0.336     0.336     0.598
rs145984817     0.975     0.309     0.371     0.371     0.371     0.638
rs1807401       0.975     0.309     0.371     0.371     0.371     0.638
rs1807402       0.975     0.309     0.371     0.371     0.371     0.638
rs35350506      0.975     0.309     0.371     0.371     0.371     0.638
            rs7211086 rs9905280 rs8073305 rs8072086 rs4312350 rs4313843
rs56192520      0.880     0.268     0.327     0.880     0.880     0.880
rs3764410       0.905     0.276     0.336     0.905     0.905     0.905
rs145984817     0.951     0.309     0.371     0.951     0.951     0.951
rs1807401       0.951     0.309     0.371     0.951     0.951     0.951
rs1807402       0.951     0.309     0.371     0.951     0.951     0.951
rs35350506      0.951     0.309     0.371     0.951     0.951     0.951
            rs8069610 rs883504 rs8072394 rs4280293 rs4465638 rs12602378
rs56192520      0.582    0.903     0.582     0.582     0.811      0.302
rs3764410       0.598    0.928     0.598     0.598     0.836      0.311
rs145984817     0.638    0.975     0.638     0.638     0.879      0.344
rs1807401       0.638    0.975     0.638     0.638     0.879      0.344
rs1807402       0.638    0.975     0.638     0.638     0.879      0.344
rs35350506      0.638    0.975     0.638     0.638     0.879      0.344
            rs9899059 rs6502530 rs4380085 rs6502532 rs4792798 rs4792799
rs56192520      0.302     0.309     0.834     0.251     0.063     0.063
rs3764410       0.311     0.318     0.858     0.259     0.080     0.080
rs145984817     0.344     0.352     0.902     0.291     0.086     0.086
rs1807401       0.344     0.352     0.902     0.291     0.086     0.086
rs1807402       0.344     0.352     0.902     0.291     0.086     0.086
rs35350506      0.344     0.352     0.902     0.291     0.086     0.086
            rs4316813 rs148563931 rs74751226 rs8068857 rs8069441 rs77397878
rs56192520      0.006       0.006      0.006     0.006     0.006      0.006
rs3764410       0.006       0.006      0.006     0.006     0.006      0.006
rs145984817     0.006       0.006      0.006     0.006     0.006      0.006
rs1807401       0.006       0.006      0.006     0.006     0.006      0.006
rs1807402       0.006       0.006      0.006     0.006     0.006      0.006
rs35350506      0.006       0.006      0.006     0.006     0.006      0.006
            rs75339756 rs4608391 rs79569548 rs4275914 rs11870422 rs8075751
rs56192520       0.006     0.006      0.006     0.044      0.007     0.004
rs3764410        0.006     0.006      0.006     0.042      0.005     0.005
rs145984817      0.006     0.006      0.006     0.047      0.002     0.015
rs1807401        0.006     0.006      0.006     0.047      0.002     0.015
rs1807402        0.006     0.006      0.006     0.047      0.002     0.015
rs35350506       0.006     0.006      0.006     0.047      0.002     0.015
            rs11658904 rs138437542 rs80344434 rs7222311 rs7221842 rs7223686
rs56192520       0.003       0.004      0.004     0.033     0.009     0.000
rs3764410        0.004       0.004      0.004     0.031     0.007     0.000
rs145984817      0.010       0.004      0.004     0.035     0.011     0.005
rs1807401        0.010       0.004      0.004     0.035     0.011     0.005
rs1807402        0.010       0.004      0.004     0.035     0.011     0.005
rs35350506       0.010       0.004      0.004     0.035     0.011     0.005
            rs78013597 rs74965036 rs78063986 rs118106233 rs117345712
rs56192520       0.004      0.004      0.004       0.004       0.005
rs3764410        0.004      0.004      0.004       0.004       0.006
rs145984817      0.004      0.004      0.004       0.004       0.005
rs1807401        0.004      0.004      0.004       0.004       0.005
rs1807402        0.004      0.004      0.004       0.004       0.005
rs35350506       0.004      0.004      0.004       0.004       0.005
            rs113004656 rs9898995 rs4985718 rs9893911 rs79110942 rs7208929
rs56192520        0.004     0.033     0.033     0.023      0.004     0.023
rs3764410         0.004     0.031     0.031     0.021      0.004     0.021
rs145984817       0.004     0.035     0.035     0.025      0.004     0.025
rs1807401         0.004     0.035     0.035     0.025      0.004     0.025
rs1807402         0.004     0.035     0.035     0.025      0.004     0.025
rs35350506        0.004     0.035     0.035     0.025      0.004     0.025
            rs12601453 rs4078062 rs75129280 rs76664572 rs78961289 rs146364798
rs56192520       0.004     0.001      0.004      0.004      0.004       0.004
rs3764410        0.004     0.002      0.004      0.004      0.004       0.004
rs145984817      0.004     0.001      0.004      0.004      0.004       0.004
rs1807401        0.004     0.001      0.004      0.004      0.004       0.004
rs1807402        0.004     0.001      0.004      0.004      0.004       0.004
rs35350506       0.004     0.001      0.004      0.004      0.004       0.004
            rs76715413 rs4078534 rs79457460 rs74369938 rs76423171 rs74668400
rs56192520           0     0.004      0.004      0.002      0.004      0.004
rs3764410            0     0.004      0.004      0.001      0.004      0.004
rs145984817          0     0.004      0.004      0.005      0.004      0.004
rs1807401            0     0.004      0.004      0.005      0.004      0.004
rs1807402            0     0.004      0.004      0.005      0.004      0.004
rs35350506           0     0.004      0.004      0.005      0.004      0.004
            rs75146120 rs1135237 rs9914671 rs117759512 rs4985696 rs16961340
rs56192520       0.004     0.003     0.009       0.004     0.009      0.004
rs3764410        0.004     0.003     0.007       0.004     0.007      0.004
rs145984817      0.004     0.003     0.011       0.004     0.011      0.004
rs1807401        0.004     0.003     0.011       0.004     0.011      0.004
rs1807402        0.004     0.003     0.011       0.004     0.011      0.004
rs35350506       0.004     0.003     0.011       0.004     0.011      0.004
            rs17794159 rs4247118 rs78572469 rs12601193 rs2349646 rs2090018
rs56192520       0.001     0.033      0.002      0.004     0.020     0.033
rs3764410        0.002     0.031      0.001      0.004     0.019     0.031
rs145984817      0.001     0.035      0.005      0.004     0.022     0.035
rs1807401        0.001     0.035      0.005      0.004     0.022     0.035
rs1807402        0.001     0.035      0.005      0.004     0.022     0.035
rs35350506       0.001     0.035      0.005      0.004     0.022     0.035
            rs12601424 rs4985701 rs8064550 rs2271521 rs2271520 rs11078374
rs56192520       0.004     0.033     0.033     0.004     0.033      0.014
rs3764410        0.004     0.031     0.031     0.004     0.031      0.012
rs145984817      0.004     0.035     0.035     0.004     0.035      0.016
rs1807401        0.004     0.035     0.035     0.004     0.035      0.016
rs1807402        0.004     0.035     0.035     0.004     0.035      0.016
rs35350506       0.004     0.035     0.035     0.004     0.035      0.016
            rs4985702 rs1124961 rs11652674 rs3924340 rs112450164 rs7208973
rs56192520      0.033     0.003      0.002     0.001       0.004     0.033
rs3764410       0.031     0.003      0.001     0.002       0.004     0.031
rs145984817     0.035     0.003      0.005     0.001       0.004     0.035
rs1807401       0.035     0.003      0.005     0.001       0.004     0.035
rs1807402       0.035     0.003      0.005     0.001       0.004     0.035
rs35350506      0.035     0.003      0.005     0.001       0.004     0.035
            rs9910857 rs78574480 rs8072184 rs12602196 rs6502563 rs3744135
rs56192520      0.006      0.004     0.014      0.004     0.033     0.004
rs3764410       0.005      0.004     0.012      0.004     0.031     0.004
rs145984817     0.002      0.004     0.016      0.004     0.035     0.004
rs1807401       0.002      0.004     0.016      0.004     0.035     0.004
rs1807402       0.002      0.004     0.016      0.004     0.035     0.004
rs35350506      0.002      0.004     0.016      0.004     0.035     0.004
            rs148779543 rs77689691 rs41319048 rs117340532 rs78647096 rs77712968
rs56192520            0      0.004      0.004       0.002      0.004      0.004
rs3764410             0      0.004      0.004       0.001      0.004      0.004
rs145984817           0      0.004      0.004       0.005      0.004      0.004
rs1807401             0      0.004      0.004       0.005      0.004      0.004
rs1807402             0      0.004      0.004       0.005      0.004      0.004
rs35350506            0      0.004      0.004       0.005      0.004      0.004
            rs16961396 rs80054920 rs7206981 rs4985740 rs3803762 rs77103270
rs56192520       0.004      0.004     0.033     0.023     0.004      0.002
rs3764410        0.004      0.004     0.031     0.021     0.004      0.001
rs145984817      0.004      0.004     0.035     0.025     0.004      0.005
rs1807401        0.004      0.004     0.035     0.025     0.004      0.005
rs1807402        0.004      0.004     0.035     0.025     0.004      0.005
rs35350506       0.004      0.004     0.035     0.025     0.004      0.005
            rs7207485 rs77342773 rs3826304 rs3744126 rs7210879 rs7211576
rs56192520      0.029      0.004     0.004     0.004     0.023     0.006
rs3764410       0.027      0.004     0.004     0.004     0.021     0.005
rs145984817     0.031      0.004     0.004     0.004     0.025     0.002
rs1807401       0.031      0.004     0.004     0.004     0.025     0.002
rs1807402       0.031      0.004     0.004     0.004     0.025     0.002
rs35350506      0.031      0.004     0.004     0.004     0.025     0.002
            rs117967362 rs75978745 rs6502564 rs9894565 rs36079048 rs8076621
rs56192520        0.004      0.004     0.007     0.017          0     0.004
rs3764410         0.004      0.004     0.005     0.015          0     0.004
rs145984817       0.004      0.004     0.009     0.019          0     0.004
rs1807401         0.004      0.004     0.009     0.019          0     0.004
rs1807402         0.004      0.004     0.009     0.019          0     0.004
rs35350506        0.004      0.004     0.009     0.019          0     0.004
            rs7218795 rs3803761 rs12602675 rs7208065 rs4985705 rs8080386
rs56192520      0.026     0.032          0     0.018     0.014     0.003
rs3764410       0.024     0.029          0     0.015     0.011     0.002
rs145984817     0.028     0.034          0     0.021     0.016     0.004
rs1807401       0.028     0.034          0     0.021     0.016     0.004
rs1807402       0.028     0.034          0     0.021     0.016     0.004
rs35350506      0.028     0.034          0     0.021     0.016     0.004
            rs8065832 rs2018781 rs1736221 rs1736220 rs1736217 rs1708620
rs56192520      0.008     0.039     0.003     0.003     0.021     0.009
rs3764410       0.006     0.037     0.002     0.002     0.019     0.007
rs145984817     0.010     0.042     0.004     0.004     0.024     0.011
rs1807401       0.010     0.042     0.004     0.004     0.024     0.011
rs1807402       0.010     0.042     0.004     0.004     0.024     0.011
rs35350506      0.010     0.042     0.004     0.004     0.024     0.011
            rs1708619 rs1736216 rs76319098 rs1736215 rs1736214 rs1708617
rs56192520      0.009     0.024      0.017     0.012     0.019     0.029
rs3764410       0.007     0.021      0.016     0.009     0.016     0.026
rs145984817     0.011     0.026      0.018     0.014     0.022     0.031
rs1807401       0.011     0.026      0.018     0.014     0.022     0.031
rs1807402       0.011     0.026      0.018     0.014     0.022     0.031
rs35350506      0.011     0.026      0.018     0.014     0.022     0.031
            rs12602831 rs12602871 rs1736213 rs1736212 rs76045368 rs34518797
rs56192520       0.000      0.000     0.015     0.029      0.001      0.001
rs3764410        0.001      0.001     0.013     0.026      0.001      0.001
rs145984817      0.000      0.000     0.018     0.031      0.000      0.000
rs1807401        0.000      0.000     0.018     0.031      0.000      0.000
rs1807402        0.000      0.000     0.018     0.031      0.000      0.000
rs35350506       0.000      0.000     0.018     0.031      0.000      0.000
            rs11078378 rs8079562 rs8065774 rs8066090 rs41337846 rs1736209
rs56192520       0.043     0.001     0.001     0.029      0.000     0.029
rs3764410        0.041     0.001     0.001     0.026      0.001     0.026
rs145984817      0.046     0.000     0.000     0.031      0.000     0.031
rs1807401        0.046     0.000     0.000     0.031      0.000     0.031
rs1807402        0.046     0.000     0.000     0.031      0.000     0.031
rs35350506       0.046     0.000     0.000     0.031      0.000     0.031
            rs1736208 rs12949822 rs76246042 rs12600635 rs55689224 rs1736207
rs56192520      0.043      0.043      0.000      0.000      0.000     0.015
rs3764410       0.041      0.041      0.001      0.001      0.001     0.013
rs145984817     0.046      0.046      0.000      0.000      0.000     0.018
rs1807401       0.046      0.046      0.000      0.000      0.000     0.018
rs1807402       0.046      0.046      0.000      0.000      0.000     0.018
rs35350506      0.046      0.046      0.000      0.000      0.000     0.018
            rs1708626 rs1736206 rs9896078 rs16961474 rs1708627 rs1736205
rs56192520      0.015     0.015     0.001      0.001     0.017     0.021
rs3764410       0.013     0.013     0.001      0.001     0.014     0.019
rs145984817     0.018     0.018     0.000      0.001     0.020     0.024
rs1807401       0.018     0.018     0.000      0.001     0.020     0.024
rs1807402       0.018     0.018     0.000      0.001     0.020     0.024
rs35350506      0.018     0.018     0.000      0.001     0.020     0.024
            rs1708628 rs7220577 rs2294155 rs1736204 rs1736203 rs1736202
rs56192520      0.011     0.011     0.000     0.021     0.014     0.014
rs3764410       0.009     0.009     0.000     0.019     0.011     0.011
rs145984817     0.013     0.013     0.001     0.024     0.016     0.016
rs1807401       0.013     0.013     0.001     0.024     0.016     0.016
rs1807402       0.013     0.013     0.001     0.024     0.016     0.016
rs35350506      0.013     0.013     0.001     0.024     0.016     0.016
            rs12937908 rs1736200 rs1708623 rs1708624 rs9894884 rs9901894
rs56192520       0.009     0.009     0.008     0.007     0.009     0.009
rs3764410        0.007     0.007     0.006     0.005     0.007     0.007
rs145984817      0.011     0.011     0.010     0.008     0.011     0.011
rs1807401        0.011     0.011     0.010     0.008     0.011     0.011
rs1807402        0.011     0.011     0.010     0.008     0.011     0.011
rs35350506       0.011     0.011     0.010     0.008     0.011     0.011
            rs9903294 rs2472689 rs1630656 rs111478970 rs3182911 rs7219012
rs56192520      0.008     0.011     0.007       0.007     0.008     0.000
rs3764410       0.006     0.009     0.005       0.005     0.006     0.000
rs145984817     0.010     0.013     0.008       0.008     0.010     0.001
rs1807401       0.010     0.013     0.008       0.008     0.010     0.001
rs1807402       0.010     0.013     0.008       0.008     0.010     0.001
rs35350506      0.010     0.013     0.008       0.008     0.010     0.001
            rs9890657 rs12453455 rs12947291 rs150267386 rs16961493 rs11652745
rs56192520      0.009      0.007      0.008       0.013      0.000      0.009
rs3764410       0.007      0.005      0.006       0.012      0.000      0.007
rs145984817     0.011      0.008      0.010       0.014      0.001      0.011
rs1807401       0.011      0.008      0.010       0.014      0.001      0.011
rs1807402       0.011      0.008      0.010       0.014      0.001      0.011
rs35350506      0.011      0.008      0.010       0.014      0.001      0.011
            rs9907107 rs8070574 rs4985759 rs3866959 rs7219248 rs6502568
rs56192520      0.009     0.009     0.009     0.011     0.009     0.011
rs3764410       0.007     0.007     0.007     0.009     0.007     0.009
rs145984817     0.011     0.011     0.011     0.013     0.011     0.013
rs1807401       0.011     0.011     0.011     0.013     0.011     0.013
rs1807402       0.011     0.011     0.011     0.013     0.011     0.013
rs35350506      0.011     0.011     0.011     0.013     0.011     0.013
            rs7220275 rs12450037 rs7225876 rs9892352 rs4985760 rs6502569
rs56192520      0.009      0.008     0.007     0.011     0.011     0.011
rs3764410       0.007      0.006     0.005     0.009     0.009     0.009
rs145984817     0.011      0.010     0.008     0.013     0.013     0.013
rs1807401       0.011      0.010     0.008     0.013     0.013     0.013
rs1807402       0.011      0.010     0.008     0.013     0.013     0.013
rs35350506      0.011      0.010     0.008     0.013     0.013     0.013
            rs1029830 rs2012954 rs1029832 rs2270180 rs8072402 rs7221553
rs56192520      0.009     0.011     0.008     0.000     0.009     0.011
rs3764410       0.007     0.009     0.006     0.000     0.007     0.009
rs145984817     0.011     0.013     0.010     0.001     0.011     0.013
rs1807401       0.011     0.013     0.010     0.001     0.011     0.013
rs1807402       0.011     0.013     0.010     0.001     0.011     0.013
rs35350506      0.011     0.013     0.010     0.001     0.011     0.013
            rs145597919 rs150772017 rs2041393 rs6502578 rs11078382 rs9912109
rs56192520        0.013       0.013     0.005     0.005          0     0.005
rs3764410         0.012       0.012     0.004     0.004          0     0.004
rs145984817       0.014       0.014     0.006     0.006          0     0.006
rs1807401         0.014       0.014     0.006     0.006          0     0.006
rs1807402         0.014       0.014     0.006     0.006          0     0.006
rs35350506        0.014       0.014     0.006     0.006          0     0.006
            rs12601631 rs11869054 rs11869079 rs9912599 rs7220057 rs9896970
rs56192520           0          0          0         0         0         0
rs3764410            0          0          0         0         0         0
rs145984817          0          0          0         0         0         0
rs1807401            0          0          0         0         0         0
rs1807402            0          0          0         0         0         0
rs35350506           0          0          0         0         0         0
            rs34121330 rs34668117 rs67773570 rs242252 rs955893 rs28583584
rs56192520       0.000      0.000          0    0.002    0.001      0.013
rs3764410        0.001      0.001          0    0.003    0.001      0.005
rs145984817      0.000      0.000          0    0.002    0.001      0.004
rs1807401        0.000      0.000          0    0.002    0.001      0.004
rs1807402        0.000      0.000          0    0.002    0.001      0.004
rs35350506       0.000      0.000          0    0.002    0.001      0.004
            rs9944423 rs7217764 rs11651957 rs73978990 rs8071007 rs56044345
rs56192520      0.013     0.011      0.011      0.011     0.011      0.011
rs3764410       0.005     0.004      0.004      0.004     0.004      0.004
rs145984817     0.004     0.003      0.003      0.003     0.003      0.003
rs1807401       0.004     0.003      0.003      0.003     0.003      0.003
rs1807402       0.004     0.003      0.003      0.003     0.003      0.003
rs35350506      0.004     0.003      0.003      0.003     0.003      0.003
            rs17804843
rs56192520           0
rs3764410            0
rs145984817          0
rs1807401            0
rs1807402            0
rs35350506           0


On Thu, Nov 14, 2019 at 2:59 PM Abby Spurdle <[hidden email]> wrote:
>
> That's assuming your data was returned by head().


Re: Remove highly correlated variables from a data frame or matrix

Jim Lemon-4
In reply to this post by anikaM
Hi Ana,
Rather than addressing the question of why you want to do this, let's
make the question easier to answer:

calc.rho<-matrix(c(0.903,0.268,0.327,0.327,0.327,0.582,
0.928,0.276,0.336,0.336,0.336,0.598,
0.975,0.309,0.371,0.371,0.371,0.638,
0.975,0.309,0.371,0.371,0.371,0.638,
0.975,0.309,0.371,0.371,0.371,0.638,
0.975,0.309,0.371,0.371,0.371,0.638),ncol=6,byrow=TRUE)
rnames<-c("rs56192520","rs3764410","rs145984817","rs1807401",
"rs1807402","rs35350506")
rownames(calc.rho)<-rnames
cnames<-c("rs9900318","rs8069906","rs9908521","rs9908336",
"rs9908870","rs9895995")
colnames(calc.rho)<-cnames

Now if you just want a vector of the values less than 0.8, it's trivial:

calc.rho[calc.rho<0.8]

However, based on your previous questions, I suspect you want
something else. Maybe the pairs of row/column names that correspond to
the values less than 0.8. To ensure that you haven't tricked us by not
including columns in which values range around 0.8, I'll do it this
way:

# make the new variable name possible to decode
calc.lt.8<-calc.rho<0.8
varnames.lt.8<-data.frame(var1=NA,var2=NA)
for(row in 1:nrow(calc.rho)) {
 for(col in 1:ncol(calc.rho))
  if(calc.lt.8[row,col])
   varnames.lt.8<-rbind(varnames.lt.8,c(rnames[row],cnames[col]))
}
# now get rid of the first row of NA values
varnames.lt.8<-varnames.lt.8[-1,]

Clunky, but effective. You now have those variable pairs that you may
want. Let us know in the next episode of this soap operation.
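
The double loop above can also be written without explicit iteration. What follows is only a sketch, assuming the same 6 x 6 example matrix; the name pairs.lt.8 is made up for illustration. Note that which() walks the matrix column by column, so the pairs come out in a different order than from the loop.

```r
# Rebuild the 6 x 6 example matrix; rows 3 to 6 are identical in the post
rows <- rbind(c(0.903, 0.268, 0.327, 0.327, 0.327, 0.582),
              c(0.928, 0.276, 0.336, 0.336, 0.336, 0.598),
              c(0.975, 0.309, 0.371, 0.371, 0.371, 0.638))
calc.rho <- rows[c(1, 2, 3, 3, 3, 3), ]
rownames(calc.rho) <- c("rs56192520", "rs3764410", "rs145984817",
                        "rs1807401", "rs1807402", "rs35350506")
colnames(calc.rho) <- c("rs9900318", "rs8069906", "rs9908521",
                        "rs9908336", "rs9908870", "rs9895995")

# Row/column positions of every value below 0.8
idx <- which(calc.rho < 0.8, arr.ind = TRUE)

# Translate the positions into pairs of variable names
# (pairs.lt.8 is a hypothetical name, not from the thread)
pairs.lt.8 <- data.frame(var1 = rownames(calc.rho)[idx[, "row"]],
                         var2 = colnames(calc.rho)[idx[, "col"]],
                         row.names = NULL,
                         stringsAsFactors = FALSE)
head(pairs.lt.8)
```

This avoids growing a data frame with rbind() inside a loop, which matters more for the full 246 x 246 matrix than for this small example.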

Jim



Re: Remove highly correlated variables from a data frame or matrix

Jim Lemon-4
In reply to this post by anikaM
I thought you were going to trick us. What I think you are asking now
is how to get the variable names in the columns that have at most one
_absolute_ value greater than 0.8. OK:

# I'm not going to try to recreate your correlation matrix
calc.jim<-matrix(runif(100,min=-1,max=1),nrow=10)
for(i in 1:10) calc.jim[i,i]<-1
rownames(calc.jim)<-colnames(calc.jim)<-paste0("rs",1:10)

Now that we have a plausible fake correlation matrix, all we have to
do is extract the column names:

colnames(calc.jim)[colSums(abs(calc.jim)>0.8)<2]

Of course, what you really meant could have been, "I want the column
names of the variables with at most one absolute value greater than
0.8 ignoring the diagonal values because I don't care about those". If
so:

colnames(calc.jim)[colSums(abs(calc.jim)>0.8)<3]
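
The diagonal is also why the original apply() attempt dropped every column: each variable correlates perfectly with itself, so any(abs(x) > 0.8) is TRUE everywhere. A rough sketch that makes the diagonal handling explicit, assuming a simulated symmetric matrix (calc.sim, high, n.high and keep are names invented here, and the seed is arbitrary):

```r
set.seed(1)  # assumption: fixed seed only so the sketch is repeatable

# Simulated stand-in for a correlation matrix: symmetric, unit diagonal
calc.sim <- matrix(runif(100, min = -1, max = 1), nrow = 10)
calc.sim[lower.tri(calc.sim)] <- t(calc.sim)[lower.tri(calc.sim)]
diag(calc.sim) <- 1
rownames(calc.sim) <- colnames(calc.sim) <- paste0("rs", 1:10)

# Every column contains its own diagonal 1, so this flags all columns
flagged.all <- all(apply(calc.sim, 2, function(x) any(abs(x) > 0.8)))

# Mask the diagonal, then count high correlations with *other* variables
high <- abs(calc.sim) > 0.8
diag(high) <- FALSE
n.high <- colSums(high)

# Keep variables with at most one other variable correlated above 0.8
keep <- colnames(calc.sim)[n.high <= 1]
```

With a unit diagonal, n.high <= 1 selects the same columns as the "< 3" version above; masking the diagonal just states the intent directly instead of folding it into the cutoff.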

Any more tricks?

Jim



Re: Remove highly correlated variables from a data frame or matrix

anikaM
Hi Jim,

This:
colnames(calc.jim)[colSums(abs(calc.jim)>0.8)<3]

was the master take!

Thank you so much!!!

On Thu, Nov 14, 2019 at 3:39 PM Jim Lemon <[hidden email]> wrote:

>
> I thought you were going to trick us. What I think you are asking now
> is how to get the variable names in the columns that have at most one
> _absolute_ value greater than 0.8. OK:
>
> # I'm not going to try to recreate your correlation matrix
> calc.jim<-matrix(runif(100,min=-1,max=1),nrow=10)
> for(i in 1:10) calc.jim[i,i]<-1
> rownames(calc.jim)<-colnames(calc.jim)<-paste0("rs",1:10)
>
> Now that we have a plausible fake correlation matrix, all we have to
> do is extract the column names:
>
> colnames(calc.jim)[colSums(abs(calc.jim)>0.8)<2]
>
> Of course, what you really meant could have been, "I want the column
> names of the variables with at most one absolute value greater than
> 0.8 ignoring the diagonal values because I don't care about those". If
> so:
>
> colnames(calc.jim)[colSums(abs(calc.jim)>0.8)<3]
>
> Any more tricks?
>
> Jim
>
> On Fri, Nov 15, 2019 at 8:17 AM Ana Marija <[hidden email]> wrote:
> >
> > what would be the approach to remove variable that has at least 2
> > correlation coefficients >0.8?
> > this is the whole output of the head()
> >
> > > head(calc.rho)
> >             rs56192520 rs3764410 rs145984817 rs1807401 rs1807402 rs35350506
> > rs56192520       1.000     0.976       0.927     0.927     0.927      0.927
> > rs3764410        0.976     1.000       0.952     0.952     0.952      0.952
> > rs145984817      0.927     0.952       1.000     1.000     1.000      1.000
> > rs1807401        0.927     0.952       1.000     1.000     1.000      1.000
> > rs1807402        0.927     0.952       1.000     1.000     1.000      1.000
> > rs35350506       0.927     0.952       1.000     1.000     1.000      1.000
> >             rs2089177 rs12325677 rs62064624 rs62064631 rs2349295 rs2174369
> > rs56192520      0.927      0.927      0.927      0.927     0.709     0.903
> > rs3764410       0.952      0.952      0.952      0.952     0.728     0.928
> > rs145984817     1.000      1.000      1.000      1.000     0.771     0.975
> > rs1807401       1.000      1.000      1.000      1.000     0.771     0.975
> > rs1807402       1.000      1.000      1.000      1.000     0.771     0.975
> > rs35350506      1.000      1.000      1.000      1.000     0.771     0.975
> >             rs7218554 rs62064634 rs4360974 rs4527060 rs6502526 rs6502527
> > rs56192520      0.903      0.903     0.903     0.903     0.903     0.903
> > rs3764410       0.928      0.928     0.928     0.928     0.928     0.928
> > rs145984817     0.975      0.975     0.975     0.975     0.975     0.975
> > rs1807401       0.975      0.975     0.975     0.975     0.975     0.975
> > rs1807402       0.975      0.975     0.975     0.975     0.975     0.975
> > rs35350506      0.975      0.975     0.975     0.975     0.975     0.975
> >             rs9900318 rs8069906 rs9908521 rs9908336 rs9908870 rs9895995
> > rs56192520      0.903     0.268     0.327     0.327     0.327     0.582
> > rs3764410       0.928     0.276     0.336     0.336     0.336     0.598
> > rs145984817     0.975     0.309     0.371     0.371     0.371     0.638
> > rs1807401       0.975     0.309     0.371     0.371     0.371     0.638
> > rs1807402       0.975     0.309     0.371     0.371     0.371     0.638
> > rs35350506      0.975     0.309     0.371     0.371     0.371     0.638
> >             rs7211086 rs9905280 rs8073305 rs8072086 rs4312350 rs4313843
> > rs56192520      0.880     0.268     0.327     0.880     0.880     0.880
> > rs3764410       0.905     0.276     0.336     0.905     0.905     0.905
> > rs145984817     0.951     0.309     0.371     0.951     0.951     0.951
> > rs1807401       0.951     0.309     0.371     0.951     0.951     0.951
> > rs1807402       0.951     0.309     0.371     0.951     0.951     0.951
> > rs35350506      0.951     0.309     0.371     0.951     0.951     0.951
> >             rs8069610 rs883504 rs8072394 rs4280293 rs4465638 rs12602378
> > rs56192520      0.582    0.903     0.582     0.582     0.811      0.302
> > rs3764410       0.598    0.928     0.598     0.598     0.836      0.311
> > rs145984817     0.638    0.975     0.638     0.638     0.879      0.344
> > rs1807401       0.638    0.975     0.638     0.638     0.879      0.344
> > rs1807402       0.638    0.975     0.638     0.638     0.879      0.344
> > rs35350506      0.638    0.975     0.638     0.638     0.879      0.344
> >             rs9899059 rs6502530 rs4380085 rs6502532 rs4792798 rs4792799
> > rs56192520      0.302     0.309     0.834     0.251     0.063     0.063
> > rs3764410       0.311     0.318     0.858     0.259     0.080     0.080
> > rs145984817     0.344     0.352     0.902     0.291     0.086     0.086
> > rs1807401       0.344     0.352     0.902     0.291     0.086     0.086
> > rs1807402       0.344     0.352     0.902     0.291     0.086     0.086
> > rs35350506      0.344     0.352     0.902     0.291     0.086     0.086
> >             rs4316813 rs148563931 rs74751226 rs8068857 rs8069441 rs77397878
> > rs56192520      0.006       0.006      0.006     0.006     0.006      0.006
> > rs3764410       0.006       0.006      0.006     0.006     0.006      0.006
> > rs145984817     0.006       0.006      0.006     0.006     0.006      0.006
> > rs1807401       0.006       0.006      0.006     0.006     0.006      0.006
> > rs1807402       0.006       0.006      0.006     0.006     0.006      0.006
> > rs35350506      0.006       0.006      0.006     0.006     0.006      0.006
> >             rs75339756 rs4608391 rs79569548 rs4275914 rs11870422 rs8075751
> > rs56192520       0.006     0.006      0.006     0.044      0.007     0.004
> > rs3764410        0.006     0.006      0.006     0.042      0.005     0.005
> > rs145984817      0.006     0.006      0.006     0.047      0.002     0.015
> > rs1807401        0.006     0.006      0.006     0.047      0.002     0.015
> > rs1807402        0.006     0.006      0.006     0.047      0.002     0.015
> > rs35350506       0.006     0.006      0.006     0.047      0.002     0.015
> >             rs11658904 rs138437542 rs80344434 rs7222311 rs7221842 rs7223686
> > rs56192520       0.003       0.004      0.004     0.033     0.009     0.000
> > rs3764410        0.004       0.004      0.004     0.031     0.007     0.000
> > rs145984817      0.010       0.004      0.004     0.035     0.011     0.005
> > rs1807401        0.010       0.004      0.004     0.035     0.011     0.005
> > rs1807402        0.010       0.004      0.004     0.035     0.011     0.005
> > rs35350506       0.010       0.004      0.004     0.035     0.011     0.005
> >             rs78013597 rs74965036 rs78063986 rs118106233 rs117345712
> > rs56192520       0.004      0.004      0.004       0.004       0.005
> > rs3764410        0.004      0.004      0.004       0.004       0.006
> > rs145984817      0.004      0.004      0.004       0.004       0.005
> > rs1807401        0.004      0.004      0.004       0.004       0.005
> > rs1807402        0.004      0.004      0.004       0.004       0.005
> > rs35350506       0.004      0.004      0.004       0.004       0.005
> >             rs113004656 rs9898995 rs4985718 rs9893911 rs79110942 rs7208929
> > rs56192520        0.004     0.033     0.033     0.023      0.004     0.023
> > rs3764410         0.004     0.031     0.031     0.021      0.004     0.021
> > rs145984817       0.004     0.035     0.035     0.025      0.004     0.025
> > rs1807401         0.004     0.035     0.035     0.025      0.004     0.025
> > rs1807402         0.004     0.035     0.035     0.025      0.004     0.025
> > rs35350506        0.004     0.035     0.035     0.025      0.004     0.025
> >             rs12601453 rs4078062 rs75129280 rs76664572 rs78961289 rs146364798
> > rs56192520       0.004     0.001      0.004      0.004      0.004       0.004
> > rs3764410        0.004     0.002      0.004      0.004      0.004       0.004
> > rs145984817      0.004     0.001      0.004      0.004      0.004       0.004
> > rs1807401        0.004     0.001      0.004      0.004      0.004       0.004
> > rs1807402        0.004     0.001      0.004      0.004      0.004       0.004
> > rs35350506       0.004     0.001      0.004      0.004      0.004       0.004
> >             rs76715413 rs4078534 rs79457460 rs74369938 rs76423171 rs74668400
> > rs56192520           0     0.004      0.004      0.002      0.004      0.004
> > rs3764410            0     0.004      0.004      0.001      0.004      0.004
> > rs145984817          0     0.004      0.004      0.005      0.004      0.004
> > rs1807401            0     0.004      0.004      0.005      0.004      0.004
> > rs1807402            0     0.004      0.004      0.005      0.004      0.004
> > rs35350506           0     0.004      0.004      0.005      0.004      0.004
> >             rs75146120 rs1135237 rs9914671 rs117759512 rs4985696 rs16961340
> > rs56192520       0.004     0.003     0.009       0.004     0.009      0.004
> > rs3764410        0.004     0.003     0.007       0.004     0.007      0.004
> > rs145984817      0.004     0.003     0.011       0.004     0.011      0.004
> > rs1807401        0.004     0.003     0.011       0.004     0.011      0.004
> > rs1807402        0.004     0.003     0.011       0.004     0.011      0.004
> > rs35350506       0.004     0.003     0.011       0.004     0.011      0.004
> >             rs17794159 rs4247118 rs78572469 rs12601193 rs2349646 rs2090018
> > rs56192520       0.001     0.033      0.002      0.004     0.020     0.033
> > rs3764410        0.002     0.031      0.001      0.004     0.019     0.031
> > rs145984817      0.001     0.035      0.005      0.004     0.022     0.035
> > rs1807401        0.001     0.035      0.005      0.004     0.022     0.035
> > rs1807402        0.001     0.035      0.005      0.004     0.022     0.035
> > rs35350506       0.001     0.035      0.005      0.004     0.022     0.035
> >             rs12601424 rs4985701 rs8064550 rs2271521 rs2271520 rs11078374
> > rs56192520       0.004     0.033     0.033     0.004     0.033      0.014
> > rs3764410        0.004     0.031     0.031     0.004     0.031      0.012
> > rs145984817      0.004     0.035     0.035     0.004     0.035      0.016
> > rs1807401        0.004     0.035     0.035     0.004     0.035      0.016
> > rs1807402        0.004     0.035     0.035     0.004     0.035      0.016
> > rs35350506       0.004     0.035     0.035     0.004     0.035      0.016
> >             rs4985702 rs1124961 rs11652674 rs3924340 rs112450164 rs7208973
> > rs56192520      0.033     0.003      0.002     0.001       0.004     0.033
> > rs3764410       0.031     0.003      0.001     0.002       0.004     0.031
> > rs145984817     0.035     0.003      0.005     0.001       0.004     0.035
> > rs1807401       0.035     0.003      0.005     0.001       0.004     0.035
> > rs1807402       0.035     0.003      0.005     0.001       0.004     0.035
> > rs35350506      0.035     0.003      0.005     0.001       0.004     0.035
> >             rs9910857 rs78574480 rs8072184 rs12602196 rs6502563 rs3744135
> > rs56192520      0.006      0.004     0.014      0.004     0.033     0.004
> > rs3764410       0.005      0.004     0.012      0.004     0.031     0.004
> > rs145984817     0.002      0.004     0.016      0.004     0.035     0.004
> > rs1807401       0.002      0.004     0.016      0.004     0.035     0.004
> > rs1807402       0.002      0.004     0.016      0.004     0.035     0.004
> > rs35350506      0.002      0.004     0.016      0.004     0.035     0.004
> >             rs148779543 rs77689691 rs41319048 rs117340532 rs78647096 rs77712968
> > rs56192520            0      0.004      0.004       0.002      0.004      0.004
> > rs3764410             0      0.004      0.004       0.001      0.004      0.004
> > rs145984817           0      0.004      0.004       0.005      0.004      0.004
> > rs1807401             0      0.004      0.004       0.005      0.004      0.004
> > rs1807402             0      0.004      0.004       0.005      0.004      0.004
> > rs35350506            0      0.004      0.004       0.005      0.004      0.004
> >             rs16961396 rs80054920 rs7206981 rs4985740 rs3803762 rs77103270
> > rs56192520       0.004      0.004     0.033     0.023     0.004      0.002
> > rs3764410        0.004      0.004     0.031     0.021     0.004      0.001
> > rs145984817      0.004      0.004     0.035     0.025     0.004      0.005
> > rs1807401        0.004      0.004     0.035     0.025     0.004      0.005
> > rs1807402        0.004      0.004     0.035     0.025     0.004      0.005
> > rs35350506       0.004      0.004     0.035     0.025     0.004      0.005
> >             rs7207485 rs77342773 rs3826304 rs3744126 rs7210879 rs7211576
> > rs56192520      0.029      0.004     0.004     0.004     0.023     0.006
> > rs3764410       0.027      0.004     0.004     0.004     0.021     0.005
> > rs145984817     0.031      0.004     0.004     0.004     0.025     0.002
> > rs1807401       0.031      0.004     0.004     0.004     0.025     0.002
> > rs1807402       0.031      0.004     0.004     0.004     0.025     0.002
> > rs35350506      0.031      0.004     0.004     0.004     0.025     0.002
> >             rs117967362 rs75978745 rs6502564 rs9894565 rs36079048 rs8076621
> > rs56192520        0.004      0.004     0.007     0.017          0     0.004
> > rs3764410         0.004      0.004     0.005     0.015          0     0.004
> > rs145984817       0.004      0.004     0.009     0.019          0     0.004
> > rs1807401         0.004      0.004     0.009     0.019          0     0.004
> > rs1807402         0.004      0.004     0.009     0.019          0     0.004
> > rs35350506        0.004      0.004     0.009     0.019          0     0.004
> >             rs7218795 rs3803761 rs12602675 rs7208065 rs4985705 rs8080386
> > rs56192520      0.026     0.032          0     0.018     0.014     0.003
> > rs3764410       0.024     0.029          0     0.015     0.011     0.002
> > rs145984817     0.028     0.034          0     0.021     0.016     0.004
> > rs1807401       0.028     0.034          0     0.021     0.016     0.004
> > rs1807402       0.028     0.034          0     0.021     0.016     0.004
> > rs35350506      0.028     0.034          0     0.021     0.016     0.004
> >             rs8065832 rs2018781 rs1736221 rs1736220 rs1736217 rs1708620
> > rs56192520      0.008     0.039     0.003     0.003     0.021     0.009
> > rs3764410       0.006     0.037     0.002     0.002     0.019     0.007
> > rs145984817     0.010     0.042     0.004     0.004     0.024     0.011
> > rs1807401       0.010     0.042     0.004     0.004     0.024     0.011
> > rs1807402       0.010     0.042     0.004     0.004     0.024     0.011
> > rs35350506      0.010     0.042     0.004     0.004     0.024     0.011
> >             rs1708619 rs1736216 rs76319098 rs1736215 rs1736214 rs1708617
> > rs56192520      0.009     0.024      0.017     0.012     0.019     0.029
> > rs3764410       0.007     0.021      0.016     0.009     0.016     0.026
> > rs145984817     0.011     0.026      0.018     0.014     0.022     0.031
> > rs1807401       0.011     0.026      0.018     0.014     0.022     0.031
> > rs1807402       0.011     0.026      0.018     0.014     0.022     0.031
> > rs35350506      0.011     0.026      0.018     0.014     0.022     0.031
> >             rs12602831 rs12602871 rs1736213 rs1736212 rs76045368 rs34518797
> > rs56192520       0.000      0.000     0.015     0.029      0.001      0.001
> > rs3764410        0.001      0.001     0.013     0.026      0.001      0.001
> > rs145984817      0.000      0.000     0.018     0.031      0.000      0.000
> > rs1807401        0.000      0.000     0.018     0.031      0.000      0.000
> > rs1807402        0.000      0.000     0.018     0.031      0.000      0.000
> > rs35350506       0.000      0.000     0.018     0.031      0.000      0.000
> >             rs11078378 rs8079562 rs8065774 rs8066090 rs41337846 rs1736209
> > rs56192520       0.043     0.001     0.001     0.029      0.000     0.029
> > rs3764410        0.041     0.001     0.001     0.026      0.001     0.026
> > rs145984817      0.046     0.000     0.000     0.031      0.000     0.031
> > rs1807401        0.046     0.000     0.000     0.031      0.000     0.031
> > rs1807402        0.046     0.000     0.000     0.031      0.000     0.031
> > rs35350506       0.046     0.000     0.000     0.031      0.000     0.031
> >             rs1736208 rs12949822 rs76246042 rs12600635 rs55689224 rs1736207
> > rs56192520      0.043      0.043      0.000      0.000      0.000     0.015
> > rs3764410       0.041      0.041      0.001      0.001      0.001     0.013
> > rs145984817     0.046      0.046      0.000      0.000      0.000     0.018
> > rs1807401       0.046      0.046      0.000      0.000      0.000     0.018
> > rs1807402       0.046      0.046      0.000      0.000      0.000     0.018
> > rs35350506      0.046      0.046      0.000      0.000      0.000     0.018
> >             rs1708626 rs1736206 rs9896078 rs16961474 rs1708627 rs1736205
> > rs56192520      0.015     0.015     0.001      0.001     0.017     0.021
> > rs3764410       0.013     0.013     0.001      0.001     0.014     0.019
> > rs145984817     0.018     0.018     0.000      0.001     0.020     0.024
> > rs1807401       0.018     0.018     0.000      0.001     0.020     0.024
> > rs1807402       0.018     0.018     0.000      0.001     0.020     0.024
> > rs35350506      0.018     0.018     0.000      0.001     0.020     0.024
> >             rs1708628 rs7220577 rs2294155 rs1736204 rs1736203 rs1736202
> > rs56192520      0.011     0.011     0.000     0.021     0.014     0.014
> > rs3764410       0.009     0.009     0.000     0.019     0.011     0.011
> > rs145984817     0.013     0.013     0.001     0.024     0.016     0.016
> > rs1807401       0.013     0.013     0.001     0.024     0.016     0.016
> > rs1807402       0.013     0.013     0.001     0.024     0.016     0.016
> > rs35350506      0.013     0.013     0.001     0.024     0.016     0.016
> >             rs12937908 rs1736200 rs1708623 rs1708624 rs9894884 rs9901894
> > rs56192520       0.009     0.009     0.008     0.007     0.009     0.009
> > rs3764410        0.007     0.007     0.006     0.005     0.007     0.007
> > rs145984817      0.011     0.011     0.010     0.008     0.011     0.011
> > rs1807401        0.011     0.011     0.010     0.008     0.011     0.011
> > rs1807402        0.011     0.011     0.010     0.008     0.011     0.011
> > rs35350506       0.011     0.011     0.010     0.008     0.011     0.011
> >             rs9903294 rs2472689 rs1630656 rs111478970 rs3182911 rs7219012
> > rs56192520      0.008     0.011     0.007       0.007     0.008     0.000
> > rs3764410       0.006     0.009     0.005       0.005     0.006     0.000
> > rs145984817     0.010     0.013     0.008       0.008     0.010     0.001
> > rs1807401       0.010     0.013     0.008       0.008     0.010     0.001
> > rs1807402       0.010     0.013     0.008       0.008     0.010     0.001
> > rs35350506      0.010     0.013     0.008       0.008     0.010     0.001
> >             rs9890657 rs12453455 rs12947291 rs150267386 rs16961493 rs11652745
> > rs56192520      0.009      0.007      0.008       0.013      0.000      0.009
> > rs3764410       0.007      0.005      0.006       0.012      0.000      0.007
> > rs145984817     0.011      0.008      0.010       0.014      0.001      0.011
> > rs1807401       0.011      0.008      0.010       0.014      0.001      0.011
> > rs1807402       0.011      0.008      0.010       0.014      0.001      0.011
> > rs35350506      0.011      0.008      0.010       0.014      0.001      0.011
> >             rs9907107 rs8070574 rs4985759 rs3866959 rs7219248 rs6502568
> > rs56192520      0.009     0.009     0.009     0.011     0.009     0.011
> > rs3764410       0.007     0.007     0.007     0.009     0.007     0.009
> > rs145984817     0.011     0.011     0.011     0.013     0.011     0.013
> > rs1807401       0.011     0.011     0.011     0.013     0.011     0.013
> > rs1807402       0.011     0.011     0.011     0.013     0.011     0.013
> > rs35350506      0.011     0.011     0.011     0.013     0.011     0.013
> >             rs7220275 rs12450037 rs7225876 rs9892352 rs4985760 rs6502569
> > rs56192520      0.009      0.008     0.007     0.011     0.011     0.011
> > rs3764410       0.007      0.006     0.005     0.009     0.009     0.009
> > rs145984817     0.011      0.010     0.008     0.013     0.013     0.013
> > rs1807401       0.011      0.010     0.008     0.013     0.013     0.013
> > rs1807402       0.011      0.010     0.008     0.013     0.013     0.013
> > rs35350506      0.011      0.010     0.008     0.013     0.013     0.013
> >             rs1029830 rs2012954 rs1029832 rs2270180 rs8072402 rs7221553
> > rs56192520      0.009     0.011     0.008     0.000     0.009     0.011
> > rs3764410       0.007     0.009     0.006     0.000     0.007     0.009
> > rs145984817     0.011     0.013     0.010     0.001     0.011     0.013
> > rs1807401       0.011     0.013     0.010     0.001     0.011     0.013
> > rs1807402       0.011     0.013     0.010     0.001     0.011     0.013
> > rs35350506      0.011     0.013     0.010     0.001     0.011     0.013
> >             rs145597919 rs150772017 rs2041393 rs6502578 rs11078382 rs9912109
> > rs56192520        0.013       0.013     0.005     0.005          0     0.005
> > rs3764410         0.012       0.012     0.004     0.004          0     0.004
> > rs145984817       0.014       0.014     0.006     0.006          0     0.006
> > rs1807401         0.014       0.014     0.006     0.006          0     0.006
> > rs1807402         0.014       0.014     0.006     0.006          0     0.006
> > rs35350506        0.014       0.014     0.006     0.006          0     0.006
> >             rs12601631 rs11869054 rs11869079 rs9912599 rs7220057 rs9896970
> > rs56192520           0          0          0         0         0         0
> > rs3764410            0          0          0         0         0         0
> > rs145984817          0          0          0         0         0         0
> > rs1807401            0          0          0         0         0         0
> > rs1807402            0          0          0         0         0         0
> > rs35350506           0          0          0         0         0         0
> >             rs34121330 rs34668117 rs67773570 rs242252 rs955893 rs28583584
> > rs56192520       0.000      0.000          0    0.002    0.001      0.013
> > rs3764410        0.001      0.001          0    0.003    0.001      0.005
> > rs145984817      0.000      0.000          0    0.002    0.001      0.004
> > rs1807401        0.000      0.000          0    0.002    0.001      0.004
> > rs1807402        0.000      0.000          0    0.002    0.001      0.004
> > rs35350506       0.000      0.000          0    0.002    0.001      0.004
> >             rs9944423 rs7217764 rs11651957 rs73978990 rs8071007 rs56044345
> > rs56192520      0.013     0.011      0.011      0.011     0.011      0.011
> > rs3764410       0.005     0.004      0.004      0.004     0.004      0.004
> > rs145984817     0.004     0.003      0.003      0.003     0.003      0.003
> > rs1807401       0.004     0.003      0.003      0.003     0.003      0.003
> > rs1807402       0.004     0.003      0.003      0.003     0.003      0.003
> > rs35350506      0.004     0.003      0.003      0.003     0.003      0.003
> >             rs17804843
> > rs56192520           0
> > rs3764410            0
> > rs145984817          0
> > rs1807401            0
> > rs1807402            0
> > rs35350506           0
> >
> >
> > On Thu, Nov 14, 2019 at 2:59 PM Abby Spurdle <[hidden email]> wrote:
> > >
> > > That's assuming your data was returned by head().
> >


Re: Remove highly correlated variables from a data frame or matrix

plangfelder
In reply to this post by anikaM
I suspect that you want to identify which variables are highly
correlated, and then keep only "representative" variables, i.e.,
remove redundant ones. This is a somewhat risky procedure, but I have
occasionally done it myself to simplify large sets of
highly related variables. If your threshold of 0.8 is approximate, you
could simply use average linkage hierarchical clustering with
dissimilarity = 1-correlation, cut the tree at the appropriate height
(1-0.8=0.2), and from each cluster keep a single representative (e.g.,
the one with the highest mean correlation with other members of the
cluster). Something along these lines (untested)

tree = hclust(1-calc.rho, method = "average")
clusts = cutree(tree, h = 0.2)
clustLevels = sort(unique(clusts))
representatives = unlist(lapply(clustLevels, function(cl)
{
  inClust = which(clusts==cl);
  rho1 = calc.rho[inClust, inClust, drop = FALSE];
  repr = inClust[ which.max(colSums(rho1)) ]
  repr
}))

the variable representatives now contains indices of the variables you
want to retain, so you could subset the calc.rho matrix as
rho.retained = calc.rho[representatives, representatives]

I haven't tested the code and it may contain bugs, but something along
these lines should get you where you want to be.

Oh, and depending on how strict you want to be with the remaining
correlations, you could use complete linkage clustering (will retain
more variables, some correlations will be above 0.8).
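
A runnable variant of the outline above, on a small made-up matrix, might look like the sketch below. Two things in it are assumptions rather than part of the original suggestion: the toy matrix rho (variables a to d are invented for illustration), and the as.dist() wrapper, added because hclust() takes a "dist" object rather than a square matrix.

```r
# Toy correlation matrix (hypothetical data): a, b, c are tightly
# correlated with each other; d is only weakly correlated with them.
rho <- matrix(c(1.00, 0.95, 0.90, 0.10,
                0.95, 1.00, 0.92, 0.05,
                0.90, 0.92, 1.00, 0.20,
                0.10, 0.05, 0.20, 1.00),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("a", "b", "c", "d"),
                              c("a", "b", "c", "d")))

# Convert the dissimilarity matrix to a "dist" object before clustering.
tree <- hclust(as.dist(1 - rho), method = "average")

# Cut at height 1 - 0.8 = 0.2 to get cluster membership per variable.
clusts <- cutree(tree, h = 0.2)

# From each cluster keep the member with the highest summed
# correlation to the members of that cluster.
representatives <- vapply(sort(unique(clusts)), function(cl) {
  inClust <- which(clusts == cl)
  rho1 <- rho[inClust, inClust, drop = FALSE]
  inClust[[which.max(colSums(rho1))]]
}, integer(1))

rho.retained <- rho[representatives, representatives, drop = FALSE]
print(rho.retained)
```

On this toy input the cut yields two clusters, {a, b, c} and {d}. With average linkage the 0.8 bound applies to cluster averages, so it is worth checking the largest off-diagonal value of rho.retained on real data before relying on it.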

Peter

On Thu, Nov 14, 2019 at 10:50 AM Ana Marija <[hidden email]> wrote:

>
> Hello,
>
> I have a data frame like this (a matrix):
> head(calc.rho)
>             rs9900318 rs8069906 rs9908521 rs9908336 rs9908870 rs9895995
> rs56192520      0.903     0.268     0.327     0.327     0.327     0.582
> rs3764410       0.928     0.276     0.336     0.336     0.336     0.598
> rs145984817     0.975     0.309     0.371     0.371     0.371     0.638
> rs1807401       0.975     0.309     0.371     0.371     0.371     0.638
> rs1807402       0.975     0.309     0.371     0.371     0.371     0.638
> rs35350506      0.975     0.309     0.371     0.371     0.371     0.638
>
> > dim(calc.rho)
> [1] 246 246
>
> I would like to remove from this data all highly correlated variables,
> with correlation more than 0.8
>
> I tried this:
>
> > data<- calc.rho[,!apply(calc.rho,2,function(x) any(abs(x) > 0.80))]
> > dim(data)
> [1] 246   0
>
> Can you please advise,
>
> Thanks
> Ana
>
> But this removes everything.
>


Re: Remove highly correlated variables from a data frame or matrix

anikaM
Hi Peter,

Thank you for getting back to me and shedding light on this. I see
your point, doing Jim's method:

> keeprows<-apply(calc.rho,1,function(x) return(sum(x>0.8)<3))
> ro246.lt.8<-calc.rho[keeprows,keeprows]
> ro246.lt.8[ro246.lt.8 == 1] <- NA
> (mmax <- max(abs(ro246.lt.8), na.rm=TRUE))
[1] 0.566

Which is good in general: correlations in my matrix should not
exceed 0.8. I need to run Mendelian Randomization on it later on, so
I cannot have highly correlated SNPs in there. But with Jim's
method I am left with only 17 SNPs (out of 246), which means that
both members of each highly correlated pair are removed, and it would be good
to keep one of them.
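
One way to keep a single member of each correlated group, sketched here under assumptions, is a greedy pass: walk through the variables in order and keep one only if its correlation with everything already kept is at or below the cutoff. The helper name greedy_prune and the toy matrix are hypothetical, the matrix is assumed to contain no NA values, and the result depends on the order of the variables.

```r
# Hypothetical helper: keep variable i only if its absolute correlation
# with every variable kept so far does not exceed the cutoff.
greedy_prune <- function(rho, cutoff = 0.8) {
  keep <- integer(0)
  for (i in seq_len(ncol(rho))) {
    # all() on an empty comparison is TRUE, so the first variable is kept.
    if (all(abs(rho[i, keep]) <= cutoff)) keep <- c(keep, i)
  }
  rho[keep, keep, drop = FALSE]
}

# Toy correlation matrix (made-up values): a, b, c form one correlated
# group; d is nearly independent of them.
rho <- matrix(c(1.00, 0.95, 0.90, 0.10,
                0.95, 1.00, 0.92, 0.05,
                0.90, 0.92, 1.00, 0.20,
                0.10, 0.05, 0.20, 1.00),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("a", "b", "c", "d"),
                              c("a", "b", "c", "d")))

pruned <- greedy_prune(rho, cutoff = 0.8)
print(pruned)
```

Unlike a per-row count of large coefficients, this keeps the first variable of each correlated group instead of dropping the whole group, and every remaining pairwise correlation is guaranteed to be at or below the cutoff.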

I tried to run your code:
> tree = hclust(1-calc.rho, method = "average")
Error in if (is.na(n) || n > 65536L) stop("size cannot be NA nor
exceed 65536") :
  missing value where TRUE/FALSE needed

Please advise.
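
A likely cause of this error, offered as a sketch rather than a confirmed diagnosis: hclust() reads the "Size" attribute that dist objects carry, and a plain matrix such as 1-calc.rho has no such attribute. The toy matrix below is made up; on the real data one would presumably call as.dist(1 - as.matrix(calc.rho)) and first check for missing values with anyNA().

```r
# Small made-up correlation matrix for illustration only.
rho <- matrix(c(1.0, 0.9, 0.1,
                0.9, 1.0, 0.2,
                0.1, 0.2, 1.0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

d_matrix <- 1 - rho           # plain matrix: no "Size" attribute
d_dist   <- as.dist(1 - rho)  # "dist" object: carries Size and Labels

size_matrix <- attr(d_matrix, "Size")  # NULL for a plain matrix
size_dist   <- attr(d_dist, "Size")    # number of variables

# With a "dist" object the clustering call goes through.
tree   <- hclust(d_dist, method = "average")
clusts <- cutree(tree, h = 0.2)
print(clusts)
```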

Thanks
Ana

On Thu, Nov 14, 2019 at 7:37 PM Peter Langfelder
<[hidden email]> wrote:

>
> I suspect that you want to identify which variables are highly
> correlated, and then keep only "representative" variables, i.e.,
> remove redundant ones. This is a bit of a risky procedure but I have
> done such things before as well sometimes to simplify large sets of
> highly related variables. If your threshold of 0.8 is approximate, you
> could simply use average linkage hierarchical clustering with
> dissimilarity = 1-correlation, cut the tree at the appropriate height
> (1-0.8=0.2), and from each cluster keep a single representative (e.g.,
> the one with the highest mean correlation with other members of the
> cluster). Something along these lines (untested)
>
> tree = hclust(1-calc.rho, method = "average")
> clusts = cutree(tree, h = 0.2)
> clustLevels = sort(unique(clusts))
> representatives = unlist(lapply(clustLevels, function(cl)
> {
>   inClust = which(clusts==cl);
>   rho1 = calc.rho[inClust, inClust, drop = FALSE];
>   repr = inClust[ which.max(colSums(rho1)) ]
>   repr
> }))
>
> the variable representatives now contains indices of the variables you
> want to retain, so you could subset the calc.rho matrix as
> rho.retained = calc.rho[representatives, representatives]
>
> I haven't tested the code and it may contain bugs, but something along
> these lines should get you where you want to be.
>
> Oh, and depending on how strict you want to be with the remaining
> correlations, you could use complete linkage clustering (will retain
> more variables, some correlations will be above 0.8).
>
> Peter
>
> On Thu, Nov 14, 2019 at 10:50 AM Ana Marija <[hidden email]> wrote:
> >
> > Hello,
> >
> > I have a data frame like this (a matrix):
> > head(calc.rho)
> >             rs9900318 rs8069906 rs9908521 rs9908336 rs9908870 rs9895995
> > rs56192520      0.903     0.268     0.327     0.327     0.327     0.582
> > rs3764410       0.928     0.276     0.336     0.336     0.336     0.598
> > rs145984817     0.975     0.309     0.371     0.371     0.371     0.638
> > rs1807401       0.975     0.309     0.371     0.371     0.371     0.638
> > rs1807402       0.975     0.309     0.371     0.371     0.371     0.638
> > rs35350506      0.975     0.309     0.371     0.371     0.371     0.638
> >
> > > dim(calc.rho)
> > [1] 246 246
> >
> > I would like to remove from this data all highly correlated variables,
> > with correlation more than 0.8
> >
> > I tried this:
> >
> > > data<- calc.rho[,!apply(calc.rho,2,function(x) any(abs(x) > 0.80))]
> > > dim(data)
> > [1] 246   0
> >
> > Can you please advise,
> >
> > Thanks
> > Ana
> >
> > But this removes everything.
> >


Re: Remove highly correlated variables from a data frame or matrix

anikaM
If it is of any help, my correlation matrix (calc.rho) was computed here,
under the LDmatrix tab: https://ldlink.nci.nih.gov/?tab=ldmatrix
and the dataset of 246 SNPs is below:

rs56192520
rs3764410
rs145984817
rs1807401
rs1807402
rs35350506
rs2089177
rs12325677
rs62064624
rs62064631
rs2349295
rs2174369
rs7218554
rs62064634
rs4360974
rs4527060
rs6502526
rs6502527
rs9900318
rs8069906
rs9908521
rs9908336
rs9908870
rs9895995
rs7211086
rs9905280
rs8073305
rs8072086
rs4312350
rs4313843
rs8069610
rs883504
rs8072394
rs4280293
rs4465638
rs12602378
rs9899059
rs6502530
rs4380085
rs6502532
rs4792798
rs4792799
rs4316813
rs148563931
rs74751226
rs8068857
rs8069441
rs77397878
rs75339756
rs4608391
rs79569548
rs4275914
rs11870422
rs8075751
rs11658904
rs138437542
rs80344434
rs7222311
rs7221842
rs7223686
rs78013597
rs74965036
rs78063986
rs118106233
rs117345712
rs113004656
rs9898995
rs4985718
rs9893911
rs79110942
rs7208929
rs12601453
rs4078062
rs75129280
rs76664572
rs78961289
rs146364798
rs76715413
rs4078534
rs79457460
rs74369938
rs76423171
rs74668400
rs75146120
rs1135237
rs9914671
rs117759512
rs4985696
rs16961340
rs17794159
rs4247118
rs78572469
rs12601193
rs2349646
rs2090018
rs12601424
rs4985701
rs8064550
rs2271521
rs2271520
rs11078374
rs4985702
rs1124961
rs11652674
rs3924340
rs112450164
rs7208973
rs9910857
rs78574480
rs8072184
rs12602196
rs6502563
rs3744135
rs148779543
rs77689691
rs41319048
rs117340532
rs78647096
rs77712968
rs16961396
rs80054920
rs7206981
rs4985740
rs3803762
rs77103270
rs7207485
rs77342773
rs3826304
rs3744126
rs7210879
rs7211576
rs117967362
rs75978745
rs6502564
rs9894565
rs36079048
rs8076621
rs7218795
rs3803761
rs12602675
rs7208065
rs4985705
rs8080386
rs8065832
rs2018781
rs1736221
rs1736220
rs1736217
rs1708620
rs1708619
rs1736216
rs76319098
rs1736215
rs1736214
rs1708617
rs12602831
rs12602871
rs1736213
rs1736212
rs76045368
rs34518797
rs11078378
rs8079562
rs8065774
rs8066090
rs41337846
rs1736209
rs1736208
rs12949822
rs76246042
rs12600635
rs55689224
rs1736207
rs1708626
rs1736206
rs9896078
rs16961474
rs1708627
rs1736205
rs1708628
rs7220577
rs2294155
rs1736204
rs1736203
rs1736202
rs12937908
rs1736200
rs1708623
rs1708624
rs9894884
rs9901894
rs9903294
rs2472689
rs1630656
rs111478970
rs3182911
rs7219012
rs9890657
rs12453455
rs12947291
rs150267386
rs16961493
rs11652745
rs9907107
rs8070574
rs4985759
rs3866959
rs7219248
rs6502568
rs7220275
rs12450037
rs7225876
rs9892352
rs4985760
rs6502569
rs1029830
rs2012954
rs1029832
rs2270180
rs8072402
rs7221553
rs145597919
rs150772017
rs2041393
rs6502578
rs11078382
rs9912109
rs12601631
rs11869054
rs11869079
rs9912599
rs7220057
rs9896970
rs34121330
rs34668117
rs67773570
rs242252
rs955893
rs28583584
rs9944423
rs7217764
rs11651957
rs73978990
rs8071007
rs56044345
rs17804843


On Fri, Nov 15, 2019 at 12:03 PM Ana Marija <[hidden email]> wrote:

>
> Hi Peter,
>
> Thank you for getting back to me and shedding light on this. I see
> your point, doing Jim's method:
>
> > keeprows<-apply(calc.rho,1,function(x) return(sum(x>0.8)<3))
> > ro246.lt.8<-calc.rho[keeprows,keeprows]
> > ro246.lt.8[ro246.lt.8 == 1] <- NA
> > (mmax <- max(abs(ro246.lt.8), na.rm=TRUE))
> [1] 0.566
>
> This is good in general: correlations in my matrix should not
> exceed 0.8. I need to run Mendelian Randomization on it later, so
> I cannot have highly correlated SNPs in it. But with Jim's
> method I am left with only 17 SNPs (out of 246), which means that
> both members of each highly correlated pair are removed, and it would
> be good to keep one of them.
>
> I tried to do your code:
> > tree = hclust(1-calc.rho, method = "average")
> Error in if (is.na(n) || n > 65536L) stop("size cannot be NA nor
> exceed 65536") :
>   missing value where TRUE/FALSE needed
>
> Please advise.
>
> Thanks
> Ana
>
> On Thu, Nov 14, 2019 at 7:37 PM Peter Langfelder
> <[hidden email]> wrote:
> >
> > I suspect that you want to identify which variables are highly
> > correlated, and then keep only "representative" variables, i.e.,
> > remove redundant ones. This is a bit of a risky procedure but I have
> > done such things before as well sometimes to simplify large sets of
> > highly related variables. If your threshold of 0.8 is approximate, you
> > could simply use average linkage hierarchical clustering with
> > dissimilarity = 1-correlation, cut the tree at the appropriate height
> > (1-0.8=0.2), and from each cluster keep a single representative (e.g.,
> > the one with the highest mean correlation with other members of the
> > cluster). Something along these lines (untested)
> >
> > tree = hclust(1-calc.rho, method = "average")
> > clusts = cutree(tree, h = 0.2)
> > clustLevels = sort(unique(clusts))
> > representatives = unlist(lapply(clustLevels, function(cl)
> > {
> >   inClust = which(clusts==cl);
> >   rho1 = calc.rho[inClust, inClust, drop = FALSE];
> >   repr = inClust[ which.max(colSums(rho1)) ]
> >   repr
> > }))
> >
> > the variable representatives now contains indices of the variables you
> > want to retain, so you could subset the calc.rho matrix as
> > rho.retained = calc.rho[representatives, representatives]
> >
> > I haven't tested the code and it may contain bugs, but something along
> > these lines should get you where you want to be.
> >
> > Oh, and depending on how strict you want to be with the remaining
> > correlations, you could use complete linkage clustering (will retain
> > more variables, some correlations will be above 0.8).
> >
> > Peter


Re: Remove highly correlated variables from a data frame or matrix

Jim Lemon-4
While the remedy for your dissatisfaction with my previous solution
should be obvious, I will make it explicit.

# that is rows containing at most one value > 0.8
# ignoring the diagonal
keeprows<-apply(ro246,1,function(x) return(sum(x>0.8)<2))
ro246.lt.8<-ro246[keeprows,keeprows]

Jim
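
To see why this kind of row filter tends to drop both members of a
correlated pair, here is a minimal sketch on a made-up 4 x 4 matrix. The
values and the names rho, n.high and rho.kept are hypothetical, and the
sketch assumes the diagonal is still 1, so every row has at least one
entry above 0.8.

```r
# Toy correlation matrix (made-up values): a and b are highly correlated.
rho <- matrix(c(1.0, 0.9, 0.2, 0.1,
                0.9, 1.0, 0.3, 0.2,
                0.2, 0.3, 1.0, 0.4,
                0.1, 0.2, 0.4, 1.0),
              nrow = 4, byrow = TRUE,
              dimnames = list(letters[1:4], letters[1:4]))

# Count the entries above 0.8 in each row; the diagonal always counts once.
n.high <- apply(rho, 1, function(x) sum(x > 0.8))

# With the threshold < 2, a row is kept only if the diagonal is its
# single entry above 0.8.
keeprows <- n.high < 2
rho.kept <- rho[keeprows, keeprows]
rownames(rho.kept)
```

Rows a and b each see the other as a high correlation, so the filter
removes both of them rather than keeping one.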


Re: Remove highly correlated variables from a data frame or matrix

plangfelder
In reply to this post by anikaM
Try hclust(as.dist(1-calc.rho), method = "average").

Peter
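
For reference, a self-contained sketch of the clustering approach with
the as.dist() fix applied. hclust() expects a "dist" object rather than a
plain matrix, which is why the conversion is needed. The toy data and the
name rho are made up for illustration; substitute calc.rho for rho in
practice. This is a sketch under those assumptions, not a tested recipe
for the SNP data.

```r
# Made-up data: v1/v2 and v3/v4 are near-duplicates, v5 is independent.
set.seed(1)
base <- matrix(rnorm(200 * 3), ncol = 3)
x <- cbind(base[, 1], base[, 1] + rnorm(200, sd = 0.1),
           base[, 2], base[, 2] + rnorm(200, sd = 0.1),
           base[, 3])
colnames(x) <- paste0("v", 1:5)
rho <- abs(cor(x))                      # symmetric, diagonal = 1

# Convert the dissimilarity matrix to a "dist" object before clustering.
tree <- hclust(as.dist(1 - rho), method = "average")
clusts <- cutree(tree, h = 0.2)         # 1 - 0.8 = 0.2

# From each cluster keep the member with the highest summed correlation
# to the other members of that cluster.
representatives <- unlist(lapply(sort(unique(clusts)), function(cl) {
  inClust <- which(clusts == cl)
  rho1 <- rho[inClust, inClust, drop = FALSE]
  inClust[which.max(colSums(rho1))]
}))

rho.retained <- rho[representatives, representatives]
dim(rho.retained)
```

Each near-duplicate pair collapses to a single representative, so one
variable of every highly correlated group is kept instead of none.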


Re: Remove highly correlated variables from a data frame or matrix

anikaM
Hi Peter,

Thank you so much! I will use complete linkage clustering because the
Mendelian Randomization function
(https://cran.r-project.org/web/packages/MendelianRandomization/vignettes/Vignette_MR.pdf)
I plan to use allows for correlations, but not ones as high as 0.9 or more.
I got 40 SNPs out of 246, so that is an improvement!

Regards,
Ana
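
A small sketch of what switching to complete linkage changes, on made-up
data. With complete linkage, cutting the tree at h = 0.2 means every pair
inside a cluster has dissimilarity at most 0.2, i.e. correlation of at
least 0.8; average linkage only bounds the average. The data and the
helper min.within are hypothetical, and nothing here says anything about
correlations between the representatives of different clusters, which
still need to be checked separately.

```r
# Made-up data: v1..v4 share the signal z with increasing noise,
# v5 and v6 are independent of everything else.
set.seed(42)
z <- rnorm(300)
x <- cbind(z + rnorm(300, sd = 0.2), z + rnorm(300, sd = 0.3),
           z + rnorm(300, sd = 0.6), z + rnorm(300, sd = 0.8),
           rnorm(300), rnorm(300))
colnames(x) <- paste0("v", 1:6)
rho <- abs(cor(x))
d <- as.dist(1 - rho)

cl.complete <- cutree(hclust(d, method = "complete"), h = 0.2)
cl.average  <- cutree(hclust(d, method = "average"),  h = 0.2)

# Smallest pairwise correlation inside each cluster (1 for singletons).
min.within <- function(clusts) {
  sapply(sort(unique(clusts)), function(cl) {
    idx <- which(clusts == cl)
    min(rho[idx, idx, drop = FALSE])
  })
}

min.within(cl.complete)   # every value is at least 0.8
min.within(cl.average)    # may dip below 0.8
c(complete = length(unique(cl.complete)),
  average  = length(unique(cl.average)))
```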
