how to merge 5 data frames by one column

anikaM
Hello,

I have 5 dataframes (s11,s22,s33,s44,s55) that look like this:

> head(s11)
             V1.1          rs        V3.1      V4.1
1 ENSG00000154803  rs12940868 3.80175e-05 -0.519565
2 ENSG00000154803   rs4383187 8.92772e-05 -0.367303
3 ENSG00000154803   rs4404112 9.32402e-05 -0.366634
4 ENSG00000154803   rs7214091 8.38003e-05  0.337576
5 ENSG00000154803  rs35871790 9.67028e-05 -0.305755
6 ENSG00000154803 rs112532541 1.08341e-04 -0.305493

> head(s22)
               V1.2          rs        V3.2      V4.2
602 ENSG00000264589  rs62065452 1.34475e-17 -0.695948
603 ENSG00000264589 rs377004743 1.26272e-17 -0.695627
630 ENSG00000264589   rs1724390 1.01129e-17 -0.693518
643 ENSG00000264589 rs367637729 4.05726e-17 -0.682833
653 ENSG00000264589 rs376183404 1.13177e-17 -0.697646
673 ENSG00000264589 rs112327620 1.59840e-17 -0.707904

Each data frame has a single unique value in its V1 column.

I am trying to merge all 5 data frames at once by the "rs" column.

Can you please help with this?
Ana

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: how to merge 5 data frames by one column

anikaM
The desired output would look like this (the example is given for just two
genes; it should include all 5 genes from all 5 data frames).

In this example, say only 5 rs IDs are shared between those two genes; the
values after each rs ID are from the V4 column for each gene:

GENES      ENSG00000001629      ENSG00000127914
rs1208998  -0.0337989326337439  -0.00106024397995199
rs4729008   0.0630831868839983   0.00890783698397027
rs11772754  0.181375539335959    0.0012636115921931
rs10257459  0.0369962603988132   0.00509887844657462
rs17164876  0.0307882763321834  -0.00188979524322732
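
One way to get a table shaped like the one above is sketched here in base R. It assumes each data frame holds a single gene ID in its first column, the SNP ID in "rs", and the value of interest in its fourth column, as in the head() output from the first post. The toy data and the helper name to_gene_column are invented for illustration.

```r
# Toy stand-ins for two of the data frames (all values are invented).
s11 <- data.frame(V1.1 = "ENSG00000154803",
                  rs   = c("rs1", "rs2", "rs3"),
                  V3.1 = c(1e-05, 2e-05, 3e-05),
                  V4.1 = c(-0.5, -0.4, 0.3),
                  stringsAsFactors = FALSE)
s22 <- data.frame(V1.2 = "ENSG00000264589",
                  rs   = c("rs2", "rs3", "rs4"),
                  V3.2 = c(1e-17, 2e-17, 3e-17),
                  V4.2 = c(-0.7, -0.6, -0.8),
                  stringsAsFactors = FALSE)

# Hypothetical helper: keep rs and the 4th column, named after the gene.
to_gene_column <- function(d) {
  out <- data.frame(rs = d$rs, value = d[[4]], stringsAsFactors = FALSE)
  names(out)[2] <- as.character(d[[1]][1])
  out
}

frames <- lapply(list(s11, s22), to_gene_column)

# all = FALSE (the default) keeps only SNPs present in every data frame.
wide <- Reduce(function(x, y) merge(x, y, by = "rs"), frames)
print(wide)
```

Because the merge is an inner join, SNPs missing from any one data frame never reach the result, so no NA filtering is needed afterwards. With all five data frames, list(s11, s22) would become list(s11, s22, s33, s44, s55).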

Re: how to merge 5 data frames by one column

anikaM
I can perhaps do this:

m <- Reduce(function(x, y) merge(x, y, by = "rs", all = TRUE),
            list(s11, s22, s33, s44, s55))

but then the output looks like this for one SNP (just as an example):

> head(m)
         rs            V1.1        V3.1     V4.1 V1.2 V3.2 V4.2            V1.3
6 rs1029829 ENSG00000154803 1.02519e-11 0.469402 <NA>   NA   NA ENSG00000141030
         V3.3     V4.3 V1.4 V3.4 V4.4 V1.5 V3.5 V4.5
6 3.06126e-28 0.726948 <NA>   NA   NA <NA>   NA   NA
...

How can I filter this output (m) to remove all rows that have NA in any of
these columns: V1.1, V1.2, V1.3, V1.4, V1.5?

Re: how to merge 5 data frames by one column

anikaM
Would this make sense for the previous question?
mt <- na.omit(m, cols = c("V1.1", "V1.2", "V1.3", "V1.4", "V1.5"))
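
A caution on that call, shown as a small sketch with invented data: for a plain data.frame, base R's na.omit() has no "cols" argument, so the extra argument is silently absorbed by "..." and every column is checked for NA. The "cols" argument belongs to the data.table method, which would apply only if m were a data.table.

```r
# Invented miniature of the merged table: V3.2 is NA in row 2,
# although neither V1.* column is NA there.
m <- data.frame(rs   = c("rs1", "rs2", "rs3"),
                V1.1 = c("geneA", "geneA", NA),
                V1.2 = c("geneB", "geneB", "geneB"),
                V3.2 = c(0.01, NA, 0.03),
                stringsAsFactors = FALSE)

# Base R: 'cols' is swallowed by '...' and has no effect,
# so rows 2 and 3 are both dropped.
mt <- na.omit(m, cols = c("V1.1", "V1.2"))
print(mt)
```

Here row 2 is dropped even though its V1.1 and V1.2 values are present, which is probably not what the "cols" argument was meant to express.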

Re: how to merge 5 data frames by one column

anikaM
I apologize, I need to reformulate this problem because there will be
many more unique genes that I have to look up: 381.

So all the genes are in one data frame:

> head(r)
               V1         V2          V3        V4
1 ENSG00000273172  rs7215271 4.33932e-17 -0.602316
2 ENSG00000273172 rs34889101 4.99518e-17 -0.596089
3 ENSG00000273172  rs4890177 4.23229e-17 -0.590085
4 ENSG00000273172  rs4890178 7.14216e-17 -0.581467
5 ENSG00000273172  rs7503363 3.16802e-17 -0.582836
6 ENSG00000273172 rs35611892 2.24399e-17 -0.583710

> tail(r)
                   V1          V2          V3        V4
18946 ENSG00000141560   rs7215271 8.53890e-17  0.572286
18947 ENSG00000141560    rs606532 9.00740e-17  0.572151
18963 ENSG00000175711 rs111566282 5.71871e-17 -0.609586
18964 ENSG00000175711  rs76319775 4.58843e-17 -0.610164
18965 ENSG00000175711  rs62074661 4.17490e-17 -0.603199
18966 ENSG00000176845  rs11433639 1.45496e-17 -0.761955

So for the above example, the merged result would contain just one row,
because these two genes have the same rs: rs7215271. The output would
contain all columns related to the two genes that share rs7215271.

It is also possible that more than 2 genes share the same rs.

Can you please advise about this?
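
For this long-format layout, one possible approach is sketched below in base R with invented data. It assumes the columns mean V1 = gene, V2 = rs ID, V3 = p-value and V4 = effect, as the head(r) output suggests, and that each gene/rs pair occurs at most once (if not, only the first V4 value per pair is kept).

```r
# Invented miniature of 'r': V1 = gene, V2 = SNP, V3 = p-value, V4 = effect.
r <- data.frame(V1 = c("geneA", "geneA", "geneB", "geneB", "geneC"),
                V2 = c("rs1", "rs2", "rs1", "rs3", "rs1"),
                V3 = c(1e-17, 2e-17, 3e-17, 4e-17, 5e-17),
                V4 = c(-0.60, -0.59, 0.57, -0.61, -0.76),
                stringsAsFactors = FALSE)

# Count distinct genes per SNP and keep SNPs seen with more than one gene.
genes_per_snp <- tapply(r$V1, r$V2, function(g) length(unique(g)))
shared_snps   <- names(genes_per_snp)[genes_per_snp > 1]
shared        <- r[r$V2 %in% shared_snps, ]

# One row per shared SNP, one column per gene, filled with V4
# (NA where a gene does not have that SNP).
wide <- tapply(shared$V4, list(shared$V2, shared$V1), function(v) v[1])
print(shared)
print(wide)
```

The "shared" data frame keeps every column for the genes that share an rs ID, and "wide" is a SNP-by-gene matrix of V4 values. This works the same way whether 2 genes or more share a given rs ID.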




Re: how to merge 5 data frames by one column

David Winsemius
In reply to this post by anikaM

On 12/3/19 12:16 PM, Ana Marija wrote:

> would this make sense for the previous:
> mt=na.omit(m, cols = c("V1.1","V1.2","V1.3","V1.4","V1.5"))
>
> On Tue, Dec 3, 2019 at 2:09 PM Ana Marija <[hidden email]>
> wrote:
>
>> I can perhaps do this:
>>
>> m=Reduce(function(x, y) merge(x, y, all=TRUE), list(s11, s22, s33,s44,s55))
>>
>> but than in the output of this one SNP (just for example)
>>
>> > head(m)
>>          rs            V1.1        V3.1     V4.1 V1.2 V3.2 V4.2            V1.3
>> 6 rs1029829 ENSG00000154803 1.02519e-11 0.469402 <NA>   NA   NA ENSG00000141030
>>          V3.3     V4.3 V1.4 V3.4 V4.4 V1.5 V3.5 V4.5
>> 6 3.06126e-28 0.726948 <NA>   NA   NA <NA>   NA   NA


It's a very simple matter when using gmail to adhere to the Posting
Guide policy of plaintext submission to rhelp. Failing to adhere to that
rule is making your successive postings less and less readable.

>> ...
>>
>> but how to filter out this output (m) in order to remove all rows where I
>> have NA in any of these columns: V1.1,V1.2,V1.3,V1.4,V1.5

The complete.cases function returns a logical vector suitable for
selecting a subset.


--

David.
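
A minimal sketch of the complete.cases() suggestion, using invented data in place of the real merged table m. Only two of the five V1.* columns are included to keep the example short.

```r
# Invented miniature of the merged table 'm'.
m <- data.frame(rs   = c("rs1", "rs2", "rs3"),
                V1.1 = c("geneA", "geneA", NA),
                V1.2 = c("geneB", NA, "geneB"),
                V4.1 = c(0.47, 0.12, NA),
                stringsAsFactors = FALSE)

# Columns that must be non-missing; extend to V1.3, V1.4, V1.5 as needed.
key_cols <- c("V1.1", "V1.2")

# complete.cases() returns TRUE for rows with no NA in the given columns.
keep <- complete.cases(m[, key_cols])
mt   <- m[keep, ]
print(mt)
```

Restricting the call to m[, key_cols] means that NA values in other columns do not cause a row to be dropped.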

