Merge by Range in R

Merge by Range in R

R help mailing list-2
Hi, 
I have two big data sets.

data_1:
> dim(data_1)
[1] 15820 5

> head(data_1)
   Chromosome  Start     End   Feature GroupA_3
1:       chr1 521369  750000 chr1-0001    0.170
2:       chr1 750001  800000 chr1-0002   -0.086
3:       chr1 800001  850000 chr1-0003    0.006
4:       chr1 850001  900000 chr1-0004    0.050
5:       chr1 900001  950000 chr1-0005    0.062
6:       chr1 950001 1000000 chr1-0006   -0.016

data_2:
> dim(data_2)
[1] 470870 5

> head(data_2)
   Chromosome Start   End    Feature GroupA_3
1:       chr1 15864 15865 cg13869341    0.207
2:       chr1 18826 18827 cg14008030   -0.288
3:       chr1 29406 29407 cg12045430   -0.331
4:       chr1 29424 29425 cg20826792   -0.074
5:       chr1 29434 29435 cg00381604    0.141
6:       chr1 68848 68849 cg20253340   -0.458


What I want to do:
Based on the "Chromosome", "Start" and "End" columns of the two data sets, I want to find which rows (more precisely, which "Feature" values) of data_2 fall within each range (between "Start" and "End") of data_1. The "Chromosome" values must also match between the two data sets.

I have tried the "GenomicRanges" package as described in this post:
https://stackoverflow.com/questions/11892241/merge-by-range-in-r-applying-loops
but I was not successful. Can anyone please help me do this quickly, as the data are very big?
Thanks in advance.


Regards.............
Tanvir Ahamed Stockholm, Sweden     |  [hidden email]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Merge by Range in R

jholtman
Have you tried 'foverlaps' in the data.table package?
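
As a rough illustration of that suggestion, here is a minimal sketch on toy data. The two small tables below only mimic the column layout of data_1 and data_2 (their values are made up, and GroupA_3 is left out for brevity), and the result name `hits` is just a placeholder. It assumes a data.table version recent enough to accept nomatch = NULL; on older versions nomatch = 0L should play the same role.

```r
library(data.table)

# Toy stand-ins for data_1 (ranges) and data_2 (probes); values are illustrative only
data_1 <- data.table(
  Chromosome = c("chr1", "chr1", "chr2"),
  Start      = c(1L, 101L, 1L),
  End        = c(100L, 200L, 100L),
  Feature    = c("chr1-0001", "chr1-0002", "chr2-0001")
)
data_2 <- data.table(
  Chromosome = c("chr1", "chr1", "chr2", "chr1"),
  Start      = c(10L, 150L, 50L, 500L),
  End        = c(11L, 151L, 51L, 501L),
  Feature    = c("cgA", "cgB", "cgC", "cgD")
)

# foverlaps() needs the second table to be keyed; the last two key
# columns are taken as the interval start and end
setkey(data_1, Chromosome, Start, End)

# type = "within": keep probes whose interval lies entirely inside a range
# on the same chromosome; nomatch = NULL drops probes that match no range
hits <- foverlaps(data_2, data_1,
                  by.x = c("Chromosome", "Start", "End"),
                  type = "within", nomatch = NULL)

# Columns from data_2 that clash with data_1 get an "i." prefix
print(hits[, .(Chromosome, Range = Feature, Probe = i.Feature)])
```

With type = "any" instead of "within", probes that only partly overlap a range would also be returned.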


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Mon, Sep 4, 2017 at 8:31 AM, Mohammad Tanvir Ahamed via R-help <[hidden email]> wrote:

> [...]