comparing two tables


frymor
Hi everybody,

I would like to know whether it is possible to compare two tables on certain
parameters.
I have these two tables:
gene table
name     chr     start     end     str     accession     Length
gen1     4     646752     646838     +     MI0005806     86
gen12     2L     243035     243141     -     MI0005821     106
gen3     2L     159838     159928     +     MI0005813     90
gen7     2L     1831685     1831799     -     MI0011290     114
gen4     2L     2737568     2737661     +     MI0017696     93
...

localization table:
Chr     Start     End     length
4     136532     138654     2122
3     139870     141970     2100
2L     157838     158440     602
X     160834     162966     2132
4     204040     208536     4496
...

I would like to check whether a specific gene lies within a certain region.
For example, I want to see if gene 3 on chromosome 2L lies within a region
given in the second table.

What I would like to do is something like this:
1. check if the gene lies on a specific chromosome
1.a if no - go to the next line
1.b if yes - go to 2
2. check if the start position of the gene is bigger than the start position
in the localization table AND if it is smaller than the end position (i.e. if
it lies between the start and end positions in the localization table)
2.a if no - go to the next gene
2.b if yes - give it to me.

I had difficulties doing this without ending up with three nested
conditional statements (if).

I would appreciate any help.

Thanks

Assa

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: comparing two tables

David Winsemius

On Oct 25, 2011, at 6:42 AM, Assa Yeroslaviz wrote:

> Hi everybody,
>
> I would like to know whether it is possible to compare to tables for  
> certain
> parameters.
> I have these two tables:
> gene table
> name     chr     start     end     str     accession     Length
> gen1     4     646752     646838     +     MI0005806     86
> gen12     2L     243035     243141     -     MI0005821     106
> gen3     2L     159838     159928     +     MI0005813     90
> gen7     2L     1831685     1831799     -     MI0011290     114
> gen4     2L     2737568     2737661     +     MI0017696     93
> ...
>
> localization table:
> Chr     Start     End     length
> 4     136532     138654     2122
> 3     139870     141970     2100
> 2L     157838     158440     602
> X     160834     162966     2132
> 4     204040     208536     4496
> ...
>
> I would like to check whether a specific gene lie within a certain  
> region.
> For example I want to see if gene 3 on chromosome 2L lies within the  
> region
> given in the second table.
>

rd.txt <- function(txt, header=TRUE, ...) {
      rd <- read.table(textConnection(txt), header=header, ...)
      closeAllConnections()
      rd }
# Data input
  genetable <- rd.txt("name     chr     start     end     str     accession     Length
  gen1     4     646752     646838     +     MI0005806     86
  gen12     2L     243035     243141     -     MI0005821     106
  gen3     2L     159838     159928     +     MI0005813     90
  gen7     2L     1831685     1831799     -     MI0011290     114
  gen4     2L     2737568     2737661     +     MI0017696     93")
  loctable <- rd.txt("Chr     Start     End     length
  4     136532     138654     2122
  3     139870     141970     2100
  2L     157838     158440     602
  X     160834     162966     2132
  4     204040     208536     4496")

# Helper function
  inregion <- function(vec, locs) {
         any( apply(locs, 1, function(x) vec["start"]>x[1] & vec["end"]<=x[2])) }
# Test the function
  inregion(genetable[2, ], loctable[, c("Start", "End")])
# [1] FALSE

  apply(genetable, 1, function(x) inregion(x, loctable[, c("Start", "End")]) )
#[1] FALSE FALSE FALSE FALSE FALSE

The logical vector can be used to extract rows from genetable, but it
seems pointless to offer code that produces an empty data frame.

(Wouldn't it have been more sensible to offer a test case that had a
combination that satisfied your requirements?)

I'm guessing that this facility is already implemented in one or
more Bioconductor functions.

--
David.

David Winsemius, MD
West Hartford, CT


Re: comparing two tables

David Winsemius

I (now) see that you crossposted rhelp and bioc. That practice is  
deprecated. Please read the Posting Guide more thoroughly. I will need  
to bear the burden of my sin in not looking at headers more closely in  
my own.

--
David.


On Oct 25, 2011, at 9:27 AM, David Winsemius wrote:

>
> On Oct 25, 2011, at 6:42 AM, Assa Yeroslaviz wrote:
>
>> Hi everybody,
>>
>> I would like to know whether it is possible to compare to tables  
>> for certain
>> parameters.
>> I have these two tables:
>> gene table
>> name     chr     start     end     str     accession     Length
>> gen1     4     646752     646838     +     MI0005806     86
>> gen12     2L     243035     243141     -     MI0005821     106

snipped
--

David Winsemius, MD
West Hartford, CT


Re: [BioC] comparing two tables

Martin Morgan
In reply to this post by frymor
On 10/25/2011 03:42 AM, Assa Yeroslaviz wrote:

> Hi everybody,
>
> I would like to know whether it is possible to compare to tables for certain
> parameters.
> I have these two tables:
> gene table
> name     chr     start     end     str     accession     Length
> gen1     4     646752     646838     +     MI0005806     86
> gen12     2L     243035     243141     -     MI0005821     106
> gen3     2L     159838     159928     +     MI0005813     90
> gen7     2L     1831685     1831799     -     MI0011290     114
> gen4     2L     2737568     2737661     +     MI0017696     93
> ...
>
> localization table:
> Chr     Start     End     length
> 4     136532     138654     2122
> 3     139870     141970     2100
> 2L     157838     158440     602
> X     160834     162966     2132
> 4     204040     208536     4496
> ...
>
> I would like to check whether a specific gene lie within a certain region.
> For example I want to see if gene 3 on chromosome 2L lies within the region
> given in the second table.

Hi Assa --

In Bioconductor, use the GenomicRanges package. Create two GRanges objects

   genes = with(genetable, GRanges(chr, IRanges(start, end), str,
                                   accession=accession, Length=Length))
   locations = with(locationtable, GRanges(Chr, IRanges(Start, End)))

then

   olaps = findOverlaps(genes, locations)

queryHits(olaps) and subjectHits(olaps) index each gene with all
locations it overlaps. The definition of 'overlap' is flexible, see
?findOverlaps.

Martin



--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


Re: [BioC] comparing two tables

frymor
Hi all,

@Martin - thanks for the help, it works very well.

@David - sorry for the misunderstanding. I will see to it that it won't
happen again.
BTW, unfortunately your function is not working.
It is partially my error, as I gave no regions with overlaps, but even after
changing them it still doesn't give the expected result.

Here is the new data with an overlap in the third gene:

genetable <- rd.txt("name     chr     start     end     str     accession     Length
 gen1     4     646752     646838     +     MI0005806     86
 gen12     2L     243035     243141     -     MI0005821     106
 gen3     2L     159838     159928     +     MI0005813     90
 gen7     2L     1831685     1831799     -     MI0011290     114
 gen4     2L     2737568     2737661     +     MI0017696     93")
 loctable <- rd.txt("Chr     Start     End     length
 4     136532     138654     2122
 3     139870     141970     2100
 2L     157838     160440     2602
 X     160834     162966     2132
 4     204040     208536     4496")

But I still get:
>  apply(genetable, 1, function(x) inregion(x, loctable[, c("Start", "End")]) )
[1] FALSE FALSE FALSE FALSE FALSE

For the single queries I get TRUE:

>  inregion(genetable[3, ], loctable[, c("Start", "End")])
[1] TRUE

Do you have any idea how I can fix this problem?

Thanks and again sorry for the trouble.

Assa


Re: [BioC] comparing two tables

David Winsemius

On Oct 25, 2011, at 10:40 AM, Assa Yeroslaviz wrote:

> Hi all,
>
> @Martin - thanks for the help it works very good.
>
> @David - sorry for the misunderstanding. I will see to it, that it  
> won't
> happen again.
> BTW, unfortunately your function is not working.
> It is patialy my error as I gave no regions with overlaps, but even  
> after
> changing them it just doesn't fit.
>
> Here is the new data with an overlap in the third gene:
>
> genetable <- rd.txt("name     chr     start     end     str
> accession     Length
>
> gen1     4     646752     646838     +     MI0005806     86
> gen12     2L     243035     243141     -     MI0005821     106
> gen3     2L     159838     159928     +     MI0005813     90
> gen7     2L     1831685     1831799     -     MI0011290     114
> gen4     2L     2737568     2737661     +     MI0017696     93")
> loctable <- rd.txt("Chr     Start     End     length
>
> 4     136532     138654     2122
> 3     139870     141970     2100
> 2L     157838     160440     2602
> X     160834     162966     2132
> 4     204040     208536     4496")
>
> But I still get:
>> apply(genetable, 1, function(x) inregion(x, loctable[, c("Start",
> "End")]) )
> [1] FALSE FALSE FALSE FALSE FALSE

You just want to pass the start and end columns of genetable

 > # Helper function
 > inregion <- function(vec, locs) {
 +        any( apply(locs, 1, function(x) vec["start"]>x[1] & vec["end"]<=x[2])) }
 > # Test the function
 > inregion(genetable[2, ], loctable[, c("Start", "End")])
 [1] FALSE
 >
 > apply(genetable[, 3:4], 1, function(x) inregion(x, loctable[, c("Start", "End")]) )
 [1] FALSE FALSE  TRUE FALSE FALSE
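
A likely explanation for the earlier all-FALSE result, offered as a sketch rather than a confirmed diagnosis: apply() converts a data frame with as.matrix() before looping over rows, and when character columns are present every cell becomes a string, with numbers padded to a common width. The data frame below is retyped by hand from the gene table in this thread (the column types are an assumption), and `m` and `m_num` are illustrative names only.

```r
# Toy copy of the gene table (assumed types: character chr, integer positions).
genetable <- data.frame(
  name  = c("gen1", "gen12", "gen3", "gen7", "gen4"),
  chr   = c("4", "2L", "2L", "2L", "2L"),
  start = c(646752L, 243035L, 159838L, 1831685L, 2737568L),
  end   = c(646838L, 243141L, 159928L, 1831799L, 2737661L),
  stringsAsFactors = FALSE
)

# This is the conversion apply() performs on the whole data frame:
# the character columns force a character matrix, and the numeric
# columns are run through format(), which pads them to equal width.
m <- as.matrix(genetable)
typeof(m)        # "character"
m[3, "start"]    # " 159838" (note the leading space)

# Restricting to the position columns keeps the matrix numeric, which is
# what passing genetable[, 3:4] achieves.
m_num <- as.matrix(genetable[, c("start", "end")])
typeof(m_num)    # "integer"
```

When such padded strings are compared with numbers, R falls back to string comparison, so the result no longer follows numeric order.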

( I really wish that you would stop crossposting. I am only following  
your bad practice because you posted my code on BioC.)

--
David

David Winsemius, MD
West Hartford, CT


Re: [BioC] comparing two tables

frymor
Hi David,

your function works just fine if I take only the region into account. But
unfortunately it does not consider the chromosome column.
There can be an overlap between the two tables only if the regions are on
the same chromosome. This is why the chromosome column of both tables is a
prerequisite for the analysis.

I tried to add a second argument to take this into account, but so far
without success.

If you have any ideas I will be grateful.

Thanks
Assa

(I am sending this only to r-help, as it is basically an R question and not
specific to Bioconductor. I still think it also has something to do with
bioc, as it deals with chromosome regions, but anyway, I think you were right
about it.)


Re: [BioC] comparing two tables

Steve Lianoglou-6
Hi,

On Wed, Oct 26, 2011 at 8:17 AM, Assa Yeroslaviz <[hidden email]> wrote:

> Hi David,
>
> your function works just fine if I take nly the region into account. But
> unfortunately it does not consider the first column of the chromosomes.
> There can be an overlap between the two tables only if the regions are on
> the same chromosome. This is why the first column of both tables is a
> prerequisite for the analysis.
>
> I treid somehow to create a second argument to consider this, but until now
> without success.

Well, Bioconductor has packages to deal with this type of data, and
these types of queries (overlaps), very efficiently.

Martin Morgan sent you an email earlier explaining how you can use
the GenomicRanges package to get what you're after ... I (highly)
suggest you go that route.

HTH,

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


Re: [BioC] comparing two tables

frymor
Thanks Steve,

I already did it and it went perfectly well.

I was just trying to understand the functions David wrote, so that I can
maybe use them for other queries.
Unfortunately I wasn't able to add a condition for the third parameter
(the chromosome) that has to be compared.

I would still love to know whether there is a way of adding such a parameter.

I tried to do it with a third condition in this line:
  any( apply(locs, 1, function(x){vec["start"]>x[2] & vec["start"]<=x[3] &
      as.character(vec["chr"])==as.character(x["chr"])
but it doesn't seem to work at all.

Thanks for the help anyway
Assa


Re: [BioC] comparing two tables

Steve Lianoglou-6
Hi Assa,


On Wed, Oct 26, 2011 at 9:44 AM, Assa Yeroslaviz <[hidden email]> wrote:

> Thanks Steve,
>
> I already did it and it went perfectly well.
>
> I was just trying to understand the functions David wrote, so that I can use
> them maybe for other queries.
> Unfortunately I wasn't able to add a condition for the fact that there is a
> third parameter to be compared.
>
> I would still ove to know whether there is a way of adding such a perameter.

Sorry, I didn't realize you were after some personal "R study"

> I tried to do it with a third argument in this line:       any( apply(locs,
> 1, function(x){vec["start"]>x[2] & vec["start"]<=x[3] &
> as.character(vec["chr"])==as.character(x["chr"])
> but it doesn't seems to work at all.

You have to change the "table" you are sending to the second parameter of
your "inregion" function.

Currently you are passing a two-column table into the `locs` parameter,
one that only has c("Start", "End"). Think about this call:

R> inregion(genetable[2, ], loctable[, c("Start", "End")])

Look at what `loctable[, c("Start", "End")]` gives you.

It looks like your change to inregion should work once you also pass in the
"Chr" column from your loctable (barring case-sensitivity issues: you
have 'chr' and "Chr" in your separate tables). E.g., use your modified
inregion function and call it like so:

R> inregion(genetable[2, ], loctable[, c("Chr", "Start", "End")])

modulo this or that.
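
For reference, here is a minimal sketch of the chromosome-aware check written without apply(). It is not David's original inregion: the helper name in_region_chr is made up for illustration, the first three gene rows and the regions are copied from the tables at the top of the thread, and the fourth gene ("hypo1") is a hypothetical row added only so that one gene actually falls inside a region. One reason to avoid apply() here is that it converts a data frame with a character "Chr" column into a character matrix, so comparisons such as vec["start"] > x[2] may end up comparing strings rather than numbers.

```r
# Gene rows: gen1, gen12 and gen3 come from the thread; "hypo1" is a
# hypothetical row added so that the example produces one hit.
genetable <- data.frame(
  name  = c("gen1", "gen12", "gen3", "hypo1"),
  chr   = c("4", "2L", "2L", "2L"),
  start = c(646752, 243035, 159838, 158000),
  end   = c(646838, 243141, 159928, 158100),
  stringsAsFactors = FALSE
)

# Regions copied from the localization table in the thread.
loctable <- data.frame(
  Chr   = c("4", "3", "2L", "X", "4"),
  Start = c(136532, 139870, 157838, 160834, 204040),
  End   = c(138654, 141970, 158440, 162966, 208536),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if the gene's start lies inside any region
# that is on the same chromosome. Note 'chr' (genes) vs. 'Chr' (regions).
in_region_chr <- function(gene, locs) {
  any(locs$Chr == gene$chr &
      gene$start > locs$Start &
      gene$start <= locs$End)
}

# Test every gene row against all regions.
hits <- sapply(seq_len(nrow(genetable)),
               function(i) in_region_chr(genetable[i, ], loctable))

# Keep only the genes that fall inside a region.
genetable[hits, ]
```

Because the columns are compared directly, numbers stay numeric and the chromosome names are compared as plain strings. For large tables the GenomicRanges route mentioned earlier is still likely to be the better choice.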


--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


Re: [BioC] comparing two tables

tomkina
Hello,

I have a similar task. I have two tables and need to build a third table that contains the rows from both, plus an extra column recording which table each row came from:

table1
chr pos ref alt
chr1 5 A G
chr1 8 T C
chr2 2 C T

table2
chr pos ref alt
chr1 5 A G
chr1 7 T C
chr1 8 T A

resulting table
chr pos ref alt info
chr1 5 A G 1, 4
chr1 7 T C 4
chr1 8 T C 1
chr1 8 T A 4

I need all 4 columns (chr, pos, ref and alt) to be compared. I didn't find a function for this in Bioconductor. I am a beginner at R and would appreciate any help.

Thanks,
Tamara



Re: [BioC] comparing two tables

arun kirshna


Assuming that you want to label rows from table1 with '1' and rows from table2 with '4' (the info column).

Also, I am not sure why the chr2 row is not in the resulting table.

dat1<- read.table(text="
chr    pos    ref    alt
chr1    5    A    G
chr1    8    T    C
chr2    2    C    T
",sep="",header=TRUE,stringsAsFactors=FALSE)

dat2<-read.table(text="
chr    pos    ref    alt
chr1    5    A    G
chr1    7    T    C
chr1    8    T    A
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat1$info<- 1
 dat2$info<-4
 dat3New<-with(dat3,aggregate(info,list(chr,pos,ref,alt),FUN=function(x) x))
 colnames(dat3New)<- colnames(dat1)
dat3New1<-dat3New[order(dat3New$chr,dat3New$pos),]
 row.names(dat3New1)<-1:nrow(dat3New1)
 dat3New1
#   chr pos ref alt info
#1 chr1   5   A   G 1, 4
#2 chr1   7   T   C    4
#3 chr1   8   T   A    4
#4 chr1   8   T   C    1
#5 chr2   2   C   T    1

#or
library(plyr)
res<-ddply(merge(dat1,dat2,all=TRUE),.(chr,pos,ref,alt),summarize,info=list(info))
res
#   chr pos ref alt info
#1 chr1   5   A   G 1, 4
#2 chr1   7   T   C    4
#3 chr1   8   T   A    4
#4 chr1   8   T   C    1
#5 chr2   2   C   T    1
names(dat3New1$info)<-NULL
 identical(dat3New1,res)
#[1] TRUE

A.K.

----- Original Message -----
From: tomkina <[hidden email]>
To: [hidden email]
Cc:
Sent: Thursday, May 30, 2013 4:45 AM
Subject: Re: [R] [BioC] comparing two tables


Re: [BioC] comparing two tables

arun kirshna
Hi Tamara,
No problem.
 dat3<- rbind(dat1,dat2)  #Sorry, forgot this line.
A.K.
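
With that line in place, the base R route runs end to end. Below is a minimal sketch of the whole thing, assuming the same dat1 and dat2 as in the earlier post. It is a variation rather than the exact code posted before: it uses the formula interface of aggregate() together with paste(), so that info becomes a plain character column instead of a list column. The labels 1 and 4 follow the earlier post.

```r
# The two input tables from the question.
dat1 <- read.table(text = "
chr pos ref alt
chr1 5 A G
chr1 8 T C
chr2 2 C T
", header = TRUE, stringsAsFactors = FALSE)

dat2 <- read.table(text = "
chr pos ref alt
chr1 5 A G
chr1 7 T C
chr1 8 T A
", header = TRUE, stringsAsFactors = FALSE)

# Label each row with its table of origin, then stack the tables.
dat1$info <- 1
dat2$info <- 4
dat3 <- rbind(dat1, dat2)

# Group on all four columns and collapse the labels of each group
# into a single string such as "1, 4".
res <- aggregate(info ~ chr + pos + ref + alt, data = dat3,
                 FUN = function(x) paste(x, collapse = ", "))

# Sort for readability and reset the row names.
res <- res[order(res$chr, res$pos, res$alt), ]
rownames(res) <- NULL
res
```

A row that occurs in both tables ends up with both labels, and a row that occurs in only one table keeps its single label. Keeping info as a character column makes the result easy to write out with write.table(); the list column from the earlier code is handier if the labels are needed as numbers later.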








________________________________
From: Tamara Simakova <[hidden email]>
To: arun <[hidden email]>
Sent: Thursday, May 30, 2013 12:26 PM
Subject: Re: [R] [BioC] comparing two tables



Hello Arun,

Thanks very much for your help. Indeed, there is a mistake in the resulting table; it should be exactly as in your example. When I use
dat3New<-with(dat3,aggregate(info,list(chr,pos,ref,alt),FUN=function(x) x))
 colnames(dat3New)<- colnames(dat1)
R returns "dat3 is not found", but with the plyr library everything works well.

Thanks again,
Tamara 






Re: [BioC] comparing two tables

tomkina
In reply to this post by tomkina
UPD: there should be 2 instead of 4 in the resulting table, and the chr2 line should be included.

The problem has been solved with the plyr library.

Thanks