Quantcast

Re: keys that dont match

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: keys that dont match

Santosh Srinivas
Tried this now .. I guess using nomatch? ..
dt_ind_fin_landscape[dt_ind_dev_landscape, nomatch=NA]
And then filtering for NAs?
_______________________________________________
datatable-help mailing list
[hidden email]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: keys that dont match

Steve Lianoglou-6
Hi Santosh,

On Wed, May 4, 2011 at 10:12 AM, sant <[hidden email]> wrote:
> Tried this now .. I guess using nomatch? ..
> dt_ind_fin_landscape[dt_ind_dev_landscape, nomatch=NA]
> And then filtering for NAs?

The context of your message seems to be lost, and I don't see it in
(my) mailing list archives.

Are you asking a question, or answering one?

If you're asking a question -- what is it? :-)

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
_______________________________________________
datatable-help mailing list
[hidden email]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: keys that dont match

Steve Lianoglou-6
Hi,

On Wed, May 4, 2011 at 12:23 PM, Santosh Srinivas
<[hidden email]> wrote:
> Hi Steve,
>
> Sorry ... strange problem .. Dont know why that happened.
>
> http://groups.google.com/group/datatable/browse_thread/thread/51a0387e95d37feb

It looks like your first email was sent to the @googlegroups.com
address (I didn't even know we had that setup), and the second one
came through the @lists.r-forge.r-project.

So (I guess) the first didn't come through because it was sent to the
wrong(?) list -- anyway, in the future you should send to the
@lists.r-forge... one.

> I had the question and my attempt to answer before someone says go read the
> manual :)

It looks like the answer you offered is reasonable, though.

In short -- the question was "How can I quickly tell which (keyed)
rows are in one data.table vs. another)".

As you mentioned, you can do this by joining using `[` -- in order to
do this easily, you could ensure that each data.table has a column
that isn't in the other.

For example, if you have data like so:


R> dt1 <- data.table(a=1:10, b=letters[1:10], key="a,b")
R> dt2 <- data.table(a=c(1, 3, 5, 10), b=letters[c(1, 3, 5, 10)], key="a,b")

Doing either `dt1[dt2]` or `dt2[dt1]` doesn't get you anywhere too
fast (especially if one is just a subset of the other (like dt2 is to
dt1):

R> dt1[dt2]
      a b
[1,]  1 a
[2,]  3 c
[3,]  5 e
[4,] 10 j

R> dt2[dt1]
       a b
 [1,]  1 a
 [2,]  2 b
 [3,]  3 c
 [4,]  4 d
 [5,]  5 e
 [6,]  6 f
 [7,]  7 g
 [8,]  8 h
 [9,]  9 i
[10,] 10 j

Adding some 'dummy' columns may help:

R> dt1$in.1 <- TRUE
R> dt2$in.2 <- TRUE

Then you can (easily) ask which rows are in dt1 that aren't in dt2:

R> dt2[dt1] ## nomatch=NA is the default
       a b in.2 in.1
 [1,]  1 a TRUE TRUE
 [2,]  2 b   NA TRUE
 [3,]  3 c TRUE TRUE
 [4,]  4 d   NA TRUE
 [5,]  5 e TRUE TRUE
 [6,]  6 f   NA TRUE
 [7,]  7 g   NA TRUE
 [8,]  8 h   NA TRUE
 [9,]  9 i   NA TRUE
[10,] 10 j TRUE TRUE

## or more email friendly format:
R> which(is.na(dt2[dt1]$in.2))
[1] 2 4 6 7 8 9

Which are the rows in dt1 that aren't in dt2

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
_______________________________________________
datatable-help mailing list
[hidden email]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: keys that dont match

Matthew Dowle
The original post from Santosh came through as a BCC. I guess
GoogleGroups did the BCC. Will need to do more investigation.

> Which are the rows in dt1 that aren't in dt2
Another option may be a 'not join'; e.g.,
  X[-X[Y,which=TRUE]]
or
  seq(1,nrow(X))[-X[Y,which=TRUE]]

Will add something to docs/wiki re 'not joins'.

Matthew


On Wed, 2011-05-04 at 13:00 -0400, Steve Lianoglou wrote:

> Hi,
>
> On Wed, May 4, 2011 at 12:23 PM, Santosh Srinivas
> <[hidden email]> wrote:
> > Hi Steve,
> >
> > Sorry ... strange problem .. Dont know why that happened.
> >
> > http://groups.google.com/group/datatable/browse_thread/thread/51a0387e95d37feb
>
> It looks like your first email was sent to the @googlegroups.com
> address (I didn't even know we had that setup), and the second one
> came through the @lists.r-forge.r-project.
>
> So (I guess) the first didn't come through because it was sent to the
> wrong(?) list -- anyway, in the future you should send to the
> @lists.r-forge... one.
>
> > I had the question and my attempt to answer before someone says go read the
> > manual :)
>
> It looks like the answer you offered is reasonable, though.
>
> In short -- the question was "How can I quickly tell which (keyed)
> rows are in one data.table vs. another)".
>
> As you mentioned, you can do this by joining using `[` -- in order to
> do this easily, you could ensure that each data.table has a column
> that isn't in the other.
>
> For example, if you have data like so:
>
>
> R> dt1 <- data.table(a=1:10, b=letters[1:10], key="a,b")
> R> dt2 <- data.table(a=c(1, 3, 5, 10), b=letters[c(1, 3, 5, 10)], key="a,b")
>
> Doing either `dt1[dt2]` or `dt2[dt1]` doesn't get you anywhere too
> fast (especially if one is just a subset of the other (like dt2 is to
> dt1):
>
> R> dt1[dt2]
>       a b
> [1,]  1 a
> [2,]  3 c
> [3,]  5 e
> [4,] 10 j
>
> R> dt2[dt1]
>        a b
>  [1,]  1 a
>  [2,]  2 b
>  [3,]  3 c
>  [4,]  4 d
>  [5,]  5 e
>  [6,]  6 f
>  [7,]  7 g
>  [8,]  8 h
>  [9,]  9 i
> [10,] 10 j
>
> Adding some 'dummy' columns may help:
>
> R> dt1$in.1 <- TRUE
> R> dt2$in.2 <- TRUE
>
> Then you can (easily) ask which rows are in dt1 that aren't in dt2:
>
> R> dt2[dt1] ## nomatch=NA is the default
>        a b in.2 in.1
>  [1,]  1 a TRUE TRUE
>  [2,]  2 b   NA TRUE
>  [3,]  3 c TRUE TRUE
>  [4,]  4 d   NA TRUE
>  [5,]  5 e TRUE TRUE
>  [6,]  6 f   NA TRUE
>  [7,]  7 g   NA TRUE
>  [8,]  8 h   NA TRUE
>  [9,]  9 i   NA TRUE
> [10,] 10 j TRUE TRUE
>
> ## or more email friendly format:
> R> which(is.na(dt2[dt1]$in.2))
> [1] 2 4 6 7 8 9
>
> Which are the rows in dt1 that aren't in dt2
>
> HTH,
> -steve
>


_______________________________________________
datatable-help mailing list
[hidden email]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: keys that dont match

Matthew Dowle
Read again and realised the 'not join' (although correct) solves a
different problem; i.e., finding the rows in dt2 that no row in dt1
matches to, not the rows in dt1 that don't match to any row in dt2. So,
second attempt ...

Create 2 tables Steve L created :

> dt1 = data.table(a=1:10,b=letters[1:10],key="a,b")
> dt2 = data.table(a=c(1,3,5,10),b=letters[c(1,3,5,10)],key="a,b")

then (without dummy columns needed, so faster and more convenient) :

> which(is.na(dt2[dt1,which=TRUE]))
[1] 2 4 6 7 8 9

Which are the rows in dt1 that aren't in dt2.

This works in this case because when a row in dt1 matches to a row in
(this) dt2, it only matches to one row in dt2 since this dt2's key is
unique, and, this join is using all the columns in dt2's key.

To robustly cope with multiple matches :

> which(is.na(dt2[dt1,which=TRUE,mult="first"]))
[1] 2 4 6 7 8 9

Testing that by creating a dup in dt2 :

> dt2 = data.table(a=c(1,3,5,5,10),b=letters[c(1,3,5,5,10)],key="a,b")
> dt2
      a b
[1,]  1 a
[2,]  3 c
[3,]  5 e
[4,]  5 e
[5,] 10 j
> which(is.na(dt2[dt1,which=TRUE]))  
[1]  2  4  7  8  9 10    # unintended result
> which(is.na(dt2[dt1,which=TRUE,mult="first"]))
[1] 2 4 6 7 8 9          # intended result (same as before)
>

Admittedly, that's ugly, but it should be fast and scale. Perhaps there
is scope for a new 'whichna' argument then that does that directly :

> dt2[dt1,whichna=TRUE]
[1] 2 4 6 7 8 9
>

and maybe a new 'not' argument :

> dt2 =
data.table(a=c(1,3,5,5,10,11),b=letters[c(1,3,5,5,10,11)],key="a,b")
> dt2
      a b
[1,]  1 a
[2,]  3 c
[3,]  5 e
[4,]  5 e
[5,] 10 j
[6,] 11 k   # extra row not in dt1
> dt2[-dt2[dt1,which=TRUE,nomatch=0]]    # not join
      a b
[1,] 11 k
> dt2[dt1,not=TRUE]   # more user friendly
      a b
[1,] 11 k

> seq(1,nrow(dt2))[-dt2[dt1,which=TRUE,nomatch=0]]  # which not
[1] 6
> dt2[dt1,not=TRUE,which=TRUE]   # more user friendly
[1] 6

Setting both not=TRUE and whichna=TRUE wouldn't make sense and would be
prevented.

Thoughts anyone?

Matthew


On Sat, 2011-05-07 at 08:03 +0100, Matthew Dowle wrote:

> The original post from Santosh came through as a BCC. I guess
> GoogleGroups did the BCC. Will need to do more investigation.
>
> > Which are the rows in dt1 that aren't in dt2
> Another option may be a 'not join'; e.g.,
>   X[-X[Y,which=TRUE]]
> or
>   seq(1,nrow(X))[-X[Y,which=TRUE]]
>
> Will add something to docs/wiki re 'not joins'.
>
> Matthew
>
>
> On Wed, 2011-05-04 at 13:00 -0400, Steve Lianoglou wrote:
> > Hi,
> >
> > On Wed, May 4, 2011 at 12:23 PM, Santosh Srinivas
> > <[hidden email]> wrote:
> > > Hi Steve,
> > >
> > > Sorry ... strange problem .. Dont know why that happened.
> > >
> > > http://groups.google.com/group/datatable/browse_thread/thread/51a0387e95d37feb
> >
> > It looks like your first email was sent to the @googlegroups.com
> > address (I didn't even know we had that setup), and the second one
> > came through the @lists.r-forge.r-project.
> >
> > So (I guess) the first didn't come through because it was sent to the
> > wrong(?) list -- anyway, in the future you should send to the
> > @lists.r-forge... one.
> >
> > > I had the question and my attempt to answer before someone says go read the
> > > manual :)
> >
> > It looks like the answer you offered is reasonable, though.
> >
> > In short -- the question was "How can I quickly tell which (keyed)
> > rows are in one data.table vs. another)".
> >
> > As you mentioned, you can do this by joining using `[` -- in order to
> > do this easily, you could ensure that each data.table has a column
> > that isn't in the other.
> >
> > For example, if you have data like so:
> >
> >
> > R> dt1 <- data.table(a=1:10, b=letters[1:10], key="a,b")
> > R> dt2 <- data.table(a=c(1, 3, 5, 10), b=letters[c(1, 3, 5, 10)], key="a,b")
> >
> > Doing either `dt1[dt2]` or `dt2[dt1]` doesn't get you anywhere too
> > fast (especially if one is just a subset of the other (like dt2 is to
> > dt1):
> >
> > R> dt1[dt2]
> >       a b
> > [1,]  1 a
> > [2,]  3 c
> > [3,]  5 e
> > [4,] 10 j
> >
> > R> dt2[dt1]
> >        a b
> >  [1,]  1 a
> >  [2,]  2 b
> >  [3,]  3 c
> >  [4,]  4 d
> >  [5,]  5 e
> >  [6,]  6 f
> >  [7,]  7 g
> >  [8,]  8 h
> >  [9,]  9 i
> > [10,] 10 j
> >
> > Adding some 'dummy' columns may help:
> >
> > R> dt1$in.1 <- TRUE
> > R> dt2$in.2 <- TRUE
> >
> > Then you can (easily) ask which rows are in dt1 that aren't in dt2:
> >
> > R> dt2[dt1] ## nomatch=NA is the default
> >        a b in.2 in.1
> >  [1,]  1 a TRUE TRUE
> >  [2,]  2 b   NA TRUE
> >  [3,]  3 c TRUE TRUE
> >  [4,]  4 d   NA TRUE
> >  [5,]  5 e TRUE TRUE
> >  [6,]  6 f   NA TRUE
> >  [7,]  7 g   NA TRUE
> >  [8,]  8 h   NA TRUE
> >  [9,]  9 i   NA TRUE
> > [10,] 10 j TRUE TRUE
> >
> > ## or more email friendly format:
> > R> which(is.na(dt2[dt1]$in.2))
> > [1] 2 4 6 7 8 9
> >
> > Which are the rows in dt1 that aren't in dt2
> >
> > HTH,
> > -steve
> >
>
>
> _______________________________________________
> datatable-help mailing list
> [hidden email]
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help


_______________________________________________
datatable-help mailing list
[hidden email]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
Loading...