unique

Damian Betebenner

All,

I have a simple question about the behavior of unique when the key has more than one column. Consider this example:

> dt <- data.table(X=c(NA,1,2,3), Y=c(NA,2,1,3))
> dt
      X  Y
[1,] NA NA
[2,]  1  2
[3,]  2  1
[4,]  3  3
> key(dt) <- c("X", "Y")
> unique(dt)
      X  Y
[1,] NA NA
[2,]  2  1
[3,]  3  3

If I understand this correctly, unique sees rows 2 and 3 of dt as the same.

Is this the intended behavior?

Thanks for any clarification.

Damian

_______________________________________________
datatable-help mailing list
[hidden email]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

Re: unique

Matthew Dowle
The NA in the key seems to be the issue. duplicated.data.table calls
diff() on the key columns and assumes there is no NA in the keys.
I can't remember: did we decide to disallow NA in keys? There have been
other issues with NA in keys, and some threads about them in the past.
Do you need the key to contain NA?
Matthew


(In reply to Damian Betebenner's message of Sat, 2011-06-25, 16:23 -0500.)

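Matthew's explanation can be illustrated with a small base R sketch. The function dup_by_diff below is a hypothetical reconstruction, not data.table's actual source: it assumes the table is already sorted by its key (with the NA row first, as in the printed output) and marks a row as new only where diff() on some key column is non-zero. Because diff() returns NA across an NA/non-NA boundary and which() drops NA, the change from row 1 (NA, NA) to row 2 (1, 2) is never detected. Under this assumption, row 2 ends up flagged as a duplicate of the all-NA row 1, and the surviving rows match the unique(dt) output in the report.

```r
# Hypothetical reconstruction for illustration only; this is NOT the
# actual data.table source. It assumes df is already sorted by `key`.
dup_by_diff <- function(df, key) {
  n <- nrow(df)
  dup <- rep(TRUE, n)  # start by assuming each row repeats the one before
  dup[1] <- FALSE      # the first row is never a duplicate
  for (col in key) {
    # diff() gives NA wherever either neighbour is NA, and which()
    # silently drops NA, so a change next to an NA row is missed
    changed <- which(diff(df[[col]]) != 0) + 1
    dup[changed] <- FALSE
  }
  dup
}

# The example from the thread, in key order with the NA row first
df <- data.frame(X = c(NA, 1, 2, 3), Y = c(NA, 2, 1, 3))
d <- dup_by_diff(df, c("X", "Y"))
print(d)         # FALSE TRUE FALSE FALSE: row 2 is wrongly flagged
print(df[!d, ])  # rows 1, 3 and 4, the same three rows unique(dt) returned
```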

Re: unique

Damian Betebenner
Thanks, Matthew.

I was just trying something and noticed this.

I can get what I want using unique on a data.frame; it gives the result one would expect.

It did make me somewhat worried that grouping a data.table with "by" on a multi-column key might skip some unique combinations.

Damian



Damian Betebenner
Center for Assessment
PO Box 351
Dover, NH   03821-0351
 
Phone (office): (603) 516-7900
Phone (cell): (857) 234-2474
Fax: (603) 516-7910

[hidden email]
www.nciea.org




(In reply to Matthew Dowle's message of Saturday, June 25, 2011, 8:27 PM, "Re: [datatable-help] unique".)

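For comparison, here is a minimal base R sketch of the data.frame route Damian mentions. unique() on a data.frame compares entire rows and treats NA as equal to NA, so the all-NA row is kept once while (1, 2) and (2, 1) remain distinct. The object names are illustrative only.

```r
# Base R only: unique() on a data.frame compares whole rows
df <- data.frame(X = c(NA, 1, 2, 3), Y = c(NA, 2, 1, 3))
u <- unique(df)
print(u)               # all four rows are kept
print(duplicated(df))  # FALSE FALSE FALSE FALSE

# NA matches NA here, so a repeated all-NA row is dropped
df2 <- rbind(df, data.frame(X = NA, Y = NA))
print(nrow(unique(df2)))  # 4
```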

Re: unique

Matthew Dowle

Fixed, thanks.

o    unique.data.table now calls the same internal code
     (in C) that grouping calls. This fixes a bug when
     unique is called directly on a table whose key
     contains NA (which may be quite rare). Thanks to
     Damian Betebenner for the bug report. unique should
     also now be faster.


(In reply to Damian Betebenner's message of Sat, 2011-06-25, 19:34 -0500.)

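The fix announced above routes unique through data.table's internal grouping code in C, which is not reproduced here. As a hedged illustration of the behaviour such a fix needs, the hypothetical function dup_sorted below detects run boundaries in a key-sorted table while handling NA explicitly: a key column counts as changed when exactly one of two adjacent values is NA, or when both are non-NA and unequal, so two adjacent NA values compare as equal.

```r
# Hypothetical sketch, NOT data.table's internal C code: NA-aware
# duplicate detection for a data.frame already sorted by `key`.
dup_sorted <- function(df, key) {
  n <- nrow(df)
  if (n == 0) return(logical(0))
  dup <- rep(TRUE, n)
  dup[1] <- FALSE
  curr <- seq_len(n)[-1]  # rows 2..n
  prev <- curr - 1        # the row above each of them
  for (col in key) {
    a <- df[[col]][prev]
    b <- df[[col]][curr]
    # a change is: exactly one side NA, or both non-NA and unequal;
    # the & with !is.na() turns NA comparisons into FALSE
    changed <- xor(is.na(a), is.na(b)) | (!is.na(a) & !is.na(b) & a != b)
    dup[curr[changed]] <- FALSE
  }
  dup
}

df <- data.frame(X = c(NA, 1, 2, 3), Y = c(NA, 2, 1, 3))
print(df[!dup_sorted(df, c("X", "Y")), ])  # all four rows kept

df2 <- data.frame(X = c(NA, NA, 1, 2, 3), Y = c(NA, NA, 2, 1, 3))
print(dup_sorted(df2, c("X", "Y")))        # FALSE TRUE FALSE FALSE FALSE
```

With this rule every row of the original example survives, and only a genuinely repeated NA row is treated as a duplicate.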