Set operation generics

hadley wickham
Hi all,

Would anyone be interested in reviewing a patch to make the set
operations (union, intersect, setdiff, setequal, is.element) generic?

Thanks,

Hadley

--
Chief Scientist, RStudio
http://had.co.nz/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Set operation generics

Hervé Pagès
Hi Hadley,

On 10/21/2013 10:51 AM, Hadley Wickham wrote:
> Hi all,
>
> Would anyone be interested in reviewing a patch to make the set
> operations (union, intersect, setdiff, setequal, is.element) generic?

S3 generics, S4 generics, or primitives?

Since they are binary operations, sounds like supporting multiple
dispatch would be a plus.

Note that all of these rely heavily on match() behind the scenes.
If match() itself was an S4 generic (or a primitive like c() and [)
then union(), intersect(), setdiff(), is.element() could be defined
with something like:

   union <- function(x, y)
   {
     xy <- c(x, y)
     sm <- match(xy, xy)
     xy[sm == seq_along(sm)]
   }

   intersect <- function(x, y)
   {
     sm <- match(x, x)
     x <- x[sm == seq_along(sm)]
     m <- match(x, y)
     x[!is.na(m)]
   }

   setequal <- function(x, y)
   {
     !(anyNA(match(x, y)) || anyNA(match(y, x)))
   }

and as long as your objects support [, c(), and match(), then the set
operations will work out-of-the-box on them. Note that you would also
get %in% for free.
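
The remaining operations can be sketched in the same style. This is only an illustration under the same assumption (objects that support [, c(), and match()), not the patch under discussion; the names setdiff2, is.element2 and %in2% are hypothetical and chosen so that the base functions are not masked.

```r
# Hypothetical names (suffix "2") so the base versions stay untouched.
setdiff2 <- function(x, y)
{
  sm <- match(x, x)
  x <- x[sm == seq_along(sm)]   # drop duplicates, keeping first occurrences
  x[is.na(match(x, y))]         # keep the elements of x that are absent from y
}

is.element2 <- function(el, table)
{
  !is.na(match(el, table))
}

# %in% is the same membership test written as an operator.
`%in2%` <- function(x, table) !is.na(match(x, table))

setdiff2(c(1, 2, 2, 3), c(2, 4))
is.element2(3, 1:5)
c(1, 7) %in2% c(1, 2)
```

Everything above goes through match(), [ and is.na() only, so a class that provides suitable methods for match() and [ would not need separate methods for each set operation.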

There might be rare situations where it would still be useful for the
set operations themselves to be generic functions, but I see a lot more
value in making match() itself generic (which doesn't exclude also
making the set operations generic).

For the record, match(), union(), intersect(), and setdiff() are S4
generics in the BiocGenerics package. But there is no doubt it would
be a better/cleaner situation if base::match() itself was an S4 generic
or primitive.
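
As a rough sketch of what an S4 generic match() with multiple dispatch looks like at user level: the class CiLabel below is hypothetical (labels compared case-insensitively), and the generic is created in the user's workspace rather than in base, so this is not how BiocGenerics or base R define it.

```r
library(methods)

# Hypothetical toy class: a vector of labels compared case-insensitively.
setClass("CiLabel", representation(value = "character"))

# Turn match() into an S4 generic that dispatches on both x and table;
# base::match() becomes the default method.
setGeneric("match", signature = c("x", "table"))

# This method is selected only when both arguments are CiLabel objects.
setMethod("match", signature("CiLabel", "CiLabel"),
  function(x, table, nomatch = NA_integer_, incomparables = NULL)
    base::match(tolower(x@value), tolower(table@value),
                nomatch = nomatch, incomparables = incomparables))

a <- new("CiLabel", value = c("Apple", "pear"))
b <- new("CiLabel", value = c("PEAR", "plum"))

match(a, b)           # dispatches to the CiLabel method
match(1:3, c(3, 1))   # falls back to base::match()
```

Functions that call base::match() directly would not see such a method, which is part of why a generic or primitive match() in base itself would be cleaner.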

My 2 cents,

Cheers,
H.


>
> Thanks,
>
> Hadley
>

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [hidden email]
Phone:  (206) 667-5791
Fax:    (206) 667-1319


Re: Set operation generics

hadley wickham
>> Would anyone be interested in reviewing a patch to make the set
>> operations (union, intersect, setdiff, setequal, is.element) generic?
>
> S3 generics, S4 generics, or primitives?

I would expect S3. Can you even have an S4 generic in the base
package? (i.e. before the methods package is loaded)
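
For concreteness, a minimal user-level sketch of the S3 approach, not the patch itself: the generic below masks base::union() in the workspace, and the class modset (numbers compared modulo 10) is hypothetical.

```r
# An S3 generic that masks base::union(); the default method simply
# delegates, so existing behaviour is preserved for ordinary vectors.
union <- function(x, y, ...) UseMethod("union")
union.default <- function(x, y, ...) base::union(x, y)

# Hypothetical class 'modset': values are compared modulo 10.
union.modset <- function(x, y, ...) {
  vals <- c(unclass(x), unclass(y)) %% 10
  structure(unique(vals), class = "modset")
}

a <- structure(c(1, 12), class = "modset")

union(a, c(2, 21))   # modset method, chosen from the class of x
union(1:3, 2:5)      # default method
```

S3 dispatch looks only at the class of the first argument, so union(c(2, 21), a) would reach the default method; that is the limitation behind the multiple-dispatch question.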

> Note that all those things heavily rely on match() behind the scene.
> If match() itself was an S4 generic (or a primitive like c() and [)
> then union(), intersect(), setdiff(), is.element() could be defined
> with something like:
>
>
>   union <- function(x, y)
>   {
>     xy <- c(x, y)
>     sm <- match(xy, xy)
>     xy[sm == seq_along(sm)]
>   }
>
>   intersect <- function(x, y)
>   {
>     sm <- match(x, x)
>     x <- x[sm == seq_along(sm)]
>     m <- match(x, y)
>     x[!is.na(m)]
>   }
>
>   setequal <- function(x, y)
>   {
>     !(anyNA(match(x, y)) || anyNA(match(y, x)))
>   }

Although I suspect R-core would prefer a minimal change where it's
easier to see that existing behaviour is preserved.

> For the record, match(), union(), intersect(), and setdiff() are S4
> generics in the BiocGenerics package. But there is no doubt it would
> be a better/cleaner situation if base::match() itself was an S4 generic
> or primitive.

By primitive, you mean internal generic?

Hadley



Re: Set operation generics

Hervé Pagès
Hi Hadley,

On 10/22/2013 07:54 AM, Hadley Wickham wrote:
>>> Would anyone be interested in reviewing a patch to make the set
>>> operations (union, intersect, setdiff, setequal, is.element) generic?
>>
>> S3 generics, S4 generics, or primitives?
>
> I would expect S3. Can you even have an S4 generic in the base
> package? (i.e. before the methods package is loaded)

Probably not. But the patch could be trying to put them in stats4.

>
>> Note that all those things heavily rely on match() behind the scene.
>> If match() itself was an S4 generic (or a primitive like c() and [)
>> then union(), intersect(), setdiff(), is.element() could be defined
>> with something like:
>>
>>
>>    union <- function(x, y)
>>    {
>>      xy <- c(x, y)
>>      sm <- match(xy, xy)
>>      xy[sm == seq_along(sm)]
>>    }
>>
>>    intersect <- function(x, y)
>>    {
>>      sm <- match(x, x)
>>      x <- x[sm == seq_along(sm)]
>>      m <- match(x, y)
>>      x[!is.na(m)]
>>    }
>>
>>    setequal <- function(x, y)
>>    {
>>      !(anyNA(match(x, y)) || anyNA(match(y, x)))
>>    }
>
> Although I suspect R-core would prefer a minimal change where it's
> easier to see that existing behaviour is preserved.
>
>> For the record, match(), union(), intersect(), and setdiff() are S4
>> generics in the BiocGenerics package. But there is no doubt it would
>> be a better/cleaner situation if base::match() itself was an S4 generic
>> or primitive.
>
> By primitive, you mean internal generic?

Yes.
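
To illustrate the distinction, here is a small sketch with a hypothetical S3 class celsius: c() and [ are primitives that dispatch internally on classed objects, whereas match() is currently an ordinary closure that does no dispatch.

```r
# Hypothetical S3 class 'celsius'.
temps <- structure(c(20.5, 31, 18), class = "celsius")

# `[` and c() are internal generics: their definitions contain no
# UseMethod() call, yet methods are found for classed objects.
`[.celsius` <- function(x, i) structure(unclass(x)[i], class = "celsius")
c.celsius <- function(...)
  structure(unlist(lapply(list(...), unclass)), class = "celsius")

class(temps[2:3])        # the `[.celsius` method was used
class(c(temps, temps))   # the c.celsius method was used

is.primitive(c)
is.primitive(match)
```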

Thanks,
H.

>
> Hadley
>

