combn(n, k, ...) and all its re-inventions

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

combn(n, k, ...) and all its re-inventions

Martin Maechler
It seems people are reinventing the wheel here:
The goal is to generate  all combinations of 1:n of size k.
This (typically) results in a matrix of size  k * choose(n,k)
i.e. needs O(n ^ k) space, hence is only applicable to
relatively small k.
Then alternatives have been devised to generate the combinations
"one by one", and I think I remember there has been a
quiz/challenge about 20 years ago, at an S user's conference in
Australia(?), on this topic.

Anyway, AFAIK, the first nice and efficient function for this
has been provided by Scott Chasalow for S (not R)  and he made
his S code available at the University of Wageningen as
"combinat" module. Later this became into an R package which is
now maintained by Vince Carey.  The source of 'combinat' still
shows
#       DATE WRITTEN: 14 April 1994          LAST REVISED:  10 July 1995
#       AUTHOR:  Scott Chasalow

OTOH, people have since reinvented the wheel quite prolifically:
There's combinations() in gtools {based on Bill Venables' code from R News 1/1},
combs() in CAtools, subsets() in BHH2, and nchoosek() in vsn (bioconductor);
then 'fwd.combn' in package "forward" which states explicitly
that it is Scott's combn() renamed..
I stopped searching for more, and I've made sure all
these 6 functions compute the same thing, at least in the most simple case.

After simply replacing nCm() by choose(), and some other minor
tweaks, I have now a version of combn() that is faster than all
the other implementations {only slightly faster than
combinations()}, and I plan to add this to R's standard package
'utils'.
Hopefully, the reinventing can be stopped by this, once people
can rely on a relatively fast implementation of the
functionality.

One might also consider to include a version of the ``one by
one'' combination generators {as mentioned above} which is
needed for larger k.

Opinions ?

Martin Maechler, ETH Zurich

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: combn(n, k, ...) and all its re-inventions

Peter Dalgaard
Martin Maechler <[hidden email]> writes:


> tweaks, I have now a version of combn() that is faster than all
> the other implementations {only slightly faster than
> combinations()}, and I plan to add this to R's standard package
> 'utils'.
> Hopefully, the reinventing can be stopped by this, once people
> can rely on a relatively fast implementation of the
> functionality.
>
> One might also consider to include a version of the ``one by
> one'' combination generators {as mentioned above} which is
> needed for larger k.
>
> Opinions ?

While you're in there... combn() has a nice feature that you can apply
a function to each combination, which can be quite useful (and fun) if
you're demonstrating permutation tests:

> x <- combn(20,10,function(i)mean(sleep$extra[i]))
> x <- round(x,2) # this is needed, or you get artifacts
> plot(sort(unique(x)),table(x),type="h")
> plot(sort(x),type="l")
> sum(x <= mean(sleep$extra[1:10])) / length(x)
[1] 0.04072398


However, combn() does its work in sapply() style: First create a list,
then simplify. As this quickly becomes a rather *long* list (e.g., a
slightly larger case with choose(29, 14)==77558760 combinations kills
R for me on a 2GB 64 bit machine), it might be desirable to have an option,
"assume.scalar" or so, to specify that the function always returns a
single scalar.

--
   O__  ---- Peter Dalgaard             Ă˜ster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - ([hidden email])                  FAX: (+45) 35327907

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel