|
sapply() stems from S / S+ times and hence has a long tradition.
In spite of that I think that it should be enhanced...

As the subject mentions, sapply() produces a matrix in cases
where the list components of the lapply(.) results are of the
same length (and ...).
However, it unfortunately "stops there".
E.g., if you *nest* two sapply() calls where the inner one
produces a matrix, very often the logical behavior would be for
the outer sapply() to stack these matrices into an array of
rank 3 ["array rank"(x) := length(dim(x))].
However it does not do that, e.g., an artificial example

  p0 <- function(...) paste(..., sep="")
  myF <- function(x,y) {
      stopifnot(length(x) <= 3)
      x <- rep(x, length.out=3)
      ny <- length(y)
      r <- outer(x,y)
      dimnames(r) <- list(p0("r",1:3), p0("C", seq_len(ny)))
      r
  }

and

> (v <- structure(10*(5:8), names=LETTERS[1:4]))
 A  B  C  D
50 60 70 80

if we let sapply() not simplify, we see the list of same size
matrices it produces:

> sapply(v, myF, y = 2*(1:5), simplify=FALSE)
$A
    C1  C2  C3  C4  C5
r1 100 200 300 400 500
r2 100 200 300 400 500
r3 100 200 300 400 500

$B
    C1  C2  C3  C4  C5
r1 120 240 360 480 600
r2 120 240 360 480 600
r3 120 240 360 480 600

$C
    C1  C2  C3  C4  C5
r1 140 280 420 560 700
r2 140 280 420 560 700
r3 140 280 420 560 700

$D
    C1  C2  C3  C4  C5
r1 160 320 480 640 800
r2 160 320 480 640 800
r3 160 320 480 640 800

However, quite deceptively

> sapply(v, myF, y = 2*(1:5))
        A   B   C   D
 [1,] 100 120 140 160
 [2,] 100 120 140 160
 [3,] 100 120 140 160
 [4,] 200 240 280 320
 [5,] 200 240 280 320
 [6,] 200 240 280 320
 [7,] 300 360 420 480
 [8,] 300 360 420 480
 [9,] 300 360 420 480
[10,] 400 480 560 640
[11,] 400 480 560 640
[12,] 400 480 560 640
[13,] 500 600 700 800
[14,] 500 600 700 800
[15,] 500 600 700 800

My proposal -- implemented and "make check" tested --
is to add an optional argument 'ARRAY'
which allows

> sapply(v, myF, y = 2*(1:5), ARRAY=TRUE)
, , A

    C1  C2  C3  C4  C5
r1 100 200 300 400 500
r2 100 200 300 400 500
r3 100 200 300 400 500

, , B

    C1  C2  C3  C4  C5
r1 120 240 360 480 600
r2 120 240 360 480 600
r3 120 240 360 480 600

, , C

    C1  C2  C3  C4  C5
r1 140 280 420 560 700
r2 140 280 420 560 700
r3 140 280 420 560 700

, , D

    C1  C2  C3  C4  C5
r1 160 320 480 640 800
r2 160 320 480 640 800
r3 160 320 480 640 800

-----------

In the best of all worlds, the default would be 'ARRAY = TRUE',
but of course, given the long-standing different behavior,
it seems much too "risky", and my proposal includes remaining
back-compatible with default ARRAY = FALSE.

Martin Maechler,
ETH Zurich

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
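For readers who want the rank-3 result without any change to sapply(), the stacking can be done by hand in base R. What follows is only a sketch, assuming that every list element is a matrix with the same dim(); the helper name stack_matrices() is hypothetical and is not part of the proposal.

```r
## Helper functions taken from the post above.
p0 <- function(...) paste(..., sep = "")
myF <- function(x, y) {
    stopifnot(length(x) <= 3)
    x <- rep(x, length.out = 3)
    ny <- length(y)
    r <- outer(x, y)
    dimnames(r) <- list(p0("r", 1:3), p0("C", seq_len(ny)))
    r
}
v <- structure(10 * (5:8), names = LETTERS[1:4])

## Hypothetical helper (not in the original post): stack a list of
## equally sized matrices into an array of rank 3.
stack_matrices <- function(lst) {
    d <- dim(lst[[1L]])
    ## this sketch only handles lists whose elements all share one dim()
    stopifnot(all(vapply(lst, function(m) identical(dim(m), d), NA)))
    dn <- dimnames(lst[[1L]])
    if (is.null(dn)) dn <- vector("list", length(d))
    array(unlist(lst, use.names = FALSE),
          dim = c(d, length(lst)),
          dimnames = c(dn, list(names(lst))))
}

res <- stack_matrices(lapply(v, myF, y = 2 * (1:5)))
dim(res)
```

The last dimension indexes the elements of v, so res[, , "B"] is the matrix that myF() returned for v["B"].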
|
On Dec 1, 2010, at 2:39 AM, Martin Maechler wrote:
> [...]
> In the best of all worlds, the default would be 'ARRAY = TRUE',
> but of course, given the long-standing different behavior,
> it seems much too "risky", and my proposal includes remaining
> back-compatible with default ARRAY = FALSE.
>
> Martin Maechler,
> ETH Zurich

Seems to me to be a reasonable proposal, Martin, obviously with the
proviso that the current default behavior is unaltered, as you note.

Regards,

Marc
|
In reply to this post by Martin Maechler
I think an even better approach would be to extract the
"simplification" component out of sapply, so that one could write

  sapply <- function(...) simplify(lapply(...))

(although obviously some arguments would go to lapply and some to
simplify).

The advantage of this would be that you could use the same
simplification algorithm in other places.

Hadley

On Wed, Dec 1, 2010 at 8:39 AM, Martin Maechler <[hidden email]> wrote:
> sapply() stems from S / S+ times and hence has a long tradition.
> In spite of that I think that it should be enhanced...
> [...]

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
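The factoring suggested here can be sketched roughly as follows. This is an illustration only: simplify_list() and sapply2() are hypothetical names, and the simplifier covers just the classic vector and matrix cases, not everything sapply() handles.

```r
## Hypothetical stand-alone simplifier (an assumption for illustration).
simplify_list <- function(x) {
    lens <- unique(lengths(x))
    ## empty or ragged input: leave the list alone
    if (length(x) == 0L || length(lens) != 1L || lens == 0L) return(x)
    if (lens == 1L) return(unlist(x, recursive = FALSE))
    r <- unlist(x, recursive = FALSE)
    if (length(r) != lens * length(x)) return(x)
    matrix(r, nrow = lens, dimnames = list(names(x[[1L]]), names(x)))
}

## sapply-like wrapper: the iteration and the simplification are separate.
sapply2 <- function(X, FUN, ...) simplify_list(lapply(X, FUN, ...))

sq <- sapply2(1:3, function(i) i^2)

## The same simplifier reused on the result of another apply-style call.
m <- simplify_list(Map(function(a, b) c(a, b), 1:3, 4:6))
```

The point of the design is the last line: once simplification is its own function, it can be applied to the list returned by Map(), lapply() or any other producer.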
|
> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Hadley Wickham
> Sent: Wednesday, December 01, 2010 6:27 AM
> To: Martin Maechler
> Cc: [hidden email]
> Subject: Re: [Rd] RFC: sapply() limitation from vector to
> matrix,but not further
>
> I think an even better approach would be to extract the
> "simplification" component out of sapply, so that could write
>
> sapply <- function(...) simplify(lapply(...))
>
> (although obviously some arguments would go to lapply and
> some to simplify).
>
> The advantage of this would be that you could use the same
> simplification algorithm in other places.

A downside of that approach is that lapply(X,...) can
cause a lot of unneeded memory to be allocated (length(X)
SEXP's).  Those SEXP's would be tossed out by simplify() but
the peak memory usage would remain high.  sapply() can
be written to avoid the intermediate list structure.

vapply() can avoid the intermediate list structure because
it knows what the output of FUN will look like and can put
the results directly into the desired output structure.
Perhaps its processing of the FUN.VALUE argument could be
beefed up so that matrices would be stacked as you want.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
|
> A downside of that approach is that lapply(X,...) can
> cause a lot of unneeded memory to be allocated (length(X)
> SEXP's).  Those SEXP's would be tossed out by simplify() but
> the peak memory usage would remain high.  sapply() can
> be written to avoid the intermediate list structure.

But the upside is reusable code that can be used in multiple places -
what about the simplification code used by mapply and tapply?  Why are
there three different implementations of simplification?

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
|
Finally finding time to come back to this.
Remember that I started the thread by proposing a version of
sapply() which does not just "stop" with making a matrix() from
the lapply() result, but instead --- only when the new argument
ARRAY = TRUE is set --- may return an array() of any (appropriate)
order, in those cases where the lapply() result elements all
return an array of the same dim().

On Wed, Dec 1, 2010 at 19:51, Hadley Wickham <[hidden email]> wrote:
>> A downside of that approach is that lapply(X,...) can
>> cause a lot of unneeded memory to be allocated (length(X)
>> SEXP's).  Those SEXP's would be tossed out by simplify() but
>> the peak memory usage would remain high.  sapply() can
>> be written to avoid the intermediate list structure.
>
> But the upside is reusable code that can be used in multiple places -
> what about the simplification code used by mapply and tapply?  Why are
> there three different implementations of simplification?
>
> Hadley

I have now looked into using a version of what Hadley had proposed.
Note (to Bill's point) that the current implementation of sapply()
does go via lapply() and that we have vapply() as a faster version
of sapply() with less copying (hopefully).

Very unfortunately, vapply() .. which was only created 13 months
ago, has inherited the ``illogical'' behavior of sapply()
in that it does not make up higher rank arrays if the single
element is already a matrix (say).
...
Consequently, we also need a patch to vapply(),
and I do wonder if we should not make "ARRAY=TRUE" the default there,
since with vapply() you specify a result value, and if you specify
a matrix, the total result should stack these matrices into an
array of rank 3, etc.

Looking at it, the patch is not so much work... notably if we
don't use a new argument but really let FUN.VALUE determine
what the result should look like.

More comments are still welcome...

Martin
|
In reply to this post by Martin Maechler
On Wed, Dec 1, 2010 at 3:39 AM, Martin Maechler
<[hidden email]> wrote:
> My proposal -- implemented and "make check" tested --
> is to add an optional argument 'ARRAY'
> which allows
>
>> sapply(v, myF, y = 2*(1:5), ARRAY=TRUE)

It would reduce the proliferation of arguments if the
simplify= argument were extended to allow this,
e.g. simplify = "array" or perhaps simplify = n would
allow a maximum of n dimensions.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
|
>>>>> Gabor Grothendieck <[hidden email]>
>>>>>     on Mon, 27 Dec 2010 17:06:25 -0500 writes:

    > On Wed, Dec 1, 2010 at 3:39 AM, Martin Maechler
    > <[hidden email]> wrote:
    >> My proposal -- implemented and "make check" tested -- is
    >> to add an optional argument 'ARRAY' which allows
    >>
    >>> sapply(v, myF, y = 2*(1:5), ARRAY=TRUE)

    > It would reduce the proliferation of arguments if the
    > simplify= argument were extended to allow this,
    > e.g. simplify = "array" or perhaps simplify = n would
    > allow a maximum of n dimensions.

That's a good idea, though it makes the
implementation/documentation very slightly more complicated.

I'm interested to get more feedback on my other questions,
notably the one about *changing* vapply() (on the C-level) to
behave "logically" in the sense of adding one dim(.)ension in
those cases where the FUN.VALUE (result prototype) has a dim().

Martin
|
The abind() function from the abind package is an alternative here -- it
can take a list argument, which makes it easy to use with the result of
lapply().  It's also able to take direction about which dimension to
join on.

> library(abind)
> x <- list(a=1,b=2,c=3)
> f <- function(v) matrix(v, nrow=2, ncol=4)
> sapply(x, f)
     a b c
[1,] 1 2 3
[2,] 1 2 3
[3,] 1 2 3
[4,] 1 2 3
[5,] 1 2 3
[6,] 1 2 3
[7,] 1 2 3
[8,] 1 2 3
>
> # The 'along=' argument to abind() determines on which dimension
> # the list elements are joined.  Use a fractional value to put the new
> # dimension between existing ones.
>
> dim(abind(lapply(x, f), along=0))
[1] 3 2 4
> dim(abind(lapply(x, f), along=1.5))
[1] 2 3 4
> dim(abind(lapply(x, f), along=3))
[1] 2 4 3
> abind(lapply(x, f), along=3)
, , a

     [,1] [,2] [,3] [,4]
[1,]    1    1    1    1
[2,]    1    1    1    1

, , b

     [,1] [,2] [,3] [,4]
[1,]    2    2    2    2
[2,]    2    2    2    2

, , c

     [,1] [,2] [,3] [,4]
[1,]    3    3    3    3
[2,]    3    3    3    3

>

On 12/28/2010 8:49 AM, Martin Maechler wrote:
> [...]
> That's a good idea, though it makes the
> implementation/documentation very slightly more complicated.
>
> I'm interested to get more feedback on my other questions,
> notably the one about *changing* vapply() (on the C-level) to
> behave "logically" in the sense of adding one dim(.)ension in
> those cases where the FUN.VALUE (result prototype) has a dim().
>
> Martin
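For comparison, the same three layouts can be reached in base R by stacking along a new last dimension and then permuting with aperm(). This is a sketch under the assumption that all list elements have the same dim(); it reproduces the dim() values shown in the session above and is not meant as a replacement for abind().

```r
x <- list(a = 1, b = 2, c = 3)
f <- function(v) matrix(v, nrow = 2, ncol = 4)
lst <- lapply(x, f)

## Stack along a new last dimension (same dim() as along = 3 above).
a3 <- array(unlist(lst), dim = c(2, 4, length(lst)),
            dimnames = list(NULL, NULL, names(lst)))

## Move the new dimension to the front or to the middle.
a_front  <- aperm(a3, c(3, 1, 2))   # dim 3 2 4, as with along = 0
a_middle <- aperm(a3, c(1, 3, 2))   # dim 2 3 4, as with along = 1.5

dim(a3); dim(a_front); dim(a_middle)
```

In aperm(a, perm), the i-th dimension of the result is dimension perm[i] of the input, which is why c(3, 1, 2) brings the stacking dimension to the front.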
|
On Tue, Dec 28, 2010 at 19:14, Tony Plate <[hidden email]> wrote:
> The abind() function from the abind package is an alternative here -- it can
> take a list argument, which makes it easy to use with the result of
> lapply().  It's also able to take direction about which dimension to join on.
> [...]

Thank you, Tony.
Indeed, yes, abind() is nice here (and in the good ol' APL spirit !)

Wanting to keep things both simple *and* fast here, of course,
I currently contemplate the following code,
where the new simplify2array() is considerably simpler than abind():

##' "Simplify" a list of commonly structured components into an array.
##'
##' @title simplify list() to an array if the list elements are structurally equal
##' @param x a list, typically resulting from lapply()
##' @param higher logical indicating if an array() of "higher rank"
##'  should be returned when appropriate, namely when all elements of
##'  \code{x} have the same \code{\link{dim}()}ension.
##' @return x itself, or an array if the simplification "is sensible"
simplify2array <- function(x, higher = TRUE)
{
    if(length(common.len <- unique(unlist(lapply(x, length)))) > 1L)
        return(x)
    if(common.len == 1L)
        unlist(x, recursive = FALSE)
    else if(common.len > 1L) {
        n <- length(x)
        ## make sure that array(*) will not call rep() {e.g. for 'call's}:
        r <- as.vector(unlist(x, recursive = FALSE))
        if(higher && length(c.dim <- unique(lapply(x, dim))) == 1 &&
           is.numeric(c.dim <- c.dim[[1L]]) &&
           prod(d <- c(c.dim, n)) == length(r)) {
            iN1 <- is.null(n1 <- dimnames(x[[1L]]))
            n2 <- names(x)
            dnam <-
                if(!(iN1 && is.null(n2)))
                    c(if(iN1) rep.int(list(n1), length(c.dim)) else n1,
                      list(n2)) ## else NULL
            array(r, dim = d, dimnames = dnam)
        } else if(prod(d <- c(common.len, n)) == length(r))
            array(r, dim = d,
                  dimnames= if(!(is.null(n1 <- names(x[[1L]])) &
                                 is.null(n2 <- names(x)))) list(n1,n2))
        else x
    }
    else x
}

sapply <- function(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
{
    FUN <- match.fun(FUN)
    answer <- lapply(X, FUN, ...)
    if(USE.NAMES && is.character(X) && is.null(names(answer)))
        names(answer) <- X
    if(!identical(simplify, FALSE) && length(answer))
        simplify2array(answer, higher = (simplify == "array"))
    else answer
}
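A short usage sketch of the 'higher' switch may help. It relies on the simplify2array() that ships with base R in current releases, on the assumption that its behaviour for these inputs matches the code posted above; the toy list is made up for illustration.

```r
## Three 2 x 2 matrices, as lapply() would return them.
lst <- lapply(1:3, function(i) matrix(i, nrow = 2, ncol = 2))

hi <- simplify2array(lst, higher = TRUE)    # stacked: array of rank 3
lo <- simplify2array(lst, higher = FALSE)   # classic: one column per element

dim(hi)
dim(lo)

## Elements of unequal length are not simplified at all.
ragged <- simplify2array(list(1:2, 1:3))
is.list(ragged)
```

With higher = FALSE each matrix is flattened into one column, which is the long-standing sapply() result; higher = TRUE keeps the matrix shape and adds one dimension for the list index.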
|
>>>>> Martin Maechler <[hidden email]>
>>>>>     on Tue, 28 Dec 2010 20:06:07 +0100 writes:

    > [...]
    > Wanting to keep things both simple *and* fast here, of
    > course, hence I currently contemplate the following code,
    > where the new simplify2array() is considerably simpler
    > than abind():
    > [...]

As some may have noted, the above has been committed to R-devel
(r53886 | maechler | 2010-12-29 10:36:01 +0100)
with the extra

  ------------------------------------------------------
  NOTE:  vapply() and replicate() doing that *by default*
  ------------------------------------------------------

which means that I've dared to let vapply() and replicate()
behave logically (in the above sense) by default, i.e. *not* back
compatibly.

If you want to remain bug-compatible (:-), for replicate() you can
explicitly ask for 'simplify=TRUE' instead of the new default
simplify="array".
For vapply(), the extra work of implementing such a back/bug
compatibility option did not seem worth it; in particular, as
vapply() is very new and not used in many places (on CRAN) anyway.

The new replicate() default behavior has led two CRAN packages
('emoa', 'plsRglm' whose authors I'll address privately) to fail
'R CMD check'; inspection however shows that in both cases,
 1) the check failure is from examples / test functions
 2) the usage there being
         t(replicate(N, foobar()))
    where foobar() returns a 1D array instead of a vector,
    so one way to "fix" the problem would be to change the above to
         t(replicate(N, t(foobar())))

So, in summary, the changed behavior of replicate() seems indeed
more logical insofar as it revealed programming/usage glitches
in other parts of R code.

BTW: The above makes me consider --- once again --- extending the
definition of t(a) to arrays a of array-rank {:= length(dim(a))} >= 3,
and there generalize t(.) to be the same as
     aperm(., rev(seq_along(dim(.))))
{.. in the APL tradition where t() and aperm() really were the same}

Martin
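The behaviour described in this message can be checked with a few lines, assuming an R release that includes the change (to my knowledge, R 2.13.0 and later). The toy function m() below is made up for illustration and is not from the thread.

```r
## Toy function returning a 2 x 3 matrix (hypothetical, for illustration).
m <- function(i) matrix(as.numeric(i), nrow = 2, ncol = 3)

s_old <- sapply(1:4, m)                       # default: 6 x 4 matrix
s_new <- sapply(1:4, m, simplify = "array")   # 2 x 3 x 4 array

## vapply() takes the shape from the FUN.VALUE prototype.
v_new <- vapply(1:4, m, FUN.VALUE = matrix(0, 2, 3))

## replicate() stacks by default; simplify = TRUE asks for the old result.
r_new <- replicate(4, matrix(0, 2, 3))
r_old <- replicate(4, matrix(0, 2, 3), simplify = TRUE)

## aperm() with its default permutation reverses the dimensions, i.e. it
## already does what the generalized t() mentioned above would do.
same_perm <- identical(aperm(s_new),
                       aperm(s_new, rev(seq_along(dim(s_new)))))
```

Note that sapply() itself keeps its old default, so only code that passes simplify = "array" explicitly sees the stacked result there.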
