RFC: sapply() limitation from vector to matrix, but not further

Martin Maechler
sapply() stems from S / S+ times and hence has a long tradition.
In spite of that I think that it should be enhanced...

As the subject mentions, sapply() produces a matrix in cases
where the list components of the lapply(.) results are of the
same length (and ...).
However, it unfortunately "stops there".
E.g., if you *nest* two sapply() calls where the inner one
produces a matrix, very often the logical behavior would be for
the outer sapply() to stack these matrices into an array of
rank 3 ["array rank"(x) := length(dim(x))].
However, it does not do that, e.g., in this artificial example:

p0 <- function(...) paste(..., sep="")
myF <- function(x,y) {
    stopifnot(length(x) <= 3)
    x <- rep(x, length.out=3)
    ny <- length(y)
    r <- outer(x,y)
    dimnames(r) <- list(p0("r",1:3), p0("C", seq_len(ny)))
    r
}

and

> (v <- structure(10*(5:8), names=LETTERS[1:4]))
 A  B  C  D
50 60 70 80

If we tell sapply() not to simplify, we see the list of same-size
matrices it produces:

> sapply(v, myF, y = 2*(1:5), simplify=FALSE)
$A
    C1  C2  C3  C4  C5
r1 100 200 300 400 500
r2 100 200 300 400 500
r3 100 200 300 400 500

$B
    C1  C2  C3  C4  C5
r1 120 240 360 480 600
r2 120 240 360 480 600
r3 120 240 360 480 600

$C
    C1  C2  C3  C4  C5
r1 140 280 420 560 700
r2 140 280 420 560 700
r3 140 280 420 560 700

$D
    C1  C2  C3  C4  C5
r1 160 320 480 640 800
r2 160 320 480 640 800
r3 160 320 480 640 800

However, quite deceptively

> sapply(v, myF, y = 2*(1:5))
        A   B   C   D
 [1,] 100 120 140 160
 [2,] 100 120 140 160
 [3,] 100 120 140 160
 [4,] 200 240 280 320
 [5,] 200 240 280 320
 [6,] 200 240 280 320
 [7,] 300 360 420 480
 [8,] 300 360 420 480
 [9,] 300 360 420 480
[10,] 400 480 560 640
[11,] 400 480 560 640
[12,] 400 480 560 640
[13,] 500 600 700 800
[14,] 500 600 700 800
[15,] 500 600 700 800
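
A minimal sketch of where the 15 rows come from, using a smaller stand-in for myF (the function f and the two-element v below are illustrative assumptions, not part of the proposal): each 3 x 5 matrix is unrolled column by column into one column of the result.

```r
v <- c(A = 50, B = 60)
f <- function(x) outer(rep(x, 3), 2 * (1:5))   # one 3 x 5 matrix per element

res_list <- lapply(v, f)      # list of two 3 x 5 matrices
flat     <- sapply(v, f)      # default simplification
dim(flat)                     # 15 2

# Rebuild the same matrix by hand: unroll each matrix in column-major
# order and bind the resulting vectors as columns.
by_hand <- do.call(cbind, lapply(res_list, as.vector))
all.equal(flat, by_hand)

# Nothing is lost, only the shape: restoring the dim recovers the matrix.
all.equal(matrix(flat[, "A"], nrow = 3), res_list$A)
```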


My proposal -- implemented and "make check" tested --
is to add an optional argument  'ARRAY'
which allows

> sapply(v, myF, y = 2*(1:5), ARRAY=TRUE)
, , A

    C1  C2  C3  C4  C5
r1 100 200 300 400 500
r2 100 200 300 400 500
r3 100 200 300 400 500

, , B

    C1  C2  C3  C4  C5
r1 120 240 360 480 600
r2 120 240 360 480 600
r3 120 240 360 480 600

, , C

    C1  C2  C3  C4  C5
r1 140 280 420 560 700
r2 140 280 420 560 700
r3 140 280 420 560 700

, , D

    C1  C2  C3  C4  C5
r1 160 320 480 640 800
r2 160 320 480 640 800
r3 160 320 480 640 800

>
-----------

In the best of all worlds, the default would be 'ARRAY = TRUE',
but of course, given the long-standing different behavior,
it seems much too "risky", and my proposal includes remaining
back-compatible with default ARRAY = FALSE.

Martin Maechler,
ETH Zurich

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: RFC: sapply() limitation from vector to matrix, but not further

Marc Schwartz-3
On Dec 1, 2010, at 2:39 AM, Martin Maechler wrote:

> In the best of all worlds, the default would be 'ARRAY = TRUE',
> but of course, given the long-standing different behavior,
> it seems much too "risky", and my proposal includes remaining
> back-compatible with default ARRAY = FALSE.
>
> Martin Maechler,
> ETH Zurich


Seems to me to be a reasonable proposal Martin, obviously with the proviso that the current default behavior is unaltered, as you note.

Regards,

Marc


Re: RFC: sapply() limitation from vector to matrix, but not further

Hadley Wickham-2
In reply to this post by Martin Maechler
I think an even better approach would be to extract the
"simplification" component out of sapply, so that one could write

sapply <- function(...) simplify(lapply(...))

(although obviously some arguments would go to lapply and some to simplify).

The advantage of this would be that you could use the same
simplification algorithm in other places.
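
A rough sketch of that factoring, under stated assumptions: simplify_list(), sapply_sketch() and mapply_sketch() are hypothetical names, and the simplifier only handles the vector and matrix cases.

```r
# simplify_list() is a hypothetical helper, not a base R function.
simplify_list <- function(x) {
  lens <- unique(lengths(x))
  if (length(lens) != 1L) return(x)     # ragged results: keep the list
  if (lens == 1L) return(unlist(x))     # scalars: plain vector
  # equal-length results: one column per list element
  matrix(unlist(x, use.names = FALSE), nrow = lens,
         dimnames = list(names(x[[1L]]), names(x)))
}

# Two front ends sharing the same simplification step
sapply_sketch <- function(X, FUN, ...) simplify_list(lapply(X, FUN, ...))
mapply_sketch <- function(FUN, ...)    simplify_list(Map(FUN, ...))

g <- function(i) c(lo = i, hi = i^2)
s <- sapply_sketch(1:3, g)
m <- mapply_sketch(function(a, b) c(sum = a + b, prod = a * b), 1:3, 4:6)
```

Here s matches what sapply(1:3, g) returns, and m has one column per pair of inputs.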

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


Re: RFC: sapply() limitation from vector to matrix, but not further

William Dunlap
> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Hadley Wickham
> Sent: Wednesday, December 01, 2010 6:27 AM
> To: Martin Maechler
> Cc: [hidden email]
> Subject: Re: [Rd] RFC: sapply() limitation from vector to
> matrix,but not further
>
> I think an even better approach would be to extract the
> "simplification" component out of sapply, so that one could write
>
> sapply <- function(...) simplify(lapply(...))
>
> (although obviously some arguments would go to lapply and
> some to simplify).
>
> The advantage of this would be that you could use the same
> simplification algorithm in other places.

A downside of that approach is that lapply(X,...) can
cause a lot of unneeded memory to be allocated (length(X)
SEXP's).  Those SEXP's would be tossed out by simplify() but
the peak memory usage would remain high.  sapply() can
be written to avoid the intermediate list structure.

vapply() can avoid the intermediate list structure because
it knows what the output of FUN will look like and can
put the results directly into the desired output structure.
Perhaps its processing of the FUN.VALUE argument could be
beefed up so that matrices would be stacked as you want.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  


Re: RFC: sapply() limitation from vector to matrix, but not further

Hadley Wickham-2
> A downside of that approach is that lapply(X,...) can
> cause a lot of unneeded memory to be allocated (length(X)
> SEXP's).  Those SEXP's would be tossed out by simplify() but
> the peak memory usage would remain high.  sapply() can
> be written to avoid the intermediate list structure.

But the upside is reusable code that can be used in multiple places -
what about the simplification code used by mapply and tapply? Why are
there three different implementations of simplification?

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


Re: RFC: sapply() limitation from vector to matrix, but not further

Martin Maechler
Finally finding time to come back to this.
Remember that I've started the thread by proposing a version of sapply()
which does not just "stop" with making a matrix() from the lapply() result, but
instead --- only when the new argument ARRAY = TRUE is set ---
may return an array() of any (appropriate) rank, in those cases where
the lapply() result elements are all arrays of the same dim().

On Wed, Dec 1, 2010 at 19:51, Hadley Wickham <[hidden email]> wrote:

>> A downside of that approach is that lapply(X,...) can
>> cause a lot of unneeded memory to be allocated (length(X)
>> SEXP's).  Those SEXP's would be tossed out by simplify() but
>> the peak memory usage would remain high.  sapply() can
>> be written to avoid the intermediate list structure.
>
> But the upside is reusable code that can be used in multiple places -
> what about the simplification code used by mapply and tapply? Why are
> there three different implementations of simplification?
>
> Hadley

I have now looked into using a version of what Hadley had proposed.
Note (to Bill's point) that the current implementation of sapply()
does go via lapply() and
that we have  vapply()  as a faster version of sapply()  with less
copying (hopefully).

Very unfortunately, vapply() .. which was only created 13 months ago,
has inherited the ``illogical''  behavior of  sapply()
in that it does not make up higher rank arrays if the single element
is already a matrix (say).
...
Consequently, we also need a patch to vapply(),
and I do wonder if we should not make "ARRAY=TRUE" the default there,
since with vapply() you specify a result value, and if you specify a
matrix, the total result should stack these matrices into an array of
rank 3, etc.
Looking at it, the patch is not so much work... notably if we don't
use a new argument but really let  FUN.VALUE determine what the result
should look like.
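
For illustration, assuming an R version in which vapply() stacks array-valued results as described here (older versions return an 8 x 3 matrix instead), a matrix FUN.VALUE gives a rank-3 result:

```r
f <- function(v) matrix(v, nrow = 2, ncol = 4)

# FUN.VALUE is the result prototype: a 2 x 4 numeric matrix
a <- vapply(c(1, 2, 3), f, FUN.VALUE = matrix(0, 2, 4))

dim(a)      # 2 4 3 when FUN.VALUE's dim is honoured
a[, , 2]    # the matrix returned for the second element
```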

More comments are still welcome...
Martin


Re: RFC: sapply() limitation from vector to matrix, but not further

Gabor Grothendieck
In reply to this post by Martin Maechler
On Wed, Dec 1, 2010 at 3:39 AM, Martin Maechler
<[hidden email]> wrote:
> My proposal -- implemented and "make check" tested --
> is to add an optional argument  'ARRAY'
> which allows
>
>> sapply(v, myF, y = 2*(1:5), ARRAY=TRUE)

It would reduce the proliferation of arguments if the simplify=
argument were extended to allow this, e.g. simplify = "array" or
perhaps simplify = n would allow a maximum of n dimensions.
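
A short sketch of the first variant, assuming an R version whose sapply() accepts simplify = "array" (the numeric simplify = n form is not shown, since it is only a suggestion here):

```r
x <- c(a = 1, b = 2, c = 3)
f <- function(v) matrix(v, nrow = 2, ncol = 4)

flat    <- sapply(x, f)                       # default: matrices are unrolled
stacked <- sapply(x, f, simplify = "array")   # matrices are stacked

dim(flat)                  # 8 3
dim(stacked)               # 2 4 3
dimnames(stacked)[[3]]     # names of x label the new dimension
```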

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: RFC: sapply() limitation from vector to matrix, but not further

Martin Maechler
>>>>> Gabor Grothendieck <[hidden email]>
>>>>>     on Mon, 27 Dec 2010 17:06:25 -0500 writes:

    > On Wed, Dec 1, 2010 at 3:39 AM, Martin Maechler
    > <[hidden email]> wrote:
    >> My proposal -- implemented and "make check" tested -- is
    >> to add an optional argument  'ARRAY' which allows
    >>
    >>> sapply(v, myF, y = 2*(1:5), ARRAY=TRUE)

    > It would reduce the proliferation of arguments if the
    > simplify= argument were extended to allow this,
    > e.g. simplify = "array" or perhaps simplify = n would
    > allow a maximum of n dimensions.

That's a good idea, though it makes the
implementation/documentation very slightly more complicated.

I'm interested to get more feedback on my other questions,
notably the one about *changing*  vapply() (on the C-level) to
behave "logically" in the sense of adding one  dim(.)ension in
those cases where the FUN.VALUE (result prototype) has a dim().


Martin


Re: RFC: sapply() limitation from vector to matrix, but not further

Tony Plate-3
The abind() function from the abind package is an alternative here -- it can take a list argument, which makes it easy to use with the result of lapply().  It's also able to take direction about which dimension to join on.

 > x <- list(a=1,b=2,c=3)
 > f <- function(v) matrix(v, nrow=2, ncol=4)
 > sapply(x, f)
      a b c
[1,] 1 2 3
[2,] 1 2 3
[3,] 1 2 3
[4,] 1 2 3
[5,] 1 2 3
[6,] 1 2 3
[7,] 1 2 3
[8,] 1 2 3
 >
 > # The 'along=' argument to abind() determines on which dimension
 > # the list elements are joined.  Use a fractional value to put the new
 > # dimension between existing ones.
 >
 > dim(abind(lapply(x, f), along=0))
[1] 3 2 4
 > dim(abind(lapply(x, f), along=1.5))
[1] 2 3 4
 > dim(abind(lapply(x, f), along=3))
[1] 2 4 3
 > abind(lapply(x, f), along=3)
, , a

      [,1] [,2] [,3] [,4]
[1,]    1    1    1    1
[2,]    1    1    1    1

, , b

      [,1] [,2] [,3] [,4]
[1,]    2    2    2    2
[2,]    2    2    2    2

, , c

      [,1] [,2] [,3] [,4]
[1,]    3    3    3    3
[2,]    3    3    3    3

 >
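
For readers without the abind package, the same three shapes can be sketched in base R by stacking along the last dimension and then permuting with aperm(). This only reproduces the dimensions shown above; it is not how abind() works internally.

```r
x <- list(a = 1, b = 2, c = 3)
f <- function(v) matrix(v, nrow = 2, ncol = 4)

# Stack along a new last dimension (like along=3)
last <- array(unlist(lapply(x, f)), dim = c(2, 4, length(x)))

first  <- aperm(last, c(3, 1, 2))   # new dimension first   (like along=0)
middle <- aperm(last, c(1, 3, 2))   # new dimension between (like along=1.5)

dim(last)     # 2 4 3
dim(first)    # 3 2 4
dim(middle)   # 2 3 4
```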

On 12/28/2010 8:49 AM, Martin Maechler wrote:

>>>>>> Gabor Grothendieck<[hidden email]>
>>>>>>      on Mon, 27 Dec 2010 17:06:25 -0500 writes:
>      >  On Wed, Dec 1, 2010 at 3:39 AM, Martin Maechler
>      >  <[hidden email]>  wrote:
>      >>  My proposal -- implemented and "make check" tested -- is
>      >>  to add an optional argument  'ARRAY' which allows
>      >>
>      >>>  sapply(v, myF, y = 2*(1:5), ARRAY=TRUE)
>
>      >  It would reduce the proliferation of arguments if the
>      >  simplify= argument were extended to allow this,
>      >  e.g. simplify = "array" or perhaps simplify = n would
>      >  allow a maximum of n dimensions.
>
> That's a good idea, though it makes the
> implementation/documentation very slightly more complicated.
>
> I'm interested to get more feedback on my other questions,
> notably the one about *changing*  vapply() (on the C-level) to
> behave "logically" in the sense of adding one  dim(.)ension in
> those cases where the FUN.VALUE (result prototype) has a dim().
>
>
> Martin
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


Re: RFC: sapply() limitation from vector to matrix, but not further

Martin Maechler
On Tue, Dec 28, 2010 at 19:14, Tony Plate <[hidden email]> wrote:

> The abind() function from the abind package is an alternative here -- it can
> take a list argument, which makes it easy to use with the result of
> lapply().  It's also able to take direction about which dimension to join on.
>
>> x <- list(a=1,b=2,c=3)
>> f <- function(v) matrix(v, nrow=2, ncol=4)
>> sapply(x, f)
>     a b c
> [1,] 1 2 3
> [2,] 1 2 3
> [3,] 1 2 3
> [4,] 1 2 3
> [5,] 1 2 3
> [6,] 1 2 3
> [7,] 1 2 3
> [8,] 1 2 3
>>
>> # The 'along=' argument to abind() determines on which dimension
>> # the list elements are joined.  Use a fractional value to put the new
>> # dimension between existing ones.
>>
>> dim(abind(lapply(x, f), along=0))
> [1] 3 2 4
>> dim(abind(lapply(x, f), along=1.5))
> [1] 2 3 4
>> dim(abind(lapply(x, f), along=3))
> [1] 2 4 3
>> abind(lapply(x, f), along=3)
> , , a
>
>     [,1] [,2] [,3] [,4]
> [1,]    1    1    1    1
> [2,]    1    1    1    1
>
> , , b
>
>     [,1] [,2] [,3] [,4]
> [1,]    2    2    2    2
> [2,]    2    2    2    2
>
> , , c
>
>     [,1] [,2] [,3] [,4]
> [1,]    3    3    3    3
> [2,]    3    3    3    3
>

Thank you, Tony.
Indeed, yes,  abind() is nice here (and in the good ol' APL spirit !)

I want to keep things both simple *and* fast here, of course;
hence I currently contemplate the following code,
where the new  simplify2array()  is  considerably simpler than  abind():

##' "Simplify" a list of commonly structured components into an array.
##'
##' @title simplify list() to an array if the list elements are structurally equal
##' @param x a list, typically resulting from lapply()
##' @param higher logical indicating if an array() of "higher rank"
##'  should be returned when appropriate, namely when all elements of
##' \code{x} have the same \code{\link{dim}()}ension.
##' @return x itself, or an array if the simplification "is sensible"
simplify2array <- function(x, higher = TRUE)
{
    if(length(common.len <- unique(unlist(lapply(x, length)))) > 1L)
        return(x)
    if(common.len == 1L)
        unlist(x, recursive = FALSE)
    else if(common.len > 1L) {
        n <- length(x)
        ## make sure that array(*) will not call rep() {e.g. for 'call's}:
        r <- as.vector(unlist(x, recursive = FALSE))
        if(higher && length(c.dim <- unique(lapply(x, dim))) == 1 &&
           is.numeric(c.dim <- c.dim[[1L]]) &&
           prod(d <- c(c.dim, n)) == length(r)) {

            iN1 <- is.null(n1 <- dimnames(x[[1L]]))
            n2 <- names(x)
            dnam <-
                if(!(iN1 && is.null(n2)))
                    c(if(iN1) rep.int(list(n1), length(c.dim)) else n1,
                      list(n2)) ## else NULL
            array(r, dim = d, dimnames = dnam)

        } else if(prod(d <- c(common.len, n)) == length(r))
            array(r, dim = d,
                  dimnames= if(!(is.null(n1 <- names(x[[1L]])) &
                  is.null(n2 <- names(x)))) list(n1,n2))
        else x
    }
    else x
}

sapply <- function(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
{
    FUN <- match.fun(FUN)
    answer <- lapply(X, FUN, ...)
    if(USE.NAMES && is.character(X) && is.null(names(answer)))
        names(answer) <- X
    if(!identical(simplify, FALSE) && length(answer))
        simplify2array(answer, higher = (simplify == "array"))
    else answer
}
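
A brief usage sketch of the three outcomes described in the @return line. It assumes a simplify2array() with the signature above is available, either by sourcing the code above or from an R version that ships it in base.

```r
ragged  <- list(1:2, 1:3)                       # different lengths
scalars <- list(a = 1, b = 2)                   # all of length one
mats    <- list(a = diag(2), b = 2 * diag(2))   # two 2 x 2 matrices

is.list(simplify2array(ragged))              # returned unchanged, still a list
simplify2array(scalars)                      # plain named vector
dim(simplify2array(mats, higher = FALSE))    # 4 2
dim(simplify2array(mats, higher = TRUE))     # 2 2 2
```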


Re: RFC: sapply() limitation from vector to matrix, but not further

Martin Maechler
>>>>> Martin Maechler <[hidden email]>
>>>>>     on Tue, 28 Dec 2010 20:06:07 +0100 writes:

    > On Tue, Dec 28, 2010 at 19:14, Tony Plate <[hidden email]>
    > wrote:
    >> The abind() function from the abind package is an
    >> alternative here -- it can take a list argument, which
    >> makes it easy to use with the result of lapply().  It's
    >> also able to take direction about which dimension to join
    >> on.
    >>
    >>> x <- list(a=1,b=2,c=3)
    >>> f <- function(v) matrix(v, nrow=2, ncol=4)
    >>> sapply(x, f)
    >>      a b c
    >> [1,] 1 2 3
    >> [2,] 1 2 3
    >> [3,] 1 2 3
    >> [4,] 1 2 3
    >> [5,] 1 2 3
    >> [6,] 1 2 3
    >> [7,] 1 2 3
    >> [8,] 1 2 3
    >>>
    >>> # The 'along=' argument to abind() determines on which dimension
    >>> # the list elements are joined.  Use a fractional value to put the new
    >>> # dimension between existing ones.
    >>>
    >>> dim(abind(lapply(x, f), along=0))
    >> [1] 3 2 4
    >>> dim(abind(lapply(x, f), along=1.5))
    >> [1] 2 3 4
    >>> dim(abind(lapply(x, f), along=3))
    >> [1] 2 4 3
    >>> abind(lapply(x, f), along=3)
    >> , , a
    >>
    >>      [,1] [,2] [,3] [,4]
    >> [1,]    1    1    1    1
    >> [2,]    1    1    1    1
    >>
    >> , , b
    >>
    >>      [,1] [,2] [,3] [,4]
    >> [1,]    2    2    2    2
    >> [2,]    2    2    2    2
    >>
    >> , , c
    >>
    >>      [,1] [,2] [,3] [,4]
    >> [1,]    3    3    3    3
    >> [2,]    3    3    3    3
    >>

    > Thank you, Tony.
    > Indeed, yes, abind() is nice here (and in the good ol' APL
    > spirit !)

    > Wanting to keep things both simple *and* fast here, of
    > course, hence I currently contemplate the following code,
    > where the new simplify2array() is considerably simpler
    > than abind():

>     ##' "Simplify" a list of commonly structured components into an array.
>     ##'
>     ##' @title simplify list() to an array if the list elements are structurally equal
>     ##' @param x a list, typically resulting from lapply()
>     ##' @param higher logical indicating if an array() of "higher rank"
>     ##'  should be returned when appropriate, namely when all elements of
>     ##' \code{x} have the same \code{\link{dim}()}ension.
>     ##' @return x itself, or an array if the simplification "is sensible"
>     simplify2array <- function(x, higher = TRUE)
>     {
>         if(length(common.len <- unique(unlist(lapply(x, length)))) > 1L)
>             return(x)
>         if(common.len == 1L)
>             unlist(x, recursive = FALSE)
>         else if(common.len > 1L) {
>             n <- length(x)
>             ## make sure that array(*) will not call rep() {e.g. for 'call's}:
>             r <- as.vector(unlist(x, recursive = FALSE))
>             if(higher && length(c.dim <- unique(lapply(x, dim))) == 1 &&
>                is.numeric(c.dim <- c.dim[[1L]]) &&
>                prod(d <- c(c.dim, n)) == length(r)) {
>
>                 iN1 <- is.null(n1 <- dimnames(x[[1L]]))
>                 n2 <- names(x)
>                 dnam <-
>                     if(!(iN1 && is.null(n2)))
>                         c(if(iN1) rep.int(list(n1), length(c.dim)) else n1,
>                           list(n2)) ## else NULL
>                 array(r, dim = d, dimnames = dnam)
>
>             } else if(prod(d <- c(common.len, n)) == length(r))
>                 array(r, dim = d,
>                       dimnames= if(!(is.null(n1 <- names(x[[1L]])) &
>                       is.null(n2 <- names(x)))) list(n1,n2))
>             else x
>         }
>         else x
>     }
>
>     sapply <- function(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
>     {
>         FUN <- match.fun(FUN)
>         answer <- lapply(X, FUN, ...)
>         if(USE.NAMES && is.character(X) && is.null(names(answer)))
>             names(answer) <- X
>         if(!identical(simplify, FALSE) && length(answer))
>             simplify2array(answer, higher = (simplify == "array"))
>         else answer
>     }

As some may have noted, the above has been committed to R-devel
   (r53886 | maechler | 2010-12-29 10:36:01 +0100)

with the extra
------------------------------------------------------
NOTE: vapply() and replicate() doing that *by default*
------------------------------------------------------
which means that I've dared to let vapply() and replicate()
behave logically (in the above sense) by default, i.e.
*not* back compatibly.

If you want to remain bug-compatible (:-),
for replicate() you can explicitly ask for  'simplify=TRUE'
instead of the new default simplify="array".
For vapply(), the extra work of implementing such a back/bug
compatibility option did not seem worthwhile, in particular as
vapply() is very new and not used in many places (on CRAN)
anyway.
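
A small sketch of the two replicate() behaviours, assuming an R version with the new default simplify = "array" (the constant matrix is only a stand-in for a real simulation step):

```r
# New default: matrix results are stacked into a rank-3 array
r_new <- replicate(3, matrix(1, 2, 2))

# Explicit simplify = TRUE: each matrix is unrolled into one column
r_old <- replicate(3, matrix(1, 2, 2), simplify = TRUE)

dim(r_new)   # 2 2 3
dim(r_old)   # 4 3
```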

The new replicate() default behavior has led to two CRAN
packages ('emoa', 'plsRglm', whose authors I'll address privately)
failing 'R CMD check'; inspection, however, shows that in both cases,

1) the check failure is from examples / test functions

2) the usage there being

        t(replicate(N, foobar()))

where foobar() returns a 1D array instead of a vector, so one
way to "fix" the problem would be to change the above to

        t(replicate(N, t(foobar())))

So, in summary, the changed behavior of replicate() seems indeed
more logical insofar as it revealed programming/usage glitches
in other parts of R code.

BTW: The above makes me consider --- once again --- extending the
     definition of  t(a) to arrays a of array-rank {:= length(dim(a))} >= 3,
     and there generalizing t(.) to be the same as
     aperm(., rev(seq_along(dim(.))))

     {.. in the APL tradition where t() and aperm() really were the same}
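
A minimal sketch of that generalized transpose; t_general() is a hypothetical name used only for illustration, since t() itself is not changed here.

```r
# Reverse the order of all dimensions; for a matrix this is the usual t()
t_general <- function(a) aperm(a, rev(seq_along(dim(a))))

m <- matrix(1:6, nrow = 2)
all.equal(t_general(m), t(m))

a  <- array(1:24, dim = c(2, 3, 4))
ta <- t_general(a)
dim(ta)                     # 4 3 2
ta[3, 2, 1] == a[1, 2, 3]   # indices are reversed along with the dims
```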

Martin
