apply with zero-row matrix

David Hugh-Jones-3
Forgive me if this has been asked many times before, but I couldn't find
anything on the mailing lists.

I'd expect apply(m, 1, foo) not to call `foo` if m is a matrix with zero
rows.
In fact:

m <- matrix(NA, 0, 5)
apply(m, 1, function (x) {cat("Called...\n"); print(x)})
## Called...
## [1] FALSE FALSE FALSE FALSE FALSE

Similarly for apply(m, 2,...) if m has no columns.
Is there a reason for this? Could it be documented?

David

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: apply with zero-row matrix

Martin Maechler
>>>>> David Hugh-Jones
>>>>>     on Mon, 30 Jul 2018 05:33:19 +0100 writes:

    > Forgive me if this has been asked many times before, but I
    > couldn't find anything on the mailing lists.

    > I'd expect apply(m, 1, foo) not to call `foo` if m is a
    > matrix with zero rows.  In fact:

    > m <- matrix(NA, 0, 5)
    > apply(m, 1, function (x) {cat("Called...\n"); print(x)})
    > ## Called...
    > ## [1] FALSE FALSE FALSE FALSE FALSE


    > Similarly for apply(m, 2,...) if m has no columns.  Is
    > there a reason for this?

Yes :

The reverse is really true for almost all basic R functions:

    They *are* called and give an "empty" result automatically
    when the main argument is empty.

What you basically propose is to add an extra

     if(<length 0 input>)
      return(<correspondingly formatted length-0 output>)

to all R functions.  While that makes sense for high-level R
functions that do a lot of things, this would really be a bad
idea in general :

This would make all of these basic functions larger {more to maintain} and
slightly slower for all non-zero cases just to make them
slightly faster for the rare zero-length case.

Martin Maechler
ETH Zurich and R core Team
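
To illustrate the point about basic functions, here is a minimal sketch (the variable names are only illustrative) of a few base R functions whose ordinary code path already yields an empty or neutral result for zero-length input, with no special-case branch:

```r
# Zero-length inputs flow through ordinary base R code paths.
x <- numeric(0)

total   <- sum(x)                  # the empty sum is 0
roots   <- sqrt(x)                 # numeric(0)
running <- cumsum(x)               # numeric(0)
upper   <- toupper(character(0))   # character(0)

print(total)
print(length(roots))
print(running)
print(upper)
```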

Re: apply with zero-row matrix

David Hugh-Jones-3
Hi Martin,

Fair enough for R functions in general. But the behaviour of apply violates
the expectation that apply(m, 1, fun) calls fun n times when m has n rows.
That seems pretty basic.

Also, I understand from your argument why it makes sense to call apply and
return a special result (presumably NULL) for an empty argument; but why
should apply call fun?

Cheers
David

Re: apply with zero-row matrix

Martin Maechler
>>>>> David Hugh-Jones
>>>>>     on Mon, 30 Jul 2018 10:12:24 +0100 writes:

    > Hi Martin, Fair enough for R functions in general. But the
    > behaviour of apply violates the expectation that apply(m,
    > 1, fun) calls fun n times when m has n rows.  That seems
    > pretty basic.

Well, that expectation is obviously wrong ;-)  see below

    > Also, I understand from your argument why it makes sense
    > to call apply and return a special result (presumably
    > NULL) for an empty argument; but why should apply call fun?

    > Cheers David

The reason is seen e.g. in

    > apply(matrix(,0,3), 2, quantile)
         [,1] [,2] [,3]
    0%     NA   NA   NA
    25%    NA   NA   NA
    50%    NA   NA   NA
    75%    NA   NA   NA
    100%   NA   NA   NA
    >

and that is documented (+/-) in the first paragraph of the
'Value:' section of help(apply) :

 > Value:
 >
 >      If each call to ‘FUN’ returns a vector of length ‘n’, then ‘apply’
 >      returns an array of dimension ‘c(n, dim(X)[MARGIN])’ if ‘n > 1’.
 >      If ‘n’ equals ‘1’, ‘apply’ returns a vector if ‘MARGIN’ has length
 >      1 and an array of dimension ‘dim(X)[MARGIN]’ otherwise.  If ‘n’ is
 >      ‘0’, the result has length 0 but not necessarily the ‘correct’
 >      dimension.


To determine 'n', the function *is* called once even when
length(X) ==  0

It may indeed be helpful to add this explicitly to the
help page  ( <R>/src/library/base/man/apply.Rd ).
Can you propose a wording (in *.Rd if possible) ?

With regards,
Martin
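
The behaviour described above can be checked by counting calls. This is only a sketch: `count_calls` is a hypothetical helper, not part of base R. It applies a function over the rows of matrices with 0, 1 and 3 rows and records how often the function runs.

```r
# Count how many times apply() invokes FUN for a matrix with `nrows` rows.
count_calls <- function(nrows) {
  calls <- 0L
  m <- matrix(0, nrow = nrows, ncol = 5)
  apply(m, 1, function(x) {
    calls <<- calls + 1L   # updates `calls` in count_calls()'s frame
    sum(x)
  })
  calls
}

n_calls <- vapply(c(0L, 1L, 3L), count_calls, integer(1))
print(n_calls)   # 1 1 3: one call even when there are no rows
```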

Re: apply with zero-row matrix

Emil
In reply to this post by David Hugh-Jones-3
    Hi David,
   
    Besides Martin's point, there is also the issue that in a lot of cases you would still like to have the right class returned.
    Right now these are the return values:
   
    > apply(matrix(NA_integer_,0,5), 1, class)
    character(0)
    > apply(matrix(NA_integer_,0,5), 1, identity)
    integer(0)
    > apply(matrix(NA,0,5), 1, identity)
    logical(0)
   
    In your case, these would all return NULL, so I think there is value in running FUN at least once (say, if you want to check that FUN always returns the right class).
    And from a philosophical point of view, R is mostly a functional programming language; if you want side effects, I think a for-loop would look better.
   
   
    Best regards,
    Emil Bode
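
A minimal sketch of the for-loop alternative for side effects, assuming the zero-row matrix from the original post; `rows_seen` is only an illustrative counter. Because `seq_len(0)` is empty, the loop body never runs.

```r
m <- matrix(NA, 0, 5)

rows_seen <- 0L
for (i in seq_len(nrow(m))) {
  # Side effects go here; this body is skipped entirely for zero rows.
  cat("Row", i, ":", m[i, ], "\n")
  rows_seen <- rows_seen + 1L
}
print(rows_seen)   # 0
```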
     
   
Re: apply with zero-row matrix

Gabor Grothendieck
In reply to this post by David Hugh-Jones-3
Try pmap and related functions in purrr:

  library(purrr)
  pmap(as.data.frame(m), ~ { cat("Called...\n"); print(c(...)) })
  ## list()
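
For comparison, a base R sketch with the same property (this is an assumption about what one might want, not something proposed in the thread): iterating over row indices with `lapply()` never calls the function when there are no rows. The counter `calls` is only there to make that visible.

```r
m <- matrix(NA, 0, 5)

calls <- 0L
res <- lapply(seq_len(nrow(m)), function(i) {
  calls <<- calls + 1L   # would count invocations, if there were any
  m[i, ]
})

print(res)     # list()
print(calls)   # 0
```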

Re: apply with zero-row matrix

David Hugh-Jones-3
Interesting discussion. I'm not wholly convinced by Martin's and Emil's
arguments. The behaviour seems to violate an obvious expectation (fun is
called once per row) to satisfy a subtle one (result has a guaranteed
dimension and type).

In any case, here's a suggested chunk of Rd to go at the end of the "Value" section:

If \code{dim(X)[MARGIN]} is zero, then \code{FUN} is called once, with an
argument of the appropriate dimensions. The argument's type is the same as
\code{typeof(X)}, and the argument values are those returned by
\code{vector(typeof(X))}. For example, if \code{X} is numeric, the argument will
be a vector (or matrix or array) of zeroes. The type and length of the
value returned by \code{FUN} are used to determine the type of the result.

And at the end of "Details":

\code{FUN} is always called at least once, see below.


David
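
The dummy argument described in the proposed wording can be inspected directly. A minimal sketch, where `seen_arg` is a hypothetical helper that records what `FUN` receives when the matrix has zero rows:

```r
# Capture the argument that apply() passes to FUN for a zero-row matrix.
seen_arg <- function(X) {
  seen <- NULL
  apply(X, 1, function(x) {
    seen <<- x   # stored in seen_arg()'s frame
    x
  })
  seen
}

num_arg <- seen_arg(matrix(numeric(0), 0, 5))     # five zeros
chr_arg <- seen_arg(matrix(character(0), 0, 2))   # two empty strings
lgl_arg <- seen_arg(matrix(NA, 0, 5))             # five FALSE values

print(num_arg)
print(chr_arg)
print(lgl_arg)
```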


Re: apply with zero-row matrix

Deepayan Sarkar
In reply to this post by Martin Maechler
On Mon, Jul 30, 2018 at 6:08 PM, Martin Maechler
<[hidden email]> wrote:

>>>>>> David Hugh-Jones
>>>>>>     on Mon, 30 Jul 2018 10:12:24 +0100 writes:
>
>     > Hi Martin, Fair enough for R functions in general. But the
>     > behaviour of apply violates the expectation that apply(m,
>     > 1, fun) calls fun n times when m has n rows.  That seems
>     > pretty basic.
>
> Well, that expectation is obviously wrong ;-)  see below
>
>     > Also, I understand from your argument why it makes sense
>     > to call apply and return a special result (presumably
>     > NULL) for an empty argument; but why should apply call fun?
>
>     > Cheers David
>
> The reason is seen e.g. in
>
>     > apply(matrix(,0,3), 2, quantile)
>          [,1] [,2] [,3]
>     0%     NA   NA   NA
>     25%    NA   NA   NA
>     50%    NA   NA   NA
>     75%    NA   NA   NA
>     100%   NA   NA   NA
>     >

I don't think this example is relevant to what David is saying:
matrix(,0,3) has three columns, so he would expect quantile() to be
called 3 times, as it is.

I think his question is why quantile() is called at all when the input
has 0 rows, as in

apply(matrix(,0,3), 1, quantile)
# named numeric(0)

> and that is documented (+/-) in the first paragraph of the
> 'Value:' section of help(apply) :
>
>  > Value:
>  >
>  >      If each call to ‘FUN’ returns a vector of length ‘n’, then ‘apply’
>  >      returns an array of dimension ‘c(n, dim(X)[MARGIN])’ if ‘n > 1’.
>  >      If ‘n’ equals ‘1’, ‘apply’ returns a vector if ‘MARGIN’ has length
>  >      1 and an array of dimension ‘dim(X)[MARGIN]’ otherwise.  If ‘n’ is
>  >      ‘0’, the result has length 0 but not necessarily the ‘correct’
>  >      dimension.
>
>
> To determine 'n', the function *is* called once even when
> length(X) ==  0

This part of the docs also doesn't seem applicable, and in fact seems
incorrect: here we should have (according to the docs)

n = length(quantile(logical(0))) # 5

but the result does not have dim == c(5, 0) as the docs suggest:

dim(apply(matrix(,0,3), 1, quantile))
# NULL

So the length of the result of calling FUN() seems to be ignored in
this case, and as Emil points out, is only used to determine the mode
of the result.

I can't immediately think of an example where returning NULL instead
would make a difference, but there may well be some.

-Deepayan

Re: apply with zero-row matrix

Bill Dunlap
In reply to this post by Martin Maechler
vapply has a mandatory FUN.VALUE argument which specifies the type and size
of FUN's return value.  This helps when you want to cover the 0-length case
without 'if' statements.  You can change your apply calls to vapply calls,
but they will be a bit more complicated.  E.g.,  change
   apply(X=myMatrix, MARGIN=2, FUN=quantile)
to
   vapply(seq_len(ncol(myMatrix)),
          FUN=function(i) quantile(myMatrix[,i]),
          FUN.VALUE=numeric(5))

The latter will always return a 5-row by ncol(myMatrix) matrix.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
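
A small sketch of the vapply approach on empty inputs; `col_quantiles` is a hypothetical wrapper name. With zero rows, `quantile()` still runs once per column and returns NAs. With zero columns it is not called at all, yet the result keeps its 5 rows because of `FUN.VALUE`.

```r
# Column-wise quantiles with a result shape fixed by FUN.VALUE.
col_quantiles <- function(m) {
  vapply(seq_len(ncol(m)),
         FUN = function(i) quantile(m[, i]),
         FUN.VALUE = numeric(5))
}

zero_rows <- col_quantiles(matrix(numeric(0), nrow = 0, ncol = 3))
zero_cols <- col_quantiles(matrix(numeric(0), nrow = 3, ncol = 0))

print(dim(zero_rows))   # 5 3
print(dim(zero_cols))   # 5 0
```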
