iterated lapply


iterated lapply

Daniel Kaschek
Hi everybody,

with the following code I generate a list of functions. Each function
reflects a "condition". When I evaluate this list of functions by
another lapply/sapply, I get an unexpected result: all values coincide.
However, when I uncomment the print(), it works as expected. Is this a
bug or a feature?

conditions <- 1:4
test <- lapply(conditions, function(mycondition){
  #print(mycondition)
  myfn <- function(i) mycondition*i
  return(myfn)
})

sapply(test, function(myfn) myfn(2))
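# Instead of the expected 2 4 6 8, all four results coincide here
# (8 8 8 8 on the R versions discussed in this thread).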



Cheers,
Daniel

--
Daniel Kaschek
Institute of Physics
Freiburg University

Room 210
Phone: +49 761 2038531


Re: iterated lapply

Jeroen Ooms
On Mon, Feb 23, 2015 at 12:57 PM, Daniel Kaschek
<[hidden email]> wrote:
> Is this a bug or a feature?

I think it is a bug. If we use substitute to inspect the promise, it
appears the index number is always equal to its last value:

vec <- c("foo", "bar", "baz")
test <- lapply(vec, function(x){
  function(){x}
})
substitute(x, environment(test[[1]]))
substitute(x, environment(test[[2]]))
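# On the R versions discussed in this thread, both calls print the same
# expression, with the index already at its final value, e.g. something
# like: c("foo", "bar", "baz")[[3L]]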


Re: iterated lapply

Duncan Temple Lang
In reply to this post by Daniel Kaschek
Use force() (or anything that evaluates mycondition, e.g. your print):

function(mycondition) {
  force(mycondition)
  function(i) mycondition * i
}

within the lapply() loop.

Not a bug, but does surprise people. It is lazy evaluation.
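For example, a minimal sketch of the fix applied to the original example
(expected output shown as a comment):

conditions <- 1:4
test <- lapply(conditions, function(mycondition){
  force(mycondition)   # evaluate the promise now, once per iteration
  function(i) mycondition*i
})
sapply(test, function(myfn) myfn(2))
# [1] 2 4 6 8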

  D.

On 2/23/15 12:57 PM, Daniel Kaschek wrote:

> Hi everybody,
>
> with the following code I generate a list of functions. Each function
> reflects a "condition". When I evaluate this list of functions by
> another lapply/sapply, I get an unexpected result: all values coincide.
> However, when I uncomment the print(), it works as expected. Is this a
> bug or a feature?
>
> conditions <- 1:4
> test <- lapply(conditions, function(mycondition){
>   #print(mycondition)
>   myfn <- function(i) mycondition*i
>   return(myfn)
> })
>
> sapply(test, function(myfn) myfn(2))
>
>
>
> Cheers,
> Daniel
>

--
Director, Data Sciences Initiative, UC Davis
Professor, Dept. of Statistics, UC Davis

http://datascience.ucdavis.edu
http://www.stat.ucdavis.edu/~duncan


Re: iterated lapply

Duncan Murdoch
In reply to this post by Daniel Kaschek
On 23/02/2015 3:57 PM, Daniel Kaschek wrote:
> Hi everybody,
>
> with the following code I generate a list of functions. Each function
> reflects a "condition". When I evaluate this list of functions by
> another lapply/sapply, I get an unexpected result: all values coincide.
> However, when I uncomment the print(), it works as expected. Is this a
> bug or a feature?
>

Arguments aren't evaluated until they are used.  The force() function
can be used to force evaluation when you want it.

This is a feature:  it allows you to have arguments that are never
evaluated, because they are never used, or defaults that depend on
things that are calculated within the function.
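For instance, a small sketch (the function name scale01 is only
illustrative) of a default that depends on another argument and is
evaluated only if the caller does not supply it:

scale01 <- function(x, from = min(x), to = max(x)) (x - from)/(to - from)
scale01(c(2, 4, 6))
# [1] 0.0 0.5 1.0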

Duncan Murdoch


> conditions <- 1:4
> test <- lapply(conditions, function(mycondition){
>   #print(mycondition)
>   myfn <- function(i) mycondition*i
>   return(myfn)
> })
>
> sapply(test, function(myfn) myfn(2))
>
>
>
> Cheers,
> Daniel
>


Re: iterated lapply

Eduardo Arino de la Rubia
In reply to this post by Jeroen Ooms
Greetings!

I thought it was a lazy evaluation thing. I added "force" around
mycondition and everything worked as expected.

Cheers!

On Mon Feb 23 2015 at 1:47:20 PM Jeroen Ooms <[hidden email]> wrote:

> On Mon, Feb 23, 2015 at 12:57 PM, Daniel Kaschek
> <[hidden email]> wrote:
> > Is this a bug or a feature?
>
> I think it is a bug. If we use substitute to inspect the promise, it
> appears the index number is always equal to its last value:
>
> vec <- c("foo", "bar", "baz")
> test <- lapply(vec, function(x){
>   function(){x}
> })
> substitute(x, environment(test[[1]]))
> substitute(x, environment(test[[2]]))
>

Re: iterated lapply

Daniel Kaschek
In reply to this post by Duncan Murdoch-2
On Mo, 2015-02-23 at 16:54 -0500, Duncan Murdoch wrote:
> This is a feature:  it allows you to have arguments that are never
> evaluated, because they are never used, or defaults that depend on
> things that are calculated within the function.

I hadn't thought about the point regarding default arguments. That
really is a feature.

Thanks,
Daniel

>
> Duncan Murdoch
>
>
> > conditions <- 1:4
> > test <- lapply(conditions, function(mycondition){
> >   #print(mycondition)
> >   myfn <- function(i) mycondition*i
> >   return(myfn)
> > })
> >
> > sapply(test, function(myfn) myfn(2))
> >
> >
> >
> > Cheers,
> > Daniel
> >
>

--
Daniel Kaschek
Institute of Physics
Freiburg University

Room 210
Phone: +49 761 2038531


Re: iterated lapply

Radford Neal
In reply to this post by Daniel Kaschek
From: Daniel Kaschek <[hidden email]>

> ... When I evaluate this list of functions by
> another lapply/sapply, I get an unexpected result: all values coincide.
> However, when I uncomment the print(), it works as expected. Is this a
> bug or a feature?
>
> conditions <- 1:4
> test <- lapply(conditions, function(mycondition){
>   #print(mycondition)
>   myfn <- function(i) mycondition*i
>   return(myfn)
> })
>
> sapply(test, function(myfn) myfn(2))

From: Jeroen Ooms <[hidden email]>
> I think it is a bug. If we use substitute to inspect the promise, it
> appears the index number is always equal to its last value:

From: Duncan Temple Lang <[hidden email]>
> Not a bug, but does surprise people. It is lazy evaluation.


I think it is indeed a bug.  The lapply code saves a bit of time by
reusing the same storage for the scalar index number every iteration.
This amounts to modifying the R code that was used for the previous
function call.  There's no justification for doing this in the
documentation for lapply.  It is certainly not desired behaviour,
except in so far as it allows a slight savings in time (which is
minor, given the time that the function call itself will take).

   Radford Neal


Re: iterated lapply

Tierney, Luke
The documentation is not specific enough on the intended semantics in
this situation to consider this a bug. The original R-level
implementation of lapply was

     lapply <- function(X, FUN, ...) {
         FUN <- match.fun(FUN)
         if (!is.list(X))
             X <- as.list(X)
         rval <- vector("list", length(X))
         for (i in seq(along = X))
             rval[i] <- list(FUN(X[[i]], ...))
         names(rval) <- names(X)           # keep `names' !
         return(rval)
     }

and the current internal implementation is consistent with this. With
a loop like this lazy evaluation and binding assignment interact in
this way; the force() function was introduced to help with this.

That said, the expression FUN(X[[i]], ...) could be replaced by

     local({
         i <- i
         list(FUN(X[[i]], ...))
     })

which would produce the more desirable result

     > sapply(test, function(myfn) myfn(2))
     [1] 2 4 6 8

The C implementation could use this approach, or could rebuild the
expression being evaluated at each call to get almost the same semantics.
Both would add a little overhead. Some code optimization might reduce
the overhead in some instances (e.g. if FUN is a BUILTIN), but it's
not clear that would be worthwhile.

Variants of this issue arise in a couple of places so it may be worth
looking into.

Best,

luke

On Tue, 24 Feb 2015, Radford Neal wrote:

> From: Daniel Kaschek <[hidden email]>
>> ... When I evaluate this list of functions by
>> another lapply/sapply, I get an unexpected result: all values coincide.
>> However, when I uncomment the print(), it works as expected. Is this a
>> bug or a feature?
>>
>> conditions <- 1:4
>> test <- lapply(conditions, function(mycondition){
>>   #print(mycondition)
>>   myfn <- function(i) mycondition*i
>>   return(myfn)
>> })
>>
>> sapply(test, function(myfn) myfn(2))
>
> From: Jeroen Ooms <[hidden email]>
>> I think it is a bug. If we use substitute to inspect the promise, it
>> appears the index number is always equal to its last value:
>
> From: Duncan Temple Lang <[hidden email]>
>> Not a bug, but does surprise people. It is lazy evaluation.
>
>
> I think it is indeed a bug.  The lapply code saves a bit of time by
> reusing the same storage for the scalar index number every iteration.
> This amounts to modifying the R code that was used for the previous
> function call.  There's no justification for doing this in the
> documentation for lapply.  It is certainly not desired behaviour,
> except in so far as it allows a slight savings in time (which is
> minor, given the time that the function call itself will take).
>
>   Radford Neal
>

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   [hidden email]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


Re: iterated lapply

Michael Weylandt

> On Feb 24, 2015, at 10:50 AM, <[hidden email]> wrote:
>
> [...]
>
> That said, the expression FUN(X[[i]], ...) could be replaced by
>
>    local({
>        i <- i
>        list(FUN(X[[i]], ...))
>    })
>
> which would produce the more desirable result
>
>    > sapply(test, function(myfn) myfn(2))
>    [1] 2 4 6 8
>

Would the same semantics be applied to parallel::mclapply and friends?

sapply(lapply(1:4, function(c){function(i){c*i}}), function(f) f(2))

# [1] 8 8 8 8

sapply(mclapply(1:4, function(c){function(i){c*i}}), function(f) f(2))

# [1] 6 8 6 8

I understand why they differ, but making mclapply easier for 'drop-in' parallelism might be a good thing.

Michael

Re: iterated lapply

Benjamin Tyner
Actually, it depends on the number of cores:

    > fun1 <- function(c){function(i){c*i}}
    > fun2 <- function(f) f(2)
    > sapply(mclapply(1:4, fun1, mc.cores=1L), fun2)
    [1] 8 8 8 8
    > sapply(mclapply(1:4, fun1, mc.cores=2L), fun2)
    [1] 6 8 6 8
    > sapply(mclapply(1:4, fun1, mc.cores=4L), fun2)
    [1] 2 4 6 8

> Michael Weylandt wrote:
>
> [...]
>
> Would the same semantics be applied to parallel::mclapply and friends?
>
> sapply(lapply(1:4, function(c){function(i){c*i}}), function(f) f(2))
>
> # [1] 8 8 8 8
>
> sapply(mclapply(1:4, function(c){function(i){c*i}}), function(f) f(2))
>
> # [1] 6 8 6 8
>
> I understand why they differ, but making mclapply easier for 'drop-in' parallelism might be a good thing.
>
> Michael


Re: iterated lapply

Michael Weylandt


> On Feb 25, 2015, at 5:35 PM, Benjamin Tyner <[hidden email]> wrote:
>
> Actually, it depends on the number of cores:

Under current semantics, yes. Each 'stream' of function calls is lazily capturing the last value of `i` on that core.

Under Luke's proposed semantics (IIUC),
the result would be the same (2,4,6,8) for both parallel and serial execution. This is what allows for 'drop-in' parallelism.


>> fun1 <- function(c){function(i){c*i}}
>> fun2 <- function(f) f(2)
>> sapply(mclapply(1:4, fun1, mc.cores=1L), fun2)
>    [1] 8 8 8 8
>> sapply(mclapply(1:4, fun1, mc.cores=2L), fun2)
>    [1] 6 8 6 8
>> sapply(mclapply(1:4, fun1, mc.cores=4L), fun2)
>    [1] 2 4 6 8


Re: iterated lapply

Martin Maechler
>>>>> Michael Weylandt <[hidden email]>
>>>>>     on Wed, 25 Feb 2015 21:43:36 -0500 writes:

    >> On Feb 25, 2015, at 5:35 PM, Benjamin Tyner
    >> <[hidden email]> wrote:
    >>
    >> Actually, it depends on the number of cores:

    > Under current semantics, yes. Each 'stream' of function
    > calls is lazily capturing the last value of `i` on that
    > core.

    > Under Luke's proposed semantics (IIUC), the result would
    > be the same (2,4,6,8) for both parallel and serial
    > execution. This is what allows for 'drop-in' parallelism.

    >>> fun1 <- function(c){function(i){c*i}}
    >>> fun2 <- function(f) f(2)
    >>> sapply(mclapply(1:4, fun1, mc.cores=1L), fun2)
    >> [1] 8 8 8 8
    >>> sapply(mclapply(1:4, fun1, mc.cores=2L), fun2)
    >> [1] 6 8 6 8
    >>> sapply(mclapply(1:4, fun1, mc.cores=4L), fun2)
    >> [1] 2 4 6 8
    >>

Thank you, Michael and Benjamin.

I strongly agree with your statements and the very strong desirability of
these mclapply() calls to behave the same as lapply().

So indeed, something like Luke's proposed changes both for
lapply(), mclapply()  --- *and* the other *apply() versions in
the parallel packages where needed (??) --- are very desirable.

In my teaching, and in our CRAN package 'simsalapar', we recommend
that useRs organize computations such that lapply() is used serially
for preliminary testing and mclapply() etc. are used for the
heavyweight computations.

Best,
Martin Maechler



Re: iterated lapply

William Dunlap
In reply to this post by Tierney, Luke
Would introducing the new frame, with the call to local(), cause problems
when you use frame counting instead of <<- to modify variables outside the
scope of lapply's FUN?  I think the frame counts may have to change.  E.g.,
here is code from actuar::simul() that might be affected:

        x <- unlist(lapply(nodes[[i]], seq))
        lapply(nodes[(i + 1):(nlevels - 1)],
               function(v) assign("x", rep.int(x, v),
                                  envir = parent.frame(2)))
        m[, i] <- x

(I think the parent.frame(2) might have to be changed to parent.frame(8)
for that to work.  Such code looks pretty ugly to me but seems to be rare.)

It also seems to cause problems with some built-in functions:
newlapply <- function (X, FUN, ...)
{
    FUN <- match.fun(FUN)
    if (!is.list(X))
        X <- as.list(X)
    rval <- vector("list", length(X))
    for (i in seq(along = X)) {
        rval[i] <- list(local({
            i <- i
            FUN(X[[i]], ...)
        }))
    }
    names(rval) <- names(X)
    return(rval)
}
newlapply(1:2,log)
#Error in FUN(X[[i]], ...) : non-numeric argument to mathematical function
newlapply(1:2,function(x)log(x))
#[[1]]
#[1] 0
#
#[[2]]
#[1] 0.6931472



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Feb 24, 2015 at 7:50 AM, <[hidden email]> wrote:

> [...]
>
> That said, the expression FUN(X[[i]], ...) could be replaced by
>
>     local({
>         i <- i
>         list(FUN(X[[i]], ...))
>     })
>
> which would produce the more desirable result
>
>     > sapply(test, function(myfn) myfn(2))
>     [1] 2 4 6 8

Re: iterated lapply

Tierney, Luke
Actually using local() might create some issues, though probably not
many. For the C implementation of lapply I would probably create a new
environment with a frame containing the binding for i and use that in
an eval call.  That wouldn't add another call frame, but it would
change the environment which could still bite something. I would want
to run any change like this over at least CRAN, maybe also BIOC, tests
to see if there are any issues before committing.
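A rough R-level sketch of that idea (the names here, e.g. envlapply, are
only illustrative; the real change would be in the C implementation):

envlapply <- function(X, FUN) {
    FUN <- match.fun(FUN)
    rval <- vector("list", length(X))
    for (i in seq_along(X)) {
        e <- new.env(parent = environment())
        e$i <- i    # a fresh binding for i, private to this iteration
        rval[i] <- list(eval(quote(FUN(X[[i]])), envir = e))
    }
    rval
}
sapply(envlapply(1:4, function(k) function(j) k*j), function(f) f(2))
# [1] 2 4 6 8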

There are a few other places where the internal C code does calls to R
functions in a less than ideal way. apply() is also currently written
as a loop along the lines of the original lapply I showed. The
parallel constructs from snow all use lapply or apply, so any changes
there would be inherited; the mc functions are a bit more complicated
and may need a more careful look.

Overall it looks like we could use a new utility at both R and C level
for calling a function with already evaluated arguments and use this
in all relevant places (maybe called funcall or .Funcall or something
like that). I'll try to look into this in the next few weeks.

Best,

luke

On Thu, 26 Feb 2015, William Dunlap wrote:

> Would introducing the new frame, with the call to local(), cause problems
> when you use frame counting instead of <<- to modify variables outside the
> scope of lapply's FUN?  I think the frame counts may have to change.
>
> [...]
>
> It also seems to cause problems with some built-in functions:
> [...]
> newlapply(1:2,log)
> #Error in FUN(X[[i]], ...) : non-numeric argument to mathematical function

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   [hidden email]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

Re: iterated lapply

Radford Neal
In reply to this post by Daniel Kaschek
I think the discussion of this issue has gotten more complicated than
necessary.

First, there really is a bug.  You can see this also by the fact that
delayed warning messages are wrong.  For instance, in R-3.1.2:

  > lapply(c(-1,2,-1),sqrt)
  [[1]]
  [1] NaN
 
  [[2]]
  [1] 1.414214
 
  [[3]]
  [1] NaN
 
  Warning messages:
  1: In FUN(c(-1, 2, -1)[[3L]], ...) : NaNs produced
  2: In FUN(c(-1, 2, -1)[[3L]], ...) : NaNs produced
 
The first warning message should have "1L" rather than "3L".  It
doesn't because lapply made a destructive change to the R expression
that was evaluated for the first element.  Throughout the R
interpreter, there is a general assumption that expressions that are
or were evaluated are immutable, which lapply is not abiding by.  The
only question is whether the bugs from this are sufficiently obscure
that it's worth keeping them for the gain in speed, but the speed cost
of fixing it is fairly small (though it's not negligible when the
function applied is something simple like sqrt).

The fix in the C code for lapply, vapply, and eapply is easy: Rather
than create an R expression such as FUN(X[[1L]]) for the first
function call, and then modify it in place to FUN(X[[2L]]), and so
forth, just create a new expression for each iteration.  This requires
allocating a few new CONS cells each iteration, which does have a
cost, but not a huge one.  It's certainly easier and faster than
creating a new environment (and also less likely to cause
incompatibilities).

The R code for apply can be changed to use the same approach:
rather than using expressions such as FUN(X[i,]), where i is an
index variable, it can create expressions like FUN(X[1L,]), then
FUN(X[2L,]), etc.  The method for this is simple, like so:

  > a <- quote(FUN(X[i,]))     # "i" could be anything
  > b <- a; b[[c(2,3)]] <- 1L  # change "i" to 1L (creates new expr)
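  > b                          # the rebuilt call, with the index filled in
  FUN(X[1L, ])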

This has the added advantage of making error messages refer to the
actual index, not to "i", which has no meaning if you haven't looked
at the source code for apply (and which doesn't tell you which element
led to the error even if you know what "i" does).

I've implemented this in the development version of pqR, on the
development branch 31-apply-fix, at

  https://github.com/radfordneal/pqR/tree/31-apply-fix

The changes are in src/main/apply.c, src/main/envir.c, and
src/library/base/R/apply.R, plus a new test in tests/apply.R.  You can
compare to branch 31 to see what's changed.  (Note rapply seems to not
have had a problem, and that other apply functions just use these, so
should be fixed as well.)  There are also other optimizations in pqR
for these functions but the code is still quite similar to R-3.1.2.

   Radford Neal


Re: iterated lapply

Tierney, Luke
On Sun, 1 Mar 2015, Radford Neal wrote:

> I think the discussion of this issue has gotten more complicated than
> necessary.

The discussion has gotten no more complicated than it needs to
be. There are other instances, such as Reduce where there is a bug
report pending that amounts to the same issue.  Performing surgery on
expressions and calling eval is not good practice at the R level and
probably not a good idea at the C level either.  It is worth thinking
this through carefully before a adopting a solution, which is what we
will be doing.

Best,

luke


--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   [hidden email]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


Re: iterated lapply

Radford Neal
> There are other instances, such as Reduce where there is a bug
> report pending that amounts to the same issue.  Performing surgery on
> expressions and calling eval is not good practice at the R level and
> probably not a good idea at the C level either.  It is worth thinking
> this through carefully before adopting a solution, which is what we
> will be doing.

Surgery on expressions is what lapply does at the moment.  My change
makes it no longer do that.  

There is a general problem that lazy evaluation can have the effect
of making the internal details of how an R function like "apply" is
implemented leak into its semantics.  That's what's going on with
the Reduce bug (16093) too.  

I think one can avoid this by defining the following function for
calling a function with evaluation of arguments forced (ie, lazy
evaluation disabled):

  call_forced <- function (f, ...) { list (...); f (...) }

(Of course, for speed one could make this a primitive function, which
wouldn't actually build a list.)

Then the critical code in Reduce could be changed from

  for (i in rev(ind)) init <- f(x[[i]], init)

to

  for (i in rev(ind)) init <- call_forced (f, x[[i]], init)

If one had a primitive (ie, fast) call_forced, a similar technique
might be better than the one I presented for fixing "apply" (cleaner,
and perhaps slightly faster).  I don't see how it helps for functions
like lapply that are written in C, however (where help isn't needed,
since there's nothing wrong with the mod in my previous message).
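As a quick check of the idea on the original example from this thread (a
sketch only, using the R-level call_forced defined above):

call_forced <- function (f, ...) { list (...); f (...) }
test <- lapply(1:4, function(mycondition)
    call_forced(function(m) function(i) m*i, mycondition))
sapply(test, function(myfn) myfn(2))
# [1] 2 4 6 8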

   Radford Neal


Re: iterated lapply

Tierney, Luke
On Sun, 1 Mar 2015, Radford Neal wrote:

>> There are other instances, such as Reduce where there is a bug
>> report pending that amounts to the same issue.  Performing surgery on
>> expressions and calling eval is not good practice at the R level and
>> probably not a good idea at the C level either.  It is worth thinking
>> this through carefully before adopting a solution, which is what we
>> will be doing.
>
> Surgery on expressions is what lapply does at the moment.  My change
> makes it no longer do that.
>
> There is a general problem that lazy evaluation can have the effect
> of making the internal details of how an R function like "apply" is
> implemented leak into its semantics.  That's what's going on with
> the Reduce bug (16093) too.
>
> I think one can avoid this by defining the following function for
> calling a function with evaluation of arguments forced (ie, lazy
> evaluation disabled):
>
>  call_forced <- function (f, ...) { list (...); f (...) }
>
> (Of course, for speed one could make this a primitive function, which
> wouldn't actually build a list.)
>
> Then the critical code in Reduce could be changed from
>
>  for (i in rev(ind)) init <- f(x[[i]], init)
>
> to
>
>  for (i in rev(ind)) init <- call_forced (f, x[[i]], init)

This is the option I was suggesting as a possibility in my reply to
Bill Dunlap -- I called it funcall. This may be the right way to
go. There are some subtleties to sort out, such as how missing
arguments should be handled (allowed or error), and whether the force
should stop at ... arguments as in turning

     FUN(X[[i]], ...)

into

      funcall(FUN, X[[i]], ...)

There is also a change in when the evaluation of X[[i]] happens, which
may or may not matter.  Some testing against CRAN/BIOC packages should
reveal how much of an issue these are.

> If one had a primitive (ie, fast) call_forced, a similar technique
> might be better than the one I presented for fixing "apply" (cleaner,
> and perhaps slightly faster).  I don't see how it helps for functions
> like lapply that are written in C, however (where help isn't needed,
> since there's nothing wrong with the mod in my previous message).

If we adopt the funcall approach, then it would be best if the C
implementation stayed as close as possible to an R reference
implementation. I do not think your proposed approach would do that.

mapply has its own issues with the MoreArgs argument that would be
nice to sort out at the same time if possible, as there are also a few
more instances of this in several places.

I will try to look into this more in the next week or so.

Best,

luke

>
>   Radford Neal
>

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   [hidden email]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


Re: iterated lapply

Tierney, Luke
In reply to this post by William Dunlap
On Thu, 26 Feb 2015, William Dunlap wrote:

> ...
> It also seems to cause problems with some built-in functions:
> newlapply <- function (X, FUN, ...) 
> {
>     FUN <- match.fun(FUN)
>     if (!is.list(X)) 
>         X <- as.list(X)
>     rval <- vector("list", length(X))
>     for (i in seq(along = X)) {
>         rval[i] <- list(local({
>             i <- i
>             FUN(X[[i]], ...)
>         }))
>     }
>     names(rval) <- names(X)
>     return(rval)
> }
> newlapply(1:2,log)
> #Error in FUN(X[[i]], ...) : non-numeric argument to mathematical function

This seems to be a bug in log() -- this takes local() out of the issue:

> f <- function(x, ...) {
+     g <- function()
+         log(x, ...)
+     g()
+ }
> f(1)
Error in log(x, ...) : non-numeric argument to mathematical function

It's not following the ... properly for some reason.

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   [hidden email]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

Re: iterated lapply

Tierney, Luke
On Sun, 1 Mar 2015, [hidden email] wrote:

> On Thu, 26 Feb 2015, William Dunlap wrote:
>
>> ...
>> It also seems to cause problems with some built-in functions:
>> [...]
>> newlapply(1:2,log)
>> #Error in FUN(X[[i]], ...) : non-numeric argument to mathematical function
>
> This seems to be a bug in log() -- this takes local() out of the issue:
>
>> f <- function(x, ...) {
> +     g <- function()
> +         log(x, ...)
> +     g()
> + }
>> f(1)
> Error in log(x, ...) : non-numeric argument to mathematical function
>
> It's not following the ... properly for some reason.

But no longer a problem in R-devel -- maybe there is a change worth
back-porting to R-patched.

Best,

luke

> luke
>
>

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   [hidden email]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu