ifelse() woes ... can we agree on a ifelse2() ?

classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|

ifelse() woes ... can we agree on a ifelse2() ?

Martin Maechler
Dear R-devel readers,
( = people interested in the improvement and development of R).

This is not the first time that this topic is raised.
and I am in now state to promise that anything will result from
this thread ...

Still, I think the majority among us has agreed that

1) you should never use ifelse(test, yes, no)
   if you know that length(test) == 1, in which case
       if(test) yes else no
   is much preferable  (though not equivalent: ifelse(NA, 1, 0) !)  

2) it is potentially inefficient by design since it (almost
   always) evaluates both 'yes' and 'no' independent of 'test'.

3) is a nice syntax in principle, and so is often used, also by
   myself, inspite of '2)'  just because nicely self-explaining
   code is sometimes clearly preferable to more efficient but
   less readable code.

4) it is too late to change ifelse() fundamentally, because it
   works according to its documentation
   (and I think very much the same as in S and S-PLUS) and has
   done so for ages.

---- and if you don't agree with  1) -- 4)  you may pretend for
     a moment instead of starting to discuss them thoroughly.

Recently, a useR has alerted me to the fact that my Rmpfr's
package arbitrary (high) precision numbers don't work for a
relatively simple function.

As I found the reason was that that simple function used
 ifelse(.,.,.)
and the problem was that the (*simplified*) gist of ifelse(test, yes, no)
is

  test <- as.logical(test)
  ans <- test
  ans[ test] <- yes
  ans[!test] <- no

and in case of Rmpfr, the problem is that

   <logical>[<logical>]  <-  <mpfr>

cannot work correctly

    [[ maybe it could in a future R, if I could define a method

       setReplaceMethod("[", c("logical,"logical","mpfr"),
                        function(x,i,value) .........)

       but that currently fails as the C-low-level dispatch for '[<-'
       does not look at the full signature
     ]]

I vaguely remember having seen proposals for
light weight substitutes for ifelse(),  called
 ifelse1() or
 ifelse2() etc...

and I wonder if we should not try to see if there was a version
that could go into "base R" (maybe the 'utils' package, not
                  'base'; that's not so important).

One difference to ifelse() would be that the type/mode/class of the result
is not initialized by logical, by default but rather by the
"common type" of  yes and no ... maybe determined  by  c()'ing
parts of those.
The idea was that this would work for most S3 and S4 objects for
which logical 'length', (logical) indexing '[', and 'rep()' works.

One possibility would also be to consider  a "numbers-only" or
rather "same type"-only {e.g., would also work for characters}
version.

Of course, an ifelse2()  should also be more efficient than
ifelse() in typical "atomic" cases.


Thank you for your ideas and suggestions.
Again, there's no promise of implementation coming along with this e-mail.

Martin Maechler
ETH Zurich

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: ifelse() woes ... can we agree on a ifelse2() ?

Bob Rudis
have you tried seeing if `dplyr::if_else` behaves more to your liking?

On Sat, Aug 6, 2016 at 10:20 AM Martin Maechler <[hidden email]>
wrote:

> Dear R-devel readers,
> ( = people interested in the improvement and development of R).
>
> This is not the first time that this topic is raised.
> and I am in now state to promise that anything will result from
> this thread ...
>
> Still, I think the majority among us has agreed that
>
> 1) you should never use ifelse(test, yes, no)
>    if you know that length(test) == 1, in which case
>           if(test) yes else no
>    is much preferable  (though not equivalent: ifelse(NA, 1, 0) !)
>
> 2) it is potentially inefficient by design since it (almost
>    always) evaluates both 'yes' and 'no' independent of 'test'.
>
> 3) is a nice syntax in principle, and so is often used, also by
>    myself, inspite of '2)'  just because nicely self-explaining
>    code is sometimes clearly preferable to more efficient but
>    less readable code.
>
> 4) it is too late to change ifelse() fundamentally, because it
>    works according to its documentation
>    (and I think very much the same as in S and S-PLUS) and has
>    done so for ages.
>
> ---- and if you don't agree with  1) -- 4)  you may pretend for
>      a moment instead of starting to discuss them thoroughly.
>
> Recently, a useR has alerted me to the fact that my Rmpfr's
> package arbitrary (high) precision numbers don't work for a
> relatively simple function.
>
> As I found the reason was that that simple function used
>  ifelse(.,.,.)
> and the problem was that the (*simplified*) gist of ifelse(test, yes, no)
> is
>
>   test <- as.logical(test)
>   ans <- test
>   ans[ test] <- yes
>   ans[!test] <- no
>
> and in case of Rmpfr, the problem is that
>
>    <logical>[<logical>]  <-  <mpfr>
>
> cannot work correctly
>
>     [[ maybe it could in a future R, if I could define a method
>
>        setReplaceMethod("[", c("logical,"logical","mpfr"),
>                         function(x,i,value) .........)
>
>        but that currently fails as the C-low-level dispatch for '[<-'
>        does not look at the full signature
>      ]]
>
> I vaguely remember having seen proposals for
> light weight substitutes for ifelse(),  called
>  ifelse1() or
>  ifelse2() etc...
>
> and I wonder if we should not try to see if there was a version
> that could go into "base R" (maybe the 'utils' package, not
>                              'base'; that's not so important).
>
> One difference to ifelse() would be that the type/mode/class of the result
> is not initialized by logical, by default but rather by the
> "common type" of  yes and no ... maybe determined  by  c()'ing
> parts of those.
> The idea was that this would work for most S3 and S4 objects for
> which logical 'length', (logical) indexing '[', and 'rep()' works.
>
> One possibility would also be to consider  a "numbers-only" or
> rather "same type"-only {e.g., would also work for characters}
> version.
>
> Of course, an ifelse2()  should also be more efficient than
> ifelse() in typical "atomic" cases.
>
>
> Thank you for your ideas and suggestions.
> Again, there's no promise of implementation coming along with this e-mail.
>
> Martin Maechler
> ETH Zurich
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: ifelse() woes ... can we agree on a ifelse2() ?

Duncan Murdoch-2
In reply to this post by Martin Maechler
On 06/08/2016 10:18 AM, Martin Maechler wrote:

> Dear R-devel readers,
> ( = people interested in the improvement and development of R).
>
> This is not the first time that this topic is raised.
> and I am in now state to promise that anything will result from
> this thread ...
>
> Still, I think the majority among us has agreed that
>
> 1) you should never use ifelse(test, yes, no)
>    if you know that length(test) == 1, in which case
>        if(test) yes else no
>    is much preferable  (though not equivalent: ifelse(NA, 1, 0) !)
>
> 2) it is potentially inefficient by design since it (almost
>    always) evaluates both 'yes' and 'no' independent of 'test'.
>
> 3) is a nice syntax in principle, and so is often used, also by
>    myself, inspite of '2)'  just because nicely self-explaining
>    code is sometimes clearly preferable to more efficient but
>    less readable code.
>
> 4) it is too late to change ifelse() fundamentally, because it
>    works according to its documentation
>    (and I think very much the same as in S and S-PLUS) and has
>    done so for ages.
>
> ---- and if you don't agree with  1) -- 4)  you may pretend for
>      a moment instead of starting to discuss them thoroughly.
>
> Recently, a useR has alerted me to the fact that my Rmpfr's
> package arbitrary (high) precision numbers don't work for a
> relatively simple function.
>
> As I found the reason was that that simple function used
>  ifelse(.,.,.)
> and the problem was that the (*simplified*) gist of ifelse(test, yes, no)
> is
>
>   test <- as.logical(test)
>   ans <- test
>   ans[ test] <- yes
>   ans[!test] <- no
>
> and in case of Rmpfr, the problem is that
>
>    <logical>[<logical>]  <-  <mpfr>
>
> cannot work correctly
>
>     [[ maybe it could in a future R, if I could define a method
>
>        setReplaceMethod("[", c("logical,"logical","mpfr"),
>                         function(x,i,value) .........)
>
>        but that currently fails as the C-low-level dispatch for '[<-'
>        does not look at the full signature
>      ]]
>
> I vaguely remember having seen proposals for
> light weight substitutes for ifelse(),  called
>  ifelse1() or
>  ifelse2() etc...
>
> and I wonder if we should not try to see if there was a version
> that could go into "base R" (maybe the 'utils' package, not
>                   'base'; that's not so important).
>
> One difference to ifelse() would be that the type/mode/class of the result
> is not initialized by logical, by default but rather by the
> "common type" of  yes and no ... maybe determined  by  c()'ing
> parts of those.
> The idea was that this would work for most S3 and S4 objects for
> which logical 'length', (logical) indexing '[', and 'rep()' works.

I think your description is more or less:

    test <- as.logical(test)
    ans <- c(yes, no)[seq_along(test)]
    ans <- ans[seq_along(test)]
    ans[ test] <- yes[test]
    ans[!test] <- no[!test]

(though the implementation details would vary, and recycling rules would
apply if the lengths of test, yes and no weren't all equal).

You didn't mention what happens with attributes.  Currently we keep the
attributes from test, which probably doesn't make a lot of sense. In
particular,

ifelse(c(TRUE, FALSE), factor(2:3), factor(3:4))

returns nonsense, as does my translation of your idea above.

That implementation also drops attributes. I'd say this definition would
make more sense:

    test <- as.logical(test)
    ans <- yes
    ans[!test] <- no[!test]

(and this is suggested as an alternative in ?ifelse).  It generates an
error in my test example, which seems reasonable.  It gives the "right"
thing in

ifelse(c(TRUE, FALSE), factor(2:3), factor(3:2))

because the factors have the same levels.

The lack of symmetry between yes and no is slightly irksome, but I would
think in most cases you could choose attributes from just one of yes and
no to be what you want in the result (and use !test to swap the order if
necessary).

>
> One possibility would also be to consider  a "numbers-only" or
> rather "same type"-only {e.g., would also work for characters}
> version.

I don't know what you mean by these.
>
> Of course, an ifelse2()  should also be more efficient than
> ifelse() in typical "atomic" cases.

I don't think it is obvious how to make it more efficient.  ifelse()
already skips evaluation of yes or no if not needed.  (An argument could
be made that it would be better to guarantee evaluation of both, but
it's usually easy enough to do this explicitly, so I don't see a need.)

Duncan Murdoch

>
>
> Thank you for your ideas and suggestions.
> Again, there's no promise of implementation coming along with this e-mail.
>
> Martin Maechler
> ETH Zurich
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: ifelse() woes ... can we agree on a ifelse2() ?

Uwe Ligges-3


On 06.08.2016 17:30, Duncan Murdoch wrote:

> On 06/08/2016 10:18 AM, Martin Maechler wrote:
>> Dear R-devel readers,
>> ( = people interested in the improvement and development of R).
>>
>> This is not the first time that this topic is raised.
>> and I am in now state to promise that anything will result from
>> this thread ...
>>
>> Still, I think the majority among us has agreed that
>>
>> 1) you should never use ifelse(test, yes, no)
>>    if you know that length(test) == 1, in which case
>>             if(test) yes else no
>>    is much preferable  (though not equivalent: ifelse(NA, 1, 0) !)
>>
>> 2) it is potentially inefficient by design since it (almost
>>    always) evaluates both 'yes' and 'no' independent of 'test'.
>>
>> 3) is a nice syntax in principle, and so is often used, also by
>>    myself, inspite of '2)'  just because nicely self-explaining
>>    code is sometimes clearly preferable to more efficient but
>>    less readable code.
>>
>> 4) it is too late to change ifelse() fundamentally, because it
>>    works according to its documentation
>>    (and I think very much the same as in S and S-PLUS) and has
>>    done so for ages.
>>
>> ---- and if you don't agree with  1) -- 4)  you may pretend for
>>      a moment instead of starting to discuss them thoroughly.
>>
>> Recently, a useR has alerted me to the fact that my Rmpfr's
>> package arbitrary (high) precision numbers don't work for a
>> relatively simple function.
>>
>> As I found the reason was that that simple function used
>>  ifelse(.,.,.)
>> and the problem was that the (*simplified*) gist of ifelse(test, yes, no)
>> is
>>
>>   test <- as.logical(test)
>>   ans <- test
>>   ans[ test] <- yes
>>   ans[!test] <- no
>>
>> and in case of Rmpfr, the problem is that
>>
>>    <logical>[<logical>]  <-  <mpfr>
>>
>> cannot work correctly
>>
>>     [[ maybe it could in a future R, if I could define a method
>>
>>        setReplaceMethod("[", c("logical,"logical","mpfr"),
>>                         function(x,i,value) .........)
>>
>>        but that currently fails as the C-low-level dispatch for '[<-'
>>        does not look at the full signature
>>      ]]
>>
>> I vaguely remember having seen proposals for
>> light weight substitutes for ifelse(),  called
>>  ifelse1() or
>>  ifelse2() etc...
>>
>> and I wonder if we should not try to see if there was a version
>> that could go into "base R" (maybe the 'utils' package, not
>>                                'base'; that's not so important).
>>
>> One difference to ifelse() would be that the type/mode/class of the
>> result
>> is not initialized by logical, by default but rather by the
>> "common type" of  yes and no ... maybe determined  by  c()'ing
>> parts of those.
>> The idea was that this would work for most S3 and S4 objects for
>> which logical 'length', (logical) indexing '[', and 'rep()' works.
>
> I think your description is more or less:
>
>    test <- as.logical(test)
>    ans <- c(yes, no)[seq_along(test)]
>    ans <- ans[seq_along(test)]
>    ans[ test] <- yes[test]
>    ans[!test] <- no[!test]
>
> (though the implementation details would vary, and recycling rules would
> apply if the lengths of test, yes and no weren't all equal).
>
> You didn't mention what happens with attributes.  Currently we keep the
> attributes from test, which probably doesn't make a lot of sense. In
> particular,
>
> ifelse(c(TRUE, FALSE), factor(2:3), factor(3:4))
>
> returns nonsense, as does my translation of your idea above.
>
> That implementation also drops attributes. I'd say this definition would
> make more sense:
>
>    test <- as.logical(test)
>    ans <- yes
>    ans[!test] <- no[!test]
>
> (and this is suggested as an alternative in ?ifelse).  It generates an
> error in my test example, which seems reasonable.  It gives the "right"
> thing in
>
> ifelse(c(TRUE, FALSE), factor(2:3), factor(3:2))
>
> because the factors have the same levels.
>
> The lack of symmetry between yes and no is slightly irksome, but I would
> think in most cases you could choose attributes from just one of yes and
> no to be what you want in the result (and use !test to swap the order if
> necessary).
>
>>
>> One possibility would also be to consider  a "numbers-only" or
>> rather "same type"-only {e.g., would also work for characters}
>> version.
>
> I don't know what you mean by these.
>>
>> Of course, an ifelse2()  should also be more efficient than
>> ifelse() in typical "atomic" cases.
>
> I don't think it is obvious how to make it more efficient.  ifelse()
> already skips evaluation of yes or no if not needed.  (An argument could
> be made that it would be better to guarantee evaluation of both, but
> it's usually easy enough to do this explicitly, so I don't see a need.)

Same from here: I do not see how this can easily be made more efficient,
since evaluating ony parts causes a lot of copies of objects whichs
slows stuff down, hence you need some complexity in yes and no to make
evaluations of parts of them more efficient on R level.


Anyway, to solve the problem, we may want an add argument to ifelse2()
that allows for specification of the type of the result (as vapply does)?

Best,
Uwe

> Duncan Murdoch
>
>>
>>
>> Thank you for your ideas and suggestions.
>> Again, there's no promise of implementation coming along with this
>> e-mail.
>>
>> Martin Maechler
>> ETH Zurich
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: ifelse() woes ... can we agree on a ifelse2() ?

Martin Maechler
>>>>> Uwe Ligges <[hidden email]>
>>>>>     on Sun, 7 Aug 2016 09:51:58 +0200 writes:

    > On 06.08.2016 17:30, Duncan Murdoch wrote:
    >> On 06/08/2016 10:18 AM, Martin Maechler wrote:

   [.................]

    >>> Of course, an ifelse2()  should also be more efficient than
    >>> ifelse() in typical "atomic" cases.
    >>
    >> I don't think it is obvious how to make it more efficient.  ifelse()
    >> already skips evaluation of yes or no if not needed.  (An argument could
    >> be made that it would be better to guarantee evaluation of both, but
    >> it's usually easy enough to do this explicitly, so I don't see a need.)

    > Same from here: I do not see how this can easily be made more efficient,
    > since evaluating ony parts causes a lot of copies of objects whichs
    > slows stuff down, hence you need some complexity in yes and no to make
    > evaluations of parts of them more efficient on R level.

Yes, Duncan and Uwe are right, and my comment "wish" above was
mostly misleading.  Some of the many small changes to ifelse()
since its initial [1998, R version 0.63.3] simple

ifelseR0633 <- function (test, yes, no)
{
    ans <- test
    test <- as.logical(test)
    nas <- is.na(test)
    ans[ test] <- rep(yes, length = length(ans))[ test]
    ans[!test] <- rep(no,  length = length(ans))[!test]
    ans[nas] <- NA
    ans
}

were exactly for adding speed in some of these cases.

    > Anyway, to solve the problem, we may want an add argument to ifelse2()
    > that allows for specification of the type of the result (as vapply does)?

A good idea, probably only needed / desirable if we'd consider a
C based version {as vapply} but for the moment I did not want to
go there.

The current ifelse() is nice with "pre-S3" objects, such as
as atomic (named) vectors and (dimnamed) arrays, including matrices,
by keeping most attributes for those... and does that relatively
efficiently.

What I really meant, not above, but earlier when talking about
ifelse()'s inefficiency should really *not* have been related to
this thread, I'm sorry for that confusion.

I mean the fact that many many usages of ifelse() are of the
form
        ifelse(logiFn(x), f1(x), f2(x))

  {with f1() or f2() often even being constant}

and e.g.,  in the case where logiFn(x) gives few TRUEs and f1(.)
is expensive and f2(.) very cheap (say "constant" NA), it is
much more efficient to use

     ans <- x
     Y <- logiFn(x)
     ans[ Y] <- f1(x[ Y])
     ans[!Y] <- f2(x[!Y])

as the expensive function is only called on a small subset of
the full x.

I'm working at the main topic and *am* thanking Duncan
for his conceptual analysis and the (few) proposals.

Martin

    > Best,
    > Uwe

    >> Duncan Murdoch
    >>
    >>>
    >>>
    >>> Thank you for your ideas and suggestions.
    >>> Again, there's no promise of implementation coming along with this
    >>> e-mail.
    >>>
    >>> Martin Maechler
    >>> ETH Zurich

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: ifelse() woes ... can we agree on a ifelse2() ?

Martin Maechler
In reply to this post by Duncan Murdoch-2
Excuse for the delay;  I had waited for other / additional
comments and reactions (and been distracted with other urgent issues),
but do want to keep this thread alive  [inline] ..

>>>>> Duncan Murdoch <[hidden email]>
>>>>>     on Sat, 6 Aug 2016 11:30:08 -0400 writes:

    > On 06/08/2016 10:18 AM, Martin Maechler wrote:
    >> Dear R-devel readers,
    >> ( = people interested in the improvement and development of R).
    >>
    >> This is not the first time that this topic is raised.
    >> and I am in now state to promise that anything will result from
    >> this thread ...
    >>
    >> Still, I think the majority among us has agreed that
    >>
    >> 1) you should never use ifelse(test, yes, no)
    >> if you know that length(test) == 1, in which case
    >> if(test) yes else no
    >> is much preferable  (though not equivalent: ifelse(NA, 1, 0) !)
    >>
    >> 2) it is potentially inefficient by design since it (almost
    >> always) evaluates both 'yes' and 'no' independent of 'test'.
    >>
    >> 3) is a nice syntax in principle, and so is often used, also by
    >> myself, inspite of '2)'  just because nicely self-explaining
    >> code is sometimes clearly preferable to more efficient but
    >> less readable code.
    >>
    >> 4) it is too late to change ifelse() fundamentally, because it
    >> works according to its documentation
    >> (and I think very much the same as in S and S-PLUS) and has
    >> done so for ages.
    >>
    >> ---- and if you don't agree with  1) -- 4)  you may pretend for
    >> a moment instead of starting to discuss them thoroughly.
    >>
    >> Recently, a useR has alerted me to the fact that my Rmpfr's
    >> package arbitrary (high) precision numbers don't work for a
    >> relatively simple function.
    >>
    >> As I found the reason was that that simple function used
    >> ifelse(.,.,.)
    >> and the problem was that the (*simplified*) gist of ifelse(test, yes, no)
    >> is
    >>
    >> test <- as.logical(test)
    >> ans <- test
    >> ans[ test] <- yes
    >> ans[!test] <- no
    >>
    >> and in case of Rmpfr, the problem is that
    >>
    >> <logical>[<logical>]  <-  <mpfr>
    >>
    >> cannot work correctly
    >>
    >> [[ maybe it could in a future R, if I could define a method
    >>
    >> setReplaceMethod("[", c("logical,"logical","mpfr"),
    >> function(x,i,value) .........)
    >>
    >> but that currently fails as the C-low-level dispatch for '[<-'
    >> does not look at the full signature
    >> ]]
    >>
    >> I vaguely remember having seen proposals for
    >> light weight substitutes for ifelse(),  called
    >> ifelse1() or
    >> ifelse2() etc...
    >>
    >> and I wonder if we should not try to see if there was a version
    >> that could go into "base R" (maybe the 'utils' package, not
    >> 'base'; that's not so important).
    >>
    >> One difference to ifelse() would be that the type/mode/class of the result
    >> is not initialized by logical, by default but rather by the
    >> "common type" of  yes and no ... maybe determined  by  c()'ing
    >> parts of those.
    >> The idea was that this would work for most S3 and S4 objects for
    >> which logical 'length', (logical) indexing '[', and 'rep()' works.

    > I think your description is more or less:

    > test <- as.logical(test)
    > ans <- c(yes, no)[seq_along(test)]
    > ans <- ans[seq_along(test)]
    > ans[ test] <- yes[test]
    > ans[!test] <- no[!test]

    > (though the implementation details would vary, and recycling rules would
    > apply if the lengths of test, yes and no weren't all equal).

Yes, more or less,  notably, conceptually a version of  c(yes, no)
to get a common mode/class.... but as you mention below, c()
cannot be used alone because famously "misbehaves" e.g., for factors.

    > You didn't mention what happens with attributes.  Currently we keep the
    > attributes from test, which probably doesn't make a lot of sense. In
    > particular,

    > ifelse(c(TRUE, FALSE), factor(2:3), factor(3:4))

    > returns nonsense, as does my translation of your idea above.

yes.   factor()s  or "Date" or "POSIXt" objects are  'base R'
examples where an alternative  ifelse() would have to work
(ideally automatically with no special-case code!) by "keeping
the class".


    > That implementation also drops attributes. I'd say this definition would
    > make more sense:

    > test <- as.logical(test)
    > ans <- yes
    > ans[!test] <- no[!test]

    > (and this is suggested as an alternative in ?ifelse).  It generates an
    > error in my test example, which seems reasonable.  It gives the "right"
    > thing in

    > ifelse(c(TRUE, FALSE), factor(2:3), factor(3:2))

    > because the factors have the same levels.

    > The lack of symmetry between yes and no is slightly irksome, but I would
    > think in most cases you could choose attributes from just one of yes and
    > no to be what you want in the result (and use !test to swap the order if
    > necessary).

Yes, you are right, that's a good point:
if we don't want to "take everything" from 'test' (which is
symmetric in 'yes' and 'no'), but rather from 'yes' and 'no', we
either must be "very strict" -- as e.g., dplyr::if_else() -- or
then have (border) cases where the first argument takes
precedence over the second as in other cases in R
e.g. how names/dimnames of the result are determined in cbind(),
rbind(), and I think data.frame().


    >> One possibility would also be to consider  a "numbers-only" or
    >> rather "same type"-only {e.g., would also work for characters}
    >> version.

    > I don't know what you mean by these.

In the mean time, Bob Rudis mentioned   dplyr::if_else(),
which is very relevant, thank you Bob!

As I have found, that actually works in such a "same type"-only way:
It does not try to coerce, but gives an error when the classes
differ, even in this somewhat debatable case :

   > dplyr::if_else(c(TRUE, FALSE), 2:3, 0+10:11)
   Error: `false` has type 'double' not 'integer'
   >

As documented, if_else() is clearly stricter than ifelse()
and e.g., also does no recycling (but of length() 1).

I'm dropping the remaining issue of efficiency as I replied to
that (on Aug. 8) in the other branch of this thread.

Hadley's  if_else() is really nice in its clean approach and
does fulfill the main important desideratum, and hence e.g.,
works for "Date" etc.

My goal however would still be considerably closer to
base::ifelse(), namely an alternative would
- *coerce* types/classes as much as sensible,
- (by default at least) recycle  (test, yes, no) to common length

Unfortunately I've been too busy these days, also with a couple
of non-R (and some non-work) matters, so am not yet proposing
concrete alternatives.

More thoughts, ideas and proposals are still very welcome; as
with Duncan, it does make much sense to discuss the theme
already relatively abstractly!

Martin

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: ifelse() woes ... can we agree on a ifelse2() ?

hadley wickham
>     >> One possibility would also be to consider  a "numbers-only" or
>     >> rather "same type"-only {e.g., would also work for characters}
>     >> version.
>
>     > I don't know what you mean by these.
>
> In the mean time, Bob Rudis mentioned   dplyr::if_else(),
> which is very relevant, thank you Bob!
>
> As I have found, that actually works in such a "same type"-only way:
> It does not try to coerce, but gives an error when the classes
> differ, even in this somewhat debatable case :
>
>    > dplyr::if_else(c(TRUE, FALSE), 2:3, 0+10:11)
>    Error: `false` has type 'double' not 'integer'
>    >
>
> As documented, if_else() is clearly stricter than ifelse()
> and e.g., also does no recycling (but of length() 1).

I agree that if_else() is currently too strict - it's particularly
annoying if you want to replace some values with a missing:

x <- sample(10)
if_else(x > 5, NA, x)
#  Error: `false` has type 'integer' not 'logical'

But I would like to make sure that this remains an error:

if_else(x > 5, x, "BLAH")

Because that seems more likely to be a user error (but reasonable
people might certainly believe that it should just work)

dplyr is more accommodating in other places (i.e. in bind_rows(),
collapse() and the joins) but it's surprisingly hard to get all the
details right. For example, what should the result of this call be?

if_else(c(TRUE, FALSE), factor(c("a", "b")), factor(c("c", "b"))

Strictly speaking I think you could argue it's an error, but that's
not very user-friendly. Should it be a factor with the union of the
levels? Should it be a character vector + warning? Should the
behaviour change if one set of levels is a subset of the other set?

There are similar issues for POSIXct (if the time zones are different,
which should win?), and difftimes (similarly for units).  Ideally
you'd like the behaviour to be extensible for new S3 classes, which
suggests it should be a generic (and for the most general case, it
would need to dispatch on both arguments).

Hadley

--
http://hadley.nz

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: ifelse() woes ... can we agree on a ifelse2() ?

hadley wickham
On Fri, Aug 12, 2016 at 11:31 AM, Hadley Wickham <[hidden email]> wrote:

>>     >> One possibility would also be to consider  a "numbers-only" or
>>     >> rather "same type"-only {e.g., would also work for characters}
>>     >> version.
>>
>>     > I don't know what you mean by these.
>>
>> In the mean time, Bob Rudis mentioned   dplyr::if_else(),
>> which is very relevant, thank you Bob!
>>
>> As I have found, that actually works in such a "same type"-only way:
>> It does not try to coerce, but gives an error when the classes
>> differ, even in this somewhat debatable case :
>>
>>    > dplyr::if_else(c(TRUE, FALSE), 2:3, 0+10:11)
>>    Error: `false` has type 'double' not 'integer'
>>    >
>>
>> As documented, if_else() is clearly stricter than ifelse()
>> and e.g., also does no recycling (but of length() 1).
>
> I agree that if_else() is currently too strict - it's particularly
> annoying if you want to replace some values with a missing:
>
> x <- sample(10)
> if_else(x > 5, NA, x)
> #  Error: `false` has type 'integer' not 'logical'
>
> But I would like to make sure that this remains an error:
>
> if_else(x > 5, x, "BLAH")
>
> Because that seems more likely to be a user error (but reasonable
> people might certainly believe that it should just work)
>
> dplyr is more accommodating in other places (i.e. in bind_rows(),
> collapse() and the joins) but it's surprisingly hard to get all the
> details right. For example, what should the result of this call be?
>
> if_else(c(TRUE, FALSE), factor(c("a", "b")), factor(c("c", "b"))
>
> Strictly speaking I think you could argue it's an error, but that's
> not very user-friendly. Should it be a factor with the union of the
> levels? Should it be a character vector + warning? Should the
> behaviour change if one set of levels is a subset of the other set?
>
> There are similar issues for POSIXct (if the time zones are different,
> which should win?), and difftimes (similarly for units).  Ideally
> you'd like the behaviour to be extensible for new S3 classes, which
> suggests it should be a generic (and for the most general case, it
> would need to dispatch on both arguments).

One possible principle would be to use c() - i.e. construct out as

out <- c(yes[0], no[0]
length(out) <- max(length(yes), length(no))

But of course that wouldn't help with factor responses.

Also, if you're considering an improved ifelse(), I'd strongly urge
you to consider adding an `na` argument, so that you can use ifelse()
to transform all three possible values in a logical vector.

Hadley

--
http://hadley.nz

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: ifelse() woes ... can we agree on a ifelse2() ?

Martin Maechler
Finally getting back to this :

>>>>> Hadley Wickham <[hidden email]>
>>>>>     on Mon, 15 Aug 2016 07:51:35 -0500 writes:

    > On Fri, Aug 12, 2016 at 11:31 AM, Hadley Wickham
    > <[hidden email]> wrote:
    >>> >> One possibility would also be to consider a
    >>> "numbers-only" or >> rather "same type"-only {e.g.,
    >>> would also work for characters} >> version.
    >>>
    >>> > I don't know what you mean by these.
    >>>
    >>> In the mean time, Bob Rudis mentioned dplyr::if_else(),
    >>> which is very relevant, thank you Bob!
    >>>
    >>> As I have found, that actually works in such a "same
    >>> type"-only way: It does not try to coerce, but gives an
    >>> error when the classes differ, even in this somewhat
    >>> debatable case :
    >>>
    >>> > dplyr::if_else(c(TRUE, FALSE), 2:3, 0+10:11) Error:
    >>> `false` has type 'double' not 'integer'
    >>> >
    >>>
    >>> As documented, if_else() is clearly stricter than
    >>> ifelse() and e.g., also does no recycling (but of
    >>> length() 1).
    >>
    >> I agree that if_else() is currently too strict - it's
    >> particularly annoying if you want to replace some values
    >> with a missing:
    >>
    >> x <- sample(10) if_else(x > 5, NA, x) # Error: `false`
    >> has type 'integer' not 'logical'
    >>
    >> But I would like to make sure that this remains an error:
    >>
    >> if_else(x > 5, x, "BLAH")
    >>
    >> Because that seems more likely to be a user error (but
    >> reasonable people might certainly believe that it should
    >> just work)
    >>
    >> dplyr is more accommodating in other places (i.e. in
    >> bind_rows(), collapse() and the joins) but it's
    >> surprisingly hard to get all the details right. For
    >> example, what should the result of this call be?
    >>
    >> if_else(c(TRUE, FALSE), factor(c("a", "b")),
    >> factor(c("c", "b"))
    >>
    >> Strictly speaking I think you could argue it's an error,
    >> but that's not very user-friendly. Should it be a factor
    >> with the union of the levels? Should it be a character
    >> vector + warning? Should the behaviour change if one set
    >> of levels is a subset of the other set?
    >>
    >> There are similar issues for POSIXct (if the time zones
    >> are different, which should win?), and difftimes
    >> (similarly for units).  Ideally you'd like the behaviour
    >> to be extensible for new S3 classes, which suggests it
    >> should be a generic (and for the most general case, it
    >> would need to dispatch on both arguments).

    > One possible principle would be to use c() -
    > i.e. construct out as

    > out <- c(yes[0], no[0]
    > length(out) <- max(length(yes), length(no))

yes; this would require that a  `length<-` method works for the
class of the result.

Duncan Murdoch mentioned a version of this, in his very
first reply:

    ans <- c(yes, no)[seq_along(test)]
    ans <- ans[seq_along(test)]

which is less efficient for atomic vectors, but requires
less from the class: it "only" needs `c` and `[` to work

and a mixture of your two proposals would be possible too:

    ans <- c(yes[0], no[0])
    ans <- ans[seq_along(test)]

which does *not* work for my "mpfr" numbers (CRAN package 'Rmpfr'),
but that's a buglet in the  c.mpfr() implementation of my Rmpfr
package... (which has already been fixed in the development version on R-forge,
            https://r-forge.r-project.org/R/?group_id=386)

    > But of course that wouldn't help with factor responses.

Yes.  However, a version of Duncan's suggestion -- of treating 'yes' first
-- does help in that case.

For once, mainly as "feasability experiment",
I have created a github gist to make my current ifelse2() proposal available
for commenting, cloning, pullrequesting, etc:

Consisting of 2 files
- ifelse-def.R :  Functions definitions only, basically all the current
                proposals, called  ifelse*()
- ifelse-checks.R : A simplistic checking function
  and examples calling it, notably demonstrating that my
        ifelse2()  does work with
        "Date", <dateTime> (i.e. "POSIXct" and "POSIXlt"), factors,
        and "mpfr" (the arbitrary-precision numbers in my package "Rmpfr")

Also if you are not on github, you can quickly get to the ifelse2()
definition :

https://gist.github.com/mmaechler/9cfc3219c4b89649313bfe6853d87894#file-ifelse-def-r-L168

    > Also, if you're considering an improved ifelse(), I'd
    > strongly urge you to consider adding an `na` argument,

I now did (called it 'NA.').

    > so that you can use ifelse() to transform all three
    > possible values in a logical vector.

    > Hadley
    > -- http://hadley.nz

For those who really hate GH (and don't want or cannot easily follow the
above URL), here's my current definition:


##' Martin Maechler, 14. Nov 2016 --- taking into account Duncan M. and Hadley's
##' ideas in the R-devel thread starting at (my mom's 86th birthday):
##' https://stat.ethz.ch/pipermail/r-devel/2016-August/072970.html
ifelse2 <- function (test, yes, no, NA. = NA) {
    if(!is.logical(test)) {
        if(is.atomic(test))
            storage.mode(test) <- "logical"
        else ## typically a "class"; storage.mode<-() typically fails
            test <- if(isS4(test)) methods::as(test, "logical") else as.logical(test)
    }

    ## No longer optimize the  "if (a) x else y"  cases:
    ## Only "non-good" R users use ifelse(.) instead of if(.) in these cases.

    ans <-
        tryCatch(rep(if(is.object(yes) && identical(class(yes), class(no)))
                         ## as c(o) or o[0] may not work for the class
                         yes else c(yes[0], no[0]), length.out = length(test)),
                 error = function(e) { ## try asymmetric, yes-leaning
                     r <- yes
                     r[!test] <- no[!test]
                     r
                 })
    ok <- !(nas <- is.na(test))
    if (any(test[ok]))
        ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]
    if (any(!test[ok]))
        ans[!test & ok] <- rep(no, length.out = length(ans))[!test & ok]
    ans[nas] <- NA. # possibly coerced to class(ans)
    ans
}

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: ifelse() woes ... can we agree on a ifelse2() ?

Gabriel Becker
All,

Martin: Thanks for this and all the other things you are doing to both
drive R forward and engage more with the community about things like this.

Apologies for missing this discussion the first time it came around and if
anything here has already been brought up, but I wonder what exactly you
mean when you want recycling behavior.

Specifically, based on an unrelated discussion with Henrik Bengtsson on
Twitter, I wonder if preserving the recycling behavior test is longer than
yes, no, but making the case where

length( test ) < max(length( yes ), length( no ))

would simplify usage for userRs in a useful way. I suspect it's easy to
forget that the result is not guaranteed to be the length of  test, even
for intermediate and advanced users familiar with ifelse and it's
strengths/weaknesses.

I certainly agree (for what that's worth...) that

x = rnorm(100)

y = ifelse2(x > 0, 1L, 2L)

should continue to work.

Also, If we combine a stricter contract that the output will always be of
length with the suggestion of a specified output class the pseudo code
could be

ifelse2 = function(test, yes, no, outclass) {

lenout  = length(test)

out = as( rep(yes, length.out = lenout), outclass)

out[!test] = as(rep(no, length.out = lenout)[!test], outclass)

#handle NA stuff

out

}


NAs could be tricky if outclass were allowed to be completely general, but
doable, I think? Another approach  if we ARE fast-passing while leaving
ifelse intact is that maybe NA's in test just aren't allowed in ifelse2.
I'm not saying we should definitely do that, but it's possible and would
make things faster.

Finally, In terms of efficiency, with the stuff that Luke and I are working
on, the NA detection could be virtually free in certain cases, which could
give a nice boost for long vectors  that don't have any NAs (and 'know'
that they don't).

Best,
~G

On Tue, Nov 15, 2016 at 3:58 AM, Martin Maechler <[hidden email]
> wrote:

> Finally getting back to this :
>
> >>>>> Hadley Wickham <[hidden email]>
> >>>>>     on Mon, 15 Aug 2016 07:51:35 -0500 writes:
>
>     > On Fri, Aug 12, 2016 at 11:31 AM, Hadley Wickham
>     > <[hidden email]> wrote:
>     >>> >> One possibility would also be to consider a
>     >>> "numbers-only" or >> rather "same type"-only {e.g.,
>     >>> would also work for characters} >> version.
>     >>>
>     >>> > I don't know what you mean by these.
>     >>>
>     >>> In the mean time, Bob Rudis mentioned dplyr::if_else(),
>     >>> which is very relevant, thank you Bob!
>     >>>
>     >>> As I have found, that actually works in such a "same
>     >>> type"-only way: It does not try to coerce, but gives an
>     >>> error when the classes differ, even in this somewhat
>     >>> debatable case :
>     >>>
>     >>> > dplyr::if_else(c(TRUE, FALSE), 2:3, 0+10:11) Error:
>     >>> `false` has type 'double' not 'integer'
>     >>> >
>     >>>
>     >>> As documented, if_else() is clearly stricter than
>     >>> ifelse() and e.g., also does no recycling (but of
>     >>> length() 1).
>     >>
>     >> I agree that if_else() is currently too strict - it's
>     >> particularly annoying if you want to replace some values
>     >> with a missing:
>     >>
>     >> x <- sample(10) if_else(x > 5, NA, x) # Error: `false`
>     >> has type 'integer' not 'logical'
>     >>
>     >> But I would like to make sure that this remains an error:
>     >>
>     >> if_else(x > 5, x, "BLAH")
>     >>
>     >> Because that seems more likely to be a user error (but
>     >> reasonable people might certainly believe that it should
>     >> just work)
>     >>
>     >> dplyr is more accommodating in other places (i.e. in
>     >> bind_rows(), collapse() and the joins) but it's
>     >> surprisingly hard to get all the details right. For
>     >> example, what should the result of this call be?
>     >>
>     >> if_else(c(TRUE, FALSE), factor(c("a", "b")),
>     >> factor(c("c", "b"))
>     >>
>     >> Strictly speaking I think you could argue it's an error,
>     >> but that's not very user-friendly. Should it be a factor
>     >> with the union of the levels? Should it be a character
>     >> vector + warning? Should the behaviour change if one set
>     >> of levels is a subset of the other set?
>     >>
>     >> There are similar issues for POSIXct (if the time zones
>     >> are different, which should win?), and difftimes
>     >> (similarly for units).  Ideally you'd like the behaviour
>     >> to be extensible for new S3 classes, which suggests it
>     >> should be a generic (and for the most general case, it
>     >> would need to dispatch on both arguments).
>
>     > One possible principle would be to use c() -
>     > i.e. construct out as
>
>     > out <- c(yes[0], no[0]
>     > length(out) <- max(length(yes), length(no))
>
> yes; this would require that a  `length<-` method works for the
> class of the result.
>
> Duncan Murdoch mentioned a version of this, in his very
> first reply:
>
>     ans <- c(yes, no)[seq_along(test)]
>     ans <- ans[seq_along(test)]
>
> which is less efficient for atomic vectors, but requires
> less from the class: it "only" needs `c` and `[` to work
>
> and a mixture of your two proposals would be possible too:
>
>     ans <- c(yes[0], no[0])
>     ans <- ans[seq_along(test)]
>
> which does *not* work for my "mpfr" numbers (CRAN package 'Rmpfr'),
> but that's a buglet in the  c.mpfr() implementation of my Rmpfr
> package... (which has already been fixed in the development version on
> R-forge,
>             https://r-forge.r-project.org/R/?group_id=386)
>
>     > But of course that wouldn't help with factor responses.
>
> Yes.  However, a version of Duncan's suggestion -- of treating 'yes' first
> -- does help in that case.
>
> For once, mainly as "feasability experiment",
> I have created a github gist to make my current ifelse2() proposal
> available
> for commenting, cloning, pullrequesting, etc:
>
> Consisting of 2 files
> - ifelse-def.R :  Functions definitions only, basically all the current
>                 proposals, called  ifelse*()
> - ifelse-checks.R : A simplistic checking function
>         and examples calling it, notably demonstrating that my
>         ifelse2()  does work with
>         "Date", <dateTime> (i.e. "POSIXct" and "POSIXlt"), factors,
>         and "mpfr" (the arbitrary-precision numbers in my package "Rmpfr")
>
> Also if you are not on github, you can quickly get to the ifelse2()
> definition :
>
> https://gist.github.com/mmaechler/9cfc3219c4b89649313bfe6853d878
> 94#file-ifelse-def-r-L168
>
>     > Also, if you're considering an improved ifelse(), I'd
>     > strongly urge you to consider adding an `na` argument,
>
> I now did (called it 'NA.').
>
>     > so that you can use ifelse() to transform all three
>     > possible values in a logical vector.
>
>     > Hadley
>     > -- http://hadley.nz
>
> For those who really hate GH (and don't want or cannot easily follow the
> above URL), here's my current definition:
>
>
> ##' Martin Maechler, 14. Nov 2016 --- taking into account Duncan M. and
> Hadley's
> ##' ideas in the R-devel thread starting at (my mom's 86th birthday):
> ##' https://stat.ethz.ch/pipermail/r-devel/2016-August/072970.html
> ifelse2 <- function (test, yes, no, NA. = NA) {
>     if(!is.logical(test)) {
>         if(is.atomic(test))
>             storage.mode(test) <- "logical"
>         else ## typically a "class"; storage.mode<-() typically fails
>             test <- if(isS4(test)) methods::as(test, "logical") else
> as.logical(test)
>     }
>
>     ## No longer optimize the  "if (a) x else y"  cases:
>     ## Only "non-good" R users use ifelse(.) instead of if(.) in these
> cases.
>
>     ans <-
>         tryCatch(rep(if(is.object(yes) && identical(class(yes), class(no)))
>                          ## as c(o) or o[0] may not work for the class
>                          yes else c(yes[0], no[0]), length.out =
> length(test)),
>                  error = function(e) { ## try asymmetric, yes-leaning
>                      r <- yes
>                      r[!test] <- no[!test]
>                      r
>                  })
>     ok <- !(nas <- is.na(test))
>     if (any(test[ok]))
>         ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]
>     if (any(!test[ok]))
>         ans[!test & ok] <- rep(no, length.out = length(ans))[!test & ok]
>     ans[nas] <- NA. # possibly coerced to class(ans)
>     ans
> }
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



--
Gabriel Becker, PhD
Associate Scientist (Bioinformatics)
Genentech Research

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: ifelse() woes ... can we agree on a ifelse2() ?

Martin Maechler
>>>>> Gabriel Becker <[hidden email]>
>>>>>     on Tue, 15 Nov 2016 11:56:04 -0800 writes:

    > All,
    > Martin: Thanks for this and all the other things you are doing to both
    > drive R forward and engage more with the community about things like this.

    > Apologies for missing this discussion the first time it came around and if
    > anything here has already been brought up, but I wonder what exactly you
    > mean when you want recycling behavior.

Thank you, Gabe.

Note that my premise was really to get *away* from inheriting
too much from 'test'.
Hence, I have *not* been talking about replacing ifelse() but
rather of providing a new  ifelse2()

       [ or if_else()  if Hadley was willing to ditch the dplyr one
                in favor of a base one]

    > Specifically, based on an unrelated discussion with Henrik Bengtsson on
    > Twitter, I wonder if preserving the recycling behavior test is longer than
    > yes, no, but making the case where

    > length( test ) < max(length( yes ), length( no ))

    > would simplify usage for userRs in a useful way.

I'm sorry I don't understand the sentence above.

    > I suspect it's easy to
    > forget that the result is not guaranteed to be the length of  test, even
    > for intermediate and advanced users familiar with ifelse and it's
    > strengths/weaknesses.

    > I certainly agree (for what that's worth...) that

    > x = rnorm(100)

    > y = ifelse2(x > 0, 1L, 2L)

    > should continue to work.

(and give a a length 10 result).
Also
        ifelse2(x > 0, sqrt(x), 0L)

should work even though  class(sqrt(x)) is "numeric" and the one
of 0L is "integer", and I'd argue

        ifelse2(x < 0, sqrt(x + 0i), sqrt(x))

should also continue to work as with ifelse().

    > Also, If we combine a stricter contract that the output will always be of
    > length with the suggestion of a specified output class

that was not my intent here.... but would be another interesting
extension. However, I would like to keep  R-semantic silent coercions
such as
          logical < integer < double < complex

and your pseudo code below would not work so easily I think.

    > the pseudo code could be

(I'm changing assignment '=' to  '<-' ...  [please!] )

    > ifelse2 <- function(test, yes, no, outclass) {
    >   lenout  <- length(test)
    >   out <- as( rep(yes, length.out <- lenout), outclass)
    >   out[!test] <- as(rep(no, length.out = lenout)[!test], outclass)
    >   # handle NA stuff
    >   out
    > }


    > NAs could be tricky if outclass were allowed to be completely general, but
    > doable, I think? Another approach  if we ARE fast-passing while leaving
    > ifelse intact is that maybe NA's in test just aren't allowed in ifelse2.
    > I'm not saying we should definitely do that, but it's possible and would
    > make things faster.

    > Finally, In terms of efficiency, with the stuff that Luke and I are working
    > on, the NA detection could be virtually free in certain cases, which could
    > give a nice boost for long vectors  that don't have any NAs (and 'know'
    > that they don't).

That *is* indeed a very promising prospect!
Thank you in advance!

    > Best,
    > ~G

I still am bit disappointed by the fact that it seems nobody has
taken a good look at my ifelse2() proposal.

I really would like an alternative to ifelse() in *addition* to
the current ifelse(), but hopefully in the future being used in
quite a few places instead of ifelse()
efficiency but for changed semantics, namely working for considerably
more "vector like" classes of  'yes' and 'no'  than the current
ifelse().

As I said, the current proposal works for objects of class
   "Date", "POSIXct", "POSIXlt", "factor",  "mpfr" (pkg 'Rmpfr')
and hopefully for "sparseVector" (in a next version of the 'Matrix' pkg).

Martin

    > On Tue, Nov 15, 2016 at 3:58 AM, Martin Maechler <[hidden email]
    >> wrote:

    >> Finally getting back to this :
    >>
    >> >>>>> Hadley Wickham <[hidden email]>
    >> >>>>>     on Mon, 15 Aug 2016 07:51:35 -0500 writes:
    >>
    >> > On Fri, Aug 12, 2016 at 11:31 AM, Hadley Wickham
    >> > <[hidden email]> wrote:
    >> >>> >> One possibility would also be to consider a
    >> >>> "numbers-only" or >> rather "same type"-only {e.g.,
    >> >>> would also work for characters} >> version.
    >> >>>
    >> >>> > I don't know what you mean by these.
    >> >>>
    >> >>> In the mean time, Bob Rudis mentioned dplyr::if_else(),
    >> >>> which is very relevant, thank you Bob!
    >> >>>
    >> >>> As I have found, that actually works in such a "same
    >> >>> type"-only way: It does not try to coerce, but gives an
    >> >>> error when the classes differ, even in this somewhat
    >> >>> debatable case :
    >> >>>
    >> >>> > dplyr::if_else(c(TRUE, FALSE), 2:3, 0+10:11) Error:
    >> >>> `false` has type 'double' not 'integer'
    >> >>> >
    >> >>>
    >> >>> As documented, if_else() is clearly stricter than
    >> >>> ifelse() and e.g., also does no recycling (but of
    >> >>> length() 1).
    >> >>
    >> >> I agree that if_else() is currently too strict - it's
    >> >> particularly annoying if you want to replace some values
    >> >> with a missing:
    >> >>
    >> >> x <- sample(10) if_else(x > 5, NA, x) # Error: `false`
    >> >> has type 'integer' not 'logical'
    >> >>
    >> >> But I would like to make sure that this remains an error:
    >> >>
    >> >> if_else(x > 5, x, "BLAH")
    >> >>
    >> >> Because that seems more likely to be a user error (but
    >> >> reasonable people might certainly believe that it should
    >> >> just work)
    >> >>
    >> >> dplyr is more accommodating in other places (i.e. in
    >> >> bind_rows(), collapse() and the joins) but it's
    >> >> surprisingly hard to get all the details right. For
    >> >> example, what should the result of this call be?
    >> >>
    >> >> if_else(c(TRUE, FALSE), factor(c("a", "b")),
    >> >> factor(c("c", "b"))
    >> >>
    >> >> Strictly speaking I think you could argue it's an error,
    >> >> but that's not very user-friendly. Should it be a factor
    >> >> with the union of the levels? Should it be a character
    >> >> vector + warning? Should the behaviour change if one set
    >> >> of levels is a subset of the other set?
    >> >>
    >> >> There are similar issues for POSIXct (if the time zones
    >> >> are different, which should win?), and difftimes
    >> >> (similarly for units).  Ideally you'd like the behaviour
    >> >> to be extensible for new S3 classes, which suggests it
    >> >> should be a generic (and for the most general case, it
    >> >> would need to dispatch on both arguments).
    >>
    >> > One possible principle would be to use c() -
    >> > i.e. construct out as
    >>
    >> > out <- c(yes[0], no[0]
    >> > length(out) <- max(length(yes), length(no))
    >>
    >> yes; this would require that a  `length<-` method works for the
    >> class of the result.
    >>
    >> Duncan Murdoch mentioned a version of this, in his very
    >> first reply:
    >>
    >> ans <- c(yes, no)[seq_along(test)]
    >> ans <- ans[seq_along(test)]
    >>
    >> which is less efficient for atomic vectors, but requires
    >> less from the class: it "only" needs `c` and `[` to work
    >>
    >> and a mixture of your two proposals would be possible too:
    >>
    >> ans <- c(yes[0], no[0])
    >> ans <- ans[seq_along(test)]
    >>
    >> which does *not* work for my "mpfr" numbers (CRAN package 'Rmpfr'),
    >> but that's a buglet in the  c.mpfr() implementation of my Rmpfr
    >> package... (which has already been fixed in the development version on
    >> R-forge,
    >> https://r-forge.r-project.org/R/?group_id=386)
    >>
    >> > But of course that wouldn't help with factor responses.
    >>
    >> Yes.  However, a version of Duncan's suggestion -- of treating 'yes' first
    >> -- does help in that case.
    >>
    >> For once, mainly as "feasability experiment",
    >> I have created a github gist to make my current ifelse2() proposal
    >> available
    >> for commenting, cloning, pullrequesting, etc:
    >>
    >> Consisting of 2 files
    >> - ifelse-def.R :  Functions definitions only, basically all the current
    >> proposals, called  ifelse*()
    >> - ifelse-checks.R : A simplistic checking function
    >> and examples calling it, notably demonstrating that my
    >> ifelse2()  does work with
    >> "Date", <dateTime> (i.e. "POSIXct" and "POSIXlt"), factors,
    >> and "mpfr" (the arbitrary-precision numbers in my package "Rmpfr")
    >>
    >> Also if you are not on github, you can quickly get to the ifelse2()
    >> definition :
    >>
    >> https://gist.github.com/mmaechler/9cfc3219c4b89649313bfe6853d878
    >> 94#file-ifelse-def-r-L168
    >>
    >> > Also, if you're considering an improved ifelse(), I'd
    >> > strongly urge you to consider adding an `na` argument,
    >>
    >> I now did (called it 'NA.').
    >>
    >> > so that you can use ifelse() to transform all three
    >> > possible values in a logical vector.
    >>
    >> > Hadley
    >> > -- http://hadley.nz
    >>
    >> For those who really hate GH (and don't want or cannot easily follow the
    >> above URL), here's my current definition:
    >>
    >>
    >> ##' Martin Maechler, 14. Nov 2016 --- taking into account Duncan M. and
    >> Hadley's
    >> ##' ideas in the R-devel thread starting at (my mom's 86th birthday):
    >> ##' https://stat.ethz.ch/pipermail/r-devel/2016-August/072970.html
    >> ifelse2 <- function (test, yes, no, NA. = NA) {
    >> if(!is.logical(test)) {
    >> if(is.atomic(test))
    >> storage.mode(test) <- "logical"
    >> else ## typically a "class"; storage.mode<-() typically fails
    >> test <- if(isS4(test)) methods::as(test, "logical") else
    >> as.logical(test)
    >> }
    >>
    >> ## No longer optimize the  "if (a) x else y"  cases:
    >> ## Only "non-good" R users use ifelse(.) instead of if(.) in these
    >> cases.
    >>
    >> ans <-
    >> tryCatch(rep(if(is.object(yes) && identical(class(yes), class(no)))
    >> ## as c(o) or o[0] may not work for the class
    >> yes else c(yes[0], no[0]), length.out =
    >> length(test)),
    >> error = function(e) { ## try asymmetric, yes-leaning
    >> r <- yes
    >> r[!test] <- no[!test]
    >> r
    >> })
    >> ok <- !(nas <- is.na(test))
    >> if (any(test[ok]))
    >> ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]
    >> if (any(!test[ok]))
    >> ans[!test & ok] <- rep(no, length.out = length(ans))[!test & ok]
    >> ans[nas] <- NA. # possibly coerced to class(ans)
    >> ans
    >> }
    >>
    >> ______________________________________________
    >> [hidden email] mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-devel
    >>



    > --
    > Gabriel Becker, PhD
    > Associate Scientist (Bioinformatics)
    > Genentech Research

    > [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: ifelse() woes ... can we agree on a ifelse2() ?

Gabriel Becker
Martin et al.,




On Tue, Nov 22, 2016 at 2:12 AM, Martin Maechler <[hidden email]
> wrote:

>
> Note that my premise was really to get *away* from inheriting
> too much from 'test'.
> Hence, I have *not* been talking about replacing ifelse() but
> rather of providing a new  ifelse2()
>
>        [ or if_else()  if Hadley was willing to ditch the dplyr one
>                        in favor of a base one]
>
>     > Specifically, based on an unrelated discussion with Henrik Bengtsson
> on
>     > Twitter, I wonder if preserving the recycling behavior test is
> longer than
>     > yes, no, but making the case where
>
>     > length( test ) < max(length( yes ), length( no ))
>
>     > would simplify usage for userRs in a useful way.
>

That was a copyediting bug on my part, it seems I hit send with my message
only half-edited/proofread. Apologies.

 That should have said that making the case where test is the one that will
be recycled (because it is shorter than either yes or no) an error. My
claim is that the fact that test itself can be recycled, rather than just
yes or no, is confusing to many R users. If we are writing an ifelse2 we
might want to drop that feature and just throw an error in that case.
(Users could still use the original ifelse if they understand and
specifically want that behavior).

Does that make more sense?



>
>     > Also, If we combine a stricter contract that the output will always
> be of
>     > length with the suggestion of a specified output class
>
>
Here, again, I was talking about the restriction that the output be
guaranteed to be the length of test, regardless of the length of yes and
no. That, combined with a specific, guaranteed output class would make a
much narrower/more restricted but also (I argue) much easier to understand
function. Particularly for beginning and intermediate users.

I do hear what you're saying about silent conversion, though, so what I'm
describing might be a third function (ifelse3 for lack of a better name for
now), as you pointed out.


> that was not my intent here.... but would be another interesting
> extension. However, I would like to keep  R-semantic silent coercions
> such as
>           logical < integer < double < complex
>
> and your pseudo code below would not work so easily I think.
>
>     > the pseudo code could be
>
> (I'm changing assignment '=' to  '<-' ...  [please!] )
>
>     > ifelse2 <- function(test, yes, no, outclass) {
>     >   lenout  <- length(test)
>     >   out <- as( rep(yes, length.out <- lenout), outclass)
>     >   out[!test] <- as(rep(no, length.out = lenout)[!test], outclass)
>     >   # handle NA stuff
>     >   out
>     > }
>
>
>     > NAs could be tricky if outclass were allowed to be completely
> general, but
>     > doable, I think? Another approach  if we ARE fast-passing while
> leaving
>     > ifelse intact is that maybe NA's in test just aren't allowed in
> ifelse2.
>     > I'm not saying we should definitely do that, but it's possible and
> would
>     > make things faster.
>
>     > Finally, In terms of efficiency, with the stuff that Luke and I are
> working
>     > on, the NA detection could be virtually free in certain cases, which
> could
>     > give a nice boost for long vectors  that don't have any NAs (and
> 'know'
>     > that they don't).
>
> That *is* indeed a very promising prospect!
> Thank you in advance!
>
>     > Best,
>     > ~G
>
> I still am bit disappointed by the fact that it seems nobody has
> taken a good look at my ifelse2() proposal.
>

I plan to look at it soon. Thanks again for all your work.

~G


>
> I really would like an alternative to ifelse() in *addition* to
> the current ifelse(), but hopefully in the future being used in
> quite a few places instead of ifelse()
> efficiency but for changed semantics, namely working for considerably
> more "vector like" classes of  'yes' and 'no'  than the current
> ifelse().
>
> As I said, the current proposal works for objects of class
>    "Date", "POSIXct", "POSIXlt", "factor",  "mpfr" (pkg 'Rmpfr')
> and hopefully for "sparseVector" (in a next version of the 'Matrix' pkg).
>
> Martin
>
>     > On Tue, Nov 15, 2016 at 3:58 AM, Martin Maechler <
> [hidden email]
>     >> wrote:
>
>     >> Finally getting back to this :
>     >>
>     >> >>>>> Hadley Wickham <[hidden email]>
>     >> >>>>>     on Mon, 15 Aug 2016 07:51:35 -0500 writes:
>     >>
>     >> > On Fri, Aug 12, 2016 at 11:31 AM, Hadley Wickham
>     >> > <[hidden email]> wrote:
>     >> >>> >> One possibility would also be to consider a
>     >> >>> "numbers-only" or >> rather "same type"-only {e.g.,
>     >> >>> would also work for characters} >> version.
>     >> >>>
>     >> >>> > I don't know what you mean by these.
>     >> >>>
>     >> >>> In the mean time, Bob Rudis mentioned dplyr::if_else(),
>     >> >>> which is very relevant, thank you Bob!
>     >> >>>
>     >> >>> As I have found, that actually works in such a "same
>     >> >>> type"-only way: It does not try to coerce, but gives an
>     >> >>> error when the classes differ, even in this somewhat
>     >> >>> debatable case :
>     >> >>>
>     >> >>> > dplyr::if_else(c(TRUE, FALSE), 2:3, 0+10:11) Error:
>     >> >>> `false` has type 'double' not 'integer'
>     >> >>> >
>     >> >>>
>     >> >>> As documented, if_else() is clearly stricter than
>     >> >>> ifelse() and e.g., also does no recycling (but of
>     >> >>> length() 1).
>     >> >>
>     >> >> I agree that if_else() is currently too strict - it's
>     >> >> particularly annoying if you want to replace some values
>     >> >> with a missing:
>     >> >>
>     >> >> x <- sample(10) if_else(x > 5, NA, x) # Error: `false`
>     >> >> has type 'integer' not 'logical'
>     >> >>
>     >> >> But I would like to make sure that this remains an error:
>     >> >>
>     >> >> if_else(x > 5, x, "BLAH")
>     >> >>
>     >> >> Because that seems more likely to be a user error (but
>     >> >> reasonable people might certainly believe that it should
>     >> >> just work)
>     >> >>
>     >> >> dplyr is more accommodating in other places (i.e. in
>     >> >> bind_rows(), collapse() and the joins) but it's
>     >> >> surprisingly hard to get all the details right. For
>     >> >> example, what should the result of this call be?
>     >> >>
>     >> >> if_else(c(TRUE, FALSE), factor(c("a", "b")),
>     >> >> factor(c("c", "b"))
>     >> >>
>     >> >> Strictly speaking I think you could argue it's an error,
>     >> >> but that's not very user-friendly. Should it be a factor
>     >> >> with the union of the levels? Should it be a character
>     >> >> vector + warning? Should the behaviour change if one set
>     >> >> of levels is a subset of the other set?
>     >> >>
>     >> >> There are similar issues for POSIXct (if the time zones
>     >> >> are different, which should win?), and difftimes
>     >> >> (similarly for units).  Ideally you'd like the behaviour
>     >> >> to be extensible for new S3 classes, which suggests it
>     >> >> should be a generic (and for the most general case, it
>     >> >> would need to dispatch on both arguments).
>     >>
>     >> > One possible principle would be to use c() -
>     >> > i.e. construct out as
>     >>
>     >> > out <- c(yes[0], no[0]
>     >> > length(out) <- max(length(yes), length(no))
>     >>
>     >> yes; this would require that a  `length<-` method works for the
>     >> class of the result.
>     >>
>     >> Duncan Murdoch mentioned a version of this, in his very
>     >> first reply:
>     >>
>     >> ans <- c(yes, no)[seq_along(test)]
>     >> ans <- ans[seq_along(test)]
>     >>
>     >> which is less efficient for atomic vectors, but requires
>     >> less from the class: it "only" needs `c` and `[` to work
>     >>
>     >> and a mixture of your two proposals would be possible too:
>     >>
>     >> ans <- c(yes[0], no[0])
>     >> ans <- ans[seq_along(test)]
>     >>
>     >> which does *not* work for my "mpfr" numbers (CRAN package 'Rmpfr'),
>     >> but that's a buglet in the  c.mpfr() implementation of my Rmpfr
>     >> package... (which has already been fixed in the development version
> on
>     >> R-forge,
>     >> https://r-forge.r-project.org/R/?group_id=386)
>     >>
>     >> > But of course that wouldn't help with factor responses.
>     >>
>     >> Yes.  However, a version of Duncan's suggestion -- of treating
> 'yes' first
>     >> -- does help in that case.
>     >>
>     >> For once, mainly as "feasability experiment",
>     >> I have created a github gist to make my current ifelse2() proposal
>     >> available
>     >> for commenting, cloning, pullrequesting, etc:
>     >>
>     >> Consisting of 2 files
>     >> - ifelse-def.R :  Functions definitions only, basically all the
> current
>     >> proposals, called  ifelse*()
>     >> - ifelse-checks.R : A simplistic checking function
>     >> and examples calling it, notably demonstrating that my
>     >> ifelse2()  does work with
>     >> "Date", <dateTime> (i.e. "POSIXct" and "POSIXlt"), factors,
>     >> and "mpfr" (the arbitrary-precision numbers in my package "Rmpfr")
>     >>
>     >> Also if you are not on github, you can quickly get to the ifelse2()
>     >> definition :
>     >>
>     >> https://gist.github.com/mmaechler/9cfc3219c4b89649313bfe6853d878
>     >> 94#file-ifelse-def-r-L168
>     >>
>     >> > Also, if you're considering an improved ifelse(), I'd
>     >> > strongly urge you to consider adding an `na` argument,
>     >>
>     >> I now did (called it 'NA.').
>     >>
>     >> > so that you can use ifelse() to transform all three
>     >> > possible values in a logical vector.
>     >>
>     >> > Hadley
>     >> > -- http://hadley.nz
>     >>
>     >> For those who really hate GH (and don't want or cannot easily
> follow the
>     >> above URL), here's my current definition:
>     >>
>     >>
>     >> ##' Martin Maechler, 14. Nov 2016 --- taking into account Duncan M.
> and
>     >> Hadley's
>     >> ##' ideas in the R-devel thread starting at (my mom's 86th
> birthday):
>     >> ##' https://stat.ethz.ch/pipermail/r-devel/2016-August/072970.html
>     >> ifelse2 <- function (test, yes, no, NA. = NA) {
>     >> if(!is.logical(test)) {
>     >> if(is.atomic(test))
>     >> storage.mode(test) <- "logical"
>     >> else ## typically a "class"; storage.mode<-() typically fails
>     >> test <- if(isS4(test)) methods::as(test, "logical") else
>     >> as.logical(test)
>     >> }
>     >>
>     >> ## No longer optimize the  "if (a) x else y"  cases:
>     >> ## Only "non-good" R users use ifelse(.) instead of if(.) in these
>     >> cases.
>     >>
>     >> ans <-
>     >> tryCatch(rep(if(is.object(yes) && identical(class(yes), class(no)))
>     >> ## as c(o) or o[0] may not work for the class
>     >> yes else c(yes[0], no[0]), length.out =
>     >> length(test)),
>     >> error = function(e) { ## try asymmetric, yes-leaning
>     >> r <- yes
>     >> r[!test] <- no[!test]
>     >> r
>     >> })
>     >> ok <- !(nas <- is.na(test))
>     >> if (any(test[ok]))
>     >> ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]
>     >> if (any(!test[ok]))
>     >> ans[!test & ok] <- rep(no, length.out = length(ans))[!test & ok]
>     >> ans[nas] <- NA. # possibly coerced to class(ans)
>     >> ans
>     >> }
>     >>
>     >> ______________________________________________
>     >> [hidden email] mailing list
>     >> https://stat.ethz.ch/mailman/listinfo/r-devel
>     >>
>
>
>
>     > --
>     > Gabriel Becker, PhD
>     > Associate Scientist (Bioinformatics)
>     > Genentech Research
>
>     > [[alternative HTML version deleted]]
>
>


--
Gabriel Becker, PhD
Associate Scientist (Bioinformatics)
Genentech Research

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: ifelse() woes ... can we agree on a ifelse2() ?

R devel mailing list
In reply to this post by Duncan Murdoch-2
Just stating, in 'ifelse', 'test' is not recycled. As I said in "R-intro: length of 'ifelse' result" (https://stat.ethz.ch/pipermail/r-devel/2016-September/073136.html), ifelse(condition, a, b) returns a vector of the length of 'condition', even if 'a' or 'b' is longer.

On current 'ifelse' code in R:
* The part
ans[nas] <- NA
could be omitted because NA's are already in place.
If the part is removed, variable 'nas' is no longer used.
* The any(*) part actually checks the thing that is used as the index vector. The index vector could be stored and then repeatedly used, like the following.
    if (any(sel <- test & ok))
    ans[sel] <- rep(yes, length.out = length(ans))[sel]
* If 'test' is a factor, doing
storage.mode(test) <- "logical"
is not appropriate, but is.atomic(test) returns TRUE. Maybe use
if(!is.object(test))
instead of
if(is.atomic(test)) .

On ifelse-checks.R:
* In function 'chkIfelse', if the fourth function argument names is not "NA.", the argument name is changed, but the function body still uses the old name. That makes error in chkIfelse(ifelseHW) .
A fix:
        if(names(formals(FUN))[[4]] != "NA.") {
            body(FUN) <- do.call(substitute, list(body(FUN),
                setNames(list(quote(NA.)), names(formals(FUN))[[4]])))
            names(formals(FUN))[[4]] <- "NA."
        }
After fixing, chkIfelse(ifelseHW) just fails at identical(iflt, as.POSIXlt(ifct)) .
'iflt' has NA as 'tzone' and 'isdst' components.
* Because function 'chkIfelse' continues checking after failure,
as.POSIXlt(ifct)
may give error. The error happens, for example, in chkIfelse(ifelseR) . Maybe place it inside try(...).
* If 'lt' is a "POSIXlt" object, (lt-100) is a "POSIXct" object.
So,
FUN(c(TRUE, FALSE, NA, TRUE), lt, lt-100)
is an example of mixed class.
* The part of function 'chkIfelse' in
for(i in seq_len(nFact))
uses 'NA.' function argument. That makes error when 'chkIfelse' is applied to function without fourth argument.
The part should be wrapped in
if(has.4th) .
* Function 'ifelseJH' has fourth argument, but the argument is not for value if NA. So, instead of
chkIfelse(ifelseJH) ,
maybe call
chkIfelse(function(test, yes, no) ifelseJH(test, yes, no)) .

A concrete version of 'ifelse2' that starts the result from 'yes':
function(test, yes, no, NA. = NA) {
    if(!is.logical(test))
        test <- if(isS4(test)) methods::as(test, "logical") else as.logical(test)
    n <- length(test)
    ans <- rep(yes, length.out = n)
    ans[!test & !is.na(test)] <- rep(no, length.out = n)[!test & !is.na(test)]
    ans[is.na(test)] <- rep(NA., length.out = n)[is.na(test)]
    ans
}

It requires 'rep' method that is compatible with subsetting. It also works with "POSIXlt" in R 2.7.2, when 'length' gives 9, and gives an appropriate result if time zones are the same.
For coercion of 'test', there is no need of keeping attributes. So, it doesn't do
storage.mode(test) <- "logical"
and goes directly to 'as.logical'.
It relies on subassignment for silent coercions of
logical < integer < double < complex .
Unlike 'ifelse', it never skips any subassignment. So, phenomenon as in "example of different return modes" in ?ifelse doesn't happen.

Another version, for keeping attributes as pointed out by Duncan Murdoch:
function(test, yes, no, NA. = NA) {
    if(!is.logical(test))
        test <- if(isS4(test)) methods::as(test, "logical") else as.logical(test)
    n <- length(test)
    n.yes <- length(yes); n.no <- length(no)
    if (n.yes != n) {
        if (n.no == n) {  # swap yes <-> no
            test <- !test
            ans <- yes; yes <- no; no <- ans
            n.no <- n.yes
        } else yes <- yes[rep_len(seq_len(n.yes), n)]
    }
    ans <- yes
    if (n.no == 1L)
        ans[!test] <- no
    else
        ans[!test & !is.na(test)] <- no[
            if (n.no == n) !test & !is.na(test)
            else rep_len(seq_len(n.no), n)[!test & !is.na(test)]]
    stopifnot(length(NA.) == 1L)
    ans[is.na(test)] <- NA.
    ans
}

Note argument evaluation order: 'test', 'yes', 'no', 'NA.'.
First, it chooses the first of 'yes' and 'no' that has the same length as the result. If none of 'yes' and 'no' matches the length of the result, it chooses recycled (or truncated) 'yes'.
It uses 'rep' on the index and subsetting as a substitute for 'rep' on the value.
It requires 'length' method that is compatible with subsetting.
Additionally, it uses the same idea as dplyr::if_else, or more precisely the helper function 'replace_with'. It doesn't use 'rep' if the length of 'no' is 1 or is the same as the length of the result. For subassignment with value of length 1, recycling happens by itself and NA in index is OK.
It limits 'NA.' to be of length 1, considering 'NA.' just as a label for NA.

Cases where the last version above or 'ifelse2 or 'ifelseHW' in ifelse-def.R gives inappropriate answers:
- 'yes' and 'no' are "difftime" objects with different "units" attribute
- 'yes' and 'no' are "POSIXlt" objects with different time zone
Example: 'yes' in "UTC" and 'no' in "EST5EDT". The reverse, 'yes' in "EST5EDT" and 'no' in "UTC" gives error.

For the cases, c(yes, no) helps. Function 'ifelseJH' in ifelse-def.R gives a right answer for "POSIXlt" case.
---------------------
Martin et al.,




On Tue, Nov 22, 2016 at 2:12 AM, Martin Maechler <maechler at stat.math.ethz.ch
> wrote:

>
> Note that my premise was really to get *away* from inheriting
> too much from 'test'.
> Hence, I have *not* been talking about replacing ifelse() but
> rather of providing a new  ifelse2()
>
>        [ or if_else()  if Hadley was willing to ditch the dplyr one
>                        in favor of a base one]
>
>     > Specifically, based on an unrelated discussion with Henrik Bengtsson
> on
>     > Twitter, I wonder if preserving the recycling behavior test is
> longer than
>     > yes, no, but making the case where
>
>     > length( test ) < max(length( yes ), length( no ))
>
>     > would simplify usage for userRs in a useful way.
>

That was a copyediting bug on my part, it seems I hit send with my message
only half-edited/proofread. Apologies.

 That should have said that making the case where test is the one that will
be recycled (because it is shorter than either yes or no) an error. My
claim is that the fact that test itself can be recycled, rather than just
yes or no, is confusing to many R users. If we are writing an ifelse2 we
might want to drop that feature and just throw an error in that case.
(Users could still use the original ifelse if they understand and
specifically want that behavior).

Does that make more sense?



>
>     > Also, If we combine a stricter contract that the output will always
> be of
>     > length with the suggestion of a specified output class
>
>
Here, again, I was talking about the restriction that the output be
guaranteed to be the length of test, regardless of the length of yes and
no. That, combined with a specific, guaranteed output class would make a
much narrower/more restricted but also (I argue) much easier to understand
function. Particularly for beginning and intermediate users.

I do hear what you're saying about silent conversion, though, so what I'm
describing might be a third function (ifelse3 for lack of a better name for
now), as you pointed out.


> that was not my intent here.... but would be another interesting
> extension. However, I would like to keep  R-semantic silent coercions
> such as
>           logical < integer < double < complex
>
> and your pseudo code below would not work so easily I think.
>
>     > the pseudo code could be
>
> (I'm changing assignment '=' to  '<-' ...  [please!] )
>
>     > ifelse2 <- function(test, yes, no, outclass) {
>     >   lenout  <- length(test)
>     >   out <- as( rep(yes, length.out <- lenout), outclass)
>     >   out[!test] <- as(rep(no, length.out = lenout)[!test], outclass)
>     >   # handle NA stuff
>     >   out
>     > }
>
>
>     > NAs could be tricky if outclass were allowed to be completely
> general, but
>     > doable, I think? Another approach  if we ARE fast-passing while
> leaving
>     > ifelse intact is that maybe NA's in test just aren't allowed in
> ifelse2.
>     > I'm not saying we should definitely do that, but it's possible and
> would
>     > make things faster.
>
>     > Finally, In terms of efficiency, with the stuff that Luke and I are
> working
>     > on, the NA detection could be virtually free in certain cases, which
> could
>     > give a nice boost for long vectors  that don't have any NAs (and
> 'know'
>     > that they don't).
>
> That *is* indeed a very promising prospect!
> Thank you in advance!
>
>     > Best,
>     > ~G
>
> I still am bit disappointed by the fact that it seems nobody has
> taken a good look at my ifelse2() proposal.
>

I plan to look at it soon. Thanks again for all your work.

~G


>
> I really would like an alternative to ifelse() in *addition* to
> the current ifelse(), but hopefully in the future being used in
> quite a few places instead of ifelse()
> efficiency but for changed semantics, namely working for considerably
> more "vector like" classes of  'yes' and 'no'  than the current
> ifelse().
>
> As I said, the current proposal works for objects of class
>    "Date", "POSIXct", "POSIXlt", "factor",  "mpfr" (pkg 'Rmpfr')
> and hopefully for "sparseVector" (in a next version of the 'Matrix' pkg).
>
> Martin
>
>     > On Tue, Nov 15, 2016 at 3:58 AM, Martin Maechler <
> maechler at stat.math.ethz.ch
>     >> wrote:
>
>     >> Finally getting back to this :
>     >>
>     >> >>>>> Hadley Wickham <h.wickham at gmail.com>
>     >> >>>>>     on Mon, 15 Aug 2016 07:51:35 -0500 writes:
>     >>
>     >> > On Fri, Aug 12, 2016 at 11:31 AM, Hadley Wickham
>     >> > <h.wickham at gmail.com> wrote:
>     >> >>> >> One possibility would also be to consider a
>     >> >>> "numbers-only" or >> rather "same type"-only {e.g.,
>     >> >>> would also work for characters} >> version.
>     >> >>>
>     >> >>> > I don't know what you mean by these.
>     >> >>>
>     >> >>> In the mean time, Bob Rudis mentioned dplyr::if_else(),
>     >> >>> which is very relevant, thank you Bob!
>     >> >>>
>     >> >>> As I have found, that actually works in such a "same
>     >> >>> type"-only way: It does not try to coerce, but gives an
>     >> >>> error when the classes differ, even in this somewhat
>     >> >>> debatable case :
>     >> >>>
>     >> >>> > dplyr::if_else(c(TRUE, FALSE), 2:3, 0+10:11) Error:
>     >> >>> `false` has type 'double' not 'integer'
>     >> >>> >
>     >> >>>
>     >> >>> As documented, if_else() is clearly stricter than
>     >> >>> ifelse() and e.g., also does no recycling (but of
>     >> >>> length() 1).
>     >> >>
>     >> >> I agree that if_else() is currently too strict - it's
>     >> >> particularly annoying if you want to replace some values
>     >> >> with a missing:
>     >> >>
>     >> >> x <- sample(10) if_else(x > 5, NA, x) # Error: `false`
>     >> >> has type 'integer' not 'logical'
>     >> >>
>     >> >> But I would like to make sure that this remains an error:
>     >> >>
>     >> >> if_else(x > 5, x, "BLAH")
>     >> >>
>     >> >> Because that seems more likely to be a user error (but
>     >> >> reasonable people might certainly believe that it should
>     >> >> just work)
>     >> >>
>     >> >> dplyr is more accommodating in other places (i.e. in
>     >> >> bind_rows(), collapse() and the joins) but it's
>     >> >> surprisingly hard to get all the details right. For
>     >> >> example, what should the result of this call be?
>     >> >>
>     >> >> if_else(c(TRUE, FALSE), factor(c("a", "b")),
>     >> >> factor(c("c", "b"))
>     >> >>
>     >> >> Strictly speaking I think you could argue it's an error,
>     >> >> but that's not very user-friendly. Should it be a factor
>     >> >> with the union of the levels? Should it be a character
>     >> >> vector + warning? Should the behaviour change if one set
>     >> >> of levels is a subset of the other set?
>     >> >>
>     >> >> There are similar issues for POSIXct (if the time zones
>     >> >> are different, which should win?), and difftimes
>     >> >> (similarly for units).  Ideally you'd like the behaviour
>     >> >> to be extensible for new S3 classes, which suggests it
>     >> >> should be a generic (and for the most general case, it
>     >> >> would need to dispatch on both arguments).
>     >>
>     >> > One possible principle would be to use c() -
>     >> > i.e. construct out as
>     >>
>     >> > out <- c(yes[0], no[0]
>     >> > length(out) <- max(length(yes), length(no))
>     >>
>     >> yes; this would require that a  `length<-` method works for the
>     >> class of the result.
>     >>
>     >> Duncan Murdoch mentioned a version of this, in his very
>     >> first reply:
>     >>
>     >> ans <- c(yes, no)[seq_along(test)]
>     >> ans <- ans[seq_along(test)]
>     >>
>     >> which is less efficient for atomic vectors, but requires
>     >> less from the class: it "only" needs `c` and `[` to work
>     >>
>     >> and a mixture of your two proposals would be possible too:
>     >>
>     >> ans <- c(yes[0], no[0])
>     >> ans <- ans[seq_along(test)]
>     >>
>     >> which does *not* work for my "mpfr" numbers (CRAN package 'Rmpfr'),
>     >> but that's a buglet in the  c.mpfr() implementation of my Rmpfr
>     >> package... (which has already been fixed in the development version
> on
>     >> R-forge,
>     >> https://r-forge.r-project.org/R/?group_id=386)
>     >>
>     >> > But of course that wouldn't help with factor responses.
>     >>
>     >> Yes.  However, a version of Duncan's suggestion -- of treating
> 'yes' first
>     >> -- does help in that case.
>     >>
>     >> For once, mainly as "feasability experiment",
>     >> I have created a github gist to make my current ifelse2() proposal
>     >> available
>     >> for commenting, cloning, pullrequesting, etc:
>     >>
>     >> Consisting of 2 files
>     >> - ifelse-def.R :  Functions definitions only, basically all the
> current
>     >> proposals, called  ifelse*()
>     >> - ifelse-checks.R : A simplistic checking function
>     >> and examples calling it, notably demonstrating that my
>     >> ifelse2()  does work with
>     >> "Date", <dateTime> (i.e. "POSIXct" and "POSIXlt"), factors,
>     >> and "mpfr" (the arbitrary-precision numbers in my package "Rmpfr")
>     >>
>     >> Also if you are not on github, you can quickly get to the ifelse2()
>     >> definition :
>     >>
>     >> https://gist.github.com/mmaechler/9cfc3219c4b89649313bfe6853d878
>     >> 94#file-ifelse-def-r-L168
>     >>
>     >> > Also, if you're considering an improved ifelse(), I'd
>     >> > strongly urge you to consider adding an `na` argument,
>     >>
>     >> I now did (called it 'NA.').
>     >>
>     >> > so that you can use ifelse() to transform all three
>     >> > possible values in a logical vector.
>     >>
>     >> > Hadley
>     >> > -- http://hadley.nz
>     >>
>     >> For those who really hate GH (and don't want or cannot easily
> follow the
>     >> above URL), here's my current definition:
>     >>
>     >>
>     >> ##' Martin Maechler, 14. Nov 2016 --- taking into account Duncan M.
> and
>     >> Hadley's
>     >> ##' ideas in the R-devel thread starting at (my mom's 86th
> birthday):
>     >> ##' https://stat.ethz.ch/pipermail/r-devel/2016-August/072970.html
>     >> ifelse2 <- function (test, yes, no, NA. = NA) {
>     >> if(!is.logical(test)) {
>     >> if(is.atomic(test))
>     >> storage.mode(test) <- "logical"
>     >> else ## typically a "class"; storage.mode<-() typically fails
>     >> test <- if(isS4(test)) methods::as(test, "logical") else
>     >> as.logical(test)
>     >> }
>     >>
>     >> ## No longer optimize the  "if (a) x else y"  cases:
>     >> ## Only "non-good" R users use ifelse(.) instead of if(.) in these
>     >> cases.
>     >>
>     >> ans <-
>     >> tryCatch(rep(if(is.object(yes) && identical(class(yes), class(no)))
>     >> ## as c(o) or o[0] may not work for the class
>     >> yes else c(yes[0], no[0]), length.out =
>     >> length(test)),
>     >> error = function(e) { ## try asymmetric, yes-leaning
>     >> r <- yes
>     >> r[!test] <- no[!test]
>     >> r
>     >> })
>     >> ok <- !(nas <- is.na(test))
>     >> if (any(test[ok]))
>     >> ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]
>     >> if (any(!test[ok]))
>     >> ans[!test & ok] <- rep(no, length.out = length(ans))[!test & ok]
>     >> ans[nas] <- NA. # possibly coerced to class(ans)
>     >> ans
>     >> }
>     >>
>     >> ______________________________________________
>     >> R-devel at r-project.org mailing list
>     >> https://stat.ethz.ch/mailman/listinfo/r-devel
>     >>
>
>
>
>     > --
>     > Gabriel Becker, PhD
>     > Associate Scientist (Bioinformatics)
>     > Genentech Research
>
>     > [[alternative HTML version deleted]]
>
>


--
Gabriel Becker, PhD
Associate Scientist (Bioinformatics)
Genentech Research

    [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: ifelse() woes ... can we agree on a ifelse2() ?

S Ellison-2
> Just stating, in 'ifelse', 'test' is not recycled. As I said in "R-intro: length of 'ifelse' result"
> (https://stat.ethz.ch/pipermail/r-devel/2016-September/073136.html), ifelse(condition, a, b)
> returns a vector of the length of 'condition', even if 'a' or 'b' is longer.

That is indeed (almost) the documented behaviour. The documented behaviour is slightly more complex; '... returns a value _of the same shape_ as 'test''. IN principle, test can be a matrix, for example.

> A concrete version of 'ifelse2' that starts the result from 'yes':
> .. still a bit disappointed that nobody has taken a look ...

I took a look. The idea leaves (at least) me very uneasy. If you are recycling 'test' as well as arbitrary-length yes and no, results will become frighteningly hard to predict except in very simple cases where you have well-defined and consistent regularities in the data. And where you do, surely passing ifelse a vetor of the right length, generated by rep() applied to a short 'test' vector, will do what you want without messing around with new functions that hide what you're doing.

Do you really have a case where 'test' is neither a single logical (that could be used with 'if') nor a vector that can be readily replicated to the desired length with 'rep'?

If not, I'd drop the attempt to generate new ifelse-like functions.

S Ellison



*******************************************************************
This email and any attachments are confidential. Any use...{{dropped:8}}

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: ifelse() woes ... can we agree on a ifelse2() ?

R devel mailing list
In reply to this post by Duncan Murdoch-2
For S Ellison, just clarifying, I am Suharto Anggono, not Martin Maechler. "Martin et al.," from my previous E-mail was the beginning of message from Gabriel Becker, that I quoted.
The quoted "still a bit disappointed that nobody has taken a look" is from Martin Maechler.
In all of the proposed 'ifelse'-like functions so far, including from me (that I labeled as 'ifelse2', following Martin Maechler) and from Martin Maechler, the length of the result equals the length of 'test', like in 'ifelse'. There is no recycling of 'test'.



-----------------------------------------
> Just stating, in 'ifelse', 'test' is not recycled. As I said in "R-intro: length of 'ifelse' result"
> (https://stat.ethz.ch/pipermail/r-devel/2016-September/073136.html), ifelse(condition, a, b)
> returns a vector of the length of 'condition', even if 'a' or 'b' is longer.

That is indeed (almost) the documented behaviour. The documented behaviour is slightly more complex; '... returns a value _of the same shape_ as 'test''. IN principle, test can be a matrix, for example.

> A concrete version of 'ifelse2' that starts the result from 'yes':
> .. still a bit disappointed that nobody has taken a look ...

I took a look. The idea leaves (at least) me very uneasy. If you are recycling 'test' as well as arbitrary-length yes and no, results will become frighteningly hard to predict except in very simple cases where you have well-defined and consistent regularities in the data. And where you do, surely passing ifelse a vetor of the right length, generated by rep() applied to a short 'test' vector, will do what you want without messing around with new functions that hide what you're doing.

Do you really have a case where 'test' is neither a single logical (that could be used with 'if') nor a vector that can be readily replicated to the desired length with 'rep'?

If not, I'd drop the attempt to generate new ifelse-like functions.

S Ellison

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: ifelse() woes ... can we agree on a ifelse2() ?

R devel mailing list
In reply to this post by Duncan Murdoch-2
Related to the length of 'ifelse' result, I want to say that "example of different return modes" in ?ifelse led me to perceive a wrong thing in the past.

     ## example of different return modes:
     yes <- 1:3
     no <- pi^(0:3)
     typeof(ifelse(NA,    yes, no)) # logical
     typeof(ifelse(TRUE,  yes, no)) # integer
     typeof(ifelse(FALSE, yes, no)) # double

As the result of each 'ifelse' call is not printed, I thought that the length of the result is 3. In fact, the length of the result is 1.
I realize just now that the length of 'no' is different from 'yes'. The length of 'yes' is 3, the length of 'no' is 4.



--------------------------------------------


 Subject: Re: ifelse() woes ... can we agree on a ifelse2() ?
 To: [hidden email]
 Date: Sunday, 27 November, 2016, 8:50 AM
 
In all of the proposed 'ifelse'-like functions so far, including from me (that I labeled as 'ifelse2', following Martin Maechler) and from Martin Maechler, the length of the result equals the length of 'test', like in 'ifelse'. There is no recycling of 'test'.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: ifelse() woes ... can we agree on a ifelse2() ?

Martin Maechler

> Related to the length of 'ifelse' result, I want to say that "example of different return modes" in ?ifelse led me to perceive a wrong thing in the past.
>      ## example of different return modes:
>      yes <- 1:3
>      no <- pi^(0:3)
>      typeof(ifelse(NA,    yes, no)) # logical
>      typeof(ifelse(TRUE,  yes, no)) # integer
>      typeof(ifelse(FALSE, yes, no)) # double
>
> As the result of each 'ifelse' call is not printed, I thought that the length of the result is 3. In fact, the length of the result is 1.

"of course"... (;-)

But this indeed proves that the example is too sophisticated and
not helpful/clear enough.
Is this better?

## example of different return modes (and 'test' alone determining length):
yes <- 1:3
no  <- pi^(1:4)
utils::str( ifelse(NA,    yes, no) ) # logical, length 1
utils::str( ifelse(TRUE,  yes, no) ) # integer, length 1
utils::str( ifelse(FALSE, yes, no) ) # double,  length 1



> I realize just now that the length of 'no' is different from 'yes'. The length of 'yes' is 3, the length of 'no' is 4.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: ifelse() woes ... can we agree on a ifelse2() ?

Martin Maechler
In reply to this post by R devel mailing list
>>>>> Suharto Anggono Suharto Anggono via R-devel <[hidden email]>
>>>>>     on Sat, 26 Nov 2016 17:14:01 +0000 writes:

    > Just stating, in 'ifelse', 'test' is not recycled. As I said in "R-intro: length of 'ifelse' result" (https://stat.ethz.ch/pipermail/r-devel/2016-September/073136.html), ifelse(condition, a, b) returns a vector of the length of 'condition', even if 'a' or 'b' is longer.

yes and  ?ifelse (the help page) also does not say that test is
recycled, rather

   >>  If \code{yes} or \code{no} are too short, their elements are recycled.

(*and* the problem you wrote the above has been corrected in the
 R-intro manual shortly after).


    > On current 'ifelse' code in R:

    > * The part
    > ans[nas] <- NA
    > could be omitted because NA's are already in place.
    > If the part is removed, variable 'nas' is no longer used.

I agree that this seems logical.  If I apply the change, R's own
full checks do not seem affected, and I may try to commit that
change and "wait and see".


    > * The any(*) part actually checks the thing that is used as the index vector. The index vector could be stored and then repeatedly used, like the following.

    >     if (any(sel <- test & ok))
    >     ans[sel] <- rep(yes, length.out = length(ans))[sel]

yes, I know, and have had similar thoughts in the past.
However note  (I know you that) the current code is

    if (any(test[ok]))
        ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]

and   any(test[ok])         may be considerably faster than
      any(sel <- test & ok)
     
OTOH I think the current code would only be faster (for the
above) when any(.) returned FALSE ...
I think it may depend on the typical use cases which of the two
versions is more efficient.


    > * If 'test' is a factor, doing
    > storage.mode(test) <- "logical"
    > is not appropriate, but is.atomic(test) returns TRUE. Maybe use
    > if(!is.object(test))
    >    instead of
    > if(is.atomic(test)) .

This would be a considerable change I think...
Note that I'm currently really proposing to introduce an *additional*
ifelse function with different "more reasonable" semantic,  and
your last change would do that too.

My alternative should really work
- for factors
- for "array"s including "matrix" (as the current ifelse() does!)
- for "Date", "POSIXct", "ts"(timeseries), "zoo",
      "sparseVector", "sparseMatrix" (*), or "mpfr",
  without any special code, but rather by design.

 *) Currently needs the R-forge version of  Matrix, version 1.2-8.

A bit less than an hour ago, I have updated the gist with an updated
proposal ifelse2() {and the current alternatives that I know},
modified so it *does* keep more, e.g.  dim() attributes in
reasonable cases.

https://gist.github.com/mmaechler/9cfc3219c4b89649313bfe6853d87894#file-ifelse-def-r-L168

Hence my ifelse2() became even a bit longer (but not slower)
working for even more classes of  "yes" and "no".


    > On ifelse-checks.R:
    > * In function 'chkIfelse', if the fourth function argument names is not "NA.", the argument name is changed, but the function body still uses the old name. That makes error in chkIfelse(ifelseHW) .
    > A fix:
    >         if(names(formals(FUN))[[4]] != "NA.") {
    >             body(FUN) <- do.call(substitute, list(body(FUN),
    >                 setNames(list(quote(NA.)), names(formals(FUN))[[4]])))
    >             names(formals(FUN))[[4]] <- "NA."
    >         }

yes, thank you!  (a bit embarrassing for me ..)

    > After fixing, chkIfelse(ifelseHW) just fails at identical(iflt, as.POSIXlt(ifct)) .
    > 'iflt' has NA as 'tzone' and 'isdst' components.
    > * Because function 'chkIfelse' continues checking after failure,
    > as.POSIXlt(ifct)
    > may give error. The error happens, for example, in chkIfelse(ifelseR) . Maybe place it inside try(...).
    > * If 'lt' is a "POSIXlt" object, (lt-100) is a "POSIXct" object.
    > So,
    > FUN(c(TRUE, FALSE, NA, TRUE), lt, lt-100)
    > is an example of mixed class.

good; thank you for the hint.

    > * The part of function 'chkIfelse' in
    > for(i in seq_len(nFact))
    > uses 'NA.' function argument. That makes error when 'chkIfelse' is applied to function without fourth argument.
    > The part should be wrapped in
    > if(has.4th) .
yes of course

    > * Function 'ifelseJH' has fourth argument, but the argument is not for value if NA. So, instead of
    > chkIfelse(ifelseJH) ,
    > maybe call
    > chkIfelse(function(test, yes, no) ifelseJH(test, yes, no)) .
You are right;
I've decided to solve this differently.

I'm looking at these suggestions now, notably also your proposals below;
thank you, Suharto!

(I wanted to put my improved 'ifelse2' out first, quickly).
Martin


    > A concrete version of 'ifelse2' that starts the result from 'yes':
    > function(test, yes, no, NA. = NA) {
    >     if(!is.logical(test))
    >         test <- if(isS4(test)) methods::as(test, "logical") else as.logical(test)
    >     n <- length(test)
    >     ans <- rep(yes, length.out = n)
    >     ans[!test & !is.na(test)] <- rep(no, length.out = n)[!test & !is.na(test)]
    >     ans[is.na(test)] <- rep(NA., length.out = n)[is.na(test)]
    >     ans
    > }

    > It requires 'rep' method that is compatible with subsetting. It also works with "POSIXlt" in R 2.7.2, when 'length' gives 9, and gives an appropriate result if time zones are the same.
    > For coercion of 'test', there is no need of keeping attributes. So, it doesn't do
    > storage.mode(test) <- "logical"
    > and goes directly to 'as.logical'.
    > It relies on subassignment for silent coercions of
    > logical < integer < double < complex .
    > Unlike 'ifelse', it never skips any subassignment. So, phenomenon as in "example of different return modes" in ?ifelse doesn't happen.

    > Another version, for keeping attributes as pointed out by Duncan Murdoch:
    > function(test, yes, no, NA. = NA) {
    >     if(!is.logical(test))
    >         test <- if(isS4(test)) methods::as(test, "logical") else as.logical(test)
    >     n <- length(test)
    >     n.yes <- length(yes); n.no <- length(no)
    >     if (n.yes != n) {
    >         if (n.no == n) {  # swap yes <-> no
    >             test <- !test
    >             ans <- yes; yes <- no; no <- ans
    >             n.no <- n.yes
    >         } else yes <- yes[rep_len(seq_len(n.yes), n)]
    >     }
    >     ans <- yes
    >     if (n.no == 1L)
    >         ans[!test] <- no
    >     else
    >         ans[!test & !is.na(test)] <- no[
    >             if (n.no == n) !test & !is.na(test)
    >             else rep_len(seq_len(n.no), n)[!test & !is.na(test)]]
    >     stopifnot(length(NA.) == 1L)
    >     ans[is.na(test)] <- NA.
    >     ans
    > }

    > Note argument evaluation order: 'test', 'yes', 'no', 'NA.'.
    > First, it chooses the first of 'yes' and 'no' that has the same length as the result. If none of 'yes' and 'no' matches the length of the result, it chooses recycled (or truncated) 'yes'.
    > It uses 'rep' on the index and subsetting as a substitute for 'rep' on the value.
    > It requires 'length' method that is compatible with subsetting.
    > Additionally, it uses the same idea as dplyr::if_else, or more precisely the helper function 'replace_with'. It doesn't use 'rep' if the length of 'no' is 1 or is the same as the length of the result. For subassignment with value of length 1, recycling happens by itself and NA in index is OK.
    > It limits 'NA.' to be of length 1, considering 'NA.' just as a label for NA.

    > Cases where the last version above or 'ifelse2 or 'ifelseHW' in ifelse-def.R gives inappropriate answers:
    > - 'yes' and 'no' are "difftime" objects with different "units" attribute
    > - 'yes' and 'no' are "POSIXlt" objects with different time zone
    > Example: 'yes' in "UTC" and 'no' in "EST5EDT". The reverse, 'yes' in "EST5EDT" and 'no' in "UTC" gives error.

    > For the cases, c(yes, no) helps. Function 'ifelseJH' in ifelse-def.R gives a right answer for "POSIXlt" case.
    > ---------------------
    > Martin et al.,




    > On Tue, Nov 22, 2016 at 2:12 AM, Martin Maechler <maechler at stat.math.ethz.ch
    >> wrote:

    >>
    >> Note that my premise was really to get *away* from inheriting
    >> too much from 'test'.
    >> Hence, I have *not* been talking about replacing ifelse() but
    >> rather of providing a new  ifelse2()
    >>
    >>         [ or if_else()  if Hadley was willing to ditch the dplyr one
    >>                         in favor of a base one]
    >>
    >>      > Specifically, based on an unrelated discussion with Henrik Bengtsson
    >> on
    >>      > Twitter, I wonder if preserving the recycling behavior test is
    >> longer than
    >>      > yes, no, but making the case where
    >>
    >>      > length( test ) < max(length( yes ), length( no ))
    >>
    >>      > would simplify usage for userRs in a useful way.
    >>

    > That was a copyediting bug on my part, it seems I hit send with my message
    > only half-edited/proofread. Apologies.

    > That should have said that making the case where test is the one that will
    > be recycled (because it is shorter than either yes or no) an error. My
    > claim is that the fact that test itself can be recycled, rather than just
    > yes or no, is confusing to many R users. If we are writing an ifelse2 we
    > might want to drop that feature and just throw an error in that case.
    > (Users could still use the original ifelse if they understand and
    > specifically want that behavior).

    > Does that make more sense?



    >>
    >>      > Also, If we combine a stricter contract that the output will always
    >> be of
    >>      > length with the suggestion of a specified output class
    >>
    >>
    > Here, again, I was talking about the restriction that the output be
    > guaranteed to be the length of test, regardless of the length of yes and
    > no. That, combined with a specific, guaranteed output class would make a
    > much narrower/more restricted but also (I argue) much easier to understand
    > function. Particularly for beginning and intermediate users.

    > I do hear what you're saying about silent conversion, though, so what I'm
    > describing might be a third function (ifelse3 for lack of a better name for
    > now), as you pointed out.


    >> that was not my intent here.... but would be another interesting
    >> extension. However, I would like to keep  R-semantic silent coercions
    >> such as
    >>            logical < integer < double < complex
    >>
    >> and your pseudo code below would not work so easily I think.
    >>
    >>      > the pseudo code could be
    >>
    >> (I'm changing assignment '=' to  '<-' ...  [please!] )
    >>
    >>      > ifelse2 <- function(test, yes, no, outclass) {
    >>      >   lenout  <- length(test)
    >>      >   out <- as( rep(yes, length.out <- lenout), outclass)
    >>      >   out[!test] <- as(rep(no, length.out = lenout)[!test], outclass)
    >>      >   # handle NA stuff
    >>      >   out
    >>      > }
    >>
    >>
    >>      > NAs could be tricky if outclass were allowed to be completely
    >> general, but
    >>      > doable, I think? Another approach  if we ARE fast-passing while
    >> leaving
    >>      > ifelse intact is that maybe NA's in test just aren't allowed in
    >> ifelse2.
    >>      > I'm not saying we should definitely do that, but it's possible and
    >> would
    >>      > make things faster.
    >>
    >>      > Finally, In terms of efficiency, with the stuff that Luke and I are
    >> working
    >>      > on, the NA detection could be virtually free in certain cases, which
    >> could
    >>      > give a nice boost for long vectors  that don't have any NAs (and
    >> 'know'
    >>      > that they don't).
    >>
    >> That *is* indeed a very promising prospect!
    >> Thank you in advance!
    >>
    >>      > Best,
    >>      > ~G
    >>
    >> I still am bit disappointed by the fact that it seems nobody has
    >> taken a good look at my ifelse2() proposal.
    >>

    > I plan to look at it soon. Thanks again for all your work.

    > ~G


    >>
    >> I really would like an alternative to ifelse() in *addition* to
    >> the current ifelse(), but hopefully in the future being used in
    >> quite a few places instead of ifelse()
    >> efficiency but for changed semantics, namely working for considerably
    >> more "vector like" classes of  'yes' and 'no'  than the current
    >> ifelse().
    >>
    >> As I said, the current proposal works for objects of class
    >>     "Date", "POSIXct", "POSIXlt", "factor",  "mpfr" (pkg 'Rmpfr')
    >> and hopefully for "sparseVector" (in a next version of the 'Matrix' pkg).
    >>
    >> Martin
    >>
    >>      > On Tue, Nov 15, 2016 at 3:58 AM, Martin Maechler <
    >> maechler at stat.math.ethz.ch
    >>      >> wrote:
    >>
    >>      >> Finally getting back to this :
    >>      >>
    >>      >> >>>>> Hadley Wickham <h.wickham at gmail.com>
    >>      >> >>>>>     on Mon, 15 Aug 2016 07:51:35 -0500 writes:
    >>      >>
    >>      >> > On Fri, Aug 12, 2016 at 11:31 AM, Hadley Wickham
    >>      >> > <h.wickham at gmail.com> wrote:
    >>      >> >>> >> One possibility would also be to consider a
    >>      >> >>> "numbers-only" or >> rather "same type"-only {e.g.,
    >>      >> >>> would also work for characters} >> version.
    >>      >> >>>
    >>      >> >>> > I don't know what you mean by these.
    >>      >> >>>
    >>      >> >>> In the mean time, Bob Rudis mentioned dplyr::if_else(),
    >>      >> >>> which is very relevant, thank you Bob!
    >>      >> >>>
    >>      >> >>> As I have found, that actually works in such a "same
    >>      >> >>> type"-only way: It does not try to coerce, but gives an
    >>      >> >>> error when the classes differ, even in this somewhat
    >>      >> >>> debatable case :
    >>      >> >>>
    >>      >> >>> > dplyr::if_else(c(TRUE, FALSE), 2:3, 0+10:11) Error:
    >>      >> >>> `false` has type 'double' not 'integer'
    >>      >> >>> >
    >>      >> >>>
    >>      >> >>> As documented, if_else() is clearly stricter than
    >>      >> >>> ifelse() and e.g., also does no recycling (but of
    >>      >> >>> length() 1).
    >>      >> >>
    >>      >> >> I agree that if_else() is currently too strict - it's
    >>      >> >> particularly annoying if you want to replace some values
    >>      >> >> with a missing:
    >>      >> >>
    >>      >> >> x <- sample(10) if_else(x > 5, NA, x) # Error: `false`
    >>      >> >> has type 'integer' not 'logical'
    >>      >> >>
    >>      >> >> But I would like to make sure that this remains an error:
    >>      >> >>
    >>      >> >> if_else(x > 5, x, "BLAH")
    >>      >> >>
    >>      >> >> Because that seems more likely to be a user error (but
    >>      >> >> reasonable people might certainly believe that it should
    >>      >> >> just work)
    >>      >> >>
    >>      >> >> dplyr is more accommodating in other places (i.e. in
    >>      >> >> bind_rows(), collapse() and the joins) but it's
    >>      >> >> surprisingly hard to get all the details right. For
    >>      >> >> example, what should the result of this call be?
    >>      >> >>
    >>      >> >> if_else(c(TRUE, FALSE), factor(c("a", "b")),
    >>      >> >> factor(c("c", "b"))
    >>      >> >>
    >>      >> >> Strictly speaking I think you could argue it's an error,
    >>      >> >> but that's not very user-friendly. Should it be a factor
    >>      >> >> with the union of the levels? Should it be a character
    >>      >> >> vector + warning? Should the behaviour change if one set
    >>      >> >> of levels is a subset of the other set?
    >>      >> >>
    >>      >> >> There are similar issues for POSIXct (if the time zones
    >>      >> >> are different, which should win?), and difftimes
    >>      >> >> (similarly for units).  Ideally you'd like the behaviour
    >>      >> >> to be extensible for new S3 classes, which suggests it
    >>      >> >> should be a generic (and for the most general case, it
    >>      >> >> would need to dispatch on both arguments).
    >>      >>
    >>      >> > One possible principle would be to use c() -
    >>      >> > i.e. construct out as
    >>      >>
    >>      >> > out <- c(yes[0], no[0]
    >>      >> > length(out) <- max(length(yes), length(no))
    >>      >>
    >>      >> yes; this would require that a  `length<-` method works for the
    >>      >> class of the result.
    >>      >>
    >>      >> Duncan Murdoch mentioned a version of this, in his very
    >>      >> first reply:
    >>      >>
    >>      >> ans <- c(yes, no)[seq_along(test)]
    >>      >> ans <- ans[seq_along(test)]
    >>      >>
    >>      >> which is less efficient for atomic vectors, but requires
    >>      >> less from the class: it "only" needs `c` and `[` to work
    >>      >>
    >>      >> and a mixture of your two proposals would be possible too:
    >>      >>
    >>      >> ans <- c(yes[0], no[0])
    >>      >> ans <- ans[seq_along(test)]
    >>      >>
    >>      >> which does *not* work for my "mpfr" numbers (CRAN package 'Rmpfr'),
    >>      >> but that's a buglet in the  c.mpfr() implementation of my Rmpfr
    >>      >> package... (which has already been fixed in the development version
    >> on
    >>      >> R-forge,
    >>      >> https://r-forge.r-project.org/R/?group_id=386)
    >>      >>
    >>      >> > But of course that wouldn't help with factor responses.
    >>      >>
    >>      >> Yes.  However, a version of Duncan's suggestion -- of treating
    >> 'yes' first
    >>      >> -- does help in that case.
    >>      >>
    >>      >> For once, mainly as "feasability experiment",
    >>      >> I have created a github gist to make my current ifelse2() proposal
    >>      >> available
    >>      >> for commenting, cloning, pullrequesting, etc:
    >>      >>
    >>      >> Consisting of 2 files
    >>      >> - ifelse-def.R :  Functions definitions only, basically all the
    >> current
    >>      >> proposals, called  ifelse*()
    >>      >> - ifelse-checks.R : A simplistic checking function
    >>      >> and examples calling it, notably demonstrating that my
    >>      >> ifelse2()  does work with
    >>      >> "Date", <dateTime> (i.e. "POSIXct" and "POSIXlt"), factors,
    >>      >> and "mpfr" (the arbitrary-precision numbers in my package "Rmpfr")
    >>      >>
    >>      >> Also if you are not on github, you can quickly get to the ifelse2()
    >>      >> definition :
    >>      >>
    >>      >> https://gist.github.com/mmaechler/9cfc3219c4b89649313bfe6853d878
    >>      >> 94#file-ifelse-def-r-L168
    >>      >>
    >>      >> > Also, if you're considering an improved ifelse(), I'd
    >>      >> > strongly urge you to consider adding an `na` argument,
    >>      >>
    >>      >> I now did (called it 'NA.').
    >>      >>
    >>      >> > so that you can use ifelse() to transform all three
    >>      >> > possible values in a logical vector.
    >>      >>
    >>      >> > Hadley
    >>      >> > -- http://hadley.nz
    >>      >>
    >>      >> For those who really hate GH (and don't want or cannot easily
    >> follow the
    >>      >> above URL), here's my current definition:
    >>      >>
    >>      >>
    >>      >> ##' Martin Maechler, 14. Nov 2016 --- taking into account Duncan M.
    >> and
    >>      >> Hadley's
    >>      >> ##' ideas in the R-devel thread starting at (my mom's 86th
    >> birthday):
    >>      >> ##' https://stat.ethz.ch/pipermail/r-devel/2016-August/072970.html
    >>      >> ifelse2 <- function (test, yes, no, NA. = NA) {
    >>      >> if(!is.logical(test)) {
    >>      >> if(is.atomic(test))
    >>      >> storage.mode(test) <- "logical"
    >>      >> else ## typically a "class"; storage.mode<-() typically fails
    >>      >> test <- if(isS4(test)) methods::as(test, "logical") else
    >>      >> as.logical(test)
    >>      >> }
    >>      >>
    >>      >> ## No longer optimize the  "if (a) x else y"  cases:
    >>      >> ## Only "non-good" R users use ifelse(.) instead of if(.) in these
    >>      >> cases.
    >>      >>
    >>      >> ans <-
    >>      >> tryCatch(rep(if(is.object(yes) && identical(class(yes), class(no)))
    >>      >> ## as c(o) or o[0] may not work for the class
    >>      >> yes else c(yes[0], no[0]), length.out =
    >>      >> length(test)),
    >>      >> error = function(e) { ## try asymmetric, yes-leaning
    >>      >> r <- yes
    >>      >> r[!test] <- no[!test]
    >>      >> r
    >>      >> })
    >>      >> ok <- !(nas <- is.na(test))
    >>      >> if (any(test[ok]))
    >>      >> ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]
    >>      >> if (any(!test[ok]))
    >>      >> ans[!test & ok] <- rep(no, length.out = length(ans))[!test & ok]
    >>      >> ans[nas] <- NA. # possibly coerced to class(ans)
    >>      >> ans
    >>      >> }
    >>      >>
    >>      >> ______________________________________________
    >>      >> R-devel at r-project.org mailing list
    >>      >> https://stat.ethz.ch/mailman/listinfo/r-devel
    >>      >>
    >>
    >>
    >>
    >>      > --
    >>      > Gabriel Becker, PhD
    >>      > Associate Scientist (Bioinformatics)
    >>      > Genentech Research
    >>
    >>      > [[alternative HTML version deleted]]
    >>
    >>


    > --
    > Gabriel Becker, PhD
    > Associate Scientist (Bioinformatics)
    > Genentech Research

    >     [[alternative HTML version deleted]]

    > ______________________________________________
    > [hidden email] mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: ifelse() woes ... can we agree on a ifelse2() ?

Gabriel Becker
Well, that's embarrassing. Sorry for the noise on that front, everyone. I
misunderstood something from the aforementioned unrelated conversation I
was having, but not double checking is on me (I rarely use if else and when
I do I avoid that situation in my own code, which is why I didn't already
know this)


I'd still argue that situation should at least warn, possibly error, as it
seems indicative of a bug in the user's code.

On Mon, Nov 28, 2016 at 7:00 AM, Martin Maechler <[hidden email]
> wrote:

> >>>>> Suharto Anggono Suharto Anggono via R-devel <[hidden email]>
> >>>>>     on Sat, 26 Nov 2016 17:14:01 +0000 writes:
>
>     > Just stating, in 'ifelse', 'test' is not recycled. As I said in
> "R-intro: length of 'ifelse' result" (https://stat.ethz.ch/
> pipermail/r-devel/2016-September/073136.html), ifelse(condition, a, b)
> returns a vector of the length of 'condition', even if 'a' or 'b' is longer.
>
> yes and  ?ifelse (the help page) also does not say that test is
> recycled, rather
>
>    >>  If \code{yes} or \code{no} are too short, their elements are
> recycled.
>
> (*and* the problem you wrote the above has been corrected in the
>  R-intro manual shortly after).
>
>
>     > On current 'ifelse' code in R:
>
>     > * The part
>     > ans[nas] <- NA
>     > could be omitted because NA's are already in place.
>     > If the part is removed, variable 'nas' is no longer used.
>
> I agree that this seems logical.  If I apply the change, R's own
> full checks do not seem affected, and I may try to commit that
> change and "wait and see".
>
>
>     > * The any(*) part actually checks the thing that is used as the
> index vector. The index vector could be stored and then repeatedly used,
> like the following.
>
>     >     if (any(sel <- test & ok))
>     >     ans[sel] <- rep(yes, length.out = length(ans))[sel]
>
> yes, I know, and have had similar thoughts in the past.
> However note  (I know you that) the current code is
>
>     if (any(test[ok]))
>         ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]
>
> and   any(test[ok])          may be considerably faster than
>       any(sel <- test & ok)
>
> OTOH I think the current code would only be faster (for the
> above) when any(.) returned FALSE ...
> I think it may depend on the typical use cases which of the two
> versions is more efficient.
>
>
>     > * If 'test' is a factor, doing
>     > storage.mode(test) <- "logical"
>     > is not appropriate, but is.atomic(test) returns TRUE. Maybe use
>     > if(!is.object(test))
>     >    instead of
>     > if(is.atomic(test)) .
>
> This would be a considerable change I think...
> Note that I'm currently really proposing to introduce an *additional*
> ifelse function with different "more reasonable" semantic,  and
> your last change would do that too.
>
> My alternative should really work
> - for factors
> - for "array"s including "matrix" (as the current ifelse() does!)
> - for "Date", "POSIXct", "ts"(timeseries), "zoo",
>       "sparseVector", "sparseMatrix" (*), or "mpfr",
>   without any special code, but rather by design.
>
>  *) Currently needs the R-forge version of  Matrix, version 1.2-8.
>
> A bit less than an hour ago, I have updated the gist with an updated
> proposal ifelse2() {and the current alternatives that I know},
> modified so it *does* keep more, e.g.  dim() attributes in
> reasonable cases.
>
> https://gist.github.com/mmaechler/9cfc3219c4b89649313bfe6853d878
> 94#file-ifelse-def-r-L168
>
> Hence my ifelse2() became even a bit longer (but not slower)
> working for even more classes of  "yes" and "no".
>
>
>     > On ifelse-checks.R:
>     > * In function 'chkIfelse', if the fourth function argument names is
> not "NA.", the argument name is changed, but the function body still uses
> the old name. That makes error in chkIfelse(ifelseHW) .
>     > A fix:
>     >         if(names(formals(FUN))[[4]] != "NA.") {
>     >             body(FUN) <- do.call(substitute, list(body(FUN),
>     >                 setNames(list(quote(NA.)),
> names(formals(FUN))[[4]])))
>     >             names(formals(FUN))[[4]] <- "NA."
>     >         }
>
> yes, thank you!  (a bit embarrassing for me ..)
>
>     > After fixing, chkIfelse(ifelseHW) just fails at identical(iflt,
> as.POSIXlt(ifct)) .
>     > 'iflt' has NA as 'tzone' and 'isdst' components.
>     > * Because function 'chkIfelse' continues checking after failure,
>     > as.POSIXlt(ifct)
>     > may give error. The error happens, for example, in
> chkIfelse(ifelseR) . Maybe place it inside try(...).
>     > * If 'lt' is a "POSIXlt" object, (lt-100) is a "POSIXct" object.
>     > So,
>     > FUN(c(TRUE, FALSE, NA, TRUE), lt, lt-100)
>     > is an example of mixed class.
>
> good; thank you for the hint.
>
>     > * The part of function 'chkIfelse' in
>     > for(i in seq_len(nFact))
>     > uses 'NA.' function argument. That makes error when 'chkIfelse' is
> applied to function without fourth argument.
>     > The part should be wrapped in
>     > if(has.4th) .
> yes of course
>
>     > * Function 'ifelseJH' has fourth argument, but the argument is not
> for value if NA. So, instead of
>     > chkIfelse(ifelseJH) ,
>     > maybe call
>     > chkIfelse(function(test, yes, no) ifelseJH(test, yes, no)) .
> You are right;
> I've decided to solve this differently.
>
> I'm looking at these suggestions now, notably also your proposals below;
> thank you, Suharto!
>
> (I wanted to put my improved 'ifelse2' out first, quickly).
> Martin
>
>
>     > A concrete version of 'ifelse2' that starts the result from 'yes':
>     > function(test, yes, no, NA. = NA) {
>     >     if(!is.logical(test))
>     >         test <- if(isS4(test)) methods::as(test, "logical") else
> as.logical(test)
>     >     n <- length(test)
>     >     ans <- rep(yes, length.out = n)
>     >     ans[!test & !is.na(test)] <- rep(no, length.out = n)[!test & !
> is.na(test)]
>     >     ans[is.na(test)] <- rep(NA., length.out = n)[is.na(test)]
>     >     ans
>     > }
>
>     > It requires 'rep' method that is compatible with subsetting. It also
> works with "POSIXlt" in R 2.7.2, when 'length' gives 9, and gives an
> appropriate result if time zones are the same.
>     > For coercion of 'test', there is no need of keeping attributes. So,
> it doesn't do
>     > storage.mode(test) <- "logical"
>     > and goes directly to 'as.logical'.
>     > It relies on subassignment for silent coercions of
>     > logical < integer < double < complex .
>     > Unlike 'ifelse', it never skips any subassignment. So, phenomenon as
> in "example of different return modes" in ?ifelse doesn't happen.
>
>     > Another version, for keeping attributes as pointed out by Duncan
> Murdoch:
>     > function(test, yes, no, NA. = NA) {
>     >     if(!is.logical(test))
>     >         test <- if(isS4(test)) methods::as(test, "logical") else
> as.logical(test)
>     >     n <- length(test)
>     >     n.yes <- length(yes); n.no <- length(no)
>     >     if (n.yes != n) {
>     >         if (n.no == n) {  # swap yes <-> no
>     >             test <- !test
>     >             ans <- yes; yes <- no; no <- ans
>     >             n.no <- n.yes
>     >         } else yes <- yes[rep_len(seq_len(n.yes), n)]
>     >     }
>     >     ans <- yes
>     >     if (n.no == 1L)
>     >         ans[!test] <- no
>     >     else
>     >         ans[!test & !is.na(test)] <- no[
>     >             if (n.no == n) !test & !is.na(test)
>     >             else rep_len(seq_len(n.no), n)[!test & !is.na(test)]]
>     >     stopifnot(length(NA.) == 1L)
>     >     ans[is.na(test)] <- NA.
>     >     ans
>     > }
>
>     > Note argument evaluation order: 'test', 'yes', 'no', 'NA.'.
>     > First, it chooses the first of 'yes' and 'no' that has the same
> length as the result. If none of 'yes' and 'no' matches the length of the
> result, it chooses recycled (or truncated) 'yes'.
>     > It uses 'rep' on the index and subsetting as a substitute for 'rep'
> on the value.
>     > It requires 'length' method that is compatible with subsetting.
>     > Additionally, it uses the same idea as dplyr::if_else, or more
> precisely the helper function 'replace_with'. It doesn't use 'rep' if the
> length of 'no' is 1 or is the same as the length of the result. For
> subassignment with value of length 1, recycling happens by itself and NA in
> index is OK.
>     > It limits 'NA.' to be of length 1, considering 'NA.' just as a label
> for NA.
>
>     > Cases where the last version above or 'ifelse2 or 'ifelseHW' in
> ifelse-def.R gives inappropriate answers:
>     > - 'yes' and 'no' are "difftime" objects with different "units"
> attribute
>     > - 'yes' and 'no' are "POSIXlt" objects with different time zone
>     > Example: 'yes' in "UTC" and 'no' in "EST5EDT". The reverse, 'yes' in
> "EST5EDT" and 'no' in "UTC" gives error.
>
>     > For the cases, c(yes, no) helps. Function 'ifelseJH' in ifelse-def.R
> gives a right answer for "POSIXlt" case.
>     > ---------------------
>     > Martin et al.,
>
>
>
>
>     > On Tue, Nov 22, 2016 at 2:12 AM, Martin Maechler <maechler at
> stat.math.ethz.ch
>     >> wrote:
>
>     >>
>     >> Note that my premise was really to get *away* from inheriting
>     >> too much from 'test'.
>     >> Hence, I have *not* been talking about replacing ifelse() but
>     >> rather of providing a new  ifelse2()
>     >>
>     >>         [ or if_else()  if Hadley was willing to ditch the dplyr one
>     >>                         in favor of a base one]
>     >>
>     >>      > Specifically, based on an unrelated discussion with Henrik
> Bengtsson
>     >> on
>     >>      > Twitter, I wonder if preserving the recycling behavior test
> is
>     >> longer than
>     >>      > yes, no, but making the case where
>     >>
>     >>      > length( test ) < max(length( yes ), length( no ))
>     >>
>     >>      > would simplify usage for userRs in a useful way.
>     >>
>
>     > That was a copyediting bug on my part, it seems I hit send with my
> message
>     > only half-edited/proofread. Apologies.
>
>     > That should have said that making the case where test is the one
> that will
>     > be recycled (because it is shorter than either yes or no) an error.
> My
>     > claim is that the fact that test itself can be recycled, rather than
> just
>     > yes or no, is confusing to many R users. If we are writing an
> ifelse2 we
>     > might want to drop that feature and just throw an error in that case.
>     > (Users could still use the original ifelse if they understand and
>     > specifically want that behavior).
>
>     > Does that make more sense?
>
>
>
>     >>
>     >>      > Also, If we combine a stricter contract that the output will
> always
>     >> be of
>     >>      > length with the suggestion of a specified output class
>     >>
>     >>
>     > Here, again, I was talking about the restriction that the output be
>     > guaranteed to be the length of test, regardless of the length of yes
> and
>     > no. That, combined with a specific, guaranteed output class would
> make a
>     > much narrower/more restricted but also (I argue) much easier to
> understand
>     > function. Particularly for beginning and intermediate users.
>
>     > I do hear what you're saying about silent conversion, though, so
> what I'm
>     > describing might be a third function (ifelse3 for lack of a better
> name for
>     > now), as you pointed out.
>
>
>     >> that was not my intent here.... but would be another interesting
>     >> extension. However, I would like to keep  R-semantic silent
> coercions
>     >> such as
>     >>            logical < integer < double < complex
>     >>
>     >> and your pseudo code below would not work so easily I think.
>     >>
>     >>      > the pseudo code could be
>     >>
>     >> (I'm changing assignment '=' to  '<-' ...  [please!] )
>     >>
>     >>      > ifelse2 <- function(test, yes, no, outclass) {
>     >>      >   lenout  <- length(test)
>     >>      >   out <- as( rep(yes, length.out <- lenout), outclass)
>     >>      >   out[!test] <- as(rep(no, length.out = lenout)[!test],
> outclass)
>     >>      >   # handle NA stuff
>     >>      >   out
>     >>      > }
>     >>
>     >>
>     >>      > NAs could be tricky if outclass were allowed to be completely
>     >> general, but
>     >>      > doable, I think? Another approach  if we ARE fast-passing
> while
>     >> leaving
>     >>      > ifelse intact is that maybe NA's in test just aren't allowed
> in
>     >> ifelse2.
>     >>      > I'm not saying we should definitely do that, but it's
> possible and
>     >> would
>     >>      > make things faster.
>     >>
>     >>      > Finally, In terms of efficiency, with the stuff that Luke
> and I are
>     >> working
>     >>      > on, the NA detection could be virtually free in certain
> cases, which
>     >> could
>     >>      > give a nice boost for long vectors  that don't have any NAs
> (and
>     >> 'know'
>     >>      > that they don't).
>     >>
>     >> That *is* indeed a very promising prospect!
>     >> Thank you in advance!
>     >>
>     >>      > Best,
>     >>      > ~G
>     >>
>     >> I still am bit disappointed by the fact that it seems nobody has
>     >> taken a good look at my ifelse2() proposal.
>     >>
>
>     > I plan to look at it soon. Thanks again for all your work.
>
>     > ~G
>
>
>     >>
>     >> I really would like an alternative to ifelse() in *addition* to
>     >> the current ifelse(), but hopefully in the future being used in
>     >> quite a few places instead of ifelse()
>     >> efficiency but for changed semantics, namely working for
> considerably
>     >> more "vector like" classes of  'yes' and 'no'  than the current
>     >> ifelse().
>     >>
>     >> As I said, the current proposal works for objects of class
>     >>     "Date", "POSIXct", "POSIXlt", "factor",  "mpfr" (pkg 'Rmpfr')
>     >> and hopefully for "sparseVector" (in a next version of the 'Matrix'
> pkg).
>     >>
>     >> Martin
>     >>
>     >>      > On Tue, Nov 15, 2016 at 3:58 AM, Martin Maechler <
>     >> maechler at stat.math.ethz.ch
>     >>      >> wrote:
>     >>
>     >>      >> Finally getting back to this :
>     >>      >>
>     >>      >> >>>>> Hadley Wickham <h.wickham at gmail.com>
>     >>      >> >>>>>     on Mon, 15 Aug 2016 07:51:35 -0500 writes:
>     >>      >>
>     >>      >> > On Fri, Aug 12, 2016 at 11:31 AM, Hadley Wickham
>     >>      >> > <h.wickham at gmail.com> wrote:
>     >>      >> >>> >> One possibility would also be to consider a
>     >>      >> >>> "numbers-only" or >> rather "same type"-only {e.g.,
>     >>      >> >>> would also work for characters} >> version.
>     >>      >> >>>
>     >>      >> >>> > I don't know what you mean by these.
>     >>      >> >>>
>     >>      >> >>> In the mean time, Bob Rudis mentioned dplyr::if_else(),
>     >>      >> >>> which is very relevant, thank you Bob!
>     >>      >> >>>
>     >>      >> >>> As I have found, that actually works in such a "same
>     >>      >> >>> type"-only way: It does not try to coerce, but gives an
>     >>      >> >>> error when the classes differ, even in this somewhat
>     >>      >> >>> debatable case :
>     >>      >> >>>
>     >>      >> >>> > dplyr::if_else(c(TRUE, FALSE), 2:3, 0+10:11) Error:
>     >>      >> >>> `false` has type 'double' not 'integer'
>     >>      >> >>> >
>     >>      >> >>>
>     >>      >> >>> As documented, if_else() is clearly stricter than
>     >>      >> >>> ifelse() and e.g., also does no recycling (but of
>     >>      >> >>> length() 1).
>     >>      >> >>
>     >>      >> >> I agree that if_else() is currently too strict - it's
>     >>      >> >> particularly annoying if you want to replace some values
>     >>      >> >> with a missing:
>     >>      >> >>
>     >>      >> >> x <- sample(10) if_else(x > 5, NA, x) # Error: `false`
>     >>      >> >> has type 'integer' not 'logical'
>     >>      >> >>
>     >>      >> >> But I would like to make sure that this remains an error:
>     >>      >> >>
>     >>      >> >> if_else(x > 5, x, "BLAH")
>     >>      >> >>
>     >>      >> >> Because that seems more likely to be a user error (but
>     >>      >> >> reasonable people might certainly believe that it should
>     >>      >> >> just work)
>     >>      >> >>
>     >>      >> >> dplyr is more accommodating in other places (i.e. in
>     >>      >> >> bind_rows(), collapse() and the joins) but it's
>     >>      >> >> surprisingly hard to get all the details right. For
>     >>      >> >> example, what should the result of this call be?
>     >>      >> >>
>     >>      >> >> if_else(c(TRUE, FALSE), factor(c("a", "b")),
>     >>      >> >> factor(c("c", "b"))
>     >>      >> >>
>     >>      >> >> Strictly speaking I think you could argue it's an error,
>     >>      >> >> but that's not very user-friendly. Should it be a factor
>     >>      >> >> with the union of the levels? Should it be a character
>     >>      >> >> vector + warning? Should the behaviour change if one set
>     >>      >> >> of levels is a subset of the other set?
>     >>      >> >>
>     >>      >> >> There are similar issues for POSIXct (if the time zones
>     >>      >> >> are different, which should win?), and difftimes
>     >>      >> >> (similarly for units).  Ideally you'd like the behaviour
>     >>      >> >> to be extensible for new S3 classes, which suggests it
>     >>      >> >> should be a generic (and for the most general case, it
>     >>      >> >> would need to dispatch on both arguments).
>     >>      >>
>     >>      >> > One possible principle would be to use c() -
>     >>      >> > i.e. construct out as
>     >>      >>
>     >>      >> > out <- c(yes[0], no[0]
>     >>      >> > length(out) <- max(length(yes), length(no))
>     >>      >>
>     >>      >> yes; this would require that a  `length<-` method works for
> the
>     >>      >> class of the result.
>     >>      >>
>     >>      >> Duncan Murdoch mentioned a version of this, in his very
>     >>      >> first reply:
>     >>      >>
>     >>      >> ans <- c(yes, no)[seq_along(test)]
>     >>      >> ans <- ans[seq_along(test)]
>     >>      >>
>     >>      >> which is less efficient for atomic vectors, but requires
>     >>      >> less from the class: it "only" needs `c` and `[` to work
>     >>      >>
>     >>      >> and a mixture of your two proposals would be possible too:
>     >>      >>
>     >>      >> ans <- c(yes[0], no[0])
>     >>      >> ans <- ans[seq_along(test)]
>     >>      >>
>     >>      >> which does *not* work for my "mpfr" numbers (CRAN package
> 'Rmpfr'),
>     >>      >> but that's a buglet in the  c.mpfr() implementation of my
> Rmpfr
>     >>      >> package... (which has already been fixed in the development
> version
>     >> on
>     >>      >> R-forge,
>     >>      >> https://r-forge.r-project.org/R/?group_id=386)
>     >>      >>
>     >>      >> > But of course that wouldn't help with factor responses.
>     >>      >>
>     >>      >> Yes.  However, a version of Duncan's suggestion -- of
> treating
>     >> 'yes' first
>     >>      >> -- does help in that case.
>     >>      >>
>     >>      >> For once, mainly as "feasability experiment",
>     >>      >> I have created a github gist to make my current ifelse2()
> proposal
>     >>      >> available
>     >>      >> for commenting, cloning, pullrequesting, etc:
>     >>      >>
>     >>      >> Consisting of 2 files
>     >>      >> - ifelse-def.R :  Functions definitions only, basically all
> the
>     >> current
>     >>      >> proposals, called  ifelse*()
>     >>      >> - ifelse-checks.R : A simplistic checking function
>     >>      >> and examples calling it, notably demonstrating that my
>     >>      >> ifelse2()  does work with
>     >>      >> "Date", <dateTime> (i.e. "POSIXct" and "POSIXlt"), factors,
>     >>      >> and "mpfr" (the arbitrary-precision numbers in my package
> "Rmpfr")
>     >>      >>
>     >>      >> Also if you are not on github, you can quickly get to the
> ifelse2()
>     >>      >> definition :
>     >>      >>
>     >>      >> https://gist.github.com/mmaechler/
> 9cfc3219c4b89649313bfe6853d878
>     >>      >> 94#file-ifelse-def-r-L168
>     >>      >>
>     >>      >> > Also, if you're considering an improved ifelse(), I'd
>     >>      >> > strongly urge you to consider adding an `na` argument,
>     >>      >>
>     >>      >> I now did (called it 'NA.').
>     >>      >>
>     >>      >> > so that you can use ifelse() to transform all three
>     >>      >> > possible values in a logical vector.
>     >>      >>
>     >>      >> > Hadley
>     >>      >> > -- http://hadley.nz
>     >>      >>
>     >>      >> For those who really hate GH (and don't want or cannot
> easily
>     >> follow the
>     >>      >> above URL), here's my current definition:
>     >>      >>
>     >>      >>
>     >>      >> ##' Martin Maechler, 14. Nov 2016 --- taking into account
> Duncan M.
>     >> and
>     >>      >> Hadley's
>     >>      >> ##' ideas in the R-devel thread starting at (my mom's 86th
>     >> birthday):
>     >>      >> ##' https://stat.ethz.ch/pipermail/r-devel/2016-August/
> 072970.html
>     >>      >> ifelse2 <- function (test, yes, no, NA. = NA) {
>     >>      >> if(!is.logical(test)) {
>     >>      >> if(is.atomic(test))
>     >>      >> storage.mode(test) <- "logical"
>     >>      >> else ## typically a "class"; storage.mode<-() typically
> fails
>     >>      >> test <- if(isS4(test)) methods::as(test, "logical") else
>     >>      >> as.logical(test)
>     >>      >> }
>     >>      >>
>     >>      >> ## No longer optimize the  "if (a) x else y"  cases:
>     >>      >> ## Only "non-good" R users use ifelse(.) instead of if(.)
> in these
>     >>      >> cases.
>     >>      >>
>     >>      >> ans <-
>     >>      >> tryCatch(rep(if(is.object(yes) && identical(class(yes),
> class(no)))
>     >>      >> ## as c(o) or o[0] may not work for the class
>     >>      >> yes else c(yes[0], no[0]), length.out =
>     >>      >> length(test)),
>     >>      >> error = function(e) { ## try asymmetric, yes-leaning
>     >>      >> r <- yes
>     >>      >> r[!test] <- no[!test]
>     >>      >> r
>     >>      >> })
>     >>      >> ok <- !(nas <- is.na(test))
>     >>      >> if (any(test[ok]))
>     >>      >> ans[test & ok] <- rep(yes, length.out = length(ans))[test &
> ok]
>     >>      >> if (any(!test[ok]))
>     >>      >> ans[!test & ok] <- rep(no, length.out = length(ans))[!test
> & ok]
>     >>      >> ans[nas] <- NA. # possibly coerced to class(ans)
>     >>      >> ans
>     >>      >> }
>     >>      >>
>     >>      >> ______________________________________________
>     >>      >> R-devel at r-project.org mailing list
>     >>      >> https://stat.ethz.ch/mailman/listinfo/r-devel
>     >>      >>
>     >>
>     >>
>     >>
>     >>      > --
>     >>      > Gabriel Becker, PhD
>     >>      > Associate Scientist (Bioinformatics)
>     >>      > Genentech Research
>     >>
>     >>      > [[alternative HTML version deleted]]
>     >>
>     >>
>
>
>     > --
>     > Gabriel Becker, PhD
>     > Associate Scientist (Bioinformatics)
>     > Genentech Research
>
>     >     [[alternative HTML version deleted]]
>
>     > ______________________________________________
>     > [hidden email] mailing list
>     > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



--
Gabriel Becker, PhD
Associate Scientist (Bioinformatics)
Genentech Research

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: ifelse() woes ... can we agree on a ifelse2() ?

R devel mailing list
In reply to this post by Duncan Murdoch-2
Interspersed below.
--------------------------------------------


 Subject: Re: ifelse() woes ... can we agree on a ifelse2() ?
 To: [hidden email]
 Date: Sunday, 27 November, 2016, 12:14 AM
 
On current 'ifelse' code in R:
...
* If 'test' is a factor, doing
storage.mode(test) <- "logical"
is not appropriate, but is.atomic(test) returns TRUE. Maybe use
if(!is.object(test))
instead of
if(is.atomic(test)) .
===================================
I now see that, for 'test' that is atomic and has "class" attribute, with current 'ifelse' code, changing
if(is.atomic(test))
to
if(!is.object(test))
removes class of 'test' and makes the result doesn't have class of 'test', which is not according to the documentation. The documentation of 'ifelse' says that the value is "A vector of the same length and attributes (including dimensions and "class") as 'test' ...".
===================================


function(test, yes, no, NA. = NA) {
    if(!is.logical(test))
        test <- if(isS4(test)) methods::as(test, "logical") else as.logical(test)
    n <- length(test)
    n.yes <- length(yes); n.no <- length(no)
    if (n.yes != n) {
        if (n.no == n) {  # swap yes <-> no
            test <- !test
            ans <- yes; yes <- no; no <- ans
            n.no <- n.yes
        } else yes <- yes[rep_len(seq_len(n.yes), n)]
    }
    ans <- yes
    if (n.no == 1L)
        ans[!test] <- no
    else
        ans[!test & !is.na(test)] <- no[
            if (n.no == n) !test & !is.na(test)
            else rep_len(seq_len(n.no), n)[!test & !is.na(test)]]
    stopifnot(length(NA.) == 1L)
    ans[is.na(test)] <- NA.
    ans
}

===================================
For data frame, indexing by logical matrix is different from indexing by logical vector.
Because there is an example like that, I think that it is better to remove
if(!is.logical(test))
in the function definition above, making 'as.logical' also applied to 'test' of mode "logical", stripping attributes. Doing so makes sure that 'test' is a plain logical vector, so that indexing is compatible with 'length'.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel