sum() vs cumsum() implicit type coercion

Rory Winston
Hi

I noticed a small inconsistency when using sum() vs cumsum().

I have a character-based series:

> tryjpy$long
 [1] "0.0022"  "-0.0002" "-0.0149" "-0.0023" "-0.0342" "-0.0245" "-0.0022"
 [8] "0.0003"  "-0.0001" "-0.0004" "-0.0036" "-0.001"  "-0.0011" "-0.0012"
[15] "-0.0006" "0.0016"  "0.0006"

When I run sum() and cumsum() on it, sum() fails but cumsum() converts the
series to numeric before summing:

> sum(tryjpy$long)
Error in sum(tryjpy$long) : invalid 'type' (character) of argument

> cumsum(tryjpy$long)
 [1]  0.0022  0.0020 -0.0129 -0.0152 -0.0494 -0.0739 -0.0761 -0.0758 -0.0759
[10] -0.0763 -0.0799 -0.0809 -0.0820 -0.0832 -0.0838 -0.0822 -0.0816

Which I guess is due to the following line in do_cum():

PROTECT(t = coerceVector(CAR(args), REALSXP));

This might be fine, and there may be very good reasons why there is no
coercion in sum(); it just seems a little inconsistent in usage.
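
For readers without the original data, the behaviour can be reproduced with a short stand-in vector. This is only a sketch: `x` is a hypothetical substitute for `tryjpy$long` (its first three values), and the `cumsum()` result assumes the coercion behaviour described in this thread.

```r
# Hypothetical stand-in for tryjpy$long: its first three values, as strings
x <- c("0.0022", "-0.0002", "-0.0149")

# sum() rejects character input; capture the error message instead of stopping
sum_result <- tryCatch(sum(x), error = function(e) conditionMessage(e))
print(sum_result)

# cumsum() coerces its argument to double before accumulating
cum_result <- cumsum(x)
print(cum_result)

# Converting explicitly makes both functions see the same numeric input
x_num   <- as.numeric(x)
total   <- sum(x_num)
running <- cumsum(x_num)
```

After the explicit `as.numeric()` call, the last element of `running` equals `total`, so the two functions agree.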

Cheers
-- Rory

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: sum() vs cumsum() implicit type coercion

Tomas Kalibera
On 8/23/20 5:02 PM, Rory Winston wrote:

> This might be fine and there may be very good reasons why there is no
> coercion in sum - just seems a little inconsistent in usage

Yes. I don't know the reason for this design, but please note that it is
documented in ?sum and in ?cumsum, which would also make it harder to
change. One can always stick to a consistent subset, i.e. not rely on the
coercion (e.g. from character vectors).
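
One way to stay inside such a consistent subset is to convert explicitly before calling either function. The sketch below is illustrative only: `to_numeric_strict()` is a hypothetical helper, not part of base R, and it chooses to treat unparseable strings as an error rather than letting them become NA.

```r
# to_numeric_strict() is a hypothetical helper, not a base R function.
# It converts to numeric and fails loudly instead of silently producing NA.
to_numeric_strict <- function(x) {
  if (is.numeric(x)) return(x)
  tryCatch(
    as.numeric(x),
    # as.numeric() signals a warning for strings it cannot parse;
    # promote that warning to an error
    warning = function(w) {
      stop("cannot convert input to numeric: ", conditionMessage(w),
           call. = FALSE)
    }
  )
}

x <- c("0.0022", "-0.0002", "-0.0149")
x_num <- to_numeric_strict(x)

sum(x_num)     # same numeric input for both functions
cumsum(x_num)
```

With the conversion done up front, neither call depends on implicit coercion.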

Best
Tomas

Re: sum() vs cumsum() implicit type coercion

Martin Maechler
>>>>> Tomas Kalibera
>>>>>     on Tue, 25 Aug 2020 09:29:05 +0200 writes:

    > Yes. I don't know the reason for this design, but please note it is
    > documented in ?sum and in ?cumsum, which would also make it harder to
    > change. One can always use a consistent subset (not rely on the coercion
    > e.g. from characters).

    > Best
    > Tomas

Indeed.
Further note that most arithmetic/math *fails* on
character vectors, so if a change were to be made, it
should rather be that cumsum() also rejects character
input.
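
A quick way to check this claim is to try a few operations on character input and record which ones signal an error. This is a sketch only: `fails()` is a hypothetical helper written for the example.

```r
# fails() is a hypothetical helper: TRUE if evaluating expr signals an error
fails <- function(expr) {
  inherits(tryCatch(expr, error = function(e) e), "error")
}

results <- c(
  plus = fails("1" + 1),    # binary arithmetic with a character operand
  sqrt = fails(sqrt("4")),  # math function applied to a character vector
  sum  = fails(sum("1"))    # sum() on a character vector
)
print(results)

# cumsum() is the odd one out in this thread: it coerces instead of failing
print(fails(cumsum("1")))
```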

We would have consistency then, but potentially break user code,
even package code which has hitherto assumed cumsum() to coerce
to numeric first.

If a majority of commentators and R core think we should make
such a change, I'd agree to consider it.

Otherwise, we save (ourselves and others) a bit of time.
Martin

Re: sum() vs cumsum() implicit type coercion

Hugh Parsonage
(If I may be so bold: although I think it's unlikely that a majority
would be in favour of this change, and I doubt anyone is actually
proposing it, I think quite a bit more than "a majority" should be
required before a change like this is allowed.

Considering that cumsum's coercion to numeric is documented, that
consistency of type coercion between sum and cumsum has never been
advertised, and that a custom version of cumsum that addresses the
inconsistency would be very easy for users to create themselves, I'd
struggle to think the change could ever have merit. Even public
unanimity would probably not be enough.)
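
A minimal sketch of such a custom version follows. The name `strict_cumsum()` and the exact set of accepted types are assumptions made for illustration; the intent is only to reject non-numeric input in the way sum() does.

```r
# strict_cumsum() is a hypothetical wrapper, not part of base R.
# It refuses character (and other non-numeric) input instead of coercing it.
strict_cumsum <- function(x) {
  if (!(is.numeric(x) || is.complex(x) || is.logical(x))) {
    stop("invalid 'type' (", typeof(x), ") of argument", call. = FALSE)
  }
  cumsum(x)
}

strict_cumsum(c(0.0022, -0.0002, -0.0149))  # numeric input works as usual

# Character input is now an error, matching sum()
msg <- tryCatch(strict_cumsum(c("0.0022", "-0.0002")),
                error = function(e) conditionMessage(e))
print(msg)
```

Users who prefer the strict behaviour can call the wrapper in their own code while base cumsum() stays unchanged.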

On Tue, 25 Aug 2020 at 20:25, Martin Maechler
<[hidden email]> wrote:

> Indeed.
> Further note that most arithmetic/math  *fails* on
> character vectors, so if a change would have to be made, it
> should rather be such that cumsum() also rejects character
> input.
>
> We would have consistency then, but potentially break user code,
> even package code which has hitherto assumed cumsum() to coerce
> to numeric first.
>
> If a majority of commentators and R core thinks we should make
> such a change, I'd agree to consider it.
>
> Otherwise, we save (ourselves and others) a bit of time.
> Martin