
There is pmin and pmax each taking na.rm, how about psum?


Matthew Dowle

Hi,

Please consider the following :

x = c(1,3,NA,5)
y = c(2,NA,4,1)

min(x,y,na.rm=TRUE)    # ok
[1] 1
max(x,y,na.rm=TRUE)    # ok
[1] 5
sum(x,y,na.rm=TRUE)    # ok
[1] 16

pmin(x,y,na.rm=TRUE)   # ok
[1] 1 3 4 1
pmax(x,y,na.rm=TRUE)   # ok
[1] 2 3 4 5
psum(x,y,na.rm=TRUE)
[1] 3 3 4 6                             # expected result
Error: could not find function "psum"   # actual result

I realise that + is already like psum, but what about NA?

x+y
[1]  3 NA NA  6        # can't supply `na.rm=TRUE` to `+`

Is there a case to add psum? Or have I missed something?

This question survived when I asked on Stack Overflow :
http://stackoverflow.com/questions/13123638/there-is-pmin-and-pmax-each-taking-na-rm-why-no-psum

And a search of the archives found that Gabor has suggested it too, as
an aside:
http://r.789695.n4.nabble.com/How-to-do-it-without-for-loops-tp794745p794750.html

If someone from R core is willing to sponsor the idea, I am willing to
write, test and submit the code for psum, implemented in a very similar
fashion to pmin and pmax. Or perhaps it exists already in a package
somewhere (I searched but didn't find it).
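For reference, here is a minimal pure-R sketch of the proposed psum. It assumes NA is treated as 0 (the identity of `+`) when na.rm=TRUE, so a position where every input is NA gives 0, as sum() does. The Reduce()-based body is only an illustration, not how R core would write it; a built-in version would presumably go through .Internal() and C code like pmin and pmax.

```r
# Hypothetical psum(): elementwise sum of any number of vectors.
# Not part of base R; an illustrative sketch of the proposed behaviour.
psum <- function(..., na.rm = FALSE) {
  args <- list(...)
  if (na.rm) {
    # Replace NA with 0, the identity of `+`, before adding
    args <- lapply(args, function(a) replace(a, is.na(a), 0))
  }
  # Fold `+` over the inputs; ordinary recycling rules apply
  Reduce(`+`, args)
}

x <- c(1, 3, NA, 5)
y <- c(2, NA, 4, 1)
psum(x, y)                 # 3 NA NA 6, same as x + y
psum(x, y, na.rm = TRUE)   # 3 3 4 6
```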

Matthew

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: There is pmin and pmax each taking na.rm, how about psum?

ONKELINX, Thierry
Why don't you make a matrix and use colSums or rowSums?

x = c(1,3,NA,5)
y = c(2,NA,4,1)
colSums(rbind(x, y), na.rm = TRUE)
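Spelled out, Thierry's suggestion and its column-wise twin look like this (base R only). Both copy the inputs into a temporary matrix before summing.

```r
x <- c(1, 3, NA, 5)
y <- c(2, NA, 4, 1)

# rbind() stacks x and y as the rows of a 2 x 4 matrix;
# colSums() then sums each column, dropping NAs
colSums(rbind(x, y), na.rm = TRUE)   # 3 3 4 6

# Same idea with the vectors as columns: cbind() + rowSums()
rowSums(cbind(x, y), na.rm = TRUE)   # 3 3 4 6
```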


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
[hidden email]
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey


-----Original message-----
From: [hidden email] [mailto:[hidden email]] On behalf of Matthew Dowle
Sent: Tuesday 30 October 2012 12:03
To: [hidden email]
Subject: [Rd] There is pmin and pmax each taking na.rm, how about psum?


* * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * *
The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.


Re: There is pmin and pmax each taking na.rm, how about psum?

Matthew Dowle

Because that's inconsistent with pmin and pmax when two NAs are summed.

x = c(1,3,NA,NA,5)
y = c(2,NA,4,NA,1)
colSums(rbind(x, y), na.rm = TRUE)
[1] 3 3 4 0 6    # actual
[1] 3 3 4 NA 6   # desired

and it would be less convenient/natural (and slower) than a psum which
would call .Internal(psum(na.rm,...)) in the same way as pmin and pmax.
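The inconsistency can be seen side by side below. psum_like_pmin() is a hypothetical name for a pure-R sketch of the behaviour asked for here: NAs are dropped as with na.rm=TRUE, but a position where every input is NA stays NA, as with pmin and pmax.

```r
x <- c(1, 3, NA, NA, 5)
y <- c(2, NA, 4, NA, 1)

pmin(x, y, na.rm = TRUE)             # 1 3 4 NA 1 : all-NA position stays NA
colSums(rbind(x, y), na.rm = TRUE)   # 3 3 4 0 6  : all-NA position becomes 0

# Hypothetical psum variant mirroring pmin/pmax semantics
psum_like_pmin <- function(..., na.rm = FALSE) {
  args <- list(...)
  if (!na.rm) return(Reduce(`+`, args))
  # TRUE where every input is NA
  all_na <- Reduce(`&`, lapply(args, is.na))
  # Sum with NAs treated as 0, then restore NA at the all-NA positions
  res <- Reduce(`+`, lapply(args, function(a) replace(a, is.na(a), 0)))
  res[all_na] <- NA
  res
}

psum_like_pmin(x, y, na.rm = TRUE)   # 3 3 4 NA 6
```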

> Why don't you make a matrix and use colSums or rowSums?
>
> x = c(1,3,NA,5)
> y = c(2,NA,4,1)
> colSums(rbind(x, y), na.rm = TRUE)

Re: There is pmin and pmax each taking na.rm, how about psum?

hadley wickham
In reply to this post by Matthew Dowle
> Is there a case to add psum? Or have I missed something.

If psum, then why not pdiff (-), pprod (*) and precip (/) ?  And
similarly, what about equivalent functions for ^, %%, %/%, &, and | ?

Hadley

--
RStudio / Rice University
http://had.co.nz/


Re: There is pmin and pmax each taking na.rm, how about psum?

Matthew Dowle

Not pdiff because i) psum(x,-y,na.rm=TRUE) would do that and ii) diff is
quite unlike -. Yes, pprod too, but not pdiv (or precip) because
pprod(x,y^-1,na.rm=TRUE) would cover that.
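A sketch of the two substitutions, assuming hypothetical pure-R psum() and pprod() that replace NA with 0 and 1 respectively when na.rm=TRUE. Note what that assumption means here: a missing y leaves x unchanged in both the "difference" and the "quotient".

```r
# Hypothetical psum()/pprod(): NA is replaced by the operator's identity
# (0 for +, 1 for *) when na.rm = TRUE, mirroring sum() and prod()
psum <- function(..., na.rm = FALSE) {
  args <- list(...)
  if (na.rm) args <- lapply(args, function(a) replace(a, is.na(a), 0))
  Reduce(`+`, args)
}
pprod <- function(..., na.rm = FALSE) {
  args <- list(...)
  if (na.rm) args <- lapply(args, function(a) replace(a, is.na(a), 1))
  Reduce(`*`, args)
}

x <- c(6, NA, 8)
y <- c(2, 4, NA)

psum(x, -y, na.rm = TRUE)      # "pdiff": c(4, -4, 8)
pprod(x, y^-1, na.rm = TRUE)   # "pdiv":  c(3, 0.25, 8)
```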

> what about equivalent functions for ^, %%, %/%, &, and | ?

I like the suggestion, but those operators seem less useful than psum and
pprod, and it would probably be going too far to add them too. Also, in
?groupGeneric, under section 3 (Group "Summary"), there are 7 functions
listed:

    min, max, sum, prod, range, all, any

The p* family would be extended to 2 more of those 7; it wouldn't make sense
for prange, pall or pany. So, just psum and pprod. ^, %%, %/%, &, and | are
listed in section 2, Group "Ops", and seem different from sum and prod in
that sense.
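The group memberships can be checked from R itself. This sketch uses the S4 side of the same groups via methods::getGroupMembers(), assuming its Summary group matches the S3 list in ?groupGeneric; in S4, Ops is further split into the subgroups Arith, Compare and Logic.

```r
# The seven Summary group members
methods::getGroupMembers("Summary")

# Ops is a group of groups in S4
methods::getGroupMembers("Ops")

# ^, %%, %/% sit in Arith, while & and | sit in Logic
methods::getGroupMembers("Arith")
methods::getGroupMembers("Logic")
```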



Re: There is pmin and pmax each taking na.rm, how about psum?

Justin Talbot
In reply to this post by Matthew Dowle
> Because that's inconsistent with pmin and pmax when two NAs are summed.
>
> x = c(1,3,NA,NA,5)
> y = c(2,NA,4,NA,1)
> colSums(rbind(x, y), na.rm = TRUE)
> [1] 3 3 4 0 6    # actual
> [1] 3 3 4 NA 6   # desired

But your desired result would be inconsistent with sum:
sum(NA,NA,na.rm=TRUE)
[1] 0

From a language definition perspective I think having psum return 0
here is the right choice. R consistently distinguishes between operators
that have a sensible identity (+:0, *:1, &:TRUE, |:FALSE), which return
the identity if removing NAs leaves no items, and those that kind
of don't (pmin, pmax), which return NA. Let's not break that.

(I would argue that pmin and pmax should return their actual
identities too: Inf and -Inf respectively, but I can understand the
current behavior.)
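The behaviour described above can be checked directly in base R. The inputs to min/max/pmin/pmax below are NA_real_ to keep them numeric; min() and max() warn when nothing is left after removing NAs, so that warning is suppressed.

```r
# Reductions with an identity return it once na.rm removes everything
sum(NA, NA, na.rm = TRUE)    # 0     (identity of +)
prod(NA, NA, na.rm = TRUE)   # 1     (identity of *)
all(NA, NA, na.rm = TRUE)    # TRUE  (identity of &)
any(NA, NA, na.rm = TRUE)    # FALSE (identity of |)

# min()/max() also return their identities, with a warning ...
suppressWarnings(min(NA_real_, NA_real_, na.rm = TRUE))   # Inf
suppressWarnings(max(NA_real_, NA_real_, na.rm = TRUE))   # -Inf

# ... whereas the parallel versions return NA
pmin(NA_real_, NA_real_, na.rm = TRUE)   # NA
pmax(NA_real_, NA_real_, na.rm = TRUE)   # NA
```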


My 2 cents on psum:

R has a natural set of associative & commutative operators: +, *, &,
|, pmin, pmax.

These correspond directly to the reduction functions: sum, prod, all,
any, min, max

The current problem is that pmin and pmax are more powerful than +, *,
&, and |. The right fix is to extend the rest of the associative &
commutative operators to have the same power as pmin and pmax.

Thus, + should have the signature: `+`(..., na.rm=FALSE), which would
allow you to do things like:

`+`(c(1,2),c(1,2),c(1,2),NA, na.rm=TRUE) = c(3,6)

If you don't like typing `+`, you could always alias psum to `+`.
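Extending `+` itself would need a change to R, but the same semantics can be approximated with separate functions. make_parallel() below is a hypothetical helper: given an associative, commutative operator and its identity, it returns an elementwise n-ary version that replaces NA with the identity when na.rm=TRUE, so an all-NA position yields the identity, as with sum(), prod(), all() and any().

```r
# Hypothetical helper: parallel (elementwise) version of operator `op`
# whose identity element is `id`
make_parallel <- function(op, id) {
  force(op); force(id)
  function(..., na.rm = FALSE) {
    args <- list(...)
    if (na.rm) args <- lapply(args, function(a) replace(a, is.na(a), id))
    Reduce(op, args)
  }
}

psum  <- make_parallel(`+`, 0)
pprod <- make_parallel(`*`, 1)
pall  <- make_parallel(`&`, TRUE)
pany  <- make_parallel(`|`, FALSE)

psum(c(1, 2), c(1, 2), c(1, 2), NA, na.rm = TRUE)           # 3 6
pall(c(TRUE, NA, FALSE), c(TRUE, TRUE, NA), na.rm = TRUE)   # TRUE TRUE FALSE
pany(c(FALSE, NA, TRUE), c(NA, NA, FALSE), na.rm = TRUE)    # FALSE FALSE TRUE
```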

Additionally, R currently has two simple reduction functions that
don't have corresponding operators: range and length. Having a prange
operator and a plength operator would nicely round out the language.

Justin


Re: There is pmin and pmax each taking na.rm, how about psum?

Matthew Dowle
In reply to this post by Matthew Dowle

Justin Talbot <jtalbot <at> stanford.edu> writes:

>
> > Because that's inconsistent with pmin and pmax when two NAs are summed.
> >
> > x = c(1,3,NA,NA,5)
> > y = c(2,NA,4,NA,1)
> > colSums(rbind(x, y), na.rm = TRUE)
> > [1] 3 3 4 0 6    # actual
> > [1] 3 3 4 NA 6   # desired
>
> But your desired result would be inconsistent with sum:
> sum(NA,NA,na.rm=TRUE)
> [1] 0
>
> From a language definition perspective I think having psum return 0
> here is the right choice.

Ok, you've sold me. psum(NA,NA,na.rm=TRUE) returning 0 sounds good. And
pprod(NA,NA,na.rm=TRUE) returning 1, consistent with prod then.

Then the case for psum is more for convenience and speed vs.
colSums(rbind(x,y), na.rm=TRUE), since rbind will copy x and y into a new
matrix. The case for pprod is similar, plus colProds doesn't exist in base R.
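For comparison, the matrix route for both cases in base R. rbind() allocates the 2 x n matrix mentioned above; with no colProds() in base R, apply() over columns is one substitute, though it calls prod() once per column and can be slow for long vectors.

```r
x <- c(1, 3, NA, NA, 5)
y <- c(2, NA, 4, NA, 1)
m <- rbind(x, y)                  # copies x and y into a new 2 x 5 matrix

colSums(m, na.rm = TRUE)          # 3 3 4 0 6
apply(m, 2, prod, na.rm = TRUE)   # 2 3 4 1 5
```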

> Thus, + should have the signature: `+`(..., na.rm=FALSE), which would
> allow you to do things like:
>
> `+`(c(1,2),c(1,2),c(1,2),NA, na.rm=TRUE) = c(3,6)
>
> If you don't like typing `+`, you could always alias psum to `+`.

But there would be a cost, wouldn't there? `+` is a dyadic .Primitive.
Changing that to take `...` and `na.rm` could slow it down (iiuc), and any
changes to the existing language are risky.  For example :
    `+`(1,2,3)
is currently an error. Changing that to do something might have
implications for some of the 4,000 packages (some might rely on that being
an error), with a possible speed cost too.
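The current arity rule is easy to confirm: `+` accepts one or two arguments, so a three-argument call errors. The error is caught here with tryCatch() rather than quoted, since its exact wording may vary between R versions. Folding the binary operator with Reduce() already gives an n-ary sum without changing `+`.

```r
`+`(1, 2)   # 3
`+`(5)      # 5 (unary plus)

# Three arguments: an error, captured as its message
res <- tryCatch(`+`(1, 2, 3), error = function(e) conditionMessage(e))
res

# n-ary sum by folding the dyadic operator
Reduce(`+`, list(1, 2, 3))   # 6
```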

In contrast, adding two functions that didn't exist before: psum and pprod,
seems to be a safer and simpler proposition.

Matthew


Re: There is pmin and pmax each taking na.rm, how about psum?

Justin Talbot
>
> Then the case for psum is more for convenience and speed -vs-
> colSums(rbind(x,y), na.rm=TRUE)), since rbind will copy x and y into a new
> matrix. The case for pprod is similar, plus colProds doesn't exist.
>

Right, and consistency; for what that's worth.

>> Thus, + should have the signature: `+`(..., na.rm=FALSE), which would
>> allow you to do things like:
>>
>> `+`(c(1,2),c(1,2),c(1,2),NA, na.rm=TRUE) = c(3,6)
>>
>> If you don't like typing `+`, you could always alias psum to `+`.
>
> But there would be a cost, wouldn't there? `+` is a dyadic .Primitive.
> Changing that to take `...` and `na.rm` could slow it down (iiuc), and any
> changes to the existing language are risky.  For example :
>     `+`(1,2,3)
> is currently an error. Changing that to do something might have
> implications for some of the 4,000 packages (some might rely on that being
> an error), with a possible speed cost too.
>

There would be a very slight performance cost for the current
interpreter. For the new bytecode compiler though there would be no
performance cost since the common binary form can be detected at
compile time and an optimized bytecode can be emitted for it.

Taking what's currently an error and making it legal is a pretty safe
change; unless someone is currently relying on `+`(1,2,3) to return an
error, which I doubt. I think the bigger question on making this
change work would be on the S3 dispatch logic. I don't understand the
intricacies of S3 well enough to know if this change is plausible or
not.

> In contrast, adding two functions that didn't exist before: psum and pprod,
> seems to be a safer and simpler proposition.

Definitely easier. Leaves the language a bit more complicated, but
that might be the right trade off. I would strongly suggest adding
pany and pall as well. I find myself wishing for them all the time.
prange would be nice as well.

Justin


Re: There is pmin and pmax each taking na.rm, how about psum?

Henrik Bengtsson
On Sun, Nov 4, 2012 at 6:35 AM, Justin Talbot <[hidden email]> wrote:

> Definitely easier. Leaves the language a bit more complicated, but
> that might be the right trade off. I would strongly suggest adding
> pany and pall as well. I find myself wishing for them all the time.
> prange would be nice as well.

Have a look at the matrixStats package; it might bring what you're looking for:

  http://cran.r-project.org/web/packages/matrixStats

/Henrik


Re: There is pmin and pmax each taking na.rm, how about psum?

Matthew Dowle
> On Sun, Nov 4, 2012 at 6:35 AM, Justin Talbot <[hidden email]>
> wrote:
>> There would be a very slight performance cost for the current
>> interpreter. For the new bytecode compiler though there would be no
>> performance cost since the common binary form can be detected at
>> compile time and an optimized bytecode can be emitted for it.
>>
>> Taking what's currently an error and making it legal is a pretty safe
>> change; unless someone is currently relying on `+`(1,2,3) to return an
>> error, which I doubt. I think the bigger question on making this
>> change work would be on the S3 dispatch logic. I don't understand the
>> intricacies of S3 well enough to know if this change is plausible or
>> not.

Interesting. Sounds more possible than I thought.

> Have a look at the matrixStats package; it might bring what you're looking
> for:
>
>   http://cran.r-project.org/web/packages/matrixStats
>
> /Henrik

Nice package and very handy. It has colProds, too. But its functions take
a matrix.

'Then the case for psum is more for convenience and speed vs.
colSums(rbind(x,y), na.rm=TRUE), since rbind will copy x and y into a
new matrix.'

Matthew
