
12

Hello, All:
Consider:
Browse[2]> set.seed(1)
Browse[2]> rpois(9, 1e10)
NAs produced[1] NA NA NA NA NA NA NA NA NA
Should this happen?
I think that for, say, lambda>1e6, rpois should return rnorm(.,
lambda, sqrt(lambda)).
For my particular Monte Carlo, I have replaced my call to rpois
with a call to the following:
rpois. < function(n, lambda){
n2 < max(length(n), length(lambda))
n < rep_len(n, n2)
lambda < rep_len(lambda, n2)
#
big < (lambda>1e6)
out < rep(NA, n2)
out[big] < rnorm(sum(big), lambda[big], sqrt(lambda[big]))
out[!big] < rpois(sum(!big), lambda[!big])
out
}
Comments?
Thanks,
Spencer Graves
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


> 
> Hello, All:
>
>
> Consider:
>
>
> Browse[2]> set.seed(1)
> Browse[2]> rpois(9, 1e10)
> NAs produced[1] NA NA NA NA NA NA NA NA NA
>
>
> Should this happen?
>
>
> I think that for, say, lambda>1e6, rpois should return rnorm(.,
> lambda, sqrt(lambda)).
But need to implement carefully; rpois should always return a
nonnegative integer, whereas rnorm always returns numeric...
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


On 20200119 09:34, Benjamin Tyner wrote:
>> 
>> Hello, All:
>>
>>
>> Consider:
>>
>>
>> Browse[2]> set.seed(1)
>> Browse[2]> rpois(9, 1e10)
>> NAs produced[1] NA NA NA NA NA NA NA NA NA
>>
>>
>> Should this happen?
>>
>>
>> I think that for, say, lambda>1e6, rpois should return rnorm(.,
>> lambda, sqrt(lambda)).
> But need to implement carefully; rpois should always return a
> nonnegative integer, whereas rnorm always returns numeric...
>
Thanks for the reply.
However, I think it's not acceptable to get an NA from a number
that cannot be expressed as an integer. Whenever a randomly generated
number would exceed .Machine$integer.max, the choice is between
returning NA or a noninteger numeric. Consider:
> 2*.Machine$integer.max
[1] 4294967294
> as.integer(2*.Machine$integer.max)
[1] NA
Warning message:
NAs introduced by coercion to integer range
I'd rather have the noninteger numeric.
Spencer
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


Maybe there should be code for 64 bit R to use long long or the like?
On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves < [hidden email]>
wrote:
>
>
> On 20200119 09:34, Benjamin Tyner wrote:
> >> 
> >> Hello, All:
> >>
> >>
> >> Consider:
> >>
> >>
> >> Browse[2]> set.seed(1)
> >> Browse[2]> rpois(9, 1e10)
> >> NAs produced[1] NA NA NA NA NA NA NA NA NA
> >>
> >>
> >> Should this happen?
> >>
> >>
> >> I think that for, say, lambda>1e6, rpois should return rnorm(.,
> >> lambda, sqrt(lambda)).
> > But need to implement carefully; rpois should always return a
> > nonnegative integer, whereas rnorm always returns numeric...
> >
>
> Thanks for the reply.
>
>
> However, I think it's not acceptable to get an NA from a number
> that cannot be expressed as an integer. Whenever a randomly generated
> number would exceed .Machine$integer.max, the choice is between
> returning NA or a noninteger numeric. Consider:
>
>
> > 2*.Machine$integer.max
> [1] 4294967294
> > as.integer(2*.Machine$integer.max)
> [1] NA
> Warning message:
> NAs introduced by coercion to integer range
>
>
> I'd rather have the noninteger numeric.
>
>
> Spencer
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rdevel>

Sent from Gmail Mobile
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


So imagine rpois is changed, such that the storage mode of its return
value is sometimes integer and sometimes numeric. Then imagine the case
where lambda is itself a realization of a random variable. Do we really
want the storage mode to inherit that randomness?
On 1/19/20 10:47 AM, Avraham Adler wrote:
> Maybe there should be code for 64 bit R to use long long or the like?
>
> On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves
> < [hidden email] <mailto: [hidden email]>> wrote:
>
>
>
> On 20200119 09:34, Benjamin Tyner wrote:
> >>
> 
> >> Hello, All:
> >>
> >>
> >> Consider:
> >>
> >>
> >> Browse[2]> set.seed(1)
> >> Browse[2]> rpois(9, 1e10)
> >> NAs produced[1] NA NA NA NA NA NA NA NA NA
> >>
> >>
> >> Should this happen?
> >>
> >>
> >> I think that for, say, lambda>1e6, rpois should return
> rnorm(.,
> >> lambda, sqrt(lambda)).
> > But need to implement carefully; rpois should always return a
> > nonnegative integer, whereas rnorm always returns numeric...
> >
>
> Thanks for the reply.
>
>
> However, I think it's not acceptable to get an NA from a
> number
> that cannot be expressed as an integer. Whenever a randomly
> generated
> number would exceed .Machine$integer.max, the choice is between
> returning NA or a noninteger numeric. Consider:
>
>
> > 2*.Machine$integer.max
> [1] 4294967294
> > as.integer(2*.Machine$integer.max)
> [1] NA
> Warning message:
> NAs introduced by coercion to integer range
>
>
> I'd rather have the noninteger numeric.
>
>
> Spencer
>
> ______________________________________________
> [hidden email] <mailto: [hidden email]> mailing list
> https://stat.ethz.ch/mailman/listinfo/rdevel>
> 
> Sent from Gmail Mobile
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


Technically, lambda can always be numeric. It is the observations which
must be integral.
Would hitting everything larger than maxint or maxlonglong with floor or
round fundamentally change the distribution? Well, yes, but enough that it
would matter over process risk?
Avi
On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner < [hidden email]> wrote:
> So imagine rpois is changed, such that the storage mode of its return
> value is sometimes integer and sometimes numeric. Then imagine the case
> where lambda is itself a realization of a random variable. Do we really
> want the storage mode to inherit that randomness?
>
>
> On 1/19/20 10:47 AM, Avraham Adler wrote:
> > Maybe there should be code for 64 bit R to use long long or the like?
> >
> > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves
> > < [hidden email] <mailto: [hidden email]>>
> wrote:
> >
> >
> >
> > On 20200119 09:34, Benjamin Tyner wrote:
> > >>
> >
> 
> > >> Hello, All:
> > >>
> > >>
> > >> Consider:
> > >>
> > >>
> > >> Browse[2]> set.seed(1)
> > >> Browse[2]> rpois(9, 1e10)
> > >> NAs produced[1] NA NA NA NA NA NA NA NA NA
> > >>
> > >>
> > >> Should this happen?
> > >>
> > >>
> > >> I think that for, say, lambda>1e6, rpois should return
> > rnorm(.,
> > >> lambda, sqrt(lambda)).
> > > But need to implement carefully; rpois should always return a
> > > nonnegative integer, whereas rnorm always returns numeric...
> > >
> >
> > Thanks for the reply.
> >
> >
> > However, I think it's not acceptable to get an NA from a
> > number
> > that cannot be expressed as an integer. Whenever a randomly
> > generated
> > number would exceed .Machine$integer.max, the choice is between
> > returning NA or a noninteger numeric. Consider:
> >
> >
> > > 2*.Machine$integer.max
> > [1] 4294967294
> > > as.integer(2*.Machine$integer.max)
> > [1] NA
> > Warning message:
> > NAs introduced by coercion to integer range
> >
> >
> > I'd rather have the noninteger numeric.
> >
> >
> > Spencer
> >
> > ______________________________________________
> > [hidden email] <mailto: [hidden email]> mailing list
> > https://stat.ethz.ch/mailman/listinfo/rdevel> >
> > 
> > Sent from Gmail Mobile
>

Sent from Gmail Mobile
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


This issue arose for me in simulations to estimate confidence,
prediction, and tolerance intervals from glm(., family=poisson) fits
embedded in a BMA::bic.glm fit using a simulate.bic.glm function I added
to the development version of Ecfun, available at
" https://github.com/sbgraves237/Ecfun". This is part of a vignette I'm
developing, available at
" https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd".
This includes a simulated mean of a mixture of Poissons that exceeds
2e22. It doesn't seem unreasonable to me to have rpois output a
numerics rather than integers when a number simulated exceeds
.Machine$integer.max. And it does seem to make less sense in such cases
to return NAs.
Alternatively, might it make sense to add another argument to
rpois to give the user the choice? E.g., an argument "bigOutput" with
(I hope) default = "numeric" and "NA" as a second option. Or NA is the
default, so no code that relied that feature of the current code would
be broken by the change. If someone wanted to use arbitrary precision
arithmetic, they could write their own version of this function with
"arbitraryPrecision" as an optional value for the "bigOutput" argument.
Comments?
Thanks,
Spencer Graves
On 20200119 10:28, Avraham Adler wrote:
> Technically, lambda can always be numeric. It is the observations
> which must be integral.
>
> Would hitting everything larger than maxint or maxlonglong with floor
> or round fundamentally change the distribution? Well, yes, but enough
> that it would matter over process risk?
>
> Avi
>
> On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner < [hidden email]
> <mailto: [hidden email]>> wrote:
>
> So imagine rpois is changed, such that the storage mode of its return
> value is sometimes integer and sometimes numeric. Then imagine the
> case
> where lambda is itself a realization of a random variable. Do we
> really
> want the storage mode to inherit that randomness?
>
>
> On 1/19/20 10:47 AM, Avraham Adler wrote:
> > Maybe there should be code for 64 bit R to use long long or the
> like?
> >
> > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves
> > < [hidden email]
> <mailto: [hidden email]>
> <mailto: [hidden email]
> <mailto: [hidden email]>>> wrote:
> >
> >
> >
> > On 20200119 09:34, Benjamin Tyner wrote:
> > >>
> >
> 
> > >> Hello, All:
> > >>
> > >>
> > >> Consider:
> > >>
> > >>
> > >> Browse[2]> set.seed(1)
> > >> Browse[2]> rpois(9, 1e10)
> > >> NAs produced[1] NA NA NA NA NA NA NA NA NA
> > >>
> > >>
> > >> Should this happen?
> > >>
> > >>
> > >> I think that for, say, lambda>1e6, rpois should
> return
> > rnorm(.,
> > >> lambda, sqrt(lambda)).
> > > But need to implement carefully; rpois should always return a
> > > nonnegative integer, whereas rnorm always returns numeric...
> > >
> >
> > Thanks for the reply.
> >
> >
> > However, I think it's not acceptable to get an NA from a
> > number
> > that cannot be expressed as an integer. Whenever a randomly
> > generated
> > number would exceed .Machine$integer.max, the choice is between
> > returning NA or a noninteger numeric. Consider:
> >
> >
> > > 2*.Machine$integer.max
> > [1] 4294967294
> > > as.integer(2*.Machine$integer.max)
> > [1] NA
> > Warning message:
> > NAs introduced by coercion to integer range
> >
> >
> > I'd rather have the noninteger numeric.
> >
> >
> > Spencer
> >
> > ______________________________________________
> > [hidden email] <mailto: [hidden email]>
> <mailto: [hidden email] <mailto: [hidden email]>>
> mailing list
> > https://stat.ethz.ch/mailman/listinfo/rdevel> >
> > 
> > Sent from Gmail Mobile
>
> 
> Sent from Gmail Mobile
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


Crazy thought, but being that a sum of Poissons is Poisson in the sum, can
you break your “big” simulation into the sum of a few smaller ones? Or is
the order of magnitude difference just too great?
On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves < [hidden email]>
wrote:
> This issue arose for me in simulations to estimate confidence,
> prediction, and tolerance intervals from glm(., family=poisson) fits
> embedded in a BMA::bic.glm fit using a simulate.bic.glm function I added to
> the development version of Ecfun, available at
> " https://github.com/sbgraves237/Ecfun"
> < https://github.com/sbgraves237/Ecfun>. This is part of a vignette I'm
> developing, available at
> " https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd"
> < https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd>.
> This includes a simulated mean of a mixture of Poissons that exceeds 2e22.
> It doesn't seem unreasonable to me to have rpois output a numerics rather
> than integers when a number simulated exceeds .Machine$integer.max. And it
> does seem to make less sense in such cases to return NAs.
>
>
> Alternatively, might it make sense to add another argument to rpois
> to give the user the choice? E.g., an argument "bigOutput" with (I hope)
> default = "numeric" and "NA" as a second option. Or NA is the default, so
> no code that relied that feature of the current code would be broken by the
> change. If someone wanted to use arbitrary precision arithmetic, they
> could write their own version of this function with "arbitraryPrecision" as
> an optional value for the "bigOutput" argument.
>
>
> Comments?
> Thanks,
> Spencer Graves
>
>
>
> On 20200119 10:28, Avraham Adler wrote:
>
> Technically, lambda can always be numeric. It is the observations which
> must be integral.
>
> Would hitting everything larger than maxint or maxlonglong with floor or
> round fundamentally change the distribution? Well, yes, but enough that it
> would matter over process risk?
>
> Avi
>
> On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner < [hidden email]> wrote:
>
>> So imagine rpois is changed, such that the storage mode of its return
>> value is sometimes integer and sometimes numeric. Then imagine the case
>> where lambda is itself a realization of a random variable. Do we really
>> want the storage mode to inherit that randomness?
>>
>>
>> On 1/19/20 10:47 AM, Avraham Adler wrote:
>> > Maybe there should be code for 64 bit R to use long long or the like?
>> >
>> > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves
>> > < [hidden email] <mailto: [hidden email]>>
>> wrote:
>> >
>> >
>> >
>> > On 20200119 09:34, Benjamin Tyner wrote:
>> > >>
>> >
>> 
>> > >> Hello, All:
>> > >>
>> > >>
>> > >> Consider:
>> > >>
>> > >>
>> > >> Browse[2]> set.seed(1)
>> > >> Browse[2]> rpois(9, 1e10)
>> > >> NAs produced[1] NA NA NA NA NA NA NA NA NA
>> > >>
>> > >>
>> > >> Should this happen?
>> > >>
>> > >>
>> > >> I think that for, say, lambda>1e6, rpois should return
>> > rnorm(.,
>> > >> lambda, sqrt(lambda)).
>> > > But need to implement carefully; rpois should always return a
>> > > nonnegative integer, whereas rnorm always returns numeric...
>> > >
>> >
>> > Thanks for the reply.
>> >
>> >
>> > However, I think it's not acceptable to get an NA from a
>> > number
>> > that cannot be expressed as an integer. Whenever a randomly
>> > generated
>> > number would exceed .Machine$integer.max, the choice is between
>> > returning NA or a noninteger numeric. Consider:
>> >
>> >
>> > > 2*.Machine$integer.max
>> > [1] 4294967294
>> > > as.integer(2*.Machine$integer.max)
>> > [1] NA
>> > Warning message:
>> > NAs introduced by coercion to integer range
>> >
>> >
>> > I'd rather have the noninteger numeric.
>> >
>> >
>> > Spencer
>> >
>> > ______________________________________________
>> > [hidden email] <mailto: [hidden email]> mailing list
>> > https://stat.ethz.ch/mailman/listinfo/rdevel>> >
>> > 
>> > Sent from Gmail Mobile
>>
> 
> Sent from Gmail Mobile
>
>
> 
Sent from Gmail Mobile
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


On 20200119 13:01, Avraham Adler wrote:
> Crazy thought, but being that a sum of Poissons is Poisson in the sum,
> can you break your “big” simulation into the sum of a few smaller
> ones? Or is the order of magnitude difference just too great?
I don't perceive that as feasible. Once I found what was
generating NAs, it was easy to code a function to return pseudorandom
numbers using the standard normal approximation to the Poisson for those
extreme cases. [For a Poisson with mean = 1e6, for example, the
skewness (third standardized moment) is 0.001. At least for my
purposes, that should be adequate.][1]
What are the negative consequences of having rpois return
numerics that are always nonnegative?
Spencer
[1] In the code I reported before, I just changed the threshold of 1e6
to 0.5*.Machine$integer.max. On my Mac, .Machine$integer.max =
2147483647 = 2^31 > 1e9. That still means that a Poisson distributed
pseudorandom number just under that would have to be over 23000
standard deviations above the mean to exceed .Machine$integer.max.
>
> On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves
> < [hidden email] <mailto: [hidden email]>> wrote:
>
> This issue arose for me in simulations to estimate
> confidence, prediction, and tolerance intervals from glm(.,
> family=poisson) fits embedded in a BMA::bic.glm fit using a
> simulate.bic.glm function I added to the development version of
> Ecfun, available at " https://github.com/sbgraves237/Ecfun"
> < https://github.com/sbgraves237/Ecfun>. This is part of a vignette
> I'm developing, available at
> " https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd"
> < https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd>.
> This includes a simulated mean of a mixture of Poissons that
> exceeds 2e22. It doesn't seem unreasonable to me to have rpois
> output a numerics rather than integers when a number simulated
> exceeds .Machine$integer.max. And it does seem to make less sense
> in such cases to return NAs.
>
>
> Alternatively, might it make sense to add another argument
> to rpois to give the user the choice? E.g., an argument
> "bigOutput" with (I hope) default = "numeric" and "NA" as a second
> option. Or NA is the default, so no code that relied that feature
> of the current code would be broken by the change. If someone
> wanted to use arbitrary precision arithmetic, they could write
> their own version of this function with "arbitraryPrecision" as an
> optional value for the "bigOutput" argument.
>
>
> Comments?
> Thanks,
> Spencer Graves
>
>
>
> On 20200119 10:28, Avraham Adler wrote:
>> Technically, lambda can always be numeric. It is the observations
>> which must be integral.
>>
>> Would hitting everything larger than maxint or maxlonglong with
>> floor or round fundamentally change the distribution? Well, yes,
>> but enough that it would matter over process risk?
>>
>> Avi
>>
>> On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner < [hidden email]
>> <mailto: [hidden email]>> wrote:
>>
>> So imagine rpois is changed, such that the storage mode of
>> its return
>> value is sometimes integer and sometimes numeric. Then
>> imagine the case
>> where lambda is itself a realization of a random variable. Do
>> we really
>> want the storage mode to inherit that randomness?
>>
>>
>> On 1/19/20 10:47 AM, Avraham Adler wrote:
>> > Maybe there should be code for 64 bit R to use long long or
>> the like?
>> >
>> > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves
>> > < [hidden email]
>> <mailto: [hidden email]>
>> <mailto: [hidden email]
>> <mailto: [hidden email]>>> wrote:
>> >
>> >
>> >
>> > On 20200119 09:34, Benjamin Tyner wrote:
>> > >>
>> >
>> 
>> > >> Hello, All:
>> > >>
>> > >>
>> > >> Consider:
>> > >>
>> > >>
>> > >> Browse[2]> set.seed(1)
>> > >> Browse[2]> rpois(9, 1e10)
>> > >> NAs produced[1] NA NA NA NA NA NA NA NA NA
>> > >>
>> > >>
>> > >> Should this happen?
>> > >>
>> > >>
>> > >> I think that for, say, lambda>1e6, rpois
>> should return
>> > rnorm(.,
>> > >> lambda, sqrt(lambda)).
>> > > But need to implement carefully; rpois should always
>> return a
>> > > nonnegative integer, whereas rnorm always returns
>> numeric...
>> > >
>> >
>> > Thanks for the reply.
>> >
>> >
>> > However, I think it's not acceptable to get an
>> NA from a
>> > number
>> > that cannot be expressed as an integer. Whenever a randomly
>> > generated
>> > number would exceed .Machine$integer.max, the choice is
>> between
>> > returning NA or a noninteger numeric. Consider:
>> >
>> >
>> > > 2*.Machine$integer.max
>> > [1] 4294967294
>> > > as.integer(2*.Machine$integer.max)
>> > [1] NA
>> > Warning message:
>> > NAs introduced by coercion to integer range
>> >
>> >
>> > I'd rather have the noninteger numeric.
>> >
>> >
>> > Spencer
>> >
>> > ______________________________________________
>> > [hidden email] <mailto: [hidden email]>
>> <mailto: [hidden email] <mailto: [hidden email]>>
>> mailing list
>> > https://stat.ethz.ch/mailman/listinfo/rdevel>> >
>> > 
>> > Sent from Gmail Mobile
>>
>> 
>> Sent from Gmail Mobile
>
> 
> Sent from Gmail Mobile
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


Floor (maybe round) of nonnegative numerics, though. Poisson should never
have anything after decimal.
Still think it’s worth allowing long long for R64 bit, just for purity
sake.
Avi
On Sun, Jan 19, 2020 at 4:38 PM Spencer Graves < [hidden email]>
wrote:
>
>
> On 20200119 13:01, Avraham Adler wrote:
>
> Crazy thought, but being that a sum of Poissons is Poisson in the sum, can
> you break your “big” simulation into the sum of a few smaller ones? Or is
> the order of magnitude difference just too great?
>
>
>
> I don't perceive that as feasible. Once I found what was generating
> NAs, it was easy to code a function to return pseudorandom numbers using
> the standard normal approximation to the Poisson for those extreme cases.
> [For a Poisson with mean = 1e6, for example, the skewness (third
> standardized moment) is 0.001. At least for my purposes, that should be
> adequate.][1]
>
>
> What are the negative consequences of having rpois return numerics
> that are always nonnegative?
>
>
> Spencer
>
>
> [1] In the code I reported before, I just changed the threshold of 1e6 to
> 0.5*.Machine$integer.max. On my Mac, .Machine$integer.max = 2147483647 =
> 2^31 > 1e9. That still means that a Poisson distributed pseudorandom
> number just under that would have to be over 23000 standard deviations
> above the mean to exceed .Machine$integer.max.
>
>
> On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves <
> [hidden email]> wrote:
>
>> This issue arose for me in simulations to estimate confidence,
>> prediction, and tolerance intervals from glm(., family=poisson) fits
>> embedded in a BMA::bic.glm fit using a simulate.bic.glm function I added to
>> the development version of Ecfun, available at
>> " https://github.com/sbgraves237/Ecfun"
>> < https://github.com/sbgraves237/Ecfun>. This is part of a vignette I'm
>> developing, available at
>> " https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd"
>> < https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd>.
>> This includes a simulated mean of a mixture of Poissons that exceeds 2e22.
>> It doesn't seem unreasonable to me to have rpois output a numerics rather
>> than integers when a number simulated exceeds .Machine$integer.max. And it
>> does seem to make less sense in such cases to return NAs.
>>
>>
>> Alternatively, might it make sense to add another argument to
>> rpois to give the user the choice? E.g., an argument "bigOutput" with (I
>> hope) default = "numeric" and "NA" as a second option. Or NA is the
>> default, so no code that relied that feature of the current code would be
>> broken by the change. If someone wanted to use arbitrary precision
>> arithmetic, they could write their own version of this function with
>> "arbitraryPrecision" as an optional value for the "bigOutput" argument.
>>
>>
>> Comments?
>> Thanks,
>> Spencer Graves
>>
>>
>>
>> On 20200119 10:28, Avraham Adler wrote:
>>
>> Technically, lambda can always be numeric. It is the observations which
>> must be integral.
>>
>> Would hitting everything larger than maxint or maxlonglong with floor or
>> round fundamentally change the distribution? Well, yes, but enough that it
>> would matter over process risk?
>>
>> Avi
>>
>> On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner < [hidden email]> wrote:
>>
>>> So imagine rpois is changed, such that the storage mode of its return
>>> value is sometimes integer and sometimes numeric. Then imagine the case
>>> where lambda is itself a realization of a random variable. Do we really
>>> want the storage mode to inherit that randomness?
>>>
>>>
>>> On 1/19/20 10:47 AM, Avraham Adler wrote:
>>> > Maybe there should be code for 64 bit R to use long long or the like?
>>> >
>>> > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves
>>> > < [hidden email] <mailto: [hidden email]>>
>>> wrote:
>>> >
>>> >
>>> >
>>> > On 20200119 09:34, Benjamin Tyner wrote:
>>> > >>
>>> >
>>> 
>>> > >> Hello, All:
>>> > >>
>>> > >>
>>> > >> Consider:
>>> > >>
>>> > >>
>>> > >> Browse[2]> set.seed(1)
>>> > >> Browse[2]> rpois(9, 1e10)
>>> > >> NAs produced[1] NA NA NA NA NA NA NA NA NA
>>> > >>
>>> > >>
>>> > >> Should this happen?
>>> > >>
>>> > >>
>>> > >> I think that for, say, lambda>1e6, rpois should return
>>> > rnorm(.,
>>> > >> lambda, sqrt(lambda)).
>>> > > But need to implement carefully; rpois should always return a
>>> > > nonnegative integer, whereas rnorm always returns numeric...
>>> > >
>>> >
>>> > Thanks for the reply.
>>> >
>>> >
>>> > However, I think it's not acceptable to get an NA from a
>>> > number
>>> > that cannot be expressed as an integer. Whenever a randomly
>>> > generated
>>> > number would exceed .Machine$integer.max, the choice is between
>>> > returning NA or a noninteger numeric. Consider:
>>> >
>>> >
>>> > > 2*.Machine$integer.max
>>> > [1] 4294967294
>>> > > as.integer(2*.Machine$integer.max)
>>> > [1] NA
>>> > Warning message:
>>> > NAs introduced by coercion to integer range
>>> >
>>> >
>>> > I'd rather have the noninteger numeric.
>>> >
>>> >
>>> > Spencer
>>> >
>>> > ______________________________________________
>>> > [hidden email] <mailto: [hidden email]> mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/rdevel>>> >
>>> > 
>>> > Sent from Gmail Mobile
>>>
>> 
>> Sent from Gmail Mobile
>>
>>
>> 
> Sent from Gmail Mobile
>
>
> 
Sent from Gmail Mobile
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


On my Mac:
str(.Machine)
...
$ integer.max : int 2147483647
$ sizeof.long : int 8
$ sizeof.longlong : int 8
$ sizeof.longdouble : int 16
$ sizeof.pointer : int 8
On a Windows 10 machine I have, $ sizeof.long : int 4; otherwise
the same as on my Mac.
Am I correct that $ sizeof.long = 4 means 4 bytes = 32 bits?
log2(.Machine$integer.max) = 31. Then 8 bytes is what used to be called
double precision (2 words of 4 bytes each)? And $ sizeof.longdouble =
16 = 4 words of 4 bytes each?
Spencer
On 20200119 15:41, Avraham Adler wrote:
> Floor (maybe round) of nonnegative numerics, though. Poisson should
> never have anything after decimal.
>
> Still think it’s worth allowing long long for R64 bit, just for purity
> sake.
>
> Avi
>
> On Sun, Jan 19, 2020 at 4:38 PM Spencer Graves
> < [hidden email] <mailto: [hidden email]>> wrote:
>
>
>
> On 20200119 13:01, Avraham Adler wrote:
>> Crazy thought, but being that a sum of Poissons is Poisson in the
>> sum, can you break your “big” simulation into the sum of a few
>> smaller ones? Or is the order of magnitude difference just too great?
>
>
> I don't perceive that as feasible. Once I found what was
> generating NAs, it was easy to code a function to return
> pseudorandom numbers using the standard normal approximation to
> the Poisson for those extreme cases. [For a Poisson with mean =
> 1e6, for example, the skewness (third standardized moment) is
> 0.001. At least for my purposes, that should be adequate.][1]
>
>
> What are the negative consequences of having rpois return
> numerics that are always nonnegative?
>
>
> Spencer
>
>
> [1] In the code I reported before, I just changed the threshold
> of 1e6 to 0.5*.Machine$integer.max. On my Mac,
> .Machine$integer.max = 2147483647 = 2^31 > 1e9. That still means
> that a Poisson distributed pseudorandom number just under that
> would have to be over 23000 standard deviations above the mean to
> exceed .Machine$integer.max.
>
>>
>> On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves
>> < [hidden email]
>> <mailto: [hidden email]>> wrote:
>>
>> This issue arose for me in simulations to estimate
>> confidence, prediction, and tolerance intervals from glm(.,
>> family=poisson) fits embedded in a BMA::bic.glm fit using a
>> simulate.bic.glm function I added to the development version
>> of Ecfun, available at " https://github.com/sbgraves237/Ecfun"
>> < https://github.com/sbgraves237/Ecfun>. This is part of a
>> vignette I'm developing, available at
>> " https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd"
>> < https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd>.
>> This includes a simulated mean of a mixture of Poissons that
>> exceeds 2e22. It doesn't seem unreasonable to me to have
>> rpois output a numerics rather than integers when a number
>> simulated exceeds .Machine$integer.max. And it does seem to
>> make less sense in such cases to return NAs.
>>
>>
>> Alternatively, might it make sense to add another
>> argument to rpois to give the user the choice? E.g., an
>> argument "bigOutput" with (I hope) default = "numeric" and
>> "NA" as a second option. Or NA is the default, so no code
>> that relied that feature of the current code would be broken
>> by the change. If someone wanted to use arbitrary precision
>> arithmetic, they could write their own version of this
>> function with "arbitraryPrecision" as an optional value for
>> the "bigOutput" argument.
>>
>>
>> Comments?
>> Thanks,
>> Spencer Graves
>>
>>
>>
>> On 20200119 10:28, Avraham Adler wrote:
>>> Technically, lambda can always be numeric. It is the
>>> observations which must be integral.
>>>
>>> Would hitting everything larger than maxint or maxlonglong
>>> with floor or round fundamentally change the distribution?
>>> Well, yes, but enough that it would matter over process risk?
>>>
>>> Avi
>>>
>>> On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner
>>> < [hidden email] <mailto: [hidden email]>> wrote:
>>>
>>> So imagine rpois is changed, such that the storage mode
>>> of its return
>>> value is sometimes integer and sometimes numeric. Then
>>> imagine the case
>>> where lambda is itself a realization of a random
>>> variable. Do we really
>>> want the storage mode to inherit that randomness?
>>>
>>>
>>> On 1/19/20 10:47 AM, Avraham Adler wrote:
>>> > Maybe there should be code for 64 bit R to use long
>>> long or the like?
>>> >
>>> > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves
>>> > < [hidden email]
>>> <mailto: [hidden email]>
>>> <mailto: [hidden email]
>>> <mailto: [hidden email]>>> wrote:
>>> >
>>> >
>>> >
>>> > On 20200119 09:34, Benjamin Tyner wrote:
>>> > >>
>>> >
>>> 
>>> > >> Hello, All:
>>> > >>
>>> > >>
>>> > >> Consider:
>>> > >>
>>> > >>
>>> > >> Browse[2]> set.seed(1)
>>> > >> Browse[2]> rpois(9, 1e10)
>>> > >> NAs produced[1] NA NA NA NA NA NA NA NA NA
>>> > >>
>>> > >>
>>> > >> Should this happen?
>>> > >>
>>> > >>
>>> > >> I think that for, say, lambda>1e6,
>>> rpois should return
>>> > rnorm(.,
>>> > >> lambda, sqrt(lambda)).
>>> > > But need to implement carefully; rpois should
>>> always return a
>>> > > nonnegative integer, whereas rnorm always
>>> returns numeric...
>>> > >
>>> >
>>> > Thanks for the reply.
>>> >
>>> >
>>> > However, I think it's not acceptable to get
>>> an NA from a
>>> > number
>>> > that cannot be expressed as an integer. Whenever
>>> a randomly
>>> > generated
>>> > number would exceed .Machine$integer.max, the
>>> choice is between
>>> > returning NA or a noninteger numeric. Consider:
>>> >
>>> >
>>> > > 2*.Machine$integer.max
>>> > [1] 4294967294
>>> > > as.integer(2*.Machine$integer.max)
>>> > [1] NA
>>> > Warning message:
>>> > NAs introduced by coercion to integer range
>>> >
>>> >
>>> > I'd rather have the noninteger numeric.
>>> >
>>> >
>>> > Spencer
>>> >
>>> > ______________________________________________
>>> > [hidden email] <mailto: [hidden email]>
>>> <mailto: [hidden email]
>>> <mailto: [hidden email]>> mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/rdevel>>> >
>>> > 
>>> > Sent from Gmail Mobile
>>>
>>> 
>>> Sent from Gmail Mobile
>>
>> 
>> Sent from Gmail Mobile
>
> 
> Sent from Gmail Mobile
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


R uses the C 'int' type for its integer data and that is pretty much
universally 32 bit these days. In fact R wont' compile if it is not.
That means the range for integer data is the integers in [2^31,
+2^31).
It would be good to allow for a larger integer range for R integer
objects, and several of us are thinking about how me might get there.
But it isn't easy to get right, so it may take some time. I doubt
anything can happen for R 4.0.0 this year, but 2021 may be possible.
I few notes inline below:
On Sun, 19 Jan 2020, Spencer Graves wrote:
> On my Mac:
>
>
> str(.Machine)
> ...
> $ integer.max : int 2147483647
> $ sizeof.long : int 8
> $ sizeof.longlong : int 8
> $ sizeof.longdouble : int 16
> $ sizeof.pointer : int 8
>
>
> On a Windows 10 machine I have, $ sizeof.long : int 4; otherwise
> the same as on my Mac.
One of many annoyances of Windows  done for compatibility with
ancient Window apps.
> Am I correct that $ sizeof.long = 4 means 4 bytes = 32 bits?
> log2(.Machine$integer.max) = 31. Then 8 bytes is what used to be called
> double precision (2 words of 4 bytes each)? And $ sizeof.longdouble =
> 16 = 4 words of 4 bytes each?
double precision is a floating point concept, not related to integers.
If you want to figure out whether you are running a 32 bit or 64 bit R
look at sizeof.pointer  4 means 32 bits, 8 64 bits.
Best,
luke
>
>
> Spencer
>
>
> On 20200119 15:41, Avraham Adler wrote:
>> Floor (maybe round) of nonnegative numerics, though. Poisson should
>> never have anything after decimal.
>>
>> Still think it’s worth allowing long long for R64 bit, just for purity
>> sake.
>>
>> Avi
>>
>> On Sun, Jan 19, 2020 at 4:38 PM Spencer Graves
>> < [hidden email] <mailto: [hidden email]>> wrote:
>>
>>
>>
>> On 20200119 13:01, Avraham Adler wrote:
>>> Crazy thought, but being that a sum of Poissons is Poisson in the
>>> sum, can you break your “big” simulation into the sum of a few
>>> smaller ones? Or is the order of magnitude difference just too great?
>>
>>
>> I don't perceive that as feasible. Once I found what was
>> generating NAs, it was easy to code a function to return
>> pseudorandom numbers using the standard normal approximation to
>> the Poisson for those extreme cases. [For a Poisson with mean =
>> 1e6, for example, the skewness (third standardized moment) is
>> 0.001. At least for my purposes, that should be adequate.][1]
>>
>>
>> What are the negative consequences of having rpois return
>> numerics that are always nonnegative?
>>
>>
>> Spencer
>>
>>
>> [1] In the code I reported before, I just changed the threshold
>> of 1e6 to 0.5*.Machine$integer.max. On my Mac,
>> .Machine$integer.max = 2147483647 = 2^31 > 1e9. That still means
>> that a Poisson distributed pseudorandom number just under that
>> would have to be over 23000 standard deviations above the mean to
>> exceed .Machine$integer.max.
>>
>>>
>>> On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves
>>> < [hidden email]
>>> <mailto: [hidden email]>> wrote:
>>>
>>> This issue arose for me in simulations to estimate
>>> confidence, prediction, and tolerance intervals from glm(.,
>>> family=poisson) fits embedded in a BMA::bic.glm fit using a
>>> simulate.bic.glm function I added to the development version
>>> of Ecfun, available at " https://github.com/sbgraves237/Ecfun"
>>> < https://github.com/sbgraves237/Ecfun>. This is part of a
>>> vignette I'm developing, available at
>>> " https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd"
>>> < https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd>.
>>> This includes a simulated mean of a mixture of Poissons that
>>> exceeds 2e22. It doesn't seem unreasonable to me to have
>>> rpois output a numerics rather than integers when a number
>>> simulated exceeds .Machine$integer.max. And it does seem to
>>> make less sense in such cases to return NAs.
>>>
>>>
>>> Alternatively, might it make sense to add another
>>> argument to rpois to give the user the choice? E.g., an
>>> argument "bigOutput" with (I hope) default = "numeric" and
>>> "NA" as a second option. Or NA is the default, so no code
>>> that relied that feature of the current code would be broken
>>> by the change. If someone wanted to use arbitrary precision
>>> arithmetic, they could write their own version of this
>>> function with "arbitraryPrecision" as an optional value for
>>> the "bigOutput" argument.
>>>
>>>
>>> Comments?
>>> Thanks,
>>> Spencer Graves
>>>
>>>
>>>
>>> On 20200119 10:28, Avraham Adler wrote:
>>>> Technically, lambda can always be numeric. It is the
>>>> observations which must be integral.
>>>>
>>>> Would hitting everything larger than maxint or maxlonglong
>>>> with floor or round fundamentally change the distribution?
>>>> Well, yes, but enough that it would matter over process risk?
>>>>
>>>> Avi
>>>>
>>>> On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner
>>>> < [hidden email] <mailto: [hidden email]>> wrote:
>>>>
>>>> So imagine rpois is changed, such that the storage mode
>>>> of its return
>>>> value is sometimes integer and sometimes numeric. Then
>>>> imagine the case
>>>> where lambda is itself a realization of a random
>>>> variable. Do we really
>>>> want the storage mode to inherit that randomness?
>>>>
>>>>
>>>> On 1/19/20 10:47 AM, Avraham Adler wrote:
>>>> > Maybe there should be code for 64 bit R to use long
>>>> long or the like?
>>>> >
>>>> > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves
>>>> > < [hidden email]
>>>> <mailto: [hidden email]>
>>>> <mailto: [hidden email]
>>>> <mailto: [hidden email]>>> wrote:
>>>> >
>>>> >
>>>> >
>>>> > On 20200119 09:34, Benjamin Tyner wrote:
>>>> > >>
>>>> >
>>>> 
>>>> > >> Hello, All:
>>>> > >>
>>>> > >>
>>>> > >> Consider:
>>>> > >>
>>>> > >>
>>>> > >> Browse[2]> set.seed(1)
>>>> > >> Browse[2]> rpois(9, 1e10)
>>>> > >> NAs produced[1] NA NA NA NA NA NA NA NA NA
>>>> > >>
>>>> > >>
>>>> > >> Should this happen?
>>>> > >>
>>>> > >>
>>>> > >> I think that for, say, lambda>1e6,
>>>> rpois should return
>>>> > rnorm(.,
>>>> > >> lambda, sqrt(lambda)).
>>>> > > But need to implement carefully; rpois should
>>>> always return a
>>>> > > nonnegative integer, whereas rnorm always
>>>> returns numeric...
>>>> > >
>>>> >
>>>> > Thanks for the reply.
>>>> >
>>>> >
>>>> > However, I think it's not acceptable to get
>>>> an NA from a
>>>> > number
>>>> > that cannot be expressed as an integer. Whenever
>>>> a randomly
>>>> > generated
>>>> > number would exceed .Machine$integer.max, the
>>>> choice is between
>>>> > returning NA or a noninteger numeric. Consider:
>>>> >
>>>> >
>>>> > > 2*.Machine$integer.max
>>>> > [1] 4294967294
>>>> > > as.integer(2*.Machine$integer.max)
>>>> > [1] NA
>>>> > Warning message:
>>>> > NAs introduced by coercion to integer range
>>>> >
>>>> >
>>>> > I'd rather have the noninteger numeric.
>>>> >
>>>> >
>>>> > Spencer
>>>> >
>>>> > ______________________________________________
>>>> > [hidden email] <mailto: [hidden email]>
>>>> <mailto: [hidden email]
>>>> <mailto: [hidden email]>> mailing list
>>>> > https://stat.ethz.ch/mailman/listinfo/rdevel>>>> >
>>>> > 
>>>> > Sent from Gmail Mobile
>>>>
>>>> 
>>>> Sent from Gmail Mobile
>>>
>>> 
>>> Sent from Gmail Mobile
>>
>> 
>> Sent from Gmail Mobile
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rdevel>

Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 3193353386
Department of Statistics and Fax: 3193353017
Actuarial Science
241 Schaeffer Hall email: [hidden email]
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


Thanks to Luke and Avi for their comments. I wrapped "round" around the
call to "rnorm" inside my "rpois.". For "lambda" really big, that
"round" won't do anything. However, it appears to give integers in
floating point representation that are larger than
.Machine$integer.max. That sounds very much like what someone would
want. Spencer
On 20200119 21:00, Tierney, Luke wrote:
> R uses the C 'int' type for its integer data and that is pretty much
> universally 32 bit these days. In fact R wont' compile if it is not.
> That means the range for integer data is the integers in [2^31,
> +2^31).
>
> It would be good to allow for a larger integer range for R integer
> objects, and several of us are thinking about how me might get there.
> But it isn't easy to get right, so it may take some time. I doubt
> anything can happen for R 4.0.0 this year, but 2021 may be possible.
>
> I few notes inline below:
>
> On Sun, 19 Jan 2020, Spencer Graves wrote:
>
>> On my Mac:
>>
>>
>> str(.Machine)
>> ...
>> $ integer.max : int 2147483647
>> $ sizeof.long : int 8
>> $ sizeof.longlong : int 8
>> $ sizeof.longdouble : int 16
>> $ sizeof.pointer : int 8
>>
>>
>> On a Windows 10 machine I have, $ sizeof.long : int 4; otherwise
>> the same as on my Mac.
> One of many annoyances of Windows  done for compatibility with
> ancient Window apps.
>
>> Am I correct that $ sizeof.long = 4 means 4 bytes = 32 bits?
>> log2(.Machine$integer.max) = 31. Then 8 bytes is what used to be called
>> double precision (2 words of 4 bytes each)? And $ sizeof.longdouble =
>> 16 = 4 words of 4 bytes each?
> double precision is a floating point concept, not related to integers.
>
> If you want to figure out whether you are running a 32 bit or 64 bit R
> look at sizeof.pointer  4 means 32 bits, 8 64 bits.
>
> Best,
>
> luke
>
>
>>
>> Spencer
>>
>>
>> On 20200119 15:41, Avraham Adler wrote:
>>> Floor (maybe round) of nonnegative numerics, though. Poisson should
>>> never have anything after decimal.
>>>
>>> Still think it’s worth allowing long long for R64 bit, just for purity
>>> sake.
>>>
>>> Avi
>>>
>>> On Sun, Jan 19, 2020 at 4:38 PM Spencer Graves
>>> < [hidden email] <mailto: [hidden email]>> wrote:
>>>
>>>
>>>
>>> On 20200119 13:01, Avraham Adler wrote:
>>>> Crazy thought, but being that a sum of Poissons is Poisson in the
>>>> sum, can you break your “big” simulation into the sum of a few
>>>> smaller ones? Or is the order of magnitude difference just too great?
>>>
>>> I don't perceive that as feasible. Once I found what was
>>> generating NAs, it was easy to code a function to return
>>> pseudorandom numbers using the standard normal approximation to
>>> the Poisson for those extreme cases. [For a Poisson with mean =
>>> 1e6, for example, the skewness (third standardized moment) is
>>> 0.001. At least for my purposes, that should be adequate.][1]
>>>
>>>
>>> What are the negative consequences of having rpois return
>>> numerics that are always nonnegative?
>>>
>>>
>>> Spencer
>>>
>>>
>>> [1] In the code I reported before, I just changed the threshold
>>> of 1e6 to 0.5*.Machine$integer.max. On my Mac,
>>> .Machine$integer.max = 2147483647 = 2^31 > 1e9. That still means
>>> that a Poisson distributed pseudorandom number just under that
>>> would have to be over 23000 standard deviations above the mean to
>>> exceed .Machine$integer.max.
>>>
>>>> On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves
>>>> < [hidden email]
>>>> <mailto: [hidden email]>> wrote:
>>>>
>>>> This issue arose for me in simulations to estimate
>>>> confidence, prediction, and tolerance intervals from glm(.,
>>>> family=poisson) fits embedded in a BMA::bic.glm fit using a
>>>> simulate.bic.glm function I added to the development version
>>>> of Ecfun, available at " https://github.com/sbgraves237/Ecfun"
>>>> < https://github.com/sbgraves237/Ecfun>. This is part of a
>>>> vignette I'm developing, available at
>>>> " https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd"
>>>> < https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd>.
>>>> This includes a simulated mean of a mixture of Poissons that
>>>> exceeds 2e22. It doesn't seem unreasonable to me to have
>>>> rpois output a numerics rather than integers when a number
>>>> simulated exceeds .Machine$integer.max. And it does seem to
>>>> make less sense in such cases to return NAs.
>>>>
>>>>
>>>> Alternatively, might it make sense to add another
>>>> argument to rpois to give the user the choice? E.g., an
>>>> argument "bigOutput" with (I hope) default = "numeric" and
>>>> "NA" as a second option. Or NA is the default, so no code
>>>> that relied that feature of the current code would be broken
>>>> by the change. If someone wanted to use arbitrary precision
>>>> arithmetic, they could write their own version of this
>>>> function with "arbitraryPrecision" as an optional value for
>>>> the "bigOutput" argument.
>>>>
>>>>
>>>> Comments?
>>>> Thanks,
>>>> Spencer Graves
>>>>
>>>>
>>>>
>>>> On 20200119 10:28, Avraham Adler wrote:
>>>>> Technically, lambda can always be numeric. It is the
>>>>> observations which must be integral.
>>>>>
>>>>> Would hitting everything larger than maxint or maxlonglong
>>>>> with floor or round fundamentally change the distribution?
>>>>> Well, yes, but enough that it would matter over process risk?
>>>>>
>>>>> Avi
>>>>>
>>>>> On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner
>>>>> < [hidden email] <mailto: [hidden email]>> wrote:
>>>>>
>>>>> So imagine rpois is changed, such that the storage mode
>>>>> of its return
>>>>> value is sometimes integer and sometimes numeric. Then
>>>>> imagine the case
>>>>> where lambda is itself a realization of a random
>>>>> variable. Do we really
>>>>> want the storage mode to inherit that randomness?
>>>>>
>>>>>
>>>>> On 1/19/20 10:47 AM, Avraham Adler wrote:
>>>>> > Maybe there should be code for 64 bit R to use long
>>>>> long or the like?
>>>>> >
>>>>> > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves
>>>>> > < [hidden email]
>>>>> <mailto: [hidden email]>
>>>>> <mailto: [hidden email]
>>>>> <mailto: [hidden email]>>> wrote:
>>>>> >
>>>>> >
>>>>> >
>>>>> > On 20200119 09:34, Benjamin Tyner wrote:
>>>>> > >>
>>>>> >
>>>>> 
>>>>> > >> Hello, All:
>>>>> > >>
>>>>> > >>
>>>>> > >> Consider:
>>>>> > >>
>>>>> > >>
>>>>> > >> Browse[2]> set.seed(1)
>>>>> > >> Browse[2]> rpois(9, 1e10)
>>>>> > >> NAs produced[1] NA NA NA NA NA NA NA NA NA
>>>>> > >>
>>>>> > >>
>>>>> > >> Should this happen?
>>>>> > >>
>>>>> > >>
>>>>> > >> I think that for, say, lambda>1e6,
>>>>> rpois should return
>>>>> > rnorm(.,
>>>>> > >> lambda, sqrt(lambda)).
>>>>> > > But need to implement carefully; rpois should
>>>>> always return a
>>>>> > > nonnegative integer, whereas rnorm always
>>>>> returns numeric...
>>>>> > >
>>>>> >
>>>>> > Thanks for the reply.
>>>>> >
>>>>> >
>>>>> > However, I think it's not acceptable to get
>>>>> an NA from a
>>>>> > number
>>>>> > that cannot be expressed as an integer. Whenever
>>>>> a randomly
>>>>> > generated
>>>>> > number would exceed .Machine$integer.max, the
>>>>> choice is between
>>>>> > returning NA or a noninteger numeric. Consider:
>>>>> >
>>>>> >
>>>>> > > 2*.Machine$integer.max
>>>>> > [1] 4294967294
>>>>> > > as.integer(2*.Machine$integer.max)
>>>>> > [1] NA
>>>>> > Warning message:
>>>>> > NAs introduced by coercion to integer range
>>>>> >
>>>>> >
>>>>> > I'd rather have the noninteger numeric.
>>>>> >
>>>>> >
>>>>> > Spencer
>>>>> >
>>>>> > ______________________________________________
>>>>> > [hidden email] <mailto: [hidden email]>
>>>>> <mailto: [hidden email]
>>>>> <mailto: [hidden email]>> mailing list
>>>>> > https://stat.ethz.ch/mailman/listinfo/rdevel>>>>> >
>>>>> > 
>>>>> > Sent from Gmail Mobile
>>>>>
>>>>> 
>>>>> Sent from Gmail Mobile
>>>> 
>>>> Sent from Gmail Mobile
>>> 
>>> Sent from Gmail Mobile
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/rdevel>>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


>>>>> Spencer Graves
>>>>> on Sun, 19 Jan 2020 21:35:04 0600 writes:
> Thanks to Luke and Avi for their comments. I wrapped "round" around the
> call to "rnorm" inside my "rpois.". For "lambda" really big, that
> "round" won't do anything. However, it appears to give integers in
> floating point representation that are larger than
> .Machine$integer.max. That sounds very much like what someone would
> want. Spencer
Coming late here  after enjoying a proper weekend ;) 
I have been agreeing (with Spencer, IIUC) on this for a long
time (~ 3 yrs, or more?), namely that I've come to see it as a
"design bug" that rpois() {and similar} must return return typeof() "integer".
More strongly, I'm actually pretty convinced they should return
(integervalued) double instead of NA_integer_ and for that
reason should always return double:
Even if we have (hopefully) a native 64bit integer in R,
2^64 is still teeny tiny compared .Machine$double.max
(and then maybe we'd have .Machine$longdouble.max which would
be considerably larger than double.max unless on Windows, where
the wise men at Microsoft decided to keep their workload simple
by defining "long double := double"  as 'long double'
unfortunately is not well defined by C standards)
Martin
> On 20200119 21:00, Tierney, Luke wrote:
>> R uses the C 'int' type for its integer data and that is pretty much
>> universally 32 bit these days. In fact R wont' compile if it is not.
>> That means the range for integer data is the integers in [2^31,
>> +2^31).
>>
>> It would be good to allow for a larger integer range for R integer
>> objects, and several of us are thinking about how me might get there.
>> But it isn't easy to get right, so it may take some time. I doubt
>> anything can happen for R 4.0.0 this year, but 2021 may be possible.
>>
>> I few notes inline below:
>>
>> On Sun, 19 Jan 2020, Spencer Graves wrote:
>>
>>> On my Mac:
>>>
>>>
>>> str(.Machine)
>>> ...
>>> $ integer.max : int 2147483647
>>> $ sizeof.long : int 8
>>> $ sizeof.longlong : int 8
>>> $ sizeof.longdouble : int 16
>>> $ sizeof.pointer : int 8
>>>
>>>
>>> On a Windows 10 machine I have, $ sizeof.long : int 4; otherwise
>>> the same as on my Mac.
>> One of many annoyances of Windows  done for compatibility with
>> ancient Window apps.
>>
>>> Am I correct that $ sizeof.long = 4 means 4 bytes = 32 bits?
>>> log2(.Machine$integer.max) = 31. Then 8 bytes is what used to be called
>>> double precision (2 words of 4 bytes each)? And $ sizeof.longdouble =
>>> 16 = 4 words of 4 bytes each?
>> double precision is a floating point concept, not related to integers.
>>
>> If you want to figure out whether you are running a 32 bit or 64 bit R
>> look at sizeof.pointer  4 means 32 bits, 8 64 bits.
>>
>> Best,
>>
>> luke
>>
>>
>>>
>>> Spencer
>>>
>>>
>>> On 20200119 15:41, Avraham Adler wrote:
>>>> Floor (maybe round) of nonnegative numerics, though. Poisson should
>>>> never have anything after decimal.
>>>>
>>>> Still think it’s worth allowing long long for R64 bit, just for purity
>>>> sake.
>>>>
>>>> Avi
>>>>
>>>> On Sun, Jan 19, 2020 at 4:38 PM Spencer Graves
>>>> < [hidden email] <mailto: [hidden email]>> wrote:
>>>>
>>>>
>>>>
>>>> On 20200119 13:01, Avraham Adler wrote:
>>>>> Crazy thought, but being that a sum of Poissons is Poisson in the
>>>>> sum, can you break your “big” simulation into the sum of a few
>>>>> smaller ones? Or is the order of magnitude difference just too great?
>>>>
>>>> I don't perceive that as feasible. Once I found what was
>>>> generating NAs, it was easy to code a function to return
>>>> pseudorandom numbers using the standard normal approximation to
>>>> the Poisson for those extreme cases. [For a Poisson with mean =
>>>> 1e6, for example, the skewness (third standardized moment) is
>>>> 0.001. At least for my purposes, that should be adequate.][1]
>>>>
>>>>
>>>> What are the negative consequences of having rpois return
>>>> numerics that are always nonnegative?
>>>>
>>>>
>>>> Spencer
>>>>
>>>>
>>>> [1] In the code I reported before, I just changed the threshold
>>>> of 1e6 to 0.5*.Machine$integer.max. On my Mac,
>>>> .Machine$integer.max = 2147483647 = 2^31 > 1e9. That still means
>>>> that a Poisson distributed pseudorandom number just under that
>>>> would have to be over 23000 standard deviations above the mean to
>>>> exceed .Machine$integer.max.
>>>>
>>>>> On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves
>>>>> < [hidden email]
>>>>> <mailto: [hidden email]>> wrote:
>>>>>
>>>>> This issue arose for me in simulations to estimate
>>>>> confidence, prediction, and tolerance intervals from glm(.,
>>>>> family=poisson) fits embedded in a BMA::bic.glm fit using a
>>>>> simulate.bic.glm function I added to the development version
>>>>> of Ecfun, available at " https://github.com/sbgraves237/Ecfun"
>>>>> < https://github.com/sbgraves237/Ecfun>. This is part of a
>>>>> vignette I'm developing, available at
>>>>> " https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd"
>>>>> < https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd>.
>>>>> This includes a simulated mean of a mixture of Poissons that
>>>>> exceeds 2e22. It doesn't seem unreasonable to me to have
>>>>> rpois output a numerics rather than integers when a number
>>>>> simulated exceeds .Machine$integer.max. And it does seem to
>>>>> make less sense in such cases to return NAs.
>>>>>
>>>>>
>>>>> Alternatively, might it make sense to add another
>>>>> argument to rpois to give the user the choice? E.g., an
>>>>> argument "bigOutput" with (I hope) default = "numeric" and
>>>>> "NA" as a second option. Or NA is the default, so no code
>>>>> that relied that feature of the current code would be broken
>>>>> by the change. If someone wanted to use arbitrary precision
>>>>> arithmetic, they could write their own version of this
>>>>> function with "arbitraryPrecision" as an optional value for
>>>>> the "bigOutput" argument.
>>>>>
>>>>>
>>>>> Comments?
>>>>> Thanks,
>>>>> Spencer Graves
>>>>>
>>>>>
>>>>>
>>>>> On 20200119 10:28, Avraham Adler wrote:
>>>>> Technically, lambda can always be numeric. It is the
>>>>> observations which must be integral.
>>>>>>
>>>>> Would hitting everything larger than maxint or maxlonglong
>>>>> with floor or round fundamentally change the distribution?
>>>>> Well, yes, but enough that it would matter over process risk?
>>>>>>
>>>>> Avi
>>>>>>
>>>>> On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner
>>>>> < [hidden email] <mailto: [hidden email]>> wrote:
>>>>>>
>>>>> So imagine rpois is changed, such that the storage mode
>>>>> of its return
>>>>> value is sometimes integer and sometimes numeric. Then
>>>>> imagine the case
>>>>> where lambda is itself a realization of a random
>>>>> variable. Do we really
>>>>> want the storage mode to inherit that randomness?
>>>>>>
>>>>>>
>>>>> On 1/19/20 10:47 AM, Avraham Adler wrote:
>>>>> > Maybe there should be code for 64 bit R to use long
>>>>> long or the like?
>>>>> >
>>>>> > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves
>>>>> > < [hidden email]
>>>>> <mailto: [hidden email]>
>>>>> <mailto: [hidden email]
>>>>> <mailto: [hidden email]>>> wrote:
>>>>> >
>>>>> >
>>>>> >
>>>>> > On 20200119 09:34, Benjamin Tyner wrote:
>>>>> > >>
>>>>> >
>>>>> 
>>>>> > >> Hello, All:
>>>>> > >>
>>>>> > >>
>>>>> > >> Consider:
>>>>> > >>
>>>>> > >>
>>>>> > >> Browse[2]> set.seed(1)
>>>>> > >> Browse[2]> rpois(9, 1e10)
>>>>> > >> NAs produced[1] NA NA NA NA NA NA NA NA NA
>>>>> > >>
>>>>> > >>
>>>>> > >> Should this happen?
>>>>> > >>
>>>>> > >>
>>>>> > >> I think that for, say, lambda>1e6,
>>>>> rpois should return
>>>>> > rnorm(.,
>>>>> > >> lambda, sqrt(lambda)).
>>>>> > > But need to implement carefully; rpois should
>>>>> always return a
>>>>> > > nonnegative integer, whereas rnorm always
>>>>> returns numeric...
>>>>> > >
>>>>> >
>>>>> > Thanks for the reply.
>>>>> >
>>>>> >
>>>>> > However, I think it's not acceptable to get
>>>>> an NA from a
>>>>> > number
>>>>> > that cannot be expressed as an integer. Whenever
>>>>> a randomly
>>>>> > generated
>>>>> > number would exceed .Machine$integer.max, the
>>>>> choice is between
>>>>> > returning NA or a noninteger numeric. Consider:
>>>>> >
>>>>> >
>>>>> > > 2*.Machine$integer.max
>>>>> > [1] 4294967294
>>>>> > > as.integer(2*.Machine$integer.max)
>>>>> > [1] NA
>>>>> > Warning message:
>>>>> > NAs introduced by coercion to integer range
>>>>> >
>>>>> >
>>>>> > I'd rather have the noninteger numeric.
>>>>> >
>>>>> >
>>>>> > Spencer
>>>>> >
>>>>> > ______________________________________________
>>>>> > [hidden email] <mailto: [hidden email]>
>>>>> <mailto: [hidden email]
>>>>> <mailto: [hidden email]>> mailing list
>>>>> > https://stat.ethz.ch/mailman/listinfo/rdevel>>>>> >
>>>>> > 
>>>>> > Sent from Gmail Mobile
>>>>>>
>>>>> 
>>>>> Sent from Gmail Mobile
>>>>> 
>>>>> Sent from Gmail Mobile
>>>> 
>>>> Sent from Gmail Mobile
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/rdevel >>>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rdevel______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


On 1/20/20 4:26 AM, Martin Maechler wrote:
> Coming late here  after enjoying a proper weekend ;) 
> I have been agreeing (with Spencer, IIUC) on this for a long
> time (~ 3 yrs, or more?), namely that I've come to see it as a
> "design bug" that rpois() {and similar} must return return typeof() "integer".
>
> More strongly, I'm actually pretty convinced they should return
> (integervalued) double instead of NA_integer_ and for that
> reason should always return double:
> Even if we have (hopefully) a native 64bit integer in R,
> 2^64 is still teeny tiny compared .Machine$double.max
>
> (and then maybe we'd have .Machine$longdouble.max which would
> be considerably larger than double.max unless on Windows, where
> the wise men at Microsoft decided to keep their workload simple
> by defining "long double := double"  as 'long double'
> unfortunately is not well defined by C standards)
>
> Martin
>
Martin if you are in favor, then certainly no objection from me! ;)
So now what about other discrete distributions e.g. could a similar
enhancement apply here?
> rgeom(10L, 1e10)
[1] NA 1503061294 NA NA 1122447583 NA
[7] NA NA NA NA
Warning message:
In rgeom(10L, 1e10) : NAs produced
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


>>>>> Benjamin Tyner
>>>>> on Mon, 20 Jan 2020 08:10:49 0500 writes:
> On 1/20/20 4:26 AM, Martin Maechler wrote:
>> Coming late here  after enjoying a proper weekend ;) 
>> I have been agreeing (with Spencer, IIUC) on this for a long
>> time (~ 3 yrs, or more?), namely that I've come to see it as a
>> "design bug" that rpois() {and similar} must return return typeof() "integer".
>>
>> More strongly, I'm actually pretty convinced they should return
>> (integervalued) double instead of NA_integer_ and for that
>> reason should always return double:
>> Even if we have (hopefully) a native 64bit integer in R,
>> 2^64 is still teeny tiny compared .Machine$double.max
>>
>> (and then maybe we'd have .Machine$longdouble.max which would
>> be considerably larger than double.max unless on Windows, where
>> the wise men at Microsoft decided to keep their workload simple
>> by defining "long double := double"  as 'long double'
>> unfortunately is not well defined by C standards)
>>
>> Martin
>>
> Martin if you are in favor, then certainly no objection from me! ;)
> So now what about other discrete distributions e.g. could a similar
> enhancement apply here?
>> rgeom(10L, 1e10)
> [1] NA 1503061294 NA NA 1122447583 NA
> [7] NA NA NA NA
> Warning message:
> In rgeom(10L, 1e10) : NAs produced
yes, of course there are several such distributions.
It's really something that should be discussed (possibly not
here, .. but then I've started it here ...).
The NEWS for R 3.0.0 contain (in NEW FEATURES) :
* Functions rbinom(), rgeom(), rhyper(), rpois(), rnbinom(),
rsignrank() and rwilcox() now return integer (not double)
vectors. This halves the storage requirements for large
simulations.
and what I've been suggesting is to revert this change
(svn rev r602256) which was purposefully and diligently done by
a fellow R core member, so indeed must be debatable.
Martin
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


Ugh, sounds like competing priorities.
* maintain type consistency
* minimize storage (= current version, since 3.0.0)
* maximize utility for large lambda (= proposed change)
* keep user interface, and code, simple (e.g., it would be easy enough
to add a switch that provided user control of int vs double return value)
* backward compatibility
On 20200120 12:33 p.m., Martin Maechler wrote:
>>>>>> Benjamin Tyner
>>>>>> on Mon, 20 Jan 2020 08:10:49 0500 writes:
>
> > On 1/20/20 4:26 AM, Martin Maechler wrote:
> >> Coming late here  after enjoying a proper weekend ;) 
> >> I have been agreeing (with Spencer, IIUC) on this for a long
> >> time (~ 3 yrs, or more?), namely that I've come to see it as a
> >> "design bug" that rpois() {and similar} must return return typeof() "integer".
> >>
> >> More strongly, I'm actually pretty convinced they should return
> >> (integervalued) double instead of NA_integer_ and for that
> >> reason should always return double:
> >> Even if we have (hopefully) a native 64bit integer in R,
> >> 2^64 is still teeny tiny compared .Machine$double.max
> >>
> >> (and then maybe we'd have .Machine$longdouble.max which would
> >> be considerably larger than double.max unless on Windows, where
> >> the wise men at Microsoft decided to keep their workload simple
> >> by defining "long double := double"  as 'long double'
> >> unfortunately is not well defined by C standards)
> >>
> >> Martin
> >>
> > Martin if you are in favor, then certainly no objection from me! ;)
>
> > So now what about other discrete distributions e.g. could a similar
> > enhancement apply here?
>
>
> >> rgeom(10L, 1e10)
> > [1] NA 1503061294 NA NA 1122447583 NA
> > [7] NA NA NA NA
> > Warning message:
> > In rgeom(10L, 1e10) : NAs produced
>
> yes, of course there are several such distributions.
>
> It's really something that should be discussed (possibly not
> here, .. but then I've started it here ...).
>
> The NEWS for R 3.0.0 contain (in NEW FEATURES) :
>
> * Functions rbinom(), rgeom(), rhyper(), rpois(), rnbinom(),
> rsignrank() and rwilcox() now return integer (not double)
> vectors. This halves the storage requirements for large
> simulations.
>
> and what I've been suggesting is to revert this change
> (svn rev r602256) which was purposefully and diligently done by
> a fellow R core member, so indeed must be debatable.
>
> Martin
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rdevel>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


>>>>> Ben Bolker
>>>>> on Mon, 20 Jan 2020 12:54:52 0500 writes:
> Ugh, sounds like competing priorities.
indeed.
> * maintain type consistency
> * minimize storage (= current version, since 3.0.0)
> * maximize utility for large lambda (= proposed change)
> * keep user interface, and code, simple (e.g., it would be easy enough
> to add a switch that provided user control of int vs double return value)
> * backward compatibility
Last night, it came to my mind that we should do what we have
been doing in quite a few places in R, the last couple of years:
Return integer when possible, and switch to return double when
integers don't fit.
We've been doing so even for 1:N (well, now with additional ALTREP wrapper),
seq(), and even the fundamental length() function.
So I sat down and implemented it .. and it seemed to work
perfectly: Returning the same random numbers as now, but
switching to use double (instead of returning NAs) when the
values are too large.
I'll probably commit that to Rdevel quite soonish.
Martin
> On 20200120 12:33 p.m., Martin Maechler wrote:
>>>>>>> Benjamin Tyner
>>>>>>> on Mon, 20 Jan 2020 08:10:49 0500 writes:
>>
>> > On 1/20/20 4:26 AM, Martin Maechler wrote:
>> >> Coming late here  after enjoying a proper weekend ;) 
>> >> I have been agreeing (with Spencer, IIUC) on this for a long
>> >> time (~ 3 yrs, or more?), namely that I've come to see it as a
>> >> "design bug" that rpois() {and similar} must return return typeof() "integer".
>> >>
>> >> More strongly, I'm actually pretty convinced they should return
>> >> (integervalued) double instead of NA_integer_ and for that
>> >> reason should always return double:
>> >> Even if we have (hopefully) a native 64bit integer in R,
>> >> 2^64 is still teeny tiny compared .Machine$double.max
>> >>
>> >> (and then maybe we'd have .Machine$longdouble.max which would
>> >> be considerably larger than double.max unless on Windows, where
>> >> the wise men at Microsoft decided to keep their workload simple
>> >> by defining "long double := double"  as 'long double'
>> >> unfortunately is not well defined by C standards)
>> >>
>> >> Martin
>> >>
>> > Martin if you are in favor, then certainly no objection from me! ;)
>>
>> > So now what about other discrete distributions e.g. could a similar
>> > enhancement apply here?
>>
>>
>> >> rgeom(10L, 1e10)
>> > [1] NA 1503061294 NA NA 1122447583 NA
>> > [7] NA NA NA NA
>> > Warning message:
>> > In rgeom(10L, 1e10) : NAs produced
>>
>> yes, of course there are several such distributions.
>>
>> It's really something that should be discussed (possibly not
>> here, .. but then I've started it here ...).
>>
>> The NEWS for R 3.0.0 contain (in NEW FEATURES) :
>>
>> * Functions rbinom(), rgeom(), rhyper(), rpois(), rnbinom(),
>> rsignrank() and rwilcox() now return integer (not double)
>> vectors. This halves the storage requirements for large
>> simulations.
>>
>> and what I've been suggesting is to revert this change
>> (svn rev r602256) which was purposefully and diligently done by
>> a fellow R core member, so indeed must be debatable.
>>
>> Martin
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/rdevel >>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rdevel______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


>>>>> Martin Maechler
>>>>> on Tue, 21 Jan 2020 09:25:19 +0100 writes:
>>>>> Ben Bolker
>>>>> on Mon, 20 Jan 2020 12:54:52 0500 writes:
>> Ugh, sounds like competing priorities.
> indeed.
>> * maintain type consistency
>> * minimize storage (= current version, since 3.0.0)
>> * maximize utility for large lambda (= proposed change)
>> * keep user interface, and code, simple (e.g., it would be easy enough
>> to add a switch that provided user control of int vs double return value)
>> * backward compatibility
> Last night, it came to my mind that we should do what we have
> been doing in quite a few places in R, the last couple of years:
> Return integer when possible, and switch to return double when
> integers don't fit.
> We've been doing so even for 1:N (well, now with additional ALTREP wrapper),
> seq(), and even the fundamental length() function.
> So I sat down and implemented it .. and it seemed to work
> perfectly: Returning the same random numbers as now, but
> switching to use double (instead of returning NAs) when the
> values are too large.
> I'll probably commit that to Rdevel quite soonish.
> Martin
Committed in svn rev 77690; this is really very advantageous, as
in some cases / applications or even just limit cases, you'd
easily get into overflow sitations.
The new R 4.0.0 behavior is IMO "the best of" being memory
efficient (integer storage) in most cases (back compatible to R 3.x.x) and
returning desired random numbers in large cases (compatible to R <= 2.x.x).
Martin
>> On 20200120 12:33 p.m., Martin Maechler wrote:
>>>>>>>> Benjamin Tyner
>>>>>>>> on Mon, 20 Jan 2020 08:10:49 0500 writes:
>>>
>>> > On 1/20/20 4:26 AM, Martin Maechler wrote:
>>> >> Coming late here  after enjoying a proper weekend ;) 
>>> >> I have been agreeing (with Spencer, IIUC) on this for a long
>>> >> time (~ 3 yrs, or more?), namely that I've come to see it as a
>>> >> "design bug" that rpois() {and similar} must return return typeof() "integer".
>>> >>
>>> >> More strongly, I'm actually pretty convinced they should return
>>> >> (integervalued) double instead of NA_integer_ and for that
>>> >> reason should always return double:
>>> >> Even if we have (hopefully) a native 64bit integer in R,
>>> >> 2^64 is still teeny tiny compared .Machine$double.max
>>> >>
>>> >> (and then maybe we'd have .Machine$longdouble.max which would
>>> >> be considerably larger than double.max unless on Windows, where
>>> >> the wise men at Microsoft decided to keep their workload simple
>>> >> by defining "long double := double"  as 'long double'
>>> >> unfortunately is not well defined by C standards)
>>> >>
>>> >> Martin
>>> >>
>>> > Martin if you are in favor, then certainly no objection from me! ;)
>>>
>>> > So now what about other discrete distributions e.g. could a similar
>>> > enhancement apply here?
>>>
>>>
>>> >> rgeom(10L, 1e10)
>>> > [1] NA 1503061294 NA NA 1122447583 NA
>>> > [7] NA NA NA NA
>>> > Warning message:
>>> > In rgeom(10L, 1e10) : NAs produced
>>>
>>> yes, of course there are several such distributions.
>>>
>>> It's really something that should be discussed (possibly not
>>> here, .. but then I've started it here ...).
>>>
>>> The NEWS for R 3.0.0 contain (in NEW FEATURES) :
>>>
>>> * Functions rbinom(), rgeom(), rhyper(), rpois(), rnbinom(),
>>> rsignrank() and rwilcox() now return integer (not double)
>>> vectors. This halves the storage requirements for large
>>> simulations.
>>>
>>> and what I've been suggesting is to revert this change
>>> (svn rev r602256) which was purposefully and diligently done by
>>> a fellow R core member, so indeed must be debatable.
>>>
>>> Martin
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/rdevel >>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/rdevel > ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rdevel______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


>>>>> Martin Maechler
>>>>> on Tue, 21 Jan 2020 09:25:19 +0100 writes:
>>>>> Ben Bolker
>>>>> on Mon, 20 Jan 2020 12:54:52 0500 writes:
>> Ugh, sounds like competing priorities.
> indeed.
>> * maintain type consistency
>> * minimize storage (= current version, since 3.0.0)
>> * maximize utility for large lambda (= proposed change)
>> * keep user interface, and code, simple (e.g., it would be easy enough
>> to add a switch that provided user control of int vs double return value)
>> * backward compatibility
> Last night, it came to my mind that we should do what we have
> been doing in quite a few places in R, the last couple of years:
> Return integer when possible, and switch to return double when
> integers don't fit.
> We've been doing so even for 1:N (well, now with additional ALTREP wrapper),
> seq(), and even the fundamental length() function.
> So I sat down and implemented it .. and it seemed to work
> perfectly: Returning the same random numbers as now, but
> switching to use double (instead of returning NAs) when the
> values are too large.
> I'll probably commit that to Rdevel quite soonish.
> Martin
Committed in svn rev 77690; this is really very advantageous, as
in some cases / applications or even just limit cases, you'd
easily get into overflow sitations.
The new R 4.0.0 behavior is IMO "the best of" being memory
efficient (integer storage) in most cases (back compatible to R 3.x.x) and
returning desired random numbers in large cases (compatible to R <= 2.x.x).
Martin
>> On 20200120 12:33 p.m., Martin Maechler wrote:
>>>>>>>> Benjamin Tyner
>>>>>>>> on Mon, 20 Jan 2020 08:10:49 0500 writes:
>>>
>>> > On 1/20/20 4:26 AM, Martin Maechler wrote:
>>> >> Coming late here  after enjoying a proper weekend ;) 
>>> >> I have been agreeing (with Spencer, IIUC) on this for a long
>>> >> time (~ 3 yrs, or more?), namely that I've come to see it as a
>>> >> "design bug" that rpois() {and similar} must return return typeof() "integer".
>>> >>
>>> >> More strongly, I'm actually pretty convinced they should return
>>> >> (integervalued) double instead of NA_integer_ and for that
>>> >> reason should always return double:
>>> >> Even if we have (hopefully) a native 64bit integer in R,
>>> >> 2^64 is still teeny tiny compared .Machine$double.max
>>> >>
>>> >> (and then maybe we'd have .Machine$longdouble.max which would
>>> >> be considerably larger than double.max unless on Windows, where
>>> >> the wise men at Microsoft decided to keep their workload simple
>>> >> by defining "long double := double"  as 'long double'
>>> >> unfortunately is not well defined by C standards)
>>> >>
>>> >> Martin
>>> >>
>>> > Martin if you are in favor, then certainly no objection from me! ;)
>>>
>>> > So now what about other discrete distributions e.g. could a similar
>>> > enhancement apply here?
>>>
>>>
>>> >> rgeom(10L, 1e10)
>>> > [1] NA 1503061294 NA NA 1122447583 NA
>>> > [7] NA NA NA NA
>>> > Warning message:
>>> > In rgeom(10L, 1e10) : NAs produced
>>>
>>> yes, of course there are several such distributions.
>>>
>>> It's really something that should be discussed (possibly not
>>> here, .. but then I've started it here ...).
>>>
>>> The NEWS for R 3.0.0 contain (in NEW FEATURES) :
>>>
>>> * Functions rbinom(), rgeom(), rhyper(), rpois(), rnbinom(),
>>> rsignrank() and rwilcox() now return integer (not double)
>>> vectors. This halves the storage requirements for large
>>> simulations.
>>>
>>> and what I've been suggesting is to revert this change
>>> (svn rev r602256) which was purposefully and diligently done by
>>> a fellow R core member, so indeed must be debatable.
>>>
>>> Martin
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/rdevel >>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/rdevel > ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rdevel______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel

12
