NAs produced by integer overflow, but only some time ...


Stefan Th. Gries
I have a problem with integer overflow that I cannot understand.

I have a character vector curr.lemmas with the following properties:

length(curr.lemmas) # 61224
length(unique(curr.lemmas)) # 2652

That vector is the input to the following function:

yules.k1 <- function(input) {
   m1 <- length(input); temp <- table(table(input))
   m2 <- sum("*"(temp, as.numeric(names(temp))^2))
   return(10000*(m2-m1) / (m1*m1))
}

When I run this, I get the following output:

[1] NA
Warning message:
In m1 * m1 : NAs produced by integer overflow

But when I change the function to this one by just replacing m1*m1 by m1^2 ...

yules.k2 <- function(input) {
   m1 <- length(input); temp <- table(table(input))
   m2 <- sum("*"(temp, as.numeric(names(temp))^2))
   return(10000*(m2-m1) / (m1^2))
}

yules.k2(curr.lemmas) # -> 157.261

I am using RStudio 1.1.447 and here's my sessionInfo
######################
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 18.3

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] compiler_3.4.4  backports_1.1.2 magrittr_1.5    rprojroot_1.3-2 htmltools_0.3.6 tools_3.4.4     yaml_2.1.19     Rcpp_0.12.16    stringi_1.2.2
[10] rmarkdown_1.9   knitr_1.20      stringr_1.3.0   digest_0.6.15   evaluate_0.10.1
######################

What is even more puzzling is that one time I ran R in the console of
Geany and this happened:

> m1
[1] 61224
> 61224*61224
[1] 3748378176
> 61224^2
[1] 3748378176
> m1*m1
[1] NA
Warning message:
In m1 * m1 : NAs produced by integer overflow
> m1^2
[1] 3748378176

That is, the multiplication worked with the numbers but not the
numeric vectors; the above is literally copied from the console. Why
is that happening?

Any help would be much appreciated!
STG
--
Stefan Th. Gries
----------------------------------
Univ. of California, Santa Barbara
http://tinyurl.com/stgries

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: NAs produced by integer overflow, but only some time ...

Jeff Newmiller
a) Numeric values may be either integers (signed 32 bit) or double precision (53 bit mantissa).

b) Double precision constants are numeric with no decoration (e.g. 61224). Integer constants have an L (e.g. 61224L).

c) 61224*61224 > 2^31-1 so that answer cannot fit into an integer.

d) Exponentiation is a floating point operation, so the result of 61224L^2L is a floating point answer that CAN fit into the 53-bit mantissa of a double precision value, so no overflow occurs.

e) Defining a function like yules.k1 and never showing how you called it does not constitute a reproducible example. To avoid such gaffes you can use the reprex package to confirm that the errors shown in your question are in fact reproducible.

f) On this mailing list, the fact that you are using RStudio is at best irrelevant, and at worst off-topic. If you don't see problems running your reproducible example from R in the terminal then the question probably belongs in the RStudio support forum. This is another reason to use the reprex package to check your reproducibility (this works even if you invoke it from RStudio).

g) Calling table on the result of table must be one of the more bizarre calculation sequences I have ever seen in R. I hope you are getting the answers you are expecting when you do use double precision numeric values. Also, using the prefix form of multiplication is unnecessarily obscure, and your use of the return function at the end of your function is redundant.
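Points a) to d) can be checked directly at the prompt. The following is only a minimal sketch; the variable names int_product and dbl_power are made up for illustration and do not appear in the original post.

```r
# A bare numeric literal is a double; the L suffix makes it an integer.
typeof(61224)             # "double"
typeof(61224L)            # "integer"
typeof(length(letters))   # "integer": length() of an ordinary vector is an integer

# The largest value a 32-bit signed integer can hold is 2^31 - 1.
.Machine$integer.max                     # 2147483647
61224 * 61224 > .Machine$integer.max     # TRUE

# integer * integer stays integer, so the product overflows to NA.
# (Without suppressWarnings() R also emits the "integer overflow" warning.)
int_product <- suppressWarnings(61224L * 61224L)
is.na(int_product)        # TRUE

# integer ^ integer is computed in double precision, so nothing overflows.
dbl_power <- 61224L^2L
typeof(dbl_power)         # "double"
dbl_power == 3748378176   # TRUE
```

This matches the console transcript in the question if m1 there was stored as an integer (for example as the result of length()), while the typed literals 61224 were doubles.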

--
Sent from my phone. Please excuse my brevity.


Re: NAs produced by integer overflow, but only some time ...

Stefan Th. Gries
Before responding to Jeff's posting, let me reiterate my question: Why
does a function using m1*m1 produce an integer overflow, but m1^2 does
not?

As for Jeff's 'response':

> a) Numeric values may be either integers (signed 32 bit) or double precision (53 bit mantissa).
> b) Double precision constants are numeric with no decoration (e.g. 61224). Integer constants have an L (e.g. 61224L).
> c) 61224*61224 > 2^31-1 so that answer cannot fit into an integer.
> d) Exponentiation is a floating point operation, so the result of 61224L^2L is a floating point answer that CAN fit into the 53-bit mantissa of a double precision value, so no overflow occurs.
Yes, that's all great and I knew that from
<https://stackoverflow.com/questions/8804779/what-is-integer-overflow-in-r-and-how-can-it-happen>.

> e) Defining a function like yules.k1 and never showing how you called it does not constitute a reproducible example. To avoid such gaffes you can use the reprex package to confirm that the errors shown in your question are in fact reproducible.
Responding to a post and never seeing that the provided code does
actually show how I call the function does not constitute a useful
answer. To avoid such gaffes you can use your reading skills to
confirm that the perceived lack of a function call is in fact such a
lack. In addition, typing m1 <- 61224 makes the multiplication example
that I show in the bottom part of the posting reproducible ...

> f) On this mailing list, the fact that you are using RStudio is at best irrelevant, and at worst off-topic. If you don't see problems running your reproducible example from R in the terminal then the question probably belongs in the RStudio support forum. This is another reason to use the reprex package to check your reproducibility (this works even if you invoke it from RStudio).
I did provide the information for the sake of comprehensiveness and I
did mention that the problem also showed up in the console; the whole
second part of the post was on that.

> g) Calling table on the result of table must be one of the more bizarre calculation sequences I have ever seen in R. I hope you are getting the answers you are expecting when you do use double precision numeric values. Also, using the prefix form of multiplication is unnecessarily obscure, and your use of the return function at the end of your function is redundant.
On this mailing list, your assessment of calculation sequences and
their comparison to others you have seen is at best irrelevant and at
worst off-topic since it doesn't answer the question. I didn't ask
(you or anyone) to grade my code (and there are reasons why "*" and
return were used there as they are) but to answer the question why
m1*m1 returned an error and m1^2 does not.


Re: NAs produced by integer overflow, but only some time ...

R help mailing list-2
In reply to this post by Jeff Newmiller
Printing a number does not show whether it is stored
as a 32-bit integer or as a 64-bit floating point value.
Use, e.g., str() or class() to see.
  > str(length(runif(3)))
   int 3
  > str(length(runif(3)) + 1)
   num 4
  > str(length(runif(3)) + 1L)
   int 4
  > str( 3L * 3L )
   int 9
  > str( 3L ^ 2L )
   num 9
You are right that the arithmetic operators map a pair
of integer arguments to different types: the power and division
operators map them to double precision, while the addition,
multiplication, and subtraction operators map them to integer
results (giving NAs if the result cannot fit into 32 bits).
Perhaps it was a mistake to include the integer type, but
at the time S was developed it made sense.
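The mapping described above can be tabulated in one go. This is a rough sketch, not part of the original message; the names a, b, ops, result_types, m1 and safe_product are chosen here purely for illustration.

```r
a <- 3L
b <- 2L

# Apply each operator to the same pair of integers and record the result type.
ops <- c("+", "-", "*", "/", "^")
result_types <- vapply(
  ops,
  function(op) typeof(do.call(op, list(a, b))),
  character(1)
)
result_types
# "+", "-" and "*" give "integer"; "/" and "^" give "double"

# One possible workaround for the overflow: make one operand a double,
# so the multiplication is carried out in double precision.
m1 <- 61224L
safe_product <- as.numeric(m1) * m1
safe_product   # 3748378176
```

Coercing a single operand is enough because R promotes the other one to double before multiplying.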

As for table(table(x)) being an unnatural construct, I use it
all the time instead of anyDuplicated to see the pattern of
duplications.
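As a small illustration of that idiom (the toy vector x and the names counts and spectrum are made up here):

```r
x <- c("a", "b", "a", "c", "a", "b", "d")

counts <- table(x)          # occurrences per value: a = 3, b = 2, c = 1, d = 1
spectrum <- table(counts)   # how many values occur once, twice, three times
spectrum
# names "1", "2", "3" with counts 2, 1, 1:
# two values occur once, one occurs twice, one occurs three times

anyDuplicated(x)            # 3: only the index of the first duplicated element
```

anyDuplicated() only says that, and where, a first duplicate exists, while table(table(x)) shows the whole pattern of duplication.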


Bill Dunlap
TIBCO Software
wdunlap tibco.com

Re: NAs produced by integer overflow, but only some time ...

Stefan Th. Gries
> You are right that the arithmetic operators map a pair of integer arguments to different types: the power and division operators map them to double precision, while the addition, multiplication, and subtraction operators map them to integer results (giving NAs if the result cannot fit into 32 bits).
Ah, ok, _that_ explains it, thanks a lot, I did not know that, which
is why it never occurred to me to check str(m1)!

> As for table(table(x)) being an unnatural construct, I use it all the time instead of anyDuplicated to see the pattern of duplications.
Thanks for this, too.
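For reference, the str() check and one way to make the computation overflow-safe might look like the sketch below. The function name yules.k.safe and the toy vectors are hypothetical; the formula is the one from the original function, with the length coerced to double once at the start.

```r
yules.k.safe <- function(input) {
  m1 <- as.numeric(length(input))        # coerce once; later arithmetic is double
  freq.spectrum <- table(table(input))   # number of types occurring 1, 2, 3, ... times
  m2 <- sum(freq.spectrum * as.numeric(names(freq.spectrum))^2)
  10000 * (m2 - m1) / (m1 * m1)
}

toy <- c("a", "b", "a", "c", "a", "b", "d")
str(length(toy))            # int 7: length() returns an integer here
small_k <- yules.k.safe(toy)
small_k                     # 10000 * (15 - 7) / 49

# A vector as long as the one in the question no longer yields NA.
long_input <- rep(c("a", "b"), 30612)    # 61224 elements
large_k <- yules.k.safe(long_input)
is.na(large_k)              # FALSE
```

Using m1^2 as in yules.k2 avoids the overflow as well; the explicit as.numeric() just makes the intent visible.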


Re: NAs produced by integer overflow, but only some time ...

Jeff Newmiller
In reply to this post by Stefan Th. Gries
When you have cooled down you may notice that the answer to your question was in items a-d, though Bill's use of str made it clearer. Also, there was in fact no call to yules.k1, much less one that includes sample data. You will find that solutions to problems in R are very often related to details of the data you are working with that _you_ aren't noticing, which makes providing sample data a key step to obtaining a straightforward resolution to most problems. There are various discussions online about how to do this. [1][2]

---

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

[2] http://adv-r.had.co.nz/Reproducibility.html
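A self-contained example along those lines could look like the sketch below. The data are synthetic stand-ins invented for illustration (the seed, the "lemma" labels and the sampling scheme are assumptions, not the poster's actual corpus); only the vector length and the function body come from the thread.

```r
set.seed(42)                                  # arbitrary seed, for repeatability only
vocab <- paste0("lemma", seq_len(2652))       # made-up type labels
curr.lemmas <- sample(vocab, 61224, replace = TRUE)   # made-up token vector

# The function exactly as posted in the question.
yules.k1 <- function(input) {
   m1 <- length(input); temp <- table(table(input))
   m2 <- sum("*"(temp, as.numeric(names(temp))^2))
   return(10000*(m2-m1) / (m1*m1))
}

str(length(curr.lemmas))    # int 61224: the length is stored as an integer

# The call itself, which is what makes the problem reproducible for others.
# suppressWarnings() only hides the "NAs produced by integer overflow" warning.
result <- suppressWarnings(yules.k1(curr.lemmas))
is.na(result)               # TRUE
```

With real data that cannot be regenerated this way, dput() on a small subset is a common way to paste an exact copy of an object into a message.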


--
Sent from my phone. Please excuse my brevity.


Re: [FORGED] Re: NAs produced by integer overflow, but only some time ...

Rolf Turner
In reply to this post by Stefan Th. Gries

On 10/05/18 02:58, Stefan Th. Gries wrote:

> Before responding to Jeff's posting, let me reiterate my question: Why
> does a function using m1*m1 produce an integer overflow, but m1^2 does
> not?

This was made clear in Jeff's initial response.

> As for Jeff's 'response':

<SNIP>

Your intemperate reaction to Jeff's response is completely uncalled for.
I find Jeff's patience in giving such a detailed answer to your
rather muddled question to be remarkable.

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
