Converting non-32-bit integers from python to R to use bit64: reticulate


Juan Telleria Ruiz de Aguirre
Dear R Developers,

There is an interesting issue for the "reticulate" R package that
discusses how to convert Python's non-32-bit integers to R, and it has
had quite an exhaustive discussion:

https://github.com/rstudio/reticulate/issues/323

Python seems to handle integers differently from R, and its behavior
depends on the system architecture: on 32-bit systems it uses 32-bit
integers, and on 64-bit systems it uses 64-bit integers.

So my question is:

As regards R's C interface, how costly would it be to convert INTSXP from
32 bits to 64 bits in C on 64-bit systems? Do the benefits outweigh the
costs? And should such development be handled within R Core / Ordinary
Members, or should it be left to package maintainers?

Thank you! :)

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Converting non-32-bit integers from python to R to use bit64: reticulate

Gabriel Becker
Hi Juan,

Well, I am not an R-core member, but I can mention a few things:

1. This seems like it would make the results of R code non-reproducible
between 32-bit and 64-bit versions of R; at least some code would give
different results (at the very least in terms of when integer values
overflow to NA, which is documented behavior).
2. Obviously all integer data would take twice as much memory, memory
bandwidth, space in caches, etc., even when it doesn't need it.
3. Various places within the internal R sources treat the data / data
pointers coming out of INTSXP and LGLSXP objects the same way (as
currently they're both int/int*). Catching and fixing all of those
wouldn't be impossible, but it would take at least some doing.

For me personally 1 seems like a big problem, and 3 makes the conversion
more work than it might have seemed initially.
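
Points 1 and 2 can be illustrated with a small Python sketch. This is not
R's internal code: the helper `r_style_add` is hypothetical and only mimics
the documented overflow-to-NA behavior, with `None` standing in for NA. It
assumes that a 64-bit INTSXP would keep the same rule with a wider range.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a 32-bit R integer can hold


def r_style_add(a, b, bits=32):
    """Add two integers; return None (standing in for NA) on overflow."""
    limit = 2 ** (bits - 1) - 1
    result = a + b
    # R reserves the most negative value for NA_integer_, so the valid
    # range is symmetric: -limit .. limit.
    return result if -limit <= result <= limit else None


# Point 1: the same expression gives NA or a number depending on width.
print(r_style_add(INT32_MAX, 1, bits=32))  # None
print(r_style_add(INT32_MAX, 1, bits=64))  # 2147483648

# Point 2: each element would need twice the storage.
print(ctypes.sizeof(ctypes.c_int32), ctypes.sizeof(ctypes.c_int64))  # 4 8
```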

As a related side note, as far as I understand what I've heard from R-core
members directly, the choice to not have multiple types of integers is
intentional and unlikely to change.

Best,
~G

Re: Converting non-32-bit integers from python to R to use bit64: reticulate

Juan Telleria Ruiz de Aguirre
Thank you, Gabriel, for the valuable insights on the 64-bit integer topic.

In addition, my statement was wrong, as Python 3 seems to have unlimited
(and variable) size integers. Here is the related CPython code:

https://github.com/python/cpython/blob/master/Objects/longobject.c

The split between 32-bit and 64-bit integers seems to exist only in Python 2.
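
A quick way to see the variable-size behavior in Python 3 is sketched
below. The size comparison relies on `sys.getsizeof`, whose exact numbers
are a CPython implementation detail, so only the relative sizes are
compared here.

```python
import sys

big = 2**64 + 1               # already beyond an unsigned 64-bit integer
print(type(big).__name__)     # int
print(big.bit_length())       # 65

# CPython implementation detail: storage grows with the magnitude.
print(sys.getsizeof(2**200) > sys.getsizeof(1))  # True
```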

Best,
Juan


Re: Converting non-32-bit integers from python to R to use bit64: reticulate

Martin Maechler
>>>>> Juan Telleria Ruiz de Aguirre
>>>>>     on Thu, 30 May 2019 18:46:29 +0200 writes:

    >Thank you Gabriel for valuable insights on the 64-bit integers topic.
    >In addition, my statement was wrong, as Python3 seems to have unlimited
    >(and variable) size integers.
    ....

If you are interested in using unlimited size integers, you
could use the CRAN R package 'gmp' which builds on the GMP = GNU
MP = GNU Multi Precision C library.

   https://cran.r-project.org/package=gmp

(And for arbitrary-precision "floats", see the CRAN package 'Rmpfr',
 built on package gmp and on both GNU C libraries, GMP and MPFR:

   https://cran.r-project.org/package=Rmpfr
)
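
For readers coming from the Python side, the standard library offers a
rough analogue. The sketch below is not the gmp or Rmpfr API; it only shows
the same two ideas with Python's built-in `int` and the `decimal` module
(which works in decimal rather than binary arithmetic, unlike MPFR).

```python
import math
from decimal import Decimal, getcontext

# Exact integer arithmetic well beyond 64 bits.
exact = math.factorial(25)
print(exact)           # 15511210043330985984000000
print(exact > 2**64)   # True

# Arbitrary-precision decimal arithmetic: 50 significant digits.
getcontext().prec = 50
third = Decimal(1) / Decimal(3)
print(third)
```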



Re: Converting non-32-bit integers from python to R to use bit64: reticulate

Juan Telleria Ruiz de Aguirre
Thank you, Martin, for pointing out and developing the 'Rmpfr' package for
unlimited-size integers (GNU GMP) and arbitrary-precision floats (GNU
MPFR):

https://cran.r-project.org/package=Rmpfr

My question concerns the long term (for R 3.7.0 or R 3.8.0):

Would it make sense for GMP to replace INTSXP, and MPFR to replace
REALSXP? With this, an integer would always remain an integer, and a
double-precision float would always remain a double-precision float,
without occasional casting underneath.

And would the R community / R Ordinary Members be willing to help R
Core with such an implementation (if it makes sense and is to be adopted)?

Thank you all! :)



Re: Converting non-32-bit integers from python to R to use bit64: reticulate

Martin Maechler
>>>>> Juan Telleria Ruiz de Aguirre
>>>>>     on Mon, 3 Jun 2019 06:50:17 +0200 writes:

    > Thank you Martin for giving to know and developing 'Rmpfr' library for
    > unlimited size integers (GNU C GMP) and arbitrary precision floats (GNU C
    > MPFR):

    > https://cran.r-project.org/package=Rmpfr

    > My question is: In the long term (For R3.7.0 or R3.8.0):

    > Does it have sense that CMP substitutes INTSXP, and MPFR substitutes
    > REALSXP code? With this we would achieve that an integer is always an
    > integer, and a numeric double precision float always a numeric double
    > precision float, without sometimes casting underneath.

    > And would the R Community / R Ordinary Members would be willing to help R
    > Core on such implementation (If has sense, and wants to be adopted)?

No, such a change has "no sense" and hence won't be adopted (in
this form):

- INTSXP and REALSXP are part of the C API of R, and are well defined.
  Changing them will almost surely break hundreds and, through
  dependencies, probably thousands of existing R packages.

- I'm sure Python and other systems do have fixed-size "double
  precision" vectors, because that's how you interface with all
  pre-existing computational libraries,
  and I am almost sure that support for arbitrarily long integers
  (or doubles) is via another class/type.

- I know that Julia has adopted these (GMP and MPFR, I think)
  types and nicely interfaces them on a relatively "base" level.
  With their nice class hierarchy (and very nice "S4-like" multi-argument
  method dispatch for *all* functions) it can look quite
  seamless for the user to work with these extended classes, but
  they are not all identical to the basic "real"/"double" or "integer" classes.

- I'm not the expert here (but there are not so many experts
  ..), but I'm pretty sure that adding new "basic types" at the
  underlying C level would not be at all easy for R.  It would mean a big
  break in all backward compatibility -- which is conceivable --
  and *may* also need a big rewrite of much of the R code base,
  which seems less conceivable in the mid term (2-3 years; long
  term: > 5 years).
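
The second point can be checked from the Python side with the standard
library alone. The sketch below uses the `array` module as one example of
a fixed-size vector type; it is only an illustration of the distinction,
not a statement about how reticulate or NumPy handle conversion.

```python
from array import array

# Fixed-size vector of C doubles, the layout numerical libraries expect.
doubles = array("d", [1.0, 2.5, 3.0])
print(doubles.itemsize)   # bytes per element

# Fixed-size 64-bit integers reject values outside their range.
overflowed = False
try:
    array("q", [2**63])   # one past the largest signed 64-bit value
except OverflowError:
    overflowed = True
print(overflowed)         # True

# Arbitrarily long integers live in a different type: plain Python ints.
big_values = [2**63, 2**200]
print(all(isinstance(v, int) for v in big_values))  # True
```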


    > Thank you all! :)

You are welcome.

I think we should close this thread here,  unless some real
experts join.

Martin


Re: Converting non-32-bit integers from python to R to use bit64: reticulate

Kevin Ushey
I think a more productive conversation could be: what additions to R
would allow for user-defined types / classes that behave just like the
built-in vector types? As a motivating example, one cannot currently
use the 64-bit integer objects from bit64 to subset data frames:

   > library(bit64); mtcars[as.integer64(1:3), ]
    [1] mpg  cyl  disp hp   drat wt   qsec vs   am   gear carb
   <0 rows> (or 0-length row.names)

I think ALTREP presents a possibility here, in that we could have a
64-bit integer ALTREP object that behaves like either an INTSXP or a
REALSXP as necessary. But I'm not sure how we would handle large 64-bit
integer values which won't fit in either an INTSXP or a REALSXP (in the
REALSXP case, precision could be lost for values > 2^53).
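
The 2^53 limit comes from the 53-bit significand of an IEEE 754 double and
can be seen with any double-precision float. Here is a minimal Python
sketch, since Python's `float` is a C double:

```python
exact = 2**53 + 1                  # 9007199254740993
as_double = float(exact)           # rounded to the nearest double

print(as_double == float(2**53))   # True: 2**53 + 1 collapses onto 2**53
print(int(as_double) == exact)     # False: the original value is lost
print(float(2**53) == 2**53)       # True: 2**53 itself is still exact
```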

One possibility would be to give ALTREP objects a chance at
managing dispatch in some methods, so that, for example, in
data[<ALTREP>], the ALTREP object has the opportunity to choose how
the data object should be subset. Of course, this implies wiring
yet another dispatch mechanism through a category of primitive
/ internal functions, which could be expensive in terms of
implementation / maintenance... and I'm not sure if this could play
well with the existing S3 / S4 dispatch mechanisms.
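
As a loose analogy only (this is Python, not ALTREP), Python lets a
user-defined integer-like type take part in subsetting through the
`__index__` protocol. The class `Int64Key` below is hypothetical and exists
just to show the kind of hook being discussed.

```python
class Int64Key:
    """Hypothetical wrapper around a value that must fit in 64 bits."""

    def __init__(self, value):
        if not -(2**63) <= value < 2**63:
            raise OverflowError("value does not fit in 64 bits")
        self.value = value

    def __index__(self):
        # Called by built-in containers when the object is used as an index.
        return self.value


rows = ["Mazda RX4", "Mazda RX4 Wag", "Datsun 710"]
print(rows[Int64Key(1)])                # Mazda RX4 Wag
print(rows[Int64Key(0):Int64Key(2)])    # ['Mazda RX4', 'Mazda RX4 Wag']
```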

FWIW, I think 64-bit integers most commonly arise as e.g. database keys
/ IDs, and are typically just used for subsetting / reordering of data
as opposed to math. In these cases, converting the 64-bit integers to a
character vector is typically a viable workaround, although it's much
slower.
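
The reason the character workaround is safe where doubles are not can be
sketched in Python. The IDs and the `records` lookup table below are made
up for illustration.

```python
# Two distinct 64-bit IDs: 2**53 and 2**53 + 1.
ids = [9007199254740992, 9007199254740993]

as_doubles = {float(i) for i in ids}   # what a double-based key would see
as_strings = {str(i) for i in ids}     # the character-vector workaround

print(len(as_doubles))   # 1: both IDs collapse onto the same double
print(len(as_strings))   # 2: the IDs stay distinct as text

# Hypothetical lookup table keyed by the string form of each ID.
records = {"9007199254740992": "row A", "9007199254740993": "row B"}
print([records[str(i)] for i in ids])  # ['row A', 'row B']
```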

Still, at least to me, it seems like there is likely a path forward
with ALTREP for 64-bit integer vectors that can behave (more or less)
just like built-in R vectors.

Best,
Kevin
