Any plans for ALTREP lists (VECSXP)?

Any plans for ALTREP lists (VECSXP)?

Bemis, Kylie
Hello,

I was wondering if there were any plans for ALTREP lists (VECSXP)?

It seems to me that they could be supported much as ALTSTRING is, with Elt() and Set_elt() methods. Or are there problems I’m not seeing because lists are not atomic vectors?

I was taking an approach of converting each list element (of a file-based list data structure) to an ALTREP representation to build up an “ALTREP list”.

This seems fine for shorter lists with large elements, but I noticed that for longer lists with smaller elements, this could be far more time-consuming than simply reading the entire list into memory and returning a non-ALTREP list:

> x
<34840 length> matter_list :: out-of-memory list
(1.1 MB real | 543.3 MB virtual)

> system.time(y <- as.list(x))
   user  system elapsed
  1.116   2.175   5.053

> system.time(z <- as.altrep(x))
   user  system elapsed
 36.295   4.717  41.216

> .Internal(inspect(y))
@108255000 19 VECSXP g1c7 [MARK,NAM(7)] (len=34840, tl=0)
  @7f9044d9fc00 14 REALSXP g1c7 [MARK] (len=1129, tl=0) 404.093,404.096,404.099,404.102,404.105,...
  @7f9044d25e00 14 REALSXP g1c7 [MARK] (len=890, tl=0) 409.924,409.927,409.931,409.934,409.937,...
  @7f9044da6000 14 REALSXP g1c7 [MARK] (len=1878, tl=0) 400.3,400.303,400.306,400.309,400.312,...
  @7f9031a6b000 14 REALSXP g1c7 [MARK] (len=2266, tl=0) 402.179,402.182,402.185,402.188,402.191,...
  @7f9031a77a00 14 REALSXP g1c7 [MARK] (len=1981, tl=0) 403.021,403.024,403.027,403.03,403.033,...
  ...

> .Internal(inspect(z))
@108210000 19 VECSXP g1c7 [MARK,NAM(7)] (len=34840, tl=0)
  @7f904eea7660 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=1129, mem=0)
  @7f9050347498 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=890, mem=0)
  @7f904d286b20 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=1878, mem=0)
  @7f904fd38820 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=2266, mem=0)
  @7f904c75ce90 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=1981, mem=0)
  ...

In this situation, it would be much faster and simpler for me to return a hypothetical ALTREP list that serves SEXP elements on demand, similar to how ALTSTRING seems to be implemented.

I don’t know how many other people would get use out of ALTREP lists, but I certainly would.

Are there any plans for this?

Thanks!

~~~
Kylie Ariel Bemis
Khoury College of Computer Sciences
Northeastern University
https://kuwisdelu.github.io

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [External] Any plans for ALTREP lists (VECSXP)?

Tierney, Luke
Eventually, but probably not in the next release. There are many more
issues to think through for vectors where the elements can be
arbitrary R objects, and I don't think there will be time for that soon
given other issues on the table.

Best,

luke

On Tue, 23 Jul 2019, Bemis, Kylie wrote:

> [...]

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   [hidden email]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

Re: Any plans for ALTREP lists (VECSXP)?

Michael Lawrence via R-devel
In reply to this post by Bemis, Kylie
Hi Kylie,

As an alternative in the short term, you could consider deriving from
the List class in S4Vectors, implementing the getListElement() method to
lazily create the objects.

Michael

On Tue, Jul 23, 2019 at 9:09 AM Bemis, Kylie <[hidden email]> wrote:

> [...]



--
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
[hidden email]


Re: Any plans for ALTREP lists (VECSXP)?

Gabriel Becker-2
Hi Kylie,

Is it a list with only numerics in it? (I only see REALSXPs there, but
obviously inspect isn't showing all of them). If so, you could load it up
into one big vector and then also keep partitioning information around.
Bioconductor does this (see ?IRanges::CompressedList ). The potential
benefit here being that the underlying large vector could then be a big
out-of-memory altrep. How helpful this would be depends somewhat on what
you want to do with it, of course, but it is something that comes to mind.
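
The "one big vector plus partitioning information" layout can be sketched in plain C. This is only an illustrative model, not the IRanges implementation: the names compressed_list, element_view and compressed_list_elt are hypothetical, and the partitioning is assumed to be stored as cumulative end offsets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a "compressed" ragged list: all elements are
   concatenated into one flat array, and ends[i] holds the exclusive end
   offset of element i within that array. */
typedef struct {
    const double *flat;   /* concatenated values of every element */
    const size_t *ends;   /* cumulative end offsets, one per element */
    size_t n;             /* number of list elements */
} compressed_list;

/* A borrowed view of one element; nothing is copied. */
typedef struct {
    const double *data;
    size_t len;
} element_view;

/* Return a view of element i by looking up its partition bounds. */
element_view compressed_list_elt(const compressed_list *x, size_t i)
{
    assert(i < x->n);
    size_t start = (i == 0) ? 0 : x->ends[i - 1];
    element_view v = { x->flat + start, x->ends[i] - start };
    return v;
}
```

Under this layout only the flat vector needs to be large (or out of memory); extracting an element is two offset lookups and involves no per-element object until a view is requested.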

Also, I would expect some overhead but that seems like a lot (without
having done super much in the way of benchmarking). What exactly is
as.altrep doing?

Best,
~G

On Tue, Jul 23, 2019 at 9:54 AM Michael Lawrence via R-devel <
[hidden email]> wrote:

> [...]

Re: Any plans for ALTREP lists (VECSXP)?

Bemis, Kylie
Thanks for the suggestions, everyone.

It is not a pressing issue requiring alternatives, since the ‘matter_list’ object already behaves like a list; I am just looking for a way to present a native R list (VECSXP) when a regular list is required.

In this case (my typical use case), the ‘matter_list’ is homogeneous and I use it like a ragged array; however, in general each element could be a different atomic vector type (specifically raw, logical, integer, or double).

Here, as.altrep() is an S4 method for converting my custom ‘matter’-class out-of-memory objects into their native R representations using ALTREP.

This seems to work well for the ‘matter’ vectors, matrices, and arrays, where it just .Call()s my C function for making the corresponding ALTREP object. The lists were giving me trouble, though, because there I use lapply() to extract and uncompress the ‘matter_list’ metadata for each list element into a separate S4 ‘matter_vec’ out-of-memory vector, each of which is then used to create an ALTREP object for the corresponding list element. So it gets costly...

The cost is mostly in re-creating all of the metadata as regular R objects that end up occupying the R_altrep_data1() spot for all of the individual list elements. If I could make an ALTREP list, I could leave the metadata as-is and avoid all of that.
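
The difference between the two strategies can be modelled in standalone C. This is a rough sketch, not the matter package's code: list_meta, elt_desc, eager_convert and lazy_elt are hypothetical names, and a small struct stands in for the per-element metadata that would otherwise be rebuilt as R objects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared metadata for a file-backed list: where each
   element starts in the file and how many values it holds. */
typedef struct {
    const size_t *offset;
    const size_t *length;
    size_t n;
} list_meta;

/* Per-element descriptor, standing in for the metadata each
   element-level object would carry in the eager approach. */
typedef struct {
    size_t offset;
    size_t length;
} elt_desc;

/* Counts how many descriptors have been built, to make the cost visible. */
static size_t descriptors_built = 0;

static elt_desc make_desc(const list_meta *m, size_t i)
{
    assert(i < m->n);
    descriptors_built++;
    elt_desc d = { m->offset[i], m->length[i] };
    return d;
}

/* Eager strategy: build a descriptor for every element up front. */
void eager_convert(const list_meta *m, elt_desc *out)
{
    for (size_t i = 0; i < m->n; i++)
        out[i] = make_desc(m, i);
}

/* Lazy strategy: leave the shared metadata as-is and build a
   descriptor only when element i is actually requested. */
elt_desc lazy_elt(const list_meta *m, size_t i)
{
    return make_desc(m, i);
}
```

In this model the eager conversion does work proportional to the length of the list before any element is used, while the lazy version does work proportional to the number of elements actually accessed.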

Anyway, not a pressing issue for me either, just something I noticed where having an ALTREP list could be useful, so I was wondering if it was in the plans, which Luke answered.

Thanks,

-Kylie

On Jul 23, 2019, at 8:27 PM, Gabriel Becker <[hidden email]> wrote:

> [...]

Re: [External] Re: Any plans for ALTREP lists (VECSXP)?

Tierney, Luke
In reply to this post by Gabriel Becker-2
If one of you wanted to try to create a patch to support ALTREP
generic vectors here are some notes:

The main challenge I am aware of (there might be others): Allowing
DATAPTR to return a writable pointer would be too dangerous because
the GC write barrier needs to see all mutations. So it would be best
if Dataptr and Dataptr_or_null methods were not allowed to be
defined. The default methods in altrep.c should do the right thing.

A reasonable name for the abstract class would be 'altlist'.

'altrep' methods that a class can provide:

   Unserialize or UnserializeEX
   Serialized_state
   Duplicate or DuplicateEX
   Coerce
   Inspect
   Length

'altvec' methods a class should provide:

   Extract_subset
   not Dataptr
   not Dataptr_or_null

'altlist' specific methods:

   Elt
   Set_elt
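
The shape of such a class can be illustrated with a toy model in standalone C. It does not use R's headers or API: every name below (toy_altlist, list_set_elt, barrier_hits, and so on) is hypothetical, and a simple counter stands in for the GC write barrier. The point it tries to show is that with Elt and Set_elt but no Dataptr, every store has to pass through one function.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not R's API: an "object" is an opaque pointer, and the
   barrier counter stands in for the GC write barrier. */
typedef void *obj;

typedef struct toy_altlist toy_altlist;

/* Only Length, Elt and Set_elt are exposed. There is deliberately no
   Dataptr slot, so callers cannot obtain a writable pointer to the
   element storage through the method table. */
typedef struct {
    size_t (*length)(const toy_altlist *x);
    obj    (*elt)(const toy_altlist *x, size_t i);
    void   (*set_elt)(toy_altlist *x, size_t i, obj v);
} toy_altlist_methods;

struct toy_altlist {
    const toy_altlist_methods *methods;
    obj *slots;            /* element storage */
    size_t n;
    size_t barrier_hits;   /* mutations the "collector" has seen */
};

static size_t toy_length(const toy_altlist *x) { return x->n; }

static obj toy_elt(const toy_altlist *x, size_t i)
{
    assert(i < x->n);
    return x->slots[i];
}

/* Every store passes through here, so the barrier sees every mutation. */
static void toy_set_elt(toy_altlist *x, size_t i, obj v)
{
    assert(i < x->n);
    x->barrier_hits++;
    x->slots[i] = v;
}

static const toy_altlist_methods toy_methods = {
    toy_length, toy_elt, toy_set_elt
};

/* Generic entry points dispatch through the method table. */
size_t list_length(const toy_altlist *x) { return x->methods->length(x); }
obj list_elt(const toy_altlist *x, size_t i) { return x->methods->elt(x, i); }
void list_set_elt(toy_altlist *x, size_t i, obj v) { x->methods->set_elt(x, i, v); }

/* Wrap caller-provided storage in a toy list using the methods above. */
toy_altlist toy_altlist_make(obj *slots, size_t n)
{
    toy_altlist x = { &toy_methods, slots, n, 0 };
    return x;
}
```

If a writable data pointer were handed out instead, a caller could overwrite slots directly and barrier_hits would never change, which is the hazard the notes above describe.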

Best,

luke

On Tue, 23 Jul 2019, Gabriel Becker wrote:

> [...]


Re: [External] Re: Any plans for ALTREP lists (VECSXP)?

Gabriel Becker-2
I can work on this. Thanks Luke.

~G

On Wed, Jul 24, 2019 at 8:25 AM Tierney, Luke <[hidden email]>
wrote:

> If one of you wanted to try to create a patch to support ALTREP
> generic vectors here are some notes:
>
> The main challenge I am aware of (there might be others): Allowing
> DATAPTR to return a writable pointer would be too dangerous because
> the GC write barrier needs to see all mutations. So it would be best
> if Dataptr and Dataptr_or_null methods were not allowed to be
> defined. The default methods in altrep.c should do the right think.
>
> A reasonable name for the abstract class would be 'altlist'.
>
> 'altrep' methods that a class can provide:
>
>    Unserialize or UnserializeEX
>    Serialized_state
>    Duplicate or DuplicateEx
>    Coerce
>    Inspect
>    Length
>
> 'altvec' methods a class should provide:
>
>    Extract_subset
>    not Dataptr
>    not Dataptr_or_null
>
> 'altlist' specific methods:
>
>    Elt
>    Set_elt
>
> Best,
>
> luke
>
> On Tue, 23 Jul 2019, Gabriel Becker wrote:
>
> > Hi Kylie,
> >
> > Is it a list with only numerics in it? (I only see REALSXPs there, but
> > obviously inspect isn't showing all of them). If so, you could load it up
> > into one big vector and then also keep partitioning information around.
> > Bioconductor does this (see ?IRanges::CompressedList ). The potential
> > benefit here being that the underlying large vector could then be a big
> > out-of-memory altrep. How helpful this would be depends somewhat on what
> > you want to do with it, of course, but it is something that comes to
> mind.
> >
> > Also, I would expect some overhead but that seems like a lot (without
> > having done super much in the way of benchmarking). What exactly is
> > as.altrep doing?
> >
> > Best,
> > ~G
> >
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:             319-335-3386
> Department of Statistics and        Fax:               319-335-3017
>     Actuarial Science
> 241 Schaeffer Hall                  email:   [hidden email]
> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


Re: [External] Re: Any plans for ALTREP lists (VECSXP)?

Gabriel Becker-2
@Kylie happy to collaborate on it if you're interested.

~G

On Wed, Jul 24, 2019 at 10:43 AM Gabriel Becker <[hidden email]>
wrote:

> I can work on this. Thanks Luke.
>
> ~G
>
> On Wed, Jul 24, 2019 at 8:25 AM Tierney, Luke <[hidden email]>
> wrote:
>
>> If one of you wanted to try to create a patch to support ALTREP
>> generic vectors here are some notes:
>>
>> The main challenge I am aware of (there might be others): Allowing
>> DATAPTR to return a writable pointer would be too dangerous because
>> the GC write barrier needs to see all mutations. So it would be best
>> if Dataptr and Dataptr_or_null methods were not allowed to be
>> defined. The default methods in altrep.c should do the right thing.
>>
>> A reasonable name for the abstract class would be 'altlist'.
>>
>> 'altrep' methods that a class can provide:
>>
>>    Unserialize or UnserializeEX
>>    Serialized_state
>>    Duplicate or DuplicateEX
>>    Coerce
>>    Inspect
>>    Length
>>
>> 'altvec' methods a class should provide:
>>
>>    Extract_subset
>>    not Dataptr
>>    not Dataptr_or_null
>>
>> 'altlist' specific methods:
>>
>>    Elt
>>    Set_elt
>>
>> Best,
>>
>> luke
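
The method split in the notes above can be modeled with a toy, self-contained C sketch. It does not use R's headers or API: `toy_vec`, `toy_altlist`, `toy_load`, and the other `toy_` names are hypothetical and exist only for illustration. The sketch assumes a class that offers Length, Elt, and Set_elt but no Dataptr, so elements are materialized on demand and every mutation passes through one function where a write barrier could observe it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a list element (a SEXP in real R). */
typedef struct { size_t len; double *data; } toy_vec;

typedef struct toy_altlist toy_altlist;

/* Method table: Length, Elt and Set_elt only; deliberately no Dataptr. */
typedef struct {
    size_t   (*Length)(const toy_altlist *x);
    toy_vec *(*Elt)(toy_altlist *x, size_t i);
    void     (*Set_elt)(toy_altlist *x, size_t i, toy_vec *v);
} toy_altlist_methods;

struct toy_altlist {
    const toy_altlist_methods *methods;
    size_t n;
    toy_vec **cache;  /* cache[i] stays NULL until element i is requested */
    size_t loads;     /* number of elements materialized so far */
    size_t writes;    /* number of mutations seen by Set_elt */
};

static void *xmalloc(size_t size)
{
    void *p = malloc(size ? size : 1);
    if (p == NULL) abort();
    return p;
}

/* Stand-in for reading element i from a file: i + 1 values, all equal to i. */
toy_vec *toy_load(size_t i)
{
    toy_vec *v = xmalloc(sizeof *v);
    v->len = i + 1;
    v->data = xmalloc(v->len * sizeof *v->data);
    for (size_t k = 0; k < v->len; k++) v->data[k] = (double) i;
    return v;
}

static void toy_vec_free(toy_vec *v)
{
    if (v != NULL) { free(v->data); free(v); }
}

static size_t toy_length(const toy_altlist *x) { return x->n; }

/* Elt: materialize on first access, then serve from the cache. */
static toy_vec *toy_elt(toy_altlist *x, size_t i)
{
    assert(i < x->n);
    if (x->cache[i] == NULL) { x->cache[i] = toy_load(i); x->loads++; }
    return x->cache[i];
}

/* Set_elt: the single place where the list is mutated; takes ownership of v. */
static void toy_set_elt(toy_altlist *x, size_t i, toy_vec *v)
{
    assert(i < x->n);
    x->writes++;
    toy_vec_free(x->cache[i]);
    x->cache[i] = v;
}

static const toy_altlist_methods toy_methods = { toy_length, toy_elt, toy_set_elt };

toy_altlist *toy_altlist_new(size_t n)
{
    toy_altlist *x = xmalloc(sizeof *x);
    x->methods = &toy_methods;
    x->n = n;
    x->cache = xmalloc(n * sizeof *x->cache);
    for (size_t i = 0; i < n; i++) x->cache[i] = NULL;
    x->loads = 0;
    x->writes = 0;
    return x;
}

void toy_altlist_free(toy_altlist *x)
{
    for (size_t i = 0; i < x->n; i++) toy_vec_free(x->cache[i]);
    free(x->cache);
    free(x);
}
```

With this layout, creating a list of 34840 elements costs one pointer array, and a call such as `x->methods->Elt(x, 3)` loads only element 3. Because callers never receive a raw pointer into `cache`, they cannot bypass `Set_elt`.
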
>>
>> On Tue, 23 Jul 2019, Gabriel Becker wrote:
>>
>> > Hi Kylie,
>> >
>> > Is it a list with only numerics in it? (I only see REALSXPs there, but
>> > obviously inspect isn't showing all of them). If so, you could load it up
>> > into one big vector and then also keep partitioning information around.
>> > Bioconductor does this (see ?IRanges::CompressedList ). The potential
>> > benefit here being that the underlying large vector could then be a big
>> > out-of-memory altrep. How helpful this would be depends somewhat on what
>> > you want to do with it, of course, but it is something that comes to mind.
>> >
>> > Also, I would expect some overhead but that seems like a lot (without
>> > having done super much in the way of benchmarking). What exactly is
>> > as.altrep doing?
>> >
>> > Best,
>> > ~G
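
The "one big vector plus partitioning information" idea above can be sketched in plain C as follows. This is only an illustration under assumptions: `toy_compressed_list`, `toy_view`, and `toy_get_element` are hypothetical names, not part of IRanges or R, and the boundaries are assumed to be stored as end positions. Each list element becomes a view into the flat buffer, so nothing is copied per element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compressed list: one flat buffer plus element boundaries. */
typedef struct {
    const double *values;  /* all list elements concatenated */
    const size_t *ends;    /* ends[i] = one past the last value of element i */
    size_t n;              /* number of list elements */
} toy_compressed_list;

/* A view of one element; it points into the flat buffer, no copy is made. */
typedef struct {
    const double *data;
    size_t len;
} toy_view;

/* Return element i as a (pointer, length) view computed from the end positions. */
toy_view toy_get_element(const toy_compressed_list *x, size_t i)
{
    assert(i < x->n);
    size_t start = (i == 0) ? 0 : x->ends[i - 1];
    toy_view v = { x->values + start, x->ends[i] - start };
    return v;
}
```

For a list with many small elements, this replaces tens of thousands of per-element objects with two arrays, and the flat `values` buffer is the only part that would need an out-of-memory backing.
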
>> >
>
>
