function pointers?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

function pointers?

PaulJohnson32gmail
We have a project that calls for the creation of a list of many
distribution objects.  Distributions can be of various types, with
various parameters, but we ran into some problems. I started testing
on a simple list of rnorm-based objects.

I was a little surprised at the RAM storage requirements, here's an example:

N <- 10000
closureList <- vector("list", N)
nsize = sample(x = 1:100, size = N, replace = TRUE)
for (i in seq_along(nsize)){
    closureList[[i]] <- list(func = rnorm, n = nsize[i])
}
format(object.size(closureList), units = "Mb")

Output says
22.4 MB

I noticed that if I do not name the objects in the list, then the
storage drops to 19.9 MB.

That seemed like a lot of storage for a function's name. Why so much?
My colleagues think the RAM use is high because this is a closure
(hence closureList).  I can't even convince myself it actually is a
closure. The R source has

rnorm <- function(n, mean=0, sd=1) .Call(C_rnorm, n, mean, sd)

The storage holding 10000 copies of rnorm, but we really only need 1,
which we can use in the objects.

Thinking of this like C,  I am looking to pass in a pointer to the
function.  I found my way to the idea of putting a function in an
environment in order to pass it by reference:

rnormPointer <- function(inputValue1, inputValue2){
    object <- new.env(parent=globalenv())
    object$distr <- inputValue1
    object$n <- inputValue2
    class(object) <- 'pointer'
    object
}

## Experiment with that
gg <- rnormPointer(rnorm, 33)
gg$distr(gg$n)

ptrList <- vector("list", N)
for(i in seq_along(nsize)) {
    ptrList[[i]] <- rnormPointer(rnorm, nsize[i])
}
format(object.size(ptrList), units = "Mb")

The required storage is reduced to 2.6 Mb. Thats 1/10 of the RAM
required for closureList.  This thing works in the way I expect

## can pass in the unnamed arguments for n, mean and sd here
ptrList[[1]]$distr(33, 100, 10)
## Or the named arguments
ptrList[[1]]$distr(1, sd = 100)

This environment trick mostly works, so far as I can see, but I have
these questions.

1. Is the object.size() return accurate for ptrList?  Do I really
reduce storage to that amount, or is the required storage someplace
else (in the new environment) that is not included in object.size()?

2. Am I running with scissors here? Unexpected bad things await?

3. Why is the storage for closureList so great? It looks to me like
rnorm is just this little thing:

function (n, mean = 0, sd = 1)
.Call(C_rnorm, n, mean, sd)
<bytecode: 0x55cc9988cae0>

4. Could I learn (you show me?) to store the bytecode address as a
thing and use it in the objects?  I'd guess that is the fastest
possible way. In an Objective-C problem in the olden days, we found
the method-lookup was a major slowdown and one of the programmers
showed us how to save the lookup and use it over and over.

pj



--
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: function pointers?

Duncan Murdoch-2
On 22/11/2017 11:29 AM, Paul Johnson wrote:

> We have a project that calls for the creation of a list of many
> distribution objects.  Distributions can be of various types, with
> various parameters, but we ran into some problems. I started testing
> on a simple list of rnorm-based objects.
>
> I was a little surprised at the RAM storage requirements, here's an example:
>
> N <- 10000
> closureList <- vector("list", N)
> nsize = sample(x = 1:100, size = N, replace = TRUE)
> for (i in seq_along(nsize)){
>      closureList[[i]] <- list(func = rnorm, n = nsize[i])
> }
> format(object.size(closureList), units = "Mb")
>
> Output says
> 22.4 MB
>

You should read the help page for object.size.  You're doing exactly the
kind of thing that causes it to give overestimates of the amount of
memory being used.

I'd suggest turning on memory profiling in Rprof() for a more accurate
result, but it seems to be broken:

 > Rprof(memory.profiling=TRUE)
 > N <- 10000
 > closureList <- vector("list", N)
 > nsize = sample(x = 1:100, size = N, replace = TRUE)
 > for (i in seq_along(nsize)){
+     closureList[[i]] <- list(func = rnorm, n = nsize[i])
+ }
 > format(object.size(closureList), units = "Mb")
[1] "19.2 Mb"
 > Rprof(NULL)
 > summaryRprof()
Error in rowsum.default(c(as.vector(new.ftable), fcounts),
c(names(new.ftable),  :
   unimplemented type 'NULL' in 'HashTableSetup'
In addition: Warning message:
In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'


Duncan Murdoch

> I noticed that if I do not name the objects in the list, then the
> storage drops to 19.9 MB.
>
> That seemed like a lot of storage for a function's name. Why so much?
> My colleagues think the RAM use is high because this is a closure
> (hence closureList).  I can't even convince myself it actually is a
> closure. The R source has
>
> rnorm <- function(n, mean=0, sd=1) .Call(C_rnorm, n, mean, sd)
>
> The storage holding 10000 copies of rnorm, but we really only need 1,
> which we can use in the objects.
>
> Thinking of this like C,  I am looking to pass in a pointer to the
> function.  I found my way to the idea of putting a function in an
> environment in order to pass it by reference:
>
> rnormPointer <- function(inputValue1, inputValue2){
>      object <- new.env(parent=globalenv())
>      object$distr <- inputValue1
>      object$n <- inputValue2
>      class(object) <- 'pointer'
>      object
> }
>
> ## Experiment with that
> gg <- rnormPointer(rnorm, 33)
> gg$distr(gg$n)
>
> ptrList <- vector("list", N)
> for(i in seq_along(nsize)) {
>      ptrList[[i]] <- rnormPointer(rnorm, nsize[i])
> }
> format(object.size(ptrList), units = "Mb")
>
> The required storage is reduced to 2.6 Mb. Thats 1/10 of the RAM
> required for closureList.  This thing works in the way I expect
>
> ## can pass in the unnamed arguments for n, mean and sd here
> ptrList[[1]]$distr(33, 100, 10)
> ## Or the named arguments
> ptrList[[1]]$distr(1, sd = 100)
>
> This environment trick mostly works, so far as I can see, but I have
> these questions.
>
> 1. Is the object.size() return accurate for ptrList?  Do I really
> reduce storage to that amount, or is the required storage someplace
> else (in the new environment) that is not included in object.size()?
>
> 2. Am I running with scissors here? Unexpected bad things await?
>
> 3. Why is the storage for closureList so great? It looks to me like
> rnorm is just this little thing:
>
> function (n, mean = 0, sd = 1)
> .Call(C_rnorm, n, mean, sd)
> <bytecode: 0x55cc9988cae0>
>
> 4. Could I learn (you show me?) to store the bytecode address as a
> thing and use it in the objects?  I'd guess that is the fastest
> possible way. In an Objective-C problem in the olden days, we found
> the method-lookup was a major slowdown and one of the programmers
> showed us how to save the lookup and use it over and over.
>
> pj
>
>
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: function pointers?

jholtman
In reply to this post by PaulJohnson32gmail
I am replying to the first part of the question about the size of the
object.  It is probably best to use the "object_size" function in the
"pryr" package:

 ‘object_size’ works similarly to ‘object.size’, but counts more
     accurately and includes the size of environments. ‘compare_size’
     makes it easy to compare the output of ‘object_size’ and
     ‘object.size’.

Here is what you get from the same code:

> N <- 10000
> closureList <- vector("list", N)
> nsize = sample(x = 1:100, size = N, replace = TRUE)
> for (i in seq_along(nsize)){
+     closureList[[i]] <- list(func = rnorm, n = nsize[i])
+ }
> format(object.size(closureList), units = "Mb")
[1] "22.4 Mb"
> pryr::compare_size(closureList)
    base     pryr
23520040  2241776

You will notice that you get back a size that is 10X smaller because it is
accounting for the shared space.


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Wed, Nov 22, 2017 at 11:29 AM, Paul Johnson <[hidden email]> wrote:

> We have a project that calls for the creation of a list of many
> distribution objects.  Distributions can be of various types, with
> various parameters, but we ran into some problems. I started testing
> on a simple list of rnorm-based objects.
>
> I was a little surprised at the RAM storage requirements, here's an
> example:
>
> N <- 10000
> closureList <- vector("list", N)
> nsize = sample(x = 1:100, size = N, replace = TRUE)
> for (i in seq_along(nsize)){
>     closureList[[i]] <- list(func = rnorm, n = nsize[i])
> }
> format(object.size(closureList), units = "Mb")
>
> Output says
> 22.4 MB
>
> I noticed that if I do not name the objects in the list, then the
> storage drops to 19.9 MB.
>
> That seemed like a lot of storage for a function's name. Why so much?
> My colleagues think the RAM use is high because this is a closure
> (hence closureList).  I can't even convince myself it actually is a
> closure. The R source has
>
> rnorm <- function(n, mean=0, sd=1) .Call(C_rnorm, n, mean, sd)
>
> The storage holding 10000 copies of rnorm, but we really only need 1,
> which we can use in the objects.
>
> Thinking of this like C,  I am looking to pass in a pointer to the
> function.  I found my way to the idea of putting a function in an
> environment in order to pass it by reference:
>
> rnormPointer <- function(inputValue1, inputValue2){
>     object <- new.env(parent=globalenv())
>     object$distr <- inputValue1
>     object$n <- inputValue2
>     class(object) <- 'pointer'
>     object
> }
>
> ## Experiment with that
> gg <- rnormPointer(rnorm, 33)
> gg$distr(gg$n)
>
> ptrList <- vector("list", N)
> for(i in seq_along(nsize)) {
>     ptrList[[i]] <- rnormPointer(rnorm, nsize[i])
> }
> format(object.size(ptrList), units = "Mb")
>
> The required storage is reduced to 2.6 Mb. Thats 1/10 of the RAM
> required for closureList.  This thing works in the way I expect
>
> ## can pass in the unnamed arguments for n, mean and sd here
> ptrList[[1]]$distr(33, 100, 10)
> ## Or the named arguments
> ptrList[[1]]$distr(1, sd = 100)
>
> This environment trick mostly works, so far as I can see, but I have
> these questions.
>
> 1. Is the object.size() return accurate for ptrList?  Do I really
> reduce storage to that amount, or is the required storage someplace
> else (in the new environment) that is not included in object.size()?
>
> 2. Am I running with scissors here? Unexpected bad things await?
>
> 3. Why is the storage for closureList so great? It looks to me like
> rnorm is just this little thing:
>
> function (n, mean = 0, sd = 1)
> .Call(C_rnorm, n, mean, sd)
> <bytecode: 0x55cc9988cae0>
>
> 4. Could I learn (you show me?) to store the bytecode address as a
> thing and use it in the objects?  I'd guess that is the fastest
> possible way. In an Objective-C problem in the olden days, we found
> the method-lookup was a major slowdown and one of the programmers
> showed us how to save the lookup and use it over and over.
>
> pj
>
>
>
> --
> Paul E. Johnson   http://pj.freefaculty.org
> Director, Center for Research Methods and Data Analysis
> http://crmda.ku.edu
>
> To write to me directly, please address me at pauljohn at ku.edu.
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.