Hi all,

I've been trying to get a better handle on what manipulations lead R to duplicate a vector, creating small experiments and using tracemem to observe what happens (all in 2.15.1). That's led me to a few questions, illustrated using the snippet below.

x <- 1:10
tracemem(x)
# [1] "<0x1058f8238>"
x[5] <- 5
# tracemem[0x1058f8238 -> 0x105994ab0]:
x[11] <- 11

Why does x[5] <- 5 create a copy, when x[11] <- 11 (which should be extending a vector) does not? I can understand that maybe x[5] <- 5 hasn't yet been optimised to not make a copy, but if that's the case then why doesn't x[11] <- 11 make one? I thought it might be because somehow tracemem loses track, but adding an additional tracemem(x) after x[5] <- 5 doesn't change the output.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Hadley Wickham <hadley <at> rice.edu> writes:
> Why does x[5] <- 5 create a copy

That assigns 5, not 5L. x is being coerced from integer to double. x[5] <- 5L doesn't copy.

> , when x[11] (which should be
> extending a vector does not) ? I can understand that maybe x[5] <- 5
> hasn't yet been optimised to not make a copy, but if that's the case
> then why doesn't x[11] <- 11 make one?

Extending a vector is creating a new (longer) vector and copying the old (shorter) one in. That's different to duplicate(). tracemem only reports calls to duplicate().

Matthew

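The coercion described above can be checked without tracemem. This is a minimal sketch using only base R; the variable names are just for illustration, and it shows the type change rather than the copy itself.

```r
x <- 1:10
typeof(x)      # "integer"
x[5] <- 5      # 5 is a double constant, so x is coerced to double
typeof(x)      # "double"

y <- 1:10
y[5] <- 5L     # 5L is an integer constant, so no coercion is needed
typeof(y)      # "integer"
```

Whether tracemem prints a line for the second case may depend on the R version, so no tracemem output is shown here.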
In reply to this post by Hadley Wickham-2
Read the help carefully as to what 'copy' means:

     When an object is traced any copying of the object by the C
     function ‘duplicate’ produces a message to standard output, as
     does type coercion and copying when passing arguments to ‘.C’ or
     ‘.Fortran’.

If you want to understand when 'duplicate' is called, you need to read the source code. File src/main/subassign.c will explain the different paths taken by your two cases. But isn't it rather obvious that duplicating x is not useful when a new longer vector needs to be created?

(BTW, in earlier versions of R tracemem reported some transformations of x to objects of the same length, but not at all consistently.)

On 12/07/2012 17:15, Hadley Wickham wrote:
> Hi all,
>
> I've been trying to get a better handle on what manipulations lead R
> to duplicate a vector, creating small experiments and using tracemem
> to observe what happens (all in 2.15.1). That's lead me to a few
> questions, illustrated using the snippet below.
>
> x <- 1:10
> tracemem(x)
> # [1] "<0x1058f8238>"
> x[5] <- 5
> # tracemem[0x1058f8238 -> 0x105994ab0]:
> x[11] <- 11
>
> Why does x[5] <- 5 create a copy, when x[11] (which should be
> extending a vector does not) ? I can understand that maybe x[5] <- 5
> hasn't yet been optimised to not make a copy, but if that's the case
> then why doesn't x[11] <- 11 make one? I thought it might be because
> somehow tracemem loses track, but adding an additional tracemem(x)
> after x[5] <- 5 doesn't change the output.
>
> Hadley
>

--
Brian D. Ripley, [hidden email]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

> Read the help carefully as to what 'copy' means:
>
>      When an object is traced any copying of the object by the C
>      function ‘duplicate’ produces a message to standard output, as
>      does type coercion and copying when passing arguments to ‘.C’ or
>      ‘.Fortran’.
>
> If you want to understand when 'duplicate' is called, you need to read the
> source code. File src/main/subassign.c will explain the different paths
> taken by your two cases. But isn't it rather obvious that duplicating x is
> not useful when a new longer vector needs to be created?

Thanks, that's useful.

Is there any way to detect when a new longer vector is created? i.e. I know that this creates a new vector:

x <- 1:10
x[11] <- 11L

And this doesn't

y <- list2env(as.list(x))
y$a <- 11

But does this?

z <- as.list(x)
z$a <- 11

And thanks to the off-list commenters who pointed out that x[5] <- 5 is duplicated because 5 is numeric, not integer (oops!)

Hadley

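The difference between the environment and list cases above can be observed from R code by keeping a second binding to each object. This is a rough sketch, not a way to detect allocation: list2env() needs a named list, so the element names v1..v10 are an assumption added here, as are the names e_alias and z_old.

```r
x <- 1:10

# Environment: both names refer to the same object, so the addition is
# visible through either of them.
e <- list2env(setNames(as.list(x), paste0("v", seq_along(x))))
e_alias <- e
e$a <- 11
exists("a", envir = e_alias, inherits = FALSE)   # TRUE

# List: the assignment produces a new, longer list bound to z;
# the earlier binding still sees the old one.
z <- as.list(x)
z_old <- z
z$a <- 11
length(z)       # 11
length(z_old)   # 10
```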
On 12/07/2012 18:20, Hadley Wickham wrote:
>> Read the help carefully as to what 'copy' means:
>>
>>      When an object is traced any copying of the object by the C
>>      function ‘duplicate’ produces a message to standard output, as
>>      does type coercion and copying when passing arguments to ‘.C’ or
>>      ‘.Fortran’.
>>
>> If you want to understand when 'duplicate' is called, you need to read the
>> source code. File src/main/subassign.c will explain the different paths
>> taken by your two cases. But isn't it rather obvious that duplicating x is
>> not useful when a new longer vector needs to be created?
>
> Thanks, that's useful.
>
> Is there any way to detect when a new longer vector is created? i.e.
> I know that this creates a new vector:

Not programmatically.

> x <- 1:10
> x[11] <- 11L
>
> And this doesn't
>
> y <- list2env(as.list(x))
> y$a <- 11
>
> But does this?
>
> z <- as.list(x)
> z$a <- 11

Yes of course, as z is now of length 11. There is no provision in R to extend a vector except by creating a new one. (Well, there is at C level but I think it is not currently used.)

> And thanks to the off-list commenters who pointed out that x[5] <- 5
> is duplicated because 5 is numeric, not integer (oops!)

AFAIK, it does not actually duplicate: see 'type coercion' above. But note that

x <- 1:10
tracemem(x)
x[10:1] <- x

necessarily duplicates.

>
> Hadley
>

--
Brian D. Ripley

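The two points in the reply above (extension yields a new vector, and x[10:1] <- x reads x while writing to it) can be illustrated at the R level. This is only a sketch of the visible results; it does not show what happens in subassign.c, and the name old is introduced here for illustration.

```r
# Extending: the name x ends up bound to a longer vector, while an
# earlier binding to the original 10 elements is unaffected.
x <- 1:10
old <- x
x[11] <- 11L
length(x)     # 11
length(old)   # 10
typeof(x)     # "integer", since 11L needs no coercion

# Self-assignment: the right-hand side is x itself, so the original
# values have to stay available while positions 10:1 are overwritten.
x <- 1:10
x[10:1] <- x
x             # 10 9 8 7 6 5 4 3 2 1
```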
>> But does this?
>>
>> z <- as.list(x)
>> z$a <- 11
>
> Yes of course, as z is now of length 11. There is no provision in R to
> extend a vector except by creating a new one. (Well, there is at C level
> but I think it is not currently used.)

I guess a better example is

z <- list(a = 1:1e6, b = runif(1e6))
z$c <- 1

The list gets copied, but do a and b, or does the new list point to the existing locations? The following test suggests that it's a deep copy.

x <- 1:1e7
z <- list(a = x)

system.time(replicate(100, z$b <- 1L)) / 100
# ~ 0.05s
system.time(replicate(100, x[1e6 + 1L] <- 1L)) / 100
# ~ 0.04s

Hadley

> The list gets copied, but do a and b, or does the new list point to
> the existing locations? The following test suggests that it's a deep
> copy.
>
> x <- 1:1e7
> z <- list(a = x)
>
> system.time(replicate(100, z$b <- 1L)) / 100
> # ~ 0.05s
> system.time(replicate(100, x[1e6 + 1L] <- 1L)) / 100
> # ~ 0.04s

But that should be

system.time(replicate(100, x[1e7 + 1L] <- 1L)) / 100
# ~0.10s
system.time(replicate(100, z$b <- 1L)) / 100
# ~ 0.04s

which suggests that it's not a deep copy. But

x <- 1:1e6
z <- list(a = x)
system.time(replicate(100, z$b <- 1L)) / 100
# ~0.005s

which suggests it's not a shallow copy either. But then neither of those is probably a good test because they modify in place. I'll think more.

Hadley

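One possible way to tighten the timing comparison above is to rebuild the list binding on every iteration, so each append has to grow a two-element list from a one-element one. This is a sketch under assumptions: time_append is a hypothetical helper, the sizes and repetition count are arbitrary, and timer resolution may swamp the small case.

```r
# Hypothetical helper: average time to add one element to a list that
# holds a single numeric vector of length n.
time_append <- function(n, reps = 20) {
  z <- list(a = numeric(n))
  elapsed <- system.time(
    for (i in seq_len(reps)) {
      z2 <- z       # fresh binding, so the append starts from z each time
      z2$b <- 1L    # grows the list from one element to two
    }
  )[["elapsed"]]
  elapsed / reps
}

small <- time_append(1e5)
large <- time_append(1e7)
c(small = small, large = large)
```

If the per-append time grows roughly in proportion to n, that would point to the large element being copied along with the list; if it stays about the same, the new list is probably pointing at the existing element. The outcome may well differ between R versions, so no expected timings are given.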