Understanding tracemem

Hadley Wickham-2
Hi all,

I've been trying to get a better handle on what manipulations lead R
to duplicate a vector, creating small experiments and using tracemem
to observe what happens (all in 2.15.1). That's led me to a few
questions, illustrated using the snippet below.

x <- 1:10
tracemem(x)
# [1] "<0x1058f8238>"
x[5] <- 5
# tracemem[0x1058f8238 -> 0x105994ab0]:
x[11] <- 11

Why does x[5] <- 5 create a copy when x[11] <- 11 (which should be
extending a vector) does not?  I can understand that maybe x[5] <- 5
hasn't yet been optimised to avoid a copy, but if that's the case,
why doesn't x[11] <- 11 make one? I thought it might be because
tracemem somehow loses track, but adding another tracemem(x)
after x[5] <- 5 doesn't change the output.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Understanding tracemem

Matthew Dowle
Hadley Wickham <hadley <at> rice.edu> writes:

> Why does x[5] <- 5 create a copy

That assigns 5, not 5L, so x is being coerced from integer to double.

x[5] <- 5L doesn't copy.
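A quick way to see the coercion described above, without relying on tracemem() (which needs an R build with memory profiling enabled), is to check typeof() before and after each assignment. This is a minimal sketch: it shows the change of type, not whether memory was reallocated.

```r
x <- 1:10
typeof(x)   # "integer"

# Same type on both sides: the element is overwritten, the type is unchanged.
x[5] <- 5L
typeof(x)   # "integer"

# 5 is a double, so the whole vector is coerced to double before assignment.
x[5] <- 5
typeof(x)   # "double"
```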

> , when x[11] <- 11 (which should be
> extending a vector) does not?  I can understand that maybe x[5] <- 5
> hasn't yet been optimised to not make a copy, but if that's the case
> then why doesn't x[11] <- 11 make one?

Extending a vector is creating a new (longer) vector and copying the old
(shorter) one in.  That's different to duplicate().  tracemem only reports
calls to duplicate().
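Extension, by contrast, always yields a new, longer vector, and R fills any gap with NA. A small illustration; it shows the result of extending, not the allocation itself.

```r
x <- 1:10
x[11] <- 11L   # one past the end: the result has length 11
x[13] <- 13L   # skipping index 12: the gap is filled with NA
length(x)      # 13
x[12]          # NA
typeof(x)      # still "integer", since only integers were assigned
```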

Matthew


Re: Understanding tracemem

Prof Brian Ripley
In reply to this post by Hadley Wickham-2
Read the help carefully as to what 'copy' means:

      When an object is traced any copying of the object by the C
      function ‘duplicate’ produces a message to standard output, as
      does type coercion and copying when passing arguments to ‘.C’ or
      ‘.Fortran’.

If you want to understand when 'duplicate' is called, you need to read
the source code.  File src/main/subassign.c will explain the different
paths taken by your two cases.  But isn't it rather obvious that
duplicating x is not useful when a new longer vector needs to be created?

(BTW, in earlier versions of R tracemem reported some transformations of
x to objects of the same length, but not at all consistently.)

On 12/07/2012 17:15, Hadley Wickham wrote:

> Why does x[5] <- 5 create a copy, when x[11] <- 11 (which should be
> extending a vector) does not?

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: Understanding tracemem

Hadley Wickham-2
> Read the help carefully as to what 'copy' means:
>
>      When an object is traced any copying of the object by the C
>      function ‘duplicate’ produces a message to standard output, as
>      does type coercion and copying when passing arguments to ‘.C’ or
>      ‘.Fortran’.
>
> If you want to understand when 'duplicate' is called, you need to read the
> source code.  File src/main/subassign.c will explain the different paths
> taken by your two cases.  But isn't it rather obvious that duplicating x is
> not useful when a new longer vector needs to be created?

Thanks, that's useful.

Is there any way to detect when a new longer vector is created?  i.e.
I know that this creates a new vector:

x <- 1:10
x[11] <- 11L

And this doesn't

y <- list2env(setNames(as.list(x), paste0("v", x)))
y$a <- 11

But does this?

z <- as.list(x)
z$a <- 11

And thanks to the off-list commenters who pointed out that x[5] <- 5
is duplicated because 5 is numeric, not integer (oops!)

Hadley


Re: Understanding tracemem

Prof Brian Ripley
On 12/07/2012 18:20, Hadley Wickham wrote:

> Is there any way to detect when a new longer vector is created?  i.e.
> I know that this creates a new vector:

Not programmatically.

> x <- 1:10
> x[11] <- 11L
>
> And this doesn't
>
> y <- list2env(setNames(as.list(x), paste0("v", x)))
> y$a <- 11
>
> But does this?
>
> z <- as.list(x)
> z$a <- 11

Yes of course, as z is now of length 11.  There is no provision in R to
extend a vector except by creating a new one.  (Well, there is at C
level but I think it is not currently used.)
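The list versus environment difference in Hadley's examples can be seen structurally: assigning to a name a list does not yet have produces a list one element longer, while `$<-` on an environment just adds a binding. A sketch only; note that list2env() requires a named list, so arbitrary names (v1 to v10) are supplied here with setNames().

```r
# A list is a vector, so adding a new named element means a longer list.
z <- as.list(1:10)
z$a <- 11
length(z)    # 11
names(z)     # ten "" entries followed by "a"

# An environment is not a vector: `$<-` just adds a binding.
y <- list2env(setNames(as.list(1:10), paste0("v", 1:10)))
y$a <- 11
length(y)    # 11 objects in the environment
```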

> And thanks to the off-list commenters who pointed out that x[5] <- 5
> is duplicated because 5 is numeric, not integer (oops!)

AFAIK, it does not actually duplicate: see 'type coercion' above.  But
note that

x <- 1:10
tracemem(x)
x[10:1] <- x

necessarily duplicates.
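Why x[10:1] <- x cannot be done in place: the right-hand side is the very vector being overwritten. The sketch below compares the real assignment with a naive, element-by-element in-place write (a simulation for illustration, not what R does internally), which ends up reading values it has already clobbered.

```r
x <- 1:10
x[10:1] <- x
x               # 10 9 8 7 6 5 4 3 2 1

# Simulated in-place write: copy element i into position 11 - i, one at a
# time, reading from the same vector that is being modified.
naive <- 1:10
for (i in 1:10) naive[11 - i] <- naive[i]
naive           # 1 2 3 4 5 5 4 3 2 1 (later reads see overwritten values)
```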

>
> Hadley
>



Re: Understanding tracemem

Hadley Wickham-2
>> But does this?
>>
>> z <- as.list(x)
>> z$a <- 11
>
> Yes of course, as z is now of length 11.  There is no provision in R to
> extend a vector except by creating a new one.  (Well, there is at C level
> but I think it is not currently used.)

I guess a better example is

z <- list(a = 1:1e6, b = runif(1e6))
z$c <- 1

The list gets copied, but are a and b copied too, or does the new list
point to their existing locations?  The following test suggests that
it's a deep copy.

x <- 1:1e7
z <- list(a = x)

system.time(replicate(100, z$b <- 1L)) / 100
# ~ 0.05s
system.time(replicate(100, x[1e6 + 1L] <- 1L)) / 100
# ~ 0.04s
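Rather than inferring from timings, one could attach tracemem() to the element itself and watch whether extending the list triggers a duplicate() of it. A sketch, assuming an R build with memory profiling (checked with capabilities("profmem")); whether a message is printed may differ between R versions, so no output is claimed here.

```r
a <- runif(1e6)
traced <- capabilities("profmem")
if (traced) invisible(tracemem(a))   # report any duplicate() of this vector

z <- list(a = a, b = runif(10))
z$c <- 1                             # extend the list: is `a` duplicated?

if (traced) untracemem(a)
length(z)                            # 3
```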

Hadley



Re: Understanding tracemem

Hadley Wickham-2
> The list gets copied, but do a and b, or does the new list point to
> the existing locations?  The following test suggests that it's a deep
> copy.
>
> x <- 1:1e7
> z <- list(a = x)
>
> system.time(replicate(100, z$b <- 1L)) / 100
> # ~ 0.05s
> system.time(replicate(100, x[1e6 + 1L] <- 1L)) / 100
> # ~ 0.04s

But that should be

system.time(replicate(100, x[1e7 + 1L] <- 1L)) / 100
# ~0.10s
system.time(replicate(100, z$b <- 1L)) / 100
# ~ 0.04s

which suggests that it's not a deep copy.

But

x <- 1:1e6
z <- list(a = x)
system.time(replicate(100, z$b <- 1L)) / 100
# ~0.005s

which suggests it's not a shallow copy either.


But then neither of those is probably a good test, because they modify
in place.  I'll think more.
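One way to separate the two effects is to vary the size of the element already in the list while keeping the extension the same: if the time to add one element grows with the length of z$a, the elements are being copied; if it stays flat, only the list itself is. A rough sketch; time_append() is a hypothetical helper, and timings are noisy and likely to differ between R versions.

```r
# Hypothetical helper: average seconds to extend a two-element list whose
# first element has length n.
time_append <- function(n, reps = 20L) {
  z <- list(a = runif(n), b = 1L)
  elapsed <- system.time(
    for (i in seq_len(reps)) {
      z2 <- z       # z2 shares z's memory until it is modified
      z2$c <- 1L    # extending forces a new, longer list
    }
  )[["elapsed"]]
  elapsed / reps
}

sizes <- c(1e3, 1e5, 1e6)
timings <- vapply(sizes, time_append, numeric(1))
data.frame(n = sizes, seconds = timings)
```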

Hadley
