RProfmem output format

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RProfmem output format

Hadley Wickham-2
Hi all,

When I run the example in RProfmem, I get:


     Rprofmem("Rprofmem.out", threshold=1000)
     example(glm)
     Rprofmem(NULL)
     noquote(readLines("Rprofmem.out", n=5))

...

[1] 1384 :5416 :5416 :1064 :1064 :"readRDS" "index.search" "example"
[2] 1064 :"readRDS" "index.search" "example"
[3] 4712 :"readRDS" "index.search" "example"
[4] 4712 :"readRDS" "index.search" "example"
[5] 1064 :"readRDS" "index.search" "example"

What do 5 the numbers in the first line mean?

In the subsequence lines I'm assuming the structure is bytes allocated : call.

Hadley


--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RProfmem output format

Thomas Lumley-2
On Sat, May 14, 2011 at 5:00 AM, Hadley Wickham <[hidden email]> wrote:

> Hi all,
>
> When I run the example in RProfmem, I get:
>
>
>     Rprofmem("Rprofmem.out", threshold=1000)
>     example(glm)
>     Rprofmem(NULL)
>     noquote(readLines("Rprofmem.out", n=5))
>
> ...
>
> [1] 1384 :5416 :5416 :1064 :1064 :"readRDS" "index.search" "example"
> [2] 1064 :"readRDS" "index.search" "example"
> [3] 4712 :"readRDS" "index.search" "example"
> [4] 4712 :"readRDS" "index.search" "example"
> [5] 1064 :"readRDS" "index.search" "example"
>
> What do 5 the numbers in the first line mean?
>
> In the subsequence lines I'm assuming the structure is bytes allocated : call.

I think the five numbers come from four memory allocations before
example() is called.  Looking at the code in src/main/memory.c, it
prints newline only when the call stack is not empty.

I don't see why this is done, and I may well be the person who did it
(I don't have svn on this computer to check), but it is clearly
deliberate.

   -thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RProfmem output format

Hadley Wickham-2
>> In the subsequence lines I'm assuming the structure is bytes allocated : call.
>
> I think the five numbers come from four memory allocations before
> example() is called.  Looking at the code in src/main/memory.c, it
> prints newline only when the call stack is not empty.

Looking into that example in more detail, here's the distribution of
allocation numbers:

   1    3    4
4621   30    2

(with a threshold of 5k)

So it happens ~30 times in total.

So what causes allocations when the call stack is empty?  Something
internal?  Does the garbage collector trigger allocations (i.e. could
it be caused by moving data to contiguous memory)?

Any ideas what the correct thing to do with these memory allocations?
Ignore them because they're not really related to the function they're
attributed to?  Sum them up?

> I don't see why this is done, and I may well be the person who did it
> (I don't have svn on this computer to check), but it is clearly
> deliberate.

It seems like it would be more consistent to always print a newline,
and then it would obvious those allocations occurred when the call
stack was empty.  This would make parsing the file a little bit
easier.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RProfmem output format

Hadley Wickham-2
Also, would you mind commenting how RProfmem is misleading?

There are three ways to profile memory use over time in R code. ...
All can be misleading, for different reasons.
--- http://cran.r-project.org/doc/manuals/R-exts.html#Profiling-R-code-for-memory-use

The other two ways describe why they are misleading.

Hadley

On Sun, May 15, 2011 at 8:02 AM, Hadley Wickham <[hidden email]> wrote:

>>> In the subsequence lines I'm assuming the structure is bytes allocated : call.
>>
>> I think the five numbers come from four memory allocations before
>> example() is called.  Looking at the code in src/main/memory.c, it
>> prints newline only when the call stack is not empty.
>
> Looking into that example in more detail, here's the distribution of
> allocation numbers:
>
>   1    3    4
> 4621   30    2
>
> (with a threshold of 5k)
>
> So it happens ~30 times in total.
>
> So what causes allocations when the call stack is empty?  Something
> internal?  Does the garbage collector trigger allocations (i.e. could
> it be caused by moving data to contiguous memory)?
>
> Any ideas what the correct thing to do with these memory allocations?
> Ignore them because they're not really related to the function they're
> attributed to?  Sum them up?
>
>> I don't see why this is done, and I may well be the person who did it
>> (I don't have svn on this computer to check), but it is clearly
>> deliberate.
>
> It seems like it would be more consistent to always print a newline,
> and then it would obvious those allocations occurred when the call
> stack was empty.  This would make parsing the file a little bit
> easier.
>
> Hadley
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
>



--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RProfmem output format

Thomas Lumley-2
In reply to this post by Hadley Wickham-2
On Mon, May 16, 2011 at 1:02 AM, Hadley Wickham <[hidden email]> wrote:
> So what causes allocations when the call stack is empty?  Something
> internal?  Does the garbage collector trigger allocations (i.e. could
> it be caused by moving data to contiguous memory)?

The garbage collector doesn't move anything, it just swaps pointers in
a linked list.

The lexer, parser, and evaluator all have  to do some work before a
function context is set up for the top-level function, so I assume
that's where it is happening.

> Any ideas what the correct thing to do with these memory allocations?
> Ignore them because they're not really related to the function they're
> attributed to?  Sum them up?
>
>> I don't see why this is done, and I may well be the person who did it
>> (I don't have svn on this computer to check), but it is clearly
>> deliberate.
>
> It seems like it would be more consistent to always print a newline,
> and then it would obvious those allocations occurred when the call
> stack was empty.  This would make parsing the file a little bit
> easier.

Yes. It's obviously better to always print a newline, and so clearly
deliberate not to, that I suspect there may have been a good reason.
If I can't work it out (after my grant deadline this week) I will just
assume it's wrong.


   -thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RProfmem output format

Henrik Bengtsson-5
I'm picking up this 5-year old thread.

1. About the four memory allocations without a stacktrace

I think the four memory allocations without a stacktrace reported by Rprofmem():

> Rprofmem(); x <- raw(2000); Rprofmem("")
> cat(readLines("Rprofmem.out", n=5, warn=FALSE), sep="\n")
192 :360 :360 :1064 :2040 :"raw"

are due to some initialization of R that is independent of Rprofmem(),
because they can be avoided if one allocates some memory before (in a
fresh R session):

> z <- raw(1000); dummy <- gc()
> Rprofmem(); x <- raw(2000); Rprofmem("")
> cat(readLines("Rprofmem.out", n=5, warn=FALSE), sep="\n")
2040 :"raw"


2. About missing newlines when stacktrace is empty

As a refresher, the problem is that memory allocations an empty
stracktrace are reported without newlines, i.e.

192 :360 :360 :1064 :2040 :"raw"

The question is why this is not reported as:

192 :
360 :
360 :
1064 :
2040 :"raw"

This was/is because C function R_OutputStackTrace() - part of
src/main/memory.c  - looks like:

static void R_OutputStackTrace(FILE *file)
{
    int newline = 0;
    RCNTXT *cptr;

    for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) {
        if ((cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN))
           && TYPEOF(cptr->call) == LANGSXP) {
           SEXP fun = CAR(cptr->call);
           if (!newline) newline = 1;
           fprintf(file, "\"%s\" ",
               TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) :
               "<Anonymous>");
        }
    }
    if (newline) fprintf(file, "\n");
}


Thomas, your last comment was:

> Yes. It's obviously better to always print a newline, and so clearly
> deliberate not to, that I suspect there may have been a good reason.
> If I can't work it out (after my grant deadline this week) I will just
> assume it's wrong.

When I search the code and the commit history
(https://github.com/wch/r-source/commit/3d5eb2a09f2d75893efdc8bbf1c72d17603886a0),
it appears that this was there from the very first commit.  Also,
searching the code for usages of R_OutputStackTrace(), I only find
R_ReportAllocation() and R_ReportNewPage(), both part of of
src/main/memory.c (see below).

static void R_ReportAllocation(R_size_t size)
{
    if (R_IsMemReporting) {
if(size > R_MemReportingThreshold) {
   fprintf(R_MemReportingOutfile, "%lu :", (unsigned long) size);
   R_OutputStackTrace(R_MemReportingOutfile);
}
    }
    return;
}

static void R_ReportNewPage(void)
{
    if (R_IsMemReporting) {
fprintf(R_MemReportingOutfile, "new page:");
R_OutputStackTrace(R_MemReportingOutfile);
    }
    return;
}


Could it be that when you wrote it you had another usage for
R_OutputStackTrace() in mind as well?  If so, it makes sense that
R_OutputStackTrace() shouldn't output a newline if the stack trace was
empty.  But if the above is the only usage, to me it looks pretty safe
to always add a newline.

> sessionInfo()
R version 3.3.0 Patched (2016-05-26 r70682)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

/Henrik


On Sun, May 15, 2011 at 1:16 PM, Thomas Lumley <[hidden email]> wrote:

> On Mon, May 16, 2011 at 1:02 AM, Hadley Wickham <[hidden email]> wrote:
>> So what causes allocations when the call stack is empty?  Something
>> internal?  Does the garbage collector trigger allocations (i.e. could
>> it be caused by moving data to contiguous memory)?
>
> The garbage collector doesn't move anything, it just swaps pointers in
> a linked list.
>
> The lexer, parser, and evaluator all have  to do some work before a
> function context is set up for the top-level function, so I assume
> that's where it is happening.
>
>> Any ideas what the correct thing to do with these memory allocations?
>> Ignore them because they're not really related to the function they're
>> attributed to?  Sum them up?
>>
>>> I don't see why this is done, and I may well be the person who did it
>>> (I don't have svn on this computer to check), but it is clearly
>>> deliberate.
>>
>> It seems like it would be more consistent to always print a newline,
>> and then it would obvious those allocations occurred when the call
>> stack was empty.  This would make parsing the file a little bit
>> easier.
>
> Yes. It's obviously better to always print a newline, and so clearly
> deliberate not to, that I suspect there may have been a good reason.
> If I can't work it out (after my grant deadline this week) I will just
> assume it's wrong.
>
>
>    -thomas
>
> --
> Thomas Lumley
> Professor of Biostatistics
> University of Auckland
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RProfmem output format

Henrik Bengtsson-5
UPDATE: Tomas Kalibera has fixed this bug (on missing newlines in
Rprofmem output) in R-devel r72747
(https://github.com/wch/r-source/commit/ba6665deace4a8dc239aebec977c17d0975fbc27).
  Using the example of this thread, Rprofmem() of R-devel now outputs
with newlines:

$ R Under development (unstable) (2017-05-30 r72747) -- "Unsuffered
Consequences"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
[...]

> Rprofmem(); x <- raw(2000); Rprofmem("")
> cat(readLines("Rprofmem.out", n=5, warn=FALSE), sep="\n")
232 :
472 :
472 :
1064 :
2040 :"raw"

/Henrik

On Sat, Jun 4, 2016 at 12:40 PM, Henrik Bengtsson
<[hidden email]> wrote:

> I'm picking up this 5-year old thread.
>
> 1. About the four memory allocations without a stacktrace
>
> I think the four memory allocations without a stacktrace reported by Rprofmem():
>
>> Rprofmem(); x <- raw(2000); Rprofmem("")
>> cat(readLines("Rprofmem.out", n=5, warn=FALSE), sep="\n")
> 192 :360 :360 :1064 :2040 :"raw"
>
> are due to some initialization of R that is independent of Rprofmem(),
> because they can be avoided if one allocates some memory before (in a
> fresh R session):
>
>> z <- raw(1000); dummy <- gc()
>> Rprofmem(); x <- raw(2000); Rprofmem("")
>> cat(readLines("Rprofmem.out", n=5, warn=FALSE), sep="\n")
> 2040 :"raw"
>
>
> 2. About missing newlines when stacktrace is empty
>
> As a refresher, the problem is that memory allocations an empty
> stracktrace are reported without newlines, i.e.
>
> 192 :360 :360 :1064 :2040 :"raw"
>
> The question is why this is not reported as:
>
> 192 :
> 360 :
> 360 :
> 1064 :
> 2040 :"raw"
>
> This was/is because C function R_OutputStackTrace() - part of
> src/main/memory.c  - looks like:
>
> static void R_OutputStackTrace(FILE *file)
> {
>     int newline = 0;
>     RCNTXT *cptr;
>
>     for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) {
>         if ((cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN))
>            && TYPEOF(cptr->call) == LANGSXP) {
>            SEXP fun = CAR(cptr->call);
>            if (!newline) newline = 1;
>            fprintf(file, "\"%s\" ",
>                TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) :
>                "<Anonymous>");
>         }
>     }
>     if (newline) fprintf(file, "\n");
> }
>
>
> Thomas, your last comment was:
>
>> Yes. It's obviously better to always print a newline, and so clearly
>> deliberate not to, that I suspect there may have been a good reason.
>> If I can't work it out (after my grant deadline this week) I will just
>> assume it's wrong.
>
> When I search the code and the commit history
> (https://github.com/wch/r-source/commit/3d5eb2a09f2d75893efdc8bbf1c72d17603886a0),
> it appears that this was there from the very first commit.  Also,
> searching the code for usages of R_OutputStackTrace(), I only find
> R_ReportAllocation() and R_ReportNewPage(), both part of of
> src/main/memory.c (see below).
>
> static void R_ReportAllocation(R_size_t size)
> {
>     if (R_IsMemReporting) {
> if(size > R_MemReportingThreshold) {
>    fprintf(R_MemReportingOutfile, "%lu :", (unsigned long) size);
>    R_OutputStackTrace(R_MemReportingOutfile);
> }
>     }
>     return;
> }
>
> static void R_ReportNewPage(void)
> {
>     if (R_IsMemReporting) {
> fprintf(R_MemReportingOutfile, "new page:");
> R_OutputStackTrace(R_MemReportingOutfile);
>     }
>     return;
> }
>
>
> Could it be that when you wrote it you had another usage for
> R_OutputStackTrace() in mind as well?  If so, it makes sense that
> R_OutputStackTrace() shouldn't output a newline if the stack trace was
> empty.  But if the above is the only usage, to me it looks pretty safe
> to always add a newline.
>
>> sessionInfo()
> R version 3.3.0 Patched (2016-05-26 r70682)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> /Henrik
>
>
> On Sun, May 15, 2011 at 1:16 PM, Thomas Lumley <[hidden email]> wrote:
>> On Mon, May 16, 2011 at 1:02 AM, Hadley Wickham <[hidden email]> wrote:
>>> So what causes allocations when the call stack is empty?  Something
>>> internal?  Does the garbage collector trigger allocations (i.e. could
>>> it be caused by moving data to contiguous memory)?
>>
>> The garbage collector doesn't move anything, it just swaps pointers in
>> a linked list.
>>
>> The lexer, parser, and evaluator all have  to do some work before a
>> function context is set up for the top-level function, so I assume
>> that's where it is happening.
>>
>>> Any ideas what the correct thing to do with these memory allocations?
>>> Ignore them because they're not really related to the function they're
>>> attributed to?  Sum them up?
>>>
>>>> I don't see why this is done, and I may well be the person who did it
>>>> (I don't have svn on this computer to check), but it is clearly
>>>> deliberate.
>>>
>>> It seems like it would be more consistent to always print a newline,
>>> and then it would obvious those allocations occurred when the call
>>> stack was empty.  This would make parsing the file a little bit
>>> easier.
>>
>> Yes. It's obviously better to always print a newline, and so clearly
>> deliberate not to, that I suspect there may have been a good reason.
>> If I can't work it out (after my grant deadline this week) I will just
>> assume it's wrong.
>>
>>
>>    -thomas
>>
>> --
>> Thomas Lumley
>> Professor of Biostatistics
>> University of Auckland
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Loading...