Memory issues on a 64-bit debian system (quantreg)

Memory issues on a 64-bit debian system (quantreg)

Jonathan Greenberg
Rers:

    I installed R 2.9.0 from the Debian package manager on our amd64
system, which currently has 6GB of RAM. My first question is whether
this installation is a true 64-bit installation (should R have access to
more than 4GB of RAM?). I suspect so, because I was running rqss()
(package quantreg, installed via install.packages() -- I noticed it
required a compilation of the source) and watched the memory usage spike
to 4.9GB (my input data contains > 500,000 samples).

    With this said, after 30 mins or so of processing, I got the
following error:

tahoe_rq <-
rqss(ltbmu_4_stemsha_30m_exp.img~qss(ltbmu_eto_annual_mm.img),tau=.99,data=boundary_data)
Error: cannot allocate vector of size 1.5 Gb

    The dataset is a bit big (300mb or so), so I'm not providing it
unless necessary to solve this memory problem.

    Thoughts?  Do I need to compile either the main R "by hand" or the
quantreg package?

--j

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Memory issues on a 64-bit debian system (quantreg)

RKoenker
Jonathan,

Take a look at the output of sessionInfo(). It should say x86_64 if
you have a 64-bit installation, or at least I think this is the case.
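
A minimal sketch of a direct check, using only base R. It assumes that an 8-byte pointer size is what identifies a 64-bit build; the variable names are illustrative and not from the thread.

```r
# Check whether the running R build is 64-bit (base R only).
pointer_bytes <- .Machine$sizeof.pointer   # 8 on a 64-bit build, 4 on a 32-bit build
is_64bit <- pointer_bytes == 8

cat("architecture:", R.version$arch, "\n")
cat("pointer size:", pointer_bytes, "bytes\n")
cat("64-bit build:", is_64bit, "\n")
```

The platform line printed by sessionInfo() carries the same information as R.version$arch.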

Regarding rqss(), my experience is that memory problems are usually
due to the fact that, early on in the processing, there is a call to
model.matrix(), which is supposed to create a design (aka X) matrix
for the problem.  This matrix is then coerced to matrix.csr sparse
format, but the dense form is often too big for the machine to cope
with.  Ideally, someone would write an R version of model.matrix that
would permit building the matrix in sparse form from the get-go, but
this is a non-trivial task.  (Or at least so it appeared to me when I
looked into it a few years ago.)  An option is to roll your own X
matrix: take a smaller version of the data, apply the formula, look at
the structure of X, and then try to make a sparse version of the full
X matrix.  This is usually not that difficult, but "usually" is based
on a rather small sample that may not be representative of your
problems.
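
To illustrate why the dense intermediate matters, here is a rough storage estimate in base R. It is only a sketch: n matches the sample count mentioned in the thread, but the column count p and the number of nonzeros per row are hypothetical, and the compressed sparse row (CSR) layout is assumed to hold one double and one integer column index per nonzero plus one integer pointer per row.

```r
# Rough storage estimate: dense double matrix vs. compressed sparse row (CSR).
# p and nnz_per_row are hypothetical values chosen for illustration only.
dense_bytes <- function(n, p) 8 * n * p        # 8 bytes per double entry

csr_bytes <- function(n, nnz_per_row) {
  nnz <- n * nnz_per_row
  8 * nnz +       # nonzero values (double)
    4 * nnz +     # one column index per nonzero (integer)
    4 * (n + 1)   # row pointers (integer)
}

n <- 500000       # sample count from the thread
p <- 1000         # hypothetical number of columns in X
dense <- dense_bytes(n, p)
sparse <- csr_bytes(n, nnz_per_row = 2)

cat("dense :", round(dense / 1024^3, 2), "GiB\n")
cat("sparse:", round(sparse / 1024^2, 2), "MiB\n")
```

Under these assumptions the dense form is orders of magnitude larger than the sparse one, which is why building X in sparse form from the start would avoid the spike.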

Hope that this helps,

Roger

url:    www.econ.uiuc.edu/~roger            Roger Koenker
email    [hidden email]            Department of Economics
vox:     217-333-4558                University of Illinois
fax:       217-244-6678                Urbana, IL 61801





Re: Memory issues on a 64-bit debian system (quantreg)

Dirk Eddelbuettel
In reply to this post by Jonathan Greenberg

On 24 June 2009 at 14:07, Jonathan Greenberg wrote:
|     I installed R 2.9.0 from the Debian package manager on our amd64
| system that currently has 6GB of RAM -- my first question is whether
| this installation is a true 64-bit installation (should R have access to
|  > 4GB of RAM?)  I suspect so, because I was running an rqss() (package
| quantreg, installed via install.packages() -- I noticed it required a
| compilation of the source) and watched the memory usage spike to 4.9GB
| (my input data contains > 500,000 samples).

As you suspect, that's proof enough :)

With a 32-bit OS, even when the system has that much RAM, you'd never get to
allocate that much to a single process. I can't recall the hard limit, but I
think the effective limit I have seen with R on 32-bit systems with 8 GB was
around 3 GB.  So you are on 64-bit and you are squeezing the existing
hardware as well as you can.

|     With this said, after 30 mins or so of processing, I got the
| following error:
|
| tahoe_rq <-
| rqss(ltbmu_4_stemsha_30m_exp.img~qss(ltbmu_eto_annual_mm.img),tau=.99,data=boundary_data)
| Error: cannot allocate vector of size 1.5 Gb

R needs an additional 1.5 GiB for this one vector, and it generally needs
that as a single contiguous block of memory.
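
As a quick worked example of what that figure means, the arithmetic below converts 1.5 GiB into a count of doubles. It assumes the vector holds 8-byte doubles and that the reported size is in units of 1024^3 bytes; the thread does not say which object triggered the allocation.

```r
# How many doubles fit in a single 1.5 GiB vector?
bytes_per_double <- 8
vec_bytes <- 1.5 * 1024^3
n_doubles <- vec_bytes / bytes_per_double

# Equivalent column count of a dense matrix with 500,000 rows
# (the row count is taken from the thread; the dense matrix is an assumption).
cols_at_500k_rows <- n_doubles / 500000

cat("doubles in vector:", format(n_doubles, big.mark = ","), "\n")
cat("columns at 500,000 rows:", round(cols_at_500k_rows), "\n")
```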

|     The dataset is a bit big (300mb or so), so I'm not providing it
| unless necessary to solve this memory problem.
|
|     Thoughts?  Do I need to compile either the main R "by hand" or the
| quantreg package?

No, you are as far as you can get for free. Rebuilding R or quantreg does not
change anything.

Now you either need to buy more RAM (the system is likely capable of 16 GB
if not more) or parcel your data into smaller chunks, i.e. re-work your
analysis.

Dirk

--
Three out of two people have difficulties with fractions.


Re: Memory issues on a 64-bit debian system (quantreg)

Jonathan Greenberg
In reply to this post by RKoenker
Yep, it's looking like a memory issue -- we have 6GB RAM and 1GB swap.
I did notice that the analysis takes far less memory (and runs) if I use:

tahoe_rq <-
rqss(ltbmu_4_stemsha_30m_exp.img~ltbmu_eto_annual_mm.img,tau=.99,data=boundary_data)
    (which I assume fits a line to the quantiles)
vs.
tahoe_rq <-
rqss(ltbmu_4_stemsha_30m_exp.img~qss(ltbmu_eto_annual_mm.img),tau=.99,data=boundary_data)
    (which is fitting a spline)

I'd like to fit a spline to the upper 1% of the data, and I'd like to
run the analysis on the full dataset to begin with rather than randomly
subsetting it.  Unless anyone has hints as to whether I'm making a
mistake in my call, I'll just wait until my new computer, which has
more RAM, comes in next week.  Thanks!

--j




Re: Memory issues on a 64-bit debian system (quantreg)

RKoenker
My earlier comment is probably irrelevant, since you are fitting only
one qss component and have no other covariates.  A word of warning,
though, for when you go back to this on your new machine: you are
almost surely going to want to specify a large lambda for the qss
component in the rqss call.  The default of 1 is likely to produce
something very, very rough with such a large dataset.
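
A minimal sketch of what passing lambda to the qss component looks like. The data are simulated, the column names x and y are placeholders, and lambda = 50 is an arbitrary illustration rather than a recommended value. The fit is skipped if quantreg is not installed.

```r
# Sketch: passing a larger lambda to qss() inside rqss().
# Simulated data; lambda = 50 is an arbitrary value for illustration.
set.seed(1)
n <- 2000
d <- data.frame(x = sort(runif(n, 0, 10)))
d$y <- sin(d$x) + rnorm(n, sd = 0.3)

fit_ok <- FALSE
if (requireNamespace("quantreg", quietly = TRUE)) {
  library(quantreg)
  # Default penalty (lambda = 1) versus a heavier penalty on roughness.
  fit_rough  <- rqss(y ~ qss(x, lambda = 1),  tau = 0.99, data = d)
  fit_smooth <- rqss(y ~ qss(x, lambda = 50), tau = 0.99, data = d)
  fit_ok <- TRUE
}
```

Plotting both fits against the data is the easiest way to judge whether the chosen lambda is smooth enough.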


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email    [hidden email]            Department of Economics
vox:     217-333-4558                University of Illinois
fax:       217-244-6678                Urbana, IL 61801





Re: Memory issues on a 64-bit debian system (quantreg)

Jonathan Greenberg
Just wanted to leave a note on this after I got my new iMac (and
installed R64 from the AT&T site): quantreg did run, after topping out
at a whopping 12GB of swap space (Mac OS X, at least, should theoretically
have as much swap space as there is space on the HD; it will
dynamically increase it as memory usage goes up).  I did get a "caught
segfault" error, but not until I did a ?rqss and clicked on a PDF
vignette in the help browser (I was able to run summary(tahoe_rq) with no
problem).  I don't know if the Mac help browser has some issue under
64-bit systems; it may be worth looking into.

I figure it's best to first work out the parameters (tau) with a random
subset, at least for efficiency's sake, then deploy the algorithm on
the entire dataset.
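
A sketch of the subsetting step in base R. The data frame here is a simulated stand-in for boundary_data with hypothetical columns x and y, and the subset size of 20,000 is arbitrary.

```r
# Sketch: draw a random subset of rows for tuning before the full run.
# boundary_data is simulated here; the real columns and size will differ.
set.seed(42)
boundary_data <- data.frame(x = runif(500000), y = rnorm(500000))

subset_size <- 20000
idx <- sample(nrow(boundary_data), subset_size)   # rows sampled without replacement
boundary_sub <- boundary_data[idx, ]

# Try candidate settings with data = boundary_sub, then rerun the chosen
# call with data = boundary_data.
cat("subset rows:", nrow(boundary_sub), "of", nrow(boundary_data), "\n")
```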

--j


--

Jonathan A. Greenberg, PhD
Postdoctoral Scholar
Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis
One Shields Avenue
The Barn, Room 250N
Davis, CA 95616
Cell: 415-794-5043
AIM: jgrn307, MSN: [hidden email], Gchat: jgrn307
