Reading in 9.6GB .DAT File - OK with 64-bit R?

Reading in 9.6GB .DAT File - OK with 64-bit R?

RHelpPlease
Hi there,
I wish to read a 9.6GB .DAT file into R (64-bit R on a 64-bit Windows machine), then delete a substantial number of rows and convert the result to a .csv file.  On the first attempt the computer crashed (at some point last night).

I'm rerunning this now and am closely monitoring CPU and memory usage.

Setting aside the possibility that the crash was purely a machine issue, is R equipped to handle this much data?  I read on the FAQ page that 64-bit R can handle larger data sets than 32-bit R.

I'm using the read.fwf function to read in the data.  I don't have access to a database program (a SQL database, for instance).

Advice is most appreciated!


Re: Reading in 9.6GB .DAT File - OK with 64-bit R?

Jeff Newmiller
My opinion is that you should spend your effort on setting up a SQL engine and importing the data there.  If you have 32GB of RAM your current approach might work, but working with sampled data rather than the full population is pretty typical for statistical analysis anyway.
---------------------------------------------------------------------------
Jeff Newmiller, DCN: <[hidden email]>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

Re: Reading in 9.6GB .DAT File - OK with 64-bit R?

Steve Lianoglou-6
In reply to this post by RHelpPlease
Hi,

Keep in mind that sqlite3 is just an `install.packages('RSQLite')` away ...

and this StackOverflow thread might be useful w.r.t. sqlite performance and big db files:

http://stackoverflow.com/questions/784173
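
For illustration (this sketch is not part of the original reply), a minimal
example of that route, assuming the fixed-width file has already been
converted to a delimited form with a header row; the file, table, and column
names are made up:

    ## Illustrative sketch only: load a large delimited file into SQLite in
    ## chunks via RSQLite, then filter with SQL instead of holding it all in R.
    ## "bigfile.csv", "bigdata", and the "flag" column are placeholders.
    library(DBI)
    library(RSQLite)

    con <- dbConnect(SQLite(), dbname = "big.sqlite")

    f <- file("bigfile.csv", open = "r")
    header <- strsplit(readLines(f, n = 1), ",")[[1]]
    repeat {
      lines <- readLines(f, n = 100000)           # 100,000 lines at a time
      if (length(lines) == 0) break
      tc <- textConnection(lines)
      d <- read.csv(tc, header = FALSE, col.names = header,
                    stringsAsFactors = FALSE)
      close(tc)
      dbWriteTable(con, "bigdata", d, append = TRUE)
    }
    close(f)

    # filter with SQL; only the matching rows come back into R
    keep <- dbGetQuery(con, "SELECT * FROM bigdata WHERE flag = 'X'")
    dbDisconnect(con)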

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

Re: Reading in 9.6GB .DAT File - OK with 64-bit R?

barry rowlingson
In reply to this post by RHelpPlease

 If you are trying to delete a substantial number of rows as a one-off
operation to get a smaller dataset, then you might be better off filtering
it with a tool like perl, awk, or sed - something that reads a line at
a time, processes it, and then perhaps writes a line of output.

 For example, suppose you only want lines where the 25th character in
each line is an 'X'. Then all you do is:

 awk 'substr($0,25,1)=="X"' < bigfile.dat >justX.dat

Here I've used awk to filter input based on a condition. It never
reads in the whole file so memory usage isn't a problem.

 Awk for Windows is available, either as a native build or as part
of Cygwin.

 You could do a similar thing in R by opening a text connection to
your file and reading one line at a time, writing the modified or
selected lines to a new file.
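
For illustration (not part of the original reply), a minimal sketch of that
approach; the file names, chunk size, and the test on character 25 simply
mirror the awk example above:

    ## Illustrative sketch only: stream the file through R in chunks,
    ## keeping just the lines whose 25th character is "X".
    infile  <- file("bigfile.dat", open = "r")
    outfile <- file("justX.dat",  open = "w")
    repeat {
      lines <- readLines(infile, n = 100000)      # 100,000 lines per chunk
      if (length(lines) == 0) break
      writeLines(lines[substr(lines, 25, 25) == "X"], outfile)
    }
    close(infile)
    close(outfile)

Reading a chunk of lines at a time rather than strictly one line keeps the
memory footprint small while avoiding a very slow per-line loop.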

Barry

Re: Reading in 9.6GB .DAT File - OK with 64-bit R?

RHelpPlease
In reply to this post by Steve Lianoglou-6
Hi Jeff & Steve,
Thanks for your responses.  After seven hours R ran out of memory and the run ended.  Currently the machine has 4GB of RAM; I'm looking to install more tomorrow.

I will look into SQLite3; thanks!

I've read that a SQL database would be a great tool for data of this size (reading in and manipulating), but I understand some come with a hefty price tag (similar to SAS licensing costs?).  At this time I'm looking for a low-cost solution, if possible.  After this project a program like a SQL database would not be needed again; also, of the multiple data sets I have to synthesize, only a handful are this large.

Thanks & please lend any other advice!

Re: Reading in 9.6GB .DAT File - OK with 64-bit R?

Sarah Goslee
Hi,


You can't load a 9.6GB dataset into 4GB of RAM.

sqlite3 is free, and doesn't require you to set up a server and client. More
powerful relational database solutions like MySQL and PostgreSQL are
also available for free, but require more effort to configure and use.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org

Re: Reading in 9.6GB .DAT File - OK with 64-bit R?

RHelpPlease
In reply to this post by barry rowlingson
Hi Barry,

"You could do a similar thing in R by opening a text connection to
your file and reading one line at a time, writing the modified or
selected lines to a new file."

Great!  I'm aware that this is possible, but I don't know the R commands for it.  I have a 560 x 1 variable to use to pare down the incoming large data set (which surely has millions of rows).  Other data sets have been small enough that I've been able to use the merge function after reading them in.  Obviously I'm having trouble reading this large data set in the first place.

Any additional help would be great!  

Re: Reading in 9.6GB .DAT File - OK with 64-bit R?

RHelpPlease
In reply to this post by Sarah Goslee
Hi Sarah,
Thanks for the SQL info!  I'll look into these options straightaway, along with the idea of opening a text connection.

Thanks again!

Re: Reading in 9.6GB .DAT File - OK with 64-bit R?

Gabor Grothendieck
In reply to this post by RHelpPlease

   # the next line installs the sqldf package and all of its
   # dependencies, including sqlite
   install.packages("sqldf")

   library(sqldf)
   DF <- read.csv.sql("bigfile.csv",
                      sql = "select * from file where a > 3",
                      ...other args...)

That single read.csv.sql call creates an sqlite database, creates an
appropriate table layout for your data, reads your data into the table,
performs the sql statement, and only after all that reads the result into
R.  It then destroys the database it created.

Replace "bigfile.csv" with the name of your file and where a > 3 with
your condition.  Also the ...other args... parts should specify the
format of your file.

See ?read.csv.sql
and also http://sqldf.googlecode.com

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

Re: Reading in 9.6GB .DAT File - OK with 64-bit R?

Jan van der LAan-2
In reply to this post by RHelpPlease

You could also have a look at the LaF package, which is written to handle large text files:

http://cran.r-project.org/web/packages/LaF/index.html

Under the vignettes you'll find a manual.

Note: LaF does not help you fit 9GB of data in 4GB of memory, but it can
help you read your file block by block and filter it.
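
A rough sketch of that block-by-block approach (not part of the original
reply; the column widths, types, and names are made up, and "keys.txt"
stands in for the poster's 560-value filtering variable):

    ## Illustrative sketch only: open the fixed-width file with LaF and
    ## filter it block by block against a vector of key values.
    library(LaF)

    keys <- readLines("keys.txt")                # the ~560 values to keep

    laf <- laf_open_fwf("bigfile.dat",
                        column_widths = c(10, 5, 25),
                        column_types  = c("string", "integer", "string"),
                        column_names  = c("id", "count", "value"))

    out <- file("filtered.csv", open = "w")
    first <- TRUE
    repeat {
      d <- next_block(laf, nrows = 100000)       # 100,000 rows at a time
      if (nrow(d) == 0) break
      d <- d[d$id %in% keys, ]                   # keep only the wanted ids
      write.table(d, out, sep = ",", row.names = FALSE, col.names = first)
      first <- FALSE
    }
    close(out)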

Jan