
read.csv size limits


read.csv size limits

andy1983
I have been using the read.csv function for a while now without any problems. My files are usually 20-50 MB and take up to a minute to import. They have all been under 50,000 rows and under 100 columns.

Recently, I tried importing a file of a similar size (so about the same amount of data), but with ~500,000 columns and ~20 rows. The process is taking forever (~1 hour so far). In Task Manager, I see the CPU at max, but memory usage stalls at around 50 MB (far below the memory limit).

Is this normal? Is there a way to optimize this operation, or at least check its progress? Will this take 2 hours or 200 hours?

All I wanted to do was transpose my extra-wide table, with a process I assumed would take 5 minutes. Maybe R is not the solution I am looking for?

Thanks.

Re: read.csv size limits

jholtman
Have you used colClasses to define what each of the columns contains? Can
you use 'scan'? I haven't tried anything with 500,000 columns, but if they
are numeric, this should not take too long. So I created a 20-line file with
500,000 columns, and here is what it took to read it, both as numeric and as
character:

> system.time(x <- scan('/tempyy.txt', what=0))
Read 10000000 items
[1] 17.57  0.12 18.89    NA    NA
> str(x)
 num [1:10000000] 12345 12345 12345 12345 12345 ...
> system.time(x <- scan('/tempyy.txt', what=''))
Read 10000000 items
[1]  9.03  0.10 11.21    NA    NA
> str(x)
 chr [1:10000000] "12345" "12345" "12345" "12345" "12345" "12345" "12345"
"12345" ...
>
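To connect this back to the original goal (transposing the extra-wide table), here is a hedged sketch of the approach the timings above suggest: read the whole file as a flat vector with scan(), reshape it into a matrix, and transpose. The file name "wide.csv" and the 20 x 500,000 dimensions are assumptions for illustration, not taken from the session above.

```r
## Sketch, not the poster's actual code: assumes a comma-separated,
## all-numeric file "wide.csv" with 20 rows and 500,000 columns.
nrows <- 20
ncols <- 500000

## scan() returns one flat numeric vector; what = 0 means "numeric",
## and sep = "," handles the CSV delimiter.
x <- scan("wide.csv", what = 0, sep = ",")

## byrow = TRUE fills the matrix in the same order the file was read,
## so m has the file's original layout; t() then flips it.
m    <- matrix(x, nrow = nrows, ncol = ncols, byrow = TRUE)
tall <- t(m)   # now ~500,000 rows x 20 columns

write.table(tall, "tall.csv", sep = ",",
            row.names = FALSE, col.names = FALSE)
```

If read.csv must be used instead, supplying colClasses (e.g. colClasses = "numeric") skips per-column type detection, though a data frame with 500,000 columns will still carry far more overhead than a plain matrix.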





On 2/27/07, andy1983 <[hidden email]> wrote:
> [quoted message trimmed; see the original post above]



--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.