
help with read.table.ffdf parameters

help with read.table.ffdf parameters

Marck Vaisman
Hello fellow R users,

I am trying to read a 6.9 million row text file with 26 columns separated by
spaces into R using ff. When I specify a small number for first.rows,
next.rows and nrows it is read with no issue. However, when I try to specify
larger next.rows values and no nrows parameter to read the entire file, I
keep getting errors. Please see code below.

I am trying to do this on an m1.large EC2 machine running R with 14.8 GB of
memory. I haven't been able to read the entire dataset into memory using
traditional read.table.

Given the error message, I am not sure whether I need to specify further
parameters.

Thank you,
Marck Vaisman

[hidden email]
http://www.linkedin.com/in/marckvaisman
http://twitter.com/#!/wahalulu

> results.five <- read.table("./results/results.txt",
+                          header = F, nrows = 5)   # read 5 lines for structure
> classes <- sapply(results.five, class)   # to specify colClasses
> classes
       V1        V2        V3        V4        V5        V6        V7        V8
"integer"  "factor" "integer" "integer" "integer" "integer" "integer" "numeric"
       V9       V10       V11       V12       V13       V14       V15       V16
"numeric" "numeric" "numeric" "integer" "numeric" "numeric" "numeric" "numeric"
      V17       V18       V19       V20       V21       V22       V23       V24
"integer" "numeric" "numeric" "numeric" "numeric"  "factor" "numeric" "numeric"
      V25       V26
"numeric" "numeric"
> library(ff)
> results.ff <- read.table.ffdf(file = "./results/results.txt",
+                                     header = F,
+                                     colClasses = classes,
+                                     first.rows = 1000,
+                                     next.rows = 1000,
+                                     nrows = 10000)
> dim(results.ff)
[1] 10000    26
> results.ff <- read.table.ffdf(file = "./results/results.txt",
+                                     header = F,
+                                     colClasses = classes,
+                                     first.rows = 10000,
+                                     next.rows = 100000)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'an integer', got '3e+05'
> rff <- read.table.ffdf(file = "./results/results.txt",
+                                     header = F,
+                                     colClasses = classes,
+                                     first.rows = 10000,
+                                     next.rows = 100000)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'an integer', got '3e+05'
>
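
The error message suggests that a column declared as "integer" holds a value written as 3e+05 somewhere beyond the rows read so far. A minimal sketch for locating such columns, using base R only: find_sci_columns is a hypothetical helper (not part of ff or base R), and the small temporary file stands in for results.txt. It reads a sample as character, so n would need to be large enough to reach the offending rows.

```r
# Hypothetical helper: return the indices of columns that contain at least
# one value in exponent notation (for example "3e+05").
find_sci_columns <- function(path, n = 1000, sep = "") {
  # Read every column as character so nothing is parsed or rejected.
  raw <- read.table(path, header = FALSE, sep = sep, nrows = n,
                    colClasses = "character")
  has_sci <- vapply(
    raw,
    function(col) any(grepl("^[-+]?[0-9.]+[eE][-+]?[0-9]+$", col)),
    logical(1)
  )
  which(has_sci)
}

# Small stand-in file; the real results.txt is not available here.
demo <- tempfile(fileext = ".txt")
writeLines(c("1 a 100000 1.5", "2 b 3e+05 2.5"), demo)

sci_cols <- find_sci_columns(demo)
print(sci_cols)
```

Any column reported this way is a candidate for a colClasses entry other than "integer".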

Re: help with read.table.ffdf parameters

MCOM
Marck,

A little late, but perhaps this will help someone in the future.  I am guessing that some of your "integer" fields contain scientific notation, and for some reason read.table is not interpreting those as integers.  Consider changing the affected column classes from "integer" to "numeric" and I bet you'll see better results.
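
A minimal sketch of that suggestion, assuming the cause really is a value such as 3e+05 in a column declared as "integer". It uses plain read.table on a small temporary file rather than the original results.txt, and the three-column layout is made up for illustration.

```r
# Stand-in data: the third column holds one value in scientific notation.
demo <- tempfile(fileext = ".txt")
writeLines(c("1 a 100000", "2 b 3e+05"), demo)

classes <- c("integer", "character", "integer")

# Declaring the third column as "integer" makes scan() reject "3e+05".
err <- tryCatch(
  read.table(demo, header = FALSE, colClasses = classes),
  error = function(e) conditionMessage(e)
)
print(err)

# Widening every "integer" class to "numeric" lets the same file parse.
classes[classes == "integer"] <- "numeric"
fixed <- read.table(demo, header = FALSE, colClasses = classes)
print(fixed$V3)
```

The same widened classes vector could then be passed as colClasses to read.table.ffdf. If integer storage matters afterwards, whole-number columns can be converted back with as.integer once they are known to fit in the integer range.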

Regards,

Matt
Re: help with read.table.ffdf parameters

Bharat Warule
Thanks MCOM,

This is really helpful for me.
Bharat Warule
Pune