read.table segfaults

Göran Broström
 > fil2s <- read.table("../Data/fil2_s.txt", header = FALSE, sep = "\t")

Program received signal SIGSEGV, Segmentation fault.
0x000000000041c2e1 in RunGenCollect (size_needed=8192000) at memory.c:1514
1514    PROCESS_NODES();
(gdb)

 > sessionInfo()
R version 2.13.1 Patched (2011-08-25 r56798)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
 >

The text file 'fil2_s.txt' is huge, around 11 million records and 17
variables, but ...?



--
Göran Broström

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: read.table segfaults

Scott
It does look like you've got a memory issue. Passing as.is=TRUE and/or stringsAsFactors=FALSE as optional arguments to read.table may help.

If you don't specify these, R has to look through the file to work out which columns are characters/factors etc., so larger files cause more of a headache for R, though I don't have complete knowledge of how this works. Hopefully someone else can comment further on this? I'd try toggling TRUE/FALSE for as.is and stringsAsFactors.

   Do you have other objects loaded in memory as well? This file by itself might not be the problem; it may be a cumulative issue.
   Have you checked the file structure in any other manner?
   How large (MB/kB) is the file that you're trying to read?
   If you just read in parts of the file, is it okay?
      read.table(filename,header=FALSE,sep="\t",nrows=100)
      read.table(filename,header=FALSE,sep="\t",skip=20000,nrows=100)

EDIT: Try colClasses within read.table as well. Also, does the file cause problems in the latest *released* version of R?
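
For what it's worth, a minimal sketch of the colClasses idea: infer the column types from the first rows only, then pass them explicitly so that read.table does not have to guess over the whole file. The temporary file, its three-column layout and the names `path`, `demo`, `head_rows` and `classes` are made up for illustration. On the real file, a column that looks integer in the first 100 rows could still contain other values further down, in which case the second call would stop with an error.

```r
# Hypothetical stand-in for the large tab-separated file (3 columns, no header).
set.seed(1)
path <- tempfile(fileext = ".txt")
demo <- data.frame(id  = 1:1000,
                   sex = sample(c("male", "female"), 1000, replace = TRUE),
                   x   = runif(1000))
write.table(demo, path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Step 1: read only the first 100 rows and record the class of each column.
head_rows <- read.table(path, header = FALSE, sep = "\t", nrows = 100,
                        stringsAsFactors = FALSE)
classes <- unname(vapply(head_rows, function(col) class(col)[1], character(1)))

# Step 2: read the whole file with the classes given up front.
full <- read.table(path, header = FALSE, sep = "\t",
                   colClasses = classes, comment.char = "", quote = "")
str(full)
```

If step 1 works but step 2 fails, that at least narrows the problem down to something beyond the first 100 rows.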

Re: read.table segfaults

Göran Broström
In reply to this post by Göran Broström
Another one:

The 'death.RData' file was created about a year ago, but ...? Same session info as in my first message.

Göran

> load("../Data/death.RData")
> summary(death)

 *** caught segfault ***
address 0x40000e04959, cause 'memory not mapped'

Traceback:
 1: match(x, levels)
 2: factor(a, levels = ll[!(ll %in% exclude)], exclude = if (useNA ==
   "no") NA)
 3: table(object)
 4: summary.factor(X[[6L]], ...)
 5: FUN(X[[6L]], ...)
 6: lapply(as.list(object), summary, maxsum = maxsum, digits = 12,     ...)
 7: summary.data.frame(death)
 8: summary(death)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

Re: read.table segfaults

Göran Broström
One further note:

No problem with  R version 2.13.0 (2011-04-13)

Göran


Re: read.table segfaults

bbolker
In reply to this post by Scott
  There seem to be two issues here:

1. what can the original poster (OP) do to work around this problem?
(e.g. get the data into a relational data base and import it from
there; use something from the High Performance task view such as
ff or data.table ...)

2. reporting a bug -- according to the R FAQ, any low-level
(segmentation-fault-type) crash of R when one is not messing
around with dynamically loaded code constitutes a bug. Unfortunately,
debugging problems like this is a huge pain in the butt.

  Goran, can you randomly or systematically generate an
object of this size, write it to disk, read it back in, and
generate the same error?  In other words, does something like

set.seed(1001)
d <- data.frame(label=rep(LETTERS[1:11],1e6),
                values=matrix(rep(1.0,11*17*1e6),ncol=17))
write.table(d,file="big.txt")
read.table("big.txt")

do the same thing?

Reducing it to this kind of reproducible example will make
it possible for others to debug it without needing to gain
access to your huge file ...
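
A scaled-down sketch of that round trip, which runs in a moment and can then be grown toward the real size. The scaling factor `n`, the temporary file and `row.names = FALSE` are assumptions added here, not part of the example above. Dropping the row names keeps the header line and the data lines at the same number of fields, so read.table does not take the first column as row names.

```r
# n = 1e6 would reproduce the full-size object; start small and scale up.
n <- 1000
d <- data.frame(label  = rep(LETTERS[1:11], n),
                values = matrix(rep(1.0, 11 * 17 * n), ncol = 17))

path <- tempfile(fileext = ".txt")
# row.names = FALSE: header and data lines then have 18 fields each.
write.table(d, file = path, row.names = FALSE)

back <- read.table(path, header = TRUE)
dim(d)     # 11 * n rows, 18 columns
dim(back)  # should match dim(d)
```

Increasing `n` step by step should show whether the crash depends on the size of the object or on something specific to the original file.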


Re: read.table segfaults

Göran Broström
On Fri, Aug 26, 2011 at 11:55 PM, Ben Bolker <[hidden email]> wrote:

>  There seem to be two issues here:
>
> 1. what can the original poster (OP) do to work around this problem?
> (e.g. get the data into a relational data base and import it from
> there; use something from the High Performance task view such as
> ff or data.table ...)

Interestingly, the text file was created by a selection from an SQL
database. I have access to 'db2' on an Ubuntu machine, where I run, at the
bash prompt,

$ db2 < file2.sql

where file2.sql contains

connect to linnedb user goran using xxxxxxxxxxx
export to '/home/goran/ALC/SQL/fil2_s.txt' of del modified by coldelX09
 select  linneid, fodelsear, kon, ....... from u09021.fil2
connect reset

How do I get a direct connection between R and the database 'linnedb'?

> 2. reporting a bug -- according to the R FAQ, any low-level
> (segmentation-fault-type) crash of R when one is not messing
> around with dynamically loaded code constitutes a bug. Unfortunately,
> debugging problems like this is a huge pain in the butt.
>
>  Goran, can you randomly or systematically generate an
> object of this size, write it to disk, read it back in, and
> generate the same error?  In other words, does something like
>
> set.seed(1001)
> d <- data.frame(label=rep(LETTERS[1:11],1e6),
>                values=matrix(rep(1.0,11*17*1e6),ncol=17))
> write.table(d,file="big.txt")
> read.table("big.txt")
>
> do the same thing?

No, but I get new errors:

> ss <- read.table("big.txt")
Error in read.table("big.txt") : duplicate 'row.names' are not allowed

(there are no duplicates)

I tried adding an item to the first line, and then:

> ss <- read.table("big.txt", header = TRUE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 10610008 did not have 19 elements

which is wrong; that line has 19 elements.

Göran


Re: read.table segfaults

Göran Broström
In reply to this post by Scott
On Fri, Aug 26, 2011 at 9:41 PM, Scott <[hidden email]> wrote:

> It does look like you've got a memory issue. [...]

Today, after a night's sleep, there are no segfaults! (The computer
also slept; I turned it off.)  So what is going on? Maybe I shouldn't
bother, but I installed the latest patched version yesterday and
immediately tried to read the file, with a segfault as a result. After
turning the machine off and on, there are no problems. Do we need to
reboot after a new install (note, this is not Windows)?

Göran
