> fil2s <- read.table("../Data/fil2_s.txt", header = FALSE, sep = "\t")
Program received signal SIGSEGV, Segmentation fault.
0x000000000041c2e1 in RunGenCollect (size_needed=8192000) at memory.c:1514
1514        PROCESS_NODES();
(gdb)

> sessionInfo()
R version 2.13.1 Patched (2011-08-25 r56798)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
>

The text file 'fil2_s.txt' is Huge, around 11 million records and 17 variables, but ...?

-- 
Göran Broström

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
|
It does look like you've got a memory issue. Perhaps using as.is=TRUE and/or stringsAsFactors=FALSE as optional arguments to read.table will help. If you don't specify these sorts of things, R has to look through the file and work out which columns are characters, factors, etc., so larger files cause more of a headache for R. I don't have a complete knowledge of how this works, though; hopefully someone else can comment further on this. I'd try toggling TRUE/FALSE for as.is and stringsAsFactors.

Do you have other objects loaded in memory as well? This file by itself might not be the problem; it could be a cumulative issue.
Have you checked the file structure in any other manner?
How large (MB/kB) is the file that you're trying to read?
If you just read in parts of the file, is it okay?

read.table(filename, header = FALSE, sep = "\t", nrows = 100)
read.table(filename, header = FALSE, sep = "\t", skip = 20000, nrows = 100)

EDIT: try colClasses as well within read.table. Also, does the file cause problems in the latest *released* version of R?
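The colClasses suggestion can be sketched like this: read a small sample to learn each column's class, then read the full file with those classes fixed and strings kept as character, so read.table() neither guesses types nor builds factors for every row. This is a hedged sketch, not a known fix for the segfault. The helper name read_big_tsv and the 1000-row sample are illustrative assumptions, and the approach fails with a scan() error if later rows disagree with the sample (for example, a column that is all NA in the sample is classed "logical").

```r
# Hypothetical helper (not part of base R): fix column classes up front so
# read.table() does not have to guess types or build factors for every row.
read_big_tsv <- function(path, sample_rows = 1000) {
  # Read only the first rows to learn the column classes.
  head_df <- read.table(path, header = FALSE, sep = "\t",
                        nrows = sample_rows, stringsAsFactors = FALSE,
                        comment.char = "")
  classes <- unname(vapply(head_df, function(col) class(col)[1],
                           character(1)))
  # Read the whole file with the sampled classes; comment.char = ""
  # turns off comment scanning, a documented read.table() speed-up.
  read.table(path, header = FALSE, sep = "\t", colClasses = classes,
             stringsAsFactors = FALSE, comment.char = "")
}

# Usage with the file from the original post:
# fil2s <- read_big_tsv("../Data/fil2_s.txt")
```

Reading the sample twice costs little compared with the full read, and the explicit classes also make the result independent of how R would have guessed them.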
|
In reply to this post by Göran Broström
Another one:
The 'death.RData' was created about a year ago, but ...? Same info as below.

Göran

> load("../Data/death.RData")
> summary(death)

 *** caught segfault ***
address 0x40000e04959, cause 'memory not mapped'

Traceback:
 1: match(x, levels)
 2: factor(a, levels = ll[!(ll %in% exclude)], exclude = if (useNA == "no") NA)
 3: table(object)
 4: summary.factor(X[[6L]], ...)
 5: FUN(X[[6L]], ...)
 6: lapply(as.list(object), summary, maxsum = maxsum, digits = 12, ...)
 7: summary.data.frame(death)
 8: summary(death)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

2011/8/26 Göran Broström <[hidden email]>:
> > fil2s <- read.table("../Data/fil2_s.txt", header = FALSE, sep = "\t")
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x000000000041c2e1 in RunGenCollect (size_needed=8192000) at memory.c:1514
> 1514        PROCESS_NODES();
> (gdb)
>
> > sessionInfo()
> R version 2.13.1 Patched (2011-08-25 r56798)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> The text file 'fil2_s.txt' is Huge, around 11 million records and 17
> variables, but ...?

-- 
Göran Broström
|
One further note:
No problem with R version 2.13.0 (2011-04-13).

Göran

-- 
Göran Broström
|
In reply to this post by Scott
Scott <ncbi2r <at> googlemail.com> writes:
>
> It does look like you've got a memory issue. perhaps using
> as.is=TRUE, and/or stringsAsFactors=FALSE will help as optional arguments
> to read.table
>
> if you don't specify these sorts of things, R can have to look through the
> file and figure out which columns are characters/factors etc and so the
> larger files cause more of a headache for R I guess. Hopefully someone
> else can comment further on this? I'd try toggling TRUE/FALSE for as.is and
> stringsAsFactors.
>
> do you have other objects loaded in memory as well? this file by itself
> might not be the problem - but it's a cumulative issue.
> have you checked the file structure in any other manner?
> how large (Mb/kb) is the file that you're trying to read?
> if you just read in parts of the file, is it okay?
> read.table(filename,header=FALSE,sep="\t",nrows=100)
> read.table(filename,header=FALSE,sep="\t",skip=20000,nrows=100)

There seem to be two issues here:

1. What can the original poster (OP) do to work around this problem?
(e.g. get the data into a relational database and import it from
there; use something from the High Performance task view such as
ff or data.table ...)

2. Reporting a bug -- according to the R FAQ, any low-level
(segmentation-fault-type) crash of R when one is not messing
around with dynamically loaded code constitutes a bug. Unfortunately,
debugging problems like this is a huge pain in the butt.

Goran, can you randomly or systematically generate an
object of this size, write it to disk, read it back in, and
generate the same error? In other words, does something like

set.seed(1001)
d <- data.frame(label=rep(LETTERS[1:11],1e6),
                values=matrix(rep(1.0,11*17*1e6),ncol=17))
write.table(d,file="big.txt")
read.table("big.txt")

do the same thing? Reducing it to this kind of reproducible example will make
it possible for others to debug it without needing to gain
access to your huge file ...
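The generate-write-read test above can be wrapped in a function with the size as a parameter, so the same check runs first at a few thousand rows and is then scaled up towards the roughly 11 million rows of the real file. A sketch, assuming a scratch file from tempfile() is acceptable; roundtrip_check is a hypothetical name, and a TRUE result at small sizes says nothing about whether the crash appears at full size.

```r
# Hypothetical helper: build a data frame shaped like the example above
# (one label column plus 17 numeric columns, 11 * reps rows), write it
# with write.table(), read it back, and check that the shape survived.
roundtrip_check <- function(reps = 1000, ncol_num = 17) {
  n <- 11 * reps
  d <- data.frame(label  = rep(LETTERS[1:11], reps),
                  values = matrix(1.0, nrow = n, ncol = ncol_num))
  path <- tempfile(fileext = ".txt")
  on.exit(unlink(path))
  write.table(d, file = path)               # writes row names and a header
  back <- read.table(path, header = TRUE)   # first column becomes row names
  identical(dim(back), dim(d)) &&
    isTRUE(all.equal(as.numeric(back$values.1), d$values.1))
}

# Start small, then increase reps towards 1e6 (about 11 million rows):
# roundtrip_check(1000); roundtrip_check(1e5); roundtrip_check(1e6)
```

Passing header = TRUE explicitly matches what read.table() infers on its own for write.table() output, where the header line has one field fewer than the data lines.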
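The first workaround, querying the database directly instead of exporting a tab-delimited text file and parsing it again, might look roughly like the sketch below. It assumes the DBI package plus a driver. The commented DB2 call assumes the odbc package and an ODBC data source named "linnedb" configured on the machine, none of which is described in the thread, and LINNEDB_PWD is a hypothetical environment variable. The helper query_to_df is also hypothetical and driver-agnostic, so it can be tried against any DBI backend first.

```r
library(DBI)

# Hypothetical helper: open a connection with `connect`, run one query,
# return the result as a data frame, and always close the connection.
query_to_df <- function(connect, sql) {
  con <- connect()
  on.exit(dbDisconnect(con), add = TRUE)
  dbGetQuery(con, sql)
}

# Example for a DB2 database like 'linnedb' (not run; assumes an ODBC DSN).
# Only the three column names visible in the thread are listed.
# fil2 <- query_to_df(
#   function() dbConnect(odbc::odbc(), dsn = "linnedb", uid = "goran",
#                        pwd = Sys.getenv("LINNEDB_PWD")),
#   "select linneid, fodelsear, kon from u09021.fil2")
```

Taking a connection function rather than an open connection keeps the open/close pairing inside one place, so a failed query does not leave a dangling connection.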
|
On Fri, Aug 26, 2011 at 11:55 PM, Ben Bolker <[hidden email]> wrote:
> There seem to be two issues here:
>
> 1. what can the original poster (OP) do to work around this problem?
> (e.g. get the data into a relational data base and import it from
> there; use something from the High Performance task view such as
> ff or data.table ...)

Interestingly, the text file was created by a selection from an SQL database. I have access to 'db2' on an Ubuntu machine; I run, at the bash prompt,

$ db2 < file2.sql

where file2.sql contains

connect to linnedb user goran using xxxxxxxxxxx
export to '/home/goran/ALC/SQL/fil2_s.txt' of del modified by coldelX09 select linneid, fodelsear, kon, ....... from u09021.fil2
connect reset

How do I get a direct connection between R and the database 'linnedb'?

> 2. reporting a bug -- according to the R FAQ, any low-level
> (segmentation-fault-type) crash of R when one is not messing
> around with dynamically loaded code constitutes a bug. Unfortunately,
> debugging problems like this is a huge pain in the butt.
>
> Goran, can you randomly or systematically generate an
> object of this size, write it to disk, read it back in, and
> generate the same error? In other words, does something like
>
> set.seed(1001)
> d <- data.frame(label=rep(LETTERS[1:11],1e6),
>                 values=matrix(rep(1.0,11*17*1e6),ncol=17))
> write.table(d,file="big.txt")
> read.table("big.txt")
>
> do the same thing?

No, but I get new errors:

> ss <- read.table("big.txt")
Error in read.table("big.txt") : duplicate 'row.names' are not allowed

(there are no duplicates). I tried to add an item to the first line and

> ss <- read.table("big.txt", header = TRUE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 10610008 did not have 19 elements

which is wrong; that line has 19 elements.

Göran

> Reducing it to this kind of reproducible example will make
> it possible for others to debug it without needing to gain
> access to your huge file ...

-- 
Göran Broström
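To check a claim like "that line has 19 elements" independently of read.table(), base R's count.fields() reports how many fields scan() would find on each line. The sketch below flags lines whose count differs from the most common count; lines with unbalanced quotes come back as NA from count.fields() and are flagged too. odd_lines is a hypothetical name. The indices equal file line numbers only when the file has no blank lines, because count.fields() skips them by default.

```r
# Hypothetical helper: return the positions of lines whose field count
# differs from the most common field count (or could not be counted).
odd_lines <- function(path, sep = "") {
  n <- count.fields(path, sep = sep)        # one entry per non-blank line
  expected <- as.integer(names(which.max(table(n))))
  which(is.na(n) | n != expected)
}

# For an unmodified write.table() file such as big.txt, line 1 (the header,
# which has no field for the row names) is expected to be flagged.
# odd_lines("big.txt")
```

Because count.fields() uses the same quote handling as scan(), a line it counts as 19 fields that scan() still rejects would point at read.table() itself rather than at the file.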
|
In reply to this post by Scott
On Fri, Aug 26, 2011 at 9:41 PM, Scott <[hidden email]> wrote:
> It does look like you've got a memory issue. perhaps using
> as.is=TRUE, and/or stringsAsFactors=FALSE will help as optional arguments
> to read.table
>
> if you don't specify these sorts of things, R can have to look through the
> file and figure out which columns are characters/factors etc and so the
> larger files cause more of a headache for R I guess. Hopefully someone
> else can comment further on this? I'd try toggling TRUE/FALSE for as.is and
> stringsAsFactors.
>
> do you have other objects loaded in memory as well? this file by itself
> might not be the problem - but it's a cumulative issue.
> have you checked the file structure in any other manner?
> how large (Mb/kb) is the file that you're trying to read?
> if you just read in parts of the file, is it okay?
> read.table(filename,header=FALSE,sep="\t",nrows=100)
> read.table(filename,header=FALSE,sep="\t",skip=20000,nrows=100)

Today, after a night's sleep, there are no segfaults! (The computer also slept; I turned it off.) So what is going on? Maybe I shouldn't bother, but I installed the latest patched version yesterday, immediately tried to read the file with a segfault as the result, turned the machine off and on, and now there are no problems. Do we need to reboot after a new install (note, this is not Windows)?

Göran

> --
> View this message in context: http://r.789695.n4.nabble.com/read-table-segfaults-tp3771793p3771817.html
> Sent from the R devel mailing list archive at Nabble.com.

-- 
Göran Broström
