Tab Separated File Reading Error


Dario Strbenac-2
Hello,

I have a seemingly simple problem: a tab-delimited file can't be read in.

> annoTranscripts <- read.table("matched.txt", sep = '\t', stringsAsFactors = FALSE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 5933 did not have 12 elements

However, all lines do have 12 columns.

> lines <- readLines("matched.txt")
> tabsPosns <- gregexpr("\t", lines)
> table(sapply(tabsPosns, length))

    11
367274

> system("wc -l matched.txt")
367274 matched.txt
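
A note on the check above: counting tabs treats every tab as a separator, whereas read.table (through scan) also applies its quote and comment.char rules, so the two counts need not agree. The following is only a sketch on a made-up four-line file; the file contents and the apostrophes in them are assumptions for illustration and are not taken from matched.txt. It compares the raw tab count with count.fields from utils, which follows the same quoting rules as read.table.

```r
# Hypothetical stand-in for matched.txt: 4 lines, 3 tab-separated fields each.
# The apostrophes on lines 2 and 4 are assumed for illustration only.
toy <- c("id1\tplain description\t10",
         "id2\tprotein 5' region\t20",
         "id3\tanother description\t30",
         "id4\tprotein 3' end\t40")
path <- tempfile(fileext = ".txt")
writeLines(toy, path)

# Raw tab count per line, as in the check above (every line has 2 tabs here)
tabs_per_line <- vapply(gregexpr("\t", toy, fixed = TRUE), length, integer(1))
print(table(tabs_per_line))

# Field count with quoting disabled: only the tabs matter
n_noquote <- count.fields(path, sep = "\t", quote = "")
print(n_noquote)

# Field count with the default quote set "\"'": an apostrophe opens a quoted
# string, so the result no longer has to match the tab counts
n_default <- count.fields(path, sep = "\t")
print(n_default)
```

If the two count.fields results differ on the real file, quote handling is a likely suspect even though every line has 11 tabs.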

You can obtain the file from https://dl.dropboxusercontent.com/u/37992150/matched.txt

The line does not contain comment or quote characters. What can you suggest?

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8  
 [7] LC_PAPER=C                 LC_NAME=C                
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base    

loaded via a namespace (and not attached):
[1] tools_3.0.1

--------------------------------------
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Tab Separated File Reading Error

William Dunlap
> > annoTranscripts <- read.table("matched.txt", sep = '\t', stringsAsFactors = FALSE)
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>   line 5933 did not have 12 elements
>
> However, all lines do have 12 columns.
>
> > lines <- readLines("matched.txt")
> ...[many omitted lines]...
> The line does not contain comment or quote characters. What can you suggest ?

I suggest looking at the lines preceding the one where the error was found, with both
print and cat:
    print(lines[5933 - (10:0)])
    cat(lines[5933 - (10:0)], sep="\n")

If things are not obvious after looking at them, see if read.table can read just those lines:
    read.table(text=lines[5933 - (10:0)], sep="\t", stringsAsFactors=FALSE)
If it can, try backing up more than 10 lines.
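
The inspection suggested above can also be partly automated. This is a minimal sketch, assuming a small made-up `lines` vector in place of the real file; the variable names `bad`, `back`, `idx` and `flagged` are hypothetical, and the apostrophe in the data is an assumption, not something confirmed for matched.txt. It takes the window of lines before the reported one and flags any that contain a character from read.table's default quote set or its default comment character.

```r
# Hypothetical stand-in for the 'lines' vector read with readLines();
# the apostrophe on line 2 is an assumption for illustration.
lines <- c("id1\tplain description\t10",
           "id2\tprotein 5' region\t20",
           "id3\tanother description\t30",
           "id4\tyet another description\t40")
bad  <- 4L    # line number reported in the error message (5933 in the thread)
back <- 10L   # how many preceding lines to inspect

# Clamp the window so it never starts before line 1
idx <- seq(max(1L, bad - back), bad)

# Flag lines in the window containing a character from read.table's default
# quote set ("\"'") or its default comment character ("#")
flagged <- idx[grepl("[\"'#]", lines[idx])]
print(flagged)

# Show the flagged lines with escapes visible, as print() does
print(lines[flagged])
```

If nothing is flagged, widening `back` corresponds to "backing up more than 10 lines".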

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



Re: Tab Separated File Reading Error

arun kirshna
In reply to this post by Dario Strbenac-2
Hi,
Try:
> annoTranscripts <- read.csv("matched.txt", sep = '\t', stringsAsFactors = FALSE, quote = "", header = FALSE)
> str(annoTranscripts)
'data.frame':    367274 obs. of  12 variables:
 $ V1 : chr  "comp103529_c0_seq1" "comp129123_c0_seq1" "comp129123_c0_seq1" "comp129124_c0_seq1" ...
 $ V2 : chr  "XM_003723822" "XM_778057" "EU116908" "XM_786928" ...
 $ V3 : chr  "PREDICTED: Strongylocentrotus purpuratus neuromedin-U receptor 2-like (LOC100888633), mRNA" "PREDICTED: Strongylocentrotus purpuratus 60S ribosomal protein L30-like (LOC577852), mRNA" "Barentsia elongata putative ribosomal protein L30 mRNA, complete cds" "PREDICTED: Strongylocentrotus purpuratus 60S ribosomal protein L29-1-like (LOC587182), mRNA" ...
 $ V4 : int  91 392 69 149 149 451 399 203 193 185 ...
 $ V5 : int  136 479 203 209 209 541 463 451 456 472 ...
 $ V6 : int  15 16 40 20 20 24 20 71 83 85 ...
 $ V7 : int  0 11 4 0 0 5 1 10 4 9 ...
 $ V8 : num  2e-38 0e+00 6e-26 2e-70 2e-70 ...
 $ V9 : int  1 22 210 135 135 131 189 205 196 185 ...
 $ V10: int  136 499 410 343 343 669 650 650 649 653 ...
 $ V11: int  576 159 27 1 1 1 21 23 140 22 ...
 $ V12: int  441 627 227 209 209 538 483 468 593 487 ...
> dim(annoTranscripts)
[1] 367274     12
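
A plausible reading of why this call succeeds, which the thread itself does not spell out: read.table defaults to quote = "\"'", so an apostrophe inside a description opens a quoted string, while quote = "" switches quote handling off. The sketch below shows the same idea with read.table on a made-up file; the file contents and the apostrophes are assumptions for illustration, not taken from matched.txt.

```r
# Hypothetical stand-in for matched.txt; the apostrophes are assumed
# for illustration only.
toy <- c("id1\tplain description\t10",
         "id2\tprotein 5' region\t20",
         "id3\tanother description\t30",
         "id4\tprotein 3' end\t40")
path <- tempfile(fileext = ".txt")
writeLines(toy, path)

# quote = "" turns quote handling off; comment.char = "" does the same for '#'
# (read.csv already defaults to comment.char = "", read.table does not)
dat <- read.table(path, sep = "\t", quote = "", comment.char = "",
                  stringsAsFactors = FALSE)
str(dat)
print(dim(dat))
```

With quoting off, each apostrophe is kept as an ordinary character in its field and every line yields one row.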
A.K.



