uneven vector length issue with read.zoo?


knavero
I truncated and simplified my code and the read in data that I'm working with to isolate the issue. Here is the read in data and R script respectively:

test.csv

http://pastebin.com/rCdaDqPm

Here is the terminal/R shell output that I hope the above replicates on your screen:
> source("elecLoad.r", echo = TRUE)

> #Load packages
> library(zoo)

> library(chron)

> #Initial assignments for format (fmt), timezone (TZ), and user
> #defined chron function (chr)
> fmt = "%m/%d/%y %I:%M %p"

> TZ = "PDT"

> chr = function(x) as.chron(x, fmt)

> #Read in data as zoo object using relevant arguments in read.zoo()
> #for details of arguments, see Kevin Navero or see ?read.zoo
> #and ?read.table .... [TRUNCATED]
Error in read.zoo("http://dl.dropbox.com/u/41922443/test.csv", skip = 1,  :
  index has bad entries at data rows: 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28

I was hoping that the "NULL" in colClasses() would've taken care of this uneven vector length issue, however, that was not the case. Any ideas? Thanks in advance. Sorry if my post didn't follow the forum rules exactly. I tried to make small scale reproducible code and what not. I'm still a bit of a noob here and there.


Re: uneven vector length issue with read.zoo?

knavero
So far I see two options: (1) use the nrows argument to specify the maximum number of rows to read in, or (2) go into Excel and fill the gaps with a bunch of NA's. Both are inefficient in that they're not so "automated". For case (1), I have to wait until an error pops up each time and deal with each one individually, taking into account the skip and header args, and for case (2), I'm not even using R to do the dirty work anymore... Anyway, I'm going to keep going through the R documentation for ?read.table and ?read.zoo to see if I find anything else.

Re: uneven vector length issue with read.zoo?

knavero
Make that three options, actually. In case (3) I would have to isolate each category on the spreadsheet into its own CSV file using Excel. Fun stuff...

Re: uneven vector length issue with read.zoo?

knavero
Case (4) - use the fill argument in ?read.table....this looks useful...guess I answered my own question...going to delete this thread now...

Re: uneven vector length issue with read.zoo?

knavero
Actually case (4) didn't work. The issue is also with the index: "fill" only seems to work with the columns that contain the data associated with the index, not with the index itself. Dang... yeah, I need help here.

Re: uneven vector length issue with read.zoo?

knavero
blank.lines.skip is not working either...

Re: uneven vector length issue with read.zoo?

knavero
case (6) - regress back to read.table apparently....

Re: uneven vector length issue with read.zoo?

Rui Barradas
Hello,

knavero wrote:
> case (6) - regress back to read.table apparently....
Or to readLines.


tmp <- readLines("http://dl.dropbox.com/u/41922443/test.csv")
# Why doesn't it work?
sapply(strsplit(tmp, ","), length)
# Don't argue with computers, they don't listen.
tmp <- tmp[-1]
tmp <- strsplit(tmp, ",")
tmp <- do.call(rbind, tmp)
nms <- tmp[1, ]
tmp <- tmp[-1, ]
tmp <- data.frame(tmp, stringsAsFactors=FALSE)
colnames(tmp) <- nms
# Now see what we've got
str(tmp) # Messy: one col without a name, dates and nums are chars, etc.


Hope this helps,

Rui Barradas

Re: uneven vector length issue with read.zoo?

knavero
So with case (6) here's the general structure of what I have:

chw = read.table("crac.csv", skip = 1, header = TRUE,
   colClasses = rep(c("NULL", NA, "numeric", "NULL"),
      c(3, 1, 1, 24)),
   sep = ",")
chw$Time.1 = as.POSIXct(chw$Time.1, format = fmt, tz = TZ)
chw = na.omit(chw)
chw = read.zoo(chw, header = TRUE,
   colClasses = rep(c(NA, "numeric"), c(1, 1)),
   FUN = chr, aggregate = tail1)

You don't have to try this, but the main point is that

read.table -> POSIXct -> na.omit -> read.zoo and chron

I guess this alternative solution is adequate along with using readLines. Initially I was hoping just a simple read.zoo would do the trick. The catch is that I need the index/timestamp column to be in chron format for an easy na.approx function to deal with things. Thank you for the readLines suggestion Rui. Much appreciated.

Re: uneven vector length issue with read.zoo?

Gabor Grothendieck
In reply to this post by knavero
On Wed, May 2, 2012 at 3:55 PM, knavero <[hidden email]> wrote:

> I truncated and simplified my code and the read in data that I'm working with
> to isolate the issue. Here is the read in data and R script respectively:
>
> http://r.789695.n4.nabble.com/file/n4604287/test.csv test.csv
>
> http://pastebin.com/rCdaDqPm
>
> Here is the terminal/R shell output that I hope the above replicates on your
> screen:
>> source("elecLoad.r", echo = TRUE)
>
>> #Load packages
>> library(zoo)
>
>> library(chron)
>
>> #Initial assignments for format (fmt), timezone (TZ), and user
>> #defined chron function (chr)
>> fmt = "%m/%d/%y %I:%M %p"
>
>> TZ = "PDT"
>
>> chr = function(x) as.chron(x, fmt)
>
>> #Read in data as zoo object using relevant arguments in read.zoo()
>> #for details of arguments, see Kevin Navero or see ?read.zoo
>> #and ?read.table .... [TRUNCATED]
> Error in read.zoo("http://dl.dropbox.com/u/41922443/test.csv", skip = 1,  :
>  index has bad entries at data rows: 14 15 16 17 18 19 20 21 22 23 24 25 26
> 27 28
>
> I was hoping that the "NULL" in colClasses() would've taken care of this
> uneven vector length issue, however, that was not the case. Any ideas?
> Thanks in advance. Sorry if my post didn't follow the forum rules exactly. I
> tried to make small scale reproducible code and what not. I'm still a bit of
> a noob here and there.
>

Try this using the same library statements, fmt and chr from your post:

URL <- "http://dl.dropbox.com/u/41922443/test.csv"
DF1 <- read.table(URL, skip = 1, header = TRUE, sep = ",", fill = TRUE,
  as.is = TRUE)
DF2 <- na.omit(DF1[1:2])
z <- read.zoo(DF2, FUN = chr)
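
The same pipeline can be tried without network access. The following is only a minimal sketch: the inline `csv` text is made up to mimic the shape of test.csv (two short columns next to two longer ones), and `na.strings = c("NA", "")` is an added assumption so that blank character fields are read as NA as well.

```r
library(zoo)
library(chron)

fmt <- "%m/%d/%y %I:%M %p"
chr <- function(x) as.chron(x, fmt)

# Made-up stand-in for test.csv: columns 1-2 run out before columns 3-4 do
csv <- "Time,Load,Time.1,Temp
05/01/12 01:00 AM,10.5,05/01/12 01:00 AM,60
05/01/12 02:00 AM,11.0,05/01/12 02:00 AM,61
,,05/01/12 03:00 AM,62
,,05/01/12 04:00 AM,63"

# Step 1: read everything as a plain data frame, blanks become NA
DF1 <- read.table(text = csv, header = TRUE, sep = ",", fill = TRUE,
  as.is = TRUE, na.strings = c("NA", ""))

# Step 2: keep the index and value columns, drop rows containing NA
DF2 <- na.omit(DF1[1:2])

# Step 3: convert to zoo, turning the character index into chron
z <- read.zoo(DF2, FUN = chr)
print(z)
```

Only the two complete rows should survive `na.omit`, so `z` ends up with two observations.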



--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: uneven vector length issue with read.zoo?

knavero
Hey Gabor, just trying to understand this here..sorry for the noob question:

DF1 <- read.table(URL, skip = 1, header = TRUE, sep = ",", fill = TRUE,
  as.is = TRUE)

I'm not too familiar with as.is, although I quickly read the R documentation on it. From my understanding it converts character to factor in terms of atomic vector class/mode... sort of like what colClasses would do. Why is it needed here for this specific case?
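
For reference, a small sketch of what as.is does in read.table: as.is = TRUE keeps character columns as character vectors rather than converting them to factors. Presumably that is why it is used here, so that chr receives plain strings. The inline data is made up, and stringsAsFactors = TRUE is passed explicitly because the default changed to FALSE in R 4.0.0.

```r
# Made-up two-column sample with a character timestamp column
txt <- "Time,Load
05/01/12 01:00 AM,10.5
05/01/12 02:00 AM,11.0"

# Character columns converted to factors (the default before R 4.0.0)
with_factors <- read.table(text = txt, header = TRUE, sep = ",",
  stringsAsFactors = TRUE)

# as.is = TRUE leaves character columns as character
kept_as_is <- read.table(text = txt, header = TRUE, sep = ",", as.is = TRUE)

print(class(with_factors$Time))
print(class(kept_as_is$Time))
```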

Re: uneven vector length issue with read.zoo?

knavero
In reply to this post by Gabor Grothendieck
Thank you for the suggestion Gabor. It's definitely more elegant than what I had above. Instead of going from character representation to POSIXct to chron, it looks at the character representation and goes straight to chron. It's good. However, I do wonder why it still complains of the vector length even though I nulled out the other columns. It's an interesting error to run into. Probably looks at FUN before nulling out the other columns was my theory.

Re: uneven vector length issue with read.zoo?

knavero
"However, I do wonder why it still complains of the vector length even though I nulled out the other columns. It's an interesting error to run into. Probably looks at FUN before nulling out the other columns was my theory. "

Referring to just a straight up read.zoo in this case ^

Re: uneven vector length issue with read.zoo?

Gabor Grothendieck
On Fri, May 4, 2012 at 3:37 PM, knavero <[hidden email]> wrote:
> "However, I do wonder why it still complains of the vector length even though
> I nulled out the other columns. It's an interesting error to run into.
> Probably looks at FUN before nulling out the other columns was my theory. "
>
> Referring to just a straight up read.zoo in this case ^
>

If you are referring to your pastebin code then the actual error that
code similar to it gives is:

> crac <- read.zoo(URL, skip = 1, header = TRUE,
+    colClasses = rep(c(NA, "numeric", "NULL"), c(1, 1, 3)),
+    FUN = chr, sep = ",")
Error in read.zoo(URL, skip = 1, header = TRUE, colClasses = rep(c(NA,  :
  index has bad entries at data rows: 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28

and that is because there are empty values in the index field.


Re: uneven vector length issue with read.zoo?

knavero
Right, but it seems to me that the error being the NA's in the index field are caused by the longer vector lengths of columns 4 and 5. I would think that the EOF in the scanf() (assuming C is used for the source code) would be called where the NA's begin in columns 1 and 2 since columns 3:5 are nulled out. Does this sound like a possible case?

So, if the read in data only contained columns 1 and 2, it wouldn't even look at columns 3:5 and thus, rows 14 and so on wouldn't even be looked at and that would be EOF already - resulting in no error.

Re: uneven vector length issue with read.zoo?

Gabor Grothendieck
On Sun, May 6, 2012 at 3:42 PM, knavero <[hidden email]> wrote:

> Right, but it seems to me that the error being the NA's in the index field
> are caused by the longer vector lengths of columns 4 and 5. I would think
> that the EOF in the scanf() (assuming C is used for the source code) would
> be called where the NA's begin in columns 1 and 2 since columns 3:5 are
> nulled out. Does this sound like a possible case?
>
> So, if the read in data only contained columns 1 and 2, it wouldn't even
> look at columns 3:5 and thus, rows 14 and so on wouldn't even be looked at
> and that would be EOF already - resulting in no error.
>

Don't know what "longer vector lengths" refers to but every line in
your pastebin data has 5 fields -- they don't vary.

> range(count.fields(URL, sep = ","))
[1] 5 5

Furthermore, the error message seems pretty clear.  It's saying that
the index has a bad entry and is even telling you which row or rows it
occurs at.

Here is another smaller example where the missing entry in row 3
triggers the same sort of message:

> read.zoo(text = "1,2\n2,3\n,4\n6,7", sep = ",")
Error in read.zoo(text = "1,2\n2,3\n,4\n6,7", sep = ",") :
  index has bad entry at data row 3
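
A hedged sketch of how the same small example can be inspected and then read in two steps. The variable names are illustrative only. It checks the field count per line, captures the error message instead of stopping, and then drops the row whose index is missing before converting to zoo.

```r
library(zoo)

txt <- "1,2\n2,3\n,4\n6,7"

# Every line has two fields, so the field count does not reveal the problem
con <- textConnection(txt)
n_fields <- count.fields(con, sep = ",")
close(con)

# Reading directly stops with an error; keep only its message
msg <- tryCatch(read.zoo(text = txt, sep = ","),
  error = function(e) conditionMessage(e))

# Two-step alternative: read, drop rows whose index is NA, then convert
DF <- read.table(text = txt, sep = ",")
z <- read.zoo(DF[!is.na(DF[[1]]), ])
print(z)
```

The blank first field on the third line is read as NA in a numeric column, which is what the `is.na` filter removes.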




Re: uneven vector length issue with read.zoo?

knavero
Yeah, I was unclear about what I meant by "uneven vector lengths". I should say "uneven valid vectors" instead, where "valid" refers to (1) a field containing a value that is not NA, for this specific case, and (2) a value that is compatible with the vector class assigned through colClasses etc., and therefore avoids the read.zoo error. I understand and agree that the error is clear. I have no issue with that. My issue is with the need to use read.table and then read.zoo shortly after (this seems inefficient).

I was simply pushing toward the idea that this type of situation could be avoided for future users: if there are uneven valid vectors, there could be a logical argument saying that it's okay to truncate to the shortest valid vector (in this case columns 1 and 2). My raw data consisted of a lot of uneven valid vectors. I expected that, with columns 3:5 nulled out, read.zoo would have no need to read the bad rows in columns 1:2, whose NA's are already outside the valid vector length.

Anyway, this is probably trivial now considering that this problem is already solved haha, and also I don't mean to offend and criticize. I simply see an efficiency opportunity and an opportunity to create more robust source code. Why use read.table with read.zoo if you can just do it all with read.zoo? Do you not agree?
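
To make the idea concrete, here is a rough sketch of such a wrapper. It is hypothetical and not part of zoo: the name `read_zoo_drop_na_index` and its arguments are made up for illustration, and it simply chains read.table and read.zoo behind one call.

```r
library(zoo)

# Hypothetical helper (not part of zoo): read with read.table, drop rows
# whose index column is NA, then hand the remaining rows to read.zoo
read_zoo_drop_na_index <- function(..., index.column = 1, FUN = NULL) {
  DF <- read.table(...)
  keep <- !is.na(DF[[index.column]])
  read.zoo(DF[keep, , drop = FALSE], index.column = index.column, FUN = FUN)
}

# Usage with the small example from earlier in the thread
z <- read_zoo_drop_na_index(text = "1,2\n2,3\n,4\n6,7", sep = ",")
print(z)
```

For a character index, passing `na.strings = c("NA", "")` through `...` would be needed so that blank entries count as NA.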

Re: uneven vector length issue with read.zoo?

knavero
In reply to this post by Gabor Grothendieck
For simplicity's sake though, yes, I understand the issue and the solution, and the solution using read.table, na.omit, and read.zoo is sound. Thanks Gabor! :)