reading time series csv file with read.zoo issues, then align time stamps

Henry
Goal: get time series data interpolated onto desired time stamps.
I have two or more data sets whose time stamps are spaced anywhere from 5 minutes to 3-5 hours apart.
I want to put all the data on common time stamps, e.g. at "00:05:00" (5-minute) intervals.

I asked Gabor before and got some very good code (zoo's aggregate, na.spline and na.approx), but I'm having trouble reading the CSV file in and converting it to a zoo object so that I can get these functions working again. Here is what Gabor sent last time.

_____________________start of what Gabor sent ______________________
If you are using zoo then the zoo FAQ discusses grids
   http://cran.r-project.org/web/packages/zoo/index.html
and the other 4 vignettes (pdf documents) and reference manual on that
page discuss more.

zoo does not supply its own time classes except where no suitable class
exists elsewhere. Its design is completely independent of the time
class, and it works with any time class that supports certain methods
(which includes all the popular ones). See R News 4/1 for more on date
and time classes.

Here is some code:

Lines <- "10/11/2011 23:30:01     432.22
10/11/2011 23:31:17     432.32
10/11/2011 23:35:00     432.32
10/11/2011 23:36:18     432.22
10/11/2011 23:37:18     432.72
10/11/2011 23:39:19     432.23
10/11/2011 23:40:02     432.23
10/11/2011 23:45:00     432.23
10/11/2011 23:45:20     429.75
10/11/2011 23:46:20     429.65
10/11/2011 23:50:00     429.65
10/11/2011 23:51:22     429.75
10/11/2011 23:55:01     429.75
10/11/2011 23:56:23     429.55
10/12/2011 0:00:07      429.55
10/12/2011 0:01:24      429.95
10/12/2011 0:05:00      429.95
10/12/2011 0:06:25      429.85
10/12/2011 0:10:00      429.85
10/12/2011 0:11:26      428.85
10/12/2011 0:15:00      428.85
10/12/2011 0:20:03      428.85
10/12/2011 0:21:29      428.75
10/12/2011 0:25:01      428.75
10/12/2011 0:30:01      428.75
10/12/2011 0:31:31      428.75"

library(zoo)
library(chron)

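# format of the date and time: month/day/4-digit year hour:minute:second
fmt <- "%m/%d/%Y %H:%M:%S"
# with index = 1:2, read.zoo passes column 1 (the date) as d and column 2 (the time) as t
toChron <- function(d, t) as.chron(paste(d, t), format = fmt)

# whitespace-separated text: columns 1:2 form the time index, column 3 holds the data
z <- read.zoo(text = Lines, index = 1:2, FUN = toChron)

# 5 minute aggregates: truncate each time to its 5-minute bin, average within bins
m5 <- times("00:05:00")
ag5 <- aggregate(z, trunc(time(z), m5), mean)

# 5 minute grid g from the truncated start time to the end time
g <- seq(trunc(start(z), m5), end(z), by = m5)

# 5 minute spline fit
na.spline(z, xout = g)

# 5 minute linear approx
na.approx(z, xout = g)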
________________end of what Gabor sent_________________
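One way to see what na.approx(z, xout = g) computes: at each grid time it returns the straight-line interpolation between the neighbouring observations, i.e. the same numbers base R's approx() gives. The sketch below checks this on three of the sea-temperature readings shown further down. It assumes POSIXct in UTC instead of chron, purely to keep it self-contained, and starts the grid at the first 5-minute mark after the first reading so that no grid point lies outside the data.

```r
library(zoo)

fmt <- "%m/%d/%Y %H:%M:%S"
# Three consecutive sea-temperature readings (4 hours apart)
stamps <- c("12/31/2011 16:44:06", "12/31/2011 20:44:06", "01/01/2012 00:44:06")
z <- zoo(c(52, 53, 53), as.POSIXct(stamps, format = fmt, tz = "UTC"))

# 5-minute grid strictly inside the observed range
m5 <- 300                                          # seconds
first <- ceiling(as.numeric(start(z)) / m5) * m5   # first 5-minute mark after the start
g <- as.POSIXct(seq(first, as.numeric(end(z)), by = m5),
                origin = "1970-01-01", tz = "UTC")

zi <- na.approx(z, xout = g)                       # zoo series indexed by g

# The same values from base R's linear interpolator
manual <- approx(as.numeric(index(z)), coredata(z), xout = as.numeric(g))$y
all.equal(as.numeric(coredata(zi)), manual)        # TRUE
```

na.spline(z, xout = g) is used the same way but fits a cubic spline through the readings instead of straight lines.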

My CSV data looks like this (when I look at the file in Notepad++ I can see the commas):


TimeStamp Sea_Temperature_F
12/31/2011 13:24:00 52
12/31/2011 16:44:06 52
12/31/2011 20:44:06 53
01/01/2012 00:44:06 53
01/01/2012 04:44:06 53
01/01/2012 08:44:07 54
01/01/2012 12:26:00 54
01/01/2012 12:44:07 53
01/01/2012 16:44:07 53
01/01/2012 20:44:06 54
01/02/2012 00:44:09 54
01/02/2012 04:44:06 55
01/02/2012 08:44:07 55
01/02/2012 12:44:06 56
01/02/2012 13:04:00 56
01/02/2012 16:44:07 57
01/02/2012 20:44:07 58
01/03/2012 00:44:07 58
01/03/2012 04:44:06 59
01/03/2012 08:44:06 59
01/03/2012 10:48:00 59
01/03/2012 12:44:06 58
01/03/2012 16:44:06 58
01/03/2012 20:44:07 59
01/04/2012 00:44:06 59
01/04/2012 04:44:07 58
01/04/2012 08:44:07 58
01/04/2012 12:44:07 57
01/04/2012 15:30:00 57
01/04/2012 16:44:07 57
01/04/2012 20:44:06 57
01/05/2012 00:44:06 57


The R code I'm trying to get working is below (I'm trying to follow the code Gabor provided, but I'm too embarrassed to ask him directly again):

fmt <- "%M/%D/%Y %H:%M:%S"
toChron <- function(d, t) as.chron(paste(d, t), format = fmt)
seatemp <- read.zoo ("SampleSeaTempData-2.csv", sep=",", header=TRUE, FUN=toChron)

I get this error:

> fmt <- "%M/%D/%Y %H:%M:%S"
> toChron <- function(d, t) as.chron(paste(d, t), format = fmt)
> seatemp <- read.zoo ("SampleSeaTempData-2.csv", sep=",", header=TRUE, FUN=toChron)
Error in paste(d, t) : argument "t" is missing, with no default
>
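Two separate problems seem to be at work here. First, read.zoo uses a single index column by default (index.column = 1), so FUN is called with one argument and a two-argument toChron(d, t) never receives t. Second, in strptime-style format strings %M means minutes and %D is shorthand for %m/%d/%y, so "%M/%D/%Y %H:%M:%S" cannot match 12/31/2011; month/day/year is "%m/%d/%Y %H:%M:%S". A base-R sketch of both points, using POSIXct instead of chron; toPOSIX2 and toPOSIX1 are hypothetical names:

```r
fmt_bad  <- "%M/%D/%Y %H:%M:%S"   # as in the post: %M = minutes, %D = %m/%d/%y
fmt_good <- "%m/%d/%Y %H:%M:%S"   # month/day/4-digit year, 24-hour time

# Two-argument converter, same shape as toChron(d, t)
toPOSIX2 <- function(d, t) as.POSIXct(paste(d, t), format = fmt_good, tz = "UTC")

# Calling it with a single argument reproduces the error from the post
msg <- tryCatch(toPOSIX2("12/31/2011 13:24:00"), error = conditionMessage)
print(msg)   # argument "t" is missing, with no default

# The wrong format letters silently give NA rather than an error
bad  <- as.POSIXct("12/31/2011 13:24:00", format = fmt_bad,  tz = "UTC")
good <- as.POSIXct("12/31/2011 13:24:00", format = fmt_good, tz = "UTC")
is.na(bad)   # TRUE

# One-argument converter suited to a single date-time column
toPOSIX1 <- function(x) as.POSIXct(as.character(x), format = fmt_good, tz = "UTC")
toPOSIX1("12/31/2011 13:24:00")
```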

If I take "FUN=toChron" out, I get this error (there are 542 rows of data):

> seatemp <- read.zoo ("SampleSeaTempData-2.csv", sep=",", header=TRUE)
Error in read.zoo("SampleSeaTempData-2.csv", sep = ",", header = TRUE) :
  index has 542 bad entries at data rows: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 ...
>
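Without FUN (or a format), read.zoo has no way to know that 12/31/2011 13:24:00 is month/day/year plus a time, which is presumably why every row is reported as a bad entry. Since the header has just two names, TimeStamp and Sea_Temperature_F, a plausible assumption is that each row reads 12/31/2011 13:24:00,52, with date and time in one field. In that case the default single index column is right and FUN needs only one argument. A sketch with the first rows re-typed with commas, where text = stands in for the file name:

```r
library(zoo)

# Assumed file layout: one timestamp field, one value field
csv <- "TimeStamp,Sea_Temperature_F
12/31/2011 13:24:00,52
12/31/2011 16:44:06,52
12/31/2011 20:44:06,53
01/01/2012 00:44:06,53"

fmt <- "%m/%d/%Y %H:%M:%S"   # month/day/4-digit year, then 24-hour time

# One index column (the default), so FUN takes a single argument
seatemp <- read.zoo(text = csv, sep = ",", header = TRUE,
                    FUN = function(x) as.POSIXct(as.character(x), format = fmt, tz = "UTC"))
seatemp
```

For the real data the first argument would be "SampleSeaTempData-2.csv" instead of text = csv. read.zoo also has format and tz arguments that can do this conversion without a hand-written FUN; see ?read.zoo.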

I guess there is too much going on that I don't understand:
- What does the toChron line do? How are "d" and "t" defined?
- Why does Gabor's read.zoo line have "index=1:2"?
- Why does Gabor's code have "FUN=toChron"?
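On these three questions: Gabor's sample is whitespace separated with no header, so each line splits into three columns (date, time, value). index = 1:2 (a partial match for read.zoo's index.column argument) says the time index lives in columns 1 and 2, and FUN is the function that turns those columns into times. With two index columns, each column is passed to FUN as a separate argument, which is where d (the dates) and t (the times) come from. The base-R sketch below imitates those steps by hand to illustrate the idea; it is not read.zoo's actual source, and toPOSIX is a hypothetical stand-in for toChron:

```r
# Whitespace separated, no header: read.table gives V1 (date), V2 (time), V3 (value)
Lines <- "10/11/2011 23:30:01     432.22
10/11/2011 23:31:17     432.32
10/11/2011 23:35:00     432.32"
df <- read.table(text = Lines, header = FALSE, stringsAsFactors = FALSE)

fmt <- "%m/%d/%Y %H:%M:%S"
toPOSIX <- function(d, t) as.POSIXct(paste(d, t), format = fmt, tz = "UTC")

# index = 1:2: columns 1 and 2 are handed to FUN as its 1st and 2nd arguments
idx  <- do.call(toPOSIX, unname(as.list(df[1:2])))
vals <- df[[3]]   # the remaining column is the data

data.frame(time = idx, value = vals)
```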


The idea is to convert two or more data streams into CSV files with identical time stamps and interpolated values, and then (I guess) cbind the data into one data frame so I can plot them together.
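For putting several streams on one grid: once each series is a zoo object with a POSIXct index, na.approx(..., xout = g) places each on a shared grid g, and merge() lines them up column by column, which covers the cbind step. A sketch with two hypothetical streams; the values are made up for illustration and UTC is assumed:

```r
library(zoo)

utc <- function(s) as.POSIXct(s, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Two hypothetical streams with different, irregular sampling (made-up values)
a <- zoo(c(10, 12, 11), utc(c("2012-01-01 00:01:10", "2012-01-01 00:07:40",
                              "2012-01-01 00:19:05")))
b <- zoo(c(5, 6),       utc(c("2012-01-01 00:00:30", "2012-01-01 00:21:00")))

# Shared 5-minute grid covering only the span where both streams have data
m5 <- 300
lo <- max(as.numeric(start(a)), as.numeric(start(b)))
hi <- min(as.numeric(end(a)), as.numeric(end(b)))
g  <- as.POSIXct(seq(ceiling(lo / m5) * m5, hi, by = m5),
                 origin = "1970-01-01", tz = "UTC")

# Interpolate each stream onto g, then line them up as columns a and b
both <- merge(a = na.approx(a, xout = g), b = na.approx(b, xout = g))
both
```

plot(both, plot.type = "single") draws both columns on one panel, and write.zoo(both, file = "aligned.csv", sep = ",") would write the aligned data back out as CSV (aligned.csv is a placeholder name).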

I've read posts about problems reading CSV files into zoo, e.g. making sure the seconds (":00") appear in the CSV file so that there are no duplicate index entries.
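On the seconds issue: if the timestamps were saved without seconds (12/31/2011 16:44 rather than 12/31/2011 16:44:00), a format ending in :%S no longer matches and the conversion quietly returns NA. A base-R sketch of the effect and of one workaround; parse_stamp is a hypothetical helper that tries the format with seconds first and falls back to one without:

```r
fmt_sec   <- "%m/%d/%Y %H:%M:%S"
fmt_nosec <- "%m/%d/%Y %H:%M"

x <- c("12/31/2011 13:24:00", "12/31/2011 16:44")   # one with, one without seconds

as.POSIXct(x, format = fmt_sec, tz = "UTC")   # second element is NA

# Hypothetical helper: try the full format, fall back to minutes only
parse_stamp <- function(x) {
  out  <- as.POSIXct(x, format = fmt_sec, tz = "UTC")
  miss <- is.na(out)
  out[miss] <- as.POSIXct(x[miss], format = fmt_nosec, tz = "UTC")
  out
}
parse_stamp(x)
```

When rows genuinely share a timestamp, read.zoo's aggregate argument (e.g. aggregate = mean) can average them instead of leaving duplicate index entries.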

Maybe it would be easier/cleaner to read the CSV file into a regular R data frame and then convert it to a zoo object?
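Reading into a data frame first and converting afterwards does work, and it makes each step easy to inspect. A sketch, again assuming the TimeStamp,Sea_Temperature_F layout with date and time in one field, using three rows from the sample above:

```r
library(zoo)

csv <- "TimeStamp,Sea_Temperature_F
01/02/2012 12:44:06,56
01/02/2012 13:04:00,56
01/02/2012 16:44:07,57"

# For the real data: read.csv("SampleSeaTempData-2.csv", stringsAsFactors = FALSE)
df <- read.csv(text = csv, stringsAsFactors = FALSE)
str(df)

# Convert the timestamp column, check for failures, then build the zoo object
tt <- as.POSIXct(df$TimeStamp, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
stopifnot(!anyNA(tt))   # an NA here means the format does not match the file
seatemp <- zoo(df$Sea_Temperature_F, order.by = tt)
seatemp
```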

In my analysis and plotting I use POSIXlt for time.
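Since POSIXlt is the class used here: it is generally safer to convert POSIXlt times with as.POSIXct() before using them as a zoo index, because POSIXct is a plain numeric vector underneath. Gabor's chron steps then translate roughly as below, with 5-minute bins computed by arithmetic on seconds since 1970-01-01. A sketch assuming a UTC clock; bin is a hypothetical helper and the three readings come from the sample above:

```r
library(zoo)

fmt <- "%m/%d/%Y %H:%M:%S"
# Three readings from the sample, parsed as POSIXlt
lt <- strptime(c("01/03/2012 08:44:06", "01/03/2012 10:48:00", "01/03/2012 12:44:06"),
               format = fmt, tz = "UTC")
z <- zoo(c(59, 59, 58), order.by = as.POSIXct(lt))   # POSIXlt -> POSIXct for the index

m5 <- 300   # 5 minutes, in seconds

# Hypothetical helper: floor times to the start of their 5-minute bin
# (plays the role of trunc(time(z), m5) in the chron version)
bin <- function(t) as.POSIXct(floor(as.numeric(t) / m5) * m5,
                              origin = "1970-01-01", tz = "UTC")

# 5 minute aggregates
ag5 <- aggregate(z, bin(time(z)), mean)

# 5 minute grid, then spline and linear fits on it
g  <- seq(bin(start(z)), end(z), by = m5)
sp <- na.spline(z, xout = g)
la <- na.approx(z, xout = g)   # grid points before the first reading are NA and get dropped
```

As in the chron version, the grid starts before the first reading, so na.approx drops that leading point while na.spline may extrapolate it.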


Help appreciated.  Thanks.





Re: reading time series csv file with read.zoo issues, then align time stamps

Gabor Grothendieck
index = 1:2 is missing.
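Spelled out, and assuming (this is not confirmed in the thread) that the file keeps the date and the time in two separate comma-separated fields under a three-name header, the call with index = 1:2 looks like the sketch below. The header names Date and Time are hypothetical, a POSIXct converter stands in for toChron so that only zoo is needed, and the format uses lower-case %m/%d for month and day:

```r
library(zoo)

# Hypothetical layout with date and time in separate fields
csv <- "Date,Time,Sea_Temperature_F
12/31/2011,13:24:00,52
12/31/2011,16:44:06,52
12/31/2011,20:44:06,53"

fmt <- "%m/%d/%Y %H:%M:%S"
toPOSIX <- function(d, t) as.POSIXct(paste(d, t), format = fmt, tz = "UTC")

# index = 1:2: the Date column arrives as d, the Time column as t
seatemp <- read.zoo(text = csv, sep = ",", header = TRUE, index = 1:2, FUN = toPOSIX)
seatemp
```

With Gabor's toChron and fmt <- "%m/%d/%Y %H:%M:%S", the same read.zoo call should give a chron index instead.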

On Sun, Jun 15, 2014 at 2:39 PM, Henry <[hidden email]> wrote:

> --
> View this message in context: http://r.789695.n4.nabble.com/reading-time-series-csv-file-with-read-zoo-issues-then-align-time-stamps-tp4692157.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
