Functional data analysis for unequal length and unequal width time series


Souradeep Chattapadhyay
Dear All,
I apologize if you have already seen this on Stack Overflow. I
have not received any response there, so I am posting for help here.

I have data on 1318 time series. Many of these series are of unequal
length. In addition, the series are not all observed at the same time
points. For example, consider the following four series:

t1 <- c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50, 25.55, 25.67)
V1 <- c(-0.1710, -0.0824, -0.0419, -0.0416, -0.0216, -0.0792, -0.0656,
        -0.0273, -0.0589)
ser1 <- cbind(t1, V1)

t2 <- c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38)
V2 <- c(-0.0280, -0.1980, -0.2556, 0.3131, 0.3231, 0.2264)
ser2 <- cbind(t2, V2)

t3 <- c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50, 25.55, 25.65,
25.88, 25.97, 25.99)
V3 <- c(0.0897, -0.0533, -0.3497, -0.5684, -0.4294, -0.1109, 0.0352,
0.0550, -0.0536, 0.0185, -0.0295, -0.0324)
ser3 <- cbind(t3, V3)

t4 <- c(24.5, 24.67, 24.71, 24.98, 25.17)
V4 <- c(-0.0280, -0.1980, -0.2556, 0.3131, 0.3231)
ser4 <- cbind(t4, V4)

Here t1, t2, t3, t4 are the time points and V1, V2, V3, V4 are the
observations made at those time points. The time points in the
actual data are Julian dates, so they look like these except that they
are much larger decimal figures, such as 2452450.6225.

I am trying to cluster these time series using a functional data
approach, for which I am using the "funFEM" package in R. The examples
provided are for equispaced, equal-length time series, so I am not sure
how to use the package for my data. Initially I tried making all the
time series equal in length to the series with the highest number of
observations (here ser3) by padding the shorter series with NAs.
Following this approach, I made ser2 as

t2_n <- c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38, 25.50, 25.55, 25.65,
25.88, 25.97, 25.99)
V2_na <- c(V2, rep(NA, 6))
ser2_na <- cbind(t2_n, V2_na)

Note that to make t2 equal in length to t3 I took the last 6 time
points from t3. To make V2 equal in length to V3 I added NAs.

Then I created my data matrix (V1_na and V4_na were padded in the same
way as V2_na) as

dat <- rbind(V1_na, V2_na, V3, V4_na)

The code I used was

require(funFEM)
basis <- create.fourier.basis(c(min(t3), max(t3)), nbasis = 25)
fdobj <- smooth.basis(c(min(t3), max(t3)), dat, basis)$fd

Note that the range is constructed using the minimum and maximum time
points of the ser3 series.

res <- funFEM(fdobj, K = 2:9, model = "all", crit = "bic",
              init = "random")

But this gives me an error

Error in svd(X) : infinite or missing values in 'x'.

Can anyone please help me with how to handle this dataset with this
package, or suggest an alternative package?

Sincerely,
Souradeep

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Functional data analysis for unequal length and unequal width time series

Bert Gunter-2
Specialized: Probably need to email the maintainer. See ?maintainer
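
As a small illustration of the pointer above: maintainer() lives in the utils package and returns the contact string from an installed package's DESCRIPTION file. The sketch below uses "stats" only because it ships with every R installation; the variable names pkg and contact are made up for this example, and for the package in this thread you would substitute "funFEM" once it is installed.

```r
# Look up the maintainer contact for an installed package.
# "stats" is used here only because it ships with base R; replace it
# with "funFEM" (assumed to be installed) for the package in question.
pkg <- "stats"
contact <- maintainer(pkg)
contact
```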

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Dec 17, 2018 at 9:27 AM <[hidden email]> wrote:



Re: Functional data analysis for unequal length and unequal width time series

Jeff Newmiller
In reply to this post by Souradeep Chattapadhyay
You will learn something useful if you search for "rolling join". The zoo package can handle this, as can the data.table package (read the vignette).

Your decision to pad with NA at the end was ill-considered... the first point of your first series is between the first two points of your second series... you need to interleave the points somehow.
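
To make the interleaving point concrete, here is a minimal base-R sketch using series 1 and 2 from the original post. It is not the zoo or data.table rolling join itself, only the union-of-time-points alignment that such a join starts from; the names t_all and aligned are invented for this illustration.

```r
# Series 1 and 2 as given in the original post
t1 <- c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50, 25.55, 25.67)
V1 <- c(-0.1710, -0.0824, -0.0419, -0.0416, -0.0216, -0.0792, -0.0656,
        -0.0273, -0.0589)
t2 <- c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38)
V2 <- c(-0.0280, -0.1980, -0.2556, 0.3131, 0.3231, 0.2264)

# Shared, sorted time index: the union of every observed time point
t_all <- sort(unique(c(t1, t2)))

# Put each series on the shared index; times where a series was not
# observed stay NA, so the gaps are interleaved rather than appended
aligned <- cbind(V1 = V1[match(t_all, t1)],
                 V2 = V2[match(t_all, t2)])
rownames(aligned) <- format(t_all, nsmall = 2)
aligned
```

The first two rows show that 24.50 (series 2 only) is followed by 24.51 (series 1 only), which is why padding NAs at the end misaligns the values. The remaining NAs still have to be filled by whichever interpolation rule you choose.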

You will need to decide whether you want to use piecewise linear approximation (as with the base "approx" function) or the more stable last-observation-carried-forward ("locf") or cubic splines or something more exotic like Fourier interpolation to identify the new interpolated "y" values in each series.
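
A rough sketch of how three of these choices differ, using only base R: series 2 from the original post is evaluated at the time points of series 1. The names lin, locf and cub are invented for this illustration, and Fourier interpolation is not shown.

```r
# Series 2 from the original post, evaluated at the times of series 1
t1 <- c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50, 25.55, 25.67)
t2 <- c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38)
V2 <- c(-0.0280, -0.1980, -0.2556, 0.3131, 0.3231, 0.2264)

# Piecewise linear interpolation (NA outside the observed range of t2)
lin <- approx(t2, V2, xout = t1)$y

# Last observation carried forward: a left-continuous step function
locf <- approx(t2, V2, xout = t1, method = "constant", f = 0)$y

# Cubic spline interpolation; note that spline() extrapolates beyond
# the observed range instead of returning NA
cub <- spline(t2, V2, xout = t1)$y

round(cbind(t = t1, linear = lin, locf = locf, spline = cub), 4)
```

Series 2 stops at 25.38, so the linear and LOCF columns are NA for the last three time points, while the spline column holds extrapolated values that should be treated with caution.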

You can avoid the rolling join if you intend to resample the series to have points at regular intervals.  Just apply your preferred interpolation technique with your intended mesh of regular time values to each of your series in turn and then use cbind with the results.
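
The resampling route could look roughly like the sketch below, which uses base R only. It assumes linear interpolation, a 20-point mesh, and a window restricted to the time range that all four series cover; those are illustrative choices, not recommendations. The names series, lo, hi, grid and resampled are invented here.

```r
# The four example series from the original post
series <- list(
  s1 = list(t = c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50, 25.55,
                  25.67),
            v = c(-0.1710, -0.0824, -0.0419, -0.0416, -0.0216, -0.0792,
                  -0.0656, -0.0273, -0.0589)),
  s2 = list(t = c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38),
            v = c(-0.0280, -0.1980, -0.2556, 0.3131, 0.3231, 0.2264)),
  s3 = list(t = c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50, 25.55,
                  25.65, 25.88, 25.97, 25.99),
            v = c(0.0897, -0.0533, -0.3497, -0.5684, -0.4294, -0.1109,
                  0.0352, 0.0550, -0.0536, 0.0185, -0.0295, -0.0324)),
  s4 = list(t = c(24.5, 24.67, 24.71, 24.98, 25.17),
            v = c(-0.0280, -0.1980, -0.2556, 0.3131, 0.3231))
)

# Window covered by every series: latest start to earliest end
lo <- max(sapply(series, function(s) min(s$t)))
hi <- min(sapply(series, function(s) max(s$t)))

# Regular mesh of time values inside that window
grid <- seq(lo, hi, length.out = 20)

# Interpolate each series onto the mesh; sapply() binds the results
# column-wise, giving one row per mesh point and one column per series
resampled <- sapply(series, function(s) approx(s$t, s$v, xout = grid)$y)

dim(resampled)
anyNA(resampled)
```

Because the mesh stays inside the range observed by every series, the result contains no NA values, and missing values are what triggered the svd() error in the original post. The cost is that observations outside the common window are discarded. The matrix has mesh points in rows and series in columns, which, as far as I recall, is the layout that smooth.basis() in the fda package takes for its y argument, with grid as argvals.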

I don't know anything about the package you mention, but getting time series data aligned is a common preprocessing step for many time series analyses.

Oh, and you should probably be familiar with the CRAN Time Series Task View [1].

PS: You should provide a link back to your original posting when moving the conversation to a different venue, in case the discussion there doesn't stay dead.

[1] https://cran.r-project.org/web/views/TimeSeries.html

On December 17, 2018 8:50:09 AM PST, [hidden email] wrote:


