

Dear All,
I apologize if you have already seen this on Stack Overflow. I have
not received any response there, so I am posting for help here.
I have data on 1318 time series. Many of these series are of unequal
length. In addition, quite a few observations in each series fall at
time points that differ from those of the other series. For example,
consider the following four series:
t1 <- c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50, 25.55, 25.67)
V1 <- c(0.1710, 0.0824, 0.0419, 0.0416, 0.0216, 0.0792, 0.0656,
        0.0273, 0.0589)
ser1 <- cbind(t1, V1)
t2 <- c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38)
V2 <- c(0.0280, 0.1980, 0.2556, 0.3131, 0.3231, 0.2264)
ser2 <- cbind(t2, V2)
t3 <- c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50, 25.55, 25.65,
        25.88, 25.97, 25.99)
V3 <- c(0.0897, 0.0533, 0.3497, 0.5684, 0.4294, 0.1109, 0.0352,
        0.0550, 0.0536, 0.0185, 0.0295, 0.0324)
ser3 <- cbind(t3, V3)
t4 <- c(24.5, 24.67, 24.71, 24.98, 25.17)
V4 <- c(0.0280, 0.1980, 0.2556, 0.3131, 0.3231)
ser4 <- cbind(t4, V4)
Here t1, t2, t3, t4 are the time points and V1, V2, V3, V4 are the
observations made at those time points. The time points in the actual
data are Julian dates, so they look like these but are much larger
decimal figures, such as 2452450.6225.
I am trying to cluster these time series using a functional data
approach, for which I am using the "funFEM" package in R. The examples
provided are for equally spaced time series of equal length, so I am
not sure how to use the package with my data. Initially I tried making
all the time series equal in length to the series with the largest
number of observations (here ser3) by padding the shorter series with
NAs. Following this approach, I constructed ser2 as
t2_n <- c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38, 25.50, 25.55, 25.65,
          25.88, 25.97, 25.99)
V2_na <- c(V2, rep(NA, 6))
ser2_na <- cbind(t2_n, V2_na)
Note that to make t2 equal in length to t3 I took the last 6 time
points from t3. To make V2 equal in length to V3 I appended NAs.
Then I created my data matrix as
dat <- rbind(V1_na, V2_na, V3, V4_na)
The code I used was
require(funFEM)
basis <- create.fourier.basis(c(min(t3), max(t3)), nbasis = 25)
fdobj <- smooth.basis(c(min(t3), max(t3)), dat, basis)$fd
Note that the range is constructed from the minimum and maximum time
points of the ser3 series.
res <- funFEM(fdobj, K = 2:9, model = "all", crit = "bic",
              init = "random")
But this gives me an error
Error in svd(X) : infinite or missing values in 'x'.
Can anyone please tell me how to handle this dataset with this
package, or suggest an alternative package?
Sincerely,
Souradeep
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


This is a specialized question: you probably need to email the package maintainer. See ?maintainer
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
On Mon, Dec 17, 2018 at 9:27 AM < [hidden email]> wrote:


In reply to this post by Souradeep Chattapadhyay
You will learn something useful if you search for "rolling join". The zoo package can handle this, as can the data.table package (read the vignette).
Your decision to pad with NA at the end was ill-considered... the first point of your first series is between the first two points of your second series... you need to interleave the points somehow.
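To make the interleaving concrete, here is a minimal base R sketch using the first two series from the original post. The helper name on_union and the object names t_all and aligned are made up for illustration, and the exact matching of time values is an assumption that only works because shared times are numerically identical.

```r
t1 <- c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50, 25.55, 25.67)
V1 <- c(0.1710, 0.0824, 0.0419, 0.0416, 0.0216, 0.0792, 0.0656,
        0.0273, 0.0589)
t2 <- c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38)
V2 <- c(0.0280, 0.1980, 0.2556, 0.3131, 0.3231, 0.2264)

# Padding at the end pairs t1[j] with t2[j], which are different times.
cbind(t1 = t1[seq_along(t2)], t2 = t2)

# Interleave instead: the sorted union of all observed times.
t_all <- sort(union(t1, t2))

# Hypothetical helper: put one series on the union grid, with NA
# wherever that series has no observation at a given time.
on_union <- function(t, v, grid) v[match(grid, t)]

aligned <- cbind(V1 = on_union(t1, V1, t_all),
                 V2 = on_union(t2, V2, t_all))
rownames(aligned) <- format(t_all)
aligned
```

Each row now refers to a single time for both series. The NAs that remain are in the interior as well as at the ends, and they would still have to be filled by whichever interpolation method is chosen before any routine that rejects missing values can be used.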
You will need to decide whether you want to use piecewise linear approximation (as with the base "approx" function) or the more stable last-observation-carried-forward ("locf") or cubic splines or something more exotic like Fourier interpolation to identify the new interpolated "y" values in each series.
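As a rough comparison of the first three options, the following base R sketch evaluates the second series from the original post at three new times. The vector t_new is an arbitrary choice for illustration, and using approx with method = "constant" and f = 0 as a stand-in for locf is my own shortcut, not the only way to do it. Fourier interpolation is not shown.

```r
t2 <- c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38)
V2 <- c(0.0280, 0.1980, 0.2556, 0.3131, 0.3231, 0.2264)

# Arbitrary new time points inside the observed range of t2.
t_new <- c(24.6, 25.0, 25.3)

# Piecewise linear interpolation.
lin <- approx(t2, V2, xout = t_new)$y

# Step function taking the value of the last observation at or
# before each new time (f = 0 selects the left-hand value).
locf <- approx(t2, V2, xout = t_new, method = "constant", f = 0)$y

# Natural cubic spline through the observed points.
cub <- spline(t2, V2, xout = t_new, method = "natural")$y

cbind(t_new, lin, locf, cub)

# approx() does not extrapolate by default (rule = 1), so a time
# beyond the last observation gives NA.
outside <- approx(t2, V2, xout = 26)$y
```

With only a handful of points per series the three methods can disagree noticeably, so it is worth plotting a few series with each before committing to one.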
You can avoid the rolling join if you intend to resample the series to have points at regular intervals. Just apply your preferred interpolation technique with your intended mesh of regular time values to each of your series in turn and then use cbind with the results.
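A minimal sketch of that resampling in base R, using the four series from the original post. Several things here are assumptions for illustration only: linear interpolation via approx, a mesh of 20 points, and restricting the mesh to the interval where all four series overlap so that nothing is extrapolated. The names series, grid and resampled are made up.

```r
series <- list(
  ser1 = list(t = c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50,
                    25.55, 25.67),
              v = c(0.1710, 0.0824, 0.0419, 0.0416, 0.0216, 0.0792,
                    0.0656, 0.0273, 0.0589)),
  ser2 = list(t = c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38),
              v = c(0.0280, 0.1980, 0.2556, 0.3131, 0.3231, 0.2264)),
  ser3 = list(t = c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50,
                    25.55, 25.65, 25.88, 25.97, 25.99),
              v = c(0.0897, 0.0533, 0.3497, 0.5684, 0.4294, 0.1109,
                    0.0352, 0.0550, 0.0536, 0.0185, 0.0295, 0.0324)),
  ser4 = list(t = c(24.5, 24.67, 24.71, 24.98, 25.17),
              v = c(0.0280, 0.1980, 0.2556, 0.3131, 0.3231))
)

# Common interval covered by every series (assumed choice).
lo <- max(sapply(series, function(s) min(s$t)))
hi <- min(sapply(series, function(s) max(s$t)))

# Regular mesh of time values; 20 points is arbitrary.
grid <- seq(lo, hi, length.out = 20)

# Interpolate each series on the mesh, then bind the results as columns.
# rule = 2 only guards against rounding at the mesh end points here.
resampled <- do.call(cbind, lapply(series, function(s) {
  approx(s$t, s$v, xout = grid, rule = 2)$y
}))

dim(resampled)    # rows are mesh times, columns are series
anyNA(resampled)  # no missing values remain
```

The result has time points in rows and series in columns, with no NAs. Whether that orientation is what a given downstream function wants should be checked against its documentation. Series that barely overlap will shrink the common interval a lot, so with 1318 series the choice of interval probably needs more thought than this sketch gives it.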
I don't know anything about the package you mention, but getting time series data aligned is a common preprocessing step for many time series analyses.
Oh, and you should probably be familiar with the CRAN Time Series Task View [1].
PS you should provide a link back to your original posting when moving the conversation to a different venue in case the discussion doesn't stay dead there.
[1] https://cran.r-project.org/web/views/TimeSeries.html
On December 17, 2018 8:50:09 AM PST, [hidden email] wrote:

Sent from my phone. Please excuse my brevity.

