C/C++/Fortran Rolling Window Regressions

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

C/C++/Fortran Rolling Window Regressions

jeremiah rounds-2
Hi,

A not unusual task is performing a multiple regression in a rolling window
on a time-series.    A standard piece of advice for doing in R is something
like the code that follows at the end of the email.  I am currently using
an "embed" variant of that code and that piece of advice is out there too.

But, it occurs to me that for such an easily specified matrix operation
standard R code is really slow.   rollapply constantly returns to R
interpreter at each window step for a new lm.   All lm is at its heart is
(X^t X)^(-1) * Xy,  and if you think about doing that with Rcpp in rolling
window you are just incrementing a counter and peeling off rows (or columns
of X and y) of a particular window size, and following that up with some
matrix multiplication in a loop.   The psuedo-code for that Rcpp
practically writes itself and you might want a wrapper of something like:
rolling_lm (y=y, x=x, width=4).

My question is this: has any of the thousands of R packages out there
published anything like that.  Rolling window multiple regressions that
stay in C/C++ until the rolling window completes?  No sense and writing it
if it exist.


Thanks,
Jeremiah

Standard (slow) advice for "rolling window regression" follows:


set.seed(1)
z <- zoo(matrix(rnorm(10), ncol = 2))
colnames(z) <- c("y", "x")

## rolling regression of width 4
rollapply(z, width = 4,
   function(x) coef(lm(y ~ x, data = as.data.frame(x))),
   by.column = FALSE, align = "right")

## result is identical to
coef(lm(y ~ x, data = z[1:4,]))
coef(lm(y ~ x, data = z[2:5,]))

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: C/C++/Fortran Rolling Window Regressions

Gabor Grothendieck
Just replacing lm with a faster version would speed it up.  Try lm.fit
or even faster is fastLm in the RcppArmadillo package.

On Thu, Jul 21, 2016 at 2:02 PM, jeremiah rounds
<[hidden email]> wrote:

> Hi,
>
> A not unusual task is performing a multiple regression in a rolling window
> on a time-series.    A standard piece of advice for doing in R is something
> like the code that follows at the end of the email.  I am currently using
> an "embed" variant of that code and that piece of advice is out there too.
>
> But, it occurs to me that for such an easily specified matrix operation
> standard R code is really slow.   rollapply constantly returns to R
> interpreter at each window step for a new lm.   All lm is at its heart is
> (X^t X)^(-1) * Xy,  and if you think about doing that with Rcpp in rolling
> window you are just incrementing a counter and peeling off rows (or columns
> of X and y) of a particular window size, and following that up with some
> matrix multiplication in a loop.   The psuedo-code for that Rcpp
> practically writes itself and you might want a wrapper of something like:
> rolling_lm (y=y, x=x, width=4).
>
> My question is this: has any of the thousands of R packages out there
> published anything like that.  Rolling window multiple regressions that
> stay in C/C++ until the rolling window completes?  No sense and writing it
> if it exist.
>
>
> Thanks,
> Jeremiah
>
> Standard (slow) advice for "rolling window regression" follows:
>
>
> set.seed(1)
> z <- zoo(matrix(rnorm(10), ncol = 2))
> colnames(z) <- c("y", "x")
>
> ## rolling regression of width 4
> rollapply(z, width = 4,
>    function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>    by.column = FALSE, align = "right")
>
> ## result is identical to
> coef(lm(y ~ x, data = z[1:4,]))
> coef(lm(y ~ x, data = z[2:5,]))
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: C/C++/Fortran Rolling Window Regressions

Achim Zeileis-4
In reply to this post by jeremiah rounds-2
Jeremiah,

for this purpose there are the "roll" and "RcppRoll" packages. Both use
Rcpp and the former also provides rolling lm models. The latter has a
generic interface that let's you define your own function.

One thing to pay attention to, though, is the numerical reliability.
Especially on large time series with relatively short windows there is a
good chance of encountering numerically challenging situations. The QR
decomposition used by lm is fairly robust while other more straightforward
matrix multiplications may not be. This should be kept in mind when
writing your own Rcpp code for plugging it into RcppRoll.

But I haven't check what the roll package does and how reliable that is...

hth,
Z

On Thu, 21 Jul 2016, jeremiah rounds wrote:

> Hi,
>
> A not unusual task is performing a multiple regression in a rolling window
> on a time-series.    A standard piece of advice for doing in R is something
> like the code that follows at the end of the email.  I am currently using
> an "embed" variant of that code and that piece of advice is out there too.
>
> But, it occurs to me that for such an easily specified matrix operation
> standard R code is really slow.   rollapply constantly returns to R
> interpreter at each window step for a new lm.   All lm is at its heart is
> (X^t X)^(-1) * Xy,  and if you think about doing that with Rcpp in rolling
> window you are just incrementing a counter and peeling off rows (or columns
> of X and y) of a particular window size, and following that up with some
> matrix multiplication in a loop.   The psuedo-code for that Rcpp
> practically writes itself and you might want a wrapper of something like:
> rolling_lm (y=y, x=x, width=4).
>
> My question is this: has any of the thousands of R packages out there
> published anything like that.  Rolling window multiple regressions that
> stay in C/C++ until the rolling window completes?  No sense and writing it
> if it exist.
>
>
> Thanks,
> Jeremiah
>
> Standard (slow) advice for "rolling window regression" follows:
>
>
> set.seed(1)
> z <- zoo(matrix(rnorm(10), ncol = 2))
> colnames(z) <- c("y", "x")
>
> ## rolling regression of width 4
> rollapply(z, width = 4,
>   function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>   by.column = FALSE, align = "right")
>
> ## result is identical to
> coef(lm(y ~ x, data = z[1:4,]))
> coef(lm(y ~ x, data = z[2:5,]))
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: C/C++/Fortran Rolling Window Regressions

jeremiah rounds-2
 Thanks all.  roll::roll_lm was essentially what I wanted.   I think maybe
I would prefer it to have options to return a few more things, but it is
the coefficients, and the remaining statistics you might want can be
calculated fast enough from there.


On Thu, Jul 21, 2016 at 12:36 PM, Achim Zeileis <[hidden email]>
wrote:

> Jeremiah,
>
> for this purpose there are the "roll" and "RcppRoll" packages. Both use
> Rcpp and the former also provides rolling lm models. The latter has a
> generic interface that let's you define your own function.
>
> One thing to pay attention to, though, is the numerical reliability.
> Especially on large time series with relatively short windows there is a
> good chance of encountering numerically challenging situations. The QR
> decomposition used by lm is fairly robust while other more straightforward
> matrix multiplications may not be. This should be kept in mind when writing
> your own Rcpp code for plugging it into RcppRoll.
>
> But I haven't check what the roll package does and how reliable that is...
>
> hth,
> Z
>
>
> On Thu, 21 Jul 2016, jeremiah rounds wrote:
>
> Hi,
>>
>> A not unusual task is performing a multiple regression in a rolling window
>> on a time-series.    A standard piece of advice for doing in R is
>> something
>> like the code that follows at the end of the email.  I am currently using
>> an "embed" variant of that code and that piece of advice is out there too.
>>
>> But, it occurs to me that for such an easily specified matrix operation
>> standard R code is really slow.   rollapply constantly returns to R
>> interpreter at each window step for a new lm.   All lm is at its heart is
>> (X^t X)^(-1) * Xy,  and if you think about doing that with Rcpp in rolling
>> window you are just incrementing a counter and peeling off rows (or
>> columns
>> of X and y) of a particular window size, and following that up with some
>> matrix multiplication in a loop.   The psuedo-code for that Rcpp
>> practically writes itself and you might want a wrapper of something like:
>> rolling_lm (y=y, x=x, width=4).
>>
>> My question is this: has any of the thousands of R packages out there
>> published anything like that.  Rolling window multiple regressions that
>> stay in C/C++ until the rolling window completes?  No sense and writing it
>> if it exist.
>>
>>
>> Thanks,
>> Jeremiah
>>
>> Standard (slow) advice for "rolling window regression" follows:
>>
>>
>> set.seed(1)
>> z <- zoo(matrix(rnorm(10), ncol = 2))
>> colnames(z) <- c("y", "x")
>>
>> ## rolling regression of width 4
>> rollapply(z, width = 4,
>>   function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>>   by.column = FALSE, align = "right")
>>
>> ## result is identical to
>> coef(lm(y ~ x, data = z[1:4,]))
>> coef(lm(y ~ x, data = z[2:5,]))
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: C/C++/Fortran Rolling Window Regressions

Gabor Grothendieck
I would be careful about making assumptions regarding what is faster.
Performance tends to be nonintuitive.

When I ran rollapply/lm, rollapply/fastLm and roll_lm on the example
you provided rollapply/fastLm was three times faster than roll_lm.  Of
course this could change with data of different dimensions but it
would be worthwhile to do actual benchmarks before making assumptions.

I also noticed that roll_lm did not give the same coefficients as the other two.

set.seed(1)
library(zoo)
library(RcppArmadillo)
library(roll)
z <- zoo(matrix(rnorm(10), ncol = 2))
colnames(z) <- c("y", "x")

## rolling regression of width 4
library(rbenchmark)
benchmark(fastLm = rollapplyr(z, width = 4,
     function(x) coef(fastLm(cbind(1, x[, 2]), x[, 1])),
     by.column = FALSE),
   lm = rollapplyr(z, width = 4,
     function(x) coef(lm(y ~ x, data = as.data.frame(x))),
     by.column = FALSE),
   roll_lm =  roll_lm(coredata(z[, 1, drop = F]), coredata(z[, 2, drop = F]), 4,
     center = FALSE))[1:4]


     test replications elapsed relative
1  fastLm          100    0.22    1.000
2      lm          100    0.72    3.273
3 roll_lm          100    0.64    2.909

On Thu, Jul 21, 2016 at 3:45 PM, jeremiah rounds
<[hidden email]> wrote:

>  Thanks all.  roll::roll_lm was essentially what I wanted.   I think maybe
> I would prefer it to have options to return a few more things, but it is
> the coefficients, and the remaining statistics you might want can be
> calculated fast enough from there.
>
>
> On Thu, Jul 21, 2016 at 12:36 PM, Achim Zeileis <[hidden email]>
> wrote:
>
>> Jeremiah,
>>
>> for this purpose there are the "roll" and "RcppRoll" packages. Both use
>> Rcpp and the former also provides rolling lm models. The latter has a
>> generic interface that let's you define your own function.
>>
>> One thing to pay attention to, though, is the numerical reliability.
>> Especially on large time series with relatively short windows there is a
>> good chance of encountering numerically challenging situations. The QR
>> decomposition used by lm is fairly robust while other more straightforward
>> matrix multiplications may not be. This should be kept in mind when writing
>> your own Rcpp code for plugging it into RcppRoll.
>>
>> But I haven't check what the roll package does and how reliable that is...
>>
>> hth,
>> Z
>>
>>
>> On Thu, 21 Jul 2016, jeremiah rounds wrote:
>>
>> Hi,
>>>
>>> A not unusual task is performing a multiple regression in a rolling window
>>> on a time-series.    A standard piece of advice for doing in R is
>>> something
>>> like the code that follows at the end of the email.  I am currently using
>>> an "embed" variant of that code and that piece of advice is out there too.
>>>
>>> But, it occurs to me that for such an easily specified matrix operation
>>> standard R code is really slow.   rollapply constantly returns to R
>>> interpreter at each window step for a new lm.   All lm is at its heart is
>>> (X^t X)^(-1) * Xy,  and if you think about doing that with Rcpp in rolling
>>> window you are just incrementing a counter and peeling off rows (or
>>> columns
>>> of X and y) of a particular window size, and following that up with some
>>> matrix multiplication in a loop.   The psuedo-code for that Rcpp
>>> practically writes itself and you might want a wrapper of something like:
>>> rolling_lm (y=y, x=x, width=4).
>>>
>>> My question is this: has any of the thousands of R packages out there
>>> published anything like that.  Rolling window multiple regressions that
>>> stay in C/C++ until the rolling window completes?  No sense and writing it
>>> if it exist.
>>>
>>>
>>> Thanks,
>>> Jeremiah
>>>
>>> Standard (slow) advice for "rolling window regression" follows:
>>>
>>>
>>> set.seed(1)
>>> z <- zoo(matrix(rnorm(10), ncol = 2))
>>> colnames(z) <- c("y", "x")
>>>
>>> ## rolling regression of width 4
>>> rollapply(z, width = 4,
>>>   function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>>>   by.column = FALSE, align = "right")
>>>
>>> ## result is identical to
>>> coef(lm(y ~ x, data = z[1:4,]))
>>> coef(lm(y ~ x, data = z[2:5,]))
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: C/C++/Fortran Rolling Window Regressions

jeremiah rounds-2
I appreciate the timing, so much so I changed the code to show the issue.
 It is a problem of scale.

 roll_lm probably has a heavy start-up cost but otherwise completely
out-performs those other versions at scale.  I suspect you are timing the
nearly  constant time start-up cost in small data.  I did give code to
paint a picture, but it was just cartoon code lifted from stackexchange.
If you want to characterize the real problem it is closer to:
30 day rolling windows on 24 daily (by hour) measurements for 5 years with
24+7 -1 dummy predictor variables and finally you need to do this for 300
sets of data.

Pseudo-code is closer to what follows and roll_lm can handle that input in
a timely manner.  You can do it with lm.fit, but you need to spend a lot of
time waiting.  The issue of accuracy needs a follow-up check.  Not sure why
it would be different.  Worth a check on that.

Thanks,
Jeremiah


library(rbenchmark)
N = 30*24*12*5
window = 30*24
npred = 15  #15 chosen arbitrarily...
set.seed(1)
library(zoo)
library(RcppArmadillo)
library(roll)
x = matrix(rnorm(N*(npred+1)), ncol = npred+1)
colnames(x) <- c("y",  paste0("x", 1:npred))
z <- zoo(x)


benchmark(
   roll_lm =  roll_lm(coredata(z[, 1, drop = F]), coredata(z[, -1, drop =
F]), window,
     center = FALSE), replications=3)

Which comes out as:
     test replications elapsed relative user.self sys.self user.child
sys.child
1 roll_lm            3   6.273        1    38.312    0.654          0
  0





## You arn't going to get that below...

benchmark(fastLm = rollapplyr(z, width = window,
     function(x) coef(fastLm(cbind(1, x[, -1]), x[, 1])),
     by.column = FALSE),
   lm = rollapplyr(z, width = window,
     function(x) coef(lm(y ~ ., data = as.data.frame(x))),
     by.column = FALSE), replications=3)



On Thu, Jul 21, 2016 at 1:28 PM, Gabor Grothendieck <[hidden email]
> wrote:

> I would be careful about making assumptions regarding what is faster.
> Performance tends to be nonintuitive.
>
> When I ran rollapply/lm, rollapply/fastLm and roll_lm on the example
> you provided rollapply/fastLm was three times faster than roll_lm.  Of
> course this could change with data of different dimensions but it
> would be worthwhile to do actual benchmarks before making assumptions.
>
> I also noticed that roll_lm did not give the same coefficients as the
> other two.
>
> set.seed(1)
> library(zoo)
> library(RcppArmadillo)
> library(roll)
> z <- zoo(matrix(rnorm(10), ncol = 2))
> colnames(z) <- c("y", "x")
>
> ## rolling regression of width 4
> library(rbenchmark)
> benchmark(fastLm = rollapplyr(z, width = 4,
>      function(x) coef(fastLm(cbind(1, x[, 2]), x[, 1])),
>      by.column = FALSE),
>    lm = rollapplyr(z, width = 4,
>      function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>      by.column = FALSE),
>    roll_lm =  roll_lm(coredata(z[, 1, drop = F]), coredata(z[, 2, drop =
> F]), 4,
>      center = FALSE))[1:4]
>
>
>      test replications elapsed relative
> 1  fastLm          100    0.22    1.000
> 2      lm          100    0.72    3.273
> 3 roll_lm          100    0.64    2.909
>
> On Thu, Jul 21, 2016 at 3:45 PM, jeremiah rounds
> <[hidden email]> wrote:
> >  Thanks all.  roll::roll_lm was essentially what I wanted.   I think
> maybe
> > I would prefer it to have options to return a few more things, but it is
> > the coefficients, and the remaining statistics you might want can be
> > calculated fast enough from there.
> >
> >
> > On Thu, Jul 21, 2016 at 12:36 PM, Achim Zeileis <
> [hidden email]>
> > wrote:
> >
> >> Jeremiah,
> >>
> >> for this purpose there are the "roll" and "RcppRoll" packages. Both use
> >> Rcpp and the former also provides rolling lm models. The latter has a
> >> generic interface that let's you define your own function.
> >>
> >> One thing to pay attention to, though, is the numerical reliability.
> >> Especially on large time series with relatively short windows there is a
> >> good chance of encountering numerically challenging situations. The QR
> >> decomposition used by lm is fairly robust while other more
> straightforward
> >> matrix multiplications may not be. This should be kept in mind when
> writing
> >> your own Rcpp code for plugging it into RcppRoll.
> >>
> >> But I haven't check what the roll package does and how reliable that
> is...
> >>
> >> hth,
> >> Z
> >>
> >>
> >> On Thu, 21 Jul 2016, jeremiah rounds wrote:
> >>
> >> Hi,
> >>>
> >>> A not unusual task is performing a multiple regression in a rolling
> window
> >>> on a time-series.    A standard piece of advice for doing in R is
> >>> something
> >>> like the code that follows at the end of the email.  I am currently
> using
> >>> an "embed" variant of that code and that piece of advice is out there
> too.
> >>>
> >>> But, it occurs to me that for such an easily specified matrix operation
> >>> standard R code is really slow.   rollapply constantly returns to R
> >>> interpreter at each window step for a new lm.   All lm is at its heart
> is
> >>> (X^t X)^(-1) * Xy,  and if you think about doing that with Rcpp in
> rolling
> >>> window you are just incrementing a counter and peeling off rows (or
> >>> columns
> >>> of X and y) of a particular window size, and following that up with
> some
> >>> matrix multiplication in a loop.   The psuedo-code for that Rcpp
> >>> practically writes itself and you might want a wrapper of something
> like:
> >>> rolling_lm (y=y, x=x, width=4).
> >>>
> >>> My question is this: has any of the thousands of R packages out there
> >>> published anything like that.  Rolling window multiple regressions that
> >>> stay in C/C++ until the rolling window completes?  No sense and
> writing it
> >>> if it exist.
> >>>
> >>>
> >>> Thanks,
> >>> Jeremiah
> >>>
> >>> Standard (slow) advice for "rolling window regression" follows:
> >>>
> >>>
> >>> set.seed(1)
> >>> z <- zoo(matrix(rnorm(10), ncol = 2))
> >>> colnames(z) <- c("y", "x")
> >>>
> >>> ## rolling regression of width 4
> >>> rollapply(z, width = 4,
> >>>   function(x) coef(lm(y ~ x, data = as.data.frame(x))),
> >>>   by.column = FALSE, align = "right")
> >>>
> >>> ## result is identical to
> >>> coef(lm(y ~ x, data = z[1:4,]))
> >>> coef(lm(y ~ x, data = z[2:5,]))
> >>>
> >>>         [[alternative HTML version deleted]]
> >>>
> >>> ______________________________________________
> >>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>>
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: C/C++/Fortran Rolling Window Regressions

mark leeds
In reply to this post by Gabor Grothendieck
Hi Jermiah: another possibly faster way would be to use a kalman filtering
framework. I forget the details but duncan and horne have a paper which
shows how a regression can be re-computed each time a new data point is
added .I
forget if they handle taking one off of the back also which is what you
need.

The paper at the link below isn't the paper I'm talking about but it's
reference[1] in that paper. Note that this suggestion might not be a better
approach  than the various approaches already suggested so I wouldn't go
this route unless you're very interested.


Mark

https://www.le.ac.uk/users/dsgp1/COURSES/MESOMET/ECMETXT/recurse.pdf






On Thu, Jul 21, 2016 at 4:28 PM, Gabor Grothendieck <[hidden email]
> wrote:

> I would be careful about making assumptions regarding what is faster.
> Performance tends to be nonintuitive.
>
> When I ran rollapply/lm, rollapply/fastLm and roll_lm on the example
> you provided rollapply/fastLm was three times faster than roll_lm.  Of
> course this could change with data of different dimensions but it
> would be worthwhile to do actual benchmarks before making assumptions.
>
> I also noticed that roll_lm did not give the same coefficients as the
> other two.
>
> set.seed(1)
> library(zoo)
> library(RcppArmadillo)
> library(roll)
> z <- zoo(matrix(rnorm(10), ncol = 2))
> colnames(z) <- c("y", "x")
>
> ## rolling regression of width 4
> library(rbenchmark)
> benchmark(fastLm = rollapplyr(z, width = 4,
>      function(x) coef(fastLm(cbind(1, x[, 2]), x[, 1])),
>      by.column = FALSE),
>    lm = rollapplyr(z, width = 4,
>      function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>      by.column = FALSE),
>    roll_lm =  roll_lm(coredata(z[, 1, drop = F]), coredata(z[, 2, drop =
> F]), 4,
>      center = FALSE))[1:4]
>
>
>      test replications elapsed relative
> 1  fastLm          100    0.22    1.000
> 2      lm          100    0.72    3.273
> 3 roll_lm          100    0.64    2.909
>
> On Thu, Jul 21, 2016 at 3:45 PM, jeremiah rounds
> <[hidden email]> wrote:
> >  Thanks all.  roll::roll_lm was essentially what I wanted.   I think
> maybe
> > I would prefer it to have options to return a few more things, but it is
> > the coefficients, and the remaining statistics you might want can be
> > calculated fast enough from there.
> >
> >
> > On Thu, Jul 21, 2016 at 12:36 PM, Achim Zeileis <
> [hidden email]>
> > wrote:
> >
> >> Jeremiah,
> >>
> >> for this purpose there are the "roll" and "RcppRoll" packages. Both use
> >> Rcpp and the former also provides rolling lm models. The latter has a
> >> generic interface that let's you define your own function.
> >>
> >> One thing to pay attention to, though, is the numerical reliability.
> >> Especially on large time series with relatively short windows there is a
> >> good chance of encountering numerically challenging situations. The QR
> >> decomposition used by lm is fairly robust while other more
> straightforward
> >> matrix multiplications may not be. This should be kept in mind when
> writing
> >> your own Rcpp code for plugging it into RcppRoll.
> >>
> >> But I haven't check what the roll package does and how reliable that
> is...
> >>
> >> hth,
> >> Z
> >>
> >>
> >> On Thu, 21 Jul 2016, jeremiah rounds wrote:
> >>
> >> Hi,
> >>>
> >>> A not unusual task is performing a multiple regression in a rolling
> window
> >>> on a time-series.    A standard piece of advice for doing in R is
> >>> something
> >>> like the code that follows at the end of the email.  I am currently
> using
> >>> an "embed" variant of that code and that piece of advice is out there
> too.
> >>>
> >>> But, it occurs to me that for such an easily specified matrix operation
> >>> standard R code is really slow.   rollapply constantly returns to R
> >>> interpreter at each window step for a new lm.   All lm is at its heart
> is
> >>> (X^t X)^(-1) * Xy,  and if you think about doing that with Rcpp in
> rolling
> >>> window you are just incrementing a counter and peeling off rows (or
> >>> columns
> >>> of X and y) of a particular window size, and following that up with
> some
> >>> matrix multiplication in a loop.   The psuedo-code for that Rcpp
> >>> practically writes itself and you might want a wrapper of something
> like:
> >>> rolling_lm (y=y, x=x, width=4).
> >>>
> >>> My question is this: has any of the thousands of R packages out there
> >>> published anything like that.  Rolling window multiple regressions that
> >>> stay in C/C++ until the rolling window completes?  No sense and
> writing it
> >>> if it exist.
> >>>
> >>>
> >>> Thanks,
> >>> Jeremiah
> >>>
> >>> Standard (slow) advice for "rolling window regression" follows:
> >>>
> >>>
> >>> set.seed(1)
> >>> z <- zoo(matrix(rnorm(10), ncol = 2))
> >>> colnames(z) <- c("y", "x")
> >>>
> >>> ## rolling regression of width 4
> >>> rollapply(z, width = 4,
> >>>   function(x) coef(lm(y ~ x, data = as.data.frame(x))),
> >>>   by.column = FALSE, align = "right")
> >>>
> >>> ## result is identical to
> >>> coef(lm(y ~ x, data = z[1:4,]))
> >>> coef(lm(y ~ x, data = z[2:5,]))
> >>>
> >>>         [[alternative HTML version deleted]]
> >>>
> >>> ______________________________________________
> >>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>>
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: C/C++/Fortran Rolling Window Regressions

Roy Mendelssohn - NOAA Federal
I have no idea which method produces the fastest results,  but the package KFAS has a function to do recursive regressions using the Kalman filter.  One difference is that it is not, as far as a I can telll, a moving window (so past data are being dropped),  just a recursively computed regression.

HTH,

-Roy

> On Jul 21, 2016, at 2:08 PM, Mark Leeds <[hidden email]> wrote:
>
> Hi Jermiah: another possibly faster way would be to use a kalman filtering
> framework. I forget the details but duncan and horne have a paper which
> shows how a regression can be re-computed each time a new data point is
> added .I
> forget if they handle taking one off of the back also which is what you
> need.
>
> The paper at the link below isn't the paper I'm talking about but it's
> reference[1] in that paper. Note that this suggestion might not be a better
> approach  than the various approaches already suggested so I wouldn't go
> this route unless you're very interested.
>
>
> Mark
>
> https://www.le.ac.uk/users/dsgp1/COURSES/MESOMET/ECMETXT/recurse.pdf
>
>
>
>
>
>
> On Thu, Jul 21, 2016 at 4:28 PM, Gabor Grothendieck <[hidden email]
>> wrote:
>
>> I would be careful about making assumptions regarding what is faster.
>> Performance tends to be nonintuitive.
>>
>> When I ran rollapply/lm, rollapply/fastLm and roll_lm on the example
>> you provided rollapply/fastLm was three times faster than roll_lm.  Of
>> course this could change with data of different dimensions but it
>> would be worthwhile to do actual benchmarks before making assumptions.
>>
>> I also noticed that roll_lm did not give the same coefficients as the
>> other two.
>>
>> set.seed(1)
>> library(zoo)
>> library(RcppArmadillo)
>> library(roll)
>> z <- zoo(matrix(rnorm(10), ncol = 2))
>> colnames(z) <- c("y", "x")
>>
>> ## rolling regression of width 4
>> library(rbenchmark)
>> benchmark(fastLm = rollapplyr(z, width = 4,
>>     function(x) coef(fastLm(cbind(1, x[, 2]), x[, 1])),
>>     by.column = FALSE),
>>   lm = rollapplyr(z, width = 4,
>>     function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>>     by.column = FALSE),
>>   roll_lm =  roll_lm(coredata(z[, 1, drop = F]), coredata(z[, 2, drop =
>> F]), 4,
>>     center = FALSE))[1:4]
>>
>>
>>     test replications elapsed relative
>> 1  fastLm          100    0.22    1.000
>> 2      lm          100    0.72    3.273
>> 3 roll_lm          100    0.64    2.909
>>
>> On Thu, Jul 21, 2016 at 3:45 PM, jeremiah rounds
>> <[hidden email]> wrote:
>>> Thanks all.  roll::roll_lm was essentially what I wanted.   I think
>> maybe
>>> I would prefer it to have options to return a few more things, but it is
>>> the coefficients, and the remaining statistics you might want can be
>>> calculated fast enough from there.
>>>
>>>
>>> On Thu, Jul 21, 2016 at 12:36 PM, Achim Zeileis <
>> [hidden email]>
>>> wrote:
>>>
>>>> Jeremiah,
>>>>
>>>> for this purpose there are the "roll" and "RcppRoll" packages. Both use
>>>> Rcpp and the former also provides rolling lm models. The latter has a
>>>> generic interface that let's you define your own function.
>>>>
>>>> One thing to pay attention to, though, is the numerical reliability.
>>>> Especially on large time series with relatively short windows there is a
>>>> good chance of encountering numerically challenging situations. The QR
>>>> decomposition used by lm is fairly robust while other more
>> straightforward
>>>> matrix multiplications may not be. This should be kept in mind when
>> writing
>>>> your own Rcpp code for plugging it into RcppRoll.
>>>>
>>>> But I haven't check what the roll package does and how reliable that
>> is...
>>>>
>>>> hth,
>>>> Z
>>>>
>>>>
>>>> On Thu, 21 Jul 2016, jeremiah rounds wrote:
>>>>
>>>> Hi,
>>>>>
>>>>> A not unusual task is performing a multiple regression in a rolling
>> window
>>>>> on a time-series.    A standard piece of advice for doing in R is
>>>>> something
>>>>> like the code that follows at the end of the email.  I am currently
>> using
>>>>> an "embed" variant of that code and that piece of advice is out there
>> too.
>>>>>
>>>>> But, it occurs to me that for such an easily specified matrix operation
>>>>> standard R code is really slow.   rollapply constantly returns to R
>>>>> interpreter at each window step for a new lm.   All lm is at its heart
>> is
>>>>> (X^t X)^(-1) * Xy,  and if you think about doing that with Rcpp in
>> rolling
>>>>> window you are just incrementing a counter and peeling off rows (or
>>>>> columns
>>>>> of X and y) of a particular window size, and following that up with
>> some
>>>>> matrix multiplication in a loop.   The psuedo-code for that Rcpp
>>>>> practically writes itself and you might want a wrapper of something
>> like:
>>>>> rolling_lm (y=y, x=x, width=4).
>>>>>
>>>>> My question is this: has any of the thousands of R packages out there
>>>>> published anything like that.  Rolling window multiple regressions that
>>>>> stay in C/C++ until the rolling window completes?  No sense and
>> writing it
>>>>> if it exist.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Jeremiah
>>>>>
>>>>> Standard (slow) advice for "rolling window regression" follows:
>>>>>
>>>>>
>>>>> set.seed(1)
>>>>> z <- zoo(matrix(rnorm(10), ncol = 2))
>>>>> colnames(z) <- c("y", "x")
>>>>>
>>>>> ## rolling regression of width 4
>>>>> rollapply(z, width = 4,
>>>>>  function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>>>>>  by.column = FALSE, align = "right")
>>>>>
>>>>> ## result is identical to
>>>>> coef(lm(y ~ x, data = z[1:4,]))
>>>>> coef(lm(y ~ x, data = z[2:5,]))
>>>>>
>>>>>        [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>>
>>>
>>>        [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> Statistics & Software Consulting
>> GKX Group, GKX Associates Inc.
>> tel: 1-877-GKX-GROUP
>> email: ggrothendieck at gmail.com
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

**********************
"The contents of this message do not reflect any position of the U.S. Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new address and phone***
110 Shaffer Road
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: [hidden email] www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: C/C++/Fortran Rolling Window Regressions

jeremiah rounds-2
In reply to this post by mark leeds
I agree that when appropriate Kalman Filter/Smoothing the higher-quality
way to go about estimating a time-varying coefficient (given that is what
they do),  and I have noted that both the R package "dlm" and the function
"StructTS" handle these problems quickly.  I am working on that in
parallel.

One of the things I am unsure about with Kalman Filters is how to estimate
variance parameters when the process is unusual in some way that isn't in
the model and it is not feasible to adjust the model by-hand.  dlm's dlmMLE
seems to produce non-sense (not because of the author's work but because of
assumptions).  At least with moving window regressions after the unusual
event is past your window the influence of that event is gone.    That
isn't really a question for this group it is more about me reading more.
When I get that "how to handle all the strange things big data throws at
you" worked out for Kalman Filters, I will go back to those because I
certainly like what I see when everything is right.  There is a plethora of
related topics right?  Bayesian Model Averaging, G-ARCH models for
heteroscedasticity, etc.

Anyway... roll::roll_lm, cheers!

Thanks,
Jeremiah



On Thu, Jul 21, 2016 at 2:08 PM, Mark Leeds <[hidden email]> wrote:

> Hi Jermiah: another possibly faster way would be to use a kalman filtering
> framework. I forget the details but duncan and horne have a paper which
> shows how a regression can be re-computed each time a new data point is
> added .I
> forget if they handle taking one off of the back also which is what you
> need.
>
> The paper at the link below isn't the paper I'm talking about but it's
> reference[1] in that paper. Note that this suggestion might not be a better
> approach  than the various approaches already suggested so I wouldn't go
> this route unless you're very interested.
>
>
> Mark
>
> https://www.le.ac.uk/users/dsgp1/COURSES/MESOMET/ECMETXT/recurse.pdf
>
>
>
>
>
>
> On Thu, Jul 21, 2016 at 4:28 PM, Gabor Grothendieck <
> [hidden email]> wrote:
>
>> I would be careful about making assumptions regarding what is faster.
>> Performance tends to be nonintuitive.
>>
>> When I ran rollapply/lm, rollapply/fastLm and roll_lm on the example
>> you provided rollapply/fastLm was three times faster than roll_lm.  Of
>> course this could change with data of different dimensions but it
>> would be worthwhile to do actual benchmarks before making assumptions.
>>
>> I also noticed that roll_lm did not give the same coefficients as the
>> other two.
>>
>> set.seed(1)
>> library(zoo)
>> library(RcppArmadillo)
>> library(roll)
>> z <- zoo(matrix(rnorm(10), ncol = 2))
>> colnames(z) <- c("y", "x")
>>
>> ## rolling regression of width 4
>> library(rbenchmark)
>> benchmark(fastLm = rollapplyr(z, width = 4,
>>      function(x) coef(fastLm(cbind(1, x[, 2]), x[, 1])),
>>      by.column = FALSE),
>>    lm = rollapplyr(z, width = 4,
>>      function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>>      by.column = FALSE),
>>    roll_lm =  roll_lm(coredata(z[, 1, drop = F]), coredata(z[, 2, drop =
>> F]), 4,
>>      center = FALSE))[1:4]
>>
>>
>>      test replications elapsed relative
>> 1  fastLm          100    0.22    1.000
>> 2      lm          100    0.72    3.273
>> 3 roll_lm          100    0.64    2.909
>>
>> On Thu, Jul 21, 2016 at 3:45 PM, jeremiah rounds
>> <[hidden email]> wrote:
>> >  Thanks all.  roll::roll_lm was essentially what I wanted.   I think
>> maybe
>> > I would prefer it to have options to return a few more things, but it is
>> > the coefficients, and the remaining statistics you might want can be
>> > calculated fast enough from there.
>> >
>> >
>> > On Thu, Jul 21, 2016 at 12:36 PM, Achim Zeileis <
>> [hidden email]>
>> > wrote:
>> >
>> >> Jeremiah,
>> >>
>> >> for this purpose there are the "roll" and "RcppRoll" packages. Both use
>> >> Rcpp and the former also provides rolling lm models. The latter has a
>> >> generic interface that let's you define your own function.
>> >>
>> >> One thing to pay attention to, though, is the numerical reliability.
>> >> Especially on large time series with relatively short windows there is
>> a
>> >> good chance of encountering numerically challenging situations. The QR
>> >> decomposition used by lm is fairly robust while other more
>> straightforward
>> >> matrix multiplications may not be. This should be kept in mind when
>> writing
>> >> your own Rcpp code for plugging it into RcppRoll.
>> >>
>> >> But I haven't check what the roll package does and how reliable that
>> is...
>> >>
>> >> hth,
>> >> Z
>> >>
>> >>
>> >> On Thu, 21 Jul 2016, jeremiah rounds wrote:
>> >>
>> >> Hi,
>> >>>
>> >>> A not unusual task is performing a multiple regression in a rolling
>> window
>> >>> on a time-series.    A standard piece of advice for doing in R is
>> >>> something
>> >>> like the code that follows at the end of the email.  I am currently
>> using
>> >>> an "embed" variant of that code and that piece of advice is out there
>> too.
>> >>>
>> >>> But, it occurs to me that for such an easily specified matrix
>> operation
>> >>> standard R code is really slow.   rollapply constantly returns to R
>> >>> interpreter at each window step for a new lm.   All lm is at its
>> heart is
>> >>> (X^t X)^(-1) * Xy,  and if you think about doing that with Rcpp in
>> rolling
>> >>> window you are just incrementing a counter and peeling off rows (or
>> >>> columns
>> >>> of X and y) of a particular window size, and following that up with
>> some
>> >>> matrix multiplication in a loop.   The psuedo-code for that Rcpp
>> >>> practically writes itself and you might want a wrapper of something
>> like:
>> >>> rolling_lm (y=y, x=x, width=4).
>> >>>
>> >>> My question is this: has any of the thousands of R packages out there
>> >>> published anything like that.  Rolling window multiple regressions
>> that
>> >>> stay in C/C++ until the rolling window completes?  No sense and
>> writing it
>> >>> if it exist.
>> >>>
>> >>>
>> >>> Thanks,
>> >>> Jeremiah
>> >>>
>> >>> Standard (slow) advice for "rolling window regression" follows:
>> >>>
>> >>>
>> >>> set.seed(1)
>> >>> z <- zoo(matrix(rnorm(10), ncol = 2))
>> >>> colnames(z) <- c("y", "x")
>> >>>
>> >>> ## rolling regression of width 4
>> >>> rollapply(z, width = 4,
>> >>>   function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>> >>>   by.column = FALSE, align = "right")
>> >>>
>> >>> ## result is identical to
>> >>> coef(lm(y ~ x, data = z[1:4,]))
>> >>> coef(lm(y ~ x, data = z[2:5,]))
>> >>>
>> >>>         [[alternative HTML version deleted]]
>> >>>
>> >>> ______________________________________________
>> >>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>> PLEASE do read the posting guide
>> >>> http://www.R-project.org/posting-guide.html
>> >>> and provide commented, minimal, self-contained, reproducible code.
>> >>>
>> >>>
>> >
>> >         [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> Statistics & Software Consulting
>> GKX Group, GKX Associates Inc.
>> tel: 1-877-GKX-GROUP
>> email: ggrothendieck at gmail.com
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: C/C++/Fortran Rolling Window Regressions

Jean-Claude Arbaut
In reply to this post by jeremiah rounds-2
This may be useful:

Sven Hammarling and Craig Lucas
"Updating the QR factorization and the least squares problem"
http://eprints.ma.man.ac.uk/1192/01/covered/MIMS_ep2008_111.pdf
http://www.maths.manchester.ac.uk/~clucas/updating/

2016-07-21 20:02 GMT+02:00 jeremiah rounds <[hidden email]>:

> Hi,
>
> A not unusual task is performing a multiple regression in a rolling window
> on a time-series.    A standard piece of advice for doing in R is something
> like the code that follows at the end of the email.  I am currently using
> an "embed" variant of that code and that piece of advice is out there too.
>
> But, it occurs to me that for such an easily specified matrix operation
> standard R code is really slow.   rollapply constantly returns to R
> interpreter at each window step for a new lm.   All lm is at its heart is
> (X^t X)^(-1) * Xy,  and if you think about doing that with Rcpp in rolling
> window you are just incrementing a counter and peeling off rows (or columns
> of X and y) of a particular window size, and following that up with some
> matrix multiplication in a loop.   The psuedo-code for that Rcpp
> practically writes itself and you might want a wrapper of something like:
> rolling_lm (y=y, x=x, width=4).
>
> My question is this: has any of the thousands of R packages out there
> published anything like that.  Rolling window multiple regressions that
> stay in C/C++ until the rolling window completes?  No sense and writing it
> if it exist.
>
>
> Thanks,
> Jeremiah
>
> Standard (slow) advice for "rolling window regression" follows:
>
>
> set.seed(1)
> z <- zoo(matrix(rnorm(10), ncol = 2))
> colnames(z) <- c("y", "x")
>
> ## rolling regression of width 4
> rollapply(z, width = 4,
>    function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>    by.column = FALSE, align = "right")
>
> ## result is identical to
> coef(lm(y ~ x, data = z[1:4,]))
> coef(lm(y ~ x, data = z[2:5,]))
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: C/C++/Fortran Rolling Window Regressions

mark leeds
In reply to this post by jeremiah rounds-2
Hi Jeremiah: I think I wasn't that clear. I'm not suggesting  the kalman
filter to deal with time varying coefficients. As Roy pointed out, one can
use the kalman filter to do regular regression where one "sees" a new data
point as each time unit passes. It can be assumed that the coefficients do
not vary ( basically by having no variance in the system equation ).

The problem as I see it, is that Duncan and Horn's approach ( and Roy
alluded
to this problem also ), only deals with adding one point at a time to the
front of the data set.  It doesn't handle the fact that you want to drop
the nth observation and everything older than that observation  also.

don't know how easy it would be  to modify their approach to deal with the
fact that you are using a moving window rather than just adding one point
at a time.

The main point I wanted to get across here is that I was not suggesting the
KF as a way to handle varying coefficients. You can assume that they're
fixed and still use it. See the reference I pointed out for more on this
approach and my apologies for the confusion.
















On Thu, Jul 21, 2016 at 5:43 PM, jeremiah rounds <[hidden email]>
wrote:

> I agree that when appropriate Kalman Filter/Smoothing the higher-quality
> way to go about estimating a time-varying coefficient (given that is what
> they do),  and I have noted that both the R package "dlm" and the function
> "StructTS" handle these problems quickly.  I am working on that in
> parallel.
>
> One of the things I am unsure about with Kalman Filters is how to estimate
> variance parameters when the process is unusual in some way that isn't in
> the model and it is not feasible to adjust the model by-hand.  dlm's dlmMLE
> seems to produce non-sense (not because of the author's work but because of
> assumptions).  At least with moving window regressions after the unusual
> event is past your window the influence of that event is gone.    That
> isn't really a question for this group it is more about me reading more.
> When I get that "how to handle all the strange things big data throws at
> you" worked out for Kalman Filters, I will go back to those because I
> certainly like what I see when everything is right.  There is a plethora of
> related topics right?  Bayesian Model Averaging, G-ARCH models for
> heteroscedasticity, etc.
>
> Anyway... roll::roll_lm, cheers!
>
> Thanks,
> Jeremiah
>
>
>
> On Thu, Jul 21, 2016 at 2:08 PM, Mark Leeds <[hidden email]> wrote:
>
>> Hi Jermiah: another possibly faster way would be to use a kalman
>> filtering framework. I forget the details but duncan and horne have a paper
>> which shows how a regression can be re-computed each time a new data point
>> is added .I
>> forget if they handle taking one off of the back also which is what you
>> need.
>>
>> The paper at the link below isn't the paper I'm talking about but it's
>> reference[1] in that paper. Note that this suggestion might not be a better
>> approach  than the various approaches already suggested so I wouldn't go
>> this route unless you're very interested.
>>
>>
>> Mark
>>
>> https://www.le.ac.uk/users/dsgp1/COURSES/MESOMET/ECMETXT/recurse.pdf
>>
>>
>>
>>
>>
>>
>> On Thu, Jul 21, 2016 at 4:28 PM, Gabor Grothendieck <
>> [hidden email]> wrote:
>>
>>> I would be careful about making assumptions regarding what is faster.
>>> Performance tends to be nonintuitive.
>>>
>>> When I ran rollapply/lm, rollapply/fastLm and roll_lm on the example
>>> you provided rollapply/fastLm was three times faster than roll_lm.  Of
>>> course this could change with data of different dimensions but it
>>> would be worthwhile to do actual benchmarks before making assumptions.
>>>
>>> I also noticed that roll_lm did not give the same coefficients as the
>>> other two.
>>>
>>> set.seed(1)
>>> library(zoo)
>>> library(RcppArmadillo)
>>> library(roll)
>>> z <- zoo(matrix(rnorm(10), ncol = 2))
>>> colnames(z) <- c("y", "x")
>>>
>>> ## rolling regression of width 4
>>> library(rbenchmark)
>>> benchmark(fastLm = rollapplyr(z, width = 4,
>>>      function(x) coef(fastLm(cbind(1, x[, 2]), x[, 1])),
>>>      by.column = FALSE),
>>>    lm = rollapplyr(z, width = 4,
>>>      function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>>>      by.column = FALSE),
>>>    roll_lm =  roll_lm(coredata(z[, 1, drop = F]), coredata(z[, 2, drop =
>>> F]), 4,
>>>      center = FALSE))[1:4]
>>>
>>>
>>>      test replications elapsed relative
>>> 1  fastLm          100    0.22    1.000
>>> 2      lm          100    0.72    3.273
>>> 3 roll_lm          100    0.64    2.909
>>>
>>> On Thu, Jul 21, 2016 at 3:45 PM, jeremiah rounds
>>> <[hidden email]> wrote:
>>> >  Thanks all.  roll::roll_lm was essentially what I wanted.   I think
>>> maybe
>>> > I would prefer it to have options to return a few more things, but it
>>> is
>>> > the coefficients, and the remaining statistics you might want can be
>>> > calculated fast enough from there.
>>> >
>>> >
>>> > On Thu, Jul 21, 2016 at 12:36 PM, Achim Zeileis <
>>> [hidden email]>
>>> > wrote:
>>> >
>>> >> Jeremiah,
>>> >>
>>> >> for this purpose there are the "roll" and "RcppRoll" packages. Both
>>> use
>>> >> Rcpp and the former also provides rolling lm models. The latter has a
>>> >> generic interface that let's you define your own function.
>>> >>
>>> >> One thing to pay attention to, though, is the numerical reliability.
>>> >> Especially on large time series with relatively short windows there
>>> is a
>>> >> good chance of encountering numerically challenging situations. The QR
>>> >> decomposition used by lm is fairly robust while other more
>>> straightforward
>>> >> matrix multiplications may not be. This should be kept in mind when
>>> writing
>>> >> your own Rcpp code for plugging it into RcppRoll.
>>> >>
>>> >> But I haven't check what the roll package does and how reliable that
>>> is...
>>> >>
>>> >> hth,
>>> >> Z
>>> >>
>>> >>
>>> >> On Thu, 21 Jul 2016, jeremiah rounds wrote:
>>> >>
>>> >> Hi,
>>> >>>
>>> >>> A not unusual task is performing a multiple regression in a rolling
>>> window
>>> >>> on a time-series.    A standard piece of advice for doing in R is
>>> >>> something
>>> >>> like the code that follows at the end of the email.  I am currently
>>> using
>>> >>> an "embed" variant of that code and that piece of advice is out
>>> there too.
>>> >>>
>>> >>> But, it occurs to me that for such an easily specified matrix
>>> operation
>>> >>> standard R code is really slow.   rollapply constantly returns to R
>>> >>> interpreter at each window step for a new lm.   All lm is at its
>>> heart is
>>> >>> (X^t X)^(-1) * Xy,  and if you think about doing that with Rcpp in
>>> rolling
>>> >>> window you are just incrementing a counter and peeling off rows (or
>>> >>> columns
>>> >>> of X and y) of a particular window size, and following that up with
>>> some
>>> >>> matrix multiplication in a loop.   The psuedo-code for that Rcpp
>>> >>> practically writes itself and you might want a wrapper of something
>>> like:
>>> >>> rolling_lm (y=y, x=x, width=4).
>>> >>>
>>> >>> My question is this: has any of the thousands of R packages out there
>>> >>> published anything like that.  Rolling window multiple regressions
>>> that
>>> >>> stay in C/C++ until the rolling window completes?  No sense and
>>> writing it
>>> >>> if it exist.
>>> >>>
>>> >>>
>>> >>> Thanks,
>>> >>> Jeremiah
>>> >>>
>>> >>> Standard (slow) advice for "rolling window regression" follows:
>>> >>>
>>> >>>
>>> >>> set.seed(1)
>>> >>> z <- zoo(matrix(rnorm(10), ncol = 2))
>>> >>> colnames(z) <- c("y", "x")
>>> >>>
>>> >>> ## rolling regression of width 4
>>> >>> rollapply(z, width = 4,
>>> >>>   function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>>> >>>   by.column = FALSE, align = "right")
>>> >>>
>>> >>> ## result is identical to
>>> >>> coef(lm(y ~ x, data = z[1:4,]))
>>> >>> coef(lm(y ~ x, data = z[2:5,]))
>>> >>>
>>> >>>         [[alternative HTML version deleted]]
>>> >>>
>>> >>> ______________________________________________
>>> >>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>> >>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> >>> PLEASE do read the posting guide
>>> >>> http://www.R-project.org/posting-guide.html
>>> >>> and provide commented, minimal, self-contained, reproducible code.
>>> >>>
>>> >>>
>>> >
>>> >         [[alternative HTML version deleted]]
>>> >
>>> > ______________________________________________
>>> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> > and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>>
>>> --
>>> Statistics & Software Consulting
>>> GKX Group, GKX Associates Inc.
>>> tel: 1-877-GKX-GROUP
>>> email: ggrothendieck at gmail.com
>>>
>>> ______________________________________________
>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: C/C++/Fortran Rolling Window Regressions

Martin Maechler
In reply to this post by jeremiah rounds-2
>>>>> jeremiah rounds <[hidden email]>
>>>>>     on Thu, 21 Jul 2016 13:56:17 -0700 writes:

    > I appreciate the timing, so much so I changed the code to show the issue.
    > It is a problem of scale.

    > roll_lm probably has a heavy start-up cost but otherwise completely
    > out-performs those other versions at scale.  I suspect you are timing the
    > nearly  constant time start-up cost in small data.  I did give code to
    > paint a picture, but it was just cartoon code lifted from stackexchange.
    > If you want to characterize the real problem it is closer to:
    > 30 day rolling windows on 24 daily (by hour) measurements for 5 years with
    > 24+7 -1 dummy predictor variables and finally you need to do this for 300
    > sets of data.

    > Pseudo-code is closer to what follows and roll_lm can handle that input in
    > a timely manner.  You can do it with lm.fit, but you need to spend a lot of
    > time waiting.  The issue of accuracy needs a follow-up check.  Not sure why
    > it would be different.  Worth a check on that.

Just a note about lm(), lm.fit():
As Achim Zeileis noted, they are very reliable implementations
(based on QR) indeed.   For a year or so now, there's the bare
bone  .lm.fit()  which is even one order of magnitued factor
faster than lm.fit()  at least for simple cases, see the final
example on the  ?.lm.fit help page.

Martin Maechler
(who had added .lm.fit() to R )


    > Thanks,
    > Jeremiah


    > library(rbenchmark)
    > N = 30*24*12*5
    > window = 30*24
    > npred = 15  #15 chosen arbitrarily...
    > set.seed(1)
    > library(zoo)
    > library(RcppArmadillo)
    > library(roll)
    > x = matrix(rnorm(N*(npred+1)), ncol = npred+1)
    > colnames(x) <- c("y",  paste0("x", 1:npred))
    > z <- zoo(x)


    > benchmark(
    > roll_lm =  roll_lm(coredata(z[, 1, drop = F]), coredata(z[, -1, drop =
    > F]), window,
    > center = FALSE), replications=3)

    > Which comes out as:
    > test replications elapsed relative user.self sys.self user.child
    > sys.child
    > 1 roll_lm            3   6.273        1    38.312    0.654          0
    > 0





    > ## You arn't going to get that below...

    > benchmark(fastLm = rollapplyr(z, width = window,
    > function(x) coef(fastLm(cbind(1, x[, -1]), x[, 1])),
    > by.column = FALSE),
    > lm = rollapplyr(z, width = window,
    > function(x) coef(lm(y ~ ., data = as.data.frame(x))),
    > by.column = FALSE), replications=3)



    > On Thu, Jul 21, 2016 at 1:28 PM, Gabor Grothendieck <[hidden email]
    >> wrote:

    >> I would be careful about making assumptions regarding what is faster.
    >> Performance tends to be nonintuitive.
    >>
    >> When I ran rollapply/lm, rollapply/fastLm and roll_lm on the example
    >> you provided rollapply/fastLm was three times faster than roll_lm.  Of
    >> course this could change with data of different dimensions but it
    >> would be worthwhile to do actual benchmarks before making assumptions.
    >>
    >> I also noticed that roll_lm did not give the same coefficients as the
    >> other two.
    >>
    >> set.seed(1)
    >> library(zoo)
    >> library(RcppArmadillo)
    >> library(roll)
    >> z <- zoo(matrix(rnorm(10), ncol = 2))
    >> colnames(z) <- c("y", "x")
    >>
    >> ## rolling regression of width 4
    >> library(rbenchmark)
    >> benchmark(fastLm = rollapplyr(z, width = 4,
    >> function(x) coef(fastLm(cbind(1, x[, 2]), x[, 1])),
    >> by.column = FALSE),
    >> lm = rollapplyr(z, width = 4,
    >> function(x) coef(lm(y ~ x, data = as.data.frame(x))),
    >> by.column = FALSE),
    >> roll_lm =  roll_lm(coredata(z[, 1, drop = F]), coredata(z[, 2, drop =
    >> F]), 4,
    >> center = FALSE))[1:4]
    >>
    >>
    >> test replications elapsed relative
    >> 1  fastLm          100    0.22    1.000
    >> 2      lm          100    0.72    3.273
    >> 3 roll_lm          100    0.64    2.909
    >>
    >> On Thu, Jul 21, 2016 at 3:45 PM, jeremiah rounds
    >> <[hidden email]> wrote:
    >> >  Thanks all.  roll::roll_lm was essentially what I wanted.   I think
    >> maybe
    >> > I would prefer it to have options to return a few more things, but it is
    >> > the coefficients, and the remaining statistics you might want can be
    >> > calculated fast enough from there.
    >> >
    >> >
    >> > On Thu, Jul 21, 2016 at 12:36 PM, Achim Zeileis <
    >> [hidden email]>
    >> > wrote:
    >> >
    >> >> Jeremiah,
    >> >>
    >> >> for this purpose there are the "roll" and "RcppRoll" packages. Both use
    >> >> Rcpp and the former also provides rolling lm models. The latter has a
    >> >> generic interface that let's you define your own function.
    >> >>
    >> >> One thing to pay attention to, though, is the numerical reliability.
    >> >> Especially on large time series with relatively short windows there is a
    >> >> good chance of encountering numerically challenging situations. The QR
    >> >> decomposition used by lm is fairly robust while other more
    >> straightforward
    >> >> matrix multiplications may not be. This should be kept in mind when
    >> writing
    >> >> your own Rcpp code for plugging it into RcppRoll.
    >> >>
    >> >> But I haven't check what the roll package does and how reliable that
    >> is...
    >> >>
    >> >> hth,
    >> >> Z
    >> >>
    >> >>
    >> >> On Thu, 21 Jul 2016, jeremiah rounds wrote:
    >> >>
    >> >> Hi,
    >> >>>
    >> >>> A not unusual task is performing a multiple regression in a rolling
    >> window
    >> >>> on a time-series.    A standard piece of advice for doing in R is
    >> >>> something
    >> >>> like the code that follows at the end of the email.  I am currently
    >> using
    >> >>> an "embed" variant of that code and that piece of advice is out there
    >> too.
    >> >>>
    >> >>> But, it occurs to me that for such an easily specified matrix operation
    >> >>> standard R code is really slow.   rollapply constantly returns to R
    >> >>> interpreter at each window step for a new lm.   All lm is at its heart
    >> is
    >> >>> (X^t X)^(-1) * Xy,  and if you think about doing that with Rcpp in
    >> rolling
    >> >>> window you are just incrementing a counter and peeling off rows (or
    >> >>> columns
    >> >>> of X and y) of a particular window size, and following that up with
    >> some
    >> >>> matrix multiplication in a loop.   The psuedo-code for that Rcpp
    >> >>> practically writes itself and you might want a wrapper of something
    >> like:
    >> >>> rolling_lm (y=y, x=x, width=4).
    >> >>>
    >> >>> My question is this: has any of the thousands of R packages out there
    >> >>> published anything like that.  Rolling window multiple regressions that
    >> >>> stay in C/C++ until the rolling window completes?  No sense and
    >> writing it
    >> >>> if it exist.
    >> >>>
    >> >>>
    >> >>> Thanks,
    >> >>> Jeremiah
    >> >>>
    >> >>> Standard (slow) advice for "rolling window regression" follows:
    >> >>>
    >> >>>
    >> >>> set.seed(1)
    >> >>> z <- zoo(matrix(rnorm(10), ncol = 2))
    >> >>> colnames(z) <- c("y", "x")
    >> >>>
    >> >>> ## rolling regression of width 4
    >> >>> rollapply(z, width = 4,
    >> >>>   function(x) coef(lm(y ~ x, data = as.data.frame(x))),
    >> >>>   by.column = FALSE, align = "right")
    >> >>>
    >> >>> ## result is identical to
    >> >>> coef(lm(y ~ x, data = z[1:4,]))
    >> >>> coef(lm(y ~ x, data = z[2:5,]))
    >> >>>
    >> >>>         [[alternative HTML version deleted]]
    >> >>>
    >> >>> ______________________________________________
    >> >>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
    >> >>> https://stat.ethz.ch/mailman/listinfo/r-help
    >> >>> PLEASE do read the posting guide
    >> >>> http://www.R-project.org/posting-guide.html
    >> >>> and provide commented, minimal, self-contained, reproducible code.
    >> >>>
    >> >>>
    >> >
    >> >         [[alternative HTML version deleted]]
    >> >
    >> > ______________________________________________
    >> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
    >> > https://stat.ethz.ch/mailman/listinfo/r-help
    >> > PLEASE do read the posting guide
    >> http://www.R-project.org/posting-guide.html
    >> > and provide commented, minimal, self-contained, reproducible code.
    >>
    >>
    >>
    >> --
    >> Statistics & Software Consulting
    >> GKX Group, GKX Associates Inc.
    >> tel: 1-877-GKX-GROUP
    >> email: ggrothendieck at gmail.com
    >>

    > [[alternative HTML version deleted]]

    > ______________________________________________
    > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
    > https://stat.ethz.ch/mailman/listinfo/r-help
    > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    > and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.