# repeating regression


## repeating regression

Hi,

I think my problem is a bit mundane, but it's quite intriguing. Imagine I have a matrix of 10 by 2 million (10 columns, 2 million rows). The first 5 columns are x values and the last 5 are y values. I have to regress y on x (assuming a zero intercept) for each row to observe the time series of the slope. Is there any way to speed this calculation up? I tried apply, but it is still slow. Is there any trick I should know? Thank you.

Robert

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.

## Re: repeating regression

You can probably specify the problem with model.matrix and use lm.fit directly, but it's probably even better to remember that, for this simple case of one independent variable, the slope can be calculated as correlation * sd_y / sd_x and implemented directly. (Strictly, that is the slope of a fit with an intercept; with the intercept forced to zero, the slope is sum(x * y) / sum(x^2).) E.g., something like

`apply(Data, 1, function(x) sd(x[6:10]) / sd(x[1:5]) * cor(x[1:5], x[6:10]))`

You can do this even faster by stacking x and y into big vectors, taking a rolling sd and cor with window length 5, and sampling every 5th step, but it's after midnight and I'm doing a disastrous OJ project (don't ask...) for school, so I can't really think right now.

Michael

On Mon, Nov 21, 2011 at 10:52 PM, Robert A'gata <[hidden email]> wrote:
> Hi,
>
> I think my problem is a bit mundane but it's quite intriguing. Imagine
> I have a matrix of 10 by 2 million. The first 5 columns are x and the
> last 5 are y values. I have to regress y on x (assume 0 intercept) for
> each row to observe time series of the slope. I am wondering if there
> is any way to speed this calculation up? I tried with apply. But it is
> still slow. Is there any trick I should know? Thank you.
>
> Robert
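A minimal sketch of the closed-form idea, avoiding apply altogether. Because the question asks for a zero intercept, the least-squares slope for each row is sum(x * y) / sum(x^2), which rowSums can compute for all rows at once. The layout (one regression per row, x in columns 1:5, y in columns 6:10) follows the question; the simulated `Data` and the names `slope0`, `slope1` and `check0` are illustrative assumptions, not part of the thread.

```r
set.seed(42)
n <- 1000                                  # small stand-in for the 2 million rows
Data <- matrix(rnorm(n * 10), nrow = n)    # simulated data, assumption for illustration

x <- Data[, 1:5]                           # regressor values, one regression per row
y <- Data[, 6:10]                          # response values

# Zero-intercept least squares: slope = sum(x * y) / sum(x^2), one value per row
slope0 <- rowSums(x * y) / rowSums(x^2)

# With an intercept the slope is cov(x, y) / var(x), i.e. cor * sd_y / sd_x
xc <- x - rowMeans(x)                      # centre each row (recycling is row-aligned)
yc <- y - rowMeans(y)
slope1 <- rowSums(xc * yc) / rowSums(xc^2)

# Spot-check the first few rows against lm.fit with a single column and no intercept
check0 <- apply(Data[1:5, ], 1, function(r) lm.fit(cbind(r[1:5]), r[6:10])$coefficients)
```

Both versions use only elementwise arithmetic and rowSums, so the cost is a handful of passes over the matrix rather than one function call per row.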

## Re: repeating regression

You may be interested in the fastLm function from the RcppArmadillo package.

On Mon, Nov 21, 2011 at 9:52 PM, Robert A'gata <[hidden email]> wrote:
> Hi,
>
> I think my problem is a bit mundane but it's quite intriguing. Imagine
> I have a matrix of 10 by 2 million. The first 5 columns are x and the
> last 5 are y values. I have to regress y on x (assume 0 intercept) for
> each row to observe time series of the slope. I am wondering if there
> is any way to speed this calculation up? I tried with apply. But it is
> still slow. Is there any trick I should know? Thank you.
>
> Robert
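A rough sketch of what a single-row fit might look like with RcppArmadillo, assuming the package is installed. It uses `fastLmPure(X, y)`, the matrix interface that sits alongside `fastLm`, and compares it with base R's `lm.fit`. The simulated row and the names `slope_base` and `slope_arma` are assumptions for illustration. Note that calling any fitting function once per row still pays the per-call overhead two million times.

```r
set.seed(1)
row <- rnorm(10)                       # one simulated row: x in 1:5, y in 6:10
X <- matrix(row[1:5], ncol = 1)        # single regressor, no intercept column
y <- row[6:10]

# Base R reference fit without an intercept
slope_base <- unname(lm.fit(X, y)$coefficients)

# Same fit through RcppArmadillo, only if the package is available
if (requireNamespace("RcppArmadillo", quietly = TRUE)) {
  slope_arma <- as.numeric(RcppArmadillo::fastLmPure(X, y)$coefficients)
} else {
  slope_arma <- NA_real_               # package not installed; skip the comparison
}
```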