repeating regression

repeating regression

Robert A'gata
Hi,

I think my problem is a bit mundane, but it's quite intriguing. Imagine
I have a matrix with 2 million rows and 10 columns. The first 5 columns
are x values and the last 5 are y values. For each row, I have to
regress y on x (assuming a zero intercept) to observe the time series of
the slope. Is there any way to speed this calculation up? I tried apply,
but it is still slow. Is there any trick I should know? Thank you.

Robert

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.
Reply | Threaded
Open this post in threaded view
|

Re: repeating regression

Michael Weylandt
You can probably specify the problem with model.matrix and use lm.fit
directly, but what's probably even better is to remember that, for this
simple case of one independent variable, the slope can be calculated as
correlation * sd_y / sd_x, and to implement that directly, e.g.,
something like

apply(Data, 1, function(x) sd(x[6:10]) / sd(x[1:5]) * cor(x[1:5], x[6:10]))

You can do this even faster by stacking x and y into long vectors,
computing a rolling sd and cor with a window length of 5, and keeping
every 5th value, but it's after midnight and I'm doing a disastrous OJ
project (don't ask....) for school so I can't really think right now.
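
A vectorised variant of the formula above, as a minimal sketch rather than the rolling-window approach just described: since cor(x, y) * sd(y) / sd(x) equals cov(x, y) / var(x), the slope for every row can be computed in one pass from row-wise centred sums, with no call to apply. The names n_rows, X, Y, Xc, Yc and slope_vec are illustrative, and the data is simulated. Note that this is the slope of a regression that includes an intercept.

```r
set.seed(1)
n_rows <- 1000                      # small stand-in for the 2 million rows
Data <- cbind(matrix(rnorm(5 * n_rows), ncol = 5),   # columns 1:5  = x
              matrix(rnorm(5 * n_rows), ncol = 5))   # columns 6:10 = y

X <- Data[, 1:5]
Y <- Data[, 6:10]

# Centre each row; rowMeans() has one value per row, and R's column-major
# recycling lines that value up with the matching row of the matrix.
Xc <- X - rowMeans(X)
Yc <- Y - rowMeans(Y)

# cov(x, y) / var(x) per row; the 1 / (n - 1) factors cancel
slope_vec <- rowSums(Xc * Yc) / rowSums(Xc * Xc)

# Same quantity via the apply() one-liner, for comparison
slope_apply <- apply(Data, 1, function(x) {
  sd(x[6:10]) / sd(x[1:5]) * cor(x[1:5], x[6:10])
})

print(all.equal(slope_vec, slope_apply))
```

Wrapping each version in system.time() should show how much of the cost comes from the per-row function calls.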

Michael

On Mon, Nov 21, 2011 at 10:52 PM, Robert A'gata <[hidden email]> wrote:

> Hi,
>
> I think my problem is a bit mundane but it's quite intriguing. Imagine
> I have a matrix of 10 by 2 million. The first 5 columns are x and the
> last 5 are y values. I have to regress y on x (assume 0 intercept) for
> each row to observe time series of the slope. I am wondering if there
> is any way to speed this calculation up? I tried with apply. But it is
> still slow. Is there any trick I should know? Thank you.
>
> Robert

Re: repeating regression

gsee
In reply to this post by Robert A'gata
You may be interested in the fastLm function from the RcppArmadillo package.
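
A minimal sketch of what a single through-the-origin fit looks like with a low-level fitter, assuming a one-column design matrix with no column of ones. It uses base R's lm.fit and, only if RcppArmadillo happens to be installed, fastLmPure (the matrix interface that sits beside fastLm in that package). The toy x and y values and the names beta_sum, beta_base and beta_arma are illustrative.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# One-column design matrix: no column of ones, hence no intercept
X <- matrix(x, ncol = 1)

# Closed form for regression through the origin
beta_sum <- sum(x * y) / sum(x * x)

# Base R: lm.fit() skips the formula machinery used by lm()
beta_base <- unname(lm.fit(X, y)$coefficients)
print(all.equal(beta_base, beta_sum))

# Optional: the same fit with RcppArmadillo, if the package is available
if (requireNamespace("RcppArmadillo", quietly = TRUE)) {
  beta_arma <- as.numeric(RcppArmadillo::fastLmPure(X, y)$coefficients)
  print(all.equal(beta_arma, beta_sum))
}
```

Either fitter would still be called once per row, so for 2 million tiny regressions the per-call overhead is likely to remain the bottleneck.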

On Mon, Nov 21, 2011 at 9:52 PM, Robert A'gata <[hidden email]> wrote:

> Hi,
>
> I think my problem is a bit mundane but it's quite intriguing. Imagine
> I have a matrix of 10 by 2 million. The first 5 columns are x and the
> last 5 are y values. I have to regress y on x (assume 0 intercept) for
> each row to observe time series of the slope. I am wondering if there
> is any way to speed this calculation up? I tried with apply. But it is
> still slow. Is there any trick I should know? Thank you.
>
> Robert

Using Grammatical Evolution to generate trading rules

Immanuel-2
Hello all,

I recently read the book "Biologically Inspired Algorithms for Financial
Modeling" and got hooked on the idea of using Grammatical Evolution to
generate trading rules (examples are presented in the book).

I decided to use GEVA (http://ncra.ucd.ie/Site/GEVA.html) as the
grammatical evolution engine and to evaluate the generated trading rules
in R. I managed to hack together a proof of concept. A quick and dirty
overview can be found at:
http://www.slideshare.net/my_slides_/overview-10277187

I'm now thinking about putting more effort into a better setup, with the
goal of making the generation and testing of serious trading rules
feasible.

Any suggestions on how to approach this? Is anyone interested in
collaborating on this project?

best regards,
Immanuel

Re: repeating regression

Richard Herron
In reply to this post by gsee
The beta = cov(x, y) / var(x) formula proposed above assumes an
intercept, but it sounds like you want to run the regression through the
origin, for which beta = sum(x*y) / sum(x*x). Computing it with row sums
also gets you quite a speed boost. Below is some code.

That said, I imagine that with only five observations per regression you
won't be able to statistically differentiate between the betas. I
suggest rolling regressions instead, using differenced cumulative sums
to build the cov(x, y) and var(x) terms. HTH.

mat_x <- matrix(5 + rnorm(5*2e4), ncol = 5)
mat_epsilon <- matrix(rnorm(5*2e4, mean = 0, sd = 0.1), ncol = 5)
mat_y <- 5 + 5*mat_x + mat_epsilon
mat_xy <- cbind(mat_x, mat_y)

# doing the regression with cov/var assumes an intercept
fun_beta_cov <- function(x) {
    cov(x[1:5], x[6:10]) / var(x[1:5])
}
system.time({
    beta_1 <- apply(mat_xy, 1, FUN = fun_beta_cov)
})

# doing the regression with summations
system.time({
    beta_2 <- rowSums(mat_xy[, 1:5] * mat_xy[, 6:10]) /
        rowSums(mat_xy[, 1:5] * mat_xy[, 1:5])
})

# doing the regression with `lm` without intercept
fun_beta_lm <- function(x) {
    lm(x[6:10] ~ x[1:5] - 1)$coefficients[1]
}
system.time({
    beta_3 <- apply(mat_xy, 1, FUN = fun_beta_lm)
})

# doing the regression with `lm` with intercept
fun_beta_lm_int <- function(x) {
    lm(x[6:10] ~ x[1:5])$coefficients[2]
}
system.time({
    beta_3_int <- apply(mat_xy, 1, FUN = fun_beta_lm_int)
})

# results
head(beta_1)
head(beta_2)
head(beta_3)
head(beta_3_int)
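
The rolling-regression suggestion above (differenced cumulative sums) can be sketched as follows. This is one possible reading of it, not a tested implementation: the helper roll_sum, the window length k and the simulated series are assumptions made for illustration.

```r
# Sum over each length-k window of v, from differenced cumulative sums
roll_sum <- function(v, k) {
  cs <- cumsum(v)
  cs[k:length(v)] - c(0, cs[seq_len(length(v) - k)])
}

set.seed(42)
n <- 200   # length of the series (illustrative)
k <- 20    # rolling window length (illustrative)
x <- 5 + rnorm(n)
y <- 5 + 5 * x + rnorm(n, sd = 0.1)

s_x  <- roll_sum(x, k)
s_y  <- roll_sum(y, k)
s_xy <- roll_sum(x * y, k)
s_xx <- roll_sum(x * x, k)

# Rolling slope through the origin: sum(x*y) / sum(x*x) per window
beta_origin <- s_xy / s_xx

# Rolling slope with an intercept: cov(x, y) / var(x) per window,
# where the 1 / (k - 1) factors cancel
beta_int <- (s_xy - s_x * s_y / k) / (s_xx - s_x^2 / k)

# Compare the first window with lm()
w <- 1:k
print(c(beta_origin[1], coef(lm(y[w] ~ x[w] - 1))))
print(c(beta_int[1], coef(lm(y[w] ~ x[w]))[2]))
```

Each rolling quantity costs one cumsum and one subtraction regardless of k. On very long series the cumulative sums can lose floating-point precision, so spot-checking a few windows against lm is worthwhile.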

On Tue, Nov 22, 2011 at 10:34, G See <[hidden email]> wrote:

> You may be interested in the fastLM function from the RcppArmadillo package
>
> On Mon, Nov 21, 2011 at 9:52 PM, Robert A'gata <[hidden email]> wrote:
>
>> Hi,
>>
>> I think my problem is a bit mundane but it's quite intriguing. Imagine
>> I have a matrix of 10 by 2 million. The first 5 columns are x and the
>> last 5 are y values. I have to regress y on x (assume 0 intercept) for
>> each row to observe time series of the slope. I am wondering if there
>> is any way to speed this calculation up? I tried with apply. But it is
>> still slow. Is there any trick I should know? Thank you.
>>
>> Robert