The proposed beta = cov(x, y) / var(x) assumes an intercept, but it
sounds like you want to run the regression through the origin, in
which case beta = sum(x*y) / sum(x*x). Computing it with vectorized
row sums also gets you quite a speed boost over calling a function on
every row. Below is some code.

That said, I imagine that with only five observations per regression
you won't be able to statistically distinguish the betas from one
another. I suggest rolling regressions, using differenced cumulative
sums to build the cov(x, y) and var(x) terms. HTH.

mat_x <- matrix(5 + rnorm(5*2e4), ncol = 5)
mat_epsilon <- matrix(rnorm(5*2e4, mean = 0, sd = 0.1), ncol = 5)
mat_y <- 5 + 5*mat_x + mat_epsilon
mat_xy <- cbind(mat_x, mat_y)

# doing the regression with cov/var assumes an intercept
fun_beta_cov <- function(x) {
  cov(x[1:5], x[6:10]) / var(x[1:5])
}
system.time({
  beta_1 <- apply(mat_xy, 1, FUN = fun_beta_cov)
})

# doing the regression with summations
system.time({
  beta_2 <- rowSums(mat_xy[, 1:5]*mat_xy[, 6:10]) /
    rowSums(mat_xy[, 1:5]*mat_xy[, 1:5])
})

# doing the regression with `lm` without intercept
fun_beta_lm <- function(x) {
  lm(x[6:10] ~ x[1:5] - 1)$coefficients[1]
}
system.time({
  beta_3 <- apply(mat_xy, 1, FUN = fun_beta_lm)
})

# doing the regression with `lm` with intercept
fun_beta_lm_int <- function(x) {
  lm(x[6:10] ~ x[1:5])$coefficients[2]
}
system.time({
  beta_3_int <- apply(mat_xy, 1, FUN = fun_beta_lm_int)
})

# results
head(beta_1)
head(beta_2)
head(beta_3)
head(beta_3_int)
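
For what it's worth, the rolling-regression idea (window sums obtained
by differencing cumulative sums) could be sketched roughly as below.
This is only an illustration under some assumptions: roll_beta, its
arguments, and the window width are made-up names and values, and x
and y are taken to be plain numeric vectors of one long series with no
missing values.

```r
# Hypothetical helper: rolling slope of y on x over windows of `width`
# observations, built from differenced cumulative sums.
roll_beta <- function(x, y, width, intercept = TRUE) {
  stopifnot(length(x) == length(y), width >= 2, width <= length(x))
  n <- length(x)
  # Sum over each full window: cs[t + 1] - cs[t - width + 1], where the
  # leading 0 lets the first window be handled like all the others.
  win_sum <- function(v) {
    cs <- c(0, cumsum(v))
    cs[(width + 1):(n + 1)] - cs[1:(n - width + 1)]
  }
  s_x  <- win_sum(x)
  s_y  <- win_sum(y)
  s_xy <- win_sum(x * y)
  s_xx <- win_sum(x * x)
  if (intercept) {
    # cov(x, y) / var(x); the 1 / (width - 1) factors cancel
    (s_xy - s_x * s_y / width) / (s_xx - s_x^2 / width)
  } else {
    # regression through the origin
    s_xy / s_xx
  }
}

set.seed(1)
x <- 5 + rnorm(1000)
y <- 5 + 5 * x + rnorm(1000, sd = 0.1)
beta_roll <- roll_beta(x, y, width = 5)

# sanity check of the first window against the direct calculation
all.equal(beta_roll[1], cov(x[1:5], y[1:5]) / var(x[1:5]))
```

Element k of the result covers observations k to k + width - 1. On
very long series, or with values far from zero, differencing large
cumulative sums may lose some precision, so it is probably worth
spot-checking a few windows against cov()/var() as above.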

You may be interested in the fastLm function from the RcppArmadillo package.
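
A minimal sketch of what that might look like for a single row,
assuming RcppArmadillo is installed. It uses fastLmPure(), the
matrix-in/list-out variant in the same package; the data and variable
names here are made up for illustration.

```r
# Sketch only: skipped if the RcppArmadillo package is not installed.
if (requireNamespace("RcppArmadillo", quietly = TRUE)) {
  set.seed(1)
  x <- 5 + rnorm(5)
  y <- 5 * x + rnorm(5, sd = 0.1)

  # fastLmPure() takes a model matrix and a response vector. A one-column
  # matrix with no column of ones gives the regression through the origin.
  fit <- RcppArmadillo::fastLmPure(matrix(x, ncol = 1), y)
  beta_fast <- as.numeric(fit$coefficients)

  # should agree with the closed form sum(x*y) / sum(x*x)
  print(all.equal(beta_fast, sum(x * y) / sum(x * x)))
}
```

Called once per row through apply, this would likely still carry
per-call overhead, so for a single regressor the row-sum formula is
probably hard to beat.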

>
>> I think my problem is a bit mundane but it's quite intriguing. Imagine
>> I have a matrix of 10 by 2 million. The first 5 columns are x and the
>> last 5 are y values. I have to regress y on x (assume 0 intercept) for
>> each row to observe time series of the slope. I am wondering if there
>> is any way to speed this calculation up? I tried with apply. But it is
>> still slow. Is there any trick I should know? Thank you.

