Error in lm prediction


Error in lm prediction

amol gupta
Hi

I am most likely making a mistake when predicting with a linear regression
(lm) model. Please help me figure out what I am doing wrong. I am trying to
regress an index on its constituents. Here is the code:


# split the time series into two parts
a<-300;

x1<-x[1:a,];
y1<-y[1:a,];

x2<-x[(a+1):nrow(x),];
y2<-y[(a+1):nrow(y),];


#regression
m1<-lm( y1~x1)
r1<-residuals(m1)
coef(m1)

##out of sample
y_hat<-predict.lm(m1,x2);
r2<-y_hat-y2;


x and y are xts objects; x contains multiple time series. y_hat turns out
to have only 300 samples, whereas x2 contains 1400 samples.

Please help me figure out how to predict using the model that I have
fitted by regression.


--
Regards
Amol
+91-9897860992
+91-8889676918


_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.

Re: Error in lm prediction

Ed Herranz
Hi Amol,

My guess is that you can't use lm() directly on xts objects.  See this:

https://stackoverflow.com/questions/21692560/linear-regression-with-xts-object

Regards,
-Ed


Re: Error in lm prediction

Joshua Ulrich
On Mon, Jul 24, 2017 at 1:10 PM, Ed Herranz <[hidden email]> wrote:
> Hi Amol,
>
> My guess is that you can't use lm() directly on xts objects.  See this:
>
> https://stackoverflow.com/questions/21692560/linear-regression-with-xts-object
>
Bad guess. :)

library(xts)
data(sample_matrix)
xtsObject <- as.xts(sample_matrix)
xtsObject$t <- seq_len(nrow(xtsObject))-1
lm(Open ~ t, data=xtsObject)
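
Extending the example above to out-of-sample prediction, a minimal sketch
could look like the following. The split point of 120 rows, the choice of
Close as the response, and the names train, test, m and r_oos are arbitrary
assumptions for illustration; the xts package is assumed to be installed.
Each piece is converted to a data frame so that predict() can find the
predictors by column name.

```r
library(xts)
data(sample_matrix)                      # columns: Open, High, Low, Close
xtsObject <- as.xts(sample_matrix)

a <- 120                                 # arbitrary in-sample length (assumption)
train <- as.data.frame(xtsObject[1:a, ])
test  <- as.data.frame(xtsObject[(a + 1):nrow(xtsObject), ])

# The formula uses column names; the data come in through `data`
m <- lm(Close ~ Open + High + Low, data = train)

# newdata has columns with the same names, so every test row is predicted
y_hat <- predict(m, newdata = test)
r_oos <- y_hat - test$Close              # out-of-sample prediction errors
length(y_hat) == nrow(test)
```

Because the formula refers to columns rather than to objects in the
workspace, the same fitted model can be applied to any new data set that
has those columns.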

> On Sun, Jul 16, 2017 at 4:31 PM, amol gupta <[hidden email]> wrote:
>> Please help me figure out how to predict using model that I have found
>> using regression.
>>
It's very difficult to help if you do not provide a reproducible
example.  Most people do not have, and will not spend, the time it
takes to imagine and create data required to reproduce the issue you
describe.

Please see: https://stackoverflow.com/q/5963269/271616
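
A self-contained example of the kind requested might look like the sketch
below. It uses simulated matrices as stand-ins for the xts objects (the
seed, sizes and coefficients are made up for illustration) and passes the
new data as a data frame. Under those assumptions it reproduces the
symptom described: 300 predictions for 1400 new rows.

```r
set.seed(1)                                   # repeatable simulated data
n <- 1700
a <- 300
x <- matrix(rnorm(n * 5), ncol = 5)           # hypothetical constituents
y <- x %*% rep(0.2, 5) + rnorm(n, sd = 0.1)   # hypothetical index

x1 <- x[1:a, ]
y1 <- y[1:a, ]
x2 <- x[(a + 1):n, ]
y2 <- y[(a + 1):n, ]

# The formula refers to the workspace objects y1 and x1 by name
m1 <- lm(y1 ~ x1)

# newdata has columns V1..V5 but nothing called x1, so predict() falls
# back on the 300-row x1 in the workspace. R normally warns that newdata
# had 1400 rows but the variables found have 300; the warning is
# suppressed here only to keep the output short.
y_hat <- suppressWarnings(predict(m1, as.data.frame(x2)))

length(y_hat)   # 300
nrow(x2)        # 1400
```

The 300 values returned are simply the in-sample fitted values, which is
why they cannot be compared with y2.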




--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com
R/Finance 2017 | www.rinfinance.com


Re: Error in lm prediction

Kevin Dhingra
In reply to this post by amol gupta
Hi Amol,

The lm function is not intended to be used the way you are calling it.
Although you can pass the data objects themselves in the formula argument
(y ~ x), it is better to pass the data set in the data argument and use
column names in the formula, especially when you want to call predict on
the fitted object. predict.lm looks up the variables named in the formula
in newdata and, if it does not find them there, in the environment of the
formula. In your example the formula refers to x1, which is not a column
of newdata, so the 300-row x1 from your workspace is used instead and
y_hat ends up with length 300.

There might be some clever way to get around this with the same function
call that you used (you can try making the variable name in the new data
match the name used for x in the formula), but I would rather suggest
this:

a <- 300
# Simulated data: five predictors (x.1 ... x.5) and a response y, 1700 rows
data_fit = data.frame(x = matrix(rnorm(1700*5), ncol = 5),
                      y = matrix(rnorm(1700)))
data_fit_is = data_fit[1:a, ]                    # in sample
data_fit_os = data_fit[(a+1):nrow(data_fit), ]   # out of sample
m1 = lm(y ~ ., data = data_fit_is)
length(predict(m1, data_fit_os[, 1:5]))  # should now be 1400, not 300
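
The workaround hinted at above (giving the new data the same variable name
that the formula uses) could be sketched as follows. This is only an
illustration with simulated plain matrices, not xts objects; the seed and
sizes are made up, and wrapping the matrix in I() to store it as a single
data frame column named x1 is one possible way to do it.

```r
set.seed(1)                           # repeatable simulated data
n <- 1700
a <- 300
x <- matrix(rnorm(n * 5), ncol = 5)   # hypothetical predictors
y <- rnorm(n)                         # hypothetical response

x1 <- x[1:a, ]
y1 <- y[1:a]
x2 <- x[(a + 1):n, ]

m1 <- lm(y1 ~ x1)                     # same style of call as in the question

# Store the out-of-sample matrix as ONE column called x1, so that
# predict() finds x1 in newdata instead of in the workspace
newdata <- data.frame(x1 = I(x2))
y_hat <- predict(m1, newdata = newdata)

length(y_hat)   # 1400
```

Passing the data through the data argument, as in the code above, is
still the less fragile option, because it does not depend on object names
lining up.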

Regards,
Kshitij Dhingra




--
Kshitij Dhingra
Applied Academics LLC
Office: +1.917.262.0516
Mobile: +1.206.696.5945
Email: [hidden email]
Website: http://www.AppliedAcademics.com


Re: Error in lm prediction

amol gupta
All,

Thank you all for the responses. I resolved the issue by separating the
formula and the data; that is where I was making the mistake.

Joshua Ulrich,
I will try to send a data set along, or otherwise make sure that the
problem is reproducible.






--
Regards
Amol
+91-9897860992
+91-8889676918
