predict.lm if regression vector is longer than prediction vector


frauke
Hi everybody,

recently a member of the community pointed me to the useful predict.lm() command. While I was toying with it, I stumbled across the following problem.
I fit the regression with data from five years, but I want to predict with predict.lm for only one year. Thus my data frame for predict.lm(mod, newdata=dataframe) is shorter than the original vector that I fitted the regression with. It gives the following warning:
Warning message:
'newdata' had 365 rows but variable(s) found have 1825 rows
Of course I can extend the new data frame with a few thousand NAs, but is there a more elegant solution?

Thank you! Frauke
Re: predict.lm if regression vector is longer than prediction vector

Stephen Ellison
> Of course I can extend the new data frame with a few thousand
> NAs, but is there a more elegant solution?
That should not be necessary: predict.lm should work on any number of newdata rows, whether longer or shorter than the original data set.
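
A minimal sketch of that claim, using made-up data (the names train, new_year and mod and the simulated values are assumptions for illustration, not the poster's data): a model fitted on 1825 rows is asked for predictions on 365 new rows.

```r
# Hypothetical data standing in for five years of daily values (1825 rows)
set.seed(1)
train <- data.frame(x = rnorm(1825))
train$y <- 2 + 3 * train$x + rnorm(1825)

# Fit with a formula that uses bare column names plus data=
mod <- lm(y ~ x, data = train)

# One year of new predictor values (365 rows), same column name as in the fit
new_year <- data.frame(x = rnorm(365))

pred <- predict(mod, newdata = new_year)
length(pred)  # one prediction per row of new_year
```

When the column names in newdata match the names used in the formula, the number of rows in newdata does not have to match the fitting data.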

However, the help page for predict.lm says (among other things)

    "If the fit is rank-deficient, some of the columns of the design
     matrix will have been dropped.  Prediction from such a fit only
     makes sense if 'newdata' is contained in the same subspace as the
     original data.  That cannot be checked accurately, so a warning is
     issued."

Could that be the situation you are in? If it is, it's not the new data that causes the problem, but the original fit.

S Ellison


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: predict.lm if regression vector is longer than prediction vector

William Dunlap
In reply to this post by frauke
This can happen if your newdata data.frame does not include
all the predictors required by the formula in the model. In that
case predict will look for the missing predictors in the environment
of the model formula (usually the workspace where the model was
fitted), and those will generally not match what is in your newdata. E.g.,

> x1 <- 1:6
> x2 <- 1/(1:6)
> y <- log(1:6)
> fit <- lm(y ~ x1 + x2)
> predict(fit)
           1            2            3            4            5            6
-0.008176128  0.725397589  1.089747865  1.361792281  1.596914353  1.813575253
> predict(fit, newdata=data.frame(x2=1:5)) # didn't supply x1
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
  variable lengths differ (found for 'x2')
In addition: Warning message:
'newdata' had 5 rows but variable(s) found have 6 rows

Put all the required variables into newdata and things are fine:
> predict(fit, newdata=data.frame(x2=1:5, x1=sin(1:5)))
         1          2          3          4          5
-0.0366699 -1.1321492 -2.3778906 -3.6469522 -4.7909516
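
One way to catch this before calling predict is to compare the predictor names in the fitted model with the column names of newdata. This is only a sketch that reuses the fit from the example above; the names needed and missing_vars are made up for illustration.

```r
# Same toy model as in the example above
x1 <- 1:6
x2 <- 1/(1:6)
y <- log(1:6)
fit <- lm(y ~ x1 + x2)

# newdata that lacks x1
newdata <- data.frame(x2 = 1:5)

# Variable names on the right-hand side of the model formula
needed <- all.vars(delete.response(terms(fit)))

# Predictors the model needs that newdata does not supply
missing_vars <- setdiff(needed, names(newdata))
missing_vars
```

If missing_vars is not empty, predict will try to find those variables outside newdata, which is what leads to the mismatch in row counts.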

You can also get this problem if newdata is an environment or list
instead of a data.frame, because only data.frame forces all of
its components to have the same length.
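
A small sketch of the difference, with made-up component lengths (bad_list and df_result are hypothetical names): a list accepts components of unequal length without complaint, whereas data.frame() stops when the lengths cannot be recycled to a common number of rows.

```r
# A list accepts components of different lengths silently
bad_list <- list(x1 = 1:5, x2 = 1:4)
sapply(bad_list, length)

# data.frame() refuses lengths 5 and 4; capture the error instead of stopping
df_result <- tryCatch(
  data.frame(x1 = 1:5, x2 = 1:4),
  error = function(e) e
)
inherits(df_result, "error")
```

Note that data.frame() does recycle shorter vectors whose length divides the longest one evenly, so it guarantees equal lengths in the result rather than in the input.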


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


Re: predict.lm if regression vector is longer than prediction vector

glsnow
In reply to this post by frauke
The most common case in which I see that error is when someone fits
their model using syntax like:

fit <- lm( mydata$y ~ mydata$x )

instead of the preferred method:

fit <- lm( y ~ x, data=mydata )

If this is what you did, and it is why you are getting the error, the
fix is to refit the model the second, preferred way, so that predict
can match the variable names in the formula to the columns of newdata.
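
The two ways of fitting can be compared side by side. This is a sketch with simulated data; mydata, newdata, fit_bad and fit_good are illustrative names, and the warning is suppressed only so that the lengths can be compared.

```r
# Hypothetical fitting data (20 rows) and new data (5 rows)
set.seed(42)
mydata <- data.frame(x = 1:20)
mydata$y <- 1 + 2 * mydata$x + rnorm(20)
newdata <- data.frame(x = 21:25)

# First way: the formula refers to mydata$x, which is not a column of
# newdata, so predict falls back on the original 20 values and warns
fit_bad <- lm(mydata$y ~ mydata$x)
p_bad <- suppressWarnings(predict(fit_bad, newdata = newdata))
length(p_bad)   # 20, not 5

# Preferred way: the formula refers to x, which newdata supplies
fit_good <- lm(y ~ x, data = mydata)
p_good <- predict(fit_good, newdata = newdata)
length(p_good)  # 5
```

With the first fit, the values returned are simply the fitted values for the original data, so newdata is effectively ignored.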


--
Gregory (Greg) L. Snow Ph.D.
[hidden email]
