glm predict on new data

glm predict on new data

dirknbr
I am aware this has been asked before but I could not find a resolution.

I am doing a logit

lg <- glm(y[1:200] ~ x[1:200,1],family=binomial)

Then I want to predict a new set

pred <- predict(lg,x[201:250,1],type="response")

But I get varying error messages or warnings about the different number of rows. I have tried data/newdata and also wrapping the input in data.frame(), but cannot get it to work.

Help would be appreciated.

Dirk.

Re: glm predict on new data

"Dénes TÓTH"

Dear Dirk,

You should avoid indexing inside the glm call; otherwise the names of the
terms will contain the indexing expression. (Check str(lg) in your example.)
A preferable solution is to use predefined data frames in both calls:
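n <- 250
x <- rnorm(n)
noise <- rnorm(n,0,0.3)
# binary response: 1 when the inverse logit of x+noise exceeds 0.5
y <- round(exp(x+noise)/(1+exp(x+noise)),digits=0)
datfr <- data.frame(x=x,y=y)
# fit on rows 1..200, then predict on rows 201..n of the same data frame
lg <- glm(y~x,data=datfr[1:200,],family="binomial")
pred <- predict(lg,newdata=datfr[201:n,],type="response")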
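
To make the point about term names concrete, here is a minimal sketch that compares the coefficient names of the two fits. It assumes x was a one-column matrix in the original post (the post does not say); the seed and the names lg_indexed, lg_clean, indexed_names and clean_names are made up for illustration.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
n <- 250
x <- matrix(rnorm(n), ncol = 1)   # assumption: x is a one-column matrix
y <- rbinom(n, 1, plogis(x[, 1]))

# Indexing inside the formula: the index becomes part of the term name,
# so predict() has nothing called "x" to look up in newdata.
lg_indexed <- glm(y[1:200] ~ x[1:200, 1], family = binomial)
indexed_names <- names(coef(lg_indexed))
print(indexed_names)

# Data-frame version: the term is simply called "x".
datfr <- data.frame(x = x[, 1], y = y)
lg_clean <- glm(y ~ x, data = datfr[1:200, ], family = binomial)
clean_names <- names(coef(lg_clean))
print(clean_names)
```

With the data-frame version, any newdata that has a column named x can be matched to the model term.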

HTH,
  Denes



______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: glm predict on new data

Brian Diggs
In reply to this post by dirknbr
On 4/6/2011 2:17 PM, dirknbr wrote:
> I am aware this has been asked before but I could not find a resolution.
>
> I am doing a logit
>
> lg<- glm(y[1:200] ~ x[1:200,1],family=binomial)

glm (like most modeling functions) is designed to work with variables in a
data frame passed via the data argument, not with indexed raw vectors.
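
As a hedged sketch of what working with a data frame can look like: one option is to keep all rows in a single data frame and let glm's subset argument pick the training rows, so no indexing appears in the formula. The data, the seed and the names d, fit, held_out and p are invented for illustration.

```r
set.seed(1)                                   # arbitrary seed
d <- data.frame(x = rnorm(250))
d$y <- rbinom(250, 1, plogis(d$x))            # simulated binary response

# Train on rows 1..200 via subset=; the formula stays a plain y ~ x.
fit <- glm(y ~ x, data = d, subset = 1:200, family = binomial)

# Predict on the held-out rows; newdata only needs a column named x.
held_out <- d[201:250, ]
p <- predict(fit, newdata = held_out, type = "response")
length(p)
```
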

> Then I want to predict a new set
>
> pred<- predict(lg,x[201:250,1],type="response")
>
> But I get varying error messages or warnings about the different number of
> rows. I  have tried data/newdata and also to wrap in data.frame() but cannot
> get to work.

I'll make up some data, show the way you approached it, show where it
went wrong, and then show how it works more easily.

# data like what I think you had:
y <- rbinom(200, 1, prob=.8)
x <- data.frame(x=rnorm(250))

# your glm call:
lg <- glm(y[1:200]~x[1:200,1],family=binomial)

# take a look at print(lg).  Notice that your independent variable
# name is "x[1:200, 1]", which is what you would need to match in
# a call to predict.

# Make data.frames of the given and testing data.
DF <- data.frame(y=y, x=x[1:200,1])
DF.new <- data.frame(x=x[201:250,1])
# Notice DF.new uses the same column name (x) as DF.

lg <- glm(y~x, data=DF, family=binomial)
pred <- predict(lg, newdata=DF.new, type="response")
summary(pred)
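
A quick sanity check on the result can catch the row-count problem from the original question. The following is only a sketch on simulated data of the same shape (the seed is arbitrary, and eta is a name introduced here): it confirms that predict returns one value per row of newdata and that type="response" is the inverse logit of the link-scale prediction.

```r
set.seed(123)                                  # arbitrary seed
x <- data.frame(x = rnorm(250))
y <- rbinom(200, 1, prob = 0.8)

DF     <- data.frame(y = y, x = x[1:200, 1])
DF.new <- data.frame(x = x[201:250, 1])

lg   <- glm(y ~ x, data = DF, family = binomial)
pred <- predict(lg, newdata = DF.new, type = "response")

# One prediction per new row, each a probability in [0, 1].
stopifnot(length(pred) == nrow(DF.new))
stopifnot(all(pred >= 0 & pred <= 1))

# type = "link" gives the linear predictor; plogis() maps it to probabilities.
eta <- predict(lg, newdata = DF.new, type = "link")
stopifnot(isTRUE(all.equal(unname(plogis(eta)), unname(pred))))
```

If the length check fails, the usual cause is that the column names in newdata do not match the term names in the model.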

> Help would be appreciated.
>
> Dirk.

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
