Names of variables needed in newdata for predict.glm

Bendix Carstensen
I would like to extract the names, modes [numeric/factor] and levels
of the variables needed in a data frame supplied as the newdata= argument
to predict.glm().

Here is a small example illustrating my troubles; what I want from
(both of) the glm objects is the vector c("x","f","Y") and an
indication that f is a factor:

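library( splines )
# simulated data: D = 0/1 response, x = numeric covariate, f = 4-level factor, Y = positive offset variable
dd <- data.frame( D = sample(0:1, 200, replace=TRUE),
                  x = abs(rnorm(200)),
                  f = factor(sample(letters[1:4], 200, replace=TRUE)),
                  Y = runif(200, 0.5, 10) )
# mx: offset given as a separate argument; mi: offset given inside the formula
mx <- glm( D ~ ns(x, knots=1:2, Boundary.knots=c(0,5)) + f:I(x^2), offset=log(Y), family=poisson, data=dd )
mi <- glm( D ~ ns(x, knots=1:2, Boundary.knots=c(0,5)) + f:I(x^2) + offset(log(Y)), family=poisson, data=dd )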

attr(mx$terms,"dataClasses")
attr(mi$terms,"dataClasses")
mi$xlevels
mx$xlevels

...so far not quite there.

Regards,

Bendix Carstensen

Senior Statistician
Steno Diabetes Center
Clinical Epidemiology
Niels Steensens Vej 2-4
DK-2820 Gentofte, Denmark
[hidden email]
[hidden email]
http://BendixCarstensen.com

________________________________


This e-mail contains confidential information. If you are not the intended recipient of this e-mail, or if you have received it by mistake, please inform the sender of the error by using the reply function. Please also delete the e-mail immediately without forwarding or copying it.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Names of variables needed in newdata for predict.glm

R help mailing list-2
Hi,

A few attempts:
 > names(mi$xlevels)
[1] "f"
 > all.vars(mi$formula)
[1] "D" "x" "f" "Y"
 > names(mx$xlevels)
[1] "f"
 > all.vars(mx$formula)
[1] "D" "x" "f"

When the offset is specified outside the formula, this does not work: all.vars() misses "Y".
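
A minimal sketch of one way to cover both cases, assuming the offset was written directly in the glm() call as an expression such as offset=log(Y). The helper name predictor_vars is hypothetical, and set.seed() is added only to make the example reproducible. It collects the variables on the right-hand side of the formula and adds any variables found in the offset argument of the stored call:

```r
library(splines)

set.seed(1)  # only for reproducibility of this illustration
dd <- data.frame(D = sample(0:1, 200, replace = TRUE),
                 x = abs(rnorm(200)),
                 f = factor(sample(letters[1:4], 200, replace = TRUE)),
                 Y = runif(200, 0.5, 10))
mx <- glm(D ~ ns(x, knots = 1:2, Boundary.knots = c(0, 5)) + f:I(x^2),
          offset = log(Y), family = poisson, data = dd)
mi <- glm(D ~ ns(x, knots = 1:2, Boundary.knots = c(0, 5)) + f:I(x^2) + offset(log(Y)),
          family = poisson, data = dd)

# predictor_vars() is a hypothetical helper, not part of any package
predictor_vars <- function(fit) {
  # variables on the right-hand side of the formula (response dropped)
  rhs <- all.vars(delete.response(terms(fit)))
  # variables in an offset= argument, if one was given in the call;
  # all.vars(NULL) is character(0), so a missing offset adds nothing
  off <- all.vars(fit$call$offset)
  union(rhs, off)
}

predictor_vars(mx)
predictor_vars(mi)
```

Like all.vars() itself, this would also pick up non-data objects used inside the formula, such as a vector holding spline knots.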

Marc

On 07/03/2018 at 06:20, Bendix Carstensen wrote:

> I would like to extract the names, modes [numeric/factor] and levels
> of variables needed in a data frame supplied as newdata= argument to
> predict.glm()

Re: Names of variables needed in newdata for predict.glm

Marc Schwartz-3
Hi Bendix,

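glm() stores the data frame passed as its 'data' argument in the 'data' component of the fitted object (the model frame itself is kept in the 'model' component if the 'model' argument is TRUE, the default). You can get the structure of the data that was used to fit the model by using: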

> str(mx$data)
'data.frame': 200 obs. of  4 variables:
 $ D: int  0 1 0 1 1 0 1 1 1 1 ...
 $ x: num  0.705 2.15 0.572 1.249 0.807 ...
 $ f: Factor w/ 4 levels "a","b","c","d": 1 4 1 4 4 1 4 2 4 4 ...
 $ Y: num  0.787 8.267 3.085 5.738 9.593 ...


> str(mi$data)
'data.frame': 200 obs. of  4 variables:
 $ D: int  0 1 0 1 1 0 1 1 1 1 ...
 $ x: num  0.705 2.15 0.572 1.249 0.807 ...
 $ f: Factor w/ 4 levels "a","b","c","d": 1 4 1 4 4 1 4 2 4 4 ...
 $ Y: num  0.787 8.267 3.085 5.738 9.593 ...


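Here the first column is the response variable, because D is the first column of dd.

In both cases, the offset variable 'Y' is included, whether the offset was part of the formula or specified as a separate argument, since the 'data' component holds every column of dd. Any columns of the data frame that the model does not use would be listed as well.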

You can then process the results as you need from there, such as:

> sapply(mx$data, class)
        D         x         f         Y
"integer" "numeric"  "factor" "numeric"
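
As a rough sketch of such processing, the stored data can be restricted to the variables the model uses and summarised by class and factor levels. The helper name describe_newdata is hypothetical, and the sketch assumes the model was fitted with a data= argument and that any offset was written directly in the call:

```r
library(splines)

set.seed(1)  # only for reproducibility of this illustration
dd <- data.frame(D = sample(0:1, 200, replace = TRUE),
                 x = abs(rnorm(200)),
                 f = factor(sample(letters[1:4], 200, replace = TRUE)),
                 Y = runif(200, 0.5, 10))
mx <- glm(D ~ ns(x, knots = 1:2, Boundary.knots = c(0, 5)) + f:I(x^2),
          offset = log(Y), family = poisson, data = dd)

# describe_newdata() is a hypothetical helper, not part of any package
describe_newdata <- function(fit) {
  # right-hand-side variables plus any variables in an offset= argument
  vars <- union(all.vars(delete.response(terms(fit))),
                all.vars(fit$call$offset))
  # keep only names that are columns of the data the model was fitted to
  vars <- intersect(vars, names(fit$data))
  # class and levels (NULL for non-factors) of each needed variable
  lapply(fit$data[vars],
         function(v) list(class = class(v)[1], levels = levels(v)))
}

spec <- describe_newdata(mx)
str(spec)
```

The result is a named list with one entry per variable, which could serve as a template when building a newdata frame.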


Regards,

Marc Schwartz




> On Mar 8, 2018, at 12:26 AM, Marc Girondot via R-help <[hidden email]> wrote:
>
> Hi,
>
> Some try:
> > names(mi$xlevels)
> [1] "f"
> > all.vars(mi$formula)
> [1] "D" "x" "f" "Y"
> > names(mx$xlevels)
> [1] "f"
> > all.vars(mx$formula)
> [1] "D" "x" "f"
>
> When offset is indicated out of the formula, it does not work...
>
> Marc

Re: Names of variables needed in newdata for predict.glm

Bendix Carstensen
In reply to this post by R help mailing list-2
all.vars works fine, EXCEPT that it gives a bit too much.
I only want the regression variables, but in the following example I also get "k", the variable holding the chosen knots. Is there any machinery to find only the "real" regression variables?
cheers, Bendix

library( splines )
y <- rnorm(100)
x <- rnorm(100)
k <- -1:1
ml <-  lm( y ~ bs(x,knots=k) )
mg <- glm( y ~ bs(x,knots=k) )
all.vars(ml$terms)
all.vars(mg$terms)
all.vars(mg$formula)


Re: Names of variables needed in newdata for predict.glm

David Winsemius

> On Mar 31, 2018, at 8:48 AM, Bendix Carstensen <[hidden email]> wrote:
>
> all.vars works fine, EXCEPT, it give a bit too much.
> I only want the regression variables, but in the following example I also get "k" the variable holding the chosen knots. Any machinery to find only "real" regression variables?
> cheers, Bendix
>
> library( splines )
> y <- rnorm(100)
> x <- rnorm(100)
> k <- -1:1
> ml <-  lm( y ~ bs(x,knots=k) )
> mg <- glm( y ~ bs(x,knots=k) )
> all.vars(ml$terms)
> all.vars(mg$terms)
> all.vars(mg$formula)

If you can require that the "real" regression variables are passed in a data argument, then this might succeed:

> dat <- data.frame(y = y, x = x)  # assumed definition; 'dat' was not defined in the post
> ml <-  lm( y ~ bs(x,knots=k), data=dat )
> all.vars(ml$terms)
[1] "y" "x" "k"
> all.vars(ml$formula)
character(0)
> all.vars(ml$terms)[ all.vars(ml$terms) %in% names(dat)]
[1] "y" "x"
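
The same idea can be wrapped in a small function that looks up the data frame from the stored call, so it works for lm as well as glm fits. This is a sketch under the assumption that the model was fitted with a data= argument that can still be evaluated in the formula's environment. The helper name data_vars and the definition of dat are hypothetical:

```r
library(splines)

set.seed(1)  # only for reproducibility of this illustration
dat <- data.frame(y = rnorm(100), x = rnorm(100))
k <- -1:1    # knots held in a separate object, not a regression variable
ml <- lm(y ~ bs(x, knots = k), data = dat)
mg <- glm(y ~ bs(x, knots = k), data = dat)

# data_vars() is a hypothetical helper, not part of any package
data_vars <- function(fit) {
  # re-evaluate the data= argument of the original call
  d <- eval(fit$call$data, environment(formula(fit)))
  # keep right-hand-side variables that are columns of that data frame
  intersect(all.vars(delete.response(terms(fit))), names(d))
}

data_vars(ml)
data_vars(mg)
```

Dropping the response with delete.response() leaves only the variables that newdata would need. Objects such as k are excluded because they are not columns of dat.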

--
David.

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law
