Help : glm p-values for a factor predictor

Benoît PELE
Hello,

I am new to R and I am trying to perform a backward selection on a
binomial (logit) GLM fitted to a large dataset (69,000 rows and 145 predictors).

After 3 days of running, the stepAIC() function had still not terminated. I do
not know whether that is normal, but I would like to try a "homemade"
backward selection with repeated glm() fits: at each step, the predictor with
the largest p-value would be dropped, until only a set of, say, 20 predictors
remains.
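A minimal sketch of such a "homemade" backward loop, assuming base R only and simulated data (the data frame d, the helper backward_by_pvalue() and the stopping size n_keep are all hypothetical). It ranks terms with drop1() and a likelihood-ratio test, so a factor is tested and removed as one term rather than level by level:

```r
# Sketch of a "homemade" backward elimination on whole model terms.
# Assumptions: simulated data; backward_by_pvalue() and n_keep are hypothetical.
set.seed(1)
n <- 500
d <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  f1 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  f2 = factor(sample(c("u", "v", "w", "z"), n, replace = TRUE))
)
d$y <- rbinom(n, 1, plogis(-0.5 + 1.2 * d$x1 + (d$f1 == "c")))

backward_by_pvalue <- function(fit, n_keep) {
  repeat {
    current <- attr(terms(fit), "term.labels")
    if (length(current) <= n_keep) break
    # drop1() refits the model once without each term and gives a
    # likelihood-ratio test per term; a factor counts as ONE term, so all
    # of its dummy columns are tested (and removed) together
    tab <- drop1(fit, test = "Chisq")
    p <- setNames(tab[["Pr(>Chi)"]][-1], rownames(tab)[-1])  # row 1 is "<none>"
    worst <- names(which.max(p))
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

full <- glm(y ~ x1 + x2 + f1 + f2, family = binomial(link = "logit"), data = d)
reduced <- backward_by_pvalue(full, n_keep = 2)
attr(terms(reduced), "term.labels")   # the two terms that survive
```

Each pass refits the model once per remaining term, so with 69,000 rows and 145 predictors this is probably not much faster than stepAIC(), which also refits once per candidate term at every step.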

My question is about factor predictors with several levels. R provides
only one p-value per level, whereas I need an overall p-value to test the
predictor as a whole.

On the internet, the only solution I found suggests computing a chi-squared
likelihood-ratio test between the full model and the model without the
factor predictor, to assess its relevance.
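For reference, that test can be sketched on simulated data as follows (all object names are hypothetical). The statistic is twice the difference in log-likelihoods, with degrees of freedom equal to the number of dummy coefficients the factor uses, and anova() on the two nested fits should give the same p-value:

```r
# Likelihood-ratio (chi-squared) test for a whole factor, on simulated data.
# All object names are hypothetical.
set.seed(2)
n <- 400
d <- data.frame(
  x = rnorm(n),
  f = factor(sample(c("a", "b", "c", "d"), n, replace = TRUE))
)
eff <- c(a = 0, b = 0.5, c = -0.5, d = 1)           # true level effects
d$y <- rbinom(n, 1, plogis(0.3 * d$x + eff[as.character(d$f)]))

full    <- glm(y ~ x + f, family = binomial(link = "logit"), data = d)
reduced <- glm(y ~ x,     family = binomial(link = "logit"), data = d)

# LR statistic = 2 * (logLik(full) - logLik(reduced)); df = number of
# coefficients the factor adds (4 levels -> 3 dummy columns)
lr_stat  <- as.numeric(2 * (logLik(full) - logLik(reduced)))
lr_df    <- length(coef(full)) - length(coef(reduced))
p_manual <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# anova() on the two nested fits performs the same test
cmp     <- anova(reduced, full, test = "Chisq")
p_anova <- cmp[["Pr(>Chi)"]][2]
c(LR = lr_stat, df = lr_df, p_manual = p_manual, p_anova = p_anova)
```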

Do you know of other ways, or another R package that handles this kind of issue?

Thank you and best regards, Benoit.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Help : glm p-values for a factor predictor

Bob O'Hara-2
It might help if you provided the code you used. It's possible that
you didn't use direction="backward" in stepAIC(). Or, if you did and it
was still running, then whatever else you try will also be slow. The
statement "R provides only the pvalues for each level" is wrong: look
at the anova() function.
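A small illustration of the point about anova(), on simulated data (names hypothetical): summary() reports one row per non-reference level of the factor, whereas anova(fit, test = "Chisq") gives a single sequential likelihood-ratio row for the whole factor, with (levels - 1) degrees of freedom:

```r
# One p-value per level (summary) versus one test per term (anova).
# Simulated data; names are hypothetical.
set.seed(3)
n <- 300
d <- data.frame(
  x = rnorm(n),
  f = factor(sample(c("lo", "mid", "hi"), n, replace = TRUE))
)
d$y <- rbinom(n, 1, plogis(0.8 * d$x))
fit <- glm(y ~ x + f, family = binomial(link = "logit"), data = d)

coefs <- summary(fit)$coefficients   # rows: (Intercept), x, flo, fmid
tab   <- anova(fit, test = "Chisq")  # rows: NULL, x, f (f has 2 df)
coefs
tab
```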

Bob

--
Bob O'Hara
NOTE NEW ADDRESS!!!
Institutt for matematiske fag
NTNU
7491 Trondheim
Norway

Mobile: +49 1515 888 5440
Journal of Negative Results - EEB: www.jnr-eeb.org


Re: Help : glm p-values for a factor predictor

Benoît PELE
Thank you for your answer.

The code I used is the following:

champ_model <- c("y",
                 # qualitative predictors (quoted names)
                 "categ_juridique",
                 "Indic_CTRLAUTRE_RPOS", "Indic_CTRLAUTRE_RNEG",
                 "Indic_CTRLCCA_RPOS", "Indic_CTRLCCA_RNEG",
                 "Indic_CTRLCPAP_RPOS", "Indic_CTRLCPAP_RNEG",
                 "Indic_CTRLLCTI_RPOS",
                 "Indic_Changement_NomLogiciel", "Indic_Changement_NomEditeur",
                 "Changt_NomEditeurPaie", "Changt_NomLogicielPaie",
                 "Infoabs_NomEditeurPaie", "Infoabs_NomLogicielPaie",
                 "Indic_Decla_comple", "Indic_Decla_AnnuRempl",
                 "class_ape", "class_Logiciel", "class_Editeur",
                 "moda_delai_soldeN_1", "moda_delai_soldeN_2", "moda_delai_soldeN_3",
                 "moda_delai_soldeN_4", "moda_delai_soldeN_5",
                 "moda_anciennete_debitN_1", "moda_anciennete_debitN_2",
                 "moda_anciennete_debitN_3", "moda_anciennete_debitN_4",
                 "moda_anciennete_debitN_5",
                 "moda_moy_anciennete_debit", "moda_std_anciennete_debit",
                 "moda_moy_delai_solde", "moda_std_delai_solde",
                 # groups of continuous predictors (objects defined elsewhere)
                 var_cluster_Arome, var_cluster_BRC, var_cluster_Cedre,
                 var_cluster_cntx2, var_cluster_ctrl,
                 var_cluster_DADS_assiette2, var_cluster_DADS_avantage2,
                 var_cluster_DADS_contrat2, var_cluster_DADS_salarie2,
                 var_cluster_Sequoia)

--> The quoted predictors (except y) are qualitative; the others are groups
of continuous predictors.

# champ_model[-1]: every predictor except the response y
Var_model <- paste0("y ~ ", paste(champ_model[-1], collapse = " + "))
Logit_appr <- glm(formula = as.formula(Var_model),
                  family = binomial(link = "logit"),
                  data = pop_ctrl_siren_cca2017_appr)
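As an aside, stats::reformulate() builds the same formula from a character vector without pasting strings. A sketch with hypothetical stand-ins for the objects above (only a few of the real names are used, and var_cluster_example is invented):

```r
# Building the model formula with reformulate() instead of paste0().
# Hypothetical stand-ins: only a few of the real names, plus an invented
# var_cluster_example group of continuous predictors.
champ_model <- c("y", "categ_juridique", "class_ape")
var_cluster_example <- c("x1", "x2")
vars <- c(champ_model, var_cluster_example)

f_model  <- reformulate(vars[-1], response = "y")   # first element is the response
f_pasted <- as.formula(paste0("y ~ ", paste(vars[-1], collapse = " + ")))
f_model
```

With the real objects, the result could be passed directly as glm(f_model, family = binomial(link = "logit"), data = pop_ctrl_siren_cca2017_appr).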

--> The output of this glm does not give an overall p-value for the
qualitative predictors, only one p-value per level. To select among the
qualitative predictors, I need that overall p-value, which SAS, for example,
provides in PROC LOGISTIC.
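One way to get such an overall test by hand is a joint Wald test on all the coefficients belonging to the factor, W = b' V^-1 b, compared with a chi-squared distribution on (number of levels - 1) degrees of freedom. As far as I know this is the kind of per-effect statistic SAS prints in its Type 3 analysis of effects, but treat that correspondence as an assumption. Sketch on simulated data, names hypothetical:

```r
# Joint Wald test for all coefficients of one factor, on simulated data.
# Names are hypothetical; "f" is the factor being tested.
set.seed(4)
n <- 400
d <- data.frame(
  x = rnorm(n),
  f = factor(sample(c("a", "b", "c", "d"), n, replace = TRUE))
)
d$y <- rbinom(n, 1, plogis(0.5 * d$x + 0.7 * (d$f == "d")))
fit <- glm(y ~ x + f, family = binomial(link = "logit"), data = d)

# Columns of the model matrix that belong to term "f"
term_id <- match("f", attr(terms(fit), "term.labels"))
idx <- which(attr(model.matrix(fit), "assign") == term_id)

# W = b' V^{-1} b, chi-squared with length(b) df under H0: all b = 0
b <- coef(fit)[idx]
V <- vcov(fit)[idx, idx]
W <- as.numeric(t(b) %*% solve(V, b))
p_wald <- pchisq(W, df = length(b), lower.tail = FALSE)
c(W = W, df = length(b), p = p_wald)
```

If the car package is installed, car::Anova(fit, type = 3, test.statistic = "Wald") should report the same statistic for f under R's default treatment contrasts.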

Benoit Pelé.

From:    "Bob O'Hara" <[hidden email]>
To:      Benoît PELE <[hidden email]>
Cc:      r-help <[hidden email]>
Date:    29/06/2017 11:46
Subject: Re: [R] Help : glm p-values for a factor predictor

Re: Help : glm p-values for a factor predictor

Michael Friendly
On 6/29/17 11:13 AM, Benoît PELE wrote:
> My question is about the factor predictors with several levels. R provides
> only the pvalues for each level whereas i need an overall pvalue for
> testing the predictor.

What you ask for is provided by anova() (Type I tests) and by car::Anova()
(Type II and III tests).
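To see the difference between the two kinds of test without extra packages, here is a sketch on simulated data in which x is deliberately correlated with the factor f (all names hypothetical). The sequential anova() row for f changes with the order of the terms, whereas drop1(..., test = "Chisq") tests each term after all the others; for a model without interactions, that should coincide with car::Anova()'s default (Type II, likelihood-ratio) as far as I know:

```r
# Type I (sequential) tests versus "each term after all the others".
# Simulated data with x deliberately correlated with factor f; names hypothetical.
set.seed(5)
n <- 400
f <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
x <- rnorm(n) + as.integer(f)
y <- rbinom(n, 1, plogis(-1 + 0.5 * x))     # y depends on x only
d <- data.frame(y, x, f)

fit_xf <- glm(y ~ x + f, family = binomial(link = "logit"), data = d)
fit_fx <- glm(y ~ f + x, family = binomial(link = "logit"), data = d)

# Sequential: the row for f depends on where f appears in the formula
p_seq_last  <- anova(fit_xf, test = "Chisq")["f", "Pr(>Chi)"]
p_seq_first <- anova(fit_fx, test = "Chisq")["f", "Pr(>Chi)"]

# drop1(): each term tested against the full model, whatever the order
p_drop_xf <- drop1(fit_xf, test = "Chisq")["f", "Pr(>Chi)"]
p_drop_fx <- drop1(fit_fx, test = "Chisq")["f", "Pr(>Chi)"]
c(seq_last = p_seq_last, seq_first = p_seq_first,
  drop1_xf = p_drop_xf, drop1_fx = p_drop_fx)
```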

Factors in stepwise methods must be handled specially, so that all their
levels are included or excluded together. I don't know of R software
that does this.

HTH

-Michael


Re: Help : glm p-values for a factor predictor

Fox, John
Hi Michael,

> -----Original Message-----
> From: R-help [mailto:[hidden email]] On Behalf Of Michael Friendly
> Sent: Thursday, June 29, 2017 9:04 AM
> To: Benoît PELE <[hidden email]>; [hidden email]
> Subject: Re: [R] Help : glm p-values for a factor predictor
>
> Factors in stepwise methods must be handled specially, to allow all levels to
> be included/excluded together.  I don't know of R software that does this.

The step() function and stepAIC() in MASS both keep the columns of each term together (e.g. all the dummy variables of a factor) and obey marginality.
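A quick check of this on simulated data (names hypothetical): backward selection with step() treats the four-level factor f as one candidate term, so its three dummy coefficients are either all kept or all dropped. MASS::stepAIC(full, direction = "backward", trace = FALSE) should behave the same way:

```r
# Backward selection with step(): a factor is a single candidate for removal.
# Simulated data; names are hypothetical.
set.seed(6)
n <- 500
d <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  f  = factor(sample(c("a", "b", "c", "d"), n, replace = TRUE))
)
d$y <- rbinom(n, 1, plogis(1.5 * d$x1))   # only x1 matters
full <- glm(y ~ x1 + x2 + f, family = binomial(link = "logit"), data = d)

sel <- step(full, direction = "backward", trace = 0)
attr(terms(sel), "term.labels")   # terms kept
names(coef(sel))                  # f's 3 dummy columns appear all together or not at all
sel$anova                         # the steps taken
```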

Best,
 John
