Is it generally known/has it been previously discussed here that the $aic() component in GLM-family objects (e.g. results of binomial(), poisson(), etc.) does not as implemented actually return the AIC, but rather -2*log-likelihood + 2*(model_has_scale_parameter) ? Can anyone in this forum gauge how a documentation patch would be received? This behaviour does not seem to be documented in ?family (or anywhere else I can find), which says: aic: function giving the AIC value if appropriate (but ‘NA’ for the quasi- families). See ‘logLik’ for the assumptions made about the dispersion parameter. For a demonstration that e.g. binomial()$aic() is really -2*log L and not the AIC, see: https://github.com/wch/r-source/blob/trunk/src/library/stats/R/family.R#L317 This document <https://github.com/lme4/lme4/blob/master/misc/notes/deviance.rmd> explicates the details a bit more ('L' denotes log-likelihood): * family()$aic computes $-2L$, which glm.fit translates to an AIC by adding $2k$ and storing it in model$aic * logLik.default retrieves model$aic and converts it back to a log-likelihood * stats:::AIC.default retrieves the log-likelihood and converts it back to an AIC (!) * family()$dev.resid() computes the squared deviance residuals * stats:::residuals.glm retrieves these values and takes the signed square root cheers Ben Bolker ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
>>>>> Ben Bolker
>>>>> on Sun, 3 Jun 2018 17:33:18 -0400 writes: > Is it generally known/has it been previously discussed here that the > $aic() component in GLM-family objects (e.g. results of binomial(), > poisson(), etc.) does not as implemented actually return the AIC, but > rather -2*log-likelihood + 2*(model_has_scale_parameter) ? This rings a faint bell from the last millennium with me, and the following "fortune" may contain the answer implicitly : ------------------------------------------------------------ > if(!require("fortunes")) install.packages("fortunes") > fortune("bug compatib") For quite a while, bug-for-bug compatibility with S-PLUS v 3.x was considered important to allow people to port their packages between systems. -- Peter Dalgaard R-help (February 2009) > ------------------------------------------------------------ Ideally, readers who still have access to a version of S-PLUS / S+ or who have read and internalized or (even co-written !) "The white book", notably Ch.6, may be able to shed a historic light on this. I note that the white book's Appendix B with function help pages, has a page ?family.object, accessible here https://sites.oxy.edu/lengyel/M150/Sueselbeck/helpfiles/family.object.html which does *not* mention a <fam>$dev.resid() component, but instead allows to use <fam>$residuals(*, residuals=TRUE) get the " vector of deviance residual, whose weighted sum of squares is the deviance " Given the above, and also the ?glm entry || Author(s): || || The original R implementation of ‘glm’ was written by Simon Davies || working for Ross Ihaka at the University of Auckland, but has || since been extensively re-written by members of the R Core team. || || The design was inspired by the S function of the same name || described in Hastie & Pregibon (1992). actually suggest that it may be hard nowadays to find the original "design specs" that Simon and Ross had used at the time, and also that they only were _inspired_ by the white book chapter 6 (= Hastie & Pregibon (1992)). > Can anyone in this forum gauge how a documentation patch > would be received? It depends on further answers to your questions (i.e, this thread), but I'd currently say "gratefully". I'd expect it would be a patch mainly to src/library/stats/man/family.Rd Note that help(AIC) has a non-small 'Details' section, but indeed it does not mention the family(*)$aic function. > This behaviour does not seem to be documented in ?family (or anywhere > else I can find), which says: > aic: function giving the AIC value if appropriate (but ‘NA’ for > the quasi- families). See ‘logLik’ for the assumptions made > about the dispersion parameter. > For a demonstration that e.g. binomial()$aic() is really -2*log L and > not the AIC, see: > https://github.com/wch/r-source/blob/trunk/src/library/stats/R/family.R#L317 > This document > <https://github.com/lme4/lme4/blob/master/misc/notes/deviance.rmd> > explicates the details a bit more ('L' denotes log-likelihood): > * family()$aic computes $-2L$, which glm.fit translates to an AIC by > adding $2k$ and storing it in model$aic > * logLik.default retrieves model$aic and converts it back to a > log-likelihood > * stats:::AIC.default retrieves the log-likelihood and converts it > back to an AIC (!) > * family()$dev.resid() computes the squared deviance residuals > * stats:::residuals.glm retrieves these values and takes the signed > square root > cheers > Ben Bolker ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
FWIW p. 206 of the White Book gives the following for
names(binomial()): family, names, link, inverse, deriv, initialize, variance, deviance, weight. So $aic wasn't there In The Beginning. I haven't done any more archaeology to try to figure out when/by whom it was first introduced ... Section 6.3.3, on extending families, doesn't give any other relevant info. A patch for src/library/stats/man/family.Rd below: please check what I've said about $aic and $mu.eta to make sure they're correct! I can submit this to the r bug list if preferred. ---- Index: family.Rd =================================================================== --- family.Rd (revision 74904) +++ family.Rd (working copy) @@ -31,7 +31,7 @@ \arguments{ \item{link}{a specification for the model link function. This can be a name/expression, a literal character string, a length-one character - vector or an object of class + vector, or an object of class \code{"\link[=make.link]{link-glm}"} (such as generated by \code{\link{make.link}}) provided it is not specified \emph{via} one of the standard names given next. @@ -45,7 +45,7 @@ the \code{Gamma} family the links \code{inverse}, \code{identity} and \code{log}; the \code{poisson} family the links \code{log}, \code{identity}, - and \code{sqrt} and the \code{inverse.gaussian} family the links + and \code{sqrt}; and the \code{inverse.gaussian} family the links \code{1/mu^2}, \code{inverse}, \code{identity} and \code{log}. @@ -105,8 +105,8 @@ \note{ The \code{link} and \code{variance} arguments have rather awkward semantics for back-compatibility. The recommended way is to supply - them is as quoted character strings, but they can also be supplied - unquoted (as names or expressions). In addition, they can also be + them as quoted character strings, but they can also be supplied + unquoted (as names or expressions). Additionally, they can be supplied as a length-one character vector giving the name of one of the options, or as a list (for \code{link}, of class \code{"link-glm"}). The restrictions apply only to links given as @@ -130,10 +130,18 @@ \item{dev.resids}{function giving the deviance residuals as a function of \code{(y, mu, wt)}.} \item{aic}{function giving the AIC value if appropriate (but \code{NA} - for the quasi- families). See \code{\link{logLik}} for the assumptions - made about the dispersion parameter.} - \item{mu.eta}{function: derivative \code{function(eta)} - \eqn{d\mu/d\eta}.} + for the quasi- families). More precisely, this function + returns \eqn{-2 L + 2 s}, where \eqn{L} is the log-likelihood and \eqn{s} + is the number of estimated scale parameters; the penalty term for the + location parameters is added elsewhere. + See \code{\link{logLik}} for the assumptions + made about the dispersion parameter.} + \item{mu.eta}{function: derivative of the inverse-link function + with respect to the linear predictor. If the inverse-link + function is \eqn{\mu=g^{-1}(\eta)}{mu=ginv(eta)} + where \eqn{eta}{\eta} is the value + of the linear predictor, then this function returns + \eqn{d(g^{-1})/d\eta=d\mu/d\eta}{d(ginv(eta))/d(eta)=d(mu)/d(eta)}.} \item{initialize}{expression. This needs to set up whatever data objects are needed for the family as well as \code{n} (needed for AIC in the binomial family) and \code{mustart} (see \code{\link{glm}}).} @@ -224,8 +232,8 @@ ## which case use an offset of 0 in the corresponding formula ## to get the null deviance right. -## Binomial with identity link: often not a good idea. -\dontrun{binomial(link = make.link("identity"))} +## Binomial with identity link: often computationally and conceptually difficult +\dontrun{binomial(link = "identity")} ## tests of quasi x <- rnorm(100) @@ -236,7 +244,7 @@ glm(y ~ x, family = quasi(variance = "mu^2", link = "log")) \dontrun{glm(y ~ x, family = quasi(variance = "mu^3", link = "log")) # fails} y <- rbinom(100, 1, plogis(x)) -# needs to set a starting value for the next fit +# need to set a starting value for the next fit glm(y ~ x, family = quasi(variance = "mu(1-mu)", link = "logit"), start = c(0,1)) } \keyword{models} On Mon, Jun 4, 2018 at 10:46 AM, Martin Maechler <[hidden email]> wrote: >>>>>> Ben Bolker >>>>>> on Sun, 3 Jun 2018 17:33:18 -0400 writes: > > > Is it generally known/has it been previously discussed here that the > > $aic() component in GLM-family objects (e.g. results of binomial(), > > poisson(), etc.) does not as implemented actually return the AIC, but > > rather -2*log-likelihood + 2*(model_has_scale_parameter) ? > > This rings a faint bell from the last millennium with me, > and the following "fortune" may contain the answer implicitly : > > ------------------------------------------------------------ > > if(!require("fortunes")) install.packages("fortunes") > > fortune("bug compatib") > > For quite a while, bug-for-bug compatibility with S-PLUS v 3.x was considered > important to allow people to port their packages between systems. > -- Peter Dalgaard > R-help (February 2009) > > > ------------------------------------------------------------ > > Ideally, readers who still have access to a version of S-PLUS / S+ > or who have read and internalized or (even co-written !) > "The white book", notably Ch.6, may be able to shed a historic > light on this. > > I note that the white book's Appendix B with function help > pages, has a page ?family.object, accessible here > https://sites.oxy.edu/lengyel/M150/Sueselbeck/helpfiles/family.object.html > > which does *not* mention a <fam>$dev.resid() component, but instead > allows to use <fam>$residuals(*, residuals=TRUE) > get the > " > vector of deviance residual, whose weighted sum of > squares is the deviance > " > > Given the above, and also the ?glm entry > > || Author(s): > || > || The original R implementation of ‘glm’ was written by Simon Davies > || working for Ross Ihaka at the University of Auckland, but has > || since been extensively re-written by members of the R Core team. > || > || The design was inspired by the S function of the same name > || described in Hastie & Pregibon (1992). > > actually suggest that it may be hard nowadays to find the > original "design specs" that Simon and Ross had used at the > time, and also that they only were _inspired_ by the white book > chapter 6 (= Hastie & Pregibon (1992)). > > > > > Can anyone in this forum gauge how a documentation patch > > would be received? > > It depends on further answers to your questions (i.e, this > thread), but I'd currently say "gratefully". > I'd expect it would be a patch mainly to > src/library/stats/man/family.Rd > > Note that help(AIC) has a non-small 'Details' section, but > indeed it does not mention the family(*)$aic function. > > > This behaviour does not seem to be documented in ?family (or anywhere > > else I can find), which says: > > > aic: function giving the AIC value if appropriate (but ‘NA’ for > > the quasi- families). See ‘logLik’ for the assumptions made > > about the dispersion parameter. > > > > For a demonstration that e.g. binomial()$aic() is really -2*log L and > > not the AIC, see: > > > https://github.com/wch/r-source/blob/trunk/src/library/stats/R/family.R#L317 > > > This document > > <https://github.com/lme4/lme4/blob/master/misc/notes/deviance.rmd> > > explicates the details a bit more ('L' denotes log-likelihood): > > > * family()$aic computes $-2L$, which glm.fit translates to an AIC by > > adding $2k$ and storing it in model$aic > > * logLik.default retrieves model$aic and converts it back to a > > log-likelihood > > * stats:::AIC.default retrieves the log-likelihood and converts it > > back to an AIC (!) > > * family()$dev.resid() computes the squared deviance residuals > > * stats:::residuals.glm retrieves these values and takes the signed > > square root > > > cheers > > Ben Bolker ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
>>>>> Ben Bolker
>>>>> on Sun, 17 Jun 2018 11:40:38 -0400 writes: > FWIW p. 206 of the White Book gives the following for > names(binomial()): family, names, link, inverse, deriv, > initialize, variance, deviance, weight. > So $aic wasn't there In The Beginning. I haven't done > any more archaeology to try to figure out when/by whom it > was first introduced ... Thank you Ben. I think I was already suggesting that it was by Simon and Ross and we cannot know who of the two. > Section 6.3.3, on extending families, doesn't give any > other relevant info. > A patch for src/library/stats/man/family.Rd below: please > check what I've said about $aic and $mu.eta to make sure > they're correct! I can submit this to the r bug list if > preferred. I've spent quite some time checking this - to some extent. Thank you for the patch. I will use an even slightly extended version ((and using the correct '\eqn{\eta}{eta}' )). Thank you indeed. Martin ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
In reply to this post by bbolker
In R, family has aic component since version 0.62. There is no aic component in family in R 0.61.3.
Looking at blame, https://github.com/wch/r-source/blame/tags/R-0-62/src/library/base/R/family.R , aic component in family is introduced in svn revision 640 (https://github.com/wch/r-source/commit/ac666741679b50bb1dfb5ce631717b375119f6ab): using aic(.) [Jim Lindsey]; use switch() rather than many if else else.. (MM) Components of family is documented since R 2.3.0. -------------------------------- >>>>> Ben Bolker >>>>> on Sun, 17 Jun 2018 11:40:38 -0400 writes: > FWIW p. 206 of the White Book gives the following for > names(binomial()): family, names, link, inverse, deriv, > initialize, variance, deviance, weight. > So $aic wasn't there In The Beginning. I haven't done > any more archaeology to try to figure out when/by whom it > was first introduced ... Thank you Ben. I think I was already suggesting that it was by Simon and Ross and we cannot know who of the two. > Section 6.3.3, on extending families, doesn't give any > other relevant info. > A patch for src/library/stats/man/family.Rd below: please > check what I've said about $aic and $mu.eta to make sure > they're correct! I can submit this to the r bug list if > preferred. I've spent quite some time checking this - to some extent. Thank you for the patch. I will use an even slightly extended version ((and using the correct '\eqn{\eta}{eta}' )). Thank you indeed. Martin ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
Free forum by Nabble | Edit this page |