Where is the SD in output of glm with Gaussian distribution

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Where is the SD in output of glm with Gaussian distribution

R help mailing list-2
Let do a simple glm:

 > y=rnorm(100)
 > gnul <- glm(y ~ 1)
 > gnul$coefficients
(Intercept)
   0.1399966

The logLik shows the fit of two parameters (DF=2) (intercept) and sd

 > logLik(gnul)
'log Lik.' -138.7902 (df=2)

But where is the sd term in the glm object?

If I do the same with optim, I can have its value

 > dnormx <- function(x, data) {1E9*-sum(dnorm(data, mean=x["mean"],
sd=x["sd"], log = TRUE))}
 > parg <- c(mean=0, sd=1)
 > o0 <- optim(par = parg, fn=dnormx, data=y, method="BFGS")
 > o0$value/1E9
[1] 138.7902
 > o0$par
      mean        sd

0.1399966 0.9694405

But I would like have the value in the glm.

(and in the meantime, I don't understand why gnul$df.residual returned
99... for me it should be 98=100 - number of observations) -1 (for mean)
- 1 (for sd); but it is statistical question... I have asked it in
crossvalidated [no answer still] !)

Thanks

Marc

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Where is the SD in output of glm with Gaussian distribution

Fox, John
Dear Marc,

For your simple model, the standard deviation of y is the square-root of the estimated dispersion parameter:

> set.seed(123)
> y <- rnorm(100)
> gnul <- glm(y ~ 1)
> summary(gnul)

Call:
glm(formula = y ~ 1)

Deviance Residuals:
     Min        1Q    Median        3Q       Max  
-2.39957  -0.58426  -0.02865   0.60141   2.09693  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.09041    0.09128    0.99    0.324

(Dispersion parameter for gaussian family taken to be 0.8332328)

    Null deviance: 82.49  on 99  degrees of freedom
Residual deviance: 82.49  on 99  degrees of freedom
AIC: 268.54

Number of Fisher Scoring iterations: 2

> sqrt(0.8332328)
[1] 0.9128159
> mean(y)
[1] 0.09040591
> sd(y)
[1] 0.9128159

I hope this helps,
 John

  -----------------------------
  John Fox, Professor Emeritus
  McMaster University
  Hamilton, Ontario, Canada
  Web: http::/socserv.mcmaster.ca/jfox

> On Dec 9, 2019, at 10:16 AM, Marc Girondot via R-help <[hidden email]> wrote:
>
> Let do a simple glm:
>
> > y=rnorm(100)
> > gnul <- glm(y ~ 1)
> > gnul$coefficients
> (Intercept)
>   0.1399966
>
> The logLik shows the fit of two parameters (DF=2) (intercept) and sd
>
> > logLik(gnul)
> 'log Lik.' -138.7902 (df=2)
>
> But where is the sd term in the glm object?
>
> If I do the same with optim, I can have its value
>
> > dnormx <- function(x, data) {1E9*-sum(dnorm(data, mean=x["mean"], sd=x["sd"], log = TRUE))}
> > parg <- c(mean=0, sd=1)
> > o0 <- optim(par = parg, fn=dnormx, data=y, method="BFGS")
> > o0$value/1E9
> [1] 138.7902
> > o0$par
>      mean        sd
>
> 0.1399966 0.9694405
>
> But I would like have the value in the glm.
>
> (and in the meantime, I don't understand why gnul$df.residual returned 99... for me it should be 98=100 - number of observations) -1 (for mean) - 1 (for sd); but it is statistical question... I have asked it in crossvalidated [no answer still] !)
>
> Thanks
>
> Marc
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Where is the SD in output of glm with Gaussian distribution

Eric Berger
In reply to this post by R help mailing list-2
summary(gnul)

shows the std error of the coefficient estimate

On Mon, Dec 9, 2019 at 5:16 PM Marc Girondot via R-help
<[hidden email]> wrote:

>
> Let do a simple glm:
>
>  > y=rnorm(100)
>  > gnul <- glm(y ~ 1)
>  > gnul$coefficients
> (Intercept)
>    0.1399966
>
> The logLik shows the fit of two parameters (DF=2) (intercept) and sd
>
>  > logLik(gnul)
> 'log Lik.' -138.7902 (df=2)
>
> But where is the sd term in the glm object?
>
> If I do the same with optim, I can have its value
>
>  > dnormx <- function(x, data) {1E9*-sum(dnorm(data, mean=x["mean"],
> sd=x["sd"], log = TRUE))}
>  > parg <- c(mean=0, sd=1)
>  > o0 <- optim(par = parg, fn=dnormx, data=y, method="BFGS")
>  > o0$value/1E9
> [1] 138.7902
>  > o0$par
>       mean        sd
>
> 0.1399966 0.9694405
>
> But I would like have the value in the glm.
>
> (and in the meantime, I don't understand why gnul$df.residual returned
> 99... for me it should be 98=100 - number of observations) -1 (for mean)
> - 1 (for sd); but it is statistical question... I have asked it in
> crossvalidated [no answer still] !)
>
> Thanks
>
> Marc
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Where is the SD in output of glm with Gaussian distribution

Bert Gunter-2
In reply to this post by R help mailing list-2
In addition, as John's included output shows, only 1 parameter, the
intercept, is fit. As he also said, the sd is estimated from the residual
deviance -- it is not a model parameter.

Suggest you spend some time with a glm tutorial/text.

Bert

On Mon, Dec 9, 2019 at 7:17 AM Marc Girondot via R-help <
[hidden email]> wrote:

> Let do a simple glm:
>
>  > y=rnorm(100)
>  > gnul <- glm(y ~ 1)
>  > gnul$coefficients
> (Intercept)
>    0.1399966
>
> The logLik shows the fit of two parameters (DF=2) (intercept) and sd
>
>  > logLik(gnul)
> 'log Lik.' -138.7902 (df=2)
>
> But where is the sd term in the glm object?
>
> If I do the same with optim, I can have its value
>
>  > dnormx <- function(x, data) {1E9*-sum(dnorm(data, mean=x["mean"],
> sd=x["sd"], log = TRUE))}
>  > parg <- c(mean=0, sd=1)
>  > o0 <- optim(par = parg, fn=dnormx, data=y, method="BFGS")
>  > o0$value/1E9
> [1] 138.7902
>  > o0$par
>       mean        sd
>
> 0.1399966 0.9694405
>
> But I would like have the value in the glm.
>
> (and in the meantime, I don't understand why gnul$df.residual returned
> 99... for me it should be 98=100 - number of observations) -1 (for mean)
> - 1 (for sd); but it is statistical question... I have asked it in
> crossvalidated [no answer still] !)
>
> Thanks
>
> Marc
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Where is the SD in output of glm with Gaussian distribution

R help mailing list-2
Le 09/12/2019 à 16:45, Bert Gunter a écrit :
> In addition, as John's included output shows, only 1 parameter, the
> intercept, is fit. As he also said, the sd is estimated from the
> residual deviance -- it is not a model parameter.
>
> Suggest you spend some time with a glm tutorial/text.

I tried ! But I miss this point. I understand now this point. Thanks a 
lot... big progress for me.

But still I don't understand why AIC calculation uses 2 parameters if
the SD is estimated from the residual deviance.

 > y=rnorm(100)
 > gnul <- glm(y ~ 1)
 > logLik(gnul)
'log Lik.' -136.4343 (df=2)
 > AIC(gnul)
[1] 276.8687
 > -2*logLik(gnul)+2*2
'log Lik.' 276.8687 (df=2)

This is not intuitive when to count SD as a parameter (in AIC) or not in
df.resuidual !


>
> Bert
>
> On Mon, Dec 9, 2019 at 7:17 AM Marc Girondot via R-help
> <[hidden email] <mailto:[hidden email]>> wrote:
>
>     Let do a simple glm:
>
>      > y=rnorm(100)
>      > gnul <- glm(y ~ 1)
>      > gnul$coefficients
>     (Intercept)
>        0.1399966
>
>     The logLik shows the fit of two parameters (DF=2) (intercept) and sd
>
>      > logLik(gnul)
>     'log Lik.' -138.7902 (df=2)
>
>     But where is the sd term in the glm object?
>
>     If I do the same with optim, I can have its value
>
>      > dnormx <- function(x, data) {1E9*-sum(dnorm(data, mean=x["mean"],
>     sd=x["sd"], log = TRUE))}
>      > parg <- c(mean=0, sd=1)
>      > o0 <- optim(par = parg, fn=dnormx, data=y, method="BFGS")
>      > o0$value/1E9
>     [1] 138.7902
>      > o0$par
>           mean        sd
>
>     0.1399966 0.9694405
>
>     But I would like have the value in the glm.
>
>     (and in the meantime, I don't understand why gnul$df.residual
>     returned
>     99... for me it should be 98=100 - number of observations) -1 (for
>     mean)
>     - 1 (for sd); but it is statistical question... I have asked it in
>     crossvalidated [no answer still] !)
>
>     Thanks
>
>     Marc
>
>     ______________________________________________
>     [hidden email] <mailto:[hidden email]> mailing list --
>     To UNSUBSCRIBE and more, see
>     https://stat.ethz.ch/mailman/listinfo/r-help
>     PLEASE do read the posting guide
>     http://www.R-project.org/posting-guide.html
>     and provide commented, minimal, self-contained, reproducible code.
>


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Where is the SD in output of glm with Gaussian distribution

Fox, John
In reply to this post by R help mailing list-2
Dear Bert,

It's perhaps a bit pedantic to point it out, but the dispersion is estimated from the Pearson statistic (sum of squared residuals or weighted squared residuals) not from the residual deviance. You can see this in the code for summary.glm().

Best,
 John

  -----------------------------
  John Fox, Professor Emeritus
  McMaster University
  Hamilton, Ontario, Canada
  Web: http::/socserv.mcmaster.ca/jfox

> On Dec 9, 2019, at 10:45 AM, Bert Gunter <[hidden email]> wrote:
>
> In addition, as John's included output shows, only 1 parameter, the
> intercept, is fit. As he also said, the sd is estimated from the residual
> deviance -- it is not a model parameter.
>
> Suggest you spend some time with a glm tutorial/text.
>
> Bert
>
> On Mon, Dec 9, 2019 at 7:17 AM Marc Girondot via R-help <
> [hidden email]> wrote:
>
>> Let do a simple glm:
>>
>>> y=rnorm(100)
>>> gnul <- glm(y ~ 1)
>>> gnul$coefficients
>> (Intercept)
>>   0.1399966
>>
>> The logLik shows the fit of two parameters (DF=2) (intercept) and sd
>>
>>> logLik(gnul)
>> 'log Lik.' -138.7902 (df=2)
>>
>> But where is the sd term in the glm object?
>>
>> If I do the same with optim, I can have its value
>>
>>> dnormx <- function(x, data) {1E9*-sum(dnorm(data, mean=x["mean"],
>> sd=x["sd"], log = TRUE))}
>>> parg <- c(mean=0, sd=1)
>>> o0 <- optim(par = parg, fn=dnormx, data=y, method="BFGS")
>>> o0$value/1E9
>> [1] 138.7902
>>> o0$par
>>      mean        sd
>>
>> 0.1399966 0.9694405
>>
>> But I would like have the value in the glm.
>>
>> (and in the meantime, I don't understand why gnul$df.residual returned
>> 99... for me it should be 98=100 - number of observations) -1 (for mean)
>> - 1 (for sd); but it is statistical question... I have asked it in
>> crossvalidated [no answer still] !)
>>
>> Thanks
>>
>> Marc
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.