

Dear R community,
I just stumbled upon the following behavior in R version 3.6.0:
set.seed(42)
y <- rep(0, 30)
x <- rbinom(30, 1, prob = 0.91)
# The following will not show any t-statistic or p-value
summary(lm(y~x))
# The following will show t-statistic and p-value
summary(lm(1+y~x))
My expected output is that the first case should report t-statistic and
p-value. My intuition might be tricking me, but I think that a constant
shift of the data should be fully absorbed by the constant and not
affect inference about the slope.
Is this a bug or is there a reason why there should be a discrepancy
between the two outputs?
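
A minimal sketch of the intuition above, assuming a response that actually varies. The variable y_noisy and the helper slope_row are hypothetical and not part of the original post. With non-zero residual variance, a constant shift of the response should only move the intercept, so the slope's estimate, standard error, t value and p-value are expected to agree up to rounding.

```r
set.seed(42)
x <- rbinom(30, 1, prob = 0.91)

# Hypothetical noisy response (an assumption for illustration only)
y_noisy <- 0.5 * x + rnorm(30)

# Extract the slope's row of the coefficient table
slope_row <- function(fit) coef(summary(fit))["x", ]

row_raw     <- slope_row(lm(y_noisy ~ x))
row_shifted <- slope_row(lm(1 + y_noisy ~ x))

# Compare estimate, standard error, t value and p-value up to a tolerance
all.equal(row_raw, row_shifted)
```

The degenerate case in the post differs only because the residual variance is zero, so there is nothing to divide by when forming the t-statistic.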
Best,
David
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Hi: In your example, you made the response zero in every case, which
is going to cause problems. In GLMs, I think they call it the Donsker
effect. I'm not sure what it's called in OLS; probably a lack of
identifiability. Note that you probably shouldn't be using zeros and
ones as the response in a regression anyway.
If you change the response to the one below, you get what you'd expect.
y <- c(rep(0, 15), rep(1, 15))
On Fri, Sep 27, 2019 at 1:48 PM David J. Birke < [hidden email]> wrote:
> [...]


Correction to my previous answer: I looked around and I don't think it's
called the Donsker effect. It seems to be referred to as just a case of
"perfect separability". If you google for "perfect separation in glms",
you'll get a lot of information.
On Fri, Sep 27, 2019 at 2:35 PM Mark Leeds < [hidden email]> wrote:
> [...]


Hello,
Maybe FAQ 7.31?
Check the residuals, they are all "zero" in both cases:
fit0 <- lm(y~x)
fit1 <- lm(1+y~x)
# residuals
table(resid(fit0))
#
#  0
# 30
table(resid(fit1))
#
# -5.21223595241838e-16  4.93038065763132e-31  3.12734157145103e-15
#                     6                    23                     1
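
A rough sketch of why the residuals matter for the printed table, assuming the usual construction of the t value as estimate divided by standard error. The names s0 and s1 are illustrative only. With all residuals exactly zero, the residual standard error is zero, so every standard error is zero and the t value becomes 0/0. With residuals that are only rounding noise, the division goes through but the result carries no statistical meaning (recent versions of R may warn about an essentially perfect fit here, which is why the warning is suppressed below).

```r
set.seed(42)
y <- rep(0, 30)
x <- rbinom(30, 1, prob = 0.91)

fit0 <- lm(y ~ x)
fit1 <- lm(1 + y ~ x)

# Residual standard error: exactly zero for fit0,
# at most rounding noise for fit1
s0 <- summary(fit0)$sigma
s1 <- suppressWarnings(summary(fit1))$sigma

# Zero estimates over zero standard errors give 0/0, which is not a number
tab0 <- coef(summary(fit0))
tab0[, c("Estimate", "Std. Error", "t value")]
```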
Hope this helps,
Rui Barradas
At 18:05 on 27/09/19, David J. Birke wrote:
> [...]


Hi Berwin: Yes, that's it. Donsker is famous for a functional CLT, so I was
mixing up statistics and stochastic processes. I'd better stick to
statistics. It's safer!
Thanks for the correction.
I'm cc'ing R-help since it may be useful to someone there. See below for
Berwin's comment.
Mark
On Sat, Sep 28, 2019 at 3:36 AM Berwin A Turlach < [hidden email]>
wrote:
> G'day Mark,
>
> On Fri, 27 Sep 2019 14:43:28 -0400
> Mark Leeds < [hidden email]> wrote:
>
> > correction to my previous answer. I looked around and I don't think
> > it's called the donsker effect.
>
> I think you meant the Hauck-Donner effect [1], which refers to the
> problem of separation for binomial GLMs (not all GLMs).
>
> Cheers,
>
> Berwin
>
> [1] Hauck, Jr., W.W. and Donner, A. (1977) Wald's test as applied to
> hypotheses in logit analysis. Journal of the American Statistical
> Association 72, 851-853.
>
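
A small sketch of separation in a binomial GLM, using hypothetical toy data that does not come from the thread. When a predictor separates the two outcomes perfectly, the Wald z value reported by summary() is expected to be close to zero because the standard error blows up, while a likelihood-ratio comparison against the null model still signals a strong effect. This is only meant to illustrate the kind of situation Berwin refers to, not to reproduce the analysis in the cited paper.

```r
# Hypothetical toy data: x separates the two outcomes perfectly
x <- 1:10
y <- as.numeric(x > 5)

# glm() typically warns here that fitted probabilities are numerically 0 or 1
fit <- suppressWarnings(glm(y ~ x, family = binomial))

# Wald test for the slope: estimate, standard error, z value, p-value
wald <- coef(summary(fit))["x", ]

# Likelihood-ratio test of the slope against the intercept-only model
lrt_p <- pchisq(fit$null.deviance - fit$deviance,
                df = 1, lower.tail = FALSE)

wald
lrt_p
```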


In one case the residuals are exactly 0 and in the other they are almost
zero. This is the reason for the different results.
Of course, they should be exactly the same, but this is due to
floating-point rounding: not every intermediate value in the fit can be
represented exactly in binary on a computer.
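
A short sketch of the floating-point point, in the spirit of FAQ 7.31. The tolerance sqrt(.Machine$double.eps) and the names res1 and near_zero are assumptions chosen for illustration, not something prescribed in the thread. The idea is to compare against a tolerance instead of testing for exact equality.

```r
# Neither 0.1 nor 0.3 has an exact binary representation
0.1 + 0.2 == 0.3                    # FALSE with IEEE 754 doubles
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: equal up to a small tolerance

# The same idea applied to the residuals of the shifted model
set.seed(42)
y <- rep(0, 30)
x <- rbinom(30, 1, prob = 0.91)
res1 <- resid(lm(1 + y ~ x))

# Hypothetical tolerance choice: treat anything this small as zero
near_zero <- abs(res1) < sqrt(.Machine$double.eps)
all(near_zero)
```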
Best,
Aleš Žiberna
On Fri, Sep 27, 2019 at 9:01 PM Rui Barradas < [hidden email]> wrote:
> [...]

