# Interesting behavior of lm() with small, problematic data sets

7 messages

## Interesting behavior of lm() with small, problematic data sets


## Re: Interesting behavior of lm() with small, problematic data sets


## Re: Interesting behavior of lm() with small, problematic data sets

> On Sep 5, 2017, at 6:24 AM, Glover, Tim <[hidden email]> wrote:
>
> I've recently come across the following results reported from the lm() function when applied to a particular type of admittedly difficult data. When working with small data sets (for instance 3 points) with the same response for different values of the predictor variable, the resulting slope estimate is a reasonable approximation of the expected 0.0, but the p-value of that slope estimate is a surprising value. A reproducible example is included below, along with the output of the summary of results.
>
>     ######### example code
>     x <- c(1,2,3)
>     y <- c(1,1,1)
>
>     # above results in {(1,1) (2,1) (3,1)} data set to regress
>
>     new.rez <- lm(y ~ x)   # regress constant y on changing x
>     summary(new.rez)       # display results of regression
>
>     ######## end of example code
>
> Results:
>
>     Call:
>     lm(formula = y ~ x)
>
>     Residuals:
>             1          2          3
>     5.906e-17 -1.181e-16  5.906e-17
>
>     Coefficients:
>                   Estimate Std. Error    t value Pr(>|t|)
>     (Intercept)  1.000e+00  2.210e-16  4.525e+15   <2e-16 ***
>     x           -1.772e-16  1.023e-16 -1.732e+00    0.333
>     ---
>     Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
>     Residual standard error: 1.447e-16 on 1 degrees of freedom
>     Multiple R-squared:  0.7794,    Adjusted R-squared:  0.5589
>     F-statistic: 3.534 on 1 and 1 DF,  p-value: 0.3112
>
>     Warning message:
>     In summary.lm(new.rez) :
>       essentially perfect fit: summary may be unreliable
>
> There is a warning that the summary may be unreliable due to the essentially perfect fit, but a p-value of 0.3112 doesn’t seem reasonable.
> As a side note, the various r^2 values seem odd too.

You have an overfitted model with only 3 perfectly fit-able data points and you are whinging about a Wald statistic about which you were warned. I think you are wasting our time. (But I'm fully retired and I have a lot of time to waste.)

I seem to remember that a t-distribution with 1 degree of freedom is actually the Cauchy distribution. I would point out that you can also get:

    > 2*pt(-1.732e+00, 1)
    [1] 0.3333414

So maybe any value might be "reasonable", given that you have that particular number of data points (so one degree of freedom) and are using an estimate of the t-statistic which is essentially the ratio 0/0 from a numerical point of view.

--
David.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
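
The Cauchy remark in this reply can be checked directly in base R. The following is a minimal sketch, not part of the original message: it takes the slope t value from the quoted summary (the variable names `t_slope`, `p_t`, `p_cauchy` and `p_exact` are made up for illustration) and compares the two-sided tail area under a t distribution with 1 degree of freedom with the same area under a standard Cauchy distribution.

```r
# t statistic reported for the slope in the quoted summary
t_slope <- -1.732

# Two-sided p-value from a t distribution with 1 residual degree of freedom
p_t <- 2 * pt(-abs(t_slope), df = 1)

# The same tail area computed from the standard Cauchy distribution
p_cauchy <- 2 * pcauchy(-abs(t_slope))

print(c(t = p_t, cauchy = p_cauchy))

# With t equal to -sqrt(3) exactly, the two-sided p-value is 1/3
p_exact <- 2 * pt(-sqrt(3), df = 1)
print(p_exact)
```

Because the tails with 1 degree of freedom are so heavy, a t value near 1.7 is unremarkable, which is consistent with the 0.333 shown in the coefficient table.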

## Re: Interesting behavior of lm() with small, problematic data sets


## Re: Interesting behavior of lm() with small, problematic data sets

> I think what you're seeing is
> https://en.wikipedia.org/wiki/Loss_of_significance.

Almost. All the results in the OP's summary are reflections of finite precision in the analytically exact solution, leading to residuals smaller than the double precision limit. The summary is correctly warning that it's all potentially nonsense, and indeed the only things you can trust are the coefficient values (to within .Machine$double.eps or thereabouts).

Interestingly, though, my current version of R (3.4.0) gives numerically exact coefficients (c(1,0)) and identically zero standard errors. So this particular example is apparently version-specific.

S Ellison
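
The finite-precision point can be illustrated with a short base R sketch. This is an illustration under stated assumptions rather than anything posted in the thread: the residuals, t value and F value are copied from the summary quoted by the original poster, the names `resid_posted`, `t_posted` and `F_posted` are hypothetical, and the exact residuals a fresh `lm()` fit produces may differ by R version and platform, so only the coefficients are compared against the exact answer.

```r
# Data from the original post: constant response, varying predictor
x <- c(1, 2, 3)
y <- c(1, 1, 1)
fit <- lm(y ~ x)

# Residuals as printed in the summary quoted in the thread
resid_posted <- c(5.906e-17, -1.181e-16, 5.906e-17)

# Largest posted residual relative to machine epsilon (response is of size 1)
eps <- .Machine$double.eps
print(max(abs(resid_posted)) / eps)

# The coefficients agree with the exact answer c(1, 0) up to rounding error
print(all.equal(c(1, 0), unname(coef(fit))))

# In simple regression the F statistic should equal the squared slope t value;
# the posted values disagree, another sign that both come from rounding noise
t_posted <- -1.732
F_posted <- 3.534
print(c(t_squared = t_posted^2, F = F_posted))
```

The first ratio is below 1, meaning the posted residuals are smaller than the spacing of doubles near the fitted value, so every quantity derived from them (standard errors, t, F, R-squared) is a function of rounding noise.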