lm fails on some large input


Dingyuan Wang
Hi,

This input doesn't have any interesting properties except y is unix
time. Spreadsheets can do this well.
Is this a bug that lm can't do x ~ y?

R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

 > x = c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001,
101.632, 108.928, 94.08)
 > y = c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873,
1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307,
1506705747.372)
 > m = lm(x ~ y)
 > summary(m)

Call:
lm(formula = x ~ y)

Residuals:
      Min       1Q   Median       3Q      Max
-27.0222 -14.9902  -0.6542  14.1938  29.1698

Coefficients: (1 not defined because of singularities)
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   94.734      6.511   14.55 4.88e-07 ***
y                 NA         NA      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 19.53 on 8 degrees of freedom

 > summary(lm(y ~ x))

Call:
lm(formula = y ~ x)

Residuals:
     Min      1Q  Median      3Q     Max
-2.1687 -1.3345 -0.9466  1.3826  2.6551

Coefficients:
              Estimate Std. Error   t value Pr(>|t|)
(Intercept) 1.507e+09  3.294e+00 4.574e+08  < 2e-16 ***
x           6.136e-01  3.413e-02 1.798e+01 4.07e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.885 on 7 degrees of freedom
Multiple R-squared:  0.9788, Adjusted R-squared:  0.9758
F-statistic: 323.3 on 1 and 7 DF,  p-value: 4.068e-07

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: lm fails on some large input

Michael Dewey-3
Perhaps subtract 1506705766 from y?

Saying some other software does it well implies you know what the
_correct_ answer is here but I would question what that means with this
sort of data-set.
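
A minimal sketch of the offset suggestion, using the data from the original post. The name `y0` is introduced here only for illustration. Shifting the predictor by a constant changes the intercept but not the slope, so the slope found this way also applies to the raw timestamps.

```r
x <- c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001,
       101.632, 108.928, 94.08)
y <- c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873,
       1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307,
       1506705747.372)

# Shift the timestamps so they become small numbers
# (seconds relative to the chosen offset).
y0 <- y - 1506705766

# With the shifted predictor the slope is no longer reported as NA.
m0 <- lm(x ~ y0)
coef(m0)
```

The intercept of `m0` then refers to time 1506705766 rather than to the Unix epoch.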

On 17/04/2019 07:26, Dingyuan Wang wrote:

> Hi,
>
> This input doesn't have any interesting properties except y is unix
> time. Spreadsheets can do this well.
> Is this a bug that lm can't do x ~ y?

--
Michael
http://www.dewey.myzen.co.uk/home.html


Re: lm fails on some large input

Fox, John
In reply to this post by Dingyuan Wang
Dear Michael and Dingyuan Wang,

> -----Original Message-----
> From: R-help [mailto:[hidden email]] On Behalf Of Michael
> Dewey
> Sent: Thursday, April 18, 2019 11:25 AM
> To: Dingyuan Wang <[hidden email]>; [hidden email]
> Subject: Re: [R] lm fails on some large input
>
> Perhaps subtract 1506705766 from y?
>
> Saying some other software does it well implies you know what the _correct_
> answer is here but I would question what that means with this sort of data-
> set.

It's rather an interesting problem, though, because the naïve computation of the LS solution works:

plot(x, y)
X <- cbind(1, x)
b <- solve(t(X) %*% X) %*% t(X) %*% y
b
abline(b)

That surprised me, because I expected the lm() computation, which uses the QR decomposition, to be more numerically stable.

Best,
 John

-----------------------------------------------------------------
John Fox
Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
Web: https://socialsciences.mcmaster.ca/jfox/




Re: lm fails on some large input

Peter Dalgaard-2
Um, you need to reverse y and x there. The question was about lm(x ~ y)....

> X <- cbind(1, y)
> solve(crossprod(X))
Error in solve.default(crossprod(X)) :
  system is computationally singular: reciprocal condition number = 6.19587e-35

Actually, lm can QR perfectly OK, but it gets caught by its singularity detection:

> qr <- qr(X, tol=1e-10)
> qr # without the tol bit, you get same thing but $rank == 1
$qr
                             y
 [1,] -3.0000000 -4.520117e+09
 [2,]  0.3333333 -3.426530e+01
 [3,]  0.3333333 -2.947103e-02
 [4,]  0.3333333  4.252164e-01
 [5,]  0.3333333 -3.665468e-01
 [6,]  0.3333333 -3.488029e-01
 [7,]  0.3333333  2.614064e-01
 [8,]  0.3333333  4.086982e-01
 [9,]  0.3333333  2.018556e-03

$rank
[1] 2

$qraux
[1] 1.333333 1.571779

$pivot
[1] 1 2

attr(,"class")
[1] "qr"
> x = c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001, 101.632, 108.928, 94.08)
> qr.coef(qr,x)
                          y
-2.403345e+09  1.595099e+00

> lm(x~y)

Call:
lm(formula = x ~ y)

Coefficients:
(Intercept)            y  
      94.73           NA  

> lm(x~y, tol=1e-10)

Call:
lm(formula = x ~ y, tol = 1e-10)

Coefficients:
(Intercept)            y  
 -2.403e+09    1.595e+00  

> lm(x~I(y-mean(y)))

Call:
lm(formula = x ~ I(y - mean(y)))

Coefficients:
   (Intercept)  I(y - mean(y))  
        94.734           1.595  
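
A rough sketch of why the default tolerance trips on this input. It assumes that the LINPACK-based routine behind `qr()` and `lm.fit()` treats a column as negligible once its norm, after the earlier columns have been removed, falls below `tol` times its original norm. The names `orig_norm`, `resid_norm` and `ratio` are introduced only for illustration.

```r
y <- c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873,
       1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307,
       1506705747.372)
X <- cbind(1, y)

orig_norm  <- sqrt(sum(y^2))              # norm of the raw timestamp column
resid_norm <- sqrt(sum((y - mean(y))^2))  # norm left after removing the intercept
ratio <- resid_norm / orig_norm

ratio < 1e-7    # is the ratio below the default tol?
ratio < 1e-10   # is it below the smaller tol used above?

qr(X)$rank               # rank under the default tol
qr(X, tol = 1e-10)$rank  # rank with the smaller tol
```

Here `resid_norm` has the same magnitude as the second diagonal entry of `$qr` printed above, which is tiny relative to timestamps of order 1.5e9.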


> On 18 Apr 2019, at 17:56 , Fox, John <[hidden email]> wrote:
>
> Dear Michael and Dingyuan Wang,
>
> It's rather an interesting problem, though, because the naïve computation of the LS solution works:
>
> plot(x, y)
> X <- cbind(1, x)
> b <- solve(t(X) %*% X) %*% t(X) %*% y
> b
> abline(b)
>
> That surprised me, because I expected that lm() computation, using the QR decomposition, would be more numerically stable.
>
> Best,
> John

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]


Re: lm fails on some large input

R help mailing list-2
In reply to this post by Michael Dewey-3
This sort of data arises quite easily if you deal with time/dates around
now.  E.g.,

> d <- data.frame(
+     when = seq(as.POSIXct("2017-09-29 18:22:01"), by="secs", len=10),
+     measurement = log2(1:10))
> coef(lm(data=d, measurement ~ when))
       (Intercept)               when
2.1791061114716954                 NA
> as.numeric(d$when)[1:2]
[1] 1506734521 1506734522

There are problems with the time units (seconds vs. hours) if you subtract
off a time because the units of -.POSIXt depend on the data:

> coef(lm(data=d, measurement ~ I(when - min(when))))
        (Intercept) I(when - min(when))
0.68327571513124297 0.33240675474232279
> coef(lm(data=d, measurement ~ I(when - as.POSIXct("2017-09-29 00:00:00"))))
                                (Intercept) I(when - as.POSIXct("2017-09-29 00:00:00"))
                       -21978.3837546251634                          1196.6643170736229


Hence you have to use difftime and specify the units

> coef(lm(data=d, measurement ~ difftime(when, as.POSIXct("2017-09-29 00:00:00"), units="secs")))
                                                      (Intercept)
                                          -2.1978383754612696e+04
difftime(when, as.POSIXct("2017-09-29 00:00:00"), units = "secs")
                                           3.3240675474248449e-01
> coef(lm(data=d, measurement ~ difftime(when, min(when), units="secs")))
                              (Intercept) difftime(when, min(when), units = "secs")
                      0.68327571513124297                       0.33240675474232279
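
A minimal sketch of the units issue, assuming a fixed time zone (`tz = "UTC"`) so that the result does not depend on the local setting. The names `t0`, `midnight` and `secs` are introduced here for illustration. It shows that subtracting POSIXct values picks the units from the data, and that converting to a plain number of seconds removes the ambiguity.

```r
t0 <- as.POSIXct("2017-09-29 18:22:01", tz = "UTC")
midnight <- as.POSIXct("2017-09-29 00:00:00", tz = "UTC")
when <- seq(t0, by = "sec", length.out = 10)
measurement <- log2(1:10)

# The units of a POSIXct difference are chosen automatically from the data.
units(when - min(when))   # differences of a few seconds
units(when - midnight)    # differences of about 18 hours

# Asking for explicit units and converting to numeric gives a predictor
# whose scale does not depend on the reference time.
secs <- as.numeric(difftime(when, min(when), units = "secs"))
fit <- lm(measurement ~ secs)
coef(fit)
```

The slope of `fit` is then in measurement units per second, whichever reference time is subtracted.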



Bill Dunlap
TIBCO Software
wdunlap tibco.com



Re: lm fails on some large input

Dingyuan Wang
In reply to this post by Michael Dewey-3
I just want to fit a line to timestamps versus some coordinates, so it
doesn't matter whether it is y ~ x or x ~ y.

Yes, I know the answer. When I tried R, I was surprised that it can't solve
this either. I first noticed that PostgreSQL couldn't solve it, and found
that they fixed it in PostgreSQL 12.

https://www.postgresql.org/message-id/153313051300.1397.9594490737341194671%40wrigleys.postgresql.org

So I am asking whether someone knows how to fix this in R, or whether I
should submit it as a bug.
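
One numerically stable way to fit such a line in a single pass is to keep running means and centred sums (a Welford-style update). The sketch below only illustrates that idea. It is not taken from R's `lm()` or from the PostgreSQL change linked above, and `online_fit` is a hypothetical helper name.

```r
x <- c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001,
       101.632, 108.928, 94.08)
y <- c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873,
       1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307,
       1506705747.372)

# Hypothetical helper: regress v on u using running means and centred sums.
online_fit <- function(u, v) {
  n <- 0
  mean_u <- 0
  mean_v <- 0
  s_uu <- 0  # running sum of squared deviations of u
  s_uv <- 0  # running sum of cross-deviations of u and v
  for (i in seq_along(u)) {
    n <- n + 1
    du <- u[i] - mean_u                  # deviation from the previous mean of u
    mean_u <- mean_u + du / n
    mean_v <- mean_v + (v[i] - mean_v) / n
    s_uu <- s_uu + du * (u[i] - mean_u)  # uses the updated mean of u
    s_uv <- s_uv + du * (v[i] - mean_v)  # uses the updated mean of v
  }
  c(slope = s_uv / s_uu, mean_u = mean_u, mean_v = mean_v)
}

fit <- online_fit(y, x)  # timestamps as predictor, coordinates as response
fit
```

The fitted line is `mean_v + slope * (u - mean_u)`, so the huge intercept at time zero never has to be formed.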


Re: lm fails on some large input

Berry, Charles
In reply to this post by Michael Dewey-3


> On Apr 18, 2019, at 8:24 AM, Michael Dewey <[hidden email]> wrote:
>
> Perhaps subtract 1506705766 from y?

Good advice. Some further notes follow.

One can set `tol` to a value smaller than the default, e.g.

  m2 <- lm(x ~ y, tol=1e-12)

which is accurate:

  plot(y,x)
  abline(coef=coef(m2))
 

Users of numerical procedures need to be mindful of the default settings of the algorithms they use.

As is well known, a default convergence tolerance that is too large can lead an optimization algorithm to seriously wrong results. There is an example described here:

https://science.sciencemag.org/content/296/5575/1945/tab-pdf

One might quibble with the choice of tol=1e-7 (the default in lm.fit), and 64-bit floating point will support much smaller values. However, there are usually statistical issues surrounding fitting highly collinear variables.

So,  `tol = 1e-07` seems more like a feature than a bug.
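
One way to check that the smaller `tol` gives a sensible fit is to compare it with the centred fit, which passes the default tolerance. A minimal sketch using the data from the original post; `m_tol` and `m_ctr` are illustrative names.

```r
x <- c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001,
       101.632, 108.928, 94.08)
y <- c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873,
       1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307,
       1506705747.372)

m_tol <- lm(x ~ y, tol = 1e-12)   # raw timestamps, tighter tolerance
m_ctr <- lm(x ~ I(y - mean(y)))   # centred timestamps, default tolerance

# Both parameterisations describe the same line, so the fitted values
# and the slopes should agree up to rounding error.
max(abs(fitted(m_tol) - fitted(m_ctr)))
coef(m_tol)[[2]] - coef(m_ctr)[[2]]
```

Only the intercepts differ, because they refer to different time origins.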

HTH,

Chuck


Re: lm fails on some large input

Fox, John
In reply to this post by Peter Dalgaard-2
Dear Peter,

> -----Original Message-----
> From: peter dalgaard [mailto:[hidden email]]
> Sent: Thursday, April 18, 2019 12:23 PM
> To: Fox, John <[hidden email]>
> Cc: Michael Dewey <[hidden email]>; Dingyuan Wang
> <[hidden email]>; [hidden email]
> Subject: Re: [R] lm fails on some large input
>
> Um, you need to reverse y and x there. The question was about lm(x ~ y)....
>

Good catch! I missed that in the original posting, and lm() does indeed produce the LS solution for the regression of y on x. And, as I'd have expected, the naïve approach also fails for the regression of x on y:

> Y <- cbind(1, y)
> b <- solve(t(Y) %*% Y) %*% t(Y) %*% x
Error in solve.default(t(Y) %*% Y) :
  system is computationally singular: reciprocal condition number = 6.19587e-35

resolving the mystery.
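
For comparison, the same normal-equations computation goes through once y is centred, because the cross-product matrix is then nearly diagonal and well conditioned. A minimal sketch with the data from the original post; `yc`, `Z` and `b` are names introduced here for illustration.

```r
x <- c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001,
       101.632, 108.928, 94.08)
y <- c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873,
       1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307,
       1506705747.372)

yc <- y - mean(y)   # centred timestamps
Z <- cbind(1, yc)   # design matrix for the regression of x on centred y

# Solve the normal equations (Z'Z) b = Z'x without forming an explicit inverse.
b <- solve(crossprod(Z), crossprod(Z, x))
b
```

With a centred predictor the first coefficient is the mean of x and the second is the slope.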

Thanks,
 John

> > X <- cbind(1, y)
> > solve(crossprod(X))
> Error in solve.default(crossprod(X)) :
>   system is computationally singular: reciprocal condition number = 6.19587e-
> 35
>
> Actually, lm can QR perfectly OK, but it gets caught by its singularity detection:
>
> > qr <- qr(X, tol=1e-10)
> > qr # without the tol bit, you get same thing but $rank == 1
> $qr
>                              y
>  [1,] -3.0000000 -4.520117e+09
>  [2,]  0.3333333 -3.426530e+01
>  [3,]  0.3333333 -2.947103e-02
>  [4,]  0.3333333  4.252164e-01
>  [5,]  0.3333333 -3.665468e-01
>  [6,]  0.3333333 -3.488029e-01
>  [7,]  0.3333333  2.614064e-01
>  [8,]  0.3333333  4.086982e-01
>  [9,]  0.3333333  2.018556e-03
>
> $rank
> [1] 2
>
> $qraux
> [1] 1.333333 1.571779
>
> $pivot
> [1] 1 2
>
> attr(,"class")
> [1] "qr"
> > x = c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001, 101.632,
> > 108.928, 94.08)
> > qr.coef(qr,x)
>                           y
> -2.403345e+09  1.595099e+00
>
> > lm(x~y)
>
> Call:
> lm(formula = x ~ y)
>
> Coefficients:
> (Intercept)            y
>       94.73           NA
>
> > lm(x~y, tol=1e-10)
>
> Call:
> lm(formula = x ~ y, tol = 1e-10)
>
> Coefficients:
> (Intercept)            y
>  -2.403e+09    1.595e+00
>
> > lm(x~I(y-mean(y)))
>
> Call:
> lm(formula = x ~ I(y - mean(y)))
>
> Coefficients:
>    (Intercept)  I(y - mean(y))
>         94.734           1.595
>
>
> > On 18 Apr 2019, at 17:56 , Fox, John <[hidden email]> wrote:
> >
> > Dear Michael and Dingyuan Wang,
> >
> >> -----Original Message-----
> >> From: R-help [mailto:[hidden email]] On Behalf Of
> >> Michael Dewey
> >> Sent: Thursday, April 18, 2019 11:25 AM
> >> To: Dingyuan Wang <[hidden email]>; [hidden email]
> >> Subject: Re: [R] lm fails on some large input
> >>
> >> Perhaps subtract 1506705766 from y?
> >>
> >> Saying some other software does it well implies you know what the
> >> _correct_ answer is here but I would question what that means with
> >> this sort of data- set.
> >
> > It's rather an interesting problem, though, because the naïve computation of
> the LS solution works:
> >
> > plot(x, y)
> > X <- cbind(1, x)
> > b <- solve(t(X) %*% X) %*% t(X) %*% y
> > b
> > abline(b)
> >
> > That surprised me, because I expected that lm() computation, using the QR
> decomposition, would be more numerically stable.
> >
> > Best,
> > John
> >
> > -----------------------------------------------------------------
> > John Fox
> > Professor Emeritus
> > McMaster University
> > Hamilton, Ontario, Canada
> > Web: https://socialsciences.mcmaster.ca/jfox/
> >
> >
> >
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000
> Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: [hidden email]  Priv: [hidden email]

Re: lm fails on some large input

Jeff Newmiller
In reply to this post by R help mailing list-2
I make it a general rule not to feed time values into numerical analysis algorithms without first subtracting a reasonable epoch (to obtain a difftime) and then calling as.numeric() on the result with the units argument set explicitly, so that the analysis uses numeric values I can interpret. Explicit use of the difftime() function does something similar, but if any other operations are performed on the result, the units could change again before the inevitable conversion to numeric occurs somewhere down the line. Taking responsibility for the numeric conversion myself is less likely to leave surprises.
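
A minimal sketch of that rule, reusing the example data from the quoted message below. The variable names `epoch` and `elapsed_secs` and the explicit `tz = "UTC"` are illustrative assumptions, not something from the thread.

```r
# Ten one-second timestamps and a measurement, as in the quoted example.
when <- seq(as.POSIXct("2017-09-29 18:22:01", tz = "UTC"),
            by = "sec", length.out = 10)
measurement <- log2(1:10)

# Subtract a nearby epoch, then convert to plain numeric with explicit units,
# so the predictor is a small, interpretable number of seconds.
epoch <- min(when)
elapsed_secs <- as.numeric(difftime(when, epoch, units = "secs"))

fit <- lm(measurement ~ elapsed_secs)
coef(fit)
```

Because the predictor is now an ordinary numeric vector running from 0 to 9, the slope reads directly as "change in measurement per second" and no later operation can silently change its units.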

On April 18, 2019 9:32:09 AM PDT, William Dunlap via R-help <[hidden email]> wrote:

>This sort of data arises quite easily if you deal with time/dates around
>now.  E.g.,
>
>> d <- data.frame(
>+     when = seq(as.POSIXct("2017-09-29 18:22:01"), by="secs", len=10),
>+     measurement = log2(1:10))
>> coef(lm(data=d, measurement ~ when))
>       (Intercept)               when
>2.1791061114716954                 NA
>> as.numeric(d$when)[1:2]
>[1] 1506734521 1506734522
>
>There are problems with the time units (seconds vs. hours) if you subtract
>off a time because the units of -.POSIXt depend on the data:
>
>> coef(lm(data=d, measurement ~ I(when - min(when))))
>        (Intercept) I(when - min(when))
>0.68327571513124297 0.33240675474232279
>> coef(lm(data=d, measurement ~ I(when - as.POSIXct("2017-09-29 00:00:00"))))
>                            (Intercept) I(when - as.POSIXct("2017-09-29 00:00:00"))
>                   -21978.3837546251634                       1196.6643170736229
>
>
>Hence you have to use difftime and specify the units:
>
>> coef(lm(data=d, measurement ~ difftime(when, as.POSIXct("2017-09-29 00:00:00"), units="secs")))
>                                                      (Intercept)
>                                          -2.1978383754612696e+04
>difftime(when, as.POSIXct("2017-09-29 00:00:00"), units = "secs")
>                                           3.3240675474248449e-01
>> coef(lm(data=d, measurement ~ difftime(when, min(when), units="secs")))
>                          (Intercept) difftime(when, min(when), units = "secs")
>                  0.68327571513124297                       0.33240675474232279
>
>
>
>Bill Dunlap
>TIBCO Software
>wdunlap tibco.com
>
>
>On Thu, Apr 18, 2019 at 8:24 AM Michael Dewey <[hidden email]> wrote:
>
>> Perhaps subtract 1506705766 from y?
>>
>> Saying some other software does it well implies you know what the
>> _correct_ answer is here but I would question what that means with this
>> sort of data-set.

--
Sent from my phone. Please excuse my brevity.


Re: lm fails on some large input

Jeff Newmiller
In reply to this post by Dingyuan Wang
The fact that you think x~y is interchangeable with y~x suggests to me that you will have a difficult time convincing R Core that this is a bug. I recommend that you take at least an upper-division college course in linear regression first.
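
To illustrate why the two formulas are not interchangeable, here is a small sketch using the data from the original post. Shifting y by its minimum (`yc`) is an assumption made here only to sidestep the singularity issue; it does not change either slope.

```r
x <- c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001,
       101.632, 108.928, 94.08)
y <- c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873,
       1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307,
       1506705747.372)
yc <- y - min(y)   # shifted copy of y; slopes are unaffected by the shift

b_xy <- coef(lm(x ~ yc))[["yc"]]   # slope from regressing x on y
b_yx <- coef(lm(yc ~ x))[["x"]]    # slope from regressing y on x

# If the two fits described the same line, b_xy would equal 1 / b_yx.
c(b_xy = b_xy, inverse_b_yx = 1 / b_yx)

# Instead, the product of the two slopes equals the squared correlation.
c(product = b_xy * b_yx, r_squared = cor(x, yc)^2)
```

The two fitted lines coincide only when the correlation is exactly 1 or -1. Here they are close but not identical, because each regression minimises residuals in a different direction.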

On April 18, 2019 9:35:55 AM PDT, Dingyuan Wang <[hidden email]> wrote:

>I just want to make a line out of timestamps vs some coordinates, so y~x
>or x~y doesn't matter.
>
>Yes, I know the answer. When trying R, I'm surprised that R can't solve
>that either. I first noticed that PostgreSQL can't solve it, and found
>that they fixed that in pg 12.
>
>https://www.postgresql.org/message-id/153313051300.1397.9594490737341194671%40wrigleys.postgresql.org
>
>Therefore I come to ask whether someone knows how to fix this in R, or
>must I submit it as a bug?
>

--
Sent from my phone. Please excuse my brevity.


Re: lm fails on some large input

Fox, John
In reply to this post by Michael Dewey-3
Dear Dingyuan Wang,

But your question was answered clearly earlier in this thread (I forget by whom), showing that lm() provides the solution to the regression of x on y if the criterion for singularity is tightened:

> lm(x ~ y)

Call:
lm(formula = x ~ y)

Coefficients:
(Intercept)            y  
      94.73           NA  

> lm(x ~ y, tol=1e-10)

Call:
lm(formula = x ~ y, tol = 1e-10)

Coefficients:
(Intercept)            y  
 -2.403e+09    1.595e+00  
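
For comparison, a short sketch that puts the tol approach next to the shift suggested earlier in the thread. It assumes the usual explanation that the extra `tol` argument is handed on to `lm.fit()`, whose default of 1e-07 treats y as collinear with the intercept because its spread is tiny relative to its magnitude. The name `y0` is illustrative.

```r
x <- c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001,
       101.632, 108.928, 94.08)
y <- c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873,
       1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307,
       1506705747.372)

# Default tolerance: the y coefficient comes back as NA.
fit_default <- lm(x ~ y)

# Tightened tolerance, as shown above.
fit_tol <- lm(x ~ y, tol = 1e-10)

# Alternative: shift y so that its values are small before fitting.
y0 <- y - min(y)
fit_shift <- lm(x ~ y0)

coef(fit_default)
coef(fit_tol)
coef(fit_shift)
```

The shifted fit should give the same slope as the tol fit, with an intercept that refers to the earliest timestamp rather than to the Unix epoch, which is usually easier to interpret.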

Best,
 John


Re: lm fails on some large input

Dingyuan Wang
In reply to this post by Jeff Newmiller
The final goal is to make two lines and find the intersection point.
I don't want to argue more about the reason.

The tol suggestion is reasonable, and I'll take that.
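
A rough sketch of that goal, for anyone reading later: fit both lines against time measured from a shared reference point, then solve for the crossing. All data below are made up for illustration (two exact lines that cross 2 seconds after a hypothetical `epoch`), and the variable names are not from the thread.

```r
# Hypothetical common reference time, close to the timestamps being fitted.
epoch <- 1506705700

# Made-up series 1: coordinate = 2 + 3 * seconds since epoch.
t1 <- epoch + c(0, 1, 2, 3, 4)
x1 <- 2 + 3 * (t1 - epoch)

# Made-up series 2: coordinate = 10 - seconds since epoch.
t2 <- epoch + c(0, 2, 4, 6, 8)
x2 <- 10 - (t2 - epoch)

# Fit each line on shifted time so that lm() sees small predictor values.
s1 <- t1 - epoch
s2 <- t2 - epoch
f1 <- lm(x1 ~ s1)
f2 <- lm(x2 ~ s2)

a1 <- coef(f1)[[1]]; b1 <- coef(f1)[[2]]
a2 <- coef(f2)[[1]]; b2 <- coef(f2)[[2]]

# Solve a1 + b1 * s = a2 + b2 * s for s (requires b1 != b2).
s_cross <- (a2 - a1) / (b1 - b2)
x_cross <- a1 + b1 * s_cross

# Convert the crossing time back to a Unix timestamp.
c(time = s_cross + epoch, coordinate = x_cross)
```

Using the same `epoch` for both fits matters: the intercepts are only comparable when both lines are expressed on the same shifted time axis.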
