Raffa,

I ran this on a MacOS machine and got what you expected. I added a call to sessionInfo() for your information.

> rm(list=ls())

> N = 30000

> xvar <- runif(N, -10, 10)

> e <- rnorm(N, mean=0, sd=1)

> yvar <- 1 + 2*xvar + e

> plot(xvar,yvar)

> lmMod <- lm(yvar~xvar)

> print(summary(lmMod))

Call:

lm(formula = yvar ~ xvar)

Residuals:

Min 1Q Median 3Q Max

-4.2407 -0.6738 -0.0031 0.6822 4.0619

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 1.0059022 0.0057370 175.3 <2e-16 ***

xvar 2.0005811 0.0009918 2017.2 <2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9937 on 29998 degrees of freedom

Multiple R-squared: 0.9927, Adjusted R-squared: 0.9927

F-statistic: 4.069e+06 on 1 and 29998 DF, p-value: < 2.2e-16

> domain <- seq(min(xvar), max(xvar)) # define a vector of x values to feed into model

> lines(domain, predict(lmMod, newdata = data.frame(xvar=domain))) # add regression line, using `predict` to generate y-values

> sessionInfo()

R version 3.6.0 (2019-04-26)

Platform: x86_64-apple-darwin15.6.0 (64-bit)

Running under: macOS Mojave 10.14.4

Matrix products: default

BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib

LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:

[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):

[1] compiler_3.6.0

R. Mark Sharp, Ph.D.

Data Scientist and Biomedical Statistical Consultant

7526 Meadow Green St.

San Antonio, TX 78251

mobile: 210-218-2868

[hidden email]
> On May 25, 2019, at 7:38 AM, Raffa <

[hidden email]> wrote:

>

> I have the following code:

>

> ```

>

> rm(list=ls())

> N = 30000

> xvar <- runif(N, -10, 10)

> e <- rnorm(N, mean=0, sd=1)

> yvar <- 1 + 2*xvar + e

> plot(xvar,yvar)

> lmMod <- lm(yvar~xvar)

> print(summary(lmMod))

> domain <- seq(min(xvar), max(xvar)) # define a vector of x values to

> feed into model

> lines(domain, predict(lmMod, newdata = data.frame(xvar=domain))) #

> add regression line, using `predict` to generate y-values

>

> ```

>

> I expected the coefficients to be something similar to [1,2]. Instead R

> keeps throwing at me random numbers that are not statistically

> significant and don't fit the model, and I have 20k observations. For

> example

>

> ```

>

> Call:

> lm(formula = yvar ~ xvar)

>

> Residuals:

> Min 1Q Median 3Q Max

> -21.384 -8.908 1.016 10.972 23.663

>

> Coefficients:

> Estimate Std. Error t value Pr(>|t|)

> (Intercept) 0.0007145 0.0670316 0.011 0.991

> xvar 0.0168271 0.0116420 1.445 0.148

>

> Residual standard error: 11.61 on 29998 degrees of freedom

> Multiple R-squared: 7.038e-05, Adjusted R-squared: 3.705e-05

> F-statistic: 2.112 on 1 and 29998 DF, p-value: 0.1462

>

> ```

>

>

> The strange thing is that the code works perfectly for N=200 or N=2000.

> It's only for larger N that this thing happen U(for example, N=20000). I

> have tried to ask for example in CrossValidated

> <

https://stats.stackexchange.com/questions/410050/increasing-number-of-observations-worsen-the-regression-model>

> but the code works for them. Any help?

>

> I am runnign R 3.6.0 on Kubuntu 19.04

>

> Best regards

>

> Raffaele

>

>

> [[alternative HTML version deleted]]

>

> ______________________________________________

>

[hidden email] mailing list -- To UNSUBSCRIBE and more, see

>

https://stat.ethz.ch/mailman/listinfo/r-help> PLEASE do read the posting guide

http://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code.

______________________________________________

[hidden email] mailing list -- To UNSUBSCRIBE and more, see

https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide

http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.