NIST StRD linear regression

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

NIST StRD linear regression

carnellr
NIST maintains a repository of Statistical Reference Datasets at
http://www.itl.nist.gov/div898/strd/.  I have been working through the
datasets to compare R's results to their references with the hope that
if all works well, this could become a validation package.

All the linear regression datasets give results with some degree of
accuracy except one.  The NIST model includes 11 parameters, but R will
not compute the estimates for all 11 parameters because it finds the
data matrix to be singular.

The code I used is below.  Any help in getting R to estimate all 11
regression parameters would be greatly appreciated.

I am posting this to the R-devel list since I think that the discussion
might involve the limitations of platform precision.

I am using R 2.3.1 for Windows XP.

rm(list=ls())
require(gsubfn)

defaultPath <- "my path"

data.base <- "http://www.itl.nist.gov/div898/strd/lls/data/LINKS/DATA"

reg.data <- paste(data.base, "/Filip.dat", sep="")

model <-
"V1~V2+I(V2^2)+I(V2^3)+I(V2^4)+I(V2^5)+I(V2^6)+I(V2^7)+I(V2^8)+I(V2^9)+I
(V2^10)"

filePath <- paste(defaultPath, "//NISTtest.dat", sep="")
download.file(reg.data, filePath, quiet=TRUE)

A <- read.table(filePath, skip=60, strip.white=TRUE)
lm.data <- lm(formula(model), A)

lm.data


Rob Carnell

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: NIST StRD linear regression

Prof Brian Ripley
That is not an appropriate way to fit a degree-10 polynomial (in any
language, if fitting a degree-10 polynomial is in fact an appropriate
statistical analysis, which seems unlikely).

On Sun, 30 Jul 2006, Carnell, Rob C wrote:

> NIST maintains a repository of Statistical Reference Datasets at
> http://www.itl.nist.gov/div898/strd/.  I have been working through the
> datasets to compare R's results to their references with the hope that
> if all works well, this could become a validation package.

What does it validate?  The R user's understanding of numerical methods?

> All the linear regression datasets give results with some degree of
> accuracy except one.  The NIST model includes 11 parameters, but R will
> not compute the estimates for all 11 parameters because it finds the
> data matrix to be singular.
>
> The code I used is below.  Any help in getting R to estimate all 11
> regression parameters would be greatly appreciated.
>
> I am posting this to the R-devel list since I think that the discussion
> might involve the limitations of platform precision.
>
> I am using R 2.3.1 for Windows XP.
>
> rm(list=ls())
> require(gsubfn)

That is not needed.

> defaultPath <- "my path"
>
> data.base <- "http://www.itl.nist.gov/div898/strd/lls/data/LINKS/DATA"
>
> reg.data <- paste(data.base, "/Filip.dat", sep="")
>
> model <-
> "V1~V2+I(V2^2)+I(V2^3)+I(V2^4)+I(V2^5)+I(V2^6)+I(V2^7)+I(V2^8)+I(V2^9)+I
> (V2^10)"
>
> filePath <- paste(defaultPath, "//NISTtest.dat", sep="")
> download.file(reg.data, filePath, quiet=TRUE)

filePath <-
url("http://www.itl.nist.gov/div898/strd/lls/data/LINKS/DATA/Filip.dat")

will suffice.

> A <- read.table(filePath, skip=60, strip.white=TRUE)

> lm.data <- lm(formula(model), A)
>
> lm.data

lm(V1 ~ poly(V2, 10), A)

works.

> kappa(model.matrix(V1 ~ poly(V2, 10, raw=TRUE), A), exact=TRUE)
[1] 1.767963e+15

shows the design matrix is indeed numerically singular by the naive
method.

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: NIST StRD linear regression

Martin Maechler
In reply to this post by carnellr
>>>>> "RobCar" == Carnell, Rob C <[hidden email]>
>>>>>     on Sun, 30 Jul 2006 19:42:29 -0400 writes:

    RobCar> NIST maintains a repository of Statistical Reference
    RobCar> Datasets at http://www.itl.nist.gov/div898/strd/.  I
    RobCar> have been working through the datasets to compare
    RobCar> R's results to their references with the hope that
    RobCar> if all works well, this could become a validation
    RobCar> package.

    RobCar> All the linear regression datasets give results with
    RobCar> some degree of accuracy except one.  The NIST model
    RobCar> includes 11 parameters, but R will not compute the
    RobCar> estimates for all 11 parameters because it finds the
    RobCar> data matrix to be singular.

    RobCar> The code I used is below.  Any help in getting R to
    RobCar> estimate all 11 regression parameters would be
    RobCar> greatly appreciated.

    RobCar> I am posting this to the R-devel list since I think
    RobCar> that the discussion might involve the limitations of
    RobCar> platform precision.

    RobCar> I am using R 2.3.1 for Windows XP.

    RobCar> rm(list=ls())
    RobCar> require(gsubfn)

    RobCar> defaultPath <- "my path"

    RobCar> data.base <- "http://www.itl.nist.gov/div898/strd/lls/data/LINKS/DATA"

Here is a slight improvement {note the function file.path(); and
model <- ..; also  poly(V2, 10) !}
which shows you how to tell lm() to "believe" in 10 digit
precision of input data.  

-------------------------------------------------------------------------------

reg.data <- paste(data.base, "/Filip.dat", sep="")
filePath <- file.path(defaultPath, "NISTtest.dat")
download.file(reg.data, filePath, quiet=TRUE)

A <- read.table(filePath, skip=60, strip.white=TRUE)

## If you really need high-order polynomial regression in S and R,
##  *DO* as you are told in all good books, and use orthogonal polynomials:
(lm.ok <- lm(V1 ~ poly(V2,10), data = A))
## and there is no problem
summary(lm.ok)

## But if you insist on doing nonsense ....

model <- "V1 ~ V2+ I(V2^2)+I(V2^3)+I(V2^4)+I(V2^5)+I(V2^6)+I(V2^7)+I(V2^8)+I(V2^9)+I(V2^10)"

## MM: "better":
(model <- paste("V1 ~ V2", paste("+ I(V2^", 2:10, ")", sep='', collapse='')))
(form <- formula(model))

mod.mat <- model.matrix(form, data = A)
dim(mod.mat) ## 82 11
(m.qr <- qr(mod.mat             ))$rank # -> 10 (only, instead of 11)
(m.qr <- qr(mod.mat, tol = 1e-10))$rank # -> 11

(lm.def  <- lm(form, data = A)) ## last coef. is NA
(lm.plus <- lm(form, data = A, tol = 1e-10))## no NA coefficients

-------------------------------------------------------------------------------


    RobCar> reg.data <- paste(data.base, "/Filip.dat", sep="")

    RobCar> model <-
    RobCar> "V1~V2+I(V2^2)+I(V2^3)+I(V2^4)+I(V2^5)+I(V2^6)+I(V2^7)+I(V2^8)+I(V2^9)+I
    RobCar> (V2^10)"

    RobCar> filePath <- paste(defaultPath, "//NISTtest.dat", sep="")
    RobCar> download.file(reg.data, filePath, quiet=TRUE)

    RobCar> A <- read.table(filePath, skip=60, strip.white=TRUE)
    RobCar> lm.data <- lm(formula(model), A)

    RobCar> lm.data


    RobCar> Rob Carnell

A propos NIST StRD:
If you go further to  NONlinear regression,
and use nls(), you will see that high quality statistics
packages such as R  do *NOT* always conform to NIST -- at least
not to what NIST did about 5 years ago when I last looked.
There are many nonlinear least squares problems where the
correct result is *NO CONVERGENCE* (because of
over-parametrization, ill-posednes, ...),
owever many (cr.p) pieces of software do "converge"---falsely.
I think you find more on this topic in the monograph of
Bates and Watts (1988), but in any case,
just install and use the CRAN R package 'NISTnls' by Doug Bates
which contains the data sets with documentation and example
calls.

Martin Maechler, ETH Zurich

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel