scale.default gives an incorrect error message when is.numeric() fails on a sparse row matrix (dgeMatrix)


Michael Chirico
I am attempting to use the lars package with a sparse input feature matrix,
but the following fails:

library(Matrix)
library(lars)
data(diabetes)
attach(diabetes)
x = as(as.matrix(as.data.frame(x)), 'dgCMatrix')
lars(x, y, intercept = FALSE)

Error in scale.default(x, FALSE, normx) :
  length of 'scale' must equal the number of columns of 'x'

More specifically, scale.default fails:

normx = new(
  "dgeMatrix",
  x = c(1.00000000000004, 1, 1.00000000000009,
        1.00000000000001, 1.00000000000001,
        0.999999999999992, 1.00000000000004,
        0.999999999999975, 1.00000000000006,
        1.00000000000006), Dim = c(1L, 10L),
  Dimnames =
    list(NULL, c("x.age", "x.sex", "x.bmi", "x.map", "x.tc",
                 "x.ldl", "x.hdl", "x.tch", "x.ltg", "x.glu")),
  factors = list()
)

scale(x, FALSE, normx)

The problem is that this check fails because is.numeric(normx) is FALSE:

if (is.numeric(scale) && length(scale) == nc)

So, the error message is misleading. In fact length(scale) is the same as
nc.
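
A minimal sketch of the two halves of that check, evaluated separately. It assumes only that the Matrix package is installed; the values in `normx` are placeholders, and `nc` is a stand-in for the column count that scale.default computes internally.

```r
library(Matrix)

# A 1 x 10 dense S4 matrix, standing in for the 'normx' that lars() builds.
normx <- new("dgeMatrix", x = rep(1, 10), Dim = c(1L, 10L))
nc <- 10L  # stand-in for ncol(x) inside scale.default

is.numeric(normx)      # FALSE: this is the half of the check that fails
length(normx) == nc    # TRUE: the length is fine, despite the error message
```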

At a minimum, the error message needs to be repaired; do we also want to
attempt as.numeric(normx) (which I believe would have allowed scale to work
in this case)?

(I'm aware that there are some import issues in lars, as the offending line
to create normx *should* work, as is.numeric(sqrt(drop(rep(1, nrow(x)) %*%
(x^2)))) is TRUE -- it's simply that lars doesn't import the appropriate S4
methods)

Michael Chirico

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: scale.default gives an incorrect error message when is.numeric() fails on a dgeMatrix

Martin Maechler
>>>>> Michael Chirico <[hidden email]>
>>>>>     on Tue, 27 Feb 2018 20:18:34 +0800 writes:

Slightly amended 'Subject': (unimportant mistake: a dgeMatrix is *not* sparse)

MM: modified to commented R code,  slightly changed from your post:


## I am attempting to use the lars package with a sparse input feature matrix,
## but the following fails:

library(Matrix)
library(lars)
data(diabetes) # from 'lars'
##UAagghh! not like this -- both attach() *and*   as.data.frame()  are horrific!
##UA  attach(diabetes)
##UA  x = as(as.matrix(as.data.frame(x)), 'dgCMatrix')
x <- as(unclass(diabetes$x), "dgCMatrix")
lars(x, diabetes$y, intercept = FALSE)  # 'y' taken from 'diabetes', as attach() is no longer used
## Error in scale.default(x, FALSE, normx) :
##   length of 'scale' must equal the number of columns of 'x'

## More specifically, scale.default fails as called from lars():
normx <- new("dgeMatrix",
  x = 1 + c(4, 0, 9, 1, 1, -1, 4, -2, 6, 6)*1e-14, Dim = c(1L, 10L),
  Dimnames = list(NULL,
                  c("x.age", "x.sex", "x.bmi", "x.map", "x.tc",
                    "x.ldl", "x.hdl", "x.tch", "x.ltg", "x.glu")))
scale.default(x, center=FALSE, scale = normx)
## Error in scale.default(x, center = FALSE, scale = normx) :
##   length of 'scale' must equal the number of columns of 'x'

>  The problem is that this check fails because is.numeric(normx) is FALSE:

>  if (is.numeric(scale) && length(scale) == nc)

>  So, the error message is misleading. In fact length(scale) is the same as
>  nc.

Correct, twice.

>  At a minimum, the error message needs to be repaired; do we also want to
>  attempt as.numeric(normx) (which I believe would have allowed scale to work
>  in this case)?

It seems sensible to allow  both 'center' and 'scale' to only
have to *obey*  as.numeric(.)  rather than fulfill is.numeric(.).

Though that is not a bug in scale()  as its help page has always
said that 'center' and 'scale' should either be a logical value
or a numeric vector.

For that reason I can really claim a bug in 'lars' which should
really not use

       scale(x, FALSE, normx)

but rather

       scale(x, FALSE, scale = as.numeric(normx))

and then all would work.
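
As a sketch of that suggestion outside of lars (the small dense matrix `m` and the values in `normx` are made up for illustration, and the Matrix package is assumed to be installed):

```r
library(Matrix)

m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)               # plain 3 x 2 matrix
normx <- new("dgeMatrix", x = c(2, 4), Dim = c(1L, 2L))  # 1 x 2 S4 matrix

# Coerce the S4 object to a plain numeric vector before handing it to scale(),
# so that the 'scale' argument is the numeric vector the help page asks for.
scaled <- scale(m, center = FALSE, scale = as.numeric(normx))
scaled
```

Each column of `m` is divided by the corresponding entry of `normx`.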

> -----------------

>  (I'm aware that there are some import issues in lars, as the offending line
>  to create normx *should* work, as is.numeric(sqrt(drop(rep(1, nrow(x)) %*%
>  (x^2)))) is TRUE -- it's simply that lars doesn't import the appropriate S4
>  methods)

>  Michael Chirico

Yes, 'lars' has _not_ been updated since  Spring 2013, notably
because its authors have been saying (for rather more than 5
years I think) that one should really use

 require("glmnet")

instead.

Your point is still valid that it would be easy to enhance
base :: scale.default()  so it'd work in more cases.

Thank you for that.  I do plan to consider such a change in
R-devel (planned to become R 3.5.0 in April).
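
One way such an enhancement could look, purely as a sketch: the wrapper `scale_numeric_alike()` below is a hypothetical helper, not the change made in base R. It coerces a "numeric-alike" 'scale' with as.numeric(.) first and then reports coercion and length problems separately; 'center' could be treated the same way. The Matrix package is assumed only for the demonstration object.

```r
library(Matrix)

# Hypothetical helper, not part of base R.
scale_numeric_alike <- function(x, center = TRUE, scale = TRUE) {
  x <- as.matrix(x)
  if (!is.logical(scale)) {
    # Coerce first, so an S4 matrix such as a dgeMatrix is accepted.
    scale <- tryCatch(
      as.numeric(scale),
      error = function(e)
        stop("'scale' cannot be coerced to numeric", call. = FALSE)
    )
    # Only now complain about the length, so the message is accurate.
    if (length(scale) != ncol(x))
      stop("length of 'scale' must equal the number of columns of 'x'",
           call. = FALSE)
  }
  base::scale(x, center = center, scale = scale)
}

m <- matrix(1:6, nrow = 3)
normx <- new("dgeMatrix", x = c(2, 4), Dim = c(1L, 2L))
out <- scale_numeric_alike(m, center = FALSE, scale = normx)
```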

Martin Maechler,
ETH Zurich


Re: scale.default gives an incorrect error message when is.numeric() fails on a dgeMatrix

Michael Chirico
Thanks. I know the setup code is a mess; I just duct-taped something together
from the examples in lars (which are a mess in turn). In fact, when I
messaged Prof. Hastie, he recommended using glmnet. I wonder why lars is
kept on CRAN if they've no intention of maintaining it... but I digress...

On Mar 2, 2018 1:52 AM, "Martin Maechler" <[hidden email]>
wrote:

> [...]