Suggest patch for princomp.formula and prcomp.formula

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Suggest patch for princomp.formula and prcomp.formula

Berwin A Turlach
Dear all,

perhaps I am using princomp.formula and prcomp.formula in a way that
is not documented to work, but then the documentation just says:

        formula: a formula with no response variable.

Thus, to avoid a lot of typing, it would be nice if one could use '.'
and '-' in the formula, e.g.

> library(DAAG)
> res <- prcomp(~ . - case - site - Pop - sex, possum)
Error in prcomp.formula(~. - case - site - Pop - sex, possum) :
        PCA applies only to numerical variables
> res <- princomp(~ . - case - site - Pop - sex, possum)
Error in princomp.formula(~. - case - site - Pop - sex, possum) :
        PCA applies only to numerical variables

Unfortunately, as the examples above show, this is currently not
possible, since both functions test whether any term mentioned in the
formula is non numeric or a factor, instead of just testing those that
enter the analysis.

The attached patch should allow the use of '.' and '-', while still
producing an error when a factor or a non-numeric variable is
specified to enter the analysis:

> library(DAAG)
> res <- prcomp(~ . - case - site - Pop - sex, possum)
> res <- princomp(~ . - case - site - Pop - sex, possum)
> res <- prcomp(~ . - case - site - Pop, possum)
Error in prcomp.formula(~. - case - site - Pop, possum) :
        PCA applies only to numerical variables
> res <- princomp(~ . - case - site - Pop, possum)
Error in princomp.formula(~. - case - site - Pop, possum) :
        PCA applies only to numerical variables

On my machine, `make check FORCE=FORCE' succeeds with this patch and,
as far as I can tell, no modification of the help pages would be
necessary.  

Cheers,

        Berwin


Index: src/library/stats/R/princomp.R
===================================================================
--- src/library/stats/R/princomp.R (revision 37571)
+++ src/library/stats/R/princomp.R (working copy)
@@ -10,13 +10,14 @@
     mf$... <- NULL
     mf[[1]] <- as.name("model.frame")
     mf <- eval.parent(mf)
+    na.act <- attr(mf, "na.action")
+    mt <- attr(mf, "terms") # allow model.frame to update it
+    attr(mt, "intercept") <- 0
+    mterms <- attr(mt, "term.labels")
     ## this is not a `standard' model-fitting function,
     ## so no need to consider contrasts or levels
-    if(any(sapply(mf, function(x) is.factor(x) || !is.numeric(x))))
+    if(any(sapply(mterms, function(x) is.factor(mf[,x]) || !is.numeric(mf[,x]))))
         stop("PCA applies only to numerical variables")
-    na.act <- attr(mf, "na.action")
-    mt <- attr(mf, "terms") # allow model.frame to update it
-    attr(mt, "intercept") <- 0
     x <- model.matrix(mt, mf)
     res <- princomp.default(x, ...)
     ## fix up call to refer to the generic, but leave arg name as `formula'
Index: src/library/stats/R/prcomp.R
===================================================================
--- src/library/stats/R/prcomp.R (revision 37571)
+++ src/library/stats/R/prcomp.R (working copy)
@@ -37,13 +37,14 @@
     mf$... <- NULL
     mf[[1]] <- as.name("model.frame")
     mf <- eval.parent(mf)
+    na.act <- attr(mf, "na.action")
+    mt <- attr(mf, "terms")
+    attr(mt, "intercept") <- 0
+    mterms <- attr(mt, "term.labels")
     ## this is not a `standard' model-fitting function,
     ## so no need to consider contrasts or levels
-    if (any(sapply(mf, function(x) is.factor(x) || !is.numeric(x))))
+    if(any(sapply(mterms, function(x) is.factor(mf[,x]) || !is.numeric(mf[,x]))))
         stop("PCA applies only to numerical variables")
-    na.act <- attr(mf, "na.action")
-    mt <- attr(mf, "terms")
-    attr(mt, "intercept") <- 0
     x <- model.matrix(mt, mf)
     res <- prcomp.default(x, ...)
     ## fix up call to refer to the generic, but leave arg name as `formula'

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Suggest patch for princomp.formula and prcomp.formula

Brian Ripley
I would argue this is a bug in model.frame(), but it seems that it is
S-compatible.  That is, variables excluded by - appear in the model frame
even though they do not really appear in the simplified formula.  (I
suppose the rationale is that - need not always interpreted as deletion.)

The proposed fix regretably will not work, since one can do things like

library(MASS)
prcomp(~ dist + dist:climb, hills)

I'll see if I can dream up a general solution.  (One way forward is to
simplify the formula and call terms again, but simplifcation is rather
clumsy code.)


On Sat, 25 Mar 2006, Berwin A Turlach wrote:

> Dear all,
>
> perhaps I am using princomp.formula and prcomp.formula in a way that
> is not documented to work, but then the documentation just says:
>
>        formula: a formula with no response variable.
>
> Thus, to avoid a lot of typing, it would be nice if one could use '.'
> and '-' in the formula, e.g.
>
>> library(DAAG)
>> res <- prcomp(~ . - case - site - Pop - sex, possum)
> Error in prcomp.formula(~. - case - site - Pop - sex, possum) :
> PCA applies only to numerical variables
>> res <- princomp(~ . - case - site - Pop - sex, possum)
> Error in princomp.formula(~. - case - site - Pop - sex, possum) :
> PCA applies only to numerical variables
>
> Unfortunately, as the examples above show, this is currently not
> possible, since both functions test whether any term mentioned in the
> formula is non numeric or a factor, instead of just testing those that
> enter the analysis.
>
> The attached patch should allow the use of '.' and '-', while still
> producing an error when a factor or a non-numeric variable is
> specified to enter the analysis:
>
>> library(DAAG)
>> res <- prcomp(~ . - case - site - Pop - sex, possum)
>> res <- princomp(~ . - case - site - Pop - sex, possum)
>> res <- prcomp(~ . - case - site - Pop, possum)
> Error in prcomp.formula(~. - case - site - Pop, possum) :
> PCA applies only to numerical variables
>> res <- princomp(~ . - case - site - Pop, possum)
> Error in princomp.formula(~. - case - site - Pop, possum) :
> PCA applies only to numerical variables
>
> On my machine, `make check FORCE=FORCE' succeeds with this patch and,
> as far as I can tell, no modification of the help pages would be
> necessary.
>
> Cheers,
>
>        Berwin
>
>

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Suggest patch for princomp.formula and prcomp.formula

Berwin A Turlach
G'day Brian,

>>>>> "BDR" == Prof Brian Ripley <[hidden email]> writes:

    BDR> The proposed fix regretably will not work, since one can do
    BDR> things like

    BDR> library(MASS)
    BDR> prcomp(~ dist + dist:climb, hills)
Yes, I had the impression that this would work with the current
version but dismissed it as unintentional. ;-)

I wouldn't expect anyone to specify interaction terms in a principal
components analysis, but perhaps there are valid reasons to do so.

Attached is a slightly modified patch which still allows the above
example but also the changes that I proposed.  On my machine, R-devel
with this patch passes "make check FORCE=FORCE" and produces the
following output:

      > library(DAAG)
      > res <- prcomp(~ . - case - site - Pop - sex, possum)
      > res <- princomp(~ . - case - site - Pop - sex, possum)
      > res <- prcomp(~ . - case - site - Pop, possum)
      Error in prcomp.formula(~. - case - site - Pop, possum) :
      PCA applies only to numerical variables
      > res <- princomp(~ . - case - site - Pop, possum)
      Error in princomp.formula(~. - case - site - Pop, possum) :
      PCA applies only to numerical variables
      > library(MASS)
     
      Attaching package: 'MASS'
     
     
      The following object(s) are masked from package:DAAG :
     
      hills
     
      > prcomp(~ dist + dist:climb, hills)
      Standard deviations:
      [1] 29655.128279     3.101566
     
      Rotation:
                          PC1          PC2
      dist       -0.000154139 -0.999999988
      dist:climb -0.999999988  0.000154139

Cheers,

        Berwin


Index: src/library/stats/R/princomp.R
===================================================================
--- src/library/stats/R/princomp.R (revision 37574)
+++ src/library/stats/R/princomp.R (working copy)
@@ -10,13 +10,15 @@
     mf$... <- NULL
     mf[[1]] <- as.name("model.frame")
     mf <- eval.parent(mf)
+    na.act <- attr(mf, "na.action")
+    mt <- attr(mf, "terms") # allow model.frame to update it
+    attr(mt, "intercept") <- 0
+    mterms <- attr(mt, "factors")
+    mterms <- rownames(mterms)[apply(mterms, 1, any)]
     ## this is not a `standard' model-fitting function,
     ## so no need to consider contrasts or levels
-    if(any(sapply(mf, function(x) is.factor(x) || !is.numeric(x))))
+    if(any(sapply(mterms, function(x) is.factor(mf[,x]) || !is.numeric(mf[,x]))))
         stop("PCA applies only to numerical variables")
-    na.act <- attr(mf, "na.action")
-    mt <- attr(mf, "terms") # allow model.frame to update it
-    attr(mt, "intercept") <- 0
     x <- model.matrix(mt, mf)
     res <- princomp.default(x, ...)
     ## fix up call to refer to the generic, but leave arg name as `formula'
Index: src/library/stats/R/prcomp.R
===================================================================
--- src/library/stats/R/prcomp.R (revision 37574)
+++ src/library/stats/R/prcomp.R (working copy)
@@ -37,13 +37,15 @@
     mf$... <- NULL
     mf[[1]] <- as.name("model.frame")
     mf <- eval.parent(mf)
+    na.act <- attr(mf, "na.action")
+    mt <- attr(mf, "terms")
+    attr(mt, "intercept") <- 0
+    mterms <- attr(mt, "factors")
+    mterms <- rownames(mterms)[apply(mterms, 1, any)]
     ## this is not a `standard' model-fitting function,
     ## so no need to consider contrasts or levels
-    if (any(sapply(mf, function(x) is.factor(x) || !is.numeric(x))))
+    if (any(sapply(mterms, function(x) is.factor(mf[,x]) || !is.numeric(mf[,x]))))
         stop("PCA applies only to numerical variables")
-    na.act <- attr(mf, "na.action")
-    mt <- attr(mf, "terms")
-    attr(mt, "intercept") <- 0
     x <- model.matrix(mt, mf)
     res <- prcomp.default(x, ...)
     ## fix up call to refer to the generic, but leave arg name as `formula'

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel