delete.response leaves response in attribute dataClasses

PaulJohnson32gmail
I posted this one as an R bug
(https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14767), but
Prof. Ripley says I'm premature, and I should raise the question here.

Here's the behavior I assert is a bug:
The output from delete.response on a terms object alters the formula
by removing the dependent variable. It removes the response from the
"variables" attribute and it changes the response attribute from 1 to
0.  The response is also removed from "predvars".

But it leaves the name of the dependent variable first in
"dataClasses".  It caused an unexpected behavior in my code, so (as
usual) the bug may be mine, but in my heart, I believe it belongs to
delete.response.

To illustrate, here's a terms object from a regression.

> tt
y ~ x1 * x2 + x3 + x4
attr(,"variables")
list(y, x1, x2, x3, x4)
attr(,"factors")
   x1 x2 x3 x4 x1:x2
y   0  0  0  0     0
x1  1  0  0  0     1
x2  0  1  0  0     1
x3  0  0  1  0     0
x4  0  0  0  1     0
attr(,"term.labels")
[1] "x1"    "x2"    "x3"    "x4"    "x1:x2"
attr(,"order")
[1] 1 1 1 1 2
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: R_GlobalEnv>
attr(,"predvars")
list(y, x1, x2, x3, x4)
attr(,"dataClasses")
        y        x1        x2        x3        x4
"numeric" "numeric" "numeric" "numeric" "numeric"

Now observe that delete.response removes the response from all
attributes except dataClasses.

> delete.response(tt)
~x1 * x2 + x3 + x4
attr(,"variables")
list(x1, x2, x3, x4)
attr(,"factors")
   x1 x2 x3 x4 x1:x2
x1  1  0  0  0     1
x2  0  1  0  0     1
x3  0  0  1  0     0
x4  0  0  0  1     0
attr(,"term.labels")
[1] "x1"    "x2"    "x3"    "x4"    "x1:x2"
attr(,"order")
[1] 1 1 1 1 2
attr(,"intercept")
[1] 1
attr(,"response")
[1] 0
attr(,".Environment")
<environment: R_GlobalEnv>
attr(,"predvars")
list(x1, x2, x3, x4)
attr(,"dataClasses")
        y        x1        x2        x3        x4
"numeric" "numeric" "numeric" "numeric" "numeric"
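A minimal sketch for reproducing objects like the ones above, assuming simulated numeric data (the data frame `dat` and the seed are illustrative; the post does not show how tt was built). Whether "y" still shows up in the last line may depend on the R version in use, so compare it with the names on the original terms object rather than relying on a fixed answer.

```r
# Arbitrary simulated data: only the column names and types matter here.
set.seed(1)
dat <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20),
                  x3 = rnorm(20), x4 = rnorm(20))
fit <- lm(y ~ x1 * x2 + x3 + x4, data = dat)

tt  <- terms(fit)           # terms object stored in the fit (carries dataClasses)
tt0 <- delete.response(tt)  # the same terms with the response removed

attr(tt0, "response")                     # 0 after deletion
as.character(attr(tt0, "variables"))[-1]  # "x1" "x2" "x3" "x4"
names(attr(tt, "dataClasses"))            # "y" "x1" "x2" "x3" "x4"
names(attr(tt0, "dataClasses"))           # check whether "y" is still listed
```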


pj

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: delete.response leaves response in attribute dataClasses

William Dunlap
I had noticed the same thing but figured that most
people (writers of predict methods) would be looking
up entries in dataClasses by name and not by position,
since predict's newdata argument need not have entries
in the same order as the data used to fit the model.
Hence the extra entry would not be noticed (nor would it be
missed if it were omitted).
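A small illustration of the distinction drawn above, using hand-written dataClasses vectors (hypothetical values, not taken from any fitted model): a name-based lookup gives the same answer whether or not the response entry is present, while a positional lookup does not.

```r
# Hand-written dataClasses vectors, with and without the response entry.
with_y    <- c(y = "numeric", x1 = "factor", x2 = "numeric")
without_y <- with_y[-1]

# Lookup by name: same result whether or not "y" is present.
same_by_name <- identical(with_y["x1"], without_y["x1"])  # TRUE

# Lookup by position: the same index now refers to a different variable.
with_y[2]     # x1 = "factor"
without_y[2]  # x2 = "numeric"
```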

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of Paul Johnson
> Sent: Thursday, January 05, 2012 12:27 PM
> To: R Devel List
> Subject: [Rd] delete.response leaves response in attribute dataClasses
>
> I posted this one as an R bug
> (https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14767), but
> Prof. Ripley says I'm premature, and I should raise the question here.
>
> Here's the behavior I assert is a bug:
> The output from delete.response on a terms object alters the formula
> by removing the dependent variable. It removes the response from the
> "variables" attribute and it changes the response attribute from 1 to
> 0.  The response is removed from "predvars"
>
> But it leaves the name of the dependent variable first in the in
> "dataClasses".  It caused an unexpected behavior in my code, so (as
> usual) the bug may be mine, but in my heart, I believe it belongs to
> delete.response.
>
> To illustrate, here's a terms object from a regression.
>
> > tt
> y ~ x1 * x2 + x3 + x4
> attr(,"variables")
> list(y, x1, x2, x3, x4)
> attr(,"factors")
>    x1 x2 x3 x4 x1:x2
> y   0  0  0  0     0
> x1  1  0  0  0     1
> x2  0  1  0  0     1
> x3  0  0  1  0     0
> x4  0  0  0  1     0
> attr(,"term.labels")
> [1] "x1"    "x2"    "x3"    "x4"    "x1:x2"
> attr(,"order")
> [1] 1 1 1 1 2
> attr(,"intercept")
> [1] 1
> attr(,"response")
> [1] 1
> attr(,".Environment")
> <environment: R_GlobalEnv>
> attr(,"predvars")
> list(y, x1, x2, x3, x4)
> attr(,"dataClasses")
>         y        x1        x2        x3        x4
> "numeric" "numeric" "numeric" "numeric" "numeric"
>
> Now observe that delete.response removes the response from all
> attributes except dataClasses.
>
> > delete.response(tt)
> ~x1 * x2 + x3 + x4
> attr(,"variables")
> list(x1, x2, x3, x4)
> attr(,"factors")
>    x1 x2 x3 x4 x1:x2
> x1  1  0  0  0     1
> x2  0  1  0  0     1
> x3  0  0  1  0     0
> x4  0  0  0  1     0
> attr(,"term.labels")
> [1] "x1"    "x2"    "x3"    "x4"    "x1:x2"
> attr(,"order")
> [1] 1 1 1 1 2
> attr(,"intercept")
> [1] 1
> attr(,"response")
> [1] 0
> attr(,".Environment")
> <environment: R_GlobalEnv>
> attr(,"predvars")
> list(x1, x2, x3, x4)
> attr(,"dataClasses")
>         y        x1        x2        x3        x4
> "numeric" "numeric" "numeric" "numeric" "numeric"
>
>
> pj
>
> --
> Paul E. Johnson
> Professor, Political Science
> 1541 Lilac Lane, Room 504
> University of Kansas
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


Re: delete.response leaves response in attribute dataClasses

William Dunlap
My feeling that everyone would index dataClasses by name was
wrong.  I looked through the packages that used dataClasses
and saw code that would break if the first (response) entry
were omitted.  (I didn't check to see if passing the output
of delete.response to these functions would be appropriate.)
E.g.,
file: AICcmodavg/R/predictSE.mer.r
  ##matrix with info on factors
  fact.frame <- attr(attr(orig.frame, "terms"), "dataClasses")[-1]

  ##continue if factors
  if(any(fact.frame == "factor")) {
    id.factors <- which(fact.frame == "factor")
    fact.name <- names(fact.frame)[id.factors] #identify the rows for factors
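A sketch of how the positional idiom in that excerpt would behave if the response entry were absent, using hypothetical hand-written vectors rather than the package's actual objects. The helper drop_by_name() is an illustrative name, not a function from AICcmodavg or stats, and its default response name "y" is an assumption.

```r
full   <- c(y = "numeric", x1 = "factor", x2 = "numeric")  # response present
noresp <- full[-1]                                          # response already removed

# Positional idiom from the excerpt: assumes element 1 is the response.
names(full[-1])[full[-1] == "factor"]      # "x1": the factor is found
names(noresp[-1])[noresp[-1] == "factor"]  # character(0): x1 was dropped instead

# Hypothetical name-based alternative: drop the response only if present.
drop_by_name <- function(dc, resp = "y") dc[names(dc) != resp]
names(drop_by_name(noresp))[drop_by_name(noresp) == "factor"]  # "x1"
```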

Some packages create a dataClasses attribute for a model.frame
(not its terms attribute) that does not have any names:
file: caper/R/macrocaic.R
   attr(mf, "dataClasses") <- rep("numeric", dim(termFactors)[2])
.checkMFClasses() does not throw an error for that, but it
doesn't do any real checking either.

Most users of dataClasses do pass it to .checkMFClasses() to
compare it with newdata, and that function doesn't care if there
are extra entries in dataClasses.
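A quick check of the two .checkMFClasses() behaviors described above, assuming the current stats implementation (the vectors and newdata are made up for illustration): entries whose names do not appear in newdata, including every entry of an unnamed vector like the caper one, are simply skipped, while a mismatch on a named entry raises an error.

```r
# .checkMFClasses() is exported from stats; newdata here is made up.
newdata <- data.frame(x1 = c(1.5, 2, 3))

# Extra names (such as a leftover response "y") are ignored:
# only entries whose names also appear in newdata are compared.
ok_extra <- .checkMFClasses(c(y = "numeric", x1 = "numeric"), newdata)

# An unnamed vector, as in the caper excerpt, matches no columns,
# so no comparison is made and no error is raised.
ok_unnamed <- .checkMFClasses(c("factor"), newdata)

# A genuine mismatch on a named entry is reported as an error.
mismatch <- tryCatch(.checkMFClasses(c(x1 = "factor"), newdata),
                     error = function(e) conditionMessage(e))
mismatch
```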

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of William Dunlap
> Sent: Thursday, January 05, 2012 12:57 PM
> To: Paul Johnson; R Devel List
> Subject: Re: [Rd] delete.response leaves response in attribute dataClasses

Re: delete.response leaves response in attribute dataClasses

PaulJohnson32gmail
Thanks, Bill

Counter-arguments at the end

On Thu, Jan 5, 2012 at 3:15 PM, William Dunlap <[hidden email]> wrote:

I can't understand what your point is.  I agree we can work around the
problem, but why should we have to?

If you confine yourself to the output of "delete.response" applied to
a terms object from a regression, can you point to any package or
usage that depends on leaving the response variable in the dataClasses
attribute?  I can't find one.  In base R, these are all the references
to delete.response:

stats/R/models.R:delete.response <- function (termobj)
stats/R/lm.R:        Terms <- delete.response(tt)
stats/R/lm.R:        Terms <- delete.response(tt)
stats/R/ppr.R:        Terms <- delete.response(object$terms)
stats/R/loess.R: as.matrix(model.frame(delete.response(terms(object)), newdata,
stats/R/dummy.coef.R:    Terms <- delete.response(Terms)

I've looked it over carefully and predict.lm (in lm.R) would not be
affected by the change I propose. I can't find any usage in loess.R of
the dataClasses attribute.
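One way to confirm that predict.lm() tolerates the leftover entry, assuming simulated data (the seed and values are arbitrary): newdata that lacks the response column still predicts, because the class check compares dataClasses with newdata by name.

```r
# Arbitrary simulated data with the same formula as in the thread.
set.seed(2)
dat <- data.frame(y = rnorm(15), x1 = rnorm(15), x2 = rnorm(15),
                  x3 = rnorm(15), x4 = rnorm(15))
fit <- lm(y ~ x1 * x2 + x3 + x4, data = dat)

# newdata deliberately omits the response column y.
newdata <- dat[1:5, c("x1", "x2", "x3", "x4")]

# predict.lm() calls delete.response() internally and passes the
# dataClasses attribute to .checkMFClasses(), which matches by name.
pred <- predict(fit, newdata = newdata)

# On the training rows, predictions should equal the fitted values.
all.equal(unname(pred), unname(fitted(fit)[1:5]))
```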

Furthermore, I can't see how a person would use the dataClasses
attribute at all, after the other markers of the response are
eliminated. How is a method to find which variable is the response,
after response=0?

I'm not disagreeing with you that I can work around the peculiarity
that the response is left in the dataClasses attribute of the output
object from delete.response.  I'm just saying it is a complication
that programmers should not have to put up with, because I think
delete.response should delete the response from all attributes of a
terms object.
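A minimal sketch of the proposed behavior as a user-level wrapper, assuming the response can be matched by its deparsed name, which is roughly how model.frame() labels ordinary variables. delete_response_all() is a hypothetical name; it is not part of stats, and unusual response expressions may need more care than this.

```r
# Hypothetical helper, not part of stats: behaves like delete.response()
# and additionally removes the response's entry from "dataClasses".
delete_response_all <- function(termobj) {
  resp <- attr(termobj, "response")
  dc   <- attr(termobj, "dataClasses")
  out  <- delete.response(termobj)
  if (!is.null(resp) && resp > 0 && !is.null(dc)) {
    # "variables" is a call list(y, x1, ...); element resp + 1 is the response.
    resp_name <- paste(deparse(attr(termobj, "variables")[[resp + 1L]],
                               width.cutoff = 500L), collapse = " ")
    attr(out, "dataClasses") <- dc[names(dc) != resp_name]
  }
  out
}

# Usage with simulated data (values are arbitrary).
set.seed(3)
dat <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))
tt  <- terms(lm(y ~ x1 + x2, data = dat))
tt0 <- delete_response_all(tt)
names(attr(tt0, "dataClasses"))  # "x1" "x2"
```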

pj


--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

Re: delete.response leaves response in attribute dataClasses

William Dunlap
> -----Original Message-----
> From: Paul Johnson [mailto:[hidden email]]
> Sent: Friday, January 06, 2012 11:17 AM
> To: William Dunlap
> Cc: R Devel List
> Subject: Re: [Rd] delete.response leaves response in attribute dataClasses
>
> I can't understand what your point is.  I agree we can work around the
> problem, but why should we have to?

I guess my point was that it would make sense for delete.response
to drop the response element from dataClasses, as it has no use.
It was almost certainly an oversight that it wasn't dropped, as most
terms objects don't have the dataClasses attribute.

Properly written code, which only subscripts dataClasses by name
(not by number), would not be affected by the change, but improperly
written code (e.g., AICcmodavg's predictSE, which assumes the response
is in position 1) would be adversely affected in the unlikely case that
someone passed it the output of delete.response.

I don't know how much you want to cater to "errors" by package writers.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



