formula(model.frame(..)) is misleading


R devel mailing list
When formula() is applied to the output of model.frame() it ignores the
formula in the model.frame's 'terms' attribute:

  > d <- data.frame(A=log(1:6), B=LETTERS[rep(1:2,c(2,4))], C=1/(1:6),
  +                 D=rep(letters[25:26],c(4,2)), Y=1:6)
  > m0 <- model.frame(data=d, Y ~ A:B)
  > formula(m0)
  Y ~ A + B
  > `attributes<-`(terms(m0), value=NULL)
  Y ~ A:B

This is in part because model.frame()'s output has class "data.frame"
instead of c("model.frame","data.frame"), as SV4 did, so there are no
methods for model.frames.
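
The point about the class and the 'terms' attribute can be checked with a short sketch. It reuses the data from the example above and reads the attribute directly with attr(); what formula(m0) itself prints may depend on the R version, so it is deliberately not shown here.

```r
# Same data as in the example above.
d <- data.frame(A = log(1:6), B = LETTERS[rep(1:2, c(2, 4))], C = 1/(1:6),
                D = rep(letters[25:26], c(4, 2)), Y = 1:6)
m0 <- model.frame(Y ~ A:B, data = d)

# The model frame is a plain data.frame, so S3 dispatch cannot tell it
# apart from any other data frame.
class(m0)

# The original formula is still available through the 'terms' attribute.
tt <- attr(m0, "terms")
f_from_terms <- formula(tt)
f_from_terms
```

Going through the 'terms' attribute recovers the interaction term A:B, which a formula built only from the column names cannot do.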

Is there a reason that model.frame() returns a data.frame with extra
attributes but no special class or is it just an oversight?

Bill Dunlap
TIBCO Software
wdunlap tibco.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: formula(model.frame(..)) is misleading

Martin Maechler
>>>>> William Dunlap via R-devel
>>>>>     on Thu, 20 Dec 2018 15:09:56 -0800 writes:

    > When formula() is applied to the output of model.frame()
    > it ignores the formula in the model.frame's 'terms'
    > attribute:

    >> d <- data.frame(A=log(1:6), B=LETTERS[rep(1:2,c(2,4))],
    >>                 C=1/(1:6), D=rep(letters[25:26],c(4,2)), Y=1:6)
    >> m0 <- model.frame(data=d, Y ~ A:B)
    >> formula(m0)
    >   Y ~ A + B
    >> `attributes<-`(terms(m0), value=NULL)
    >   Y ~ A:B

    > This is in part because model.frame()'s output has class
    > "data.frame" instead of c("model.frame","data.frame"), as
    > SV4 did, so there are no methods for model.frames.

    > Is there a reason that model.frame() returns a data.frame
    > with extra attributes but no special class or is it just
    > an oversight?

My guess is "oversight" || "well let's keep it simple"
Do you (all readers) see a situation where it could do harm now (with
the 20'000 packages on CRAN+Bioc+...) to do as SV4 (S version 4) has been doing?

I'd be sympathetic to class()ing it.
Martin

Re: formula(model.frame(..)) is misleading

Fox, John
In reply to this post by R devel mailing list
Dear Martin,

Since no one else has picked up on this, I’ll take a crack at it:

The proposal is to define the S3 class of model-frame objects as c(“model.frame”, “data.frame”) (not the formal class of these objects, even though this feature was coincidentally introduced in S4). That’s unlikely to do harm, since model frames would still “inherit” data.frame methods.

It's possible that some packages rely on current data.frame methods that are eventually superseded by specific model.frame methods or do something peculiar with the class of model frames, so as far as I can see, one can’t know whether problems will arise before trying it.

I hope that helps,
 John

  -------------------------------------------------
  John Fox, Professor Emeritus
  McMaster University
  Hamilton, Ontario, Canada
  Web: http://socserv.mcmaster.ca/jfox

Re: formula(model.frame(..)) is misleading

R devel mailing list
I don't have a copy of SV4 (or SV3, where model.frame was introduced), but
S+ 8.3 (based on SV4) puts the class "model.frame" on model.frame()'s
return value but has no methods (in the default packages) for class
"model.frame".  Perhaps that is why R omitted the class.

However, S+ 8.3's (and probably S's) formula.data.frame did look for a
"terms" attribute of a data.frame before making up an additive formula
based on the column names of a data.frame:

Splus-8.3> formula.data.frame
function(object)
{
        if(length(tms <- attr(object, "terms")))
                return(formula(tms))
        n <- names(object)
        f <- paste(n[-1.], collapse = "+")
        f <- parse(text = paste(n[1.], f, sep = "~"))[[1.]]
        oldClass(f) <- "formula"
        f
}



Bill Dunlap
TIBCO Software
wdunlap tibco.com


Re: formula(model.frame(..)) is misleading

Martin Maechler
In reply to this post by Fox, John
>>>>> Fox, John
>>>>>     on Fri, 21 Dec 2018 16:16:40 +0000 writes:

    > Dear Martin,

    > Since no one else has picked up on this, I’ll take a crack
    > at it:

Thank you, John

    > The proposal is to define the S3 class of model-frame
    > objects as c(“model.frame”, “data.frame”) (not the formal
    > class of these objects, even though this feature was
    > coincidentally introduced in S4). That’s unlikely to do
    > harm, since model frames would still “inherit” data.frame methods.

Well, sure, "in theory".
My fear is different -- and I think for good reasons :

IIRC, I've seen slides of a talk by a well respected R
community member where they advertized  -- IIRC, even as example
of good R programming --  to use

  switch(class(obj)[1],
         "foo" = { ....  some things .... },
         "bar" = { ....  other things .... },
         .....
         stop("invalid class ", class(obj)[1])
         )

and I have seen many package authors use analogously "broken" R
code

  if(class(obj) == "foo") { # deal with "foo" ...
    ....
  } else if(class(obj) == "bar") {
    ...
  } else .....

all of which will fail if users of that code (including other
package writers) decide to extend that S3 class  using a
length(class(.)) >= 2  ....
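
A minimal sketch of why such code is fragile, and of the inherits() alternative. The object and the helper describe() are made up for illustration; they are not taken from any package.

```r
# A data frame whose S3 class has been extended, as proposed for model frames.
obj <- data.frame(x = 1:3)
class(obj) <- c("model.frame", "data.frame")

# Comparing the whole class vector yields one logical per class entry,
# which is not a valid single condition for if().
cmp <- class(obj) == "data.frame"
cmp

# Looking only at the first entry misses the inherited class.
first_only <- class(obj)[1] == "data.frame"
first_only

# inherits() checks the whole class vector and returns a single logical.
describe <- function(obj) {
  if (inherits(obj, "model.frame")) {
    "model frame"
  } else if (inherits(obj, "data.frame")) {
    "data frame"
  } else {
    stop("invalid class ", class(obj)[1])
  }
}
describe(obj)
describe(data.frame(y = 1))
```

Code written with inherits() keeps working when a class is extended, whereas both patterns above change behaviour as soon as class(.) has length 2 or more.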

Now, with Bill Dunlap's findings about S-plus 8.3, namely that
it does not contain a *single*  model.frame method,
I'd rather tend to only  fix formula.data.frame()

Martin



Re: formula(model.frame(..)) is misleading

Martin Maechler
In reply to this post by R devel mailing list
>>>>> William Dunlap via R-devel
>>>>>     on Fri, 21 Dec 2018 13:34:16 -0800 writes:

    > I don't have a copy of SV4 (or SV3, where model.frame was
    > introduced), but S+ 8.3 (based on SV4) puts the class
    > "model.frame" on model.frame()'s return value but has no
    > methods (in the default packages) for class "model.frame".
    > Perhaps that is why R omitted the class.

aahh.. that's very relevant,  thank you very much, Bill, for digging!

    > However, S+ 8.3's (and probably S's) formula.data.frame
    > did look for a "terms" attribute of a data.frame before
    > making up an additive formula based on the column names of
    > a data.frame:

    > Splus-8.3> formula.data.frame
    > function(object)
    > {
    >         if(length(tms <- attr(object, "terms")))
    >                 return(formula(tms))
    >         n <- names(object)
    >         f <- paste(n[-1.], collapse = "+")
    >         f <- parse(text = paste(n[1.], f, sep = "~"))[[1.]]
    >         oldClass(f) <- "formula"
    >         f
    > }

There's quite a bit of code looking for  attr(*, "terms")
anyway in our code base, so indeed, this would be internally
consistent with the existing code base and hence probably the
best way to solve the original problem.
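
For illustration only, here is a rough R analogue of the S+ function quoted above. The name formula_df_sketch is hypothetical and this is not the code that was committed to R; it only shows the "look for a 'terms' attribute first" idea, and it assumes syntactically valid column names.

```r
# Hypothetical helper: prefer the stored 'terms', else build an additive
# formula with the first column as response (as the S+ version does).
formula_df_sketch <- function(x) {
  tms <- attr(x, "terms")
  if (!is.null(tms)) {
    return(formula(tms))
  }
  nm <- names(x)
  stats::reformulate(nm[-1], response = nm[1])
}

d <- data.frame(A = log(1:6), B = LETTERS[rep(1:2, c(2, 4))], C = 1/(1:6),
                D = rep(letters[25:26], c(4, 2)), Y = 1:6)
m0 <- model.frame(Y ~ A:B, data = d)

f_frame <- formula_df_sketch(m0)  # uses the 'terms' attribute of the model frame
f_plain <- formula_df_sketch(d)   # no 'terms' attribute: falls back to column names
f_frame
f_plain
```

With this approach no new class is needed: an ordinary data frame still gets the additive formula, and a model frame returns the formula it was built from.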

I'll look into committing this to R-devel soonish.

Martin

