Documentation of model.frame() and get_all_vars()

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Documentation of model.frame() and get_all_vars()

Thomas J. Leeper
Hi everyone,

This is about documentation for the model.frame() page. The
get_all_vars() function (added in R 2.5.0) is a great addition, but
the behavior of its '...' argument is different from that of
model.frame() with which it is documented and this creates ambiguity.
The current docs read:

\item{\dots}{further arguments such as \code{data}, \code{na.action},
\code{subset}. Any additional arguments such as \code{offset} and
\code{weights} which reach the default method are used to create
further columns in the model frame, with parenthesised names such as
\code{"(offset)"}.}

This is only true for model.frame() methods but not get_all_vars().
For get_all_vars(), arguments passed to '...' are only ever treated as
variables to add to the data frame. See for example:

> str(model.frame(mpg ~ wt, data = mtcars, subset = am == 1), give.attr = FALSE)
'data.frame':   13 obs. of  2 variables:
 $ mpg: num  21 21 22.8 32.4 30.4 33.9 27.3 26 30.4 15.8 ...
 $ wt : num  2.62 2.88 2.32 2.2 1.61 ...

> str(get_all_vars(mpg ~ wt, data = mtcars, subset = am == 1), give.attr = FALSE)
'data.frame':   32 obs. of  3 variables:
 $ mpg   : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ wt    : num  2.62 2.88 2.32 3.21 3.44 ...
 $ subset: logi  TRUE TRUE TRUE FALSE FALSE FALSE ...

The behavior of '...' thus differs in the two functions. This is (I
think) a quite problematic ambiguity and one that might best be
resolved by adding data, na.action, and subset as formal arguments to
all current model.frame() methods and the generic to resolve the
ambiguity of '...' (with docs updated accordingly), but that would
require a more thorough patch and testing.

In lieu of that, a simple documentation change could at least describe
the current behavior more accurately:

\item{\dots}{for \code{get_all_vars}, further named columns to include
in the model frame. For \code{model.frame} methods, a mix of further
arguments such as \code{data}, \code{na.action}, \code{subset} to pass
to the default method. Any additional arguments (such as \code{offset}
and \code{weights} or other named arguments) which reach the default
method are used to create further columns in the model frame, with
parenthesised names such as \code{"(offset)"}.}

That at least describes what is currently happening.

Relatedly, it may be worth noting that additional variables passed via
'...' to get_all_vars() are subject to vector recycling whereas those
passed to model.frame.default() are not:

> str(get_all_vars(mpg ~ wt, data = mtcars, new = 2), give.attr = FALSE)
'data.frame':   32 obs. of  3 variables:
 $ mpg: num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ wt : num  2.62 2.88 2.32 3.21 3.44 ...
 $ new: num  2 2 2 2 2 2 2 2 2 2 ...

> str(get_all_vars(mpg ~ wt, data = mtcars, new = rep(2, nrow(mtcars))), give.attr = FALSE)
'data.frame':   32 obs. of  3 variables:
 $ mpg: num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ wt : num  2.62 2.88 2.32 3.21 3.44 ...
 $ new: num  2 2 2 2 2 2 2 2 2 2 ...

> str(model.frame.default(mpg ~ wt, data = mtcars, new = rep(2, nrow(mtcars))), give.attr = FALSE)
'data.frame':   32 obs. of  3 variables:
 $ mpg  : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ wt   : num  2.62 2.88 2.32 3.21 3.44 ...
 $ (new): num  2 2 2 2 2 2 2 2 2 2 ...

> str(model.frame.default(mpg ~ wt, data = mtcars, new = 2), give.attr = FALSE)
Error in model.frame.default(mpg ~ wt, data = mtcars, new = 2) :
  variable lengths differ (found for '(new)')

But, maybe that's something for the "Details" section? (Or it's a bug
- I don't really know.)

Thanks in advance for your consideration.

Best,
-Thomas

Thomas J. Leeper
http://www.thomasleeper.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Documentation of model.frame() and get_all_vars()

Martin Maechler
>>>>> Thomas J Leeper <[hidden email]>
>>>>>     on Sun, 26 Mar 2017 17:48:17 +0100 writes:

    > Hi everyone,
    > This is about documentation for the model.frame() page. The
    > get_all_vars() function (added in R 2.5.0) is a great addition, but
    > the behavior of its '...' argument is different from that of
    > model.frame() with which it is documented and this creates ambiguity.
    > The current docs read:

    > \item{\dots}{further arguments such as \code{data}, \code{na.action},
    > \code{subset}. Any additional arguments such as \code{offset} and
    > \code{weights} which reach the default method are used to create
    > further columns in the model frame, with parenthesised names such as
    > \code{"(offset)"}.}

    > This is only true for model.frame() methods but not get_all_vars().
    > For get_all_vars(), arguments passed to '...' are only ever treated as
    > variables to add to the data frame.

as the documentation of get_all_vars() actually does say.

But you are right:  The description of '...' should mention its
different meaning for get_all_vars().

    > See for example:
    >> str(model.frame(mpg ~ wt, data = mtcars, subset = am == 1), give.attr = FALSE)
    > 'data.frame':   13 obs. of  2 variables:
    > $ mpg: num  21 21 22.8 32.4 30.4 33.9 27.3 26 30.4 15.8 ...
    > $ wt : num  2.62 2.88 2.32 2.2 1.61 ...

    >> str(get_all_vars(mpg ~ wt, data = mtcars, subset = am == 1), give.attr = FALSE)
    > 'data.frame':   32 obs. of  3 variables:
    > $ mpg   : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
    > $ wt    : num  2.62 2.88 2.32 3.21 3.44 ...
    > $ subset: logi  TRUE TRUE TRUE FALSE FALSE FALSE ...

    > The behavior of '...' thus differs in the two functions. This is (I
    > think) a quite problematic ambiguity and one that might best be
    > resolved by adding data, na.action, and subset as formal arguments to
    > all current model.frame() methods and the generic to resolve the
    > ambiguity of '...' (with docs updated accordingly), but that would
    > require a more thorough patch and testing.

Hmm, I agree with the very last part, however I don't understand
why you claim this to be an important ambiguity to be resolved:

get_all_vars() is _not_ called by any of our model.frame()
methods, nor the reverse:  it does not call model.frame().

I did not design or implement it but I think it does make sense
to be documented on the same page as model.frame() because the
functions have very similar semantics.
Adding the extra arguments to the generic and all methods does
not seem desirable to me.

    > In lieu of that, a simple documentation change could at least describe
    > the current behavior more accurately:

    > \item{\dots}{for \code{get_all_vars}, further named columns to include
    > in the model frame. For \code{model.frame} methods, a mix of further
    > arguments such as \code{data}, \code{na.action}, \code{subset} to pass
    > to the default method. Any additional arguments (such as \code{offset}
    > and \code{weights} or other named arguments) which reach the default
    > method are used to create further columns in the model frame, with
    > parenthesised names such as \code{"(offset)"}.}

    > That at least describes what is currently happening.

I agree... and plan to commit this improved description (to
R-devel and R 3.4.0 alpha).


    > Relatedly, it may be worth noting that additional variables passed via
    > '...' to get_all_vars() are subject to vector recycling whereas those
    > passed to model.frame.default() are not:

I'll also mention the vector recycling for get_all_vars().

    >> str(get_all_vars(mpg ~ wt, data = mtcars, new = 2), give.attr = FALSE)
    > 'data.frame':   32 obs. of  3 variables:
    > $ mpg: num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
    > $ wt : num  2.62 2.88 2.32 3.21 3.44 ...
    > $ new: num  2 2 2 2 2 2 2 2 2 2 ...

    >> str(get_all_vars(mpg ~ wt, data = mtcars, new = rep(2, nrow(mtcars))), give.attr = FALSE)
    > 'data.frame':   32 obs. of  3 variables:
    > $ mpg: num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
    > $ wt : num  2.62 2.88 2.32 3.21 3.44 ...
    > $ new: num  2 2 2 2 2 2 2 2 2 2 ...

    >> str(model.frame.default(mpg ~ wt, data = mtcars, new = rep(2, nrow(mtcars))), give.attr = FALSE)
    > 'data.frame':   32 obs. of  3 variables:
    > $ mpg  : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
    > $ wt   : num  2.62 2.88 2.32 3.21 3.44 ...
    > $ (new): num  2 2 2 2 2 2 2 2 2 2 ...

    >> str(model.frame.default(mpg ~ wt, data = mtcars, new = 2), give.attr = FALSE)
    > Error in model.frame.default(mpg ~ wt, data = mtcars, new = 2) :
    > variable lengths differ (found for '(new)')

    > But, maybe that's something for the "Details" section? (Or it's a bug
    > - I don't really know.)

I would not want to change model.frame.default() currently as it's
too important a building block and it may be wise to require
that its callers should have done recycling.

    > Thanks in advance for your consideration.

Thank you Thomas for the suggested help file improvements!
Martin

--
Martin Maechler
ETH Zurich

    > Best,
    > -Thomas

    > Thomas J. Leeper
    > http://www.thomasleeper.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel