Inconsistency in as.data.frame.table for stringsAsFactors

macrakis
I noticed that in as.data.frame.table, the stringsAsFactors argument
defaults to TRUE, whereas in the other as.data.frame methods, it defaults to
default.stringsAsFactors().
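The disparity can be checked directly from the method signatures. The sketch below is an illustration added here, not part of the original post. It reads each method's formal default with formals(). The set of classes is an arbitrary choice, and the printed values differ between R versions, because the other methods' defaults have changed since this thread.

```r
# Report the default of `stringsAsFactors` in a few as.data.frame methods.
# The class list is illustrative; printed values depend on the R version.
saf_default <- function(cls) {
  f <- formals(utils::getS3method("as.data.frame", cls))
  if ("stringsAsFactors" %in% names(f)) {
    deparse(f[["stringsAsFactors"]])  # e.g. "TRUE" or a call, as text
  } else {
    NA_character_                     # method has no such argument
  }
}

classes <- c("table", "character", "list", "matrix")
defaults <- vapply(classes, saf_default, character(1))
print(defaults)
```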

The documentation and implementation agree on this, so this is not a bug.

However, I was wondering if this disparity was intended or if it might be
some sort of unintentional oversight.  If it is intentional, I wonder what
the rationale is.

Thanks,

            -s

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Inconsistency in as.data.frame.table for stringsAsFactors

Martin Maechler
>>>>> "SM" == Stavros Macrakis <[hidden email]>
>>>>>     on Thu, 21 Jan 2010 20:19:28 -0500 writes:

    SM> I noticed that in as.data.frame.table, the stringsAsFactors argument
    SM> defaults to TRUE, whereas in the other as.data.frame methods, it defaults to
    SM> default.stringsAsFactors().

    SM> The documentation and implementation agree on this, so this is not a bug.

    SM> However, I was wondering if this disparity was intended or if it might be
    SM> some sort of unintentional oversight.  If it is intentional, I wonder what
    SM> the rationale is.

Some of us (including me) have strongly argued on several
occasions that  global options() settings should *not* have an effect
on anything "computing" but just on "output"
i.e. printing/graphing of R results.
As it is currently, potentially R scripts and R functions may
only work correctly for one setting of  
     options( stringsAsFactors = * )
which is against all principles of functional programming.

From this (my) point of view, we should strive to eventually deprecate
default.stringsAsFactors() which basically returns getOption("stringsAsFactors"),
or as first/2nd step redefine it as

 default.stringsAsFactors <- function() TRUE
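Here is a minimal sketch of the distinction drawn above. It assumes a made-up option name, demo.asFactors, which is not a real R option. The first function's result depends on session state. The second has a constant default, in the spirit of the proposed default.stringsAsFactors <- function() TRUE.

```r
# "demo.asFactors" is a hypothetical option name used only for illustration.
to_df_global <- function(x) {
  # Column type depends on whatever the session's option happens to be.
  data.frame(x = x, stringsAsFactors = getOption("demo.asFactors", TRUE))
}
to_df_fixed <- function(x, stringsAsFactors = TRUE) {
  # Column type depends only on the arguments (a constant default).
  data.frame(x = x, stringsAsFactors = stringsAsFactors)
}

v <- c("b", "a")
old <- options(demo.asFactors = FALSE)    # some other code changed the option
global_class <- class(to_df_global(v)$x)  # "character" while the option is set
fixed_class  <- class(to_df_fixed(v)$x)   # "factor" regardless of the option
options(old)                              # restore the previous setting
```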

Martin Mächler.

    SM> Thanks,
    SM> -s

Re: [Rd] Inconsistency in as.data.frame.table for stringsAsFactors

hadley wickham
On Fri, Jan 22, 2010 at 2:17 AM, Martin Maechler
<[hidden email]> wrote:

>>>>>> "SM" == Stavros Macrakis <[hidden email]>
>>>>>>     on Thu, 21 Jan 2010 20:19:28 -0500 writes:
>
>    SM> I noticed that in as.data.frame.table, the stringsAsFactors argument
>    SM> defaults to TRUE, whereas in the other as.data.frame methods, it defaults to
>    SM> default.stringsAsFactors().
>
>    SM> The documentation and implementation agree on this, so this is not a bug.
>
>    SM> However, I was wondering if this disparity was intended or if it might be
>    SM> some sort of unintentional oversight.  If it is intentional, I wonder what
>    SM> the rationale is.
>
> Some of us (including me) have strongly argued on several
> occasions that  global options() settings should *not* have an effect
> on anything "computing" but just on "output"
> i.e. printing/graphing of R results.
> As it is currently, potentially R scripts and R functions may
> only work correctly for one setting of
>     options( stringsAsFactors = * )
> which is against all principles of functional programming.

A similar argument would also seem to apply to defaultPackages,
deparse.max.lines, download.file.method, encoding, expressions, warn
and na.action.  There are plenty of functions in R that violate other
principles of functional programming, so, by itself, this argument
seems a little weak to me.  There are obviously differences of opinion
about this issue in R core, and it's unfortunate that the user has to
be exposed to them through inconsistent function definitions.
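The sketch below illustrates the na.action case from the list above. It fits the same lm() model under two settings of options(na.action), using data invented for the example. The call is identical in both cases, but the number of residuals returned changes.

```r
d <- data.frame(x = 1:5, y = c(2.1, 3.9, NA, 8.2, 9.8))

# Fit the same model under a temporarily changed global na.action.
n_resid <- function(action) {
  old <- options(na.action = action)
  on.exit(options(old))          # restore the caller's setting
  fit <- lm(y ~ x, data = d)     # lm() falls back to getOption("na.action")
  length(residuals(fit))
}

n_resid("na.omit")     # 4: the NA row is dropped
n_resid("na.exclude")  # 5: residuals are padded with NA
```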

Hadley

--
http://had.co.nz/

Re: [Rd] Inconsistency in as.data.frame.table for stringsAsFactors

hadley wickham
>> Some of us (including me) have strongly argued on several
>> occasions that  global options() settings should *not* have an effect
>> on anything "computing" but just on "output"
>> i.e. printing/graphing of R results.
>> As it is currently, potentially R scripts and R functions may
>> only work correctly for one setting of
>>     options( stringsAsFactors = * )
>> which is against all principles of functional programming.
>
> A similar argument would also seem to apply to defaultPackages,
> deparse.max.lines, download.file.method, encoding, expressions, warn
> and na.action.  There are plenty of functions in R that violate other
> principles of functional programming, so, by itself, this argument
> seems a little weak to me.  There are obviously differences of opinion
> about this issue in R core, and it's unfortunate that the user has to
> be exposed to them through inconsistent function definitions.

Which isn't to say I don't think that you're right - I would hate for
R to head in the direction of PHP where every script has to check
three or four different global variables in order to work on all
installations.

Hadley

--
http://had.co.nz/

Re: [Rd] Inconsistency in as.data.frame.table for stringsAsFactors

macrakis
In reply to this post by Martin Maechler
Martin,

I agree that global options settings that affect computations are
problematic.

But that's not the issue I was addressing.  If for some classes, func.CLASS
has certain defaults for some arguments, it is surprising that for other
classes, it has different defaults, whether these defaults are fixed or
taken from global settings -- when there is no obvious reason for the
default to vary by class.
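Until the defaults agree, a caller can avoid depending on whichever default a given method uses by passing stringsAsFactors explicitly. Here is a small sketch; the input vectors are arbitrary examples.

```r
# Passing stringsAsFactors explicitly gives the same column type whether
# as.data.frame() dispatches to the table method or the character method.
chr <- c("x", "y", "x")
tab <- table(chr)

df_tab <- as.data.frame(tab, stringsAsFactors = FALSE)
df_chr <- as.data.frame(chr, stringsAsFactors = FALSE)

class(df_tab[[1]])  # "character"
class(df_chr[[1]])  # "character"
```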

          -s



On Fri, Jan 22, 2010 at 3:17 AM, Martin Maechler <[hidden email]> wrote:

> >>>>> "SM" == Stavros Macrakis <[hidden email]>
> >>>>>     on Thu, 21 Jan 2010 20:19:28 -0500 writes:
>
>    SM> I noticed that in as.data.frame.table, the stringsAsFactors argument
>    SM> defaults to TRUE, whereas in the other as.data.frame methods, it
>    SM> defaults to default.stringsAsFactors().
>
>    SM> The documentation and implementation agree on this, so this is not a
>    SM> bug.
>
>    SM> However, I was wondering if this disparity was intended or if it
>    SM> might be some sort of unintentional oversight.  If it is intentional,
>    SM> I wonder what the rationale is.
>
> Some of us (including me) have strongly argued on several
> occasions that  global options() settings should *not* have an effect
> on anything "computing" but just on "output"
> i.e. printing/graphing of R results.
> As it is currently, potentially R scripts and R functions may
> only work correctly for one setting of
>     options( stringsAsFactors = * )
> which is against all principles of functional programming.
>
> From this (my) point of view, we should strive to eventually deprecate
> default.stringsAsFactors() which basically returns getOption("stringsAsFactors"),
> or as first/2nd step redefine it as
>
>  default.stringsAsFactors <- function() TRUE
>
> Martin Mächler.
>
>    SM> Thanks,
>    SM> -s
>

Re: [Rd] Inconsistency in as.data.frame.table for stringsAsFactors

Peter Dalgaard
Stavros Macrakis wrote:

> Martin,
>
> I agree that global options settings that affect computations are
> problematic.
>
> But that's not the issue I was addressing.  If for some classes, func.CLASS
> has certain defaults for some arguments, it is surprising that for other
> classes, it has different defaults, whether these defaults are fixed or
> taken from global settings -- when there is no obvious reason for the
> default to vary by class.
>
>           -s

"A foolish consistency is the hobgoblin of little minds..."

The thing is that if you are converting the classifying factors of a
table to columns of a data frame, you will presumably prefer that they
come out as factors, retaining level order. The alternative is like this:

 > (x <- as.table(c("Rare"=5, "Medium"=2, "Well-done"=6)))
      Rare    Medium Well-done
         5         2         6
 > df <- as.data.frame(x, stringsAsFactors=FALSE)
 > xtabs(Freq~Var1, data=df)
Var1
    Medium      Rare Well-done
         2         5         6

This is completely different from other cases, where as.data.frame will
auto-convert character variables to factors; e.g., on reading. Having a
global option intended for read.table() interfere with the above kind of
operation could be a really nasty surprise for the user. (Notice also
that the option was introduced in 2.10.0; before then, no one would have
expected that classifying factors could come out as non-factors.
Defaulting to the global option could easily break working code.)
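The level-order point can be made concrete with the same table. With the default factor conversion, the levels keep the table's order. Character columns, by contrast, are re-sorted alphabetically when xtabs() turns them back into factors. The last step is an editorial suggestion, not from the thread, showing one way to restore the order explicitly.

```r
x <- as.table(c("Rare" = 5, "Medium" = 2, "Well-done" = 6))

# Default (stringsAsFactors = TRUE): Var1 is a factor in table order.
df_fac <- as.data.frame(x)
levels(df_fac$Var1)                       # "Rare" "Medium" "Well-done"

# Character column: xtabs() calls factor(), which sorts the values.
df_chr <- as.data.frame(x, stringsAsFactors = FALSE)
names(xtabs(Freq ~ Var1, data = df_chr))  # "Medium" "Rare" "Well-done"

# Restore the original order from the table's own dimnames.
df_chr$Var1 <- factor(df_chr$Var1, levels = dimnames(x)[[1]])
names(xtabs(Freq ~ Var1, data = df_chr))  # "Rare" "Medium" "Well-done"
```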

--
    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([hidden email])              FAX: (+45) 35327907

Re: [Rd] Inconsistency in as.data.frame.table for stringsAsFactors

hadley wickham
On Sat, Jan 23, 2010 at 5:12 AM, Peter Dalgaard
<[hidden email]> wrote:

> Stavros Macrakis wrote:
>>
>> Martin,
>>
>> I agree that global options settings that affect computations are
>> problematic.
>>
>> But that's not the issue I was addressing.  If for some classes,
>> func.CLASS
>> has certain defaults for some arguments, it is surprising that for other
>> classes, it has different defaults, whether these defaults are fixed or
>> taken from global settings -- when there is no obvious reason for the
>> default to vary by class.
>>
>>          -s
>
> "A foolish consistency is the hobgoblin of little minds..."

Perhaps conversely,

"A foolish inconsistency is the hobgoblin of all minds" ?

Hadley

--
http://had.co.nz/

Re: [Rd] Inconsistency in as.data.frame.table for stringsAsFactors

Therneau, Terry M., Ph.D.
In reply to this post by macrakis
Kudos to Peter for actually answering the question of why the
inconsistency was there.  It might be well to add a bit to the
documentation.

  As to the larger discussion of global defaults let me offer two
opinions:
  1. They are the salvation of those of us who do not agree with certain
global defaults.
   -- 'best practice' is not always a consensus
   -- defaults are often informed too much by "the data we happened to
be analyzing when we decided the default".  The long-standing
contr.helmert one, for instance: a look at the White Book shows that
they were working on an orthogonal manufacturing design, the one case
where Helmert contrasts make sense.  The survival package contains
several defaults with the same type of origin.

  2. People in these discussions play the "it might break something"
card far too often.  At Mayo, for instance, the table() command has been
replaced by one which lists NA by default, for all data types.  We've
done this for as long as R and Splus have been used (10+ years), for all
150 people in the biostat group, and nothing has broken yet.  A
suggestion to allow this as a global default will immediately elicit the
above argument, I guarantee it.  Ditto for our experience with
stringsAsFactors=FALSE; nothing's broken yet.  Give a concrete example
before crying wolf.
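Both of the examples above rely on session-wide settings. The following sketch uses invented data. First, it switches the contrasts option for a single lm() fit. The coefficients and their names change, but the fitted values do not. Second, it defines a hypothetical wrapper, table_na(), that approximates an NA-listing table(). The Mayo replacement is not shown in the thread, so the choice of useNA = "ifany" is an assumption.

```r
g <- factor(c("A", "A", "B", "B", "C", "C"))
y <- c(1.0, 1.2, 2.1, 1.9, 3.2, 2.8)

# Fit y ~ g with a temporarily changed global contrasts setting.
fit_under <- function(contr) {
  old <- options(contrasts = contr)
  on.exit(options(old))
  lm(y ~ g)
}
fit_trt <- fit_under(c("contr.treatment", "contr.poly"))  # R's usual default
fit_hel <- fit_under(c("contr.helmert", "contr.poly"))

names(coef(fit_trt))                          # "(Intercept)" "gB" "gC"
names(coef(fit_hel))                          # "(Intercept)" "g1" "g2"
all.equal(fitted(fit_trt), fitted(fit_hel))   # TRUE: same fit, new parameters

# Hypothetical wrapper: a table() that also counts NA when present.
table_na <- function(...) table(..., useNA = "ifany")
z <- c("a", NA, "a", "b")
table(z)     # counts for "a" and "b" only
table_na(z)  # also a count for <NA>
```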

Terry T

Re: [Rd] Inconsistency in as.data.frame.table for stringsAsFactors

Gabor Grothendieck
On Mon, Jan 25, 2010 at 10:36 AM, Terry Therneau <[hidden email]> wrote:
> Kudos to Peter for actually answering the question of why the
> inconsistency was there.  It might be well to add a bit to the
> documentation.
>
>  As to the larger discussion of global defaults let me offer two
> opinions:
>  1. They are the salvation of those of us who do not agree with certain

As soon as you have to interface with other software that may require
a different global default it becomes problematic.
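One common way to cope with this is to change an option only for the duration of the code that needs it. The helper with_opts() below is hypothetical; it is similar in spirit to with_options() in the withr package. It restores the previous values via on.exit(), so they come back even if the wrapped code fails.

```r
# Hypothetical helper: evaluate `expr` with some options temporarily set.
with_opts <- function(new, expr) {
  old <- options(new)               # set new values, remember the old ones
  on.exit(options(old), add = TRUE) # restore them on exit, error or not
  force(expr)                       # the promise is evaluated here
}

getOption("digits")                      # 7 in a fresh session
with_opts(list(digits = 3), format(pi))  # "3.14"
getOption("digits")                      # back to the previous value
```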
