as.data.frame.table() does not recognize default.stringsAsFactors()


Mychaleckyj, Josyf C (jcm6t)
Reporting a possible inconsistency or bug in handling stringsAsFactors in as.data.frame.table()

Here is a simple test

> options()$stringsAsFactors
[1] TRUE
> x<-c("a","b","c","a","b")
> d<-as.data.frame(table(x))
> d
  x Freq
1 a    2
2 b    2
3 c    1
> class(d$x)
[1] "factor"
> d2<-as.data.frame(table(x),stringsAsFactors=FALSE)
> class(d2$x)
[1] "character"
> options(stringsAsFactors=FALSE)
> options()$stringsAsFactors
[1] FALSE
> d3<-as.data.frame(table(x))
> d3
  x Freq
1 a    2
2 b    2
3 c    1
> class(d3$x)
[1] "factor"
> d4<-as.data.frame(table(x),stringsAsFactors=FALSE)
> class(d4$x)
[1] "character"


# Display the source, showing the different stringsAsFactors handling for tables and matrices:

> as.data.frame.table
function (x, row.names = NULL, ..., responseName = "Freq", stringsAsFactors = TRUE,
    sep = "", base = list(LETTERS))
{
    ex <- quote(data.frame(do.call("expand.grid", c(dimnames(provideDimnames(x,
        sep = sep, base = base)), KEEP.OUT.ATTRS = FALSE, stringsAsFactors = stringsAsFactors)),
        Freq = c(x), row.names = row.names))
    names(ex)[3L] <- responseName
    eval(ex)
}
<bytecode: 0x28769f8>
<environment: namespace:base>

> as.data.frame.matrix
function (x, row.names = NULL, optional = FALSE, make.names = TRUE,
    ..., stringsAsFactors = default.stringsAsFactors())
{
    d <- dim(x)
    nrows <- d[[1L]]
    ncols <- d[[2L]]
    ic <- seq_len(ncols)
    dn <- dimnames(x)
    if (is.null(row.names))
        row.names <- dn[[1L]]
    collabs <- dn[[2L]]
    if (any(empty <- !nzchar(collabs)))
        collabs[empty] <- paste0("V", ic)[empty]
    value <- vector("list", ncols)
    if (mode(x) == "character" && stringsAsFactors) {
        for (i in ic) value[[i]] <- as.factor(x[, i])
    }
    else {
        for (i in ic) value[[i]] <- as.vector(x[, i])
    }
    autoRN <- (is.null(row.names) || length(row.names) != nrows)
    if (length(collabs) == ncols)
        names(value) <- collabs
    else if (!optional)
        names(value) <- paste0("V", ic)
    class(value) <- "data.frame"
    if (autoRN)
        attr(value, "row.names") <- .set_row_names(nrows)
    else .rowNamesDF(value, make.names = make.names) <- row.names
    value
}
<bytecode: 0x29995c0>
<environment: namespace:base>
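The practical difference between the two listings is the default value of stringsAsFactors: a literal TRUE is fixed, whereas default.stringsAsFactors() is an expression that R evaluates lazily each time the function is called without that argument, so it can follow options(). A minimal sketch of that mechanism with two toy functions (f_literal, f_option and the option name demo.saf are hypothetical, chosen only for illustration):

```r
# Literal default, as in as.data.frame.table(): never consults options().
f_literal <- function(stringsAsFactors = TRUE) stringsAsFactors

# Default expression evaluated at call time, in the spirit of
# stringsAsFactors = default.stringsAsFactors() in as.data.frame.matrix().
# "demo.saf" is a hypothetical option name used only for this sketch.
f_option <- function(stringsAsFactors = getOption("demo.saf", TRUE)) stringsAsFactors

old <- options(demo.saf = FALSE)
f_literal()                          # TRUE: the option is ignored
f_option()                           # FALSE: the option is read at call time
f_literal(stringsAsFactors = FALSE)  # FALSE: an explicit argument always wins
options(old)                         # restore the previous option setting
```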


> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /usr/lib64/libblas.so.3.4.2
LAPACK: /usr/lib64/liblapack.so.3.4.2

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.5.2 tools_3.5.2

Thanks,
Joe




______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: as.data.frame.table() does not recognize default.stringsAsFactors()

Peter Dalgaard-2
I have no recollection of the original rationale for as.data.frame.table, but I actually think it is fine as it is:

The classifying _factors_ of a crosstable should be factors unless very specifically directed otherwise and that should not depend on the setting of an option that controls the conversion of character data.

For as.data.frame.matrix, in contrast, it is the _content_ of the matrix that is being converted, and it seems much more reasonable to follow the same path as for other character data.
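A minimal sketch of the distinction drawn above, using the toy vector from the original report plus a small illustrative character matrix; stringsAsFactors is passed explicitly so the results do not depend on any global option:

```r
x <- c("a", "b", "c", "a", "b")
tab <- table(x)

# Table method: the classifying variable becomes a factor by default,
# and only an explicit argument changes that.
d_fac <- as.data.frame(tab)
d_chr <- as.data.frame(tab, stringsAsFactors = FALSE)
class(d_fac$x)   # "factor"
class(d_chr$x)   # "character"
levels(d_fac$x)  # "a" "b" "c"

# Matrix method: it is the character *content* that may be converted.
m <- matrix(c("a", "b", "c", "d"), nrow = 2)
class(as.data.frame(m, stringsAsFactors = TRUE)$V1)   # "factor"
class(as.data.frame(m, stringsAsFactors = FALSE)$V1)  # "character"
```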

-pd

> On 12 Mar 2019, at 21:39 , Mychaleckyj, Josyf C (jcm6t) <[hidden email]> wrote:
> [...]

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]


Re: as.data.frame.table() does not recognize default.stringsAsFactors()

Mychaleckyj, Josyf C (jcm6t)
Peter,
Thanks for the response. I have no wish to prolong this and have no axe to grind. I’m sure you were delighted to see another stringsAsFactors issue.

Perhaps we are talking about the conflation of two steps: the first is the 'pure' language-level conversion of the table to a data.frame with the cross-tab factor; the second is an optional step, useful programmatically for a specific application, that converts that factor to a character column.

As my toy example shows, as.data.frame.table() accepts stringsAsFactors as an inline argument and, when it is FALSE, returns a data.frame in which the cross-tab factor column has been coerced to a character column, so both steps can be accomplished in a single call.

If you intend the function to meet only the first step, then I would suggest removing stringsAsFactors as an argument to this function and amending the documentation.
Then, if an application needed coercion to character, it would be done in a second step.
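A minimal sketch of that two-step alternative, using the toy vector from the original report and only base R functions:

```r
x <- c("a", "b", "c", "a", "b")

# Step 1: the "pure" conversion; the cross-tab variable stays a factor.
counts <- as.data.frame(table(x))

# Step 2 (optional, application-specific): coerce that column to character.
counts$x <- as.character(counts$x)
str(counts)
```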

If you are implying that the core team intended options(stringsAsFactors) to be a 'selective' global option, then I guess I am confused: I have not seen any documentation about a limited scope for the session-wide options().

?options
  ‘stringsAsFactors’: The default setting for arguments of
          ‘data.frame’ and ‘read.table’.

As a practical programming matter, this inconsistency created a very insidious bug in our code that cost hours of debugging and a lot of head scratching. Characters and factors are always prime suspects, but we never even considered that the session option would not be respected by a low-level core function whose documented usage explicitly includes the stringsAsFactors argument.

?as.data.frame.table

From the Usage section of as.data.frame.table()

     ## S3 method for class 'table'
     as.data.frame(x, row.names = NULL, ...,
                   responseName = "Freq", stringsAsFactors = TRUE,
                   sep = "", base = list(LETTERS))
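One way to check which kind of default a method uses is to inspect its formal arguments. A minimal sketch; the matrix method's default is version dependent (a call to default.stringsAsFactors() in the R 3.5.2 listing in the original report), so no output is claimed for the second call:

```r
# The table method's default is the literal constant TRUE, so the
# stringsAsFactors option never enters into it.
formals(as.data.frame.table)$stringsAsFactors

# The matrix method's default differs between R versions; inspect it
# in your own session.
formals(as.data.frame.matrix)$stringsAsFactors
```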


Thanks,  Joe.


> On Mar 14, 2019, at 11:18 AM, peter dalgaard <[hidden email]> wrote:
> [...]


Re: as.data.frame.table() does not recognize default.stringsAsFactors()

Martin Maechler
In reply to this post by Peter Dalgaard-2
>>>>> peter dalgaard
>>>>>     on Thu, 14 Mar 2019 16:18:55 +0100 writes:

    > [...]

I very strongly agree that as.data.frame.table() should not be
changed to follow a global option.

To the contrary: I've repeatedly mentioned that in my view it
has been a design mistake to allow data.frame() and as.data.frame() to be influenced
by a global option
 [and we should've tried harder to keep things purely functional
   (R remaining as closely as possible a "functional language"),
  e.g. by providing wrapper functions the same way we have such
  wrappers for versions of read.table() with different defaults
  for some of the arguments
 ]
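For comparison, read.csv() is such a wrapper around read.table() with different defaults. A minimal sketch of the same idea applied here; as_df_chr is a hypothetical name, not a base R function. Because the wrapper already fixes stringsAsFactors, passing it again through ... would be an error:

```r
# Hypothetical wrapper: fixes the argument at the call site instead of
# through a session-wide option; other arguments pass through unchanged.
as_df_chr <- function(x, ...) as.data.frame(x, ..., stringsAsFactors = FALSE)

x <- c("a", "b", "c", "a", "b")
d <- as_df_chr(table(x), responseName = "n")
str(d)
```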

Martin



Re: as.data.frame.table() does not recognize default.stringsAsFactors()

Abs Spurdle
In reply to this post by Mychaleckyj, Josyf C (jcm6t)
Martin Maechler wrote:
> and we should've tried harder to keep things purely functional (R remaining
> as closely as possible a "functional language")

This is diverging from the original post.
However, isn't R a multiparadigm programming language (by design)?


Re: as.data.frame.table() does not recognize default.stringsAsFactors()

R devel mailing list
In reply to this post by Mychaleckyj, Josyf C (jcm6t)
I have to disagree with both Peter and Martin on this.

The underlying issue is that the automatic conversion of characters to factors by the
data.frame functions was the single most egregious design blunder in the Statistical
Models in S book, and we are still living with it. The stringsAsFactors option was a
compromise to let users opt out of that mistake (one I had to fight hard for). In that
light I read Peter's defense as "but in this case we really DO know better than the user,
and won't let them opt out", and Martin's as "they shouldn't have been able to opt out in
the first place, so weaken it at every opportunity".

I generally agree that global options should be minimal.  But if one exists, let's be
consistent and listen to it.

(Footnote: In the Mayo Biostat group, stringsAsFactors=FALSE is the recommended global
option for all users. It's a pure cost/productivity thing. We work on thousands of data
sets a year, and the errors and misunderstandings that silent conversions generate far
outweigh any benefits.)

Terry T.


On 3/15/19 6:00 AM, [hidden email] wrote:
> [...]



Re: as.data.frame.table() does not recognize default.stringsAsFactors()

Peter Dalgaard-2
My point was that, in a table, the rows and columns usually have a well-defined order. If you convert the table to data frame form, typically in order to fit a Poisson GLM, you do want to preserve that order, and not have the levels converted to a locale-dependent alphabetical order in your analyses. Or at least, if you do want conversion to character, you should say so very explicitly. That is the way it currently works: you can override, just not via the global option.

Notice also that it is very easy to do as.character(factor) if you need it, whereas it is rather more painful to convert a character vector to a factor with level names determined by the dimension names of the appropriate extent of the original table.
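A minimal sketch of the order-preservation point, using a hypothetical classifier (size) whose natural order is not alphabetical:

```r
size <- factor(c("small", "large", "medium", "small"),
               levels = c("small", "medium", "large"))
tab <- table(size)

d <- as.data.frame(tab)
levels(d$size)  # "small" "medium" "large": the table's order is kept

# Converting to character is a single call...
ch <- as.character(d$size)
# ...but re-factoring the character vector sorts the levels alphabetically:
levels(factor(ch))  # "large" "medium" "small"

# Recovering the table's order requires its dimnames:
levels(factor(ch, levels = dimnames(tab)[[1]]))  # "small" "medium" "large"
```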

-pd

> On 15 Mar 2019, at 13:13 , Therneau, Terry M., Ph.D. via R-devel <[hidden email]> wrote:
> [...]


Re: as.data.frame.table() does not recognize default.stringsAsFactors()

R devel mailing list
Peter, we are arguing at cross purposes.  My point was that if I have specified
options(stringsAsFactors=FALSE) that is a statement to R to NOT DO THIS.   Your argument
in return is that "yes, Therneau said that, but in this case he almost certainly doesn't
really mean it, so ignore him".  Now the first part of your sentence may quite possibly be
true.  I still don't like the second.

Of course, the best would be consistency from the start. If I want a data frame form of
table(a, b), what some packages call list mode, have the class of a, b in the result match
the class of a, b at the start: character to character, factor to factor, numeric to
numeric, etc. The fact that all of them end up as character in the intermediate
dimnames takes much of the wind out of my argument above: when converting a table to a
data frame, the best R can do is guess what the original class was. Factor might indeed be
the best guess, lacking a data.frame=TRUE argument for table().
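A minimal sketch of the point about intermediate dimnames, assuming a numeric classifier (the vector num is illustrative): table labels are always character, so the original class has to be reconstructed by hand.

```r
num <- c(10, 20, 10, 30)
tab <- table(num)
typeof(dimnames(tab)$num)  # "character": the numeric class is not retained

d <- as.data.frame(tab)
as.numeric(as.character(d$num))  # 10 20 30: values recovered from the labels
as.numeric(d$num)                # 1 2 3: the factor's integer codes, not the values
```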

Here, though I hate to say it, is an argument for your side: in
data.frame(table(rep(1:10, length=15))), "10" should come after "9".

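A minimal sketch of that example, showing that the factor produced via the table keeps numeric order while a plain character sort does not:

```r
tab <- table(rep(1:10, length = 15))
d <- data.frame(tab)

# The factor keeps the table's numeric order: "1" "2" ... "9" "10"
levels(d$Var1)

# Sorting the same labels as character strings moves "10" up next to "1"
sort(as.character(d$Var1))
```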
Terry T.


On 3/15/19 8:31 AM, peter dalgaard wrote:
> [...]

