as.data.frame and illegal row.names argument (bug in package:DoE.wrapper?)


R devel mailing list
as.data.frame methods behave inconsistently when they are given a row.names
argument of the wrong length.  The matrix method silently ignores row.names
if it has the wrong length and the numeric, integer, and character methods
do not bother to check and thus make an illegal data.frame.

> as.data.frame(matrix(1:6,nrow=3), row.names=c("One","Two"))
  V1 V2
1  1  4
2  2  5
3  3  6
> as.data.frame(1:3, row.names=c("One","Two"))
    1:3
One   1
Two   2
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs
> as.data.frame(c("a","b","c"), row.names=c("One","Two"))
    c("a", "b", "c")
One                a
Two                b
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

(The warnings are from the printing, not the making, of the data.frames.)
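
One way to see that the object itself is malformed, independently of printing, is to compare the number of row names with the column lengths. The following is only a sketch: is_consistent_df() is a hypothetical helper, not part of base R, and the corrupt object is built by hand with structure() so that the example does not depend on how a particular R version treats a wrong-length row.names argument.

```r
# Hypothetical helper (not in base R): TRUE if every column has as many
# rows as there are row names.
is_consistent_df <- function(df) {
  n_row_names <- length(attr(df, "row.names"))
  col_lengths <- vapply(unclass(df), NROW, integer(1))
  all(col_lengths == n_row_names)
}

# Build a corrupt data frame by hand: 3 values but only 2 row names.
bad <- structure(list(x = 1:3),
                 class = "data.frame",
                 row.names = c("One", "Two"))

# A well-formed data frame for comparison.
good <- data.frame(x = 1:3, row.names = c("One", "Two", "Three"))

is_consistent_df(bad)
is_consistent_df(good)
```

Creating `bad` is silent; a complaint only appears once something such as print() tries to format it.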

I ran into this while using the DoE.wrapper package, which has what I think
is a typo, giving "t" as the row.names for the output of mapply():
   cross.design.R:    ro <- as.data.frame(mapply("touter", ro1, ro2, "paste", sep="_"), "t")

I don't know all the reasons why people use as.data.frame instead of
data.frame.
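
The reason the stray "t" in the DoE.wrapper call ends up as row.names is ordinary positional argument matching: row.names is the second formal argument of as.data.frame. A small sketch of this follows; which_row_names() is a hypothetical stand-in with the same leading arguments, used only to show what gets bound, and it says nothing about what the package author intended.

```r
# Formal arguments of the as.data.frame generic, in order.
arg_names <- names(formals(as.data.frame))
print(arg_names)

# Hypothetical stand-in with the same leading arguments; it simply returns
# whatever ended up bound to row.names.
which_row_names <- function(x, row.names = NULL, optional = FALSE, ...) {
  row.names
}

# The second unnamed argument is matched to row.names.
bound <- which_row_names(matrix(1:4, nrow = 2), "t")
print(bound)
```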


Bill Dunlap
TIBCO Software
wdunlap tibco.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: as.data.frame and illegal row.names argument (bug in package:DoE.wrapper?)

Paul Grosu

Hi Bill,

What is happening here depends on the specific as.data.frame method
that is being run, which in this instance switches between
as.data.frame.matrix() and as.data.frame.matrix().  I attached the
dataframe.R code, which you can find in the src/library/base/R folder of
the source code.  Note that if you use data.frame() instead, it gives a
more expected result.
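
As a rough illustration of how the method is chosen, the sketch below inspects the class of each input and looks up the corresponding S3 method with utils::getS3method(), without calling any method directly. The variable names are only for illustration.

```r
m <- matrix(1:6, nrow = 3)
v <- 1:3
s <- c("a", "b", "c")

# The (implicit) class of the first argument drives S3 dispatch.
print(class(m))
print(class(v))
print(class(s))

# Look up the methods that dispatch would find for these classes.
matrix_method    <- utils::getS3method("as.data.frame", "matrix")
integer_method   <- utils::getS3method("as.data.frame", "integer")
character_method <- utils::getS3method("as.data.frame", "character")
```

Printing any of these functions shows how that method handles row.names.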

For instance the first runs as follows through matrix:

> as.data.frame.matrix(matrix(1:6,nrow=3), row.names=c("One","Two"))
  V1 V2
1  1  4
2  2  5
3  3  6

The other two run via vector:

> as.data.frame.vector(1:3, row.names=c("One","Two"))
    1:3
One   1
Two   2
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

> as.data.frame.vector(c("a","b","c"), row.names=c("One","Two"))
    c("a", "b", "c")
One                a
Two                b
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

If you use data.frame() instead, it behaves more as expected:

> data.frame(matrix(1:6,nrow=3), row.names=c("One","Two"))
Error in data.frame(matrix(1:6, nrow = 3), row.names = c("One", "Two")) :
  row names supplied are of the wrong length

> data.frame(matrix(1:6,nrow=3), row.names=c("One","Two","Three"))
      X1 X2
One    1  4
Two    2  5
Three  3  6

> data.frame(c("a","b","c"), row.names=c("One","Two"))
Error in data.frame(c("a", "b", "c"), row.names = c("One", "Two")) :
  row names supplied are of the wrong length

> data.frame(c("a","b","c"), row.names=c("One","Two","Three"))
      c..a....b....c..
One                  a
Two                  b
Three                c

> data.frame(1:3, row.names=c("One","Two"))
Error in data.frame(1:3, row.names = c("One", "Two")) :
  row names supplied are of the wrong length

> data.frame(1:3, row.names=c("One","Two","Three"))
      X1.3
One      1
Two      2
Three    3

Hope it helps,
Paul

-----Original Message-----
From: R-devel [mailto:[hidden email]] On Behalf Of William
Dunlap via R-devel
Sent: Wednesday, January 13, 2016 4:46 PM
To: [hidden email]; Ulrike Groemping
Subject: [Rd] as.data.frame and illegal row.names argument (bug in
package:DoE.wrapper?)

Re: as.data.frame and illegal row.names argument (bug in package:DoE.wrapper?)

Martin Maechler
>>>>> Paul Grosu <[hidden email]>
>>>>>     on Thu, 14 Jan 2016 17:35:49 -0500 writes:

 > Hi Bill,

 > The thing is that is happening here is the specific
 > instance of as.data.frame that is being run, which in this
 > instance switch between as.data.frame.matrix() and as.data.frame.matrix().  

(This must be another typo, i.e. a "cut-and-paste, forgot to modify" slip;
 you probably meant *.vector in the second case.)

I'm pretty sure Bill was not asking *why* this happens {he would
easily find out if he wanted} but reporting two (potential) bugs:

- one in R  [not reporting erroneous as.data.frame() usage]
- one in DoE.wrapper

I'm going to look into the  R one, which is indeed in the
as.data.frame.vector() method, as you've noted.

--
Martin Maechler
ETH Zurich


Re: as.data.frame and illegal row.names argument (bug in package:DoE.wrapper?)

Martin Maechler
In reply to this post by R devel mailing list
>>>>> William Dunlap via R-devel <[hidden email]>
>>>>>     on Wed, 13 Jan 2016 13:46:05 -0800 writes:

> as.data.frame methods behave inconsistently when they are given a row.names
> argument of the wrong length.  The matrix method silently ignores row.names
> if it has the wrong length and the numeric, integer, and character methods
> do not bother to check and thus make an illegal data.frame.
>
> > as.data.frame(matrix(1:6,nrow=3), row.names=c("One","Two"))
>   V1 V2
> 1  1  4
> 2  2  5
> 3  3  6
> > as.data.frame(1:3, row.names=c("One","Two"))
>     1:3
> One   1
> Two   2
> Warning message:
> In format.data.frame(x, digits = digits, na.encode = FALSE) :
>   corrupt data frame: columns will be truncated or padded with NAs
> > as.data.frame(c("a","b","c"), row.names=c("One","Two"))
>     c("a", "b", "c")
> One                a
> Two                b
> Warning message:
> In format.data.frame(x, digits = digits, na.encode = FALSE) :
>   corrupt data frame: columns will be truncated or padded with NAs

As I said yesterday, I want to "fix" this in R.
As Paul Grosu mentioned, the buggy (too tolerant) behavior
is in the as.data.frame.vector() method, whereas
as.data.frame.matrix() simply drops wrong row.names and uses
default row names in that case.

This would leave (at least) two ways to change:
1) the *.matrix compatible one simply forgets wrong  'row.names'
2) Wrong row.names are a user error.

Now, '1)' would be more in line with the matrix method, but
really feels wrong, because it does not catch user error and
silently disregards a specifically specified argument.
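
The difference between the two options can be sketched outside of R's own sources. resolve_row_names() below is a hypothetical helper, not the code in base R, and its validity test (NULL, or a character vector with one entry per row) is an assumption made for illustration.

```r
# Hypothetical helper: decide which row names to use for an object with
# `n` rows. policy = "drop" silently ignores a bad argument (option 1);
# policy = "error" treats it as a user error (option 2).
resolve_row_names <- function(n, row.names = NULL,
                              policy = c("drop", "error")) {
  policy <- match.arg(policy)
  ok <- is.null(row.names) ||
    (is.character(row.names) && length(row.names) == n)
  if (ok) return(row.names)
  if (policy == "error") {
    stop(sprintf("'row.names' is not a character vector of length %d", n))
  }
  NULL  # "drop": the caller falls back to default row names
}

dropped <- resolve_row_names(3, c("One", "Two"), policy = "drop")
strict  <- tryCatch(
  resolve_row_names(3, c("One", "Two"), policy = "error"),
  error = function(e) conditionMessage(e)
)
```

With "drop" the caller never learns that the argument was discarded, which is the objection raised against option 1.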

For '2)' I propose a fix which will only *warn* about the wrong
'row.names' for now (so code that has implicitly relied on the
wrong behavior continues to work, but with a warning):

    > as.data.frame(1:3, row.names=c("One","Two"))
      1:3
    1   1
    2   2
    3   3
    Warning message:
    In as.data.frame.integer(1:3, row.names = c("One", "Two")) :
      'row.names' is not a character vector of length 3 -- omitting it. Will be an error!
    >

This will give new warnings in packages, and package authors can
fix these before the above eventually becomes an error.
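
A package author who wants to know how a given R installation treats such a call could classify the outcome along the following lines. This is a sketch only: classify_outcome() is a hypothetical helper, and which of the three results comes back depends on the R version in use.

```r
# Hypothetical helper: report whether evaluating `expr` returns silently,
# signals a warning, or fails with an error.
classify_outcome <- function(expr) {
  tryCatch({
    force(expr)   # the argument is evaluated here, inside tryCatch
    "silent"
  },
  warning = function(w) "warning",
  error   = function(e) "error")
}

outcome <- classify_outcome(
  as.data.frame(1:3, row.names = c("One", "Two"))
)
print(outcome)
```

Running package checks with options(warn = 2) is another common way to surface such warnings early, since it turns every warning into an error.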


The remaining question is if the  as.data.frame.matrix() method
should not also produce the same warning about illegal
row.names.  Interestingly, the *model.matrix* method does
produce an error even now, when row.names are specified of wrong
length:

   > ff <- log(Volume) ~ log(Height) + log(Girth)
   > m <- model.frame(ff, trees)
   > mat <- model.matrix(ff, m)
   > data.frame(mat, row.names = paste0("r", 1:30))
   Error in data.frame(mat, row.names = paste0("r", 1:30)) :
     row names supplied are of the wrong length
   >

Re: as.data.frame and illegal row.names argument (bug in package:DoE.wrapper?)

Paul Grosu
In reply to this post by Martin Maechler
Hi Martin,

Sorry about the confusion.  Yes, my eyes were playing tricks on me during
the copy-&-paste process :)  Thank you for helping so quickly with this.  I
am not part of the R core group and cannot make changes to R, but I've
studied the R source code - and that of quite a few BioConductor packages -
for many years.  So I was just posting to explain why it was happening, to
help narrow down the root cause for a fix, and to show a possible
work-around.

Thank you,
Paul

-----Original Message-----
From: Martin Maechler [mailto:[hidden email]]
Sent: Friday, January 15, 2016 2:57 AM
To: Paul Grosu
Cc: [hidden email]
Subject: Re: [Rd] as.data.frame and illegal row.names argument (bug in
package:DoE.wrapper?)