The "Summary" group generics always throw errors for a data.frame with zero rows, for example:

> sum(data.frame(x = numeric(0)))
#> Error in FUN(X[[i]], ...) :
#>   only defined on a data frame with all numeric variables

Same behaviour for min, max, any, all, ... . I believe this is inconsistent with what these methods do for other empty objects (vectors, matrices), where the return value is chosen to ensure transitivity: sum(numeric(0)) == 0.

The reason for this is that the return type of as.matrix() for empty (no rows or no columns) data.frame objects is always a matrix of type "logical". The Summary method for data.frame, in turn, throws an error when the data.frame, converted to a matrix, is not of numeric type.

I suggest two ways that make sum, min, max, ... more consistent. IMHO it would be fitting to implement both of these fixes, because they also make other things more consistent.

1. Make the return type of as.matrix() for zero-row data.frames consistent with the type that would have been returned, had the data.frame had more than zero rows. "as.matrix(data.frame(x = numeric(0)))" should then be numeric; if there is an empty "character" column, the return matrix should be a character matrix, etc. This would make subsetting by row and conversion to matrix commute (except for row names sometimes):

> all.equal(as.matrix(df[rows, , drop = FALSE]), as.matrix(df)[rows, , drop = FALSE])

Furthermore, this change would make as.matrix.data.frame obey the documentation, which indicates that the coercion hierarchy is used for the return type.

2. Make the Summary.data.frame method accept data.frames that produce non-numeric matrices. Next to the main focus of this message, I believe it would e.g. be fitting to have any() and all() work on logical data.frame objects. The current behaviour is such that

> any(data.frame(x = 1))
#> [1] TRUE
#> Warning message:
#> In any(1, na.rm = FALSE) : coercing argument of type 'double' to logical

and

> any(data.frame(x = TRUE))
#> Error in FUN(X[[i]], ...) :
#>   only defined on a data frame with all numeric variables

So a numeric data.frame warns about implicit coercion, while a logical data.frame (which would not need coercion) does not work at all.

(I feel more strongly about fixing 1. than 2., because I don't know the discussion that led to the behaviour described in 2.)

Best,
Martin

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
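The two claims in the report above can be tried out with a small sketch in R. The first part checks that summaries of empty vectors return the identity element of the operation, which is why splitting a vector into pieces never changes the combined result. The second part compares the storage type obtained when rows are subset before or after as.matrix(). The variable names (empty, type_after_subset, type_after_convert) are made up for illustration, and nothing is asserted about the zero-row conversion itself, since that result may depend on the R version.

```r
# Empty vectors return the identity element of each operation.
x <- c(1, 2, 3)
empty <- numeric(0)

stopifnot(
  sum(empty) == 0,                          # identity for +
  prod(empty) == 1,                         # identity for *
  all(logical(0)),                          # identity for "and"
  !any(logical(0)),                         # identity for "or"
  sum(c(x, empty)) == sum(x) + sum(empty)   # pieces combine consistently
)

# Row subsetting vs. matrix conversion, done in both orders.
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
rows <- integer(0)   # select no rows

type_after_subset  <- typeof(as.matrix(df[rows, , drop = FALSE]))
type_after_convert <- typeof(as.matrix(df)[rows, , drop = FALSE])

# Converting first keeps "double"; the report says subsetting first
# yields "logical", which is the inconsistency under discussion.
print(c(subset_first = type_after_subset, convert_first = type_after_convert))
```

If the two printed types differ, subsetting and conversion do not commute for the zero-row case on the R version in use.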
Hmm, yes, this is probably wrong. E.g., we are likely to get inconsistencies out of boundary cases like this

> a <- na.omit(airquality)
> sum(a)
[1] 37495.3
> sum(a[FALSE,])
Error in FUN(X[[i]], ...) :
  only defined on a data frame with all numeric variables

Or, closer to an actual use case:

> sum(subset(a, Ozone>100))
[1] 3330.5
> sum(subset(a, Ozone>200))
Error in FUN(X[[i]], ...) :
  only defined on a data frame with all numeric variables

However, given that numeric summaries generally treat logicals as 0/1, wouldn't it be easiest just to extend the check inside Summary.data.frame with "&& !is.logical(x)"?

> sum(as.matrix(a[FALSE,]))
[1] 0

-pd

> On 17 Oct 2020, at 21:18 , Martin <[hidden email]> wrote:
>
> The "Summary" group generics always throw errors for a data.frame with zero rows [...]

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email] Priv: [hidden email]
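The suggestion to extend the type check with is.logical() can be illustrated with a stand-alone sketch. This is not the code of Summary.data.frame in base R: summarise_df, its arguments and its error message are hypothetical names chosen for this example, and the function only mimics the idea of converting to a matrix, checking the type, and applying the summary.

```r
# Hypothetical stand-in for a data.frame summary method (NOT base R code).
# `summarise_df`, `df`, `fun` and `na.rm` are names made up for this sketch.
summarise_df <- function(df, fun, na.rm = FALSE) {
  m <- as.matrix(df)
  # Numeric check extended with is.logical(), as suggested above
  if (!is.numeric(m) && !is.logical(m))
    stop("only defined on a data frame with all numeric or logical variables")
  fun(m, na.rm = na.rm)
}

a <- na.omit(airquality)

summarise_df(a[FALSE, ], sum)             # zero-row subset no longer errors
summarise_df(data.frame(x = TRUE), any)   # logical column is accepted

# A character column is still rejected
res <- try(summarise_df(data.frame(x = "a"), sum), silent = TRUE)
inherits(res, "try-error")
```

With this check a zero-row data frame passes whether as.matrix() returns a logical or a numeric matrix, so the sketch does not depend on how the as.matrix() question is settled.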
Peter et al,

I had the same thought, in particular for any() and all(), which, inasmuch as they should work on data.frames in the first place (which, to be perfectly honest, I do find quite debatable myself), should certainly work on "logical" data.frames if they are going to work on "numeric" ones.

I can volunteer to prepare a patch if Martin (the reporter) did not want to take a crack at it, and further if it is not already being done within R-core.

Best,
~G

On Sun, Oct 18, 2020 at 12:19 AM peter dalgaard <[hidden email]> wrote:

> Hmm, yes, this is probably wrong. E.g., we are likely to get
> inconsistencies out of boundary cases like this [...]
From my side: it would be great if you (or R core) could prepare a patch; it would probably take me quite a bit longer than you, since I don't have experience creating patches for R.

Best,
Martin

On Sun, Oct 18, 2020, at 21:49, Gabriel Becker wrote:
> [...]
> I can volunteer to prepare a patch if Martin (the reporter) did not
> want to take a crack at it, and further if it is not already being done
> within R-core.
>>>>> mb706
>>>>>     on Sun, 18 Oct 2020 22:14:55 +0200 writes:

    >> From my side: it would be great if you (or R core) could prepare a patch, it would probably take me quite a bit longer than you since I don't have experience creating patches for R.

    > Best, Martin

Basically, just

1. svn co https://svn.r-project.org/R/trunk R-devel

2. inside the R-devel source tree, find src/library/base/R/dataframe.R
   make the *minimal* changes there,

   (then also add some regression tests and update the help :-)

3. inside R-devel, do

      svn diff -x -ubw > mb706.patch

4. you've got the patch file mb706.patch which you could
   attach to a bug report on R's bugzilla

   (once you've got an account there ...
    As you've asked for that *and* as you've proven your good
    judgment about "true bug" vs. "not what I expected",
    I'll create such an account for you now, in spite of the
    fact that I'd still like to know a bit more than "Martin
    mb706" about you ...)

The changes have been committed to R-devel a quarter of an hour ago.
We will keep them in R-devel (currently planned to become R
4.1.0 in spring 2021), and not port to the R-4.0.z branch, as
the change is something like an API change, and also because
nobody had ever reported this as an issue to our knowledge.

Thank you, Martin B706 for bringing the issue up, and Gabe and Peter
for chiming in !!

Best regards,
Martin Maechler
ETH Zurich and R core team
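For the "add some regression tests" step mentioned above, the checks might look roughly like the following sketch. It only records whether each call from this thread runs without error; it asserts nothing about the result, because that depends on whether the R version in use contains the change. The helper runs_ok and the names in checks are hypothetical and not part of R's own test suite.

```r
# Returns TRUE if evaluating `expr` raises no error, FALSE otherwise.
# `runs_ok` is a helper made up for this sketch.
runs_ok <- function(expr) {
  !inherits(try(expr, silent = TRUE), "try-error")
}

a <- na.omit(airquality)

# The calls reported as failing earlier in the thread
checks <- c(
  zero_row_numeric = runs_ok(sum(data.frame(x = numeric(0)))),
  zero_row_subset  = runs_ok(sum(a[FALSE, ])),
  logical_any      = runs_ok(any(data.frame(x = TRUE)))
)
print(checks)
```

In an actual regression test these would presumably become stopifnot() assertions on the returned values rather than a printed status vector.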
Hi,

There are 2 bugs here. The proposed fix to Summary.data.frame() is fine but it doesn't address the other problem reported by the OP that as.matrix() on a zero-row data.frame doesn't respect the type of its columns, like other column-combining operations do:

  df <- data.frame(a=numeric(0), b=numeric(0))

  typeof(as.matrix(df))
  # [1] "logical"

  typeof(unlist(df))
  # [1] "double"

  typeof(do.call(c, df))
  # [1] "double"

I've run into this myself on a couple of occasions (not in the context of Summary methods) and worked around it with something like:

  as_matrix_data_frame <- function(df) {
    ans <- as.matrix(df)
    # for zero-row input, restore the type implied by the columns
    if (nrow(df) == 0L)
      storage.mode(ans) <- typeof(unlist(df))
    ans
  }

No reason as.matrix.data.frame() couldn't do something similar.

Cheers,
H.

On 10/20/20 09:36, Martin Maechler wrote:
> [...]
> The changes have been committed to R-devel a quarter of an hour ago.
> We will keep them in R-devel (currently planned to become R
> 4.1.0 in spring 2021), and not port to the R-4.0.z branch [...]

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: [hidden email]
Phone: (206) 667-5791
Fax: (206) 667-1319
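A short usage sketch of the workaround above, with the function repeated so that the example runs on its own. The data frames num_df and chr_df are made up for illustration; stringsAsFactors = FALSE is passed explicitly so that the character column stays a character vector on older R versions as well.

```r
# Workaround from the message above, repeated so the example is self-contained
as_matrix_data_frame <- function(df) {
  ans <- as.matrix(df)
  # for zero-row input, restore the type implied by the columns
  if (nrow(df) == 0L)
    storage.mode(ans) <- typeof(unlist(df))
  ans
}

# Hypothetical zero-row inputs
num_df <- data.frame(a = numeric(0), b = numeric(0))
chr_df <- data.frame(a = numeric(0), b = character(0),
                     stringsAsFactors = FALSE)

typeof(as_matrix_data_frame(num_df))  # "double"
typeof(as_matrix_data_frame(chr_df))  # "character"
dim(as_matrix_data_frame(num_df))     # 0 2
```

One caveat worth checking before relying on this: for factor columns, typeof(unlist(df)) reports the underlying integer codes, whereas as.matrix() on a non-empty data frame with a factor column gives a character matrix, so the two may disagree there.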