Code Optimization: print.data.frame + as.data.frame(head(x, n = options("max.print")))


Juan Telleria Ruiz de Aguirre
Dear R Developers,

I would like to propose a simple optimization for the base function
print.data.frame:

To add: x <- as.data.frame(head(x, n = options("max.print")))

This would ensure that if we have, for example, a 10 GB data.frame
(e.g. instead of a data.table) and accidentally print it, the R
session does not "collapse", forcing us to press ESC or kill the
R session.

function (x, ..., digits = NULL, quote = FALSE, right = TRUE,
          row.names = TRUE)
{
  n <- length(row.names(x))
  if (length(x) == 0L) {
    cat(sprintf(ngettext(n, "data frame with 0 columns and %d row",
                         "data frame with 0 columns and %d rows"), n), "\n",
        sep = "")
  }
  else if (n == 0L) {
    print.default(names(x), quote = FALSE)
    cat(gettext("<0 rows> (or 0-length row.names)\n"))
  }
  else {

    x <- as.data.frame(head(x, n = options("max.print")))

    m <- as.matrix(format.data.frame(x, digits = digits,
                                     na.encode = FALSE))
    if (!isTRUE(row.names))
      dimnames(m)[[1L]] <- if (isFALSE(row.names))
        rep.int("", n)
    else row.names
    print(m, ..., quote = quote, right = right)
  }
  invisible(x)
}

Thank you.

Best,
Juan

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: Code Optimization: print.data.frame + as.data.frame(head(x, n = options("max.print")))

Juan Telleria Ruiz de Aguirre
I polished the function a little more:

* Used:  getOption("max.print")
* Added a message at the end:  cat('[ reached getOption("max.print") --
omitted ', omitted,' rows ]')
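A brief illustration of why this switch matters (an editorial sketch, not part of the original message): options("max.print") returns a named list, whereas getOption("max.print") returns the bare value, and head() expects a single number for n. The data set mtcars is used only as a convenient example.

```r
# options() returns a named list of length 1, even for a single option
opt_list <- options("max.print")
is.list(opt_list)                 # TRUE
names(opt_list)                   # "max.print"

# getOption() returns the value itself
opt_value <- getOption("max.print")
identical(opt_list[["max.print"]], opt_value)   # TRUE

# The bare value can be passed directly as 'n'
res_value <- head(mtcars, n = opt_value)
nrow(res_value)                   # at most nrow(mtcars)
```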

function (x, ..., digits = NULL, quote = FALSE, right = TRUE,
          row.names = TRUE)
{
  n <- length(row.names(x))
  if (length(x) == 0L) {
    cat(sprintf(ngettext(n, "data frame with 0 columns and %d row",
                         "data frame with 0 columns and %d rows"), n), "\n",
        sep = "")
  }
  else if (n == 0L) {
    print.default(names(x), quote = FALSE)
    cat(gettext("<0 rows> (or 0-length row.names)\n"))
  }
  else {

    omitted <- nrow(x)-getOption("max.print")

    x <- as.data.frame(head(x, n = getOption("max.print")))

    m <- as.matrix(format.data.frame(x, digits = digits,
                                     na.encode = FALSE))
    if (!isTRUE(row.names))
      dimnames(m)[[1L]] <- if (isFALSE(row.names))
        rep.int("", n)
    else row.names
    print(m, ..., quote = quote, right = right)

    # 'x' has already been truncated above, so test the saved count instead
    if (omitted > 0) {

      cat('[ reached getOption("max.print") -- omitted ', omitted, ' rows ]\n')

    }

  }
  invisible(x)
}

Re: Code Optimization: print.data.frame + as.data.frame(head(x, n = options("max.print")))

Martin Maechler
>>>>> Juan Telleria Ruiz de Aguirre
>>>>>     on Tue, 31 Jul 2018 08:19:33 +0200 writes:

    > I polished a little bit more the function:
    > * Used:  getOption("max.print")
    > * Added comment at the end:  cat('[ reached getOption("max.print") --
    > omitted ', omitted,' rows ]')

and before

     > I would like to propose a simple optimization for print.data.frame
     > base function:
     >
     > To add: x <- as.data.frame(head(x, n = options("max.print")))
     >
     > This would prevent that, if for example, we have a 10GB data.frame
     > (e.g.: Instead of a data.table), and we accidentally print it, the R
     > Session does not "collapse", forcing us to press ESC or kill the
     > RSession.

Thank you, Juan.
You are right: The whole idea of introducing the 'max.print'
option (and the corresponding 'max' argument in print.default(),
and currently also in print.Date()) was that print()ing should not
use too many resources.

And you are also right to use 'max.print', but R should be as
functional a language as is sensible, and hence print(<data.frame>)
should get an argument 'max' which by default is equal to
the "max.print" option.
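For comparison, print.default() already accepts such a 'max' argument. The following is a small sketch of how it caps the output for one call without touching the global option; the variable names are illustrative only.

```r
v <- seq_len(100)

# Cap the output at 10 entries for this call only;
# the global "max.print" option is left as it was
out <- capture.output(res <- print(v, max = 10))

# Show the truncated output, which ends with a note about omitted entries
cat(out, sep = "\n")

# The object itself comes back unchanged
identical(res, v)
```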

Also, any good citizen print() method *must* return its argument invisibly.
==> you are not supposed to change 'x' here.
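A minimal sketch of that convention, assuming a hypothetical helper print_capped() that is not part of base R: it truncates only a local copy for display and returns the original object invisibly. For simplicity, 'max' here counts rows rather than entries.

```r
# Hypothetical helper, not part of base R: prints at most 'max' rows,
# leaves 'x' untouched and returns it invisibly.
print_capped <- function(x, ..., max = NULL) {
  if (is.null(max)) max <- getOption("max.print", 99999L)
  n <- nrow(x)
  # Only a local copy is truncated; 'x' itself is never reassigned
  shown <- if (n > max) x[seq_len(max), , drop = FALSE] else x
  print(shown, ...)
  if (n > max)
    cat(" [ omitted", n - max, "rows ]\n")
  invisible(x)   # the original, complete object
}

res <- withVisible(print_capped(mtcars, max = 3))
res$visible                     # FALSE: the value is returned invisibly
identical(res$value, mtcars)    # TRUE: 'x' was not modified
```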

But I entirely agree with your basic intuition for the problem
resolution.  Very good, thank you, indeed!

I'm currently running 'make check-all'  with the following change
to the source code (aka "patch") :

===================================================================
--- src/library/base/R/dataframe.R (revision 75016)
+++ src/library/base/R/dataframe.R (working copy)
@@ -1477,7 +1477,7 @@
 
 print.data.frame <-
     function(x, ..., digits = NULL, quote = FALSE, right = TRUE,
-     row.names = TRUE)
+     row.names = TRUE, max = NULL)
 {
     n <- length(row.names(x))
     if(length(x) == 0L) {
@@ -1489,12 +1489,19 @@
         print.default(names(x), quote = FALSE)
         cat(gettext("<0 rows> (or 0-length row.names)\n"))
     } else {
+        if(is.null(max)) max <- getOption("max.print", 99999L)
         ## format.<*>() : avoiding picking up e.g. format.AsIs
-        m <- as.matrix(format.data.frame(x, digits = digits, na.encode = FALSE))
+        omit <- (n0 <- max %/% length(x)) < n
+        m <- as.matrix(
+            format.data.frame(if(omit) x[seq_len(n0), , drop=FALSE] else x,
+                              digits = digits, na.encode = FALSE))
         if(!isTRUE(row.names))
             dimnames(m)[[1L]] <-
                 if(isFALSE(row.names)) rep.int("", n) else row.names
         print(m, ..., quote = quote, right = right)
+        if(omit)
+            cat(" [ reached 'max' / getOption(\"max.print\") -- omitted",
+                n - n0, "rows ]\n")
     }
     invisible(x)
 }
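
To make the arithmetic in the patch concrete: 'max' counts entries (cells), so the number of rows printed is max %/% length(x), where length(x) is the number of columns. The following sketch uses a made-up data frame, and max_entries stands in for the patch's 'max'.

```r
# Made-up example data: 50 rows, 3 columns
df <- data.frame(a = 1:50, b = rnorm(50), c = letters[rep(1:5, 10)])

max_entries <- 30L                  # stand-in for 'max' in the patch
n    <- nrow(df)                    # 50 rows
n0   <- max_entries %/% length(df)  # 30 %/% 3 columns = 10 rows
omit <- n0 < n                      # TRUE: some rows will be left out

# Subset only for display, as the patch does inside format.data.frame()
shown <- if (omit) df[seq_len(n0), , drop = FALSE] else df
dim(shown)                          # 10 rows, 3 columns
n - n0                              # 40 rows omitted
```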
