[PATCH 1/2] readtable: add hook for type conversions per column

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH 1/2] readtable: add hook for type conversions per column

Kurt Van Dijck
This commit adds a function parameter to readtable. The function is called
for every column.
The goal is to allow specific (non-standard) type conversions depending on the input.
When the parameter is not given, or the function returns NULL, the legacy default applies.
The colClasses parameter still takes precedence, i.e. the colConvertFn only applies to
the default conversions.
This allows to properly load a .csv with timestamps expressed in the (quite common) %d/%m/%y %H:%M format,
which was impossible since overruling as.POSIXlt makes a copy in the users namespace, and
read.table would still take the base version of as.POSIXlt.
Rather than fixing my specific requirement, this hook allows to probe for any custom format
and do smart things with little syntax.

Signed-off-by: Kurt Van Dijck <[hidden email]>
---
 src/library/utils/R/readtable.R | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/library/utils/R/readtable.R b/src/library/utils/R/readtable.R
index 238542e..076a707 100644
--- a/src/library/utils/R/readtable.R
+++ b/src/library/utils/R/readtable.R
@@ -65,6 +65,7 @@ function(file, header = FALSE, sep = "", quote = "\"'", dec = ".",
          strip.white = FALSE, blank.lines.skip = TRUE,
          comment.char = "#", allowEscapes = FALSE, flush = FALSE,
          stringsAsFactors = default.stringsAsFactors(),
+         colConvert = NULL,
          fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
 {
     if (missing(file) && !missing(text)) {
@@ -226,9 +227,18 @@ function(file, header = FALSE, sep = "", quote = "\"'", dec = ".",
     if(rlabp) do[1L] <- FALSE # don't convert "row.names"
     for (i in (1L:cols)[do]) {
         data[[i]] <-
-            if (is.na(colClasses[i]))
+            if (is.na(colClasses[i])) {
+                tmp <- NULL
+                if (!is.null(colConvert))
+                    # attempt to convert from user provided hook
+                    tmp <- colConvert(data[[i]])
+                if (!is.null(tmp))
+                    (tmp)
+                else
+                    # fallback, default
                 type.convert(data[[i]], as.is = as.is[i], dec=dec,
      numerals=numerals, na.strings = character(0L))
+            }
         ## as na.strings have already been converted to <NA>
             else if (colClasses[i] == "factor") as.factor(data[[i]])
             else if (colClasses[i] == "Date") as.Date(data[[i]])
--
1.8.5.rc3

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

[PATCH 2/2] readtable: add test for type conversion hook 'colConvert'

Kurt Van Dijck
Signed-off-by: Kurt Van Dijck <[hidden email]>
---
 tests/reg-tests-2.R         | 21 +++++++++++++++++++++
 tests/reg-tests-2.Rout.save | 27 +++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/tests/reg-tests-2.R b/tests/reg-tests-2.R
index 9fd5242..5026fe7 100644
--- a/tests/reg-tests-2.R
+++ b/tests/reg-tests-2.R
@@ -1329,6 +1329,27 @@ unlink(foo)
 ## added in 2.0.0
 
 
+## colConvert in read.table
+probecol <- function(col) {
+ tmp <- as.POSIXlt(col, optional=TRUE, tryFormats=c("%d/%m/%Y %H:%M"));
+ if (all(!is.na(tmp)))
+ return (tmp)
+ tmp <- as.POSIXlt(col, optional=TRUE, tryFormats=c("%d/%m/%Y"));
+ if (all(!is.na(tmp)))
+ return (tmp)
+}
+
+Mat <- matrix(c(1:3, letters[1:3], 1:3, LETTERS[1:3],
+                c("22/4/1969", "8/4/1971", "23/9/1973"),
+                c("22/4/1969 6:01", " 8/4/1971 7:23", "23/9/1973 8:45")),
+              3, 6)
+foo <- tempfile()
+write.table(Mat, foo, sep = ",", col.names = FALSE, row.names = FALSE)
+read.table(foo, sep = ",", colConvert=probecol)
+unlist(sapply(.Last.value, class))
+unlink(foo)
+
+
 ## write.table with complex columns (PR#7260, in part)
 write.table(data.frame(x = 0.5+1:4, y = 1:4 + 1.5i), file = "")
 # printed all as complex in 2.0.0.
diff --git a/tests/reg-tests-2.Rout.save b/tests/reg-tests-2.Rout.save
index 598dd71..668898e 100644
--- a/tests/reg-tests-2.Rout.save
+++ b/tests/reg-tests-2.Rout.save
@@ -4206,6 +4206,33 @@ Warning message:
 > ## added in 2.0.0
 >
 >
+> ## colConvert in read.table
+> probecol <- function(col) {
++ tmp <- as.POSIXlt(col, optional=TRUE, tryFormats=c("%d/%m/%Y %H:%M"));
++ if (all(!is.na(tmp)))
++ return (tmp)
++ tmp <- as.POSIXlt(col, optional=TRUE, tryFormats=c("%d/%m/%Y"));
++ if (all(!is.na(tmp)))
++ return (tmp)
++ }
+>
+> Mat <- matrix(c(1:3, letters[1:3], 1:3, LETTERS[1:3],
++                 c("22/4/1969", "8/4/1971", "23/9/1973"),
++                 c("22/4/1969 6:01", " 8/4/1971 7:23", "23/9/1973 8:45")),
++               3, 6)
+> foo <- tempfile()
+> write.table(Mat, foo, sep = ",", col.names = FALSE, row.names = FALSE)
+> read.table(foo, sep = ",", colConvert=probecol)
+  V1 V2 V3 V4         V5                  V6
+1  1  a  1  A 1969-04-22 1969-04-22 06:01:00
+2  2  b  2  B 1971-04-08 1971-04-08 07:23:00
+3  3  c  3  C 1973-09-23 1973-09-23 08:45:00
+> unlist(sapply(.Last.value, class))
+       V1        V2        V3        V4       V51       V52       V61       V62
+"integer"  "factor" "integer"  "factor" "POSIXlt"  "POSIXt" "POSIXlt"  "POSIXt"
+> unlink(foo)
+>
+>
 > ## write.table with complex columns (PR#7260, in part)
 > write.table(data.frame(x = 0.5+1:4, y = 1:4 + 1.5i), file = "")
 "x" "y"
--
1.8.5.rc3

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/2] readtable: add hook for type conversions per column

Kurt Van Dijck
In reply to this post by Kurt Van Dijck
Hello,

I want to find out if this patch is ok or not, and if not, what should
change.

Kind regards,
Kurt

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/2] readtable: add hook for type conversions per column

R devel mailing list
Please file a bug on bugzilla so we can discuss this further.

On Tue, Mar 26, 2019 at 11:53 AM Kurt Van Dijck <
[hidden email]> wrote:

> Hello,
>
> I want to find out if this patch is ok or not, and if not, what should
> change.
>
> Kind regards,
> Kurt
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/2] readtable: add hook for type conversions per column

Kurt Van Dijck
On di, 26 mrt 2019 12:48:12 -0700, Michael Lawrence wrote:
>    Please file a bug on bugzilla so we can discuss this further.

All fine.
I didn't find a way to create an account on bugs.r-project.org.
Did I just not see it? or do I need administrator assistance?

Kind regards,
Kurt

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/2] readtable: add hook for type conversions per column

bbolker

  You need admin assistance, someone will probably see your request here
and fulfill it.

  It might be helpful to read this question/answer on StackOverflow
discussing the context of proposing patches to base R functionality ...

https://stackoverflow.com/questions/8065835/proposing-feature-requests-to-the-r-core-team

  cheers
    Ben Bolker


On 2019-03-26 4:20 p.m., Kurt Van Dijck wrote:

> On di, 26 mrt 2019 12:48:12 -0700, Michael Lawrence wrote:
>>    Please file a bug on bugzilla so we can discuss this further.
>
> All fine.
> I didn't find a way to create an account on bugs.r-project.org.
> Did I just not see it? or do I need administrator assistance?
>
> Kind regards,
> Kurt
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/2] readtable: add hook for type conversions per column

Martin Maechler
In reply to this post by Kurt Van Dijck
>>>>> Kurt Van Dijck
>>>>>     on Tue, 26 Mar 2019 21:20:07 +0100 writes:

    > On di, 26 mrt 2019 12:48:12 -0700, Michael Lawrence wrote:
    >> Please file a bug on bugzilla so we can discuss this
    >> further.

    > All fine.  I didn't find a way to create an account on
    > bugs.r-project.org.  Did I just not see it? or do I need
    > administrator assistance?

    > Kind regards, Kurt

--> https://www.r-project.org/bugs.html

Yes, there's some effort involved - for logistic reasons,
but I now find it's a also good thing that you have to read and
understand and then even e-talk to a human in the process.

Martin

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] readtable enhancement

Kurt Van Dijck
In reply to this post by bbolker
Thank you for your answers.
I rather do not file a new bug, since what I coded isn't really a bug.

The problem I (my colleagues) have today is very stupid:
We read .csv files with a lot of columns, of which most contain
date-time stamps, coded in DD/MM/YYYY HH:MM.
This is not exotic, but the base library's readtable (and derivatives)
only accept date-times in a limited number of possible formats (which I
understand very well).

We could specify a format in a rather complicated format, for each
column individually, but this syntax is rather difficult to maintain.

My solution to this specific problem became trivial, yet generic
extension to read.table.
Rather than relying on the built-in type detection, I added a parameter
to a function that will be called for each to-be-type-probed column so I
can overrule the built-in limited default.
If nothing returns from the function, the built-in default is still
used.

This way, I could construct a type-probing function that is
straight-forward, not hard to code, and makes reading my .csv files
acceptible in terms of code (read.table parameters).

I'm sure I'm not the only one dealing with such needs, escpecially
date-time formats exist in enormous amounts, but I want to stress here
that my approach is agnostic to my specific problem.

For those asking to 'show me the code', I redirect to my 2nd patch,
where the tests have been extended with my specific problem.

What are your opinions about this?

Kind regards,
Kurt

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] readtable enhancement

R devel mailing list
This has some nice properties:

1) It self-documents the input expectations in a similar manner to
colClasses.
2) The implementation could eventually "push down" the coercion, e.g.,
calling it on each chunk of an iterative read operation.

The implementation needs work though, and I'm not convinced that coercion
failures should fallback gracefully to the default.

Feature requests fall under a "bug" in bugzilla terminology, so please
submit this there. I think I've made you an account.

Thanks,
Michael

On Wed, Mar 27, 2019 at 1:19 PM Kurt Van Dijck <
[hidden email]> wrote:

> Thank you for your answers.
> I rather do not file a new bug, since what I coded isn't really a bug.
>
> The problem I (my colleagues) have today is very stupid:
> We read .csv files with a lot of columns, of which most contain
> date-time stamps, coded in DD/MM/YYYY HH:MM.
> This is not exotic, but the base library's readtable (and derivatives)
> only accept date-times in a limited number of possible formats (which I
> understand very well).
>
> We could specify a format in a rather complicated format, for each
> column individually, but this syntax is rather difficult to maintain.
>
> My solution to this specific problem became trivial, yet generic
> extension to read.table.
> Rather than relying on the built-in type detection, I added a parameter
> to a function that will be called for each to-be-type-probed column so I
> can overrule the built-in limited default.
> If nothing returns from the function, the built-in default is still
> used.
>
> This way, I could construct a type-probing function that is
> straight-forward, not hard to code, and makes reading my .csv files
> acceptible in terms of code (read.table parameters).
>
> I'm sure I'm not the only one dealing with such needs, escpecially
> date-time formats exist in enormous amounts, but I want to stress here
> that my approach is agnostic to my specific problem.
>
> For those asking to 'show me the code', I redirect to my 2nd patch,
> where the tests have been extended with my specific problem.
>
> What are your opinions about this?
>
> Kind regards,
> Kurt
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] readtable enhancement

bbolker
   Just to clarify/amplify: on the bug tracking system there's a
drop-down menu to specify severity, and "enhancement" is one of the
choices, so you don't have to worry that you're misrepresenting your
patch as fixing a bug.

  The fact that an R-core member (Michael Lawrence) thinks this is
worth looking at is very encouraging (and somewhat unusual for
feature/enhancement suggestions)!

  Ben Bolker

On Wed, Mar 27, 2019 at 5:29 PM Michael Lawrence via R-devel
<[hidden email]> wrote:

>
> This has some nice properties:
>
> 1) It self-documents the input expectations in a similar manner to
> colClasses.
> 2) The implementation could eventually "push down" the coercion, e.g.,
> calling it on each chunk of an iterative read operation.
>
> The implementation needs work though, and I'm not convinced that coercion
> failures should fallback gracefully to the default.
>
> Feature requests fall under a "bug" in bugzilla terminology, so please
> submit this there. I think I've made you an account.
>
> Thanks,
> Michael
>
> On Wed, Mar 27, 2019 at 1:19 PM Kurt Van Dijck <
> [hidden email]> wrote:
>
> > Thank you for your answers.
> > I rather do not file a new bug, since what I coded isn't really a bug.
> >
> > The problem I (my colleagues) have today is very stupid:
> > We read .csv files with a lot of columns, of which most contain
> > date-time stamps, coded in DD/MM/YYYY HH:MM.
> > This is not exotic, but the base library's readtable (and derivatives)
> > only accept date-times in a limited number of possible formats (which I
> > understand very well).
> >
> > We could specify a format in a rather complicated format, for each
> > column individually, but this syntax is rather difficult to maintain.
> >
> > My solution to this specific problem became trivial, yet generic
> > extension to read.table.
> > Rather than relying on the built-in type detection, I added a parameter
> > to a function that will be called for each to-be-type-probed column so I
> > can overrule the built-in limited default.
> > If nothing returns from the function, the built-in default is still
> > used.
> >
> > This way, I could construct a type-probing function that is
> > straight-forward, not hard to code, and makes reading my .csv files
> > acceptible in terms of code (read.table parameters).
> >
> > I'm sure I'm not the only one dealing with such needs, escpecially
> > date-time formats exist in enormous amounts, but I want to stress here
> > that my approach is agnostic to my specific problem.
> >
> > For those asking to 'show me the code', I redirect to my 2nd patch,
> > where the tests have been extended with my specific problem.
> >
> > What are your opinions about this?
> >
> > Kind regards,
> > Kurt
> >
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] readtable enhancement

Kurt Van Dijck
In reply to this post by R devel mailing list
Hey,

In the meantime, I submitted a bug. Thanks for the assistence on that.

>    and I'm not convinced that
>    coercion failures should fallback gracefully to the default.

the gracefull fallback:
- makes the code more complex
+ keeps colConvert implementations limited
+ requires the user to only implement what changed from the default
+ seemed to me to smallest overall effort

In my opinion, gracefull fallback makes the thing better,
but without it, the colConvert parameter remains usefull, it would still
fill a gap.

>    The implementation needs work though,

Other than to remove the gracefull fallback?

Kind regards,
Kurt

On wo, 27 mrt 2019 14:28:25 -0700, Michael Lawrence wrote:

>    This has some nice properties:
>    1) It self-documents the input expectations in a similar manner to
>    colClasses.
>    2) The implementation could eventually "push down" the coercion, e.g.,
>    calling it on each chunk of an iterative read operation.
>    The implementation needs work though, and I'm not convinced that
>    coercion failures should fallback gracefully to the default.
>    Feature requests fall under a "bug" in bugzilla terminology, so please
>    submit this there. I think I've made you an account.
>    Thanks,
>    Michael
>
>    On Wed, Mar 27, 2019 at 1:19 PM Kurt Van Dijck
>    <[1][hidden email]> wrote:
>
>      Thank you for your answers.
>      I rather do not file a new bug, since what I coded isn't really a
>      bug.
>      The problem I (my colleagues) have today is very stupid:
>      We read .csv files with a lot of columns, of which most contain
>      date-time stamps, coded in DD/MM/YYYY HH:MM.
>      This is not exotic, but the base library's readtable (and
>      derivatives)
>      only accept date-times in a limited number of possible formats
>      (which I
>      understand very well).
>      We could specify a format in a rather complicated format, for each
>      column individually, but this syntax is rather difficult to
>      maintain.
>      My solution to this specific problem became trivial, yet generic
>      extension to read.table.
>      Rather than relying on the built-in type detection, I added a
>      parameter
>      to a function that will be called for each to-be-type-probed column
>      so I
>      can overrule the built-in limited default.
>      If nothing returns from the function, the built-in default is still
>      used.
>      This way, I could construct a type-probing function that is
>      straight-forward, not hard to code, and makes reading my .csv files
>      acceptible in terms of code (read.table parameters).
>      I'm sure I'm not the only one dealing with such needs, escpecially
>      date-time formats exist in enormous amounts, but I want to stress
>      here
>      that my approach is agnostic to my specific problem.
>      For those asking to 'show me the code', I redirect to my 2nd patch,
>      where the tests have been extended with my specific problem.
>      What are your opinions about this?
>      Kind regards,
>      Kurt

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] readtable enhancement

Gabriel Becker-2
Kurt,

Cool idea and great "seeing new faces" on here proposing things on here and
engaging with R-core on here.

Some comments on the issue of fallbacks below.


On Wed, Mar 27, 2019 at 10:33 PM Kurt Van Dijck <
[hidden email]> wrote:

> Hey,
>
> In the meantime, I submitted a bug. Thanks for the assistence on that.
>
> >    and I'm not convinced that
> >    coercion failures should fallback gracefully to the default.
>
> the gracefull fallback:
> - makes the code more complex
> + keeps colConvert implementations limited
> + requires the user to only implement what changed from the default
> + seemed to me to smallest overall effort
>
> In my opinion, gracefull fallback makes the thing better,
> but without it, the colConvert parameter remains usefull, it would still
> fill a gap.
>

Another way of viewing coercion failure, I think, is that either the
user-supplied converter has a bug in it or was mistakenly applied in a
situation where it shouldn't have been. If thats the case the fail early
and loud paradigm might ultimately be more helpful to users there.

Another thought in the same vein is that if fallback occurs, the returned
result will not be what the user asked for and is expecting. So either
their code which assumes (e.g., that a column has correctly parsed as a
date) is going to break in mysterious (to them) ways, or they have to put a
bunch of their own checking logic after the call to see if their converters
actually worked in order to protect themselves from that.  Neither really
seems ideal to me; I think an error would be better, myself. I'm more of a
software developer than a script writer/analyst though, so its possible
others' opinions would differ (though I'd be a bit surprised by that in
this particular case given the danger).

Best,
~G

> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] readtable enhancement

Kurt Van Dijck
On wo, 27 mrt 2019 22:55:06 -0700, Gabriel Becker wrote:

>    Kurt,
>    Cool idea and great "seeing new faces" on here proposing things on here
>    and engaging with R-core on here.
>    Some comments on the issue of fallbacks below.
>    On Wed, Mar 27, 2019 at 10:33 PM Kurt Van Dijck
>    <[1][hidden email]> wrote:
>
>      Hey,
>      In the meantime, I submitted a bug. Thanks for the assistence on
>      that.
>      >    and I'm not convinced that
>      >    coercion failures should fallback gracefully to the default.
>      the gracefull fallback:
>      - makes the code more complex
>      + keeps colConvert implementations limited
>      + requires the user to only implement what changed from the default
>      + seemed to me to smallest overall effort
>      In my opinion, gracefull fallback makes the thing better,
>      but without it, the colConvert parameter remains usefull, it would
>      still
>      fill a gap.
>
>    Another way of viewing coercion failure, I think, is that either the
>    user-supplied converter has a bug in it or was mistakenly applied in a
>    situation where it shouldn't have been. If thats the case the fail
>    early and loud paradigm might ultimately be more helpful to users
>    there.
>    Another thought in the same vein is that if fallback occurs, the
>    returned result will not be what the user asked for and is expecting.
>    So either their code which assumes (e.g., that a column has correctly
>    parsed as a date) is going to break in mysterious (to them) ways, or
>    they have to put a bunch of their own checking logic after the call to
>    see if their converters actually worked in order to protect themselves
>    from that.  Neither really seems ideal to me; I think an error would be
>    better, myself. I'm more of a software developer than a script
>    writer/analyst though, so its possible others' opinions would differ
>    (though I'd be a bit surprised by that in this particular case given
>    the danger).

I see.
So if we provide a default colConvert, named e.g. colConvertBuiltin,
which is used if colConvert is not given?
1) This respects the 'fail early and loud'.
2) The user would get what he asks for
3) A colConvert implementation would be able to call colConvertBuiltin
manually if desired, so have colConvert limited to adding on top of the
default implementation.

If this is acceptable, I'll prepare a new patch.

Kind regards,
Kurt

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [RFC] readtable enhancement

R devel mailing list
Gabe described my main concern. Specifying a coercion function asserts that
the data (1) were what was expected and (2) were converted into what was
expected. Allowing a coercer to delegate to a "next method" is a good idea,
but keep in mind that any function that did that would not confer the
beneficial constraints mentioned above.

We can continue the discussion at:
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17546

Michael

On Thu, Mar 28, 2019 at 1:35 AM Kurt Van Dijck <
[hidden email]> wrote:

> On wo, 27 mrt 2019 22:55:06 -0700, Gabriel Becker wrote:
> >    Kurt,
> >    Cool idea and great "seeing new faces" on here proposing things on
> here
> >    and engaging with R-core on here.
> >    Some comments on the issue of fallbacks below.
> >    On Wed, Mar 27, 2019 at 10:33 PM Kurt Van Dijck
> >    <[1][hidden email]> wrote:
> >
> >      Hey,
> >      In the meantime, I submitted a bug. Thanks for the assistence on
> >      that.
> >      >    and I'm not convinced that
> >      >    coercion failures should fallback gracefully to the default.
> >      the gracefull fallback:
> >      - makes the code more complex
> >      + keeps colConvert implementations limited
> >      + requires the user to only implement what changed from the default
> >      + seemed to me to smallest overall effort
> >      In my opinion, gracefull fallback makes the thing better,
> >      but without it, the colConvert parameter remains usefull, it would
> >      still
> >      fill a gap.
> >
> >    Another way of viewing coercion failure, I think, is that either the
> >    user-supplied converter has a bug in it or was mistakenly applied in a
> >    situation where it shouldn't have been. If thats the case the fail
> >    early and loud paradigm might ultimately be more helpful to users
> >    there.
> >    Another thought in the same vein is that if fallback occurs, the
> >    returned result will not be what the user asked for and is expecting.
> >    So either their code which assumes (e.g., that a column has correctly
> >    parsed as a date) is going to break in mysterious (to them) ways, or
> >    they have to put a bunch of their own checking logic after the call to
> >    see if their converters actually worked in order to protect themselves
> >    from that.  Neither really seems ideal to me; I think an error would
> be
> >    better, myself. I'm more of a software developer than a script
> >    writer/analyst though, so its possible others' opinions would differ
> >    (though I'd be a bit surprised by that in this particular case given
> >    the danger).
>
> I see.
> So if we provide a default colConvert, named e.g. colConvertBuiltin,
> which is used if colConvert is not given?
> 1) This respects the 'fail early and loud'.
> 2) The user would get what he asks for
> 3) A colConvert implementation would be able to call colConvertBuiltin
> manually if desired, so have colConvert limited to adding on top of the
> default implementation.
>
> If this is acceptable, I'll prepare a new patch.
>
> Kind regards,
> Kurt
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel