round() and signif() do not check argument names when a single argument is given

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

round() and signif() do not check argument names when a single argument is given

Shane Mueller
Hi,

I was told to send this to the -devel list instead of posting to bugzilla.

When round our signif are called with a single named argument, R does not
check the name and runs the function with that named argument directly as
the first argument,  using 0.0 or 6.0 (6 in the case of signif) for the
second argument. Not checking the argument name is at odds with how all
other primitive functions are handled, and so I expect these to trigger an
error like other functions do.  I've tested in 4.0.0 code, but this
behavior might go back decades.

The cases are things like this:

#horse is not a legitimate argument
round(horse=3.1235)
[1] 3

#digits is a legitimate argument in round,
#but the value gets used without checking as the first argument.
round(digits=3.1135)
[1] 3

The second case is especially strange to me, because digits is a named
argument of the function.  No other functions behave this way as far as I
can tell; the first would normally either trigger a missing x value message
or an unused argument if it is a user function; the second would trigger a
warning about missing x.

Compare to:
>  log(base=3)
Error: argument "x" is missing, with no default

These two functions are handled in a  handler function in
src/main/arithmetic.c: do_Math2().  The strange cases get handled by the
code around line 1655 in src/main/arithmetic.c. The context is  n is the
number of arguments, but symbol names have not yet been checked:

if(n == 1) {
   double digits = 0.0;
   if(PRIMVAL(op) == 10004) digits = 6.0;
   SETCDR(args, CONS(ScalarReal(digits), R_NilValue));
}

Here, 10004 is the opcode symbol for signif and 10001 is for round, but
these are the only two ops handled by this function, so round uses
digits=0.0.  The SETCDR creates the argument list to be the current 1-item
list plus the new digits value set here, and then this gets passed to
do_math2.  do_math2 looks like it checks the arity of the arguments but not
the names, and calls fround or frpec on the arguments depending on the
opcode.

This case can be guarded against by adding a check for using the wrong
symbol name at this point, which only covers the case of calling these
functions with a single argument. This is essentially the same guard used
in other functions like do_log_bulitin  that is in the same file.

 if(n == 1 ){
static SEXP R_x_Symbol = NULL;
R_x_Symbol = install("x");
/*This handles single-argument calling*/
/*Make sure we don't call it with a mis-named argument:*/
if(CAR(args)==R_MissingArg ||
  TAG(args) != R_NilValue && TAG(args) != R_x_Symbol)
 error(_("argument \"%s\" is missing, with no default"), "x");

   double digits = 0.0;
   if(PRIMVAL(op) == 10004) digits = 6.0;
   SETCDR(args, CONS(ScalarReal(digits), R_NilValue));
}

I'm not sure if CAR(args)==R_MissingArg is doing anything here, and
removing it seems to work, but this pattern is used in do_log_bultin.

I've tested this against the cases copied below, and this change will now
cause the error I expect in cases 5 and 6  [like round(digits=5.532) and
round(horse=5.131)],  whereas current R 4.0.0 does not. This change
includes some _() text, but I'm not sure whether that means it will impact
internationalization since the strings are identical to other error
messages.

In the R code below, I put these in tryCatch so you can save the text and
do a source() of the file. Both R 4.0.0 and the changed code correctly
cause errors in case 7/8, to demonstrate that these cases are not impacted
by the change.

##This tests various round/signif argument configurations
## Each of these should succeed/fail  the same if used in signif

cat("\nCase 1: round(1.12345): " )
cat(round(1.12345),"\n")

cat("\nCase 2: round(x=1.12345,2): ")
cat(round(x=1.12345,2),"\n")

cat("\nCase 3: round(x=1.12345,digits=2): ")
cat(round(x=1.12345,digits=2),"\n")

cat("\nCase 4: round(digits=2,x=1.12345): ")
cat(round(digits=2,x=1.12345),"\n")

cat("\nCase 4b: round(digits=2,1.12345): ")
cat(round(digits=2,1.12345),"\n")

## Current R 4.0 does not produce error but should
cat("\nCase 5: round(digits=x): \n")
tryCatch(
cat("round(digits=99.23456): ",  round(digits=99.23456)),
error=function(cond){print(paste("[Error in case]",cond))})


## Current R 4.0 does not produce error but should
cat("\nCase 6: round(banana=x): \n")
tryCatch(
  cat("round(banana=99.23456): ", round(banana=99.23456))
,error=function(cond){print(paste("[Error in case]",cond))})

## Error case
##error:
cat("\nCase 7: round(x=1.12345, digits=2, banana=3): ")

tryCatch( { cat(round(x=1.12345, digits=2, banana=3),"\n")},
   error=function(cond){print(paste("[Error in case]",cond))})

##error case
cat("\nCase 8 : round(x=1.12345, banana=3):  ")
tryCatch( round(x=1.12345, banana=3),
   error=function(cond){print(paste("[Error in case]",cond))})

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: round() and signif() do not check argument names when a single argument is given

Shane Mueller
On Fri, May 22, 2020 at 9:55 PM David Winsemius <[hidden email]>
wrote:

> The premise in the first few lines of your preamble is at odds (in the
> logical sense) with my understanding of primitive function behavior. Try:
>
> data.frame(x=1:2,y=letters[1:2])[j=2, i=1]
>
> David
>

I had never seen naming indexes of the [] operator.  The documentation of
[] indicates that it does argument matching in a non-standard way,
recommends against doing it, and states the [.data.frame behavior used in
this example is 'undocumented'.  In the example above a warning is thrown
as well.

Here is the [] documentation:

Argument matching

> Note that these operations do not match their index arguments in the
> standard way: argument names are ignored and positional matching only is
> used. So m[j = 2, i = 1] is equivalent to m[2, 1] and not to m[1, 2].
>
> This may not be true for methods defined for them; for example it is not
> true for the data.frame methods described in [.data.frame which warn if i
> or j is named and have undocumented behaviour in that case.
>
> To avoid confusion, do not name index arguments (but drop and exact must
> be named).
>


For the data frames operator [], i and j appear to be named and used
arguments, as the following causes an unused argument error for k:
data.frame(x=1:2,y=letters[1:2])[j=2, k=1]

The analog for round() would be indexing with something like  [k=1,] alone,
which causes an unused argument error error for data frames, which is what
I'm suggesting round(banana=3.5) should do.   (note it works for matrix as
documented).

Best,
Shane

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: round() and signif() do not check argument names when a single argument is given

Martin Maechler
>>>>> Shane Mueller
>>>>>     on Sat, 23 May 2020 00:37:57 -0400 writes:

    > On Fri, May 22, 2020 at 9:55 PM David Winsemius <[hidden email]>
    > wrote:

    >> The premise in the first few lines of your preamble is at odds (in the
    >> logical sense) with my understanding of primitive function behavior. Try:
    >>
    >> data.frame(x=1:2,y=letters[1:2])[j=2, i=1]
    >>
    >> David
    >>

    > I had never seen naming indexes of the [] operator.  The documentation of
    > [] indicates that it does argument matching in a non-standard way,
    > recommends against doing it, and states the [.data.frame behavior used in
    > this example is 'undocumented'.  In the example above a warning is thrown
    > as well.

    > Here is the [] documentation:

    > Argument matching
    >> Note that these operations do not match their index arguments in the
    >> standard way: argument names are ignored and positional matching only is
    >> used. So m[j = 2, i = 1] is equivalent to m[2, 1] and not to m[1, 2].
    >>
    >> This may not be true for methods defined for them; for example it is not
    >> true for the data.frame methods described in [.data.frame which warn if i
    >> or j is named and have undocumented behaviour in that case.
    >>
    >> To avoid confusion, do not name index arguments (but drop and exact must
    >> be named).
    >>


    > For the data frames operator [], i and j appear to be named and used
    > arguments, as the following causes an unused argument error for k:
    > data.frame(x=1:2,y=letters[1:2])[j=2, k=1]

    > The analog for round() would be indexing with something like  [k=1,] alone,
    > which causes an unused argument error error for data frames, which is what
    > I'm suggesting round(banana=3.5) should do.   (note it works for matrix as
    > documented).

    > Best,
    > Shane

I agree with Shane.

I think this is a small "wart" that we should remove, ...
and I have been testing a version of your code addition,
and plan to commit that (once I've added regression tests etc).

Martin

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel