Possible issue with coercion in sprintf()?

Evan Cortens
Dear R-Devel folks,

I've just run into what initially struck me as a rather strange result, as
follows:

> sprintf('%d', c(1.0, NA))
[1] "1"  "NA"

> sprintf('%d', c(NA, 1.0))
Error in sprintf("%d", c(NA, 1)) :
  invalid format '%d'; use format %f, %e, %g or %a for numeric objects

So if I pass sprintf() a vector of reals and attempt to format them as
integers, it'll work if the first element of the vector is finite and equal
to itself coerced to an integer. In other words, for a vector x,
as.numeric(as.integer(x[1])) == x[1]. (In C this is actually written as
R_FINITE(r) && (double)((int) r) == r, where r is the first element of the
vector.) But it won't work if the first element is NA. (Of course, it also
won't work if the first element is, say, 1.1 rather than 1.0, for obvious
reasons.)
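
The first-element test described above can be mimicked in R to predict
which calls will succeed. This is only a sketch: first_is_whole() is a
hypothetical helper, not part of R, and it approximates the C check rather
than reproducing it (the integer-range guard is my own addition).

```r
# Hypothetical helper: approximates the C test
#   R_FINITE(r) && (double)((int) r) == r
# applied to the first element of x only.
first_is_whole <- function(x) {
  r <- x[1]
  # is.finite() is FALSE for NA, NaN, Inf and -Inf, so && short-circuits
  is.finite(r) && abs(r) <= .Machine$integer.max && r == trunc(r)
}

first_is_whole(c(1.0, NA))   # TRUE:  sprintf('%d', ...) coerces and works
first_is_whole(c(NA, 1.0))   # FALSE: sprintf('%d', ...) raises the error
first_is_whole(c(1.1, 2.0))  # FALSE
```

Only x[1] is ever inspected, which is why reordering the same two values
changes the outcome.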

The reason for this is clear: in sprintf.c, the coercion check is applied
only to the first item in the vector (in the latest R-devel, this is line
275, if(ns == 0)).

As far as I can see, the help file for sprintf() doesn't explicitly mention
this behaviour, though it does imply that you shouldn't rely on coercion
always working, and it's better to pass the right kind of arguments/use the
right format strings. The behaviour is specifically mentioned in a comment
in the source though: "Now let us see if some minimal coercion would be
sensible, but only do so once, for ns = 0:", so it's clear this isn't some
kind of oversight or accident.

My question, basically, is whether sprintf(), in trying to be helpful by
coercing reals to integers, might actually be making things more confusing.
Another solution here would be to never coerce: if you pass a vector of
reals with a format of "%d", it simply gives you an error telling you to
"use format %f, %e, %g or %a for numeric objects". Yet another possibility
is to give NAs special treatment, checking the first non-NA element in the
vector rather than the first element.
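
Whatever sprintf() itself does, a caller can avoid depending on the first
element by coercing explicitly before formatting. A minimal sketch, assuming
that NA should print as "NA" and that anything non-whole should be an error;
sprintf_int() is a hypothetical name, not an existing R function.

```r
# Hypothetical wrapper (not part of R): format whole-valued doubles with an
# integer format such as "%d", regardless of where the NAs sit.
sprintf_int <- function(fmt, x) {
  # An element is acceptable if it is NA, or finite, within integer range,
  # and equal to its own truncation.
  whole <- is.na(x) |
    (is.finite(x) & abs(x) <= .Machine$integer.max & x == trunc(x))
  if (!all(whole)) {
    stop("x contains values that are not whole numbers in integer range")
  }
  # Explicit coercion: sprintf() now receives an integer vector.
  sprintf(fmt, as.integer(x))
}

sprintf_int("%d", c(NA, 1.0))  # "NA" "1"
sprintf_int("%d", c(1.0, NA))  # "1"  "NA"
```

Unlike the built-in coercion, this checks every element, so the result does
not depend on element order.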

Or perhaps I'm totally off base here and this has been thoroughly discussed
in the past.

All best,

Evan

--
Evan Cortens, PhD
Institutional Analyst - Office of Institutional Analysis
Mount Royal University
403-440-6529

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel