nchar(x, type = "bytes") seems slower than it could be

Hugh Parsonage
While profiling some C code, I rolled my own nchar function which
appears to be much faster than base R's (25 times faster for a 10M
length vector).  Obviously base::nchar provides significantly more
features than my barebones function (C snippet below); however, for
argument type = "bytes" it seems that the R_nchar and do_nchar
functions do not actually do anything more than this function.

My suspicion is that I have overlooked some subtlety in the base R
code, or that my benchmarks are not representative.  Alternatively,
preparing the potential error message in `do_nchar` before it is
passed to `R_nchar` may be quite costly.  Or the compiler cannot
unswitch the loop from the more complex type = "width" and
type = "chars" cases.
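
To make the unswitching point concrete, here is a rough, self-contained sketch in plain C. It is not R's source: `count_mode`, `utf8_chars`, `count_switched` and `count_unswitched` are hypothetical names, and the "chars" branch is a simplified UTF-8 counter that assumes valid input. The first function tests the loop-invariant mode for every element; the second hoists that test out of the loop by hand, which is what an optimizer would have to do on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical modes, standing in for type = "bytes" and type = "chars". */
typedef enum { COUNT_BYTES, COUNT_CHARS } count_mode;

/* Counts UTF-8 code points by skipping continuation bytes (10xxxxxx).
   Assumes s is valid UTF-8. */
static int utf8_chars(const char *s)
{
    int n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char) *s & 0xC0) != 0x80) ++n;
    }
    return n;
}

/* Switched: the loop-invariant mode is tested again for every element. */
void count_switched(const char *const *s, size_t n, count_mode mode, int *out)
{
    assert(s != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        switch (mode) {
        case COUNT_BYTES: out[i] = (int) strlen(s[i]); break;
        case COUNT_CHARS: out[i] = utf8_chars(s[i]);   break;
        }
    }
}

/* Unswitched by hand: the mode is tested once, and each loop body
   contains only the work for that mode. */
void count_unswitched(const char *const *s, size_t n, count_mode mode, int *out)
{
    assert(s != NULL && out != NULL);
    if (mode == COUNT_BYTES) {
        for (size_t i = 0; i < n; ++i) out[i] = (int) strlen(s[i]);
    } else {
        for (size_t i = 0; i < n; ++i) out[i] = utf8_chars(s[i]);
    }
}
```

Both functions return the same counts; whether the unswitched form is measurably faster depends on the compiler and on how much other work the loop body does.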

If I haven't missed something, would a patch be warranted?

#include <R.h>
#include <Rinternals.h>

SEXP Cnchar(SEXP x) {
  if (!isString(x)) {
    error("x must be a character vector.");
  }
  R_xlen_t N = xlength(x);
  SEXP ans = PROTECT(allocVector(INTSXP, N));
  int * restrict ansp = INTEGER(ans);

  // Ignoring NA to avoid the branch has a very small
  // impact on performance.
  for (R_xlen_t i = 0; i < N; ++i) {
    SEXP sxi = STRING_ELT(x, i);
    if (sxi == NA_STRING) {
      ansp[i] = NA_INTEGER;
      continue;
    }
    // For a CHARSXP, length() is the number of bytes.
    ansp[i] = length(sxi);
  }
  UNPROTECT(1);
  return ans;
}

x <- rep_len(c(as.character(c(5L, 1:1e6)), NA_character_, 1e6:15e5), 1e7)
Cnchar(x)                 # 90 ms
nchar(x, type = "bytes")  # 2500 ms

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: nchar(x, type = "bytes") seems slower than it could be

Tomas Kalibera
Thanks for the report, you are probably running into the overhead of the
eager creation of the error message. On my system, with your
micro-benchmark, it accounts for about a 10x slowdown. I've tested simply
by commenting it out and re-running the benchmark. I'll fix (this is not
a good task for a contributed patch).
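
For readers following along, a minimal stand-alone sketch of eager versus lazy message creation, in plain C. This is not the code in R and not the eventual fix: `format_msg`, `lengths_eager`, `lengths_lazy` and the `format_calls` counter are hypothetical, and the sketch assumes the message is formatted once per element and only ever read on the failure path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Counts how often a message is formatted, so the variants can be compared. */
static size_t format_calls = 0;

static void format_msg(char *buf, size_t size, size_t i)
{
    snprintf(buf, size, "element %zu", i + 1);
    ++format_calls;
}

/* Eager: the message is built for every element, although it is only
   needed when an element is invalid (NULL here). Returns the failure count. */
size_t lengths_eager(const char *const *s, size_t n, int *out)
{
    char msg[64];
    size_t failures = 0;
    assert(s != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        format_msg(msg, sizeof msg, i);
        if (s[i] == NULL) {
            fprintf(stderr, "invalid input: %s\n", msg);
            out[i] = -1;
            ++failures;
        } else {
            out[i] = (int) strlen(s[i]);
        }
    }
    return failures;
}

/* Lazy: the message is built only on the failure path. */
size_t lengths_lazy(const char *const *s, size_t n, int *out)
{
    char msg[64];
    size_t failures = 0;
    assert(s != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        if (s[i] == NULL) {
            format_msg(msg, sizeof msg, i);
            fprintf(stderr, "invalid input: %s\n", msg);
            out[i] = -1;
            ++failures;
        } else {
            out[i] = (int) strlen(s[i]);
        }
    }
    return failures;
}
```

With short strings the per-element work is tiny, so a formatted-print call on every iteration can plausibly dominate the loop; the lazy variant pays that cost only for the elements that actually fail.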

Best,
Tomas

On 3/30/21 8:02 AM, Hugh Parsonage wrote:

> [...]

Re: nchar(x, type = "bytes") seems slower than it could be

Tomas Kalibera
For reference, fixed in R-devel (80153).
Tomas

On 3/30/21 10:20 AM, Tomas Kalibera wrote:

> [...]
