base::order breaking change in R-devel

Jan Gorecki
Hi R developers,
There seems to be a breaking change in base::order on Windows in
R-devel. The code below yields different results on R 4.0.0 and R-devel
(2020-05-22 r78545). I haven't found any information about this change in
NEWS. Was the change intentional?

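Sys.setlocale("LC_CTYPE","C")
Sys.setlocale("LC_COLLATE","C")
x1 = "fa\xE7ile"                    # byte 0xE7 is c-cedilla in latin1
Encoding(x1) = "latin1"             # declare the encoding of x1
x2 = iconv(x1, "latin1", "UTF-8")   # same text, converted to UTF-8
base::order(c(x2,x1,x1,x2))
Encoding(x2) = "unknown"            # drop the UTF-8 mark from x2
base::order(c(x2,x1,x1,x2))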

# R 4.0.0
base::order(c(x2,x1,x1,x2))
#[1] 1 4 2 3
Encoding(x2) = "unknown"
base::order(c(x2,x1,x1,x2))
#[1] 2 3 1 4

# R-devel
base::order(c(x2,x1,x1,x2))
#[1] 1 2 3 4
Encoding(x2) = "unknown"
base::order(c(x2,x1,x1,x2))
#[1] 1 4 2 3

Best Regards,
Jan Gorecki

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: base::order breaking change in R-devel

Tomas Kalibera
This can be narrowed down to:

Sys.setlocale("LC_CTYPE","C")
x2 <- "\u00e7"
x1 <- iconv(x2, from="UTF-8", to="latin1")
x1 < x2 # FALSE or NA

In R 4.0 it returns NA; in R-devel it returns FALSE (when running in a
CP1252 locale on Windows).
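
To see what the two strings in the reduced example actually contain, one
can look at their declared encodings and raw bytes. This is only a small
sketch; the helper names marks, bytes1 and bytes2 are made up for
illustration and are not part of the original example.

```r
# Same character, two encodings (x1 and x2 as in the reduced example).
x2 <- "\u00e7"                                  # c-cedilla, marked as UTF-8
x1 <- iconv(x2, from = "UTF-8", to = "latin1")  # same character, as latin1

# Declared encoding marks of the two strings.
marks <- Encoding(c(x1, x2))

# Underlying bytes: one byte in latin1, two bytes in UTF-8.
bytes1 <- charToRaw(x1)
bytes2 <- charToRaw(x2)

print(marks)
print(bytes1)
print(bytes2)
```

The bytes differ, but both strings carry an encoding mark, so R has
enough information to treat them as the same character.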

It is the same character; only the encoding differs, so the R-devel
return value is correct and the previous behavior was a bug. The current
native encoding should not matter when doing the comparison. Also, when
the encodings are known, the collation order should apply only after the
strings have been converted to a common encoding, so in this case the
collation order of the locale should have no impact, and it seems it
doesn't. I don't think R should preserve bug-compatibility in this case;
code depending on this buggy behavior should be fixed.
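
For code that wants a result that does not depend on how each element
happens to be encoded, one possible approach (a sketch, not necessarily
what R does internally) is to convert everything to a common encoding
explicitly before comparing or ordering. The names x, u, same_bytes and
ord are made up for illustration, and this assumes all elements carry a
correct encoding mark.

```r
x2 <- "fa\u00e7ile"                             # marked as UTF-8
x1 <- iconv(x2, from = "UTF-8", to = "latin1")  # same text, marked as latin1
x  <- c(x2, x1, x1, x2)

# Convert every element to UTF-8 before comparing or ordering.
# latin1-marked elements are translated; UTF-8-marked ones are kept as is.
u <- enc2utf8(x)

# After conversion the elements should be byte-for-byte identical.
same_bytes <- identical(charToRaw(u[1]), charToRaw(u[2]))

# All four elements now compare equal, so they tie when ordered.
ord <- order(u)

print(same_bytes)
print(ord)
```

This does not help for strings whose mark has been reset to "unknown",
as in the second half of the original example, because R then has to
assume they are in the native encoding.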

I don't immediately see which NEWS entry this corresponds to. Please
keep in mind that NEWS doesn't cover all changes; for that you need to
look at the svn commits. Even then, it may be hard to trace a concrete
change in behavior back to a commit; to do that you need to debug the
code or bisect.

Changes to _documented_ behavior should be more visible and, of course,
reflected by changes in the documentation. If they are not, that is a bug
worth reporting, and the report should reference the concrete parts of
the documentation that are violated.

Best
Tomas

On 5/23/20 12:03 PM, Jan Gorecki wrote:

> Hi R developers,
> There seems to be breaking change in base::order on Windows in
> R-devel. Code below yields different results on R 4.0.0 and R-devel
> (2020-05-22 r78545). I haven't found any info about that change in
> NEWS. Was the change intentional?
>
> Sys.setlocale("LC_CTYPE","C")
> Sys.setlocale("LC_COLLATE","C")
> x1 = "fa\xE7ile"
> Encoding(x1) = "latin1"
> x2 = iconv(x1, "latin1", "UTF-8")
> base::order(c(x2,x1,x1,x2))
> Encoding(x2) = "unknown"
> base::order(c(x2,x1,x1,x2))
>
> # R 4.0.0
> base::order(c(x2,x1,x1,x2))
> #[1] 1 4 2 3
> Encoding(x2) = "unknown"
> base::order(c(x2,x1,x1,x2))
> #[1] 2 3 1 4
>
> # R-devel
> base::order(c(x2,x1,x1,x2))
> #[1] 1 2 3 4
> Encoding(x2) = "unknown"
> base::order(c(x2,x1,x1,x2))
> #[1] 1 4 2 3
>
> Best Regards,
> Jan Gorecki
