encodeString converts to UTF-8 on Windows R-devel

encodeString converts to UTF-8 on Windows R-devel

Gábor Csárdi
Which is not necessarily bad news. :)

I wonder if this was intended, because I did not find anything about
it in the news file. It also breaks a couple of R packages, e.g. desc,
probably more.

Is this intended?

Thanks!

This is R-devel from yesterday:

> x <- "\xfc"
> Encoding(x) <- "latin1"
> charToRaw(encodeString(x))
[1] c3 bc
>
> l10n_info()
$MBCS
[1] FALSE

$`UTF-8`
[1] FALSE

$`Latin-1`
[1] TRUE

$codepage
[1] 1252

$system.codepage
[1] 1252

and this is R-4.0.4:

> x <- "\xfc"
> Encoding(x) <- "latin1"
> charToRaw(encodeString(x))
[1] fc
>
> l10n_info()
$MBCS
[1] FALSE

$`UTF-8`
[1] FALSE

$`Latin-1`
[1] TRUE

$codepage
[1] 1252

$system.codepage
[1] 1252
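
For anyone who wants to compare builds without reading raw bytes by eye, the check above can be wrapped in a small function. This is only a sketch: `classify_encoded_bytes` is a hypothetical helper name, not part of R, and it assumes that comparing the output of `encodeString()` against the Latin-1 and UTF-8 byte sequences of the same character is enough to tell the two behaviours apart.

```r
# Hypothetical helper (not part of R): report which byte sequence
# encodeString() returns for a latin1-marked string.
classify_encoded_bytes <- function(x) {
  out <- charToRaw(encodeString(x))
  # charToRaw() returns the stored bytes without any conversion.
  latin1_bytes <- charToRaw(x)
  # iconv() is used only to obtain the UTF-8 bytes for comparison.
  utf8_bytes <- charToRaw(iconv(x, "latin1", "UTF-8"))
  if (identical(out, latin1_bytes)) {
    "latin1"
  } else if (identical(out, utf8_bytes)) {
    "UTF-8"
  } else {
    "other"
  }
}

x <- "\xfc"
Encoding(x) <- "latin1"
classify_encoded_bytes(x)
```

Going by the two transcripts above, on Windows with code page 1252 this should give "UTF-8" on the affected R-devel build and "latin1" on R-4.0.4. On other platforms the result depends on the native encoding.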

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: encodeString converts to UTF-8 on Windows R-devel

Gábor Csárdi
For the record, this was a bug introduced here:
https://github.com/wch/r-source/commit/1c149eddee9c6d4b87a987a964a611bf8fe43a74
and fixed today here:
https://github.com/wch/r-source/commit/ec0761e63598d38eb5e8ab3fb995da06ab5c91ee

G.

On Fri, Mar 5, 2021 at 2:52 PM Gábor Csárdi <[hidden email]> wrote:

>
> Which is not necessarily bad news. :)
>
> I wonder if this was intended, because I did not find anything about
> it in the news file. It also breaks a couple of R packages, e.g. desc,
> probably more.
>
> Is this intended?
>
> Thanks!
>
> This is R-devel from yesterday:
>
> > x <- "\xfc"
> > Encoding(x) <- "latin1"
> > charToRaw(encodeString(x))
> [1] c3 bc
> >
> > l10n_info()
> $MBCS
> [1] FALSE
>
> $`UTF-8`
> [1] FALSE
>
> $`Latin-1`
> [1] TRUE
>
> $codepage
> [1] 1252
>
> $system.codepage
> [1] 1252
>
> and this is R-4.0.4:
>
> > x <- "\xfc"
> > Encoding(x) <- "latin1"
> > charToRaw(encodeString(x))
> [1] fc
> >
> > l10n_info()
> $MBCS
> [1] FALSE
>
> $`UTF-8`
> [1] FALSE
>
> $`Latin-1`
> [1] TRUE
>
> $codepage
> [1] 1252
>
> $system.codepage
> [1] 1252
