How to print UTF-8 encoded strings from a C routine to R's output?

Lixin Gong
Dear R experts,

According to
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Printing,
it seems that Rprintf must be used to print from a C routine in order to
guarantee that the text is written to R's output.

However, if a string is UTF-8 encoded, non-ASCII characters (e.g., the
infinity symbol, http://www.fileformat.info/info/unicode/char/221e/index.htm)
are misprinted.
Is this unsupported, or is there a workaround for this limitation?

Thanks!

Michael

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: How to print UTF-8 encoded strings from a C routine to R's output?

Duncan Murdoch-2
On 05/09/2016 12:40 AM, Lixin Gong wrote:

> Dear R experts,
>
> According to
> https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Printing,
> it seems that Rprintf must be used to print from a C routine in order to
> guarantee that the text is written to R's output.
>
> However, if a string is UTF-8 encoded, non-ASCII characters (e.g., the
> infinity symbol, http://www.fileformat.info/info/unicode/char/221e/index.htm)
> are misprinted.
> Is this unsupported, or is there a workaround for this limitation?

If you are working in a UTF-8 locale (as on most Unix-like systems), you
should be fine.  If not (as is normal on Windows), you'll need to
translate the string to the local encoding.  The Writing R Extensions
manual section 6.11 tells you how to do the re-encoding.

Duncan Murdoch
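
The re-encoding step suggested above can be sketched in C. This is a minimal illustration, not code from the thread: it uses the POSIX iconv interface as a stand-in so that it compiles outside R, and the helper name utf8_to_encoding is hypothetical. Inside a package one would presumably use the Riconv_open / Riconv / Riconv_close wrappers from R_ext/Riconv.h that the manual describes, and pass the converted buffer to Rprintf.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated UTF-8 string `in` to the
 * encoding named `to`, writing a NUL-terminated result into `out`
 * (capacity `outcap` bytes).
 * Returns 0 on success, or -1 if the converter cannot be opened, the input
 * is malformed or not representable in `to`, or `out` is too small. */
int utf8_to_encoding(const char *in, const char *to, char *out, size_t outcap)
{
    assert(in != NULL && to != NULL && out != NULL && outcap > 0);

    iconv_t cd = iconv_open(to, "UTF-8");   /* argument order: (tocode, fromcode) */
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;                 /* iconv takes non-const pointers */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outcap - 1;            /* reserve one byte for the NUL */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);  /* flush any shift state */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;
    *outp = '\0';
    return 0;
}
```

A caller would check the return value before printing, since a character with no counterpart in the target encoding makes the conversion fail or be substituted, depending on the iconv implementation.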

Re: How to print UTF-8 encoded strings from a C routine to R's output?

Lixin Gong
Hi Duncan,

Thanks a lot for your quick reply pointing out the Re-encoding section that
I missed!

Before trying out R's C-level interface to iconv's encoding conversion
capabilities, I did some quick tests with Encoding() and iconv() on Windows,
in both Rgui and Rterm.
After marking the string with Encoding(), non-ASCII characters print
correctly in Rgui but still incorrectly in Rterm.
After iconv(), non-ASCII characters are misprinted in both Rgui and Rterm.

Here is the code that I used:

(neg_inf_utf8_hex <- as.raw(c(0x2d, 0xe2, 0x88, 0x9e)))
(neg_inf_utf8 <- rawToChar(neg_inf_utf8_hex))
Encoding(neg_inf_utf8)

Encoding(neg_inf_utf8) <- "UTF-8"
Encoding(neg_inf_utf8)
neg_inf_utf8

charToRaw(neg_inf_utf8)
iconv(neg_inf_utf8, from = "UTF-8", to = "", toRaw = FALSE)
iconv(neg_inf_utf8, from = "UTF-8", to = "", toRaw = TRUE)

Here is what I got with Rgui:

> (neg_inf_utf8_hex <- as.raw(c(0x2d, 0xe2, 0x88, 0x9e)))
[1] 2d e2 88 9e
> (neg_inf_utf8 <- rawToChar(neg_inf_utf8_hex))
[1] "-∞"
> Encoding(neg_inf_utf8)
[1] "unknown"
>
> Encoding(neg_inf_utf8) <- "UTF-8"
> Encoding(neg_inf_utf8)
[1] "UTF-8"
> neg_inf_utf8
[1] "-∞"
>
> charToRaw(neg_inf_utf8)
[1] 2d e2 88 9e
> iconv(neg_inf_utf8, from = "UTF-8", to = "", toRaw = FALSE)
[1] "-8"
> iconv(neg_inf_utf8, from = "UTF-8", to = "", toRaw = TRUE)
[[1]]
[1] 2d 38
>

Here is what I got with Rterm:

> (neg_inf_utf8_hex <- as.raw(c(0x2d, 0xe2, 0x88, 0x9e)))
[1] 2d e2 88 9e
> (neg_inf_utf8 <- rawToChar(neg_inf_utf8_hex))
[1] "-â^z"
> Encoding(neg_inf_utf8)
[1] "unknown"
>
> Encoding(neg_inf_utf8) <- "UTF-8"
> Encoding(neg_inf_utf8)
[1] "UTF-8"
> neg_inf_utf8
[1] "-8"
>
> charToRaw(neg_inf_utf8)
[1] 2d e2 88 9e
> iconv(neg_inf_utf8, from = "UTF-8", to = "", toRaw = FALSE)
[1] "-8"
> iconv(neg_inf_utf8, from = "UTF-8", to = "", toRaw = TRUE)
[[1]]
[1] 2d 38
>
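
To make the raw bytes in these transcripts easier to read: 2d e2 88 9e is the UTF-8 encoding of a hyphen-minus (U+002D) followed by the infinity symbol (U+221E). The sketch below decodes such a byte sequence into code points. The function name utf8_decode is hypothetical, and the decoder is deliberately minimal (it does not reject overlong forms or surrogates).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode `n` bytes of UTF-8 from `s` into code points
 * stored in `out` (capacity `outcap`). Returns the number of code points
 * written, or -1 on malformed input or insufficient capacity. */
int utf8_decode(const unsigned char *s, size_t n, uint32_t *out, size_t outcap)
{
    assert(s != NULL && out != NULL);

    size_t i = 0, count = 0;
    while (i < n) {
        uint32_t cp;
        size_t extra;                        /* continuation bytes expected */
        unsigned char b = s[i++];

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return -1;                      /* stray continuation or invalid lead byte */

        if (extra > n - i)
            return -1;                       /* sequence truncated */
        while (extra-- > 0) {
            unsigned char c = s[i++];
            if ((c & 0xC0) != 0x80)
                return -1;                   /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }

        if (count == outcap)
            return -1;                       /* output buffer full */
        out[count++] = cp;
    }
    return (int)count;
}
```

Applied to the four bytes above, this yields the two code points U+002D and U+221E, so the bytes themselves are intact in both Rgui and Rterm; only their display and conversion differ.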

Here is the sessionInfo:

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 14393)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
>
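
The locale reported above uses code page 1252. The sketch below checks whether a code point has a byte in the commonly published Windows-1252 table; the function name cp1252_from_codepoint is hypothetical. U+221E has no entry there, so no conversion to this native encoding can preserve the infinity symbol. One plausible reading of the "-8" result (bytes 2d 38), which the thread does not confirm, is that the converter substituted a look-alike digit for the unrepresentable character.

```c
#include <assert.h>
#include <stdint.h>

/* Code points assigned to bytes 0x80..0x9F in Windows-1252.
 * 0 marks the five bytes that the published table leaves unassigned. */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* Hypothetical helper: return the Windows-1252 byte for code point `cp`,
 * or -1 if the table has no byte for it. */
int cp1252_from_codepoint(uint32_t cp)
{
    assert(cp <= 0x10FFFF);                  /* must be a valid Unicode code point */

    /* ASCII and the Latin-1 range 0xA0..0xFF map to the same byte value. */
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF))
        return (int)cp;

    /* Everything else must appear in the 0x80..0x9F block. */
    for (int i = 0; i < 32; i++)
        if (cp1252_high[i] != 0 && cp1252_high[i] == cp)
            return 0x80 + i;

    return -1;
}
```

For example, the euro sign U+20AC maps to byte 0x80, while U+221E returns -1.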

Am I missing something obvious?  Thanks a lot for your help and your time!

Michael
