iconv: embedded nulls when converting to UTF-16

Braun, Michael
R-devel community:

I have encountered some unexpected behavior using iconv, which may be the source of errors I am getting when connecting to a UTF-16-encoded SQL Server database.  A simple example is below.

When researching this problem, I found r-devel reports of the same problem in threads from June 2010 and February 2016, and that bug #16738 was posted to Bugzilla as a result.  However, I have not been able to determine whether the error is mine, whether there is a known workaround, or whether it truly is a bug in R's iconv implementation.  Any additional help is appreciated.

Thanks,

Michael

——

sessionInfo()
#> R version 3.6.1 (2019-07-05)   ## also replicated with R 3.4.1 on a cluster running CentOS Linux 7
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS Mojave 10.14.6
# <snip>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base    

#> loaded via a namespace (and not attached):
#> [1] compiler_3.6.1

s <- "test"
iconv(s, to="UTF-8")
#> [1] "test"

iconv(s, to="UTF-16")
#> Error in iconv(s, to = "UTF-16"): embedded nul in string: '\xfe\xff\0t\0e\0s\0t'

iconv(s, to="UTF-16BE")
#> Error in iconv(s, to = "UTF-16BE"): embedded nul in string: '\0t\0e\0s\0t'

iconv(s, to="UTF-16LE")
#> Error in iconv(s, to = "UTF-16LE"): embedded nul in string: 't\0e\0s\0t\0'

--------------------------
Michael Braun, Ph.D.
Associate Professor of Marketing, and
  Corrigan Research Professor
Cox School of Business
Southern Methodist University
Dallas, TX 75275

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: iconv: embedded nulls when converting to UTF-16

Duncan Murdoch
On 03/08/2019 11:59 p.m., Braun, Michael wrote:
> R-devel community:
>
> I have encountered some unexpected behavior using iconv, which may be the source of errors I am getting when connecting to a UTF-16-encoded SQL Server database.  A simple example is below.
>
> When researching this problem, I found r-devel reports of the same problem in threads from June 2010 and February 2016, and that bug #16738 was posted to Bugzilla as a result.  However, I have not been able to determine whether the error is mine, whether there is a known workaround, or whether it truly is a bug in R's iconv implementation.  Any additional help is appreciated.

R does not allow embedded nul (zero) bytes in character strings.  UTF-16
encodes each ASCII character as two bytes, one of which is zero, so R
can't hold UTF-16 text in a character vector.
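
A short R sketch of both halves of that explanation: rawToChar() refuses to build a string that contains a zero byte, and encoding "test" as UTF-16LE with toRaw = TRUE yields bytes of which half are zero.  The variable names are illustrative only, and the exact wording of the captured error message may vary with R version and language settings.

```r
# R character strings cannot contain a nul (0x00) byte.
# rawToChar() builds a string from raw bytes, so it hits the same restriction.
bytes <- as.raw(c(0x74, 0x00, 0x65))   # "t", nul, "e": the nul is embedded
msg <- tryCatch(rawToChar(bytes), error = function(e) conditionMessage(e))
print(msg)

# In UTF-16, every ASCII character becomes two bytes, one of them zero.
# toRaw = TRUE returns a list of raw vectors instead of a character vector.
utf16 <- iconv("test", from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
print(utf16)

# Count the zero bytes that would have to be "embedded" in a string.
n_nul <- sum(as.integer(utf16) == 0L)
print(n_nul)
```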

If you are using iconv(), you can set toRaw = TRUE, and you'll get a
list of raw vectors containing the correct bytes.  For example,

 > s <- "test"
 > iconv(s, to = "UTF-16", toRaw = TRUE)
[[1]]
  [1] fe ff 00 74 00 65 00 73 00 74

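A sketch of a full round trip, assuming for illustration that the receiving side expects little-endian UTF-16 without a byte-order mark (substitute whatever byte order the consumer actually requires).  Naming the byte order explicitly ("UTF-16LE") avoids depending on which byte order and BOM a given iconv implementation chooses for plain "UTF-16".  Because iconv() also accepts a list of raw vectors as input, the bytes can be decoded back into an ordinary, nul-free UTF-8 string.

```r
s <- "test"

# Encode to UTF-16 as raw bytes. "UTF-16LE" is an illustrative assumption:
# an explicit byte order means no BOM is prepended.
raw_le <- iconv(s, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)
print(raw_le[[1]])
# [1] 74 00 65 00 73 00 74 00

# iconv() accepts a list of raw vectors (as returned with toRaw = TRUE),
# so the bytes decode back to a regular UTF-8 character string.
back <- iconv(raw_le, from = "UTF-16LE", to = "UTF-8")
print(back)
# [1] "test"
```

Keeping the data as raw bytes until the point where it is handed to the external system avoids ever asking R to store a nul inside a character string.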

I don't know if SQL Server can handle raw vectors; I'd try to get it to
accept UTF-8 input instead.

Duncan Murdoch
