
iconv documentation error

Therneau, Terry M., Ph.D.
This caught us yesterday when a string that we assumed to be in UTF-8 was actually in
CP1252. (It came from an internal web-based service, so the root cause is not R's
fault.) The help page for iconv states that the result of an invalid conversion is NA
only when the toRaw argument is TRUE, but this appears to be true in general: the
example below uses the default toRaw = FALSE and still returns NA.

Example:

test1 <- "Ménière's disease"        # the offending string (it was buried in a
                                    # 13000-character result string)
test2 <- iconv(test1, to="CP1252")  # create a version of the string in
                                    # Windows-1252 encoding
iconv(test2, from="UTF-8")          # reproduce our error
[1] NA

Note that Encoding(test2) returns "latin1", even though test2 was converted to CP1252;
this is also not quite in line with the help page.
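One possible workaround, sketched here under the assumption that any string failing a UTF-8 check is really Windows-1252 (as it was in this case), is to test each element with validUTF8() (available in base R since 3.3.0) and re-decode only the failing elements from CP1252. The helper name to_utf8_with_fallback is hypothetical and not part of base R; the test string is built from raw CP1252 bytes so the sketch does not depend on the session locale.

```r
# Hypothetical helper (not part of base R). Assumption: any element that is
# not valid UTF-8 is Windows-1252 (CP1252) text.
to_utf8_with_fallback <- function(x) {
  bad <- !validUTF8(x)   # checks the bytes themselves, ignoring declared encoding
  # Re-read only the failing elements as CP1252 and convert them to UTF-8
  x[bad] <- iconv(x[bad], from = "CP1252", to = "UTF-8")
  x
}

# "Ménière's disease" as CP1252 bytes: 0xE9 is e-acute, 0xE8 is e-grave
test2 <- "M\xe9ni\xe8re's disease"

iconv(test2, from = "UTF-8")         # NA, as in the report above
fixed <- to_utf8_with_fallback(test2)
print(fixed)
```

Guessing the source encoding is only safe when the data source is known. When it is not, iconv's sub argument (for example sub = "byte", which writes each non-convertible byte as "<xx>") keeps the offending bytes visible instead of silently turning the whole string into NA.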

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel