Output mis-encoded on Windows w/ RGui 3.5.1 in strange case

Kevin Ushey
Given the following R script:

   x <- 1
   print(list())
   save(x, file = tempfile())
   output <- encodeString("apple")
   print(output)

If I source this script from RGui on Windows, I see the output:

   > source("encoding.R")
   list()
   [1] "\002ÿþapple\003ÿþ"

That is, it's as though R has injected what looks like byte order
marks around the encoded string:

   > charToRaw(output)
    [1] 02 ff fe 61 70 70 6c 65 03 ff fe

I see the same output in R-patched and R-devel. Any idea what
might be going on? I don't see the same issue with R as run from
the terminal.

Thanks,
Kevin

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: Output mis-encoded on Windows w/ RGui 3.5.1 in strange case

Tomas Kalibera
Hi Kevin,

the extra bytes you are seeing are escapes for UTF-8 strings used in
input to the RGui console. ASCII strings have recently started being
converted to UTF-8 as well, so you now get these escapes for ASCII
strings too. RGui understands these escapes and converts from UTF-8 to
wide characters before printing on Windows. The escapes should not be
used unless printing to the RGui console.
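
As a rough illustration of the markers described above (an editorial
sketch, not code taken from R itself): the byte values below are copied
from the charToRaw() output earlier in the thread, and the names utf8_in,
utf8_out and strip_rgui_escapes are hypothetical. The sketch works on raw
vectors so that the result does not depend on the session locale.

```r
# Marker bytes as observed in the charToRaw() output quoted in this thread.
utf8_in  <- as.raw(c(0x02, 0xff, 0xfe))   # seen before the string
utf8_out <- as.raw(c(0x03, 0xff, 0xfe))   # seen after the string

# Hypothetical helper: drop the markers when both are present,
# otherwise return the bytes unchanged.
strip_rgui_escapes <- function(bytes) {
  n <- length(bytes)
  wrapped <- n >= 6 &&
    identical(bytes[1:3], utf8_in) &&
    identical(bytes[(n - 2):n], utf8_out)
  if (wrapped) bytes[seq_len(n - 6) + 3] else bytes
}

# The leaked bytes reported for encodeString("apple").
leaked <- as.raw(c(0x02, 0xff, 0xfe,
                   0x61, 0x70, 0x70, 0x6c, 0x65,
                   0x03, 0xff, 0xfe))

clean <- strip_rgui_escapes(leaked)
print(rawToChar(clean))   # "apple"
```

This only shows how the leaked bytes are laid out; it is not a suggested
workaround, since the escapes should never reach user code in the first
place.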

I suppose you managed to leak the escapes, but I cannot reproduce the
problem. The example you sent seems incomplete ("x" is not used, it is
not clear what encoding.R is, and it is not clear where encodeString is
run), and none of the variations I ran leaked the escapes on R-devel.
Please clarify the example if you believe it is a bug. Please also use
current R-devel: I relatively recently fixed a bug in decoding these
escaped strings, and although it is unlikely, it is not impossible that
it is related.

Best
Tomas

Re: Output mis-encoded on Windows w/ RGui 3.5.1 in strange case

Kevin Ushey
Sorry, I should have been clearer: if I write the contents of
that script to a file called 'encoding.R' and source that, then I see
the reported behavior.

Here's something standalone that you should hopefully be able to copy
+ paste into RGui to reproduce:

code <- '
   x <- 1
   print(list())
   save(x, file = tempfile())
   output <- encodeString("apple")
   print(output)
'

file <- tempfile(fileext = ".R")
writeLines(code, con = file)
source(file)

When I run this, I see:

> code <- '
+    x <- 1
+    print(list())
+    save(x, file = tempfile())
+    output <- encodeString("apple")
+    print(output)
+ '
>
> file <- tempfile(fileext = ".R")
> writeLines(code, con = file)
> source(file)
list()
[1] "\002ÿþapple\003ÿþ"

This is with today's R-devel:

> sessionInfo()
R Under development (unstable) (2018-07-16 r74967)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.6.0

I realize the example looks incomplete, but it seems like each step is
required to reproduce the strange behavior:

   1) You need to print an empty list,
   2) You need to invoke save() after printing that empty list,
   3) Then, attempts to call encodeString() will produce the strange output.

For what it's worth, it may be related to a behavior I'm seeing where
the first name printed for an R list is quoted with backticks even
when not necessary:

> list(x = 1, y = 2)
$`x`
[1] 1

$y
[1] 2

Thanks,
Kevin

Re: Output mis-encoded on Windows w/ RGui 3.5.1 in strange case

Tomas Kalibera
Thanks, I can now reproduce it. It is a bug that is easy to fix, and I
will do so shortly.

FYI, it can be reproduced simply by running these two lines in RGui:

list()
encodeString("apple")
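
One possible way to check a given build for the leak without reading the
console output by eye is sketched below. This is an editorial
illustration, not part of the original post: the variable names are
hypothetical, and it assumes that comparing raw bytes is enough to detect
the markers. Outside RGui, or on a build with the fix, the comparison is
expected to report no leak.

```r
# Printing an empty list was the trigger reported in the thread.
print(list())

output   <- encodeString("apple")
expected <- charToRaw("apple")

# Any extra bytes around the plain ASCII text indicate leaked markers.
leaked <- !identical(charToRaw(output), expected)

if (leaked) {
  message("encodeString() output contains extra bytes: ",
          paste(format(charToRaw(output)), collapse = " "))
} else {
  message("encodeString() output is clean")
}
```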

Best
Tomas

Re: Output mis-encoded on Windows w/ RGui 3.5.1 in strange case

Tomas Kalibera
Fixed in R-devel and R-patched,
Tomas

Re: Output mis-encoded on Windows w/ RGui 3.5.1 in strange case

Kevin Ushey
Thank you for the quick fix! I could've sworn the 'save()' dance was a
necessary part of the reproducible example, but evidently not ...