encoding argument of source() in 3.5.0

Stephen Berman
In R 3.5.0 using the `encoding' argument of source() prevents loading
files from the internet; without the `encoding' argument files can be
loaded from the internet, but if they contain non-ascii characters,
these are not correctly displayed under MS-Windows (but they are
correctly displayed under GNU/Linux).  With R 3.4.{2,3,4} there is no
such problem: using `encoding' the files are loaded and non-ascii
characters are correctly displayed under MS-Windows (but not without
`encoding').  Here is a transcript from R 3.5.0 under GNU/Linux (the
URLs are real, in case anyone wants to try and reproduce the problem):

> ls()
character(0)
> source("http://home.versanet.de/~s-berman/source1.R", encoding="UTF-8")
> ls()
character(0)
> source("http://home.versanet.de/~s-berman/source2.R", encoding="UTF-8")
> ls()
character(0)
> source("http://home.versanet.de/~s-berman/source1.R")
> ls()
[1] "source.test1"
> source("http://home.versanet.de/~s-berman/source2.R")
> ls()
[1] "source.test1" "source.test2"
> source.test1()
[1] "This is a test."
> source.test2()
[1] "Non-ascii: äöüß"

(The four non-ascii characters are Unicode 0xE4, 0xF6, 0xFC, 0xDF.)
With 3.5.0 under MS-Windows, the transcript is the same except for the
display of the last output, which is this:

[1] "Non-ascii: äöüß"

(Here there are eight non-ascii characters: each of the four characters
above is encoded as two bytes in UTF-8, and each of those bytes is
displayed as a separate character in a single-byte encoding.)
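
A minimal sketch in R of how four characters can turn into eight. Each of the four characters occupies two bytes in UTF-8, so decoding those bytes with a single-byte encoding gives one character per byte. Re-marking the string as latin1 below is an assumption made only to imitate the garbled display; it is not a claim about what R 3.5.0 does internally on MS-Windows. The variable names are illustrative and no network access is needed.

```r
# The four characters from source2.R, written with \u escapes so that
# this script itself is pure ASCII.
x <- "\u00e4\u00f6\u00fc\u00df"

code_points <- utf8ToInt(x)   # Unicode code points 0xE4 0xF6 0xFC 0xDF
utf8_bytes  <- charToRaw(x)   # the UTF-8 encoding: two bytes per character

# Imitate the mis-decoding: keep the same bytes but declare them latin1,
# then convert to UTF-8 so that the characters can be counted.
misread <- x
Encoding(misread) <- "latin1"
misread_utf8 <- enc2utf8(misread)

n_intended <- nchar(x, type = "chars")             # 4
n_garbled  <- nchar(misread_utf8, type = "chars")  # 8
```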

Here is a transcript from R 3.4.3 under MS-Windows (under GNU/Linux it's
the same except that the non-ascii characters are also correctly
displayed even without the `encoding' argument):

> ls()
character(0)
> source("http://home.versanet.de/~s-berman/source1.R")
> ls()
[1] "source.test1"
> source("http://home.versanet.de/~s-berman/source2.R")
> ls()
[1] "source.test1" "source.test2"
> source.test1()
[1] "This is a test."
> source.test2()
[1] "Non-ascii: äöüß"
> rm(source.test2)
> ls()
[1] "source.test1"
> source("http://home.versanet.de/~s-berman/source2.R", encoding="UTF-8")
> ls()
[1] "source.test1" "source.test2"
> source.test2()
[1] "Non-ascii: äöüß"

I did a web search but didn't find any reports of this issue, nor did I
see any relevant entry in the 3.5.0 NEWS, so this looks like a bug, but
maybe I've overlooked something.  I'd be grateful for any enlightenment.

Steve Berman

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: encoding argument of source() in 3.5.0

Peter Dalgaard-2
Looks like this actually comes from readLines(), nothing to do with source() as such:

In current R-devel (still):

> f <- file("http://home.versanet.de/~s-berman/source2.R", encoding="UTF-8")
> readLines(f)
character(0)
> close(f)
> f <- file("http://home.versanet.de/~s-berman/source2.R")
> readLines(f)
[1] "source.test2 <- function() {"   "    print(\"Non-ascii: äöüß\")"
[3] "}"                            

-pd

> On 2 Jun 2018, at 15:37 , Stephen Berman <[hidden email]> wrote:
> [...]

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]

Re: encoding argument of source() in 3.5.0

Martin Maechler
>>>>> peter dalgaard
>>>>>     on Sun, 3 Jun 2018 23:51:24 +0200 writes:

    > Looks like this actually comes from readLines(), nothing
    > to do with source() as such: In current R-devel (still):

    >> f <- file("http://home.versanet.de/~s-berman/source2.R", encoding="UTF-8")
    >> readLines(f)
    > character(0)
    >> close(f)
    >> f <- file("http://home.versanet.de/~s-berman/source2.R")
    >> readLines(f)
    > [1] "source.test2 <- function() {"   "    print(\"Non-ascii: äöüß\")"
    > [3] "}"                            

    > -pd

and that's not even readLines(), but rather how exactly the
connection is defined [even in your example above]

  > urlR <- "http://home.versanet.de/~s-berman/source2.R"
  > readLines(urlR, encoding="UTF-8")
  [1] "source.test2 <- function() {"   "    print(\"Non-ascii: äöüß\")"
  [3] "}"                            
  > f <- file(urlR, encoding = "UTF-8")
  > readLines(f)
  character(0)

and the same behavior with scan()  instead of readLines() :

  > scan(urlR, "") # works
  Read 7 items
  [1] "source.test2"       "<-"                 "function()"         "{"
  [5] "print(\"Non-ascii:" "äöüß\")"            "}"
  > scan(f, "") # fails
  Read 0 items
  character(0)

So it seems as if the bug is in the file() [or url()] C code ..
But then we also have to consider Windows .. where I think most changes have
happened during the  R-3.4.4 --> R-3.5.0  transition.
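
The two places where `encoding' can be supplied do different things, which may help when reading the transcripts above. The following is a sketch under stated assumptions: a temporary local file stands in for the URL (so it does not reproduce the bug, which was only seen with remote URLs), and the names `path`, `src`, `marked` and `reencoded` are made up for illustration.

```r
# Write a small UTF-8 file. The non-ASCII text is built from \u escapes
# so that this script itself stays pure ASCII.
path <- tempfile(fileext = ".R")
src <- c("source.test2 <- function() {",
         "    print(\"Non-ascii: \u00e4\u00f6\u00fc\u00df\")",
         "}")
writeLines(enc2utf8(src), path, useBytes = TRUE)

# (1) encoding as an argument of readLines(): the bytes are read
#     unchanged and non-ASCII strings are only *marked* as UTF-8.
marked <- readLines(path, encoding = "UTF-8")
Encoding(marked)   # "unknown" "UTF-8" "unknown"

# (2) encoding as an argument of file(): the connection itself
#     re-encodes its input from UTF-8 to the native encoding.
con <- file(path, encoding = "UTF-8")
reencoded <- readLines(con)
close(con)
```

Variant (2) is the one that returned character(0) for the remote URL in the transcripts above; source() with `encoding' appears to take that route as well.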



Re: encoding argument of source() in 3.5.0

Stephen Berman
On Mon, 4 Jun 2018 10:44:11 +0200 Martin Maechler <[hidden email]> wrote:

> [...]
>
> So it seems as if the bug is in the file() [or url()] C code ..

Yes, the problem seems to be restricted to loading files from a
(non-local) URL; i.e. this works fine on my computer:

  > source("file:///home/steve/prog/R/source2.R", encoding="UTF-8")

Also, I noticed this works too:

  > read.table("http://home.versanet.de/~s-berman/table2", encoding="UTF-8", skip=1)

where (if I read the source correctly) using `skip=1' makes read.table()
call readLines().  (The read.table() invocation also works without
`skip'.)

> But then we also have to consider Windows .. where I think most changes have
> happened during the  R-3.4.4 --> R-3.5.0  transition.

Yes, please.  I need (or at least it would be convenient) to be able to
load R code containing non-ascii characters from the web under
MS-Windows.

Steve Berman
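
Since readLines() with `encoding' was reported above to work on the URL even where source() with `encoding' did not, one possible stopgap is to read the lines first and evaluate them afterwards. This is only a sketch: `source_lines` is a hypothetical helper, not part of R, and it skips everything else source() does (echoing, srcrefs, chdir and so on). It is shown here with a local temporary file so that it can be run offline.

```r
# Hypothetical helper: read a script as lines marked UTF-8, then parse
# and evaluate them in the given environment.
source_lines <- function(path_or_url, envir = globalenv()) {
  lines <- readLines(path_or_url, encoding = "UTF-8", warn = FALSE)
  exprs <- parse(text = lines, encoding = "UTF-8", keep.source = FALSE)
  eval(exprs, envir = envir)
  invisible(envir)
}

# Offline demonstration with a temporary copy of source1.R's content.
path <- tempfile(fileext = ".R")
writeLines(c("source.test1 <- function() {",
             "    print(\"This is a test.\")",
             "}"), path)

env <- new.env()
source_lines(path, envir = env)
ls(env)
```

Passing one of the URLs from the first message instead of `path` would be the intended use, though I have not verified that on an affected build.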

Re: encoding argument of source() in 3.5.0

Peter Dalgaard-2
In reply to this post by Martin Maechler
It's not Windows-specific, though. My example was on a Mac...

I hope we can sort this out before 3.5.1.

-pd

> On 4 Jun 2018, at 10:44 , Martin Maechler <[hidden email]> wrote:
>
> So it seems as if the bug is in the file() [or url()] C code ..
> But then we also have to consider Windows .. where I think most changes have
> happened during the  R-3.4.4 --> R-3.5.0  transition.

Re: encoding argument of source() in 3.5.0

NELSON, Michael
In reply to this post by Stephen Berman


On R 3.5.0 (Mac)

The issue appears when using the default (libcurl) method and specifying the encoding.

Note that using method='internal' in conjunction with encoding causes a segfault (it works when encoding is not set).

urlR <- "http://home.versanet.de/~s-berman/source2.R"
# works
url_default <- url(urlR)
scan(url_default, "")
# Read 7 items
# [1] "source.test2"       "<-"                 "function()"         "{"                  "print(\"Non-ascii:" "äöüß\")"          
# [7] "}"                

url_default_en <- url(urlR, encoding = "UTF-8")
scan(url_default_en, "")
# Read 0 items
# character(0)
url_internal <- url(urlR, method = 'internal')
scan(url_internal, "")
# Read 7 items
# [1] "source.test2"       "<-"                 "function()"         "{"                  "print(\"Non-ascii:" "äöüß\")"          
# [7] "}"                

url_internal_en <- url(urlR, encoding = "UTF-8", method = 'internal')
#scan(url_internal_en, "")
#*** caught segfault ***
#  address 0x0, cause 'memory not mapped'

url_libcurl <- url(urlR, method = 'libcurl')
scan(url_libcurl, "")
# Read 7 items
# [1] "source.test2"       "<-"                 "function()"         "{"                  "print(\"Non-ascii:" "äöüß\")"          
# [7] "}"
url_libcurl_en <- url(urlR, encoding = "UTF-8", method = 'libcurl')
scan(url_libcurl_en, "")
# Read 0 items
# character(0)
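
The comparisons above leave their connections open. A small harness along the following lines could run each variant and close the connection afterwards. It is a sketch only: `count_items` is a made-up name, and a local ASCII file stands in for the URL so that it runs offline and does not trigger the bug.

```r
# Create a connection with the supplied constructor, count the
# whitespace-separated items scan() reads, and always close it again.
count_items <- function(make_con) {
  con <- make_con()
  on.exit(close(con), add = TRUE)
  length(scan(con, what = "", quiet = TRUE))
}

# Offline stand-in for the URL: a small ASCII file on disk.
path <- tempfile(fileext = ".R")
writeLines(c("alpha beta", "gamma"), path)

results <- c(
  plain    = count_items(function() file(path)),
  encoding = count_items(function() file(path, encoding = "UTF-8"))
)
results
```

With the URL one would pass, for example, `function() url(urlR, encoding = "UTF-8")`. The method='internal' plus encoding case should be left out of such a harness, because a segfault cannot be caught from R code.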


Michael

Re: encoding argument of source() in 3.5.0

Tomas Kalibera
Thanks for the report, fixed in R-devel (74848).

Best
Tomas

On 06/04/2018 02:41 PM, NELSON, Michael wrote:

> [...]

Re: encoding argument of source() in 3.5.0

Stephen Berman
On Tue, 5 Jun 2018 16:03:54 +0200 Tomas Kalibera <[hidden email]> wrote:

> Thanks for the report, fixed in R-devel (74848).
>
> Best
> Tomas

FTR, I confirm that the problem I reported is now fixed under both
GNU/Linux and MS-Windows.  Thanks!

Steve Berman

Re: encoding argument of source() in 3.5.0

Tomas Kalibera
Thanks, the fix is now in R-patched and will be included in 3.5.1.
Tomas
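
For scripts that have to run on several R versions, a version check can decide whether any workaround is needed. The sketch below assumes that 3.5.0 is the only affected release, which is what this thread suggests (3.4.x worked and the fix is scheduled for 3.5.1); R-devel snapshots from before the fix are not covered. `needs_workaround` is a hypothetical name.

```r
# TRUE only for the release reported as affected in this thread.
# Assumption: no other released version shows the problem.
needs_workaround <- function(version = getRversion()) {
  version == "3.5.0"
}

needs_workaround(package_version("3.5.0"))  # TRUE
needs_workaround(package_version("3.5.1"))  # FALSE
```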

On 06/06/2018 09:54 PM, Stephen Berman wrote:

> [...]
