suggestion to fix packageDescription() for Windows users

Ben Marwick
Recently I was trying to cite a package where the authors have ä
and ø in their names. I found that on Windows the citation() function
did not return the authors' names at all, but on Linux there was no
problem (sessionInfos at the bottom):

On Windows, no author names are returned:

#---------------

 > citation("readr")

To cite package ‘readr’ in publications use:

   (2017). readr: Read Rectangular Text Data. R package version 1.1.1.
   https://CRAN.R-project.org/package=readr

A BibTeX entry for LaTeX users is

   @Manual{,
     title = {readr: Read Rectangular Text Data},
     year = {2017},
     note = {R package version 1.1.1},
     url = {https://CRAN.R-project.org/package=readr},
   }

ATTENTION: This citation information has been auto-generated from the
package DESCRIPTION file and may need manual editing, see
‘help("citation")’.
#---------------

On Linux we do see the author names:

#---------------
 > citation("readr")

To cite package ‘readr’ in publications use:

   Hadley Wickham, Jim Hester and Romain Francois (2017). readr:
   Read Rectangular Text Data. R package version 1.1.1.
   https://CRAN.R-project.org/package=readr

A BibTeX entry for LaTeX users is

   @Manual{,
     title = {readr: Read Rectangular Text Data},
     author = {Hadley Wickham and Jim Hester and Romain Francois},
     year = {2017},
     note = {R package version 1.1.1},
     url = {https://CRAN.R-project.org/package=readr},
   }
#---------------

This appears to be an OS-dependent encoding issue. The citation function
does not take an encoding argument, so it's not possible to set the
encoding at the point where that function is used. The citation function
works with the packageDescription function, which does have an
encoding argument, but the default is not useful for Windows when there
is an encoding set in the DESCRIPTION of the package (in this case UTF-8).
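
As a small illustration of that encoding argument (a sketch only, not part of the original report): packageDescription() accepts encoding = NA to skip re-encoding, or any target name that iconv() understands, and re-encoding is only attempted when the DESCRIPTION declares an Encoding field. The helper name read_field is made up for this example, and "utils" stands in for readr simply because it is always installed.

```r
# Hypothetical helper: read one DESCRIPTION field with an explicit target
# encoding. "" means the encoding of the current locale, NA means no
# re-encoding, and other values are handed to iconv().
read_field <- function(pkg, field, encoding = "") {
  utils::packageDescription(pkg, fields = field, encoding = encoding)
}

# With a single field (and the default drop = TRUE) the value comes back
# as a plain character string.
read_field("utils", "Package", encoding = NA)     # no re-encoding attempted
read_field("utils", "Title", encoding = "UTF-8")  # ask for UTF-8 output
```

On the affected Windows setup one would presumably call read_field("readr", "Author", encoding = "UTF-8"), but that has not been verified here.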

We can set the encoding argument in packageDescription so it works in
Windows to give the authors as expected, but it is very inconvenient to
generate citations directly from the output of this function. So I'd
like to propose a solution to this problem by changing one line in the
packageDescription function, like so, from:

#---------------
if (missing(encoding) && Sys.getlocale("LC_CTYPE") == "C")
#---------------

to:

#---------------
if ((missing(encoding) && Sys.getlocale("LC_CTYPE") == "C") |
unname(Sys.info()['sysname']) == "Windows")
#---------------

If I understand correctly, that will force ASCII//TRANSLIT encoding when
DESCRIPTION files are read by packageDescription() on Windows machines.
The upside is that Windows users will get the authors in the package
citation, unlike the current situation. The downside is that the exotic
symbols in the authors' names are replaced with common ones that are
similar.
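
To make the transliteration trade-off concrete, here is a minimal sketch using iconv() directly with the same "ASCII//TRANSLIT" target. The name in the example is made up, the helper name to_ascii is hypothetical, and the exact replacement characters depend on the platform's iconv implementation, so no particular output is claimed.

```r
# Transliterate UTF-8 text to plain ASCII. The substitutions are chosen by
# the platform's iconv implementation, so results can differ between systems.
to_ascii <- function(x) iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")

# Made-up name, written with \u escapes so the script itself stays ASCII.
example_name <- "J\u00f8rgen M\u00e4kinen"
to_ascii(example_name)

# Pure ASCII input passes through unchanged.
to_ascii("Hadley Wickham")
```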

I think getting the citations to easily include the authors' names is
pretty important, even if their names have exotic characters, so this is
worth fixing. Is this edit to packageDescription the best way to solve
this problem of exotic characters preventing the authors' names from
showing on Windows?

thanks,

Ben




Windows sessionInfo

#---------------
 > sessionInfo()
R version 3.4.0 Patched (2017-05-10 r72670)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.1252
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
[3] LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
  [1] readr_1.1.1    compiler_3.4.0 R6_2.2.1       hms_0.3        tools_3.4.0
  [6] tibble_1.3.3   yaml_2.1.14    Rcpp_0.12.11   knitr_1.16     rlang_0.1.1
 [11] fortunes_1.5-4
#---------------

Linux sessionInfo:

#---------------
 > sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.10

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.3.1 yaml_2.1.14 knitr_1.16
#---------------

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: suggestion to fix packageDescription() for Windows users

Duncan Murdoch-2
On 17/06/2017 7:10 AM, Ben Marwick wrote:
> Recently I was trying to cite a package where the authors have ä
> and ø in their names. I found that on Windows the citation() function
> did not return the authors' names at all, but on Linux there was no
> problem (sessionInfos at the bottom):
>
> On Windows, no author names are returned:

I'm not seeing this.  You have fairly strange localization settings; see
comments below.

> Windows sessionInfo
>
> #---------------
>  > sessionInfo()
> R version 3.4.0 Patched (2017-05-10 r72670)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1
>
> Matrix products: default
>
> locale:
> [1] LC_COLLATE=English_Australia.1252
> [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
> [3] LC_MONETARY=English_Australia.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_Australia.1252

I don't know what English_Australia.1252 does that's different from what
I use (English_Canada.1252), but the Chinese locale setting could cause
trouble.  Could you try setting this (presumably in the Windows control
panel) to be consistent?  You're using a much simpler setting on Linux.

Duncan Murdoch
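
A quick way to see whether the locale categories agree, without reading through the whole sessionInfo() output, might look like the sketch below. It only uses Sys.getlocale(); the variable names are made up for the example, and LC_NUMERIC is skipped in the comparison because it is "C" in both listings above.

```r
# The five categories shown in the Windows sessionInfo() output above.
categories <- c("LC_COLLATE", "LC_CTYPE", "LC_MONETARY", "LC_NUMERIC", "LC_TIME")

# Query each category separately; the result is a named character vector.
current <- vapply(categories, Sys.getlocale, character(1))
print(current)

# Names of the categories whose setting differs from LC_COLLATE.
check <- setdiff(categories, "LC_NUMERIC")
names(current[check])[current[check] != current[["LC_COLLATE"]]]
```

On the Windows machine described above this would be expected to single out LC_CTYPE.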

Re: suggestion to fix packageDescription() for Windows users

Ben Marwick
Hi Duncan,

Thanks for your reply. Yes, it does seem to be specific to setting
LC_CTYPE to Chinese on Windows. If I set the locale to English using
Sys.setlocale() there is no problem; when I set LC_CTYPE back to
Chinese, the authors disappear:

Sys.setlocale("LC_ALL","English")
citation("readr")

#' To cite package ‘readr’ in publications use:
#'
#'   Hadley Wickham, Jim Hester and Romain Francois (2017). readr: Read
#' Rectangular Text Data. R package version 1.1.1.
#' https://CRAN.R-project.org/package=readr
#'
#' A BibTeX entry for LaTeX users is
#'
#' @Manual{,
#'   title = {readr: Read Rectangular Text Data},
#'   author = {Hadley Wickham and Jim Hester and Romain Francois},
#'   year = {2017},
#'   note = {R package version 1.1.1},
#'   url = {https://CRAN.R-project.org/package=readr},
#' }


Sys.setlocale("LC_CTYPE", "Chinese")
citation("readr")

#'
#' To cite package ‘readr’ in publications use:
#'
#'   (2017). readr: Read Rectangular Text Data. R package version 1.1.1.
#' https://CRAN.R-project.org/package=readr
#'
#' A BibTeX entry for LaTeX users is
#'
#' @Manual{,
#'   title = {readr: Read Rectangular Text Data},
#'   year = {2017},
#'   note = {R package version 1.1.1},
#'   url = {https://CRAN.R-project.org/package=readr},
#' }
#'
#' ATTENTION: This citation information has been auto-generated from the
#' package DESCRIPTION file and may need manual editing, see
#' ‘help("citation")’.

Where do we go from here? I do want to use the Chinese locale with R on
Windows (and perhaps others do too), so switching the locale isn't a fix.
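
For rerunning the comparison above without leaving the session in a different locale, a small wrapper can switch LC_CTYPE for one expression and then restore it. This is only a sketch: with_ctype is a made-up helper name, and whether the author names survive when citation() is evaluated this way has not been tested on the affected Windows setup.

```r
# Hypothetical helper: evaluate 'expr' with LC_CTYPE temporarily set to
# 'locale', restoring the previous setting afterwards (even on error).
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  Sys.setlocale("LC_CTYPE", locale)
  force(expr)  # 'expr' is lazy, so it is evaluated under the new locale
}

# The "C" locale is available everywhere, so this runs on any platform.
with_ctype("C", Sys.getlocale("LC_CTYPE"))

# On Windows, using the locale name from the example above:
# cit <- with_ctype("English", citation("readr"))
```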

Thanks,

Ben

Re: suggestion to fix packageDescription() for Windows users

Duncan Murdoch-2
On 17/06/2017 9:13 AM, Ben Marwick wrote:
> Hi Duncan,
>
> Thanks for your reply. Yes, it does seem to be specific to the CTYPE
> setting to Chinese on Windows. If I set it to English using
> Sys.setlocale() there is no problem, then back to Chinese and the
> authors disappear:
>
> Sys.setlocale("LC_ALL","English")
> citation("readr")

Thanks, that makes the problem reproducible.  I'll submit it as a bug
report.  Maybe someone from Microsoft will fix it.

Duncan Murdoch

Re: suggestion to fix packageDescription() for Windows users

Ben Marwick
Thanks very much, I see your bug report here:
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17291

On 18/06/2017 2:26 AM, Duncan Murdoch wrote:

> On 17/06/2017 9:13 AM, Ben Marwick wrote:
>> Hi Duncan,
>>
>> Thanks for your reply. Yes, it does seem to be specific to the CTYPE
>> setting to Chinese on Windows. If I set it to English using
>> Sys.setlocale() there is no problem, then back to Chinese and the
>> authors disappear:
>>
>> Sys.setlocale("LC_ALL","English")
>> citation("readr")
>
> Thanks, that makes the problem reproducible.  I'll submit it as a bug
> report.  Maybe someone from Microsoft will fix it.
>
> Duncan Murdoch
>
>>
>> #' To cite package ‘readr’ in publications use:
>> #'
>> #'   Hadley Wickham, Jim Hester and Romain Francois (2017). readr: Read
>> #' Rectangular Text Data. R package version 1.1.1.
>> #' https://CRAN.R-project.org/package=readr
>> #'
>> #' A BibTeX entry for LaTeX users is
>> #'
>> #' @Manual{,
>> #'   title = {readr: Read Rectangular Text Data},
>> #'   author = {Hadley Wickham and Jim Hester and Romain Francois},
>> #'   year = {2017},
>> #'   note = {R package version 1.1.1},
>> #'   url = {https://CRAN.R-project.org/package=readr},
>> #' }
>>
>>
>> Sys.setlocale("LC_CTYPE", "Chinese")
>> citation("readr")
>>
>> #'
>> #' To cite package ‘readr’ in publications use:
>> #'
>> #'   (2017). readr: Read Rectangular Text Data. R package version 1.1.1.
>> #' https://CRAN.R-project.org/package=readr
>> #'
>> #' A BibTeX entry for LaTeX users is
>> #'
>> #' @Manual{,
>> #'   title = {readr: Read Rectangular Text Data},
>> #'   year = {2017},
>> #'   note = {R package version 1.1.1},
>> #'   url = {https://CRAN.R-project.org/package=readr},
>> #' }
>> #'
>> #' ATTENTION: This citation information has been auto-generated from the
>> #' package DESCRIPTION file and may need manual editing, see
>> #' ‘help("citation")’.
>>
>> Where do we go from here? I do want to use the Chinese locale with R on
>> Windows (and perhaps others do too), so switching the locale isn't a fix.
>>
>> Thanks,
>>
>> Ben
>>
>> On 17/06/2017 10:36 PM, Duncan Murdoch wrote:
>>> On 17/06/2017 7:10 AM, Ben Marwick wrote:
>>>> Recently I was trying to cite a package where the authors have ä
>>>> and ø in their names. I found that on Windows the citation() function
>>>> did not return the authors' names at all, but on Linux there was no
>>>> problem (sessionInfos at the bottom):
>>>>
>>>> On Windows, no author names are returned:
>>>
>>> I'm not seeing this.  You have fairly strange localization settings; see
>>> comments below.
>>>
>>>>
>>>> #---------------
>>>>
>>>>  > citation("readr")
>>>>
>>>> To cite package ‘readr’ in publications use:
>>>>
>>>>    (2017). readr: Read Rectangular Text Data. R package version 1.1.1.
>>>>    https://CRAN.R-project.org/package=readr
>>>>
>>>> A BibTeX entry for LaTeX users is
>>>>
>>>>    @Manual{,
>>>>      title = {readr: Read Rectangular Text Data},
>>>>      year = {2017},
>>>>      note = {R package version 1.1.1},
>>>>      url = {https://CRAN.R-project.org/package=readr},
>>>>    }
>>>>
>>>> ATTENTION: This citation information has been auto-generated from the
>>>> package DESCRIPTION file and may need manual editing, see
>>>> ‘help("citation")’.
>>>> #---------------
>>>>
>>>> On Linux we do see the author names:
>>>>
>>>> #---------------
>>>>  > citation("readr")
>>>>
>>>> To cite package ‘readr’ in publications use:
>>>>
>>>>    Hadley Wickham, Jim Hester and Romain Francois (2017). readr:
>>>>    Read Rectangular Text Data. R package version 1.1.1.
>>>>    https://CRAN.R-project.org/package=readr
>>>>
>>>> A BibTeX entry for LaTeX users is
>>>>
>>>>    @Manual{,
>>>>      title = {readr: Read Rectangular Text Data},
>>>>      author = {Hadley Wickham and Jim Hester and Romain Francois},
>>>>      year = {2017},
>>>>      note = {R package version 1.1.1},
>>>>      url = {https://CRAN.R-project.org/package=readr},
>>>>    }
>>>> #---------------
>>>>
>>>> This appears to be an OS-dependent encoding issue. The citation
>>>> function
>>>> does not take an encoding argument, so it's not possible to set the
>>>> encoding at the point where that function is used. The citation
>>>> function
>>>> working with the packageDescription function, which does have an
>>>> encoding argument, but the default is not useful for Windows when there
>>>> is an encoding set in the DESCRIPTION of the package (in this case
>>>> UTF-8).
>>>>
>>>> We can set the encoding argument in packageDescription so it works in
>>>> Windows to give the authors as expected, but it is very inconvenient to
>>>> generate citations directly from the output of this function. So I'd
>>>> like to propose a solution this problem by changing one line in the
>>>> packageDescription function, like so, from:
>>>>
>>>> #---------------
>>>> if (missing(encoding) && Sys.getlocale("LC_CTYPE") == "C")
>>>> #---------------
>>>>
>>>> to:
>>>>
>>>> #---------------
>>>> if ((missing(encoding) && Sys.getlocale("LC_CTYPE") == "C") |
>>>> unname(Sys.info()['sysname']) == "Windows")
>>>> #---------------
>>>>
>>>> If I understand correctly, that will force ASCII//TRANSLIT encoding
>>>> when
>>>> DESCRIPTION files are read by packageDescription() on Windows machines.
>>>> The upside is that Windows users will get the authors in the package
>>>> citation, unlike the current situation. The downside is that the exotic
>>>> symbols in the authors' names are replaced with common ones that are
>>>> similar.
>>>>
>>>> I think getting the citations to easily include the authors' names is
>>>> pretty important, even if their names have exotic characters, so
>>>> this is
>>>> worth fixing. Is this edit to packageDescription the best way to solve
>>>> this problem of exotic characters preventing the authors' names from
>>>> showing on Windows?
>>>>
>>>> thanks,
>>>>
>>>> Ben
>>>>
>>>>
>>>>
>>>>
>>>> Windows sessionInfo
>>>>
>>>> #---------------
>>>>  > sessionInfo()
>>>> R version 3.4.0 Patched (2017-05-10 r72670)
>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>> Running under: Windows 7 x64 (build 7601) Service Pack 1
>>>>
>>>> Matrix products: default
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_Australia.1252
>>>> [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
>>>> [3] LC_MONETARY=English_Australia.1252
>>>> [4] LC_NUMERIC=C
>>>> [5] LC_TIME=English_Australia.1252
>>>
>>> I don't know what English_Australia.1252 does that's different from what
>>> I use (English_Canada.1252), but the Chinese locale setting could cause
>>> trouble.  Could you try setting this (presumably in the Windows control
>>> panel) to be consistent?  You're using a much simpler setting on Linux.
>>>
>>> Duncan Murdoch
>>>
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> loaded via a namespace (and not attached):
>>>>   [1] readr_1.1.1    compiler_3.4.0 R6_2.2.1       hms_0.3        tools_3.4.0
>>>>   [6] tibble_1.3.3   yaml_2.1.14    Rcpp_0.12.11   knitr_1.16     rlang_0.1.1
>>>> [11] fortunes_1.5-4
>>>> #---------------
>>>>
>>>> Linux sessionInfo:
>>>>
>>>> #---------------
>>>>  > sessionInfo()
>>>> R version 3.3.1 (2016-06-21)
>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>> Running under: Ubuntu 16.10
>>>>
>>>> locale:
>>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] tools_3.3.1 yaml_2.1.14 knitr_1.16
>>>> #---------------
>>>>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: suggestion to fix packageDescription() for Windows users

Andrie
In reply to this post by Duncan Murdoch-2
Hi, Duncan

I have forwarded this thread to Nathan, who promised to look into it.

Andrie

On 17 Jun 2017 17:26, "Duncan Murdoch" <[hidden email]> wrote:

> On 17/06/2017 9:13 AM, Ben Marwick wrote:
>
>> Hi Duncan,
>>
>> Thanks for your reply. Yes, it does seem to be specific to the CTYPE
>> setting to Chinese on Windows. If I set it to English using
>> Sys.setlocale() there is no problem, then back to Chinese and the
>> authors disappear:
>>
>> Sys.setlocale("LC_ALL","English")
>> citation("readr")
>>
>
> Thanks, that makes the problem reproducible.  I'll submit it as a bug
> report.  Maybe someone from Microsoft will fix it.
>
> Duncan Murdoch
>
>
>> #' To cite package ‘readr’ in publications use:
>> #'
>> #'   Hadley Wickham, Jim Hester and Romain Francois (2017). readr: Read
>> #' Rectangular Text Data. R package version 1.1.1.
>> #' https://CRAN.R-project.org/package=readr
>> #'
>> #' A BibTeX entry for LaTeX users is
>> #'
>> #' @Manual{,
>> #'   title = {readr: Read Rectangular Text Data},
>> #'   author = {Hadley Wickham and Jim Hester and Romain Francois},
>> #'   year = {2017},
>> #'   note = {R package version 1.1.1},
>> #'   url = {https://CRAN.R-project.org/package=readr},
>> #' }
>>
>>
>> Sys.setlocale("LC_CTYPE", "Chinese")
>> citation("readr")
>>
>> #'
>> #' To cite package ‘readr’ in publications use:
>> #'
>> #'   (2017). readr: Read Rectangular Text Data. R package version 1.1.1.
>> #' https://CRAN.R-project.org/package=readr
>> #'
>> #' A BibTeX entry for LaTeX users is
>> #'
>> #' @Manual{,
>> #'   title = {readr: Read Rectangular Text Data},
>> #'   year = {2017},
>> #'   note = {R package version 1.1.1},
>> #'   url = {https://CRAN.R-project.org/package=readr},
>> #' }
>> #'
>> #' ATTENTION: This citation information has been auto-generated from the
>> #' package DESCRIPTION file and may need manual editing, see
>> #' ‘help("citation")’.
>>
>> Where do we go from here? I do want to use the Chinese locale with R on
>> Windows (and perhaps others do too), so switching the locale isn't a fix.
>>
>> Thanks,
>>
>> Ben
>>
>> On 17/06/2017 10:36 PM, Duncan Murdoch wrote:
>>
>>> On 17/06/2017 7:10 AM, Ben Marwick wrote:
>>>
>>>> Recently I was trying to cite a package where the authors have ä
>>>> and ø in their names. I found that on Windows the citation() function
>>>> did not return the authors' names at all, but on Linux there was no
>>>> problem (sessionInfos at the bottom):
>>>>
>>>> On Windows, no author names are returned:
>>>>
>>>
>>> I'm not seeing this.  You have fairly strange localization settings; see
>>> comments below.
>>>
>>>
>>>> #---------------
>>>>
>>>>  > citation("readr")
>>>>
>>>> To cite package ‘readr’ in publications use:
>>>>
>>>>    (2017). readr: Read Rectangular Text Data. R package version 1.1.1.
>>>>    https://CRAN.R-project.org/package=readr
>>>>
>>>> A BibTeX entry for LaTeX users is
>>>>
>>>>    @Manual{,
>>>>      title = {readr: Read Rectangular Text Data},
>>>>      year = {2017},
>>>>      note = {R package version 1.1.1},
>>>>      url = {https://CRAN.R-project.org/package=readr},
>>>>    }
>>>>
>>>> ATTENTION: This citation information has been auto-generated from the
>>>> package DESCRIPTION file and may need manual editing, see
>>>> ‘help("citation")’.
>>>> #---------------
>>>>
>>>> On Linux we do see the author names:
>>>>
>>>> #---------------
>>>>  > citation("readr")
>>>>
>>>> To cite package ‘readr’ in publications use:
>>>>
>>>>    Hadley Wickham, Jim Hester and Romain Francois (2017). readr:
>>>>    Read Rectangular Text Data. R package version 1.1.1.
>>>>    https://CRAN.R-project.org/package=readr
>>>>
>>>> A BibTeX entry for LaTeX users is
>>>>
>>>>    @Manual{,
>>>>      title = {readr: Read Rectangular Text Data},
>>>>      author = {Hadley Wickham and Jim Hester and Romain Francois},
>>>>      year = {2017},
>>>>      note = {R package version 1.1.1},
>>>>      url = {https://CRAN.R-project.org/package=readr},
>>>>    }
>>>> #---------------
>>>>
>>>> This appears to be an OS-dependent encoding issue. The citation function
>>>> does not take an encoding argument, so it's not possible to set the
>>>> encoding at the point where that function is used. The citation function
>>>> works with the packageDescription function, which does have an
>>>> encoding argument, but the default is not useful for Windows when there
>>>> is an encoding set in the DESCRIPTION of the package (in this case
>>>> UTF-8).
>>>>
>>>> We can set the encoding argument in packageDescription so it works in
>>>> Windows to give the authors as expected, but it is very inconvenient to
>>>> generate citations directly from the output of this function. So I'd
>>>> like to propose a solution to this problem by changing one line in the
>>>> packageDescription function, like so, from:
>>>>
>>>> #---------------
>>>> if (missing(encoding) && Sys.getlocale("LC_CTYPE") == "C")
>>>> #---------------
>>>>
>>>> to:
>>>>
>>>> #---------------
>>>> if ((missing(encoding) && Sys.getlocale("LC_CTYPE") == "C") |
>>>> unname(Sys.info()['sysname']) == "Windows")
>>>> #---------------
>>>>
>>>> If I understand correctly, that will force ASCII//TRANSLIT encoding when
>>>> DESCRIPTION files are read by packageDescription() on Windows machines.
>>>> The upside is that Windows users will get the authors in the package
>>>> citation, unlike the current situation. The downside is that the exotic
>>>> symbols in the authors' names are replaced with common ones that are
>>>> similar.
>>>>
>>>> I think getting the citations to easily include the authors' names is
>>>> pretty important, even if their names have exotic characters, so this is
>>>> worth fixing. Is this edit to packageDescription the best way to solve
>>>> this problem of exotic characters preventing the authors' names from
>>>> showing on Windows?
>>>>
>>>> thanks,
>>>>
>>>> Ben
>>>>
>>>>
>>>>
>>>>
>>>> Windows sessionInfo
>>>>
>>>> #---------------
>>>>  > sessionInfo()
>>>> R version 3.4.0 Patched (2017-05-10 r72670)
>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>> Running under: Windows 7 x64 (build 7601) Service Pack 1
>>>>
>>>> Matrix products: default
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_Australia.1252
>>>> [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
>>>> [3] LC_MONETARY=English_Australia.1252
>>>> [4] LC_NUMERIC=C
>>>> [5] LC_TIME=English_Australia.1252
>>>>
>>>
>>> I don't know what English_Australia.1252 does that's different from what
>>> I use (English_Canada.1252), but the Chinese locale setting could cause
>>> trouble.  Could you try setting this (presumably in the Windows control
>>> panel) to be consistent?  You're using a much simpler setting on Linux.
>>>
>>> Duncan Murdoch
>>>
>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> loaded via a namespace (and not attached):
>>>>   [1] readr_1.1.1    compiler_3.4.0 R6_2.2.1       hms_0.3        tools_3.4.0
>>>>   [6] tibble_1.3.3   yaml_2.1.14    Rcpp_0.12.11   knitr_1.16     rlang_0.1.1
>>>> [11] fortunes_1.5-4
>>>> #---------------
>>>>
>>>> Linux sessionInfo:
>>>>
>>>> #---------------
>>>>  > sessionInfo()
>>>> R version 3.3.1 (2016-06-21)
>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>> Running under: Ubuntu 16.10
>>>>
>>>> locale:
>>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] tools_3.3.1 yaml_2.1.14 knitr_1.16
>>>> #---------------
>>>>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: suggestion to fix packageDescription() for Windows users

Duncan Murdoch-2
On 18/06/2017 5:57 AM, Andrie de Vries wrote:
> Hi, Duncan
>
> I have forwarded this thread to Nathan, who promised to look into it.

Any progress on this?

Duncan Murdoch

>
> Andrie

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: suggestion to fix packageDescription() for Windows users

R devel mailing list
Hi Duncan,

I expect to be able to look at this over the weekend or next week (probably closer to next week). It is on my list of things to do; I have a few prior commitments that I need to finish first.

Sorry for the delay. I'll chime in with a status update next week.

Nathan

-----Original Message-----
From: R-devel [mailto:[hidden email]] On Behalf Of Duncan Murdoch
Sent: Friday, June 23, 2017 5:16 AM
To: Andrie de Vries <[hidden email]>
Cc: [hidden email]; Ben Marwick <[hidden email]>
Subject: Re: [Rd] suggestion to fix packageDescription() for Windows users

On 18/06/2017 5:57 AM, Andrie de Vries wrote:
> Hi, Duncan
>
> I have forwarded this thread to Nathan, who promised to look into it.

Any progress on this?

Duncan Murdoch

>
> Andrie
>
> On 17 Jun 2017 17:26, "Duncan Murdoch" <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     On 17/06/2017 9:13 AM, Ben Marwick wrote:
>
>         Hi Duncan,
>
>         Thanks for your reply. Yes, it does seem to be specific to the CTYPE
>         setting to Chinese on Windows. If I set it to English using
>         Sys.setlocale() there is no problem, then back to Chinese and the
>         authors disappear:
>
>         Sys.setlocale("LC_ALL","English")
>         citation("readr")
>
>
>     Thanks, that makes the problem reproducible.  I'll submit it as a
>     bug report.  Maybe someone from Microsoft will fix it.
>
>     Duncan Murdoch
>
>
>         #' To cite package ‘readr’ in publications use:
>         #'
>         #'   Hadley Wickham, Jim Hester and Romain Francois (2017). readr: Read
>         #' Rectangular Text Data. R package version 1.1.1.
>         #' https://CRAN.R-project.org/package=readr
>         #'
>         #' A BibTeX entry for LaTeX users is
>         #'
>         #' @Manual{,
>         #'   title = {readr: Read Rectangular Text Data},
>         #'   author = {Hadley Wickham and Jim Hester and Romain Francois},
>         #'   year = {2017},
>         #'   note = {R package version 1.1.1},
>         #'   url = {https://CRAN.R-project.org/package=readr},
>         #' }
>
>
>         Sys.setlocale("LC_CTYPE", "Chinese")
>         citation("readr")
>
>         #'
>         #' To cite package ‘readr’ in publications use:
>         #'
>         #'   (2017). readr: Read Rectangular Text Data. R package version 1.1.1.
>         #' https://CRAN.R-project.org/package=readr
>         #'
>         #' A BibTeX entry for LaTeX users is
>         #'
>         #' @Manual{,
>         #'   title = {readr: Read Rectangular Text Data},
>         #'   year = {2017},
>         #'   note = {R package version 1.1.1},
>         #'   url = {https://CRAN.R-project.org/package=readr},
>         #' }
>         #'
>         #' ATTENTION: This citation information has been auto-generated from the
>         #' package DESCRIPTION file and may need manual editing, see
>         #' ‘help("citation")’.
>
>         Where do we go from here? I do want to use the Chinese locale with R on
>         Windows (and perhaps others do too), so switching the locale isn't a fix.
>
>         Thanks,
>
>         Ben
>
>         On 17/06/2017 10:36 PM, Duncan Murdoch wrote:
>
>             On 17/06/2017 7:10 AM, Ben Marwick wrote:
>
>                 Recently I was trying to cite a package where the authors have ä
>                 and ø in their names. I found that on Windows the citation() function
>                 did not return the authors' names at all, but on Linux there was no
>                 problem (sessionInfos at the bottom):
>
>                 On Windows, no author names are returned:
>
>
>             I'm not seeing this.  You have fairly strange localization settings; see
>             comments below.
>
>
>                 #---------------
>
>                  > citation("readr")
>
>                 To cite package ‘readr’ in publications use:
>
>                    (2017). readr: Read Rectangular Text Data. R package version 1.1.1.
>                    https://CRAN.R-project.org/package=readr
>
>                 A BibTeX entry for LaTeX users is
>
>                    @Manual{,
>                      title = {readr: Read Rectangular Text Data},
>                      year = {2017},
>                      note = {R package version 1.1.1},
>                      url = {https://CRAN.R-project.org/package=readr},
>                    }
>
>                 ATTENTION: This citation information has been auto-generated from the
>                 package DESCRIPTION file and may need manual editing, see
>                 ‘help("citation")’.
>                 #---------------
>
>                 On Linux we do see the author names:
>
>                 #---------------
>                  > citation("readr")
>
>                 To cite package ‘readr’ in publications use:
>
>                    Hadley Wickham, Jim Hester and Romain Francois (2017). readr:
>                    Read Rectangular Text Data. R package version 1.1.1.
>                    https://CRAN.R-project.org/package=readr
>
>                 A BibTeX entry for LaTeX users is
>
>                    @Manual{,
>                      title = {readr: Read Rectangular Text Data},
>                      author = {Hadley Wickham and Jim Hester and Romain
>                 Francois},
>                      year = {2017},
>                      note = {R package version 1.1.1},
>                      url = {https://CRAN.R-project.org/package=readr},
>                    }
>                 #---------------
>
>                 This appears to be an OS-dependent encoding issue. The
>                 citation function
>                 does not take an encoding argument, so it's not possible
>                 to set the
>                 encoding at the point where that function is used. The
>                 citation function
>                 working with the packageDescription function, which does
>                 have an
>                 encoding argument, but the default is not useful for
>                 Windows when there
>                 is an encoding set in the DESCRIPTION of the package (in
>                 this case
>                 UTF-8).
>
>                 We can set the encoding argument in packageDescription
>                 so it works in
>                 Windows to give the authors as expected, but it is very
>                 inconvenient to
>                 generate citations directly from the output of this
>                 function. So I'd
>                 like to propose a solution this problem by changing one
>                 line in the
>                 packageDescription function, like so, from:
>
>                 #---------------
>                 if (missing(encoding) && Sys.getlocale("LC_CTYPE") == "C")
>                 #---------------
>
>                 to:
>
>                 #---------------
>                 if ((missing(encoding) && Sys.getlocale("LC_CTYPE") ==
>                 "C") |
>                 unname(Sys.info()['sysname']) == "Windows")
>                 #---------------
>
>                 If I understand correctly, that will force
>                 ASCII//TRANSLIT encoding when
>                 DESCRIPTION files are read by packageDescription() on
>                 Windows machines.
>                 The upside is that Windows users will get the authors in
>                 the package
>                 citation, unlike the current situation. The downside is
>                 that the exotic
>                 symbols in the authors' names are replaced with common
>                 ones that are
>                 similar.
>
>                 I think getting the citations to easily include the
>                 authors' names is
>                 pretty important, even if their names have exotic
>                 characters, so this is
>                 worth fixing. Is this edit to packageDescription the
>                 best way to solve
>                 this problem of exotic characters preventing the
>                 authors' names from
>                 showing on Windows?
>
>                 thanks,
>
>                 Ben
>
>
>
>
>                 Windows sessionInfo
>
>                 #---------------
>                  > sessionInfo()
>                 R version 3.4.0 Patched (2017-05-10 r72670)
>                 Platform: x86_64-w64-mingw32/x64 (64-bit)
>                 Running under: Windows 7 x64 (build 7601) Service Pack 1
>
>                 Matrix products: default
>
>                 locale:
>                 [1] LC_COLLATE=English_Australia.1252
>                 [2] LC_CTYPE=Chinese (Simplified)_People's Republic of
>                 China.936
>                 [3] LC_MONETARY=English_Australia.1252
>                 [4] LC_NUMERIC=C
>                 [5] LC_TIME=English_Australia.1252
>
>
>             I don't know what English_Australia.1252 does that's
>             different from what
>             I use (English_Canada.1252), but the Chinese locale setting
>             could cause
>             trouble.  Could you try setting this (presumably in the
>             Windows control
>             panel) to be consistent?  You're using a much simpler
>             setting on Linux.
>
>             Duncan Murdoch
>
>
>                 attached base packages:
>                 [1] stats     graphics  grDevices utils     datasets
>                 methods   base
>
>                 loaded via a namespace (and not attached):
>                   [1] readr_1.1.1    compiler_3.4.0 R6_2.2.1       hms_0.3
>                 tools_3.4.0
>                   [6] tibble_1.3.3   yaml_2.1.14    Rcpp_0.12.11
>                  knitr_1.16
>                 rlang_0.1.1
>                 [11] fortunes_1.5-4
>                 #---------------
>
>                 Linux sessionInfo:
>
>                 #---------------
>                  > sessionInfo()
>                 R version 3.3.1 (2016-06-21)
>                 Platform: x86_64-pc-linux-gnu (64-bit)
>                 Running under: Ubuntu 16.10
>
>                 locale:
>                   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>                   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>                   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>                   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>                   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>                 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
>                 attached base packages:
>                 [1] stats     graphics  grDevices utils     datasets
>                 methods   base
>
>                 loaded via a namespace (and not attached):
>                 [1] tools_3.3.1 yaml_2.1.14 knitr_1.16
>                 #---------------
>
>                 ______________________________________________
>                 [hidden email] <mailto:[hidden email]>
>                 mailing list
>                 https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
>
>         ______________________________________________
>         [hidden email] <mailto:[hidden email]> mailing list
>         https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: suggestion to fix packageDescription() for Windows users

R devel mailing list
The following patch is not the most elegant, but it restores the Authors when "LC_CTYPE" is set to either "Chinese" or "Arabic":

> Sys.setlocale("LC_CTYPE", "Chinese")
[1] "Chinese (Simplified)_China.936"
> citation("readr")

To cite package ‘readr’ in publications use:

  (2016). readr: Read Tabular Data. R package version 1.0.0.
  https://CRAN.R-project.org/package=readr

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {readr: Read Tabular Data},
    year = {2016},
    note = {R package version 1.0.0},
    url = {https://CRAN.R-project.org/package=readr},
  }

ATTENTION: This citation information has been auto-generated from the
package DESCRIPTION file and may need manual editing, see
‘help("citation")’.

> Sys.setlocale("LC_CTYPE", "Arabic")
[1] "Arabic_Saudi Arabia.1256"
> citation("readr")

To cite package ‘readr’ in publications use:

  (2016). readr: Read Tabular Data. R package version 1.0.0.
  https://CRAN.R-project.org/package=readr

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {readr: Read Tabular Data},
    year = {2016},
    note = {R package version 1.0.0},
    url = {https://CRAN.R-project.org/package=readr},
  }

ATTENTION: This citation information has been auto-generated from the
package DESCRIPTION file and may need manual editing, see
‘help("citation")’.

> citation <- newCitation
> citation("readr")

To cite package ‘readr’ in publications use:

  Hadley Wickham, Jim Hester and Romain Francois (2016). readr: Read
  Tabular Data. R package version 1.0.0.
  https://CRAN.R-project.org/package=readr

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {readr: Read Tabular Data},
    author = {Hadley Wickham and Jim Hester and Romain Francois},
    year = {2016},
    note = {R package version 1.0.0},
    url = {https://CRAN.R-project.org/package=readr},
  }
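
The before/after comparison above can also be scripted. The following is only a minimal sketch: `with_ctype()` and `has_authors()` are hypothetical helper names, not part of R, and the locale name "Chinese" in the usage note is the Windows-specific one used in this thread, so other platforms will need a different name.

```r
# Hypothetical helper (not part of R): evaluate an expression under a
# temporary LC_CTYPE and restore the previous setting afterwards.
with_ctype <- function(locale, expr) {
    old <- Sys.getlocale("LC_CTYPE")
    on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
    # Sys.setlocale() returns "" (with a warning) if the request fails
    if (!nzchar(suppressWarnings(Sys.setlocale("LC_CTYPE", locale))))
        stop("locale not available on this system: ", locale)
    force(expr)  # evaluated lazily, i.e. after the locale has been switched
}

# Hypothetical helper: TRUE if the citation for 'pkg' lists at least one author.
has_authors <- function(pkg) {
    cit <- utils::citation(pkg)
    length(cit$author) > 0L
}
```

On the Windows setup described in this thread, `with_ctype("Chinese", has_authors("readr"))` would be expected to differ before and after the patch; on other systems the locale name may simply be rejected.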



The patch is:

Index: citation.R
===================================================================
--- citation.R (revision 72852)
+++ citation.R (working copy)
@@ -1162,8 +1162,11 @@
         if(dir == "")
             stop(gettextf("package %s not found", sQuote(package)),
                  domain = NA)
-        meta <- packageDescription(pkg = package,
-                                   lib.loc = dirname(dir))
+        args <- list(pkg = package, lib.loc = dirname(dir))
+        if (!is.na(enc <- packageDescription(pkg = package, lib.loc = dirname(dir), fields = "Encoding")))
+            args$encoding <- enc
+        meta <- do.call("packageDescription", args = args)
+
         ## if(is.null(auto)): Use default auto-citation if no CITATION
         ## available.
         citfile <- file.path(dir, "CITATION")
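
Outside of a patched citation(), the same idea can be tried from user code. This is a rough sketch only, under the assumption that passing the package's declared Encoding as the `encoding` argument is what keeps the author field intact; `description_with_declared_encoding()` is a hypothetical name, not an existing function.

```r
# Hypothetical helper mirroring the idea of the patch: look up the
# package's declared Encoding and pass it on explicitly.
description_with_declared_encoding <- function(pkg, lib.loc = NULL) {
    # With a single field and drop = TRUE this yields a string, or NA
    # when the DESCRIPTION file declares no Encoding.
    enc <- utils::packageDescription(pkg, lib.loc = lib.loc,
                                     fields = "Encoding")
    args <- list(pkg = pkg, lib.loc = lib.loc)
    if (!is.na(enc))
        args$encoding <- enc  # only override the default when one is declared
    do.call(utils::packageDescription, args)
}
```

For example, `description_with_declared_encoding("readr")$Author` could be compared with `packageDescription("readr")$Author` under the Chinese LC_CTYPE setting.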


Nathan says he can look into this further next week...

Cheers,

Rich Calaway
Microsoft R Product Team
24/1341
+1 (425) 4219919 X19919

-----Original Message-----
From: R-devel [mailto:[hidden email]] On Behalf Of Nathan Sosnovske via R-devel
Sent: Friday, June 23, 2017 7:36 AM
To: Duncan Murdoch <[hidden email]>; Andrie de Vries <[hidden email]>
Cc: [hidden email]; Ben Marwick <[hidden email]>
Subject: Re: [Rd] suggestion to fix packageDescription() for Windows users

Hi Duncan,

I'm guessing I'll be able to look at this over the weekend/next week (probably closer to next week). It is on my list of things to do and I've just had a few other prior commitments that I have to finish first.

Sorry for the delay. I'll chime in with a status update next week.

Nathan

-----Original Message-----
From: R-devel [mailto:[hidden email]] On Behalf Of Duncan Murdoch
Sent: Friday, June 23, 2017 5:16 AM
To: Andrie de Vries <[hidden email]>
Cc: [hidden email]; Ben Marwick <[hidden email]>
Subject: Re: [Rd] suggestion to fix packageDescription() for Windows users

On 18/06/2017 5:57 AM, Andrie de Vries wrote:
> Hi, Duncan
>
> i have forwarded this thread to Nathan, who promised to look into it.

Any progress on this?

Duncan Murdoch

>
> Andrie
>
> On 17 Jun 2017 17:26, "Duncan Murdoch" <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     On 17/06/2017 9:13 AM, Ben Marwick wrote:
>
>         Hi Duncan,
>
>         Thanks for your reply. Yes, it does seem to be specific to the CTYPE
>         setting to Chinese on Windows. If I set it to English using
>         Sys.setlocale() there is no problem, then back to Chinese and the
>         authors disappear:
>
>         Sys.setlocale("LC_ALL","English")
>         citation("readr")
>
>
>     Thanks, that makes the problem reproducible.  I'll submit it as a
>     bug report.  Maybe someone from Microsoft will fix it.
>
>     Duncan Murdoch
>
>
>         #' To cite package ‘readr’ in publications use:
>         #'
>         #'   Hadley Wickham, Jim Hester and Romain Francois (2017).
>         readr: Read
>         #' Rectangular Text Data. R package version 1.1.1.
>         #' https://CRAN.R-project.org/package=readr
>         #'
>         #' A BibTeX entry for LaTeX users is
>         #'
>         #' @Manual{,
>         #'   title = {readr: Read Rectangular Text Data},
>         #'   author = {Hadley Wickham and Jim Hester and Romain Francois},
>         #'   year = {2017},
>         #'   note = {R package version 1.1.1},
>         #'   url = {https://CRAN.R-project.org/package=readr},
>         #' }
>
>
>         Sys.setlocale("LC_CTYPE", "Chinese")
>         citation("readr")
>
>         #'
>         #' To cite package ‘readr’ in publications use:
>         #'
>         #'   (2017). readr: Read Rectangular Text Data. R package
>         version 1.1.1.
>         #' https://CRAN.R-project.org/package=readr
>         #'
>         #' A BibTeX entry for LaTeX users is
>         #'
>         #' @Manual{,
>         #'   title = {readr: Read Rectangular Text Data},
>         #'   year = {2017},
>         #'   note = {R package version 1.1.1},
>         #'   url = {https://CRAN.R-project.org/package=readr},
>         #' }
>         #'
>         #' ATTENTION: This citation information has been auto-generated
>         from the
>         #' package DESCRIPTION file and may need manual editing, see
>         #' ‘help("citation")’.
>
>         Where do we go from here? I do want to use the Chinese locale
>         with R on
>         Windows (and perhaps others do too), so switching the locale
>         isn't a fix.
>
>         Thanks,
>
>         Ben
>
>         On 17/06/2017 10:36 PM, Duncan Murdoch wrote:
>
>             On 17/06/2017 7:10 AM, Ben Marwick wrote:
>
>                 Recently I was trying to cite a package where the
>                 authors have ä
>                 and ø in their names. I found that on Windows the
>                 citation() function
>                 did not return the authors' names at all, but on Linux
>                 there was no
>                 problem (sessionInfos at the bottom):
>
>                 On Windows, no author names are returned:
>
>
>             I'm not seeing this.  You have fairly strange localization
>             settings; see
>             comments below.
>
>
>                 #---------------
>
>                  > citation("readr")
>
>                 To cite package ‘readr’ in publications use:
>
>                    (2017). readr: Read Rectangular Text Data. R package
>                 version 1.1.1.
>                    https://CRAN.R-project.org/package=readr
>
>                 A BibTeX entry for LaTeX users is
>
>                    @Manual{,
>                      title = {readr: Read Rectangular Text Data},
>                      year = {2017},
>                      note = {R package version 1.1.1},
>                      url = {https://CRAN.R-project.org/package=readr},
>                    }
>
>                 ATTENTION: This citation information has been
>                 auto-generated from the
>                 package DESCRIPTION file and may need manual editing, see
>                 ‘help("citation")’.
>                 #---------------
>
>                 On Linux we do see the author names:
>
>                 #---------------
>                  > citation("readr")
>
>                 To cite package ‘readr’ in publications use:
>
>                    Hadley Wickham, Jim Hester and Romain Francois
>                 (2017). readr:
>                    Read Rectangular Text Data. R package version 1.1.1.
>                    https://CRAN.R-project.org/package=readr
>
>                 A BibTeX entry for LaTeX users is
>
>                    @Manual{,
>                      title = {readr: Read Rectangular Text Data},
>                      author = {Hadley Wickham and Jim Hester and Romain
>                 Francois},
>                      year = {2017},
>                      note = {R package version 1.1.1},
>                      url = {https://CRAN.R-project.org/package=readr},
>                    }
>                 #---------------
>
>                 This appears to be an OS-dependent encoding issue. The
>                 citation function
>                 does not take an encoding argument, so it's not possible
>                 to set the
>                 encoding at the point where that function is used. The
>                 citation function
>                 working with the packageDescription function, which does
>                 have an
>                 encoding argument, but the default is not useful for
>                 Windows when there
>                 is an encoding set in the DESCRIPTION of the package (in
>                 this case
>                 UTF-8).
>
>                 We can set the encoding argument in packageDescription
>                 so it works in
>                 Windows to give the authors as expected, but it is very
>                 inconvenient to
>                 generate citations directly from the output of this
>                 function. So I'd
>                 like to propose a solution this problem by changing one
>                 line in the
>                 packageDescription function, like so, from:
>
>                 #---------------
>                 if (missing(encoding) && Sys.getlocale("LC_CTYPE") == "C")
>                 #---------------
>
>                 to:
>
>                 #---------------
>                 if ((missing(encoding) && Sys.getlocale("LC_CTYPE") ==
>                 "C") |
>                 unname(Sys.info()['sysname']) == "Windows")
>                 #---------------
>
>                 If I understand correctly, that will force
>                 ASCII//TRANSLIT encoding when
>                 DESCRIPTION files are read by packageDescription() on
>                 Windows machines.
>                 The upside is that Windows users will get the authors in
>                 the package
>                 citation, unlike the current situation. The downside is
>                 that the exotic
>                 symbols in the authors' names are replaced with common
>                 ones that are
>                 similar.
>
>                 I think getting the citations to easily include the
>                 authors' names is
>                 pretty important, even if their names have exotic
>                 characters, so this is
>                 worth fixing. Is this edit to packageDescription the
>                 best way to solve
>                 this problem of exotic characters preventing the
>                 authors' names from
>                 showing on Windows?
>
>                 thanks,
>
>                 Ben
>
>
>
>
>                 Windows sessionInfo
>
>                 #---------------
>                  > sessionInfo()
>                 R version 3.4.0 Patched (2017-05-10 r72670)
>                 Platform: x86_64-w64-mingw32/x64 (64-bit)
>                 Running under: Windows 7 x64 (build 7601) Service Pack 1
>
>                 Matrix products: default
>
>                 locale:
>                 [1] LC_COLLATE=English_Australia.1252
>                 [2] LC_CTYPE=Chinese (Simplified)_People's Republic of
>                 China.936
>                 [3] LC_MONETARY=English_Australia.1252
>                 [4] LC_NUMERIC=C
>                 [5] LC_TIME=English_Australia.1252
>
>
>             I don't know what English_Australia.1252 does that's
>             different from what
>             I use (English_Canada.1252), but the Chinese locale setting
>             could cause
>             trouble.  Could you try setting this (presumably in the
>             Windows control
>             panel) to be consistent?  You're using a much simpler
>             setting on Linux.
>
>             Duncan Murdoch
>
>
>                 attached base packages:
>                 [1] stats     graphics  grDevices utils     datasets
>                 methods   base
>
>                 loaded via a namespace (and not attached):
>                   [1] readr_1.1.1    compiler_3.4.0 R6_2.2.1       hms_0.3
>                 tools_3.4.0
>                   [6] tibble_1.3.3   yaml_2.1.14    Rcpp_0.12.11
>                  knitr_1.16
>                 rlang_0.1.1
>                 [11] fortunes_1.5-4
>                 #---------------
>
>                 Linux sessionInfo:
>
>                 #---------------
>                  > sessionInfo()
>                 R version 3.3.1 (2016-06-21)
>                 Platform: x86_64-pc-linux-gnu (64-bit)
>                 Running under: Ubuntu 16.10
>
>                 locale:
>                   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>                   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>                   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>                   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>                   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>                 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
>                 attached base packages:
>                 [1] stats     graphics  grDevices utils     datasets
>                 methods   base
>
>                 loaded via a namespace (and not attached):
>                 [1] tools_3.3.1 yaml_2.1.14 knitr_1.16
>                 #---------------
>
>                 ______________________________________________
>                 [hidden email] <mailto:[hidden email]>
>                 mailing list
>                 https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
>
>         ______________________________________________
>         [hidden email] <mailto:[hidden email]> mailing list
>         https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: suggestion to fix packageDescription() for Windows users

R devel mailing list
I'd be curious to know what others think of Rich's patch. If it is acceptable, I can spend the time I had set aside for this issue this week on another bug instead.

-----Original Message-----
From: Rich Calaway
Sent: Friday, June 23, 2017 6:34 PM
To: Nathan Sosnovske <[hidden email]>; Duncan Murdoch <[hidden email]>; Andrie de Vries <[hidden email]>
Cc: Ben Marwick <[hidden email]>; R-devel Mailing List ([hidden email]) <[hidden email]>
Subject: RE: [Rd] suggestion to fix packageDescription() for Windows users

The following patch is not the most elegant, but it restores the Authors when "LC_CTYPE" is set to either "Chinese" or "Arabic":

> Sys.setlocale("LC_CTYPE", "Chinese")
[1] "Chinese (Simplified)_China.936"
> citation("readr")

To cite package ‘readr’ in publications use:

  (2016). readr: Read Tabular Data. R package version 1.0.0.
  https://CRAN.R-project.org/package=readr

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {readr: Read Tabular Data},
    year = {2016},
    note = {R package version 1.0.0},
    url = {https://CRAN.R-project.org/package=readr},
  }

ATTENTION: This citation information has been auto-generated from the package DESCRIPTION file and may need manual editing, see ‘help("citation")’.

> Sys.setlocale("LC_CTYPE", "Arabic")
[1] "Arabic_Saudi Arabia.1256"
> citation("readr")

To cite package ‘readr’ in publications use:

  (2016). readr: Read Tabular Data. R package version 1.0.0.
  https://CRAN.R-project.org/package=readr

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {readr: Read Tabular Data},
    year = {2016},
    note = {R package version 1.0.0},
    url = {https://CRAN.R-project.org/package=readr},
  }

ATTENTION: This citation information has been auto-generated from the package DESCRIPTION file and may need manual editing, see ‘help("citation")’.

> citation <- newCitation
> citation("readr")

To cite package ‘readr’ in publications use:

  Hadley Wickham, Jim Hester and Romain Francois (2016). readr: Read
  Tabular Data. R package version 1.0.0.
  https://CRAN.R-project.org/package=readr

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {readr: Read Tabular Data},
    author = {Hadley Wickham and Jim Hester and Romain Francois},
    year = {2016},
    note = {R package version 1.0.0},
    url = {https://CRAN.R-project.org/package=readr},
  }



The patch is:

Index: citation.R
===================================================================
--- citation.R (revision 72852)
+++ citation.R (working copy)
@@ -1162,8 +1162,11 @@
         if(dir == "")
             stop(gettextf("package %s not found", sQuote(package)),
                  domain = NA)
-        meta <- packageDescription(pkg = package,
-                                   lib.loc = dirname(dir))
+    args <- list(pkg = package, lib.loc = dirname(dir))
+    if (!is.na(enc <- packageDescription(pkg = package, lib.loc=dirname(dir), field="Encoding")))
+    args$enc <- enc
+        meta <- do.call("packageDescription", args=args)
+
         ## if(is.null(auto)): Use default auto-citation if no CITATION
         ## available.
         citfile <- file.path(dir, "CITATION")
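
The logic of the patch can also be tried outside of citation(). The following is a minimal sketch under stated assumptions, not the committed fix: read_description() is a hypothetical helper name, and it relies only on the documented 'fields' and 'encoding' arguments of utils::packageDescription().

```r
## Hypothetical helper mirroring the patch: read a package's DESCRIPTION
## and, if it declares an Encoding, ask for that encoding rather than
## the encoding of the current locale.
read_description <- function(package, lib.loc = NULL) {
    ## Ask only for the declared Encoding field; NA if the field is absent.
    enc <- utils::packageDescription(package, lib.loc = lib.loc,
                                     fields = "Encoding")
    if (is.na(enc)) {
        ## No declared encoding: fall back to the default behaviour.
        utils::packageDescription(package, lib.loc = lib.loc)
    } else {
        ## Re-encode to the declared encoding (e.g. "UTF-8").
        utils::packageDescription(package, lib.loc = lib.loc,
                                  encoding = enc)
    }
}

meta <- read_description("utils")
meta$Package
```

Because the target encoding equals the declared one, no lossy conversion to the locale's code page takes place, which is presumably why the Author field survives in the Chinese and Arabic locales.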


Nathan says he can look into this further next week...

Cheers,

Rich Calaway
Microsoft R Product Team
24/1341
+1 (425) 4219919 X19919

-----Original Message-----
From: R-devel [mailto:[hidden email]] On Behalf Of Nathan Sosnovske via R-devel
Sent: Friday, June 23, 2017 7:36 AM
To: Duncan Murdoch <[hidden email]>; Andrie de Vries <[hidden email]>
Cc: [hidden email]; Ben Marwick <[hidden email]>
Subject: Re: [Rd] suggestion to fix packageDescription() for Windows users

Hi Duncan,

I'm guessing I'll be able to look at this over the weekend/next week (probably closer to next week). It is on my list of things to do and I've just had a few other prior commitments that I have to finish first.

Sorry for the delay. I'll chime in with a status update next week.

Nathan

-----Original Message-----
From: R-devel [mailto:[hidden email]] On Behalf Of Duncan Murdoch
Sent: Friday, June 23, 2017 5:16 AM
To: Andrie de Vries <[hidden email]>
Cc: [hidden email]; Ben Marwick <[hidden email]>
Subject: Re: [Rd] suggestion to fix packageDescription() for Windows users

On 18/06/2017 5:57 AM, Andrie de Vries wrote:
> Hi, Duncan
>
> I have forwarded this thread to Nathan, who promised to look into it.

Any progress on this?

Duncan Murdoch

>
> Andrie
>
> On 17 Jun 2017 17:26, "Duncan Murdoch" <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     On 17/06/2017 9:13 AM, Ben Marwick wrote:
>
>         Hi Duncan,
>
>         Thanks for your reply. Yes, it does seem to be specific to the CTYPE
>         setting to Chinese on Windows. If I set it to English using
>         Sys.setlocale() there is no problem, then back to Chinese and the
>         authors disappear:
>
>         Sys.setlocale("LC_ALL","English")
>         citation("readr")
>
>
>     Thanks, that makes the problem reproducible.  I'll submit it as a
>     bug report.  Maybe someone from Microsoft will fix it.
>
>     Duncan Murdoch
>
>
>         #' To cite package ‘readr’ in publications use:
>         #'
>         #'   Hadley Wickham, Jim Hester and Romain Francois (2017).
>         readr: Read
>         #' Rectangular Text Data. R package version 1.1.1.
>         #' https://CRAN.R-project.org/package=readr
>         #'
>         #' A BibTeX entry for LaTeX users is
>         #'
>         #' @Manual{,
>         #'   title = {readr: Read Rectangular Text Data},
>         #'   author = {Hadley Wickham and Jim Hester and Romain Francois},
>         #'   year = {2017},
>         #'   note = {R package version 1.1.1},
>         #'   url = {https://CRAN.R-project.org/package=readr},
>         #' }
>
>
>         Sys.setlocale("LC_CTYPE", "Chinese")
>         citation("readr")
>
>         #'
>         #' To cite package ‘readr’ in publications use:
>         #'
>         #'   (2017). readr: Read Rectangular Text Data. R package
>         version 1.1.1.
>         #' https://CRAN.R-project.org/package=readr
>         #'
>         #' A BibTeX entry for LaTeX users is
>         #'
>         #' @Manual{,
>         #'   title = {readr: Read Rectangular Text Data},
>         #'   year = {2017},
>         #'   note = {R package version 1.1.1},
>         #'   url = {https://CRAN.R-project.org/package=readr},
>         #' }
>         #'
>         #' ATTENTION: This citation information has been auto-generated
>         from the
>         #' package DESCRIPTION file and may need manual editing, see
>         #' ‘help("citation")’.
>
>         Where do we go from here? I do want to use the Chinese locale
>         with R on
>         Windows (and perhaps others do too), so switching the locale
>         isn't a fix.
>
>         Thanks,
>
>         Ben
>
>         On 17/06/2017 10:36 PM, Duncan Murdoch wrote:
>
>             On 17/06/2017 7:10 AM, Ben Marwick wrote:
>
>                 Recently I was trying to cite a package where the
>                 authors have ä
>                 and ø in their names. I found that on Windows the
>                 citation() function
>                 did not return the authors' names at all, but on Linux
>                 there was no
>                 problem (sessionInfos at the bottom):
>
>                 On Windows, no author names are returned:
>
>
>             I'm not seeing this.  You have fairly strange localization
>             settings; see
>             comments below.
>
>
>                 #---------------
>
>                  > citation("readr")
>
>                 To cite package ‘readr’ in publications use:
>
>                    (2017). readr: Read Rectangular Text Data. R package
>                 version 1.1.1.
>                    
>                    https://CRAN.R-project.org/package=readr
>
>                 A BibTeX entry for LaTeX users is
>
>                    @Manual{,
>                      title = {readr: Read Rectangular Text Data},
>                      year = {2017},
>                      note = {R package version 1.1.1},
>                      url = {https://CRAN.R-project.org/package=readr},
>                    }
>
>                 ATTENTION: This citation information has been
>                 auto-generated from the
>                 package DESCRIPTION file and may need manual editing, see
>                 ‘help("citation")’.
>                 #---------------
>
>                 On Linux we do see the author names:
>
>                 #---------------
>                  > citation("readr")
>
>                 To cite package ‘readr’ in publications use:
>
>                    Hadley Wickham, Jim Hester and Romain Francois
>                 (2017). readr:
>                    Read Rectangular Text Data. R package version 1.1.1.
>                    
>                    https://CRAN.R-project.org/package=readr
>
>                 A BibTeX entry for LaTeX users is
>
>                    @Manual{,
>                      title = {readr: Read Rectangular Text Data},
>                      author = {Hadley Wickham and Jim Hester and Romain
>                 Francois},
>                      year = {2017},
>                      note = {R package version 1.1.1},
>                      url = {https://CRAN.R-project.org/package=readr},
>                    }
>                 #---------------
>
>                 This appears to be an OS-dependent encoding issue. The
>                 citation function
>                 does not take an encoding argument, so it's not possible
>                 to set the
>                 encoding at the point where that function is used. The
>                 citation function
>                 works with the packageDescription function, which does
>                 have an
>                 encoding argument, but the default is not useful for
>                 Windows when there
>                 is an encoding set in the DESCRIPTION of the package (in
>                 this case
>                 UTF-8).
>
>                 We can set the encoding argument in packageDescription
>                 so it works in
>                 Windows to give the authors as expected, but it is very
>                 inconvenient to
>                 generate citations directly from the output of this
>                 function. So I'd
>                 like to propose a solution to this problem by changing one
>                 line in the
>                 packageDescription function, like so, from:
>
>                 #---------------
>                 if (missing(encoding) && Sys.getlocale("LC_CTYPE") == "C")
>                 #---------------
>
>                 to:
>
>                 #---------------
>                 if ((missing(encoding) && Sys.getlocale("LC_CTYPE") ==
>                 "C") |
>                 unname(Sys.info()['sysname']) == "Windows")
>                 #---------------
>
>                 If I understand correctly, that will force
>                 ASCII//TRANSLIT encoding when
>                 DESCRIPTION files are read by packageDescription() on
>                 Windows machines.
>                 The upside is that Windows users will get the authors in
>                 the package
>                 citation, unlike the current situation. The downside is
>                 that the exotic
>                 symbols in the authors' names are replaced with common
>                 ones that are
>                 similar.
>
>                 I think getting the citations to easily include the
>                 authors' names is
>                 pretty important, even if their names have exotic
>                 characters, so this is
>                 worth fixing. Is this edit to packageDescription the
>                 best way to solve
>                 this problem of exotic characters preventing the
>                 authors' names from
>                 showing on Windows?
>
>                 thanks,
>
>                 Ben
>
>
>
>
>                 Windows sessionInfo
>
>                 #---------------
>                  > sessionInfo()
>                 R version 3.4.0 Patched (2017-05-10 r72670)
>                 Platform: x86_64-w64-mingw32/x64 (64-bit)
>                 Running under: Windows 7 x64 (build 7601) Service Pack 1
>
>                 Matrix products: default
>
>                 locale:
>                 [1] LC_COLLATE=English_Australia.1252
>                 [2] LC_CTYPE=Chinese (Simplified)_People's Republic of
>                 China.936
>                 [3] LC_MONETARY=English_Australia.1252
>                 [4] LC_NUMERIC=C
>                 [5] LC_TIME=English_Australia.1252
>
>
>             I don't know what English_Australia.1252 does that's
>             different from what
>             I use (English_Canada.1252), but the Chinese locale setting
>             could cause
>             trouble.  Could you try setting this (presumably in the
>             Windows control
>             panel) to be consistent?  You're using a much simpler
>             setting on Linux.
>
>             Duncan Murdoch
>
>
>                 attached base packages:
>                 [1] stats     graphics  grDevices utils     datasets
>                 methods   base
>
>                 loaded via a namespace (and not attached):
>                   [1] readr_1.1.1    compiler_3.4.0 R6_2.2.1       hms_0.3
>                 tools_3.4.0
>                   [6] tibble_1.3.3   yaml_2.1.14    Rcpp_0.12.11
>                  knitr_1.16
>                 rlang_0.1.1
>                 [11] fortunes_1.5-4
>                 #---------------
>
>                 Linux sessionInfo:
>
>                 #---------------
>                  > sessionInfo()
>                 R version 3.3.1 (2016-06-21)
>                 Platform: x86_64-pc-linux-gnu (64-bit)
>                 Running under: Ubuntu 16.10
>
>                 locale:
>                   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>                   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>                   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>                   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>                   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>                 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
>                 attached base packages:
>                 [1] stats     graphics  grDevices utils     datasets
>                 methods   base
>
>                 loaded via a namespace (and not attached):
>                 [1] tools_3.3.1 yaml_2.1.14 knitr_1.16
>                 #---------------
>
>                 ______________________________________________
>                 [hidden email] <mailto:[hidden email]>
>                 mailing list
>                
>                 https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
>
>         ______________________________________________
>         [hidden email] <mailto:[hidden email]> mailing list
>        
>         https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: suggestion to fix packageDescription() for Windows users

Martin Maechler
>>>>> Nathan Sosnovske via R-devel <[hidden email]>
>>>>>     on Mon, 26 Jun 2017 18:22:25 +0000 writes:

    > I'd be curious to know what others think of Rich's
    > patch. If it is acceptable, I can spend the time I had
    > set aside for this issue this week on another bug.

It is a bit kludgy (*) of course, but I confirm it solves the
problem in a "robust" way.

*) Of course I'd hoped you'd find why the underlying
packageDescription() function is not "getting the right thing" in this
case directly -- in Windows only in some locales -- and provide a
Windows-only patch for the underlying problem there, rather than
the workaround patch in citation().
The patch does solve the problem at hand, alright, so thank you,
Rich and Nathan!

Note that Duncan Murdoch mentioned in this thread that he would file
an official bug report, and Ben Marwick gave the URL:

> From: Ben Marwick ...
> Subject: Re: [Rd] suggestion to fix packageDescription() for Windows users
> Date: Sun, 18 Jun 2017 08:34:56 +1000

> Thanks very much, I see your bug report here:
> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17291

so ideally almost all of this follow-up should have happened there.
I have followed up there and attached a Windows-only, commented
version of Rich's patch. As mentioned, I've tested it and confirmed
that it works for this use case, so I plan to commit it soon.

This will be too late for the release of R 3.4.1 tomorrow,
of course [code freeze was on June 23].

Martin Maechler
ETH Zurich

    > -----Original Message-----
    > From: Rich Calaway
    > Sent: Friday, June 23, 2017 6:34 PM
    > To: Nathan Sosnovske <[hidden email]>; Duncan Murdoch <[hidden email]>; Andrie de Vries <[hidden email]>
    > Cc: Ben Marwick <[hidden email]>; R-devel Mailing List ([hidden email]) <[hidden email]>
    > Subject: RE: [Rd] suggestion to fix packageDescription() for Windows users

    > The following patch is not the most elegant, but it restores the Authors when "LC_CTYPE" is set to either "Chinese" or "Arabic":

    >> Sys.setlocale("LC_CTYPE", "Chinese")
    > [1] "Chinese (Simplified)_China.936"
    >> citation("readr")

    > To cite package ‘readr’ in publications use:

    > (2016). readr: Read Tabular Data. R package version 1.0.0.
    > https://CRAN.R-project.org/package=readr

    > A BibTeX entry for LaTeX users is

    > @Manual{,
    > title = {readr: Read Tabular Data},
    > year = {2016},
    > note = {R package version 1.0.0},
    > url = {https://CRAN.R-project.org/package=readr},
    > }

    > ATTENTION: This citation information has been auto-generated from the package DESCRIPTION file and may need manual editing, see ‘help("citation")’.

    >> Sys.setlocale("LC_CTYPE", "Arabic")
    > [1] "Arabic_Saudi Arabia.1256"
    >> citation("readr")

    > To cite package ‘readr’ in publications use:

    > (2016). readr: Read Tabular Data. R package version 1.0.0.
    > https://CRAN.R-project.org/package=readr

    > A BibTeX entry for LaTeX users is

    > @Manual{,
    > title = {readr: Read Tabular Data},
    > year = {2016},
    > note = {R package version 1.0.0},
    > url = {https://CRAN.R-project.org/package=readr},
    > }

    > ATTENTION: This citation information has been auto-generated from the package DESCRIPTION file and may need manual editing, see ‘help("citation")’.

    >> citation <- newCitation
    >> citation("readr")

    > To cite package ‘readr’ in publications use:

    > Hadley Wickham, Jim Hester and Romain Francois (2016). readr: Read
    > Tabular Data. R package version 1.0.0.
    > https://CRAN.R-project.org/package=readr

    > A BibTeX entry for LaTeX users is

    > @Manual{,
    > title = {readr: Read Tabular Data},
    > author = {Hadley Wickham and Jim Hester and Romain Francois},
    > year = {2016},
    > note = {R package version 1.0.0},
    > url = {https://CRAN.R-project.org/package=readr},
    > }



    > The patch is:

    > Index: citation.R
    > ===================================================================
    > --- citation.R (revision 72852)
    > +++ citation.R (working copy)
    > @@ -1162,8 +1162,11 @@
    > if(dir == "")
    > stop(gettextf("package %s not found", sQuote(package)),
    > domain = NA)
    > -        meta <- packageDescription(pkg = package,
    > -                                   lib.loc = dirname(dir))
    > +    args <- list(pkg = package, lib.loc = dirname(dir))
    > +    if (!is.na(enc <- packageDescription(pkg = package, lib.loc=dirname(dir), field="Encoding")))
    > +    args$enc <- enc
    > +        meta <- do.call("packageDescription", args=args)
    > +
    > ## if(is.null(auto)): Use default auto-citation if no CITATION
    > ## available.
    > citfile <- file.path(dir, "CITATION")


    > Nathan says he can look into this further next week...

    > Cheers,

    > Rich Calaway
    > Microsoft R Product Team
    > 24/1341
    > +1 (425) 4219919 X19919

    > -----Original Message-----
    > From: R-devel [mailto:[hidden email]] On Behalf Of Nathan Sosnovske via R-devel
    > Sent: Friday, June 23, 2017 7:36 AM
    > To: Duncan Murdoch <[hidden email]>; Andrie de Vries <[hidden email]>
    > Cc: [hidden email]; Ben Marwick <[hidden email]>
    > Subject: Re: [Rd] suggestion to fix packageDescription() for Windows users

    > Hi Duncan,

    > I'm guessing I'll be able to look at this over the weekend/next week (probably closer to next week). It is on my list of things to do and I've just had a few other prior commitments that I have to finish first.

    > Sorry for the delay. I'll chime in with a status update next week.

    > Nathan

    > -----Original Message-----
    > From: R-devel [mailto:[hidden email]] On Behalf Of Duncan Murdoch
    > Sent: Friday, June 23, 2017 5:16 AM
    > To: Andrie de Vries <[hidden email]>
    > Cc: [hidden email]; Ben Marwick <[hidden email]>
    > Subject: Re: [Rd] suggestion to fix packageDescription() for Windows users

    > On 18/06/2017 5:57 AM, Andrie de Vries wrote:
    >> Hi, Duncan
    >>
    >> I have forwarded this thread to Nathan, who promised to look into it.

    > Any progress on this?

    > Duncan Murdoch

    >>
    >> Andrie
    >>
    >> On 17 Jun 2017 17:26, "Duncan Murdoch" <[hidden email]
    >> <mailto:[hidden email]>> wrote:
    >>
    >> On 17/06/2017 9:13 AM, Ben Marwick wrote:
    >>
    >> Hi Duncan,
    >>
    >> Thanks for your reply. Yes, it does seem to be specific to the CTYPE
    >> setting to Chinese on Windows. If I set it to English using
    >> Sys.setlocale() there is no problem, then back to Chinese and the
    >> authors disappear:
    >>
    >> Sys.setlocale("LC_ALL","English")
    >> citation("readr")
    >>
    >>
    >> Thanks, that makes the problem reproducible.  I'll submit it as a
    >> bug report.  Maybe someone from Microsoft will fix it.
    >>
    >> Duncan Murdoch

    [.........]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Windows iconv() "failure" in certain locales

Martin Maechler
This is a continuation of the R-devel thread with subject
 "suggestion to fix packageDescription() for Windows users" :

As I said there, a patch should address the underlying
problem in packageDescription() rather than apply a kludgy
workaround to citation().
(For that same reason, Ben Marwick proposed to fix
 packageDescription() rather than the symptom seen in citation().)

It's not hard to see that the problem is that  iconv() on
Windows does not always succeed in translating from "UTF-8" to the
"current locale", in the case mentioned there.

I'm giving some easier reproducible examples:  no need to install
half of tidyverse just to get citation("readr") :

> x1 <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> Encoding(x1) <- "latin1"
> xU <- iconv(x1, "latin1", "UTF-8")

> Sys.setlocale("LC_CTYPE", "Chinese")
[1] "Chinese (Simplified)_People's Republic of China.936"
>
> iconv(x1, "latin1", "") # NA NA NA
[1] NA NA NA
> iconv(xU, "UTF-8", "") # NA NA NA
[1] NA NA NA
> iconv(xU, "UTF-8", "//TRANSLIT")
[1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"
> iconv(xU, "UTF-8", "", sub = "byte")
[1] "Ekstr<c3><b8>m"         "J<c3><b6>reskog"        "bi<c3><9f>chen Z¨¹rcher"


> Sys.setlocale("LC_CTYPE", "Arabic")
[1] "Arabic_Saudi Arabia.1256"
> iconv(x1, "latin1", "")  # NA NA NA
[1] NA NA NA
> iconv(xU, "UTF-8", "")  # NA NA NA
[1] NA NA NA
> iconv(xU, "UTF-8", "//TRANSLIT")
[1] "Ekstr\370m"         "J\366reskog"        "bißchen Zürcher"
> iconv(xU, "UTF-8", "", sub="byte")
[1] "Ekstr<c3><b8>m"         "J<c3><b6>reskog"        "bi<c3><9f>chen Zürcher"
> iconv(xU, "UTF-8", "", sub="?")
[1] "Ekstr??m"         "J??reskog"        "bi??chen Zürcher"

Etc... .  As the above is typically garbled between e-mail
transfer agents, I append both the iconv-Windows.R R script and
the corresponding iconv-Windows.Rout  R transcript to this
e-mail (using MIME type text/plain (easy using emacs for mail..)),
and they contain a bit more than the above.
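
The three behaviours (NA, byte substitution, "?" substitution) can also be seen without changing the locale, by converting to a target encoding that cannot represent the characters. This is only an illustrative sketch: using "ASCII" as the target is an assumption made here for portability and is not what packageDescription() does, and the variable names are invented for the example.

```r
## A UTF-8 string with one non-ASCII character (o with diaeresis).
x <- "J\u00f6reskog"

## Plain conversion fails, so iconv() returns NA for this element.
plain <- iconv(x, "UTF-8", "ASCII")

## sub = "byte" replaces every non-convertible byte by its hex code <xx>.
as_bytes <- iconv(x, "UTF-8", "ASCII", sub = "byte")

## sub = "?" replaces every non-convertible byte by a question mark.
as_marks <- iconv(x, "UTF-8", "ASCII", sub = "?")

print(plain)
print(as_bytes)
print(as_marks)
```

The character occupies two bytes in UTF-8, which is why the substitutions insert two placeholders, matching the "J??reskog" output in the Arabic locale above.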

Note that the above shows that using 'sub = *' or
"//TRANSLIT" after a previous NA  result helps quite a bit,
in the sense that it is much more informative to see
  "J?reskog"  instead of   NA.

I'm considering updating  packageDescription() to try these in
case it first returns NA.   This would make the citation() hack
unnecessary.
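
The retry idea can be sketched as follows. This is a hedged illustration only, not the change that went into R: iconv_fallback() is a hypothetical helper name, and the order of the fallbacks (first "//TRANSLIT", then sub = "?") is an assumption made for the example.

```r
## Hypothetical helper (not part of base R): convert 'x' from 'from' to
## 'to', retrying with progressively lossier settings wherever iconv()
## returned NA for a non-missing input.
iconv_fallback <- function(x, from = "UTF-8", to = "") {
    res <- iconv(x, from, to)
    bad <- is.na(res) & !is.na(x)
    if (any(bad)) {
        ## Second attempt: ask iconv to transliterate.
        res[bad] <- iconv(x[bad], from, paste0(to, "//TRANSLIT"))
        bad <- is.na(res) & !is.na(x)
    }
    if (any(bad)) {
        ## Last resort: substitute "?" for each non-convertible byte.
        res[bad] <- iconv(x[bad], from, to, sub = "?")
    }
    res
}

## Forcing an ASCII target makes the fallbacks kick in on any platform.
iconv_fallback(c("Ekstr\u00f8m", "J\u00f6reskog"), "UTF-8", "ASCII")
```

What the transliteration step produces depends on the platform's iconv implementation, so only the last step has a predictable result; the point is that a name with placeholders is returned rather than NA.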

Martin


#### iconv() behavior depending on Locales  LC_CTYPE  in Windows
#### =======                       ==============================
###
### In a *shell* in Windows (emacs), after doing R.home() in R, use that to do something like
###   c:/PROGRA~1/R/R-devel/bin/R CMD BATCH iconv-Windows.R
###   ^^^^^^^^^^^^^^^^^^^^^^^^^^= === ===== ===============  ==> producing  iconv-Windows.Rout
###
sessionInfo() ## does not matter so much
## -- should be Windows to exhibit the problems

## From  help(iconv) 's  example : Using "latin1" European language letters:
x1 <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
Encoding(x1) <- "latin1"
xU <- iconv(x1, "latin1", "UTF-8")


## 2 locales that do not work well : ---------------------------------
Sys.setlocale("LC_CTYPE", "Chinese")

iconv(x1, "latin1", "") # NA NA NA
iconv(x1, "latin1", "//TRANSLIT") # perfect for Chinese
iconv(x1, "latin1", "", sub = "byte")
iconv(xU, "UTF-8", "") # NA NA NA
iconv(xU, "UTF-8", "//TRANSLIT")
iconv(xU, "UTF-8", "", sub = "byte")
##--
Sys.setlocale("LC_CTYPE", "Arabic")
iconv(x1, "latin1", "")  # NA NA NA
iconv(x1, "latin1", "//TRANSLIT") # not bad, but not perfect
iconv(x1, "latin1", "", sub="byte")
iconv(x1, "latin1", "", sub="?")
iconv(xU, "UTF-8", "")  # NA NA NA
iconv(xU, "UTF-8", "//TRANSLIT")
iconv(xU, "UTF-8", "", sub="byte")
iconv(xU, "UTF-8", "", sub="?")

## 2 locales that work well for these examples (no wonder) -----------

Sys.setlocale("LC_CTYPE", "German_Switzerland")
iconv(x1, "latin1", "")
iconv(x1, "latin1", "//TRANSLIT")
iconv(x1, "latin1", "", sub="?")
iconv(xU, "UTF-8", "")
iconv(xU, "UTF-8", "//TRANSLIT")
iconv(xU, "UTF-8", "", sub="?")
##--
Sys.setlocale("LC_CTYPE", "English")
iconv(x1, "latin1", "")
iconv(x1, "latin1", "//TRANSLIT")
iconv(x1, "latin1", "", sub="?")
iconv(xU, "UTF-8", "")
iconv(xU, "UTF-8", "//TRANSLIT")
iconv(xU, "UTF-8", "", sub="?")


R Under development (unstable) (2017-06-25 r72854) -- "Unsuffered Consequences"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> #### iconv() behavior depending on Locales  LC_CTYPE  in Windows
> #### =======                       ==============================
> ###
> ### In a *shell* in Windows (emacs), after doing R.home() in R, use that to do something like
> ###   c:/PROGRA~1/R/R-devel/bin/R CMD BATCH iconv-Windows.R
> ###   ^^^^^^^^^^^^^^^^^^^^^^^^^^= === ===== ===============  ==> producing  iconv-Windows.Rout
> ###
> sessionInfo() ## does not matter so much
R Under development (unstable) (2017-06-25 r72854)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2008 R2 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252  
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

loaded via a namespace (and not attached):
[1] compiler_3.5.0

> ## -- should be Windows to exhibit the problems
>
> ## From  help(iconv) 's  example : Using "latin1" European language letters:
> x1 <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> Encoding(x1) <- "latin1"
> xU <- iconv(x1, "latin1", "UTF-8")
>
>
> ## 2 locales that do not work well : ---------------------------------
> Sys.setlocale("LC_CTYPE", "Chinese")
[1] "Chinese (Simplified)_People's Republic of China.936"
>
> iconv(x1, "latin1", "") # NA NA NA
[1] NA NA NA
> iconv(x1, "latin1", "//TRANSLIT") # perfect for Chinese
[1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"
> iconv(x1, "latin1", "", sub = "byte")
[1] "Ekstr<f8>m"         "J<f6>reskog"        "bi<df>chen Z¨¹rcher"
> iconv(xU, "UTF-8", "") # NA NA NA
[1] NA NA NA
> iconv(xU, "UTF-8", "//TRANSLIT")
[1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"
> iconv(xU, "UTF-8", "", sub = "byte")
[1] "Ekstr<c3><b8>m"         "J<c3><b6>reskog"        "bi<c3><9f>chen Z¨¹rcher"
> ##--
> Sys.setlocale("LC_CTYPE", "Arabic")
[1] "Arabic_Saudi Arabia.1256"
> iconv(x1, "latin1", "")  # NA NA NA
[1] NA NA NA
> iconv(x1, "latin1", "//TRANSLIT") # not bad, but not perfect
[1] "Ekstr\370m"         "J\366reskog"        "bißchen Zürcher"
> iconv(x1, "latin1", "", sub="byte")
[1] "Ekstr<f8>m"         "J<f6>reskog"        "bi<df>chen Zürcher"
> iconv(x1, "latin1", "", sub="?")
[1] "Ekstr?m"         "J?reskog"        "bi?chen Zürcher"
> iconv(xU, "UTF-8", "")  # NA NA NA
[1] NA NA NA
> iconv(xU, "UTF-8", "//TRANSLIT")
[1] "Ekstr\370m"         "J\366reskog"        "bißchen Zürcher"
> iconv(xU, "UTF-8", "", sub="byte")
[1] "Ekstr<c3><b8>m"         "J<c3><b6>reskog"        "bi<c3><9f>chen Zürcher"
> iconv(xU, "UTF-8", "", sub="?")
[1] "Ekstr??m"         "J??reskog"        "bi??chen Zürcher"
>
> ## 2 locales that work well for these examples (no wonder) -----------
>
> Sys.setlocale("LC_CTYPE", "German_Switzerland")
[1] "German_Switzerland.1252"
> iconv(x1, "latin1", "")
[1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"
> iconv(x1, "latin1", "//TRANSLIT")
[1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"
> iconv(x1, "latin1", "", sub="?")
[1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"
> iconv(xU, "UTF-8", "")
[1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"
> iconv(xU, "UTF-8", "//TRANSLIT")
[1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"
> iconv(xU, "UTF-8", "", sub="?")
[1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"
> ##--
> Sys.setlocale("LC_CTYPE", "English")
[1] "English_United States.1252"
> iconv(x1, "latin1", "")
[1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"
> iconv(x1, "latin1", "//TRANSLIT")
[1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"
> iconv(x1, "latin1", "", sub="?")
[1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"
> iconv(xU, "UTF-8", "")
[1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"
> iconv(xU, "UTF-8", "//TRANSLIT")
[1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"
> iconv(xU, "UTF-8", "", sub="?")
[1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"
>
> proc.time()
   user  system elapsed
   0.18    0.14    0.98

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: suggestion to fix packageDescription() for Windows users

R devel mailing list
In reply to this post by Martin Maechler
> *) Of course I'd hoped you'd find why the underlying
> packageDescription() function is not "getting the right thing" in this case directly -- in Windows only in some locales -- and provide a Windows-only patch for the underlying problem there, rather than the workaround patch in citation().
> The patch does solve the problem at hand, alright, so thank you, Rich and Nathan!

This makes sense. I was asking if we wanted to proceed with the workaround-esque patch Rich suggested because I was trying to understand what was more useful to the community: diving deeper into the issue to see what was going on, or spending that time on another issue that was symptomatic. Either way, I think it makes sense to fix the underlying issue at some point and I do have time set aside this week to do so.

> so ideally almost all of this follow up should have happened there.

Apologies about this. I'm still learning and appreciate the gentle correction. 😊

-----Original Message-----
From: Martin Maechler [mailto:[hidden email]]
Sent: Tuesday, June 27, 2017 3:34 AM
To: Nathan Sosnovske <[hidden email]>
Cc: Rich Calaway <[hidden email]>; Duncan Murdoch <[hidden email]>; Andrie de Vries <[hidden email]>; R-devel Mailing List ([hidden email]) <[hidden email]>; Ben Marwick <[hidden email]>; Martin Maechler <[hidden email]>
Subject: Re: [Rd] suggestion to fix packageDescription() for Windows users

>>>>> Nathan Sosnovske via R-devel <[hidden email]>
>>>>>     on Mon, 26 Jun 2017 18:22:25 +0000 writes:

    > I'd be curious to know what others think of Rich's
    > patch. If it is acceptable, I can spend time that I was
    > going to look at it this week on another bug.

It is a bit kludgy (*) of course, but I confirm it solves the problem in a "robust" way.

*) Of course I'd hoped you'd find why the underlying
packageDescription() function is not "getting the right thing" in this case directly -- in Windows only in some locales -- and provide a Windows-only patch for the underlying problem there, rather than the workaround patch in citation().
The patch does solve the problem at hand, alright, so thank you, Rich and Nathan!

Note that Duncan Murdoch suggested in this thread that an official bug report be filed, and Ben Marwick gave the URL:

> From: Ben Marwick ...
> Subject: Re: [Rd] suggestion to fix packageDescription() for Windows
> users
> Date: Sun, 18 Jun 2017 08:34:56 +1000

> Thanks very much, I see your bug report here:
> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17291

so ideally almost all of this follow up should have happened there.
I have followed up there and also attached a Windows-only,
commented version of Rich's patch there. As mentioned, I've tested
it and confirmed that it works for the use case, so I plan to commit it soon.

This will be too late for the release of R 3.4.1 tomorrow, of course [code freeze was on June 23].
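
For anyone reading along: the lookup that Rich's patch (quoted further down) performs before calling packageDescription() can be tried on its own. What follows is only a rough sketch under stated assumptions, not the patch. The DESCRIPTION content, the package name "demo" and the author names are made up for illustration. It reads the declared Encoding field with read.dcf() and uses it to mark the Author string, rather than converting that string to the encoding of the current locale.

```r
## Made-up DESCRIPTION for illustration only; 'demo' is not a real package.
desc <- c("Package: demo",
          "Version: 0.0.1",
          "Author: A. Ekstr\u00f8m and B. J\u00f6reskog",
          "Encoding: UTF-8")
f <- tempfile("DESCRIPTION")
writeLines(enc2utf8(desc), f, useBytes = TRUE)   # write the UTF-8 bytes unchanged

## Asking for the fields by name gives NA for a field that is absent
meta   <- read.dcf(f, fields = c("Author", "Encoding"))
enc    <- unname(meta[1, "Encoding"])
author <- unname(meta[1, "Author"])

## Declare what the bytes are instead of re-encoding them for the locale
if (!is.na(enc) && enc %in% c("UTF-8", "latin1"))
    Encoding(author) <- enc

Encoding(author)
```

The point of the sketch is that nothing is converted, so nothing can turn into NA, whatever LC_CTYPE is set to.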

Martin Maechler
ETH Zurich

    > -----Original Message-----
    > From: Rich Calaway
    > Sent: Friday, June 23, 2017 6:34 PM
    > To: Nathan Sosnovske <[hidden email]>; Duncan Murdoch <[hidden email]>; Andrie de Vries <[hidden email]>
    > Cc: Ben Marwick <[hidden email]>; R-devel Mailing List ([hidden email]) <[hidden email]>
    > Subject: RE: [Rd] suggestion to fix packageDescription() for Windows users

    > The following patch is not the most elegant, but it restores the Authors when "LC_CTYPE" is set to either "Chinese" or "Arabic":

    >> Sys.setlocale("LC_CTYPE", "Chinese")
    > [1] "Chinese (Simplified)_China.936"
    >> citation("readr")

    > To cite package ‘readr’ in publications use:

    > (2016). readr: Read Tabular Data. R package version 1.0.0.
    > https://CRAN.R-project.org/package=readr

    > A BibTeX entry for LaTeX users is

    > @Manual{,
    > title = {readr: Read Tabular Data},
    > year = {2016},
    > note = {R package version 1.0.0},
    > url = {https://CRAN.R-project.org/package=readr},
    > }

    > ATTENTION: This citation information has been auto-generated from the package DESCRIPTION file and may need manual editing, see ‘help("citation")’.

    >> Sys.setlocale("LC_CTYPE", "Arabic")
    > [1] "Arabic_Saudi Arabia.1256"
    >> citation("readr")

    > To cite package ‘readr’ in publications use:

    > (2016). readr: Read Tabular Data. R package version 1.0.0.
    > https://CRAN.R-project.org/package=readr

    > A BibTeX entry for LaTeX users is

    > @Manual{,
    > title = {readr: Read Tabular Data},
    > year = {2016},
    > note = {R package version 1.0.0},
    > url = {https://CRAN.R-project.org/package=readr},
    > }

    > ATTENTION: This citation information has been auto-generated from the package DESCRIPTION file and may need manual editing, see ‘help("citation")’.

    >> citation <- newCitation
    >> citation("readr")

    > To cite package ‘readr’ in publications use:

    > Hadley Wickham, Jim Hester and Romain Francois (2016). readr: Read
    > Tabular Data. R package version 1.0.0.
    > https://CRAN.R-project.org/package=readr

    > A BibTeX entry for LaTeX users is

    > @Manual{,
    > title = {readr: Read Tabular Data},
    > author = {Hadley Wickham and Jim Hester and Romain Francois},
    > year = {2016},
    > note = {R package version 1.0.0},
    > url = {https://CRAN.R-project.org/package=readr},
    > }



    > The patch is:

    > Index: citation.R
    > ===================================================================
    > --- citation.R (revision 72852)
    > +++ citation.R (working copy)
    > @@ -1162,8 +1162,11 @@
    >          if(dir == "")
    >              stop(gettextf("package %s not found", sQuote(package)),
    >                   domain = NA)
    > -        meta <- packageDescription(pkg = package,
    > -                                   lib.loc = dirname(dir))
    > +        args <- list(pkg = package, lib.loc = dirname(dir))
    > +        if (!is.na(enc <- packageDescription(pkg = package, lib.loc=dirname(dir), field="Encoding")))
    > +            args$enc <- enc
    > +        meta <- do.call("packageDescription", args=args)
    > +
    >          ## if(is.null(auto)): Use default auto-citation if no CITATION
    >          ## available.
    >          citfile <- file.path(dir, "CITATION")


    > Nathan says he can look into this further next week...

    > Cheers,

    > Rich Calaway
    > Microsoft R Product Team
    > 24/1341
    > +1 (425) 4219919 X19919

    > -----Original Message-----
    > From: R-devel [mailto:[hidden email]] On Behalf Of Nathan Sosnovske via R-devel
    > Sent: Friday, June 23, 2017 7:36 AM
    > To: Duncan Murdoch <[hidden email]>; Andrie de Vries <[hidden email]>
    > Cc: [hidden email]; Ben Marwick <[hidden email]>
    > Subject: Re: [Rd] suggestion to fix packageDescription() for Windows users

    > Hi Duncan,

    > I'm guessing I'll be able to look at this over the weekend/next week (probably closer to next week). It is on my list of things to do and I've just had a few other prior commitments that I have to finish first.

    > Sorry for the delay. I'll chime in with a status update next week.

    > Nathan

    > -----Original Message-----
    > From: R-devel [mailto:[hidden email]] On Behalf Of Duncan Murdoch
    > Sent: Friday, June 23, 2017 5:16 AM
    > To: Andrie de Vries <[hidden email]>
    > Cc: [hidden email]; Ben Marwick <[hidden email]>
    > Subject: Re: [Rd] suggestion to fix packageDescription() for Windows users

    > On 18/06/2017 5:57 AM, Andrie de Vries wrote:
    >> Hi, Duncan
    >>
    >> I have forwarded this thread to Nathan, who promised to look into it.

    > Any progress on this?

    > Duncan Murdoch

    >>
    >> Andrie
    >>
    >> On 17 Jun 2017 17:26, "Duncan Murdoch" <[hidden email]
    >> <mailto:[hidden email]>> wrote:
    >>
    >> On 17/06/2017 9:13 AM, Ben Marwick wrote:
    >>
    >> Hi Duncan,
    >>
    >> Thanks for your reply. Yes, it does seem to be specific to the CTYPE
    >> setting to Chinese on Windows. If I set it to English using
    >> Sys.setlocale() there is no problem, then back to Chinese and the
    >> authors disappear:
    >>
    >> Sys.setlocale("LC_ALL","English")
    >> citation("readr")
    >>
    >>
    >> Thanks, that makes the problem reproducible.  I'll submit it as a
    >> bug report.  Maybe someone from Microsoft will fix it.
    >>
    >> Duncan Murdoch

    [.........]

Re: Windows iconv() "failure" in certain locales

Duncan Murdoch-2
In reply to this post by Martin Maechler
On 27/06/2017 11:36 AM, Martin Maechler wrote:

> This is a continuation of the R-devel thread with subject
>  "suggestion to fix packageDescription() for Windows users" :
>
> As I said there, a patch should rather address the underlying
> problem in packageDescription rather than a kludgy workaround
> patch for  citation().
> (For that same reason, Ben Marwick proposed to fix
>  packageDescription() rather than the symptom seen in citation().)
>
> It's not hard to see that the problem is that  iconv() in
> Windows does not always succeed to translate from "UTF-8" to the
> "current locale", in the case mentioned there.
>
> I'm giving some easier reproducible examples:  no need to install
> half of tidyverse just to get citation("readr") :
>
>> x1 <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
>> Encoding(x1) <- "latin1"
>> xU <- iconv(x1, "latin1", "UTF-8")
>
>> Sys.setlocale("LC_CTYPE", "Chinese")
> [1] "Chinese (Simplified)_People's Republic of China.936"
>>
>> iconv(x1, "latin1", "") # NA NA NA
> [1] NA NA NA
>> iconv(xU, "UTF-8", "") # NA NA NA
> [1] NA NA NA
>> iconv(xU, "UTF-8", "//TRANSLIT")
> [1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"
>> iconv(xU, "UTF-8", "", sub = "byte")
> [1] "Ekstr<c3><b8>m"         "J<c3><b6>reskog"        "bi<c3><9f>chen Z¨¹rcher"
>
>
>> Sys.setlocale("LC_CTYPE", "Arabic")
> [1] "Arabic_Saudi Arabia.1256"
>> iconv(x1, "latin1", "")  # NA NA NA
> [1] NA NA NA
>> iconv(xU, "UTF-8", "")  # NA NA NA
> [1] NA NA NA
>> iconv(xU, "UTF-8", "//TRANSLIT")
> [1] "Ekstr\370m"         "J\366reskog"        "bißchen Zürcher"
>> iconv(xU, "UTF-8", "", sub="byte")
> [1] "Ekstr<c3><b8>m"         "J<c3><b6>reskog"        "bi<c3><9f>chen Zürcher"
>> iconv(xU, "UTF-8", "", sub="?")
> [1] "Ekstr??m"         "J??reskog"        "bi??chen Zürcher"
>
> Etc... .  As the above is typically garbled between e-mail
> transfer agents, I append both the iconv-Windows.R R script and
> the corresponding iconv-Windows.Rout  R transcript to this
> e-mail (using MIME type text/plain (easy using emacs for mail..)),
> and they contain a bit more than the above.
>
> Note that the above shows that using 'sub = *' and using
> "//TRANSLIT" in case of a previous NA  result helps quite a bit,
> in the sense that it gives much more information to see
>   "J?reskog"  instead   NA.
>
> I'm considering updating  packageDescription() to try these in
> case it first returns NA.   This would make the citation() hack
> unnecessary.

I agree with the general sentiment (fix the underlying problem).  I
haven't traced through this one, but the usual cause of problems like
this is that we too frequently convert to the local encoding even when
that loses information.
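
To illustrate the general point (this is only a sketch, not a trace of what packageDescription() does): going between latin1 and UTF-8 is lossless and does not involve the locale at all, whereas converting to a target that lacks the letters fails. In the sketch "ASCII" stands in for such a locale; that stand-in is my assumption, chosen so the code behaves the same on any platform.

```r
x1 <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
Encoding(x1) <- "latin1"

xU   <- enc2utf8(x1)                     # latin1 -> UTF-8, locale not involved
back <- iconv(xU, "UTF-8", "latin1")     # and back again

## Compare bytes, so the check does not depend on how strings are printed
same <- vapply(seq_along(x1),
               function(i) identical(charToRaw(x1[i]), charToRaw(back[i])),
               logical(1))
all(same)

## A target without these letters: every element is lost
lossy <- iconv(xU, "UTF-8", "ASCII")
is.na(lossy)
```

Keeping the strings in UTF-8 (or latin1) until the last possible moment avoids the second situation.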

Kirill Müller and I are gradually working through internal code and
fixing these issues.  I don't know if this one will be fixed sooner or
later, but I would hope it would be fixed by 3.5.0.

So in order that we don't hide it, I'd ask you not to apply the patch in
R-devel.

Duncan Murdoch


Re: Windows iconv() "failure" in certain locales

Uwe Ligges-3
In reply to this post by Martin Maechler


On 27.06.2017 17:36, Martin Maechler wrote:

> This is a continuation of the R-devel thread with subject
>   "suggestion to fix packageDescription() for Windows users" :
>
> As I said there, a patch should rather address the underlying
> problem in packageDescription rather than a kludgy workaround
> patch for  citation().
> (For that same reason, Ben Marwick proposed to fix
>   packageDescription() rather than the symptom seen in citation().)
>
> It's not hard to see that the problem is that  iconv() in
> Windows does not always succeed to translate from "UTF-8" to the
> "current locale", in the case mentioned there.
>
> I'm giving some easier reproducible examples:  no need to install
> half of tidyverse just to get citation("readr") :
>
>> x1 <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
>> Encoding(x1) <- "latin1"
>> xU <- iconv(x1, "latin1", "UTF-8")
>
>> Sys.setlocale("LC_CTYPE", "Chinese")
> [1] "Chinese (Simplified)_People's Republic of China.936"
>>
>> iconv(x1, "latin1", "") # NA NA NA
> [1] NA NA NA
>> iconv(xU, "UTF-8", "") # NA NA NA
> [1] NA NA NA
>> iconv(xU, "UTF-8", "//TRANSLIT")
> [1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"

Interesting, I get Chinese characters here.

Besides the comments from Duncan Murdoch:
iconv(x1, "latin1", "", sub="?")
etc. would be an alternative in case some characters really cannot be
converted into the target encoding, and should perhaps be considered for
the time after Duncan commits the fix for the underlying problem.
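
To see what 'sub' does without changing the locale, here is a small sketch. It uses "ASCII" as the target, as a stand-in for an encoding that lacks these letters (that choice is an assumption; the cases in this thread use the current locale, ""):

```r
x1 <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
Encoding(x1) <- "latin1"

plain  <- iconv(x1, "latin1", "ASCII")                # NA for every element
marked <- iconv(x1, "latin1", "ASCII", sub = "?")     # "?" for each unconvertible byte
hex    <- iconv(x1, "latin1", "ASCII", sub = "byte")  # hex code of each such byte

plain
marked   # "Ekstr?m"    "J?reskog"    "bi?chen Z?rcher"
hex      # "Ekstr<f8>m" "J<f6>reskog" "bi<df>chen Z<fc>rcher"
```

The substitution works byte by byte, which is why UTF-8 input gives two marks per letter ("Ekstr??m") in Martin's transcript.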

Best,
Uwe

Re: Windows iconv() "failure" in certain locales

Martin Maechler
>>>>> Uwe Ligges <[hidden email]>
>>>>>     on Wed, 28 Jun 2017 18:45:59 +0200 writes:

    > On 27.06.2017 17:36, Martin Maechler wrote:
    >> This is a continuation of the R-devel thread with subject
    >> "suggestion to fix packageDescription() for Windows users" :
    >>
    >> As I said there, a patch should rather address the underlying
    >> problem in packageDescription rather than a kludgy workaround
    >> patch for  citation().
    >> (For that same reason, Ben Marwick proposed to fix
    >> packageDescription() rather than the symptom seen in citation().)
    >>
    >> It's not hard to see that the problem is that  iconv() in
    >> Windows does not always succeed to translate from "UTF-8" to the
    >> "current locale", in the case mentioned there.
    >>
    >> I'm giving some easier reproducible examples:  no need to install
    >> half of tidyverse just to get citation("readr") :
    >>
    >>> x1 <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
    >>> Encoding(x1) <- "latin1"
    >>> xU <- iconv(x1, "latin1", "UTF-8")
    >>
    >>> Sys.setlocale("LC_CTYPE", "Chinese")
    >> [1] "Chinese (Simplified)_People's Republic of China.936"
    >>>
    >>> iconv(x1, "latin1", "") # NA NA NA
    >> [1] NA NA NA
    >>> iconv(xU, "UTF-8", "") # NA NA NA
    >> [1] NA NA NA
    >>> iconv(xU, "UTF-8", "//TRANSLIT")
    >> [1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"

    > Interesting, I get Chinese characters here.

For which one of the above cases; can you show them
 (it may survive E-mail servers; we had other
  Chinese R strings on R-help / R-devel recently, right?)

In any case, I think that is even worse, isn't it?
Even in a Chinese locale you'd want text explicitly marked as latin1 to
be shown as something that looks like Latin-1 (I know from a master's
student that Windows with a Chinese locale can show Latin-1-like
letters interspersed in Chinese text), no?


    > Besides the comments from Duncan Murdoch:

    > iconv(x1, "latin1", "", sub="?")
    > etc. would be an alternative in case some characters really cannot be
    > converted into the target encoding, and should perhaps be considered for
    > the time after Duncan commits the fix for the underlying problem.

Yes, I had the same idea; that's why I used it in the code I
sent along.

So,

1)  we definitely won't commit the workaround patch for citation().

2) I have a "workaround patch" for packageDescription() which is
   more useful: only if iconv() produces NAs does it try
   alternatives, notably "//TRANSLIT" and "ASCII//TRANSLIT"
   (the latter Ben also mentioned, but my patch would only use it
    in the NA case), and also the same 'sub="?"' that you mention
    above, Uwe.

   That patch is not Windows-specific and will automatically
   also help in other cases / platforms where the iconv()
   re-encoding leads to partial NAs.
   
  @Duncan M: would you _not_ want me to commit that either?
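
To make 2) concrete, here is a rough sketch of the "only retry where iconv() gave NA" idea. It is not the patch itself: iconv_fallback() is a made-up name, and the order of the retries (transliteration first, then 'sub') is an assumption.

```r
## Hypothetical helper, for illustration only.
iconv_fallback <- function(x, from, to = "", sub = "?") {
    res <- iconv(x, from, to)
    bad <- is.na(res) & !is.na(x)          # failed conversions, not genuine NAs
    if (any(bad)) {
        ## First retry: transliteration; not every iconv supports it,
        ## so an error is treated like a failed conversion.
        res[bad] <- tryCatch(iconv(x[bad], from, paste0(to, "//TRANSLIT")),
                             error = function(e) rep(NA_character_, sum(bad)))
        bad <- is.na(res) & !is.na(x)
    }
    if (any(bad))                          # Last resort: substitute byte by byte
        res[bad] <- iconv(x[bad], from, to, sub = sub)
    res
}

xU  <- c("Ekstr\u00f8m", "J\u00f6reskog", "plain ASCII")
out <- iconv_fallback(xU, "UTF-8", "ASCII")
out
```

What the transliteration step returns depends on the platform's iconv, so the exact strings will differ between systems; the only guarantee of the sketch is that strings which convert cleanly are left alone and the others no longer come back as NA.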

Martin


Re: Windows iconv() "failure" in certain locales

Uwe Ligges-3


On 29.06.2017 12:27, Martin Maechler wrote:

>>>>>> Uwe Ligges <[hidden email]>
>>>>>>      on Wed, 28 Jun 2017 18:45:59 +0200 writes:
>
>      > On 27.06.2017 17:36, Martin Maechler wrote:
>      >> This is a continuation of the R-devel thread with subject
>      >> "suggestion to fix packageDescription() for Windows users" :
>      >>
>      >> As I said there, a patch should address the underlying
>      >> problem in packageDescription() rather than be a kludgy
>      >> workaround patch for  citation().
>      >> (For that same reason, Ben Marwick proposed to fix
>      >> packageDescription() rather than the symptom seen in citation().)
>      >>
>      >> It's not hard to see that the problem is that  iconv() in
>      >> Windows does not always succeed in translating from "UTF-8" to
>      >> the "current locale", in the case mentioned there.
>      >>
>      >> I'm giving some easier reproducible examples:  no need to install
>      >> half of tidyverse just to get citation("readr") :
>      >>
>      >>> x1 <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
>      >>> Encoding(x1) <- "latin1"
>      >>> xU <- iconv(x1, "latin1", "UTF-8")
>      >>
>      >>> Sys.setlocale("LC_CTYPE", "Chinese")
>      >> [1] "Chinese (Simplified)_People's Republic of China.936"
>      >>>
>      >>> iconv(x1, "latin1", "") # NA NA NA
>      >> [1] NA NA NA
>      >>> iconv(xU, "UTF-8", "") # NA NA NA
>      >> [1] NA NA NA
>      >>> iconv(xU, "UTF-8", "//TRANSLIT")
>      >> [1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"
>
>      > Interesting, I get Chinese characters here.
>
> For which one of the above cases; can you show them
>   (it may survive E-mail servers; we had other
>    Chinese R strings on R-help / R-devel recently, right?)



x1 <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
Encoding(x1) <- "latin1"
Sys.setlocale("LC_CTYPE", "Chinese")
# [1] "Chinese (Simplified)_People's Republic of China.936"
xU <- iconv(x1, "latin1", "UTF-8")
iconv(xU, "UTF-8", "//TRANSLIT")
# [1] "Ekstr鴐"         "J鰎eskog"        "bi遚hen Z黵cher"


 > sessionInfo()
R Under development (unstable) (2017-06-28 r72861)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
[3] LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.5.0



Best,
Uwe






> In any case, I think that is even worse, isn't it?
> Even in a Chinese locale you'd want to see explicit-latin1 text
> rendered as something that looks like latin-1 (I know from a master's
> student that Windows+Chinese can well show latin-1-like
> letters interspersed in the Chinese text),
> no?
>
>
>      > Beside the comments from Duncan Murdoch:
>
>      > iconv(x1, "latin1", "", sub="?")
>      > etc. would be an alternative in case some characters really cannot be
>      > converted into the target encoding and should perhaps be considered for
>      > the time after Duncan commits the fix for the underlying problem.
>
> Yes, I'd had the same idea; that's why I used it in the code I
> sent along.
>
> So,
>
> 1)  we definitely won't commit the workaround patch for citation().
>
> 2) I have a "workaround patch" for packageDescription() which is
>     more useful in the sense that only if iconv() produces NA's, it
>     tries alternatives, notably   "//TRANSLIT",  "ASCII//TRANSLIT"
>     (the latter Ben also mentioned, but my patch would only use it
>      in the NA case) and also the same  'sub="?"' that you mention
>      above, Uwe.
>
>     That patch is not Windows-specific and will automatically
>     also help in other cases / platforms where the iconv()
>     re-encoding leads to partial NAs.
>    
>    @Duncan M: would you _not_ want me to commit that either?
>
> Martin
>
