Possible Bug: file.exists() Function. Due to UTF-8 Encoding differences on Windows between R 4.0.1 and R 3.6.3?

Juan Telleria Ruiz de Aguirre
Dear R Developers,

I am having an issue with the renv package and R 4.0.1, which I
suspect is related to base R rather than to the renv package itself,
because with R 3.6.3 the "error" does not appear.

The error is raised by a file.exists() call on the path
"C:\Users\J-tel\Documents", which R 3.6.3 reads correctly but R 4.0.1
fails on (probably because of the "-" character). I suspect it might
be related to the new UTF-8 support on Windows 10
(https://developer.r-project.org/Blog/public/2020/05/02/utf-8-support-on-windows/index.html)?

I have also checked the file.exists() function and its internals, and
they do not seem to have changed between these versions:

https://github.com/wch/r-source/blob/0e3b3182f87a60af4b0293a5410dde680b910f49/src/library/base/R/files.R
https://github.com/search?q=SEXP%20attribute_hidden%20do_fileexists+repo:wch/r-source&type=Code

Error Details:

> renv::init()
Error in file.exists(children) :
  file name conversion problem -- name too long?
> traceback()
14: file.exists(children)
13: renv_dependencies_find_dir_children(path, root)
12: renv_dependencies_find_dir(path, root)
11: FUN(X[[i]], ...)
10: lapply(path, renv_dependencies_find_impl, root = root)
9: renv_dependencies_find(path, root)
8: (function (path = getwd(), root = NULL, ..., progress = TRUE,
       errors = c("reported", "fatal", "ignored"), dev = FALSE)
   {
       path <- renv_path_normalize(path, winslash = "/", mustWork = TRUE)
       root <- root %||% renv_dependencies_root(path)
       if (exists(path, envir = `_renv_dependencies`))
           return(get(path, envir = `_renv_dependencies`))
       renv_dependencies_begin(root = root)
       on.exit(renv_dependencies_end(), add = TRUE)
       dots <- list(...)
       if (identical(dots[["quiet"]], TRUE)) {
           progress <- FALSE
           errors <- "ignored"
       }
       files <- renv_dependencies_find(path, root)
       deps <- renv_dependencies_discover(files, progress, errors)
       renv_dependencies_report(errors)
       deps
   })(path, progress = FALSE, errors = errors, dev = TRUE)
7: eval(call, envir = parent.frame(2))
6: eval(call, envir = parent.frame(2))
5: delegate(renv_dependencies_impl)
4: dependencies(path, progress = FALSE, errors = errors, dev = TRUE)
3: withCallingHandlers(dependencies(path, progress = FALSE, errors = errors,
       dev = TRUE), renv.dependencies.error = renv_dependencies_error_handler(message,
       errors))
2: renv_dependencies_scope(project, action = "init")
1: renv::init()

> renv::diagnostics()
Diagnostics Report -- renv [0.10.0]
===================================

# Session Info =======================
R version 4.0.1 (2020-06-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252
[3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
[5] LC_TIME=Spanish_Spain.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] renv_0.10.0

loaded via a namespace (and not attached):
 [1] compiler_4.0.1   rsconnect_0.8.16 htmltools_0.4.0  tools_4.0.1
 [5] yaml_2.2.1       Rcpp_1.0.4.6     rmarkdown_2.2    knitr_1.28
 [9] xfun_0.14        digest_0.6.25    packrat_0.5.0    rlang_0.4.6
[13] evaluate_0.14

# Project ============================
Project path: "~/Test2"

# Status =============================

# Lockfile ===========================
This project has not yet been snapshotted: 'renv.lock' does not exist.

# Library ============================
The project library "~/Test2/renv/library/R-4.0/x86_64-w64-mingw32"
does not exist.

# Dependencies =======================

# User Profile =======================
[no user profile detected]

# Settings ===========================
List of 6
 $ external.libraries       : chr(0)
 $ ignored.packages         : chr(0)
 $ package.dependency.fields: chr [1:3] "Imports" "Depends" "LinkingTo"
 $ snapshot.type            : chr "implicit"
 $ use.cache                : logi TRUE
 $ vcs.ignore.library       : logi TRUE

# Options ============================
List of 1
 $ renv.verbose: logi TRUE

# Environment Variables ==============
HOME        = C:\Users\J-tel\OneDrive\Documents
LANG        = <NA>
R_LIBS      = <NA>
R_LIBS_SITE = <NA>
R_LIBS_USER = C:/Users/J-tel/OneDrive/Documents/R/win-library/4.0

# PATH ===============================
- C:\rtools40\usr\bin
- C:\Program Files\R\R-4.0.1\bin\x64
- C:\ProgramData\Miniconda3
- C:\ProgramData\Miniconda3\Library\mingw-w64\bin
- C:\ProgramData\Miniconda3\Library\usr\bin
- C:\ProgramData\Miniconda3\Library\bin
- C:\ProgramData\Miniconda3\Scripts
- C:\ProgramData\Oracle\Java\javapath
- C:\WINDOWS\system32
- C:\WINDOWS
- C:\WINDOWS\System32\Wbem
- C:\WINDOWS\System32\WindowsPowerShell\v1.0\
- C:\WINDOWS\System32\OpenSSH\
- C:\Program Files\MiKTeX 2.9\miktex\bin\x64\
- C:\ProgramData\Miniconda3\Scripts\conda.exe

# Cache ==============================
There are a total of 0 package(s) installed in the renv cache.
Cache path: "C:/Users/J-tel/AppData/Local/renv/cache/v5/R-4.0/x86_64-w64-mingw32"

System Information:

> R.Version()
$platform
[1] "x86_64-w64-mingw32"

$arch
[1] "x86_64"

$os
[1] "mingw32"

$system
[1] "x86_64, mingw32"

$status
[1] ""

$major
[1] "4"

$minor
[1] "0.1"

$year
[1] "2020"

$month
[1] "06"

$day
[1] "06"

$`svn rev`
[1] "78648"

$language
[1] "R"

$version.string
[1] "R version 4.0.1 (2020-06-06)"

$nickname
[1] "See Things Now"

Thank you,
Juan

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Tomas Kalibera

Dear Juan,

I don't see what the problem is from your report. Please try to create a
minimal but complete reproducible example that does not use the renv
package. Perhaps you could use the R debugger (e.g. via
options(error=recover)) to find out which argument file.exists() has
been called with. Then you could try calling file.exists() directly
with that argument to trigger the problem.

It may be that the argument has been corrupted or is invalid in the
current native encoding. If that is the case, the next step would be to
find out who corrupted it (renv, R, something else). The error is
displayed when a path name cannot be converted from the current native
encoding to UTF-16LE.
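One way to test whether an argument is invalid in its encoding before it reaches file.exists() is base R's validEnc() (see ?validEnc for exactly which encodings it checks). The helper check_paths() below is a hypothetical sketch, not a function from R or renv; it is a first-pass check and will not catch every conversion failure on Windows:

```r
# Hypothetical helper (not part of R or renv): for each path, report whether
# the string is valid in its encoding and, only for valid ones, whether the
# file exists. Invalid strings get NA instead of being passed to file.exists().
check_paths <- function(paths) {
  paths <- as.character(paths)
  valid <- validEnc(paths) %in% TRUE     # treat any NA result as invalid
  found <- rep(NA, length(paths))        # NA = skipped, encoding invalid
  found[valid] <- file.exists(paths[valid])
  data.frame(path = paths, valid_encoding = valid, exists = found,
             stringsAsFactors = FALSE)
}

# Usage: one file that exists and one that does not (plain ASCII names)
tmp <- tempfile("check-paths-")
invisible(file.create(tmp))
check_paths(c(tmp, file.path(tempdir(), "no-such-file")))
```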

The experimental support for UTF-8 as native encoding on Windows 10 is
only available in a custom build of R, like the one I linked from my
blog post.

Thanks
Tomas



On 6/10/20 1:06 PM, Juan Telleria Ruiz de Aguirre wrote:


Dirk Eddelbuettel
In reply to this post by Juan Telleria Ruiz de Aguirre

On 10 June 2020 at 13:06, Juan Telleria Ruiz de Aguirre wrote:
| I am having an issue with the renv package and R 4.0.1, which I
| suspect is related to base R and not the renv package itself, as with
| R 3.6.3 such an "error" does not appear.

So a bug in `renv`, as it does not account for changes in R 4.0.0?

Stuff happens. I just fixed a 'change in R 4.0.0' for one small aspect of
Rcpp(Armadillo) (namely the change in package.skeleton() and NAMESPACE).

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | [hidden email]


Kevin Ushey
Hi Juan,

For bug reports to R, you should attempt to create a
minimally-reproducible example, using only R's builtin facilities and
not any other addon packages. Given your report, it's not clear
whether the issue lies within renv or truly is caused by a change in R
4.0.0.

Also note that you have not supplied a minimally reproducible example.
If at all possible, you should be able to supply some code that
reproduces the issue -- ideally, one should be able to just copy +
paste the code into an R session to see the issue arise. Presumably,
if the issue is indeed in base R, then you should be able to supply a
reproducible example of the form:

    path <- "path/that/causes/issue"
    file.exists(path)

Alternatively, if you can distill this into a minimally-reproducible
example that does require renv, then you should report that to the
maintainer of renv (me), not this mailing list.
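Applied to the original report, a template like the one above can test the hypothesis that the "-" in "J-tel" is the problem, using only base R. The directory names are illustrative and are created under tempdir(); they are not the reporter's real path:

```r
# Create a directory whose name contains "-" (mirroring "J-tel") plus a
# child directory, then ask file.exists() about both.
base <- file.path(tempdir(), "J-tel")
dir.create(file.path(base, "Documents"), recursive = TRUE, showWarnings = FALSE)

paths <- c(base,
           file.path(base, "Documents"),
           normalizePath(file.path(base, "Documents"), winslash = "/"))
print(paths)

# If "-" broke file.exists(), this would return FALSE or signal an error
file.exists(paths)
```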

Best,
Kevin

On Wed, Jun 10, 2020 at 4:55 AM Dirk Eddelbuettel <[hidden email]> wrote:


Juan Telleria Ruiz de Aguirre
Thank you, Kevin. I just checked that the error is fixed in the latest
development version of "renv", which now works as expected with R
4.0.1:

https://github.com/rstudio/renv/commit/976ae7af6dc348af30eaf2893d886f132a76aba0

Sorry for posting to r-devel; I was not sure whether it was an R or an
"renv" error, given the different behaviour of R 4.0.1 and R 3.6.3 when
converting from UTF-16LE to UTF-8.

Next time I will provide a better reproducible example and trace the
error with options(error=recover) to make sure.

Thanks,
Juan


Yihui Xie-2
In reply to this post by Tomas Kalibera
Hi Tomas,

I received a report about R 4.0.0 in the knitr package
(https://github.com/yihui/knitr/issues/1840), and I think it is
related to the issue here. I created a minimal reproducible example
below:

owd = setwd(tempdir())
z = 'K\u00e4sch.txt'
file.create(z)
list.files()
file.exists(list.files())
setwd(owd)

Output:

> owd = setwd(tempdir())
> z = 'K\u00e4sch.txt'
> file.create(z)
[1] TRUE
> list.files()
[1] "K?sch.txt"
> file.exists(list.files())
[1] FALSE
> setwd(owd)

I wonder if it is expected that file.exists() returns FALSE here.

> sessionInfo()
R version 4.0.1 (2020-06-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
system code page: 936

FWIW, I also tested Chinese characters in the variable `z` above, and
file.exists() returns TRUE only after I call Sys.setlocale(, "Chinese").
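The example above can be wrapped into a reusable check that restores the working directory and cleans up after itself. The function roundtrip_check() is a hypothetical sketch: it creates the file in a fresh temporary directory, lists it back, and reports whether list.files() returned the same name and whether file.exists() finds the listed name:

```r
# Hypothetical helper: create `name` in a fresh temporary directory, list it
# back, and report whether list.files() returned the same name and whether
# file.exists() finds the listed name.
roundtrip_check <- function(name) {
  dir <- tempfile("roundtrip-")
  dir.create(dir)
  owd <- setwd(dir)
  # Restore the working directory first, then delete the scratch directory
  on.exit({
    setwd(owd)
    unlink(dir, recursive = TRUE)
  }, add = TRUE)

  file.create(name)
  listed <- list.files()
  list(
    created   = name,
    listed    = listed,
    same_name = identical(enc2utf8(listed), enc2utf8(name)),
    exists    = file.exists(listed)
  )
}

# The non-ASCII name from the report; file.create() may itself warn or fail
# in some locales, so an error is returned as its message instead.
tryCatch(roundtrip_check("K\u00e4sch.txt"), error = conditionMessage)
```

Going by the output reported above, on that system `listed` would come back as "K?sch.txt" and `exists` as FALSE.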

Regards,
Yihui

On Thu, Jun 11, 2020 at 3:11 AM Tomas Kalibera <[hidden email]> wrote:


Tomas Kalibera
Hi Yihui,

list.files() returns file names converted to the native encoding by
Windows, so file names should only use characters representable in the
current native encoding. If one wants to be safe, it makes sense to be
much stricter than that (only ASCII, and only a subset of it; a number
of recommendations can be found online). Using more than that is asking
for trouble.
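As one concrete reading of "only ASCII, and only a subset of it", the hypothetical helper is_portable_name() below accepts only names built from the POSIX portable filename character set (letters, digits, ".", "_" and "-"). Choosing that particular set is an assumption for illustration; other published recommendations differ:

```r
# Hypothetical helper: TRUE when every character of a file name belongs to
# the POSIX portable filename character set [A-Za-z0-9._-].
is_portable_name <- function(x) {
  # perl = TRUE: PCRE interprets the ranges by character code, independent
  # of the collation order of the current locale.
  grepl("^[A-Za-z0-9._-]+$", x, perl = TRUE)
}

is_portable_name(c("Kasch.txt", "K\u00e4sch.txt", "my file.txt", "J-tel"))
# [1]  TRUE FALSE FALSE  TRUE
```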

Unicode "\u00e4" is a Latin-1 character, so it is representable in
CP1252. On my Windows system, running with CP1252 as both the C locale
and the system code page, your example works fine: file.exists()
returns TRUE, which is the expected behavior (tested in R-devel and
R 4.0).
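Whether a name is representable in a given code page can be probed with base R's iconv(), which returns NA for strings it cannot convert when sub = NA (the default). The helper representable_in() is hypothetical, and which encoding names are accepted depends on the platform's iconv implementation:

```r
# Hypothetical helper: TRUE where an element of x can be converted to the
# encoding `to` (iconv() returns NA on failure when sub = NA, the default).
representable_in <- function(x, to) {
  !is.na(iconv(enc2utf8(x), from = "UTF-8", to = to))
}

z <- "K\u00e4sch.txt"
representable_in(z, "CP1252")  # TRUE: "\u00e4" is a Latin-1 character
representable_in(z, "ASCII")   # "\u00e4" is outside ASCII
```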

Your example was run with CP1252 as the C locale but CP936 as the system
code page (see the sessionInfo() output). Unfortunately, on Windows there
are two different "current locales" at a time. With your settings
(CP1252 as the C locale and CP936 as the system code page), I get the
same results as you: file.exists() returns FALSE. enc2native(z) works
fine and returns a valid Latin-1 string, but only because "native" here
means CP1252. Windows API functions, and consequently some C library
functions that return strings from the OS, convert to the encoding of
the system code page instead, which is CP936, and CP936 cannot
represent "ä". So the behavior you are reporting is currently expected
for R 4.0 and earlier. I don't think this is a regression; it couldn't
have worked before either, and I have tested this in 3.6.3 and 3.4.3 on
my system.
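To see which two "current locales" are in effect, the sketch below collects the C-library locale from Sys.getlocale() and the fields of l10n_info(). On Windows, l10n_info() includes a codepage element and, in recent R versions, a system.codepage element; the sketch treats both as optional (NA when absent, e.g. on Unix-alikes). The function encoding_report() is hypothetical:

```r
# Hypothetical helper: summarise the locale settings relevant to file names.
encoding_report <- function() {
  info <- l10n_info()
  # Return NA for fields this platform / R version does not report
  field <- function(name) if (is.null(info[[name]])) NA else info[[name]]
  codepage <- field("codepage")
  system_codepage <- field("system.codepage")
  list(
    ctype           = Sys.getlocale("LC_CTYPE"),  # e.g. "English_United States.1252"
    native_utf8     = isTRUE(info[["UTF-8"]]),
    codepage        = codepage,                   # C-locale code page (Windows)
    system_codepage = system_codepage,            # system code page (Windows)
    # TRUE only when both are known and differ
    mismatch        = !is.na(codepage) && !is.na(system_codepage) &&
                        codepage != system_codepage
  )
}

str(encoding_report())
```

On a configuration like the one in the report, one would expect codepage 1252, system_codepage 936 and mismatch TRUE, provided the R version reports system.codepage.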

These problems will go away when UTF-8 is both the current native
encoding for the C locale and the system code page. This is possible in
recent Windows 10, but it requires UCRT and hence a new toolchain to
build R, and all packages and libraries must be rebuilt from source.
More details are on my blog, where an experimental build of R
(installer) and an experimental toolchain are also available:
https://developer.r-project.org/Blog/public/2020/05/02/utf-8-support-on-windows/index.html

Best
Tomas


On 6/22/20 6:11 AM, Yihui Xie wrote:

>>>    [9] xfun_0.14        digest_0.6.25    packrat_0.5.0    rlang_0.4.6
>>> [13] evaluate_0.14
>>>
>>> # Project ============================
>>> Project path: "~/Test2"
>>>
>>> # Status =============================
>>>
>>> # Lockfile ===========================
>>> This project has not yet been snapshotted: 'renv.lock' does not exist.
>>>
>>> # Library ============================
>>> The project library "~/Test2/renv/library/R-4.0/x86_64-w64-mingw32"
>>> does not exist.
>>>
>>> # Dependencies =======================
>>>
>>> # User Profile =======================
>>> [no user profile detected]
>>>
>>> # Settings ===========================
>>> List of 6
>>>    $ external.libraries       : chr(0)
>>>    $ ignored.packages         : chr(0)
>>>    $ package.dependency.fields: chr [1:3] "Imports" "Depends" "LinkingTo"
>>>    $ snapshot.type            : chr "implicit"
>>>    $ use.cache                : logi TRUE
>>>    $ vcs.ignore.library       : logi TRUE
>>>
>>> # Options ============================
>>> List of 1
>>>    $ renv.verbose: logi TRUE
>>>
>>> # Environment Variables ==============
>>> HOME        = C:\Users\J-tel\OneDrive\Documents
>>> LANG        = <NA>
>>> R_LIBS      = <NA>
>>> R_LIBS_SITE = <NA>
>>> R_LIBS_USER = C:/Users/J-tel/OneDrive/Documents/R/win-library/4.0
>>>
>>> # PATH ===============================
>>> - C:\rtools40\usr\bin
>>> - C:\Program Files\R\R-4.0.1\bin\x64
>>> - C:\ProgramData\Miniconda3
>>> - C:\ProgramData\Miniconda3\Library\mingw-w64\bin
>>> - C:\ProgramData\Miniconda3\Library\usr\bin
>>> - C:\ProgramData\Miniconda3\Library\bin
>>> - C:\ProgramData\Miniconda3\Scripts
>>> - C:\ProgramData\Oracle\Java\javapath
>>> - C:\WINDOWS\system32
>>> - C:\WINDOWS
>>> - C:\WINDOWS\System32\Wbem
>>> - C:\WINDOWS\System32\WindowsPowerShell\v1.0\
>>> - C:\WINDOWS\System32\OpenSSH\
>>> - C:\Program Files\MiKTeX 2.9\miktex\bin\x64\
>>> - C:\ProgramData\Miniconda3\Scripts\conda.exe
>>>
>>> # Cache ==============================
>>> There are a total of 0 package(s) installed in the renv cache.
>>> Cache path: "C:/Users/J-tel/AppData/Local/renv/cache/v5/R-4.0/x86_64-w64-mingw32"
>>>
>>> System Information:
>>>
>>>> R.Version()
>>> $platform
>>> [1] "x86_64-w64-mingw32"
>>>
>>> $arch
>>> [1] "x86_64"
>>>
>>> $os
>>> [1] "mingw32"
>>>
>>> $system
>>> [1] "x86_64, mingw32"
>>>
>>> $status
>>> [1] ""
>>>
>>> $major
>>> [1] "4"
>>>
>>> $minor
>>> [1] "0.1"
>>>
>>> $year
>>> [1] "2020"
>>>
>>> $month
>>> [1] "06"
>>>
>>> $day
>>> [1] "06"
>>>
>>> $`svn rev`
>>> [1] "78648"
>>>
>>> $language
>>> [1] "R"
>>>
>>> $version.string
>>> [1] "R version 4.0.1 (2020-06-06)"
>>>
>>> $nickname
>>> [1] "See Things Now"
>>>
>>> Thank you,
>>> Juan
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
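Tomas's advice in this thread is to find out which argument file.exists() was called with (for example by setting options(error = recover)) and then call file.exists() on that argument directly. Below is a minimal sketch of that second step. It assumes you have already captured the candidate paths in a character vector. `diagnose_paths()` is a hypothetical helper, not part of base R or renv. It uses base R's validEnc() to flag strings that are invalid in their declared encoding. It also records whether file.exists() signals an error for each path, such as the "file name conversion problem" seen on Windows.

```r
# Hypothetical helper (not part of base R or renv): for each candidate path,
# report its declared encoding, whether it is valid in that encoding, and
# whether file.exists() signals an error for it.
diagnose_paths <- function(paths) {
  raises_error <- vapply(paths, function(p) {
    res <- tryCatch(file.exists(p), error = function(e) e)
    inherits(res, "error")
  }, logical(1), USE.NAMES = FALSE)
  data.frame(
    path = paths,
    encoding = Encoding(paths),
    valid = validEnc(paths),
    exists_error = raises_error,
    stringsAsFactors = FALSE
  )
}

# Usage: once options(error = recover) has stopped in the failing frame,
# pass the offending vector (in the renv traceback above it is `children`):
# diagnose_paths(children)
```

A row with `valid = FALSE` points to a string that was corrupted before it reached file.exists(), which is the case Tomas suggests ruling out first.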


Re: Possible Bug: file.exists() Function. Due to UTF-8 Encoding differences on Windows between R 4.0.1 and R 3.6.3?

Yihui Xie-2
Hi Tomas,

Sorry for the false alarm! I did some further testing, and you were
right. There was no regression. I suspected it was a regression
because the user who reported the issue said his code worked in R 3.6
but not 4.0. I should have tested it more carefully by myself. After I
tested it again with the German locale and Chinese locale,
respectively, I found that the code worked for both versions of R in
the German locale, and failed in the Chinese locale. Your explanation
makes perfect sense to me. I also read your blog post when it
came out last month, and I'm really looking forward to the end of this
character encoding pain! Thank you very much for the hard work!

Regards,
Yihui
--
https://yihui.org

On Mon, Jun 22, 2020 at 3:37 AM Tomas Kalibera <[hidden email]> wrote:

>
> Hi Yihui,
>
> list.files() returns file names converted to the native encoding by
> Windows, so one needs to use only characters representable in the
> current native encoding for file names. If one wants to be safe, it
> makes sense to be much stricter than that (only ASCII, and only a subset
> of it; a number of recommendations can be found online). Using more than
> that is asking for trouble.
>
> Unicode "\u00e4" is a Latin-1 character, so it is representable in
> CP1252. On my Windows system, running with CP1252 as both the C locale
> and the system code page, your example works fine: file.exists()
> returns TRUE, which is the expected behavior (tested in R-devel and R 4.0).
>
> Your example was run with CP1252 as the C locale but CP936 as the system
> code page (see the sessionInfo() output). On Windows, unfortunately,
> there are two different "current locales" at a time. With your settings
> (CP1252 as the C locale and CP936 as the system code page), I get the
> same result as you: file.exists() returns FALSE. enc2native(z) works
> fine and returns a valid Latin-1 string, but that is because here
> "native" is CP1252. Windows API functions, and consequently some C
> library functions that return strings from the OS, however convert to
> the encoding of the system code page, which is CP936 and cannot
> represent "ä". So the behavior you are reporting is currently expected
> for R 4.0 and earlier. I don't think this is a regression; it couldn't
> have worked before either, and I've tested in 3.6.3 and 3.4.3 on my system.
>
> These problems will go away when UTF-8 is both the current native
> encoding for the C locale and the system code page. This is possible in
> recent Windows 10, but requires UCRT and hence a new toolchain to build
> R, and requires all packages and libraries to be rebuilt from source.
> More details are on my blog, where an experimental build of R
> (installer) and an experimental toolchain are also available:
> https://developer.r-project.org/Blog/public/2020/05/02/utf-8-support-on-windows/index.html
>
> Best
> Tomas
>
>
> On 6/22/20 6:11 AM, Yihui Xie wrote:
> > Hi Tomas,
> >
> > I received a report about R 4.0.0 in the knitr package
> > (https://github.com/yihui/knitr/issues/1840), and I think it is
> > related to the issue here. I created a minimal reproducible example
> > below:
> >
> > owd = setwd(tempdir())
> > z = 'K\u00e4sch.txt'
> > file.create(z)
> > list.files()
> > file.exists(list.files())
> > setwd(owd)
> >
> > Output:
> >
> >> owd = setwd(tempdir())
> >> z = 'K\u00e4sch.txt'
> >> file.create(z)
> > [1] TRUE
> >> list.files()
> > [1] "K?sch.txt"
> >> file.exists(list.files())
> > [1] FALSE
> >> setwd(owd)
> > I wonder if it is expected that file.exists() returns FALSE here.
> >
> >> sessionInfo()
> > R version 4.0.1 (2020-06-06)
> > Platform: x86_64-w64-mingw32/x64 (64-bit)
> > Running under: Windows 7 x64 (build 7601) Service Pack 1
> >
> > locale:
> > [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
> > [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> > [5] LC_TIME=English_United States.1252
> > system code page: 936
> >
> > FWIW, I also tested Chinese characters in the variable `z` above, and
> > file.exists() returns TRUE only after I Sys.setlocale(, "Chinese").
> >
> > Regards,
> > Yihui
> >
> > On Thu, Jun 11, 2020 at 3:11 AM Tomas Kalibera <[hidden email]> wrote:
> >>
> >> Dear Juan,
> >>
> >> I don't see what is the problem from your report. Please try to create a
> >> minimal but complete reproducible example that does not use the renv
> >> package. Perhaps you could use the R debugger (e.g. via
> >> options(error=recover)) to find out what is the argument that
> >> file.exists() has been called with. And then you could try just to call
> >> file.exists() directly with that argument to trigger the problem.
> >>
> >> It may be that the argument has been corrupted/is invalid in the current
> >> native encoding. If that is the case, the next step would be to find out
> >> who corrupted it (renv, R, something else). The error is displayed when
> >> a path name cannot be converted from the current native encoding to
> >> UTF-16LE.
> >>
> >> The experimental support for UTF-8 as native encoding on Windows 10 is
> >> only available in a custom build of R, like the one I linked from my
> >> blog post.
> >>
> >> Thanks
> >> Tomas
> >>
>
>
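Tomas's explanation above comes down to two checks. The first is whether a file name can be represented in the code page Windows uses when it returns names (the system code page, 936 in Yihui's sessionInfo()). The second is whether the name stays within a conservative ASCII subset. Below is a minimal sketch of both, using hypothetical helper names. It relies on iconv() returning NA for a string it cannot convert, which is its default `sub = NA`. It also assumes the code page names passed as `to` are supported by the iconv implementation in use. For the strict option, it uses the POSIX portable file name character set as one example of "only a subset of ASCII".

```r
# Hypothetical helper: TRUE where a string can be converted to the given
# encoding without loss. iconv() returns NA (default sub = NA) when some
# character has no mapping in the target encoding.
representable_in <- function(x, to) {
  !is.na(iconv(enc2utf8(x), from = "UTF-8", to = to))
}

# Hypothetical helper: the strict option, restricting names to the POSIX
# portable file name character set (ASCII letters, digits, ".", "_", "-").
is_portable_name <- function(x) {
  grepl("^[A-Za-z0-9._-]+$", x, perl = TRUE)
}

z <- "K\u00e4sch.txt"
representable_in(z, "latin1")  # TRUE: "ä" is a Latin-1 character
representable_in(z, "CP936")   # Tomas notes CP936 cannot represent "ä"
is_portable_name(z)            # FALSE: "ä" is not ASCII
```

Checking against the system code page rather than the C locale matters here. In Yihui's session, enc2native() succeeded because "native" was CP1252, while the names returned by Windows went through CP936.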


`basename` and `dirname` change the encoding to "UTF-8"

Johannes Rauh
Dear R Developers,

I noticed that `basename` and `dirname` always return "UTF-8" on Windows (tested with R-4.0.0 and R-3.6.3):

> p <- "Föö/Bär"
> Encoding(p)
[1] "latin1"
> Encoding(dirname(p))
[1] "UTF-8"
> Encoding(basename(p))
[1] "UTF-8"

Is this on purpose?  At least I did not find any relevant comment in the documentation of `dirname`/`basename`.

Background: I'm currently struggling with a directory name containing a latin1-character.  (I know that this is a bad idea, but I did not create the directory and I cannot rename it.)  I now want to pass a latin1-directory name to a function which internally uses `tools::makeLazyLoadDB`.  At that point, internally, `dirname` is called, which changes the encoding, and things break.  If I use `debug` to halt the processing and "fix" the encoding, things work as expected.

So, if possible, I would prefer that `dirname` and `basename` preserve the encoding.

Best regards
Johannes
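Below is a minimal sketch of the behaviour described above, written so that it also runs outside a Latin-1 session. The path is built from `\u` escapes and converted with iconv(), which marks a result converted to "latin1" as such. The declared encoding that dirname() and basename() return depends on the platform and R version (Johannes saw "UTF-8" on Windows), so the sketch prints it rather than asserting it. The part that should not change is the characters. The sketch checks this after converting to one known encoding with enc2utf8().

```r
# Build the path from \u escapes, then convert it to Latin-1 so that the
# input is marked "latin1", as in Johannes's session.
p <- iconv("F\u00f6\u00f6/B\u00e4r", from = "UTF-8", to = "latin1")
d <- dirname(p)
b <- basename(p)

# Declared encodings: "latin1" for p; for d and b this is platform- and
# version-dependent.
print(Encoding(c(p, d, b)))

# The characters survive even when the declared encoding changes.
# enc2utf8() puts the results into one known encoding before comparing.
identical(enc2utf8(d), "F\u00f6\u00f6")
identical(enc2utf8(b), "B\u00e4r")
```

Code that compares or looks up paths by their characters in this way is unaffected by the encoding change. That fits Duncan's point below that the callers, not dirname() and basename(), are where such problems should be fixed.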


Re: `basename` and `dirname` change the encoding to "UTF-8"

Duncan Murdoch-2
On 29/06/2020 10:39 a.m., Johannes Rauh wrote:

> Dear R Developers,
>
> I noticed that `basename` and `dirname` always return "UTF-8" on Windows (tested with R-4.0.0 and R-3.6.3):
>
>> p <- "Föö/Bär"
>> Encoding(p)
> [1] "latin1"
>> Encoding(dirname(p))
> [1] "UTF-8"
>> Encoding(basename(p))
> [1] "UTF-8"
>
> Is this on purpose?  At least I did not find any relevant comment in the documentation of `dirname`/`basename`.
>
> Background: I'm currently struggling with a directory name containing a latin1-character.  (I know that this is a bad idea, but I did not create the directory and I cannot rename it.)  I now want to pass a latin1-directory name to a function which internally uses `tools::makeLazyLoadDB`.  At that point, internally, `dirname` is called, which changes the encoding, and things break.  If I use `debug` to halt the processing and "fix" the encoding, things work as expected.
>
> So, if possible, I would prefer that `dirname` and `basename` preserve the encoding.

Actually, makeLazyLoadDB isn't exported from tools, so strictly speaking
you shouldn't be calling it.  Or perhaps you have a good reason to call
it, and should be asking for it to be exported, or you are calling a
published function which calls it:  in either case it should probably be
fixed to accept UTF-8.

But it doesn't call dirname or basename, so maybe the function that
calls it is the one that needs fixing.

In any case, while asking dirname() and basename() to preserve the
encoding sounds reasonable, it seems like it would just be covering up a
deeper problem.

Duncan Murdoch


Re: `basename` and `dirname` change the encoding to "UTF-8"

Kevin Ushey
Did you test with R 4.0.2 or R-devel? A bug related to this issue was
recently fixed:

https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17833

Best,
Kevin

On Mon, Jun 29, 2020 at 11:51 AM Duncan Murdoch
<[hidden email]> wrote:


Re: `basename` and `dirname` change the encoding to "UTF-8"

Tomas Kalibera
In reply to this post by Johannes Rauh
On 6/29/20 4:39 PM, Johannes Rauh wrote:

> Dear R Developers,
>
> I noticed that `basename` and `dirname` always return "UTF-8" on Windows (tested with R-4.0.0 and R-3.6.3):
>
>> p <- "Föö/Bär"
>> Encoding(p)
> [1] "latin1"
>> Encoding(dirname(p))
> [1] "UTF-8"
>> Encoding(basename(p))
> [1] "UTF-8"
>
> Is this on purpose?  At least I did not find any relevant comment in the documentation of `dirname`/`basename`.
> Background: I'm currently struggling with a directory name containing a latin1-character.  (I know that this is a bad idea, but I did not create the directory and I cannot rename it.)  I now want to pass a latin1-directory name to a function which internally uses `tools::makeLazyLoadDB`.  At that point, internally, `dirname` is called, which changes the encoding, and things break.  If I use `debug` to halt the processing and "fix" the encoding, things work as expected.
>
> So, if possible, I would prefer that `dirname` and `basename` preserve the encoding.

Please try to always submit a minimal reproducible example with your
reports and test with at least the latest released version of R, ideally
also with R-devel.

As you have not sent a reproducible example, it is hard to tell for
sure, but most likely, as Kevin wrote, you have run into a real bug,
which has however already been fixed in 4.0.2 and in R-devel (PR#17833).
The lazy loading cache did not work with file names in non-native encoding.

That real bug has been uncovered by legitimate and correct changes like
the ones you report, where file operations started returning non-ASCII
strings in UTF-8. Historically, such functions in R would instead return
native strings with misrepresented characters, and we were reluctant to
change that, expecting it to wake up bugs in code silently assuming
native encoding. Still, as people were increasingly running into problems
with non-representable characters, we made that change in several
functions anyway, and yes, it started waking up bugs.

We could preferentially return results in native encoding, and in UTF-8
only when they include non-representable characters. That would increase
code complexity and performance overhead, but would wake up existing
bugs with lower probability. Note that some code that previously relied
on best-fit conversions done by Windows will have been broken anyway; we
would have to bypass win_iconv/iconv for that (adding more complexity).
Bugs in code not handling encodings properly would still be triggered
via non-representable characters. I've recently changed file.path() in
R-devel to be slightly more conservative again, along these lines.

We can still do it more widely, but it is not high on the priority list.
The way to fix all of these problems is switching to UTF-8 as the native
encoding on Windows, and every day spent tuning the existing behavior
postpones that real solution.

Best
Tomas
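The alternative Tomas outlines returns a result in native encoding when all of its characters are representable there, and in UTF-8 otherwise. It can be sketched at the R level as below. This is a hypothetical illustration of the idea, not how R implements file operations, and his reply says doing it widely is not a priority. `prefer_native()` is a made-up name. The sketch relies on iconv() with `to = ""` converting to the native encoding and returning NA when that is not possible.

```r
# Hypothetical sketch (not R's implementation): keep each string in the
# native encoding when every character is representable there, and fall
# back to UTF-8 otherwise.
prefer_native <- function(x) {
  utf8 <- enc2utf8(x)
  # to = "" means the native encoding; unconvertible strings become NA
  native <- iconv(utf8, from = "UTF-8", to = "")
  ifelse(is.na(native), utf8, native)
}

x <- c("plain.txt", "K\u00e4sch.txt", "\u4e2d\u6587.txt")
y <- prefer_native(x)
# The characters are unchanged; only the declared encodings may differ.
Encoding(y)
identical(y, x)
```

As the reply notes, this only makes the UTF-8 case rarer. Code that mishandles encodings still breaks on names that take the fallback branch.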



Re: `basename` and `dirname` change the encoding to "UTF-8"

Johannes Rauh
Hello, everyone,

thank you for your quick and helpful responses and the detailed information.

Sorry for not providing a reproducible example for the (potential) bug in `tools::makeLazyLoadDB`.  The main point of my mail was the surprising behaviour of `basename` and `dirname`.  Fixing those functions would probably solve my problem for me (as a workaround, probably hiding some underlying problem, and likely leading to a failure for someone else fighting with encodings).

Concerning my underlying direct problem with `tools::makeLazyLoadDB`, I'm having difficulty making my example reproducible.  I'm trying to use a directory with a non-ASCII name for a knitr cache.  My R-4.0.0 here behaves differently from my R-3.6.3, but when I filed a bug report with knitr, Yihui could not reproduce this difference (https://github.com/yihui/knitr/issues/1840).  So I'll try R-4.0.2 next; let's see what happens.

Cheers
Johannes

> Sent: Tuesday, 30 June 2020 at 09:25
> From: "Tomas Kalibera" <[hidden email]>
> To: "Johannes Rauh" <[hidden email]>, "r-devel" <[hidden email]>
> Subject: Re: [Rd] `basename` and `dirname` change the encoding to "UTF-8"