I ran into a weird corner case of list.files today, and I'm wondering what
people think about it and about a potential wishlist enhancement related to it.

Consider the case where we call list.files with recursive and include.dirs both
TRUE and we supply a pattern. In this case pattern is applied to directory
names when deciding whether to list the directory in the return value, but NOT
when deciding whether to recurse into it. This behavior is consistent, but I'd
argue it's also counterintuitive. If a directory is excluded for not matching
pattern, I wouldn't, at first blush, expect its children/contents to even be
candidates for inclusion.

If others agree this behavior is strange/suboptimal, I figure there are a
few different things that can be done here (which I discuss below):

1. Modify the behavior of list.files(., include.dirs=TRUE, recursive=TRUE,
   pattern=<>) so that
   1.1. pattern is applied when deciding where to recurse.
   1.2. all directories that (recursively) contain at least one file (or
        *possibly* an empty leaf subdirectory) that matches pattern are
        themselves included in the return value.
2. Add a recurse.pattern argument to list.files (and probably list.dirs)
   that is used to filter the directories recursed into (ignored when
   recursive == FALSE).
3. Modify the documentation of list.files so that it mentions this
   inconsistency; that way the behavior is at least documented, even if
   it's (arguably) not ideal.
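
To make option 2 a bit more concrete, here is a rough sketch of what a recurse.pattern argument could do, written as a standalone R function. The name list_files_pruned and its defaults are hypothetical and not part of base R; it simply walks the tree one level at a time using ordinary non-recursive list.files calls, and it defaults recurse.pattern to pattern, which corresponds to the 1.1 behavior.

```r
## Hypothetical helper, NOT part of base R: emulates a 'recurse.pattern'
## argument by walking the directory tree one level at a time.
list_files_pruned <- function(path, pattern, recurse.pattern = pattern,
                              include.dirs = TRUE) {
    walk <- function(rel) {
        here <- if (nzchar(rel)) file.path(path, rel) else path
        out <- character(0)
        for (e in list.files(here)) {          # one level, base names only
            relpath <- if (nzchar(rel)) file.path(rel, e) else e
            isdir <- dir.exists(file.path(here, e))
            ## 'pattern' decides what is listed
            if (grepl(pattern, e) && (!isdir || include.dirs))
                out <- c(out, relpath)
            ## 'recurse.pattern' decides where we descend
            if (isdir && grepl(recurse.pattern, e))
                out <- c(out, walk(relpath))
        }
        out
    }
    sort(walk(""))
}

## Small throwaway tree: good/ and bad/, each containing good/ and bad/,
## each of which holds one file named "goodfil" (assumed layout)
td <- tempfile("recursepattern")
for (d in c("good/good", "good/bad", "bad/good", "bad/bad")) {
    dir.create(file.path(td, d), recursive = TRUE)
    file.create(file.path(td, d, "goodfil"))
}
res <- list_files_pruned(td, pattern = "^[^b]+$")
print(res)
```

On that tree the result contains only good, good/good and good/good/goodfil, since nothing below a directory named bad is ever visited. Passing recurse.pattern = "" (which matches every name) would descend everywhere, as list.files does now.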

Both *1.1* and *1.2* are breaking changes, though I suspect that setting
include.dirs and recursive both to TRUE (or, in fact, setting include.dirs
to TRUE and having a pattern) is probably relatively rare. *1.1* is a more
drastic change but, in my opinion, ultimately more intuitive than *1.2*.
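
For comparison, the effect of *1.2* can be approximated from a files-only result by adding every ancestor directory of each matching file. This is only a sketch: the helper name with_ancestor_dirs is made up for illustration, the input is the four relative paths from the first call in the example at the end of this message, and empty leaf subdirectories are not handled.

```r
## Hypothetical helper (not in base R): add every ancestor directory of the
## given relative file paths, approximating option 1.2.
with_ancestor_dirs <- function(files) {
    dirs <- character(0)
    d <- dirname(files)
    ## dirname() of a top-level relative path is ".", which ends the climb
    while (length(d <- d[d != "."]) > 0) {
        dirs <- union(dirs, d)
        d <- dirname(d)
    }
    sort(union(dirs, files))
}

files <- c("bad/bad/goodfil", "bad/good/goodfil",
           "good/bad/goodfil", "good/good/goodfil")
res12 <- with_ancestor_dirs(files)
print(res12)
```

Here every directory ends up listed, including bad and good/bad, because each of them (recursively) contains a matching file even though its own name does not match.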

I think *2* could be useful, though only if the pattern would actually be
the same at different steps of recursion often enough in practice
(sometimes but not always, I'd think). *2* would be fully backwards
compatible (computing on formals lists notwithstanding...) unless its
default was set to pattern when include.dirs is TRUE, in which case it
would be a disable-able implementation of *1.1*.

I think *3* would be good to do if there's no appetite for doing anything
higher on the list.

I am happy to submit patches (as wishlist items, except for *3*) for any
of the above if there is interest.
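
td = file.path(tempdir(), "listfilestst")
dns = c("good", "bad")
## Setup reconstructed from the output below (an assumption): every
## combination of dns nested two levels deep, each holding a file "goodfil"
for (d1 in dns) for (d2 in dns) {
    dir.create(file.path(td, d1, d2), recursive = TRUE)
    file.create(file.path(td, d1, d2, "goodfil"))
}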
## no b files
list.files(td, recursive = TRUE, pattern = "^[^b]+$")
##  "bad/bad/goodfil" "bad/good/goodfil" "good/bad/goodfil"
##  "good/good/goodfil"
## no b files include.dirs=TRUE
## bad is not included but bad/good is (both are directories)
## bad/bad/goodfil is also included
list.files(td, recursive = TRUE, pattern = "^[^b]+$", include.dirs = TRUE)
##  "bad/bad/goodfil" "bad/good" "bad/good/goodfil"
##  "good" "good/bad/goodfil" "good/good"
##  "good/good/goodfil"
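
Until something changes, the *1.1* semantics can be emulated after the fact by dropping any returned path that has a non-matching component. A minimal sketch, assuming relative paths (full.names = FALSE) and a hypothetical helper name keep_matching_ancestors; the input vector is copied from the output of the second call above.

```r
## Hypothetical helper (not in base R): keep only the paths in which every
## component, directories and final name alike, matches 'pattern'.
keep_matching_ancestors <- function(paths, pattern) {
    parts <- strsplit(paths, "/", fixed = TRUE)
    ok <- vapply(parts, function(p) all(grepl(pattern, p)), logical(1))
    paths[ok]
}

## Result of list.files(td, recursive = TRUE, pattern = "^[^b]+$",
##                      include.dirs = TRUE) as shown above
paths <- c("bad/bad/goodfil", "bad/good", "bad/good/goodfil",
           "good", "good/bad/goodfil", "good/good",
           "good/good/goodfil")
kept <- keep_matching_ancestors(paths, "^[^b]+$")
print(kept)
```

This leaves good, good/good and good/good/goodfil. It is only a post-filter: list.files still walks the excluded subtrees, so it changes the result but saves no work.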