conflicted: an alternative conflict resolution strategy


hadley wickham
Hi all,

I’d love to get your feedback on the conflicted package, which provides an
alternative strategy for resolving ambiguous function names (i.e. when
multiple packages provide identically named functions). conflicted 0.1.0
is already on CRAN, but I’m currently preparing a revision
(<https://github.com/r-lib/conflicted>), and looking for feedback.

As you are no doubt aware, R’s default approach means that the most
recently loaded package “wins” any conflicts. You do get a message about
conflicts on load, but I see a lot of newer R users experiencing problems
caused by function conflicts. I think there are three primary reasons:

-   People don’t read messages about conflicts. Even if you are
    conscientious and do read the messages, it’s hard to notice a single
    new conflict caused by a package upgrade.

-   The warning and the problem may be quite far apart. If you load all
    your packages at the top of the script, it may potentially be 100s
    of lines before you encounter a conflict.

-   The error messages caused by conflicts are cryptic because you end
    up calling a function with utterly unexpected arguments.
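
To make the default behaviour concrete, here is a minimal base R sketch. The two "packages" are hypothetical lists (`pkg_a`, `pkg_b`) attached with `attach()`, which stands in for `library()`; the point is only that the most recently attached entry silently wins.

```r
# Two stand-in "packages", each providing a function called `select`.
# (pkg_a and pkg_b are hypothetical names used only for this illustration.)
pkg_a <- list(select = function(x) "pkg_a::select")
pkg_b <- list(select = function(x) "pkg_b::select")

# Silence the masking message, much as a user who skips it would.
attach(pkg_a, name = "pkg_a", warn.conflicts = FALSE)
attach(pkg_b, name = "pkg_b", warn.conflicts = FALSE)

# pkg_b was attached last, so it sits earlier on the search path and wins.
winner <- select(1)
print(winner)

# Clean up the search path.
detach("pkg_b", character.only = TRUE)
detach("pkg_a", character.only = TRUE)
```

Swapping the two `attach()` calls changes the result, which is the order dependence described above.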

For these reasons, conflicted takes an alternative approach, forcing the
user to explicitly disambiguate any conflicts:

    library(conflicted)
    library(dplyr)
    library(MASS)

    select
    #> Error: [conflicted] `select` found in 2 packages.
    #> Either pick the one you want with `::`
    #> * MASS::select
    #> * dplyr::select
    #> Or declare a preference with `conflicted_prefer()`
    #> * conflict_prefer("select", "MASS")
    #> * conflict_prefer("select", "dplyr")

conflicted works by attaching a new “conflicted” environment just after
the global environment. This environment contains an active binding for
any ambiguous bindings. The conflicted environment also contains
bindings for `library()` and `require()` that rebuild the conflicted
environment and suppress default reporting (but are otherwise thin wrappers
around the base equivalents).
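
The active-binding idea can be sketched in a few lines of base R. This is an illustration under assumptions, not the package's actual code: `conflicts_env` is a hypothetical stand-in for the "conflicted" environment, and the error text is made up.

```r
# A stand-in for the "conflicted" environment (hypothetical).
conflicts_env <- new.env()

# An active binding runs its function every time the name is looked up,
# so the ambiguity is reported at the point of use rather than at load time.
makeActiveBinding(
  "select",
  function() {
    stop("`select` found in 2 packages; pick one with `::`.", call. = FALSE)
  },
  conflicts_env
)

# Looking the name up triggers the binding and therefore the error.
msg <- tryCatch(
  get("select", envir = conflicts_env),
  error = function(e) conditionMessage(e)
)
print(msg)
```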

conflicted also provides a `conflict_scout()` helper which you can use
to see what’s going on:

    conflict_scout(c("dplyr", "MASS"))
    #> 1 conflict:
    #> * `select`: dplyr, MASS
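
The core of such a report is a search for names provided by more than one package. A rough sketch, assuming the exports are supplied as a plain named list (the helper `find_shared_names()` and the export vectors are hypothetical; for installed packages the vectors could come from `getNamespaceExports()`):

```r
# `exports` is a named list: package name -> character vector of exported names.
find_shared_names <- function(exports) {
  all_names <- as.character(unlist(lapply(exports, unique), use.names = FALSE))
  shared <- sort(unique(all_names[duplicated(all_names)]))
  providers <- lapply(shared, function(nm) {
    names(exports)[vapply(exports, function(x) nm %in% x, logical(1))]
  })
  names(providers) <- shared
  providers
}

# Made-up, abbreviated export lists for illustration.
exports <- list(
  dplyr = c("select", "filter", "mutate"),
  MASS  = c("select", "lda")
)
shared <- find_shared_names(exports)
print(shared)
```

This sketch reports every shared name; it applies none of the heuristics discussed in this thread.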

conflicted applies a few heuristics to minimise false positives (at the
cost of introducing a few false negatives). The overarching goal is to
ensure that code behaves identically regardless of the order in which
packages are attached.

-   A number of packages provide a function that appears to conflict
    with a function in a base package, but they follow the superset
    principle (i.e. they only extend the API, as explained to me by
    Hervé Pagès).

    conflicted assumes that packages adhere to the superset principle,
    which appears to be true in most of the cases that I’ve seen. For
    example, the lubridate package provides `as.difftime()` and `date()`
    which extend the behaviour of base functions, and provides S4
    generics for the set operators.

        conflict_scout(c("lubridate", "base"))
        #> 5 conflicts:
        #> * `as.difftime`: [lubridate]
        #> * `date`       : [lubridate]
        #> * `intersect`  : [lubridate]
        #> * `setdiff`    : [lubridate]
        #> * `union`      : [lubridate]

    There are two popular functions that don’t adhere to this principle:
    `dplyr::filter()` and `dplyr::lag()` :(. conflicted handles these
    special cases so they correctly generate conflicts. (I sure wish I’d
    known about the superset principle when creating dplyr!)

        conflict_scout(c("dplyr", "stats"))
        #> 2 conflicts:
        #> * `filter`: dplyr, stats
        #> * `lag`   : dplyr, stats

-   Deprecated functions should never win a conflict, so conflicted
    checks for use of `.Deprecated()`. This rule is very useful when
    moving functions from one package to another. For example, many
    devtools functions were moved to usethis, and conflicted ensures
    that you always get the non-deprecated version, regardless of package
    attach order:

        head(conflict_scout(c("devtools", "usethis")))
        #> 26 conflicts:
        #> * `use_appveyor`       : [usethis]
        #> * `use_build_ignore`   : [usethis]
        #> * `use_code_of_conduct`: [usethis]
        #> * `use_coverage`       : [usethis]
        #> * `use_cran_badge`     : [usethis]
        #> * `use_cran_comments`  : [usethis]
        #> ...
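
Both heuristics can be approximated with simple checks on a function's arguments and body. The sketch below is a guess at the kind of test involved, not the package's implementation; all function names here are hypothetical.

```r
# Heuristic 1 (assumed check): the replacement accepts every argument
# that the original accepts, i.e. it only extends the interface.
looks_like_superset <- function(candidate, original) {
  all(names(formals(original)) %in% names(formals(candidate)))
}

# Heuristic 2 (assumed check): the function body mentions .Deprecated().
looks_deprecated <- function(fun) {
  ".Deprecated" %in% all.names(body(fun))
}

# Hypothetical stand-ins for a base function and same-named replacements.
base_fun    <- function(x, units = "auto") x
extends_fun <- function(x, units = "auto", tz = "UTC") x
differs_fun <- function(.data, ...) .data
old_fun     <- function(...) {
  .Deprecated("new_fun", package = "newpkg")
  NULL
}

superset_ok  <- looks_like_superset(extends_fun, base_fun)
superset_bad <- looks_like_superset(differs_fun, base_fun)
deprecated   <- looks_deprecated(old_fun)
print(c(superset_ok, superset_bad, deprecated))
```

A check on argument names alone cannot tell whether the behaviour is really compatible, which is one way false negatives can arise.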

Finally, as mentioned above, the user can declare preferences:

    conflict_prefer("select", "MASS")
    #> [conflicted] Will prefer MASS::select over any other package
    conflict_scout(c("dplyr", "MASS"))
    #> 1 conflict:
    #> * `select`: [MASS]
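
One way to picture declared preferences is a small registry consulted before a conflict is raised. This is a sketch under assumptions: `prefs`, `prefer()` and `resolve()` are hypothetical names, not part of conflicted.

```r
# Hypothetical registry: function name -> preferred package.
prefs <- new.env()

prefer <- function(name, winner) {
  assign(name, winner, envir = prefs)
  invisible(winner)
}

# Return the package that should supply `name`, or fail if it is ambiguous.
resolve <- function(name, candidates) {
  if (exists(name, envir = prefs, inherits = FALSE)) {
    winner <- get(name, envir = prefs, inherits = FALSE)
    if (winner %in% candidates) return(winner)
  }
  if (length(candidates) == 1) return(candidates)
  stop(sprintf("`%s` found in %d packages.", name, length(candidates)),
       call. = FALSE)
}

# Without a preference the name is ambiguous; with one it resolves.
before <- tryCatch(resolve("select", c("dplyr", "MASS")),
                   error = function(e) "conflict")
prefer("select", "MASS")
after <- resolve("select", c("dplyr", "MASS"))
print(c(before, after))
```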

I’d love to hear what people think about the general idea, and if there
are any obviously missing pieces.

Thanks!

Hadley


--
http://hadley.nz

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: conflicted: an alternative conflict resolution strategy

Duncan Murdoch-2
First, some general comments:

This sounds like a useful package.

I would guess it has very little impact on runtime efficiency except
when attaching a new package; have you checked that?

I am not so sure about your heuristics.  Can they be disabled, so the
user is always forced to make the choice?  Even when a function is
intended to adhere to the superset principle, they don't always get it
right, so a really careful user should always do explicit disambiguation.

And of course, if users wrote most of their long scripts as packages
instead of as long scripts, the ambiguity issue would arise far less
often, because namespaces in packages are intended to solve the same
problem as your package does.

One more comment inline about a typo, possibly in an error message.

Duncan Murdoch

On 23/08/2018 2:31 PM, Hadley Wickham wrote:

>      library(conflicted)
>      library(dplyr)
>      library(MASS)
>
>      select
>      #> Error: [conflicted] `select` found in 2 packages.
>      #> Either pick the one you want with `::`
>      #> * MASS::select
>      #> * dplyr::select
>      #> Or declare a preference with `conflicted_prefer()`
>      #> * conflict_prefer("select", "MASS")
>      #> * conflict_prefer("select", "dplyr")

I don't know if this is a typo in your r-devel message or a typo in the
error message, but you say `conflicted_prefer()` in one place and
conflict_prefer() in the other.


Re: conflicted: an alternative conflict resolution strategy

Jari Oksanen
If you have to load two packages which both export the same name in their namespaces, namespaces do not help in resolving which synonymous function to use. Neither does it help to have a package instead of a script, as long as you end up loading two namespaces with name conflicts. The order of importing namespaces can also be difficult to control, because you may already end up loading a namespace when you start R with a saved workspace. Moving a function to another package may be a transitional issue which disappears when both packages are at their final stages, but if you use the recommended deprecation stage, the same names can live together for a long time. So this package is a good idea, and preferably base R should be able to handle the issue of choosing between exported synonymous functions.

This has bitten me several times in package development, and with a growing CRAN it is a growing problem. Package authors often have poor control of the issue, as they do not know what packages users use. For now we can only have a FAQ that tells users that a certain error message does not come from a function in our package, but from some other package having a synonymous function that was used instead.

cheers, Jari Oksanen


Re: conflicted: an alternative conflict resolution strategy

Joris FA Meys
In reply to this post by hadley wickham
Dear Hadley,

There have been some emails from you lately about packages on R-devel. I would
argue that the appropriate list for that is R-pkg-devel, as I've been told
myself not too long ago. People might get confused and think this is about
a change to R itself, which it obviously is not.

Kind regards
Joris



--
Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)

Re: conflicted: an alternative conflict resolution strategy

hadley wickham
In reply to this post by Duncan Murdoch-2
On Thu, Aug 23, 2018 at 3:46 PM Duncan Murdoch <[hidden email]> wrote:
>
> First, some general comments:
>
> This sounds like a useful package.
>
> I would guess it has very little impact on runtime efficiency except
> when attaching a new package; have you checked that?

It adds one extra element to the search path, so the impact on speed
should be equivalent to loading one additional package (i.e.
negligible).

I've also done some benchmarking to see the impact on calls to
library(). The numbers are now a little outdated (I've added more
heuristics since, so I should redo them), but previously conflicted
added about 100 ms of overhead to a library() call when I had ~170
packages loaded (the most I could load without running out of DLLs).
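
The "one extra element" point can be checked in base R. A minimal sketch, assuming an empty environment named `demo_conflicts` (a hypothetical name) as a stand-in for the one conflicted attaches:

```r
before <- length(search())

# attach(NULL, ...) adds an empty environment at position 2 of the search
# path, i.e. directly after the global environment.
env <- attach(NULL, name = "demo_conflicts")
after <- length(search())
position <- match("demo_conflicts", search())

# Clean up.
detach("demo_conflicts", character.only = TRUE)

print(c(added = after - before, position = position))
```

This only shows the search path growing by one entry; it does not measure the `library()` overhead mentioned above.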

> I am not so sure about your heuristics.  Can they be disabled, so the
> user is always forced to make the choice?  Even when a function is
> intended to adhere to the superset principle, they don't always get it
> right, so a really careful user should always do explicit disambiguation.

That is a good question - my intuition is always to start with less
user control as it makes it easier to get the core ideas right, and
it's easy to add more control later (whereas if you later take it
away, people get unhappy). Maybe it's natural to have a function that
does the opposite of conflict_prefer(), and declares that something
that doesn't appear to be a conflict actually is?

I don't think that an option to suppress the superset principle
altogether will work - my sense is that it will generate too many
false positives, to the point where you'll get frustrated and stop
using conflicted.

> And of course, if users wrote most of their long scripts as packages
> instead of as long scripts, the ambiguity issue would arise far less
> often, because namespaces in packages are intended to solve the same
> problem as your package does.

Agreed.

> One more comment inline about a typo, possibly in an error message.

Thanks for spotting; fixed in devel now.

Hadley


--
http://hadley.nz


Re: conflicted: an alternative conflict resolution strategy

hadley wickham
In reply to this post by Joris FA Meys
On Fri, Aug 24, 2018 at 4:28 AM Joris Meys <[hidden email]> wrote:
>
> Dear Hadley,
>
> There's been some mails from you lately about packages on R-devel. I would argue that the appropriate list for that is R-pkg-devel, as I've been told myself not too long ago. People might get confused and think this is about a change to R itself, which it obviously is not.

The description for R-pkg-devel states:

> This list is to get help about package development in R. The goal of the list is to provide a forum for learning about the package development process. We hope to build a community of R package developers who can help each other solve problems, and reduce some of the burden on the CRAN maintainers. If you are having problems developing a package or passing R CMD check, this is the place to ask!

The description for R-devel states:

> This list is intended for questions and discussion about code development in R. Questions likely to prompt discussion unintelligible to non-programmers or topics that are too technical for R-help's audience should go to R-devel, unless they are specifically about problems in R package development where the R-package-devel list is rather appropriate, see the posting guide section. The main R mailing list is R-help.

My questions are not about how to develop a package, R CMD check, or
how to get it on CRAN, but instead about the semantics of the packages
I am working on. My opinion is supported by the fact that a number of
members of the R core team have responded (both on list and off) and
have not expressed concern about my choice of venue.

That said, I am happy to change venues (or simply not email at all) if
there is widespread concern that my emails are inappropriate.

Hadley

--
http://hadley.nz


Re: conflicted: an alternative conflict resolution strategy

Joris FA Meys
On Fri, Aug 24, 2018 at 2:27 PM Hadley Wickham <[hidden email]> wrote:

>
> My questions are not about how to develop a package, R CMD check, or
> how to get it on CRAN, but instead about the semantics of the packages
> I am working on. My opinion is supported by the fact that a number of
> members of the R core team have responded (both on list and off) and
> have not expressed concern about my choice of venue.
>

If those moderating the lists are fine with it, all good.

Cheers
Joris


> That said, I am happy to change venues (or simply not email at all) if
> there is widespread concern that my emails are inappropriate.
>
> Hadley
>
> --
> http://hadley.nz
>



Re: conflicted: an alternative conflict resolution strategy

Duncan Murdoch-2
In reply to this post by Jari Oksanen
On 24/08/2018 3:12 AM, Jari Oksanen wrote:
> If you have to load two packages which both export the same name in
> their namespaces, namespace does not help in resolving which synonymous
> function to use. Neither does it help to have a package instead of a
> script as long as you end up loading two namespaces with name conflicts.

You can't import the same name from two packages without getting an
error message (at least when checking --as-cran, I'm not sure about
vanilla checks), so this is already handled.

If you really only want one of the imports, then importing individual
functions is the solution.  Don't import everything from the package.
This is a good idea in any case.

If you want both of the imports, then there's the undocumented (?)
ability to rename a function on import, as well as the documented
possibility of using :: for one of them instead of importing it.
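
For scripts rather than packages, the `::` route can be sketched with base R alone. The stand-in `filter()` below is an assumption for illustration: it plays the role of a masking definition from some later-attached package, and the qualified call still reaches the stats version.

```r
# Stand-in for a masking definition, e.g. from a later-attached package
# (hypothetical, for illustration only).
filter <- function(...) stop("wrong filter() called")

x <- c(1, 2, 3, 4, 5)

# The qualified name resolves to the stats version regardless of what is
# attached or defined elsewhere on the search path.
smoothed <- stats::filter(x, rep(1 / 3, 3))
as.numeric(smoothed)
```

The unqualified `filter(x, ...)` would hit the stand-in and fail, which is the situation the qualified call avoids.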

> The order of importing namespaces can also be difficult to control,
> because you may end up loading a namespace already when you start your R
> with a saved workspace.

That doesn't make sense in the context of a package.  Packages import
what they ask to import. The user's workspace is irrelevant to code
within the package if it does its imports properly.  You can reference
functions that are not imported, but you get a message when you run
checks to tell you not to do that.

Duncan Murdoch

> Moving a function to another package may be a
> transitional issue which disappears when both packages are at their
> final stages, but if you use the recommended deprecation stage, the same
> names can live together for a long time. So this package is a good idea,
> and preferably base R should be able to handle the issue of choosing
> between exported synonymous functions.
>
> This has bitten me several times in package development, and with
> growing CRAN it is a growing problem. Package authors often have poor
> control of the issue, as they do not know what packages users use. Now
> we can only have a FAQ that tells that a certain error message does not
> come from a function in our package, but from some other package having
> a synonymous function that was used instead.
>
> cheers, Jari Oksanen
>
>> On 23 Aug 2018, at 23:46 pm, Duncan Murdoch <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>> First, some general comments:
>>
>> This sounds like a useful package.
>>
>> I would guess it has very little impact on runtime efficiency except
>> when attaching a new package; have you checked that?
>>
>> I am not so sure about your heuristics.  Can they be disabled, so the
>> user is always forced to make the choice?  Even when a function is
>> intended to adhere to the superset principle, they don't always get it
>> right, so a really careful user should always do explicit disambiguation.
>>
>> And of course, if users wrote most of their long scripts as packages
>> instead of as long scripts, the ambiguity issue would arise far less
>> often, because namespaces in packages are intended to solve the same
>> problem as your package does.
>>
>> One more comment inline about a typo, possibly in an error message.
>>
>> Duncan Murdoch
>>
>> On 23/08/2018 2:31 PM, Hadley Wickham wrote:
>>> Hi all,
>>> I’d love to get your feedback on the conflicted package, which
>>> provides an
>>> alternative strategy for resolving ambiguous function names (i.e. when
>>> multiple packages provide identically named functions). conflicted 0.1.0
>>> is already on CRAN, but I’m currently preparing a revision
>>> (<https://github.com/r-lib/conflicted>), and looking for feedback.
>>> As you are no doubt aware, R’s default approach means that the most
>>> recently loaded package “wins” any conflicts. You do get a message about
>>> conflicts on load, but I see a lot of newer R users experiencing problems
>>> caused by function conflicts. I think there are three primary reasons:
>>> -   People don’t read messages about conflicts. Even if you are
>>>     conscientious and do read the messages, it’s hard to notice a single
>>>     new conflict caused by a package upgrade.
>>> -   The warning and the problem may be quite far apart. If you load all
>>>     your packages at the top of the script, it may potentially be 100s
>>>     of lines before you encounter a conflict.
>>> -   The error messages caused by conflicts are cryptic because you end
>>>     up calling a function with utterly unexpected arguments.
>>> For these reasons, conflicted takes an alternative approach, forcing the
>>> user to explicitly disambiguate any conflicts:
>>>     library(conflicted)
>>>     library(dplyr)
>>>     library(MASS)
>>>     select
>>>     #> Error: [conflicted] `select` found in 2 packages.
>>>     #> Either pick the one you want with `::`
>>>     #> * MASS::select
>>>     #> * dplyr::select
>>>     #> Or declare a preference with `conflicted_prefer()`
>>>     #> * conflict_prefer("select", "MASS")
>>>     #> * conflict_prefer("select", "dplyr")
>>
>> I don't know if this is a typo in your r-devel message or a typo in
>> the error message, but you say `conflicted_prefer()` in one place and
>> conflict_prefer() in the other.
>>
>>> conflicted works by attaching a new “conflicted” environment just after
>>> the global environment. This environment contains an active binding for
>>> any ambiguous bindings. The conflicted environment also contains
>>> bindings for `library()` and `require()` that rebuild the conflicted
>>> environment and suppress default reporting (but are otherwise thin wrappers
>>> around the base equivalents).
>>> conflicted also provides a `conflict_scout()` helper which you can use
>>> to see what’s going on:
>>>     conflict_scout(c("dplyr", "MASS"))
>>>     #> 1 conflict:
>>>     #> * `select`: dplyr, MASS
>>> conflicted applies a few heuristics to minimise false positives (at the
>>> cost of introducing a few false negatives). The overarching goal is to
>>> ensure that code behaves identically regardless of the order in which
>>> packages are attached.
>>> -   A number of packages provide a function that appears to conflict
>>>     with a function in a base package, but they follow the superset
>>>     principle (i.e. they only extend the API, as explained to me by
>>>     Hervé Pagès).
>>>     conflicted assumes that packages adhere to the superset principle,
>>>     which appears to be true in most of the cases that I’ve seen. For
>>>     example, the lubridate package provides `as.difftime()` and `date()`
>>>     which extend the behaviour of base functions, and provides S4
>>>     generics for the set operators.
>>>         conflict_scout(c("lubridate", "base"))
>>>         #> 5 conflicts:
>>>         #> * `as.difftime`: [lubridate]
>>>         #> * `date`       : [lubridate]
>>>         #> * `intersect`  : [lubridate]
>>>         #> * `setdiff`    : [lubridate]
>>>         #> * `union`      : [lubridate]
>>>     There are two popular functions that don’t adhere to this principle:
>>>     `dplyr::filter()` and `dplyr::lag()` :(. conflicted handles these
>>>     special cases so they correctly generate conflicts. (I sure wish I’d
>>>     known about the superset principle when creating dplyr!)
>>>         conflict_scout(c("dplyr", "stats"))
>>>         #> 2 conflicts:
>>>         #> * `filter`: dplyr, stats
>>>         #> * `lag`   : dplyr, stats
>>> -   Deprecated functions should never win a conflict, so conflicted
>>>     checks for use of `.Deprecated()`. This rule is very useful when
>>>     moving functions from one package to another. For example, many
>>>     devtools functions were moved to usethis, and conflicted ensures
>>>     that you always get the non-deprecated version, regardless of package
>>>     attach order:
>>>         head(conflict_scout(c("devtools", "usethis")))
>>>         #> 26 conflicts:
>>>         #> * `use_appveyor`       : [usethis]
>>>         #> * `use_build_ignore`   : [usethis]
>>>         #> * `use_code_of_conduct`: [usethis]
>>>         #> * `use_coverage`       : [usethis]
>>>         #> * `use_cran_badge`     : [usethis]
>>>         #> * `use_cran_comments`  : [usethis]
>>>         #> ...
>>> Finally, as mentioned above, the user can declare preferences:
>>>     conflict_prefer("select", "MASS")
>>>     #> [conflicted] Will prefer MASS::select over any other package
>>>     conflict_scout(c("dplyr", "MASS"))
>>>     #> 1 conflict:
>>>     #> * `select`: [MASS]
>>> I’d love to hear what people think about the general idea, and if there
>>> are any obviously missing pieces.
>>> Thanks!
>>> Hadley
>>>
>>

Re: conflicted: an alternative conflict resolution strategy

Gabe Becker
In reply to this post by hadley wickham
Hadley,

Overall this seems like a cool and potentially really useful idea. I do have
some thoughts/feedback, which I've put in-line below.

On Thu, Aug 23, 2018 at 11:31 AM, Hadley Wickham <[hidden email]>
wrote:

>
> <snip>
>

> conflicted applies a few heuristics to minimise false positives (at the
> cost of introducing a few false negatives). The overarching goal is to
> ensure that code behaves identically regardless of the order in which
> packages are attached.
>
> -   A number of packages provide a function that appears to conflict
>     with a function in a base package, but they follow the superset
>     principle (i.e. they only extend the API, as explained to me by
>     Hervé Pagès).
>
>     conflicted assumes that packages adhere to the superset principle,
>     which appears to be true in most of the cases that I’ve seen.


It seems that you may be able to strengthen this heuristic from a blanket
assumption to something more narrowly targeted by looking for one or more
of the following to confirm likely-superset adherence:

   1. matching or purely extending formals (i.e. all the named arguments of
   base::fun match including order, and there are new arguments in pkg::fun
   only if base::fun takes ...)
   2. explicit call to  base::fun in the body of pkg::fun
   3. UseMethod(funname) and at least one provided S3 method calls base::fun
   4. S4 generic creation using fun or base::fun as the seeding/default
   method body or called from at least one method
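
As a rough sketch of check 1 only, using base R: `extends_formals()` is a made-up name, not something conflicted provides, and it compares argument names only (defaults and semantics are ignored).

```r
# Hypothetical helper, not part of conflicted.
# TRUE when pkg_fun keeps every argument of base_fun in the same order and
# adds new arguments only if base_fun itself accepts `...`.
extends_formals <- function(base_fun, pkg_fun) {
  # args() also copes with primitives, for which formals() alone gives NULL.
  arg_names <- function(f) as.character(names(formals(args(f))))
  base_args <- arg_names(base_fun)
  pkg_args <- arg_names(pkg_fun)

  # Every base argument must appear in pkg_fun, in the same relative order.
  shared <- pkg_args[pkg_args %in% base_args]
  if (!identical(shared, base_args)) {
    return(FALSE)
  }

  # Extra arguments are acceptable only when base_fun takes `...`.
  extra <- setdiff(pkg_args, base_args)
  length(extra) == 0 || "..." %in% base_args
}

# Illustrative stand-ins for a base function and a package function.
base_like <- function(x, units = "auto", ...) NULL
pkg_like <- function(x, units = "auto", ..., tz = "UTC") NULL

extends_formals(base_like, pkg_like)                       # TRUE
extends_formals(stats::filter, function(.data, ...) NULL)  # FALSE
```

Checks 2 to 4 would need to inspect function bodies and registered methods, which this sketch does not attempt.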



> For
>     example, the lubridate package provides `as.difftime()` and `date()`
>     which extend the behaviour of base functions, and provides S4
>     generics for the set operators.
>
>         conflict_scout(c("lubridate", "base"))
>         #> 5 conflicts:
>         #> * `as.difftime`: [lubridate]
>         #> * `date`       : [lubridate]
>         #> * `intersect`  : [lubridate]
>         #> * `setdiff`    : [lubridate]
>         #> * `union`      : [lubridate]
>
>     There are two popular functions that don’t adhere to this principle:
>     `dplyr::filter()` and `dplyr::lag()` :(. conflicted handles these
>     special cases so they correctly generate conflicts. (I sure wish I’d
>     known about the superset principle when creating dplyr!)
>
>         conflict_scout(c("dplyr", "stats"))
>         #> 2 conflicts:
>         #> * `filter`: dplyr, stats
>         #> * `lag`   : dplyr, stats
>
> -   Deprecated functions should never win a conflict, so conflicted
>     checks for use of `.Deprecated()`. This rule is very useful when
>     moving functions from one package to another. For example, many
>     devtools functions were moved to usethis, and conflicted ensures
>     that you always get the non-deprecated version, regardless of package
>     attach order:
>

I would completely believe this rule is useful for refactoring as you
describe, but that is the "same function" case. For an end-user in the
"different function same symbol" case it's not at all clear to me that the
deprecated function should always lose.

People sometimes use deprecated functions. It's not great, and eventually
they'll need to fix that for any given case, but imagine if you deprecated
the filter verb in dplyr (I know this will never happen, but I think it's
illustrative nonetheless).

Consider a piece of code someone wrote before this hypothetical deprecation
of filter. The fact that it's now deprecated certainly doesn't mean that
they secretly wanted stats::filter all along, right? Conflicted acting as
if it does will lead to them getting the exact kind of error you're looking
to protect them from, and with even less ability to understand why because
they are already doing "The right thing" to protect themselves by using
conflicted in the first place...


> Finally, as mentioned above, the user can declare preferences:
>
>     conflict_prefer("select", "MASS")
>     #> [conflicted] Will prefer MASS::select over any other package
>     conflict_scout(c("dplyr", "MASS"))
>     #> 1 conflict:
>     #> * `select`: [MASS]
>
>
I deeply worry about people putting this kind of thing, or even just
library(conflicted), in their .Rprofile and thus making their scripts
*substantially* less reproducible. Is that a consequence of this kind of
functionality that you have thought about?

Best,
~G


> I’d love to hear what people think about the general idea, and if there
> are any obviously missing pieces.
>
> Thanks!
>
> Hadley
>
>
> --
> http://hadley.nz
>

--
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research


Re: conflicted: an alternative conflict resolution strategy

hadley wickham
>> conflicted applies a few heuristics to minimise false positives (at the
>> cost of introducing a few false negatives). The overarching goal is to
>> ensure that code behaves identically regardless of the order in which
>> packages are attached.
>>
>> -   A number of packages provide a function that appears to conflict
>>     with a function in a base package, but they follow the superset
>>     principle (i.e. they only extend the API, as explained to me by
>>     Hervé Pagès).
>>
>>     conflicted assumes that packages adhere to the superset principle,
>>     which appears to be true in most of the cases that I’ve seen.
>
>
> It seems that you may be able to strengthen this heuristic from a blanket
> assumption to something more narrowly targeted by looking for one or more
> of the following to confirm likely-superset adherence:
>
>    1. matching or purely extending formals (i.e. all the named arguments of
>    base::fun match including order, and there are new arguments in pkg::fun
>    only if base::fun takes ...)
>    2. explicit call to base::fun in the body of pkg::fun
>    3. UseMethod(funname) and at least one provided S3 method calls base::fun
>    4. S4 generic creation using fun or base::fun as the seeding/default
>    method body or called from at least one method

Oooh, nice idea, I'll definitely try it out.

>> For
>>     example, the lubridate package provides `as.difftime()` and `date()`
>>     which extend the behaviour of base functions, and provides S4
>>     generics for the set operators.
>>
>>         conflict_scout(c("lubridate", "base"))
>>         #> 5 conflicts:
>>         #> * `as.difftime`: [lubridate]
>>         #> * `date`       : [lubridate]
>>         #> * `intersect`  : [lubridate]
>>         #> * `setdiff`    : [lubridate]
>>         #> * `union`      : [lubridate]
>>
>>     There are two popular functions that don’t adhere to this principle:
>>     `dplyr::filter()` and `dplyr::lag()` :(. conflicted handles these
>>     special cases so they correctly generate conflicts. (I sure wish I’d
>>     known about the superset principle when creating dplyr!)
>>
>>         conflict_scout(c("dplyr", "stats"))
>>         #> 2 conflicts:
>>         #> * `filter`: dplyr, stats
>>         #> * `lag`   : dplyr, stats
>>
>> -   Deprecated functions should never win a conflict, so conflicted
>>     checks for use of `.Deprecated()`. This rule is very useful when
>>     moving functions from one package to another. For example, many
>>     devtools functions were moved to usethis, and conflicted ensures
>>     that you always get the non-deprecated version, regardless of package
>>     attach order:
>
>
> I would completely believe this rule is useful for refactoring as you describe, but that is the "same function" case. For an end-user in the "different function same symbol" case it's not at all clear to me that the deprecated function should always lose.
>
> People sometimes use deprecated functions. It's not great, and eventually they'll need to fix that for any given case, but imagine if you deprecated the filter verb in dplyr (I know this will never happen, but I think it's illustrative nonetheless).
>
> Consider a piece of code someone wrote before this hypothetical deprecation of filter. The fact that it's now deprecated certainly doesn't mean that they secretly wanted stats::filter all along, right? Conflicted acting as if it does will lead to them getting the exact kind of error you're looking to protect them from, and with even less ability to understand why because they are already doing "The right thing" to protect themselves by using conflicted in the first place...

Ah yes, good point. I'll add some heuristic to check that the function
name appears in the first argument of the .Deprecated call (assuming
that the call looks something like `.Deprecated("pkg::foo")`)
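
Roughly, and only as a sketch of the idea rather than what conflicted will end up doing: the helpers `deprecated_targets()` and `is_moved_function()` below are hypothetical names, and they assume a bare `.Deprecated("pkg::foo")` call with a string as its `new` argument.

```r
# Hypothetical helper: collect the `new` argument of every bare
# `.Deprecated()` call found in a function's body.
deprecated_targets <- function(fun) {
  targets <- character()
  walk <- function(call) {
    if (identical(call[[1]], quote(.Deprecated))) {
      # match.call() normalises positional and named use of `new`.
      new <- match.call(.Deprecated, call)$new
      if (is.character(new)) targets <<- c(targets, new)
    }
    # Recurse into sub-calls only; symbols, constants and empty arguments
    # (as in x[, 1]) are skipped.
    for (i in seq_along(call)) {
      if (is.call(call[[i]])) walk(call[[i]])
    }
  }
  if (is.call(body(fun))) walk(body(fun))
  targets
}

# Hypothetical helper: TRUE when `fun` is deprecated in favour of a function
# with the same name, i.e. it looks like it was moved, not replaced.
is_moved_function <- function(fun, name) {
  targets <- deprecated_targets(fun)
  # Drop an optional `pkg::` prefix and trailing `()` before comparing.
  bare <- sub("\\(\\)$", "", sub("^.*::", "", targets))
  any(bare == name)
}

# Illustrative stand-ins: one moved function, one replaced by a new name.
moved <- function(...) {
  .Deprecated("usethis::use_coverage")
  invisible()
}
renamed <- function(...) {
  .Deprecated("keep_rows")
  invisible()
}

is_moved_function(moved, "use_coverage")  # TRUE
is_moved_function(renamed, "filter")      # FALSE
```

With a check like this, only the first kind of function would automatically lose a conflict; the second would still be reported as a conflict for the user to resolve.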

>> Finally, as mentioned above, the user can declare preferences:
>>
>>     conflict_prefer("select", "MASS")
>>     #> [conflicted] Will prefer MASS::select over any other package
>>     conflict_scout(c("dplyr", "MASS"))
>>     #> 1 conflict:
>>     #> * `select`: [MASS]
>>
>
> I deeply worry about people putting this kind of thing, or even just library(conflicted), in their .Rprofile and thus making their scripts substantially less reproducible. Is that a consequence of this kind of functionality that you have thought about?

Yes, and I've already recommended against it in two places :)  I'm not
sure if there's any more I can do - people already put (e.g.)
`library(ggplot2)` in their .Rprofile, which is just as bad from a
reproducibility standpoint.

Thanks for the thoughtful feedback!

Hadley

--
http://hadley.nz
