Underscores in package names

Re: Underscores in package names

S Ellison-2
To throw a very small pennyworth into this debate: the metRology package I maintain uses mixed case to highlight R for that community when I'm talking about it or citing it. R takeup in that community is not yet high, and the visible reminder seems to help.

I'll obviously accept a consensus decision for some other case convention taken on sound technical grounds, but if this is essentially an aesthetic matter I'd prefer not to change it for someone else's idea of what looks pretty and what doesn’t.

Steve Ellison

> -----Original Message-----
> From: R-devel [mailto:[hidden email]] On Behalf Of neonira
> Arinoem
> Sent: 09 August 2019 20:39
> To: Ben Bolker
> Cc: [hidden email]
> Subject: Re: [Rd] Underscores in package names
>
>
> Naming policies are always tricky. The one proposed by Hadley, like the one
> proposed by Google, is usable but not optimal for the most common
> needs, which are:
>
> 1. Name a package
> 2. Name a class
> 3. Name a function
> 4. Name a parameter of a function
> 5. Name a variable
>
> ...


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Underscores in package names

Martin Maechler
In reply to this post by Duncan Murdoch-2
>>>>> Duncan Murdoch
>>>>>     on Fri, 9 Aug 2019 20:23:28 -0400 writes:

    > On 09/08/2019 4:37 p.m., Gabriel Becker wrote:
    >> Duncan,
    >>
    >>
    >> On Fri, Aug 9, 2019 at 1:17 PM Duncan Murdoch <[hidden email]
    >> <mailto:[hidden email]>> wrote:
    >>
    >> On 09/08/2019 2:41 p.m., Gabriel Becker wrote:
    >> > Note that this proposal would make mypackage_2.3.1 a valid
    >> *package name*,
    >> > whose corresponding tarball name might be mypackage_2.3.1_2.3.2
    >> after a
    >> > patch. Yes it's a silly example, but why allow that kind of ambiguity?
    >> >
    >> CRAN already has a package named "FuzzyNumbers.Ext.2", whose tarball is
    >> FuzzyNumbers.Ext.2_3.2.tar.gz, so I think we've already lost that game.
    >>
    >>
    >> I suppose technically 2 is a valid version number for a package (?) so I
    >> suppose you have me there. But as Ben pointed out while I was writing
    >> this, all I can really say is that in practice they read to me (as
    >> someone who has administered R on a large cluster and written
    >> build-system software for it) as substantially different levels of
    >> ambiguity. I do acknowledge, as Ben does, that yes a more complex
    >> regular expression/splitting algorithm can be written that would handle
    >> the more general package names. I just don't personally see a motivation
    >> that justifies changing something this fundamental (even if it is both
    >> narrow and was initially more or less arbitrarily chosen) about R at
    >> this late date.
    >>
    >> At the end of the day, I guess what I'm saying is that breaking
    >> and changing things is sometimes good, but if we're going to rock the
    >> boat, personally I'd want to do so going after bigger wins than this one.
    >> That's just my opinion though.

    > Sorry, I wasn't clear.  I agree with you.  I was just saying that the
    > particular argument based on ugly tarball names isn't the reason.

    > Duncan Murdoch

Thank you (and Gabe).

We have had some R core internal "talk" about Jim Hester's
suggestion (of adding underscores to the allowed characters in
package names).
Duncan had already given a good reason why such a change would be problematic
(the underscore being used as unique separator of package name
 and version in source and binary package archives),
and even with Jim's offer to find and provide patches for all places
this is used in the R sources, we've convinced ourselves that
there is much more code "out there", notably 'devops' code in
scripts, which currently relies on the current package naming
rules and which could break, often only rarely and hence
possibly unnoticed for too long.
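
A minimal sketch of the kind of script that depends on this rule, written
in R for illustration. The helper names `split_tarball()` and
`split_tarball_last()` are hypothetical; they are not taken from the R
sources or from any particular deployment. The first helper assumes, as the
current naming rules guarantee, that the first "_" in an archive name ends
the package name. The second is one possible form of the "more complex"
splitting mentioned earlier in the thread, assuming the version itself never
contains an underscore.

```r
# Hypothetical helper: split "pkg_version.tar.gz" at the first underscore.
# This is only correct while "_" cannot occur in a package name.
split_tarball <- function(file) {
  stem <- sub("\\.tar\\.gz$", "", file)            # drop the extension
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]  # split on every "_"
  list(package = parts[1], version = parts[2])     # keep the first two pieces
}

# Works under the current rules, even for a name that ends in a digit:
split_tarball("FuzzyNumbers.Ext.2_3.2.tar.gz")
# package "FuzzyNumbers.Ext.2", version "3.2"

# If "mypackage_2.3.1" were a legal package name, the same script would
# misparse its tarball without raising any error:
split_tarball("mypackage_2.3.1_2.3.2.tar.gz")
# package "mypackage", version "2.3.1"

# Hypothetical alternative: split at the LAST underscore instead.
split_tarball_last <- function(file) {
  stem <- sub("\\.tar\\.gz$", "", file)
  list(package = sub("_[^_]*$", "", stem),  # everything before the last "_"
       version = sub("^.*_", "", stem))     # everything after the last "_"
}

split_tarball_last("mypackage_2.3.1_2.3.2.tar.gz")
# package "mypackage_2.3.1", version "2.3.2"
```

The point of the sketch is that the first variant fails silently rather
than loudly, which is why such breakage could go unnoticed.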

Also, we've not seen compelling arguments why the current scheme
would be too limited (people mentioned that if you must use a
separator, "." was available).

Consequence:  We stay with the stability principle and the
package naming scheme is _not_ going to be changed for now.

Martin Maechler
ETH Zurich and R Core Team


Re: Underscores in package names

Jim Hester
Martin,

Thank you for discussing this amongst R-core and for detailing the
R-core discussion here.

Here are some specific examples where having underscores available would
have been useful:

1. My primerTree package (2013) was originally primer_tree, but I had
to change the name to camelCase to comply with the check requirements.
Using camelCase in the package name makes reading code jarring, as the
functions all use snake_case.
2. The widely used testthat package would likely be called test_that,
like the corresponding function within the package. This also
highlights one of the drawbacks of the current situation: without
separators the package name is more difficult to read. Does it have
two t's or three?
3. The assertive suite of packages uses `.` for separation, e.g.
`assertive.base`, `assertive.datetimes` etc., but all functions within
the packages use `_` separators; again, this was likely done out of
necessity rather than desire.
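
To make the check requirement concrete, here is a rough sketch of the
documented rule for the `Package` field (ASCII letters, numbers and dot
only, at least two characters, starting with a letter and not ending in a
dot). The function name `is_valid_pkg_name()` and the regular expression
are an illustrative transcription of that rule, not the code R CMD check
actually runs.

```r
# Hypothetical helper: TRUE where a name satisfies the documented rule.
# First character a letter, last character a letter or digit, and only
# letters, digits and dots in between (so at least two characters overall).
is_valid_pkg_name <- function(name) {
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)
}

candidates <- c("primerTree", "primer_tree", "testthat",
                "test_that", "assertive.base")

# Named logical vector: the two underscore variants are the ones rejected.
setNames(is_valid_pkg_name(candidates), candidates)
```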

There are many more, I am sure; these were some that came immediately
to mind. More important than the specific examples is the opportunity
cost of having this restriction, which we cannot really quantify.

Using dots for separators has a number of practical problems.
Functions using dots are ambiguous, e.g. is `as.data.frame()` a
regular function, an `as.data()` method for a `frame` object, or an
`as()` method for a `data.frame` object? And in fact regular functions
can be accidentally promoted to S3 methods by defining an S3 generic,
which does actually happen in real life, confusing users [1]. While
package names are not functions, using dots in package names
encourages the use of dots in functions, a dangerous practice. Dots in
names are also one of the common stones cast at R as a language, as
dots are used for object oriented method dispatch in other common
languages.
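
A small sketch of the accidental promotion, with made-up names
(`summarise.results`, `summarise`, and the class "results" are all
hypothetical and not taken from any package): a plain helper with a dot
in its name silently becomes an S3 method once a generic with the
matching prefix is defined.

```r
# A regular helper function that happens to have a dot in its name.
summarise.results <- function(x, ...) "plain helper called"

# Later, someone defines an S3 generic called `summarise` ...
summarise <- function(x, ...) UseMethod("summarise")
summarise.default <- function(x, ...) "default method called"

# ... and an object whose class happens to be "results".
obj <- structure(list(), class = "results")

# Dispatch now picks the helper, although it was never meant as a method.
summarise(obj)
# [1] "plain helper called"

# Objects of other classes still fall through to the default method.
summarise(1:3)
# [1] "default method called"
```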

Dotted names are the only major function naming convention whose
prevalence is steadily decreasing over time. They now account for only
around 15% of all function names when looking at all 94 million lines
of code currently available on CRAN (see Figure 2 from Yen et al.
[2]).

Thanks again for the public discussion,

Jim

[1]: https://twitter.com/_ColinFay/status/1105579764797108230
[2]: https://osf.io/preprints/socarxiv/ts2wq/


Re: Underscores in package names

Abby Spurdle
> While
> package names are not functions, using dots in package names
> encourages the use of dots in functions, a dangerous practice.

"dangerous"...?
I can't understand the necessity of RStudio and Tiny-Verse affiliated
persons to repeatedly use subjective and unscientific phrasing.

Elegant, Advanced, Dangerous...
At UseR, there was even "Advanced Use of your Favorite IDE".

This is not science.
This is marketing.

There's nothing dangerous about it other than your belief that it's
dangerous.
I note that many functions in the stats package use dots in function names.
Your statement implies that the stats package is badly designed, which it
is not.
Out of 14,800-ish packages on CRAN, very few of them are even close to the
standard set by the stats package, in my opinion.

And as noted by other people in this thread, changing naming policies could
interfere with a lot of software "out there", which is dangerous.

> Dots in
> names is also one of the common stones cast at R as a language, as
> dots are used for object oriented method dispatch in other common
> languages.

I don't think the goal is to copy other OOP systems.
Furthermore, some shells use dot as the current working directory and Java
uses dots in package namespaces.
And then there's regular expressions...


Re: Underscores in package names

Jan Gorecki
Thanks Abby and Martin,

In every company where I have worked with R (3 in total), there was at
least one process (and up to ~10) designed, developed and implemented to
depend on the current package naming scheme, with the underscore as the
separator between a package name and its version. From my experience I
believe this is a (very?) common practice. I also use it myself.
The arguments for allowing underscores in package names are simply weak.
Dots in function names are an entirely different issue, caused by S3
dispatch. No need to look at other OOP languages; this is R.
A package name is not a function name.
There are no practical gains.
There is nothing wrong with having package "a.pkg" and function "a_pkg()".

Regards,
Jan Gorecki



Re: Underscores in package names

Kevin Wright-5
In reply to this post by Jim Hester
I've heard the arguments against dots in names many times. The t.test and
data.frame examples have been repeated so often that they have become accepted
as gospel.  In my experience, evidence of any actual problems is fairly
limited (almost non-existent).  I've been happily using dots in function
names for 20 (sigh) years and only once had an unanticipated S3 class
kick in.  I find the "." much easier to type than "_" because of the
proximity of the keys to the home-row on the keyboard.

--
Kevin Wright
