RFC: API design of package "modules"

Konrad Rudolph
Some time ago I published the first draft of the package “modules”
[1], which aims to provide a module system as an alternative to
packages for R. Very briefly, it is meant to complement the existing
package system for very small code units which do not require the
(small, but existing) overhead associated with writing a package. I’ve
noticed that people around me put off writing packages (and thus,
reusable code) due to that, and use `source` instead. Modules would
work (in many cases) as a drop-in replacement for `source`, and could
thus encourage code reuse.

However, now I’m stuck on a particular aspect of the API and would
like to solicit feedback from r-devel.

`import('foo')` imports a given module, `foo`. In addition to other
differences detailed in [2], modules allow/impose a hierarchical
organisation. That way, `import('foo')` might load code from a file
called `foo.r` or from a file called `foo/__init__.r` (reminiscent of
Python’s module mechanism) and `import('foo/bar')` would load a file
`foo/bar.r` or `foo/bar/__init__.r` [3].
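
The lookup rule described above could be sketched roughly as follows. This is only an illustration, not the package's actual implementation: `resolve_module` and its `search_path` parameter are hypothetical names, and the sketch assumes that `foo.r` takes precedence over `foo/__init__.r`, which the description above does not specify.

```r
# Hypothetical helper: map a module name such as 'foo/bar' to a source file.
# Assumption: the plain file 'foo/bar.r' is tried before 'foo/bar/__init__.r'.
resolve_module = function (name, search_path = '.') {
    candidates = file.path(search_path, c(paste0(name, '.r'),
                                          file.path(name, '__init__.r')))
    found = candidates[file.exists(candidates)]
    if (length(found) == 0)
        stop('Unable to find module ', sQuote(name))
    found[1]
}
```

With this sketch, `resolve_module('foo/bar')` returns the path of whichever candidate file exists, and raises an error if neither does.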

`import` also allows selectively importing only some functions, so
that a user might write `import('foo', c('f', 'g'))` to only import
the functions `f` and `g`.

However, at the moment modules don’t allow the equivalent of Python’s
`from foo import bar` for nested modules. That is, if `foo` contains two
nested modules `bar` and `baz`, I cannot import both of them in one
`import` statement; I need two (`import('foo/bar');
import('foo/baz')`).

I would like feedback on what people think is the best way of solving
this. Here are some suggestions I’ve gathered; in the following,
`foo`, `bar`, `qux` are (sub)modules. `f1`, `b1`, `b2`, `q1` … are
functions within the modules whose name starts with the same letter:

(1) Use of Bash-like wildcards to specify which modules to import:

```
foo = import('foo')
# Exposes `foo$f1`, `foo$f2` …, but no submodules

bar = import('foo/bar')
# Exposes `bar$b1`, `bar$b2`

foo = import('foo/{bar,qux}')
# Exposes `foo$f1`, `foo$bar$b1`, `foo$bar$b2`, `foo$qux$q1` etc.

foo = import('foo/*')
# Exposes everything

# Specifying which functions to import:
foo = import('foo/{bar,qux}', c('bar$b1', 'qux$q1'))
# Exposes `foo$bar$b1`, `foo$qux$q1` but NOT `foo$f1`, `foo$bar$b2` etc.
```

This is straightforward, but I feel vaguely that it’s too stringly
typed [4]. A colleague dislikes this proposal because it treats nested
modules and functions unequally: as mentioned above, `import('foo',
'f')` will import only `f` from `foo`. His argument is that there
should be a uniform way of specifying which nested modules or
functions to import – somewhat analogously to Python’s mechanism,
where `from a import b` might import a submodule *or* an object `b`.
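
To make the wildcard idea in (1) concrete, here is a minimal sketch of how a single brace group might be expanded into individual module paths before loading. `expand_braces` is a hypothetical name, not part of the package, and the sketch assumes at most one non-nested `{...}` group and ignores `*` entirely.

```r
# Hypothetical helper: expand one Bash-style brace group in a module path,
# e.g. 'foo/{bar,qux}' becomes c('foo/bar', 'foo/qux').
# Assumption: at most one, non-nested {...} group; '*' is not handled here.
expand_braces = function (spec) {
    parts = regmatches(spec, regexec('^(.*)\\{([^{}]*)\\}(.*)$', spec))[[1]]
    if (length(parts) == 0)
        return(spec)  # No brace group: the path is used as-is
    alternatives = trimws(strsplit(parts[3], ',', fixed = TRUE)[[1]])
    paste0(parts[2], alternatives, parts[4])
}

expand_braces('foo/{bar,qux}')
```

Each resulting path could then be handed to the ordinary single-module lookup, which keeps the wildcard handling separate from the loading itself.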

(2) Treat submodules and functions uniformly, one per argument:

```
foo = import('foo')
# Exposes `foo$f1`, `foo$f2` …, but no submodules

bar = import('foo/bar')
# Exposes `bar$b1`, `bar$b2`

foo = import('foo/f1', 'foo/bar', 'foo/qux/q1')
# Exposes `foo$f1`, `foo$bar$b1`, `foo$bar$b2`, `foo$qux$q1`.
```

However, this has the disadvantage of cramming even more functionality
into the first argument and using stringly typed values for everything
instead of using “proper” function arguments.

(3) Drop the whole thing, force people to use a separate `import`
statement for every submodule (.NET does this for namespace imports,
but then, .NET’s namespaces don’t implement a module system):

```
foo = import('foo')
# Exposes `foo$f1`, `foo$f2` …, but no submodules

bar = import('foo/bar')
# Exposes `bar$b1`, `bar$b2`

foo = import('foo', 'f1')
# Exposes `foo$f1`

bar = import('foo/bar')
# Exposes `bar$b1`, `bar$b2` …
```

(4) Something else?

So this is my question: what do other people think? Which is the most
useful and least confusing alternative from the users’ perspective?

[1]: https://github.com/klmr/modules
[2]: https://github.com/klmr/modules/blob/master/README.md#feature-comparison
[3]: The original syntax for this was `import(foo)` and
`import(foo.bar)`, respectively, but Hadley convinced me to drop
non-standard argument evaluation. I’m still not convinced that NSE is
actually harmful here, but I’m likewise not convinced that it’s
beneficial (although I personally like it in this case).
[4]: http://c2.com/cgi/wiki?StringlyTyped

Kind regards,
Konrad

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: RFC: API design of package "modules"

barry rowlingson
On Mon, Apr 28, 2014 at 2:55 PM, Konrad Rudolph <
[hidden email]> wrote:

>
> So this is my question: what do other people think? Which is the most
> useful and least confusing alternative from the users’ perspective?
>

The most useful alternative is "write packages".

 The overhead is minimal (install devtools, create("foo"); repeat {
load_all("foo") ; edit; until_bugs==0} ). Reloading a package is a
one-liner, you can't get more minimal.

 And with that you get a structure for documentation, a metadata standard,
a wide range of sanity checks, and the option to push to CRAN or github for
distribution. What you don't get is hierarchies.

 Can we get a hierarchy into base packages? That's the real question, and
if answered I think it makes your module package redundant. I'd love to see
a hierarchy with a colon-separator or something, so if I have a package
with foo/R/thing.R and foo/R/this/thing.R I can do:

 require(foo)
 thing()
 this:thing()

or similar....

I do like your approach of returning an object that provides access to
the functions without side-effects, but the masses are so brainwashed into
thinking that require(foo) can put an unknown number of unknown-named
functions into your search list that I don't think it will ever get into
base R...

Note I did once write a simple file loader to avoid using source - it used
sys.source to load files into an environment on the search path, storing
the folder name so that it could be easily reloaded, but then devtools came
along...
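
A loader of the kind described above can be approximated in a few lines of base R. The sketch below is not the original loader: `load_file` is a hypothetical name, and it returns the environment instead of attaching it to the search path, so it illustrates the side-effect-free variant rather than the one described.

```r
# Hypothetical helper: evaluate a file in a fresh environment and return it,
# leaving the global environment and the search path untouched.
load_file = function (path) {
    env = new.env(parent = globalenv())
    sys.source(path, envir = env)
    env
}

# Usage with a throwaway file standing in for a real script:
path = tempfile(fileext = '.r')
writeLines('f1 = function () "hello from f1"', path)
foo = load_file(path)
foo$f1()
```

Calling `load_file` again on the same path gives a fresh environment, which is a cheap form of reloading.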

If you want your module package to succeed you are going to have to
duplicate all the good stuff in packages - documentation, metadata,
distribution (trivial: zip/unzip/pull/push), and then another problem -
people will grow out of it - they'll start writing C and Fortran code.
Going to support that? devtools already does.


Barry

Re: RFC: API design of package "modules"

Gabriel Becker
Just a quick note because (perhaps embarrassingly) I didn't know this for a
long time:

pkg::fun() will call function fun from the namespace of package pkg
*without loading it onto the search path*

> fastdigest::fastdigest("hi there")
[1] "6fed537931bd23b42d3046c4d80790a1"
> search()
[1] ".GlobalEnv"        "package:stats"     "package:graphics"
[4] "package:grDevices" "package:utils"     "package:datasets"
[7] "package:methods"   "Autoloads"         "package:base"
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] fastdigest_0.5-0


So packages, especially with a dependency hierarchy, can get users pretty
far towards what you're describing.

I do think the nested namespace stuff is pretty interesting though. Looks
like there could be some fun stuff there.

~G


--
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis


Re: RFC: API design of package "modules"

Jeroen Ooms
On Tue, Apr 29, 2014 at 6:37 AM, Gabriel Becker <[hidden email]> wrote:
>
> pkg::fun() will call function fun from the namespace of package pkg
> *without loading it onto the search path*

It is important to use conventional terminology here. The package (and
its dependencies) gets loaded but not *attached*. The `library` and
`require` functions load and attach a package in a single step. You
can also manually attach and detach environments to/from the search
path using the `attach` and `detach` functions.
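
The distinction can be checked directly in a fresh session. The snippet below is a small illustration using `tools`, chosen only because it ships with R and is not attached by default; any other installed package would do.

```r
# `tools` ships with R but is not attached in a default session.
tools::file_ext('report.txt')          # Works without library(tools)

'tools' %in% loadedNamespaces()        # TRUE: the namespace is loaded
'package:tools' %in% search()          # FALSE, unless attached earlier
```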

Re: RFC: API design of package "modules"

Gabriel Becker
On Tue, Apr 29, 2014 at 12:51 PM, Jeroen Ooms <[hidden email]> wrote:

> On Tue, Apr 29, 2014 at 6:37 AM, Gabriel Becker <[hidden email]>
> wrote:
> >
> > pkg::fun() will call function fun from the namespace of package pkg
> > *without loading it onto the search path*
>
> It is important to use conventional terminology here. The package (and
> its dependencies) gets loaded but not *attached*.


Well, yes, but AFAIK loaded but unattached namespaces don't do very much
that is going to affect the user running code. Symbols in such namespaces
are only reachable from within other namespaces, or via :: (or through some
get() based acrobatics, I suppose).
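
For illustration, these are the routes I have in mind, again using `tools::file_ext` only as a stand-in for any exported function of a loaded but unattached namespace:

```r
# Three ways to reach an exported function without attaching its package.
# `tools::file_ext` is used only because `tools` ships with R.
f1 = tools::file_ext
f2 = getExportedValue('tools', 'file_ext')
f3 = get('file_ext', envir = asNamespace('tools'))

# All three refer to the same function object.
identical(f1, f2) && identical(f1, f3)
```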


> The `library` and
> `require` functions load and attach a package in a single step. You
> can also manually attach and detach environments to/from the search
> path using the `attach` and `detach` functions.
>

You can, but it makes your code a nightmare from the maintenance,
readability, and reproducibility of results perspectives.

~G

--
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis
