valid package repositories

Federico Calboli-3
Hi All,

I have noticed that papers quite commonly mention ‘R libraries’ developed for the algorithms/models/code/whatever the paper describes, so that third parties will be able to use said method for themselves.  On further enquiry, these libraries turn out not to be available on CRAN, but need to be requested from the devs.

That in itself would not be a big issue, were it not for the fact that, most of the times I have been in this situation, the code is very specific to the developer's environment and does not actually work on any machine I try to run it on (something that is painfully true for code calling C/C++/Fortran).  A second pattern I seem to have noticed is that, despite said libraries being advertised for general use in a *published* paper, when I point out that the library is not formally published and does not work like a CRAN-published library would, I get a vague ‘the person who actually did the work left and nobody can maintain the code/fix stuff/finish the job’.

As a referee I am trying to weed out what I see as malpractice: the promise that third parties outside the developers might actually use the code because it has been packaged as an R library, a claim that seems to boost publishing chances.

Thus my question: when can I consider a library to be properly published and really publicly available?  CRAN and Bioconductor are clearly gold standards.  What about GitHub?  I am currently using the rule ‘not on CRAN == outright rejection’.  If GitHub is as good as CRAN I will include it on my list of ‘the code is available in a functional state as claimed’.

Finally, please note the scope of my query:  I am not looking at those cases where a colleague gives me half-finished code that might be useful but that I need to sort out.  I am looking at formal claims of the form ‘we have developed a method to do X and said method is available to the public as an R library’.  If that is the claim, I expect it to be true.

Best

F




--
Federico Calboli
LBEG - Laboratory of Biodiversity and Evolutionary Genomics
Charles Deberiotstraat 32 box 2439
3000 Leuven
+32 16 32 87 67





______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: valid package repositories

plangfelder
On Mon, Oct 2, 2017 at 7:47 AM, Federico Calboli
<[hidden email]> wrote:

>
> Thus my question: when can I consider a library to be properly published and really publicly available?  CRAN and BioConductor are clearly gold standards.  What about Github?  I am currently using the rule ‘not on CRAN == outright rejection’.  If Github is as good as CRAN I will include it on my list of ‘the code is available in a functional state as claimed’.

CRAN has certain rules that are necessary for CRAN to function but may
not be necessary for a package to be useful (e.g. size of data in a
non-data package, licensing, run time of examples etc). I would ask
two things from developers of a new package: 1. package is available
for download from somewhere public; 2. package passes R CMD check
without errors or warnings. Possibly also an explanation why they
cannot upload the package to CRAN or Bioconductor, but I would not
make the acceptance by CRAN or Bioconductor a condition for
publishing.

Just my humble opinion.

Peter
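
A minimal sketch of the two checks described in the message above, assuming a Unix-like shell with R on the PATH. The function names check_is_clean and build_and_check and the tarball name are illustrative only and not part of any tool. The sketch also assumes that R CMD check writes a final "Status:" line to <pkg>.Rcheck/00check.log, which is worth confirming on the R version in use.

```shell
# Succeed only if an R CMD check log reports neither ERRORs nor WARNINGs.
# Assumption: the log ends with a line such as "Status: OK" or
# "Status: 1 WARNING, 2 NOTEs".
check_is_clean() {
    log=$1
    # A log without a Status line means the check did not finish.
    grep -q '^Status: ' "$log" || return 1
    # Fail if the Status line mentions an ERROR or a WARNING.
    if grep -Eq '^Status: .*(ERROR|WARNING)' "$log"; then
        return 1
    fi
    return 0
}

# Build a source tarball from a package directory, check it, and apply
# the rule above. Arguments: package directory, expected tarball name
# (e.g. mypkg_0.1.0.tar.gz, a made-up example).
build_and_check() {
    pkgdir=$1
    tarball=$2
    R CMD build "$pkgdir" || return 1
    # The exit status of the check is ignored here; the log decides.
    R CMD check "$tarball"
    # R CMD check writes its results to <package name>.Rcheck
    pkg=$(basename "$tarball")
    pkg=${pkg%%_*}
    check_is_clean "$pkg.Rcheck/00check.log"
}
```

NOTEs are deliberately tolerated, since the criterion above is only "without errors or warnings".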


Re: valid package repositories

jdnewmil-2
In reply to this post by Federico Calboli-3
I tend to regard GitHub as a bit of a wild west... anyone can upload anything there, working or not. CRAN packages at least have to compile, so there is some additional verification in being there.

GitHub does have the advantage that you can easily download the code and run an example, if the authors have set up such scaffolding... which is better than "it ran once on that laptop that died". However, getting researchers to write examples or test cases beyond their mainline code takes a distinct extra level of sophistication, and nothing about GitHub requires that such features be present in uploaded code.

Re: valid package repositories

Berend Hasselman
In reply to this post by Federico Calboli-3

> On 2 Oct 2017, at 16:47, Federico Calboli <[hidden email]> wrote:
> .....

> As a referee I am trying to weed out what I see as malpractice: the promise that third parties outside the developers might actually use the code because it has been packaged as a R library, a claim that seems to boost publishing chances.
>
> Thus my question: when can I consider a library to be properly published and really publicly available?  CRAN and BioConductor are clearly gold standards.  What about Github?  I am currently using the rule ‘not on CRAN == outright rejection’.  If Github is as good as CRAN I will include it on my list of ‘the code is available in a functional state as claimed’.
>

As others have suggested:
I would insist that the code is presented as a valid R package which the maker has at least checked with R CMD check with no errors (preferably with the --as-cran option).

In addition, I would also insist that packages have been sent to win-builder and have passed all checks without error or warning.

Berend Hasselman
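
A small sketch of the stricter check mentioned above, assuming a source tarball has already been produced with R CMD build. The function name strict_check and the RBIN variable are hypothetical conveniences; RBIN only exists so that the command can be previewed without running R.

```shell
# Run the stricter, CRAN-like check on an already built source tarball.
# RBIN is a hypothetical override for the R executable; setting it to
# "echo" gives a dry run that only prints the arguments.
strict_check() {
    tarball=$1
    "${RBIN:-R}" CMD check --as-cran "$tarball"
}
```

A call would look like: strict_check mypkg_0.1.0.tar.gz (a made-up file name). Win-builder is a separate service: the same tarball is uploaded through https://win-builder.r-project.org/ and the results are mailed to the maintainer address given in the DESCRIPTION file.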


Re: valid package repositories

Henrik Bengtsson-5
In reply to this post by jdnewmil-2
Here's my view on this:

CRAN = Comprehensive R Archive Network.  The "Archive" part is very
important - it "promises" the research community that R packages that
have ever been published on CRAN, and all the versions of each
package, will be available also in the future.  It requires quite a
bit for a package/code to disappear from CRAN, e.g. a package contains
code/data that is not allowed to be shared (due to licenses and
copyrights).  Not even the original developer/maintainer can remove a
package that has already been released on CRAN.  What we see at times
is that a package is "archived" on CRAN (i.e. no longer available via
install.packages()), but the old package versions are still
distributed.  That CRAN protects us this way is extremely valuable to
the research community, open science, and reproducible research.
Bioconductor has a similar philosophy.

However convenient GitHub / GitLab / ... is for development etc, it
certainly does not provide scientific archiving - in that sense it is
no different than sharing packages on Dropbox, Google Drive, etc.

/Henrik



Re: valid package repositories

Jordan J
I would be on a similar wavelength to Peter: I believe it should be
more about the state of the package than its location.

Yes, the location matters to a degree but I think GitHub is more than
well enough established at this point to consider their hosting
sufficiently reliable.
The most important thing in my opinion is that it is a valid package
that passes a build/check.

There are a number of things that may make CRAN unsuitable compared to
GitHub, the most obvious of which is license issues.
The CRAN Repository Policy has a specific list of licenses and notes
that licenses outside that list are generally not accepted.
There are also cross-platform requirements in CRAN (at least two major
platforms), which make sense for the default central repository but
are by no means required for a package to have utility and be worthy
of publishing.
Finally, GitHub, when used properly, also provides a full history of
changes, rationales, etc. that CRAN doesn't provide, as it is not
that type of hosting service.

None of that is to say I think CRAN shouldn't be the default. Having
the package in the primary central repository should always be the
preferred option, but should not be the only option.
Personally, I would accept a GitHub library that passes a build/check
without much hesitation.
