Proposal to reduce check times by skipping GitHub pulls and issues URL checks


Hugh Parsonage
When a package is submitted to CRAN, part of the quality control
process is to ensure that any URLs in the package are valid. While
this requirement is sound, it can add considerably to check times
since each URL takes around a second to check.

There are around 70,000 URLs on CRAN that are checked currently, of
which around 12,000 have a GitHub domain (by far the most common
domain; the next most common accounts for fewer than 3,000). I propose the
QC process be slightly weakened to skip checks of URLs that point to a
pull request or issue of a repository, provided the repository URL
itself has been checked. This patch would skip around 5000 URLs.
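
To make the rule concrete, here is a minimal sketch in Python. It is an
illustration only, not the patch itself. The function name
`urls_to_check`, the regular expression, and the `owner/repo` URLs are
all hypothetical, and "has been checked" is approximated by the
repository URL being present in the same list of URLs.

```python
import re

# Hypothetical pattern: a GitHub repository URL followed by
# /issues/<number> or /pull/<number>. Group 1 is the repository URL.
ISSUE_OR_PULL = re.compile(
    r"^(https?://github\.com/[^/\s]+/[^/\s]+)/(?:issues|pull)/\d+/?$"
)


def urls_to_check(urls):
    """Return the URLs that still need a network check, in input order.

    An issue or pull request URL is dropped only when its repository URL
    also appears in `urls`, so the repository itself is still checked.
    """
    present = {u.rstrip("/") for u in urls}
    keep = []
    for url in urls:
        m = ISSUE_OR_PULL.match(url)
        if m and m.group(1) in present:
            continue  # the repository URL is checked; skip this one
        keep.append(url)
    return keep


if __name__ == "__main__":
    example = [
        "https://github.com/owner/repo",
        "https://github.com/owner/repo/issues/12",
        "https://github.com/owner/repo/pull/34",
        "https://github.com/other/pkg/issues/5",
    ]
    for url in urls_to_check(example):
        print(url)
```

In this sketch the issue URL under `other/pkg` is still checked, because
its repository URL does not appear in the list on its own.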

I claim that this would not actually weaken the quality control
process in practice. While this patch would skip invalid URLs like
<repo>/<package>/9999999999999, I think it is much
more likely that a URL would point to the wrong issue or pull request,
rather than one which does not exist. Since the current QC doesn't
check whether a valid link is the intended page, my proposal would not
be a real change in this regard.

The patch should not affect the QC of packages with no URLs at all.

This change was motivated by a recent somewhat regrettable change to
the data.table package. That particular package had over 500 such URLs
in its NEWS file, which took so long to check that they choked the R CMD check
process. As a result, the NEWS file was split, which avoided the
checks but makes it harder to navigate historical changes.


Hugh Parsonage.
Grattan Institute