improving the performance of install.packages

Joshua Bradley
Hello,

Currently if you install a package twice:

install.packages("testit")
install.packages("testit")

R will build the package from source (depending on what OS you're using)
twice by default. This becomes especially burdensome when people are using
big packages (i.e., ones with many dependencies) and someone has a script with:

install.packages("tidyverse")
...
... later on down the script
...
install.packages("dplyr")

In this case, "dplyr" is part of the tidyverse and will be installed twice. As
the primary "package manager" for R, install.packages should not install a
package twice (by default) when this can be checked so easily. Indeed, many
people resort to writing a few lines of code to filter out already-installed
packages. An r-help post from 2010 proposed improving the default
behavior by adding "force=FALSE" as an API addition to install.packages
(https://stat.ethz.ch/pipermail/r-help/2010-May/239492.html).

Would the R-core devs still consider this proposal?

Josh Bradley

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: improving the performance of install.packages

Duncan Murdoch-2
On 08/11/2019 2:06 a.m., Joshua Bradley wrote:

> [...]
> Would the R-core devs still consider this proposal?

Whether or not they'd do it, it's easy for you to do it.

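install.packages <- function(pkgs, ..., force = FALSE) {
   if (!force) {
     # Keep only the packages whose namespace cannot be loaded,
     # i.e. the ones that are not already installed.
     pkgs <- Filter(Negate(requireNamespace), pkgs)
   }

   utils::install.packages(pkgs, ...)
}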

You might want to make this more elaborate, e.g. doing update.packages()
on the ones that exist.  But really, isn't the problem with the script
you're using, which could have done a simple test before forcing a slow
install?
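
A "simple test" of the kind mentioned here might look like the sketch below. It compares the requested names against rownames(installed.packages()) instead of loading namespaces, and it skips the install call entirely when nothing is missing. The names missing_pkgs and install_if_missing are hypothetical helpers chosen for illustration; they are not part of R.

```r
# Hypothetical helper: which of the requested packages are absent
# from the library trees searched by installed.packages()?
missing_pkgs <- function(pkgs, lib.loc = NULL) {
  installed <- rownames(utils::installed.packages(lib.loc = lib.loc))
  setdiff(pkgs, installed)
}

# Hypothetical wrapper: install only what is missing, and do nothing
# at all when every requested package is already present.
install_if_missing <- function(pkgs, ...) {
  todo <- missing_pkgs(pkgs)
  if (length(todo) > 0L) {
    utils::install.packages(todo, ...)
  }
  invisible(todo)
}
```

A script would then call install_if_missing(c("tidyverse", "dplyr")) in place of the two separate install.packages() calls. Note that this checks presence only; it says nothing about whether the installed version is current.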

Duncan Murdoch


Re: improving the performance of install.packages

Joshua Bradley
I could do this...and I have before. This brings up a more fundamental
question though. You're asking me to write code that changes the logic of
the installation process (i.e., to write my own package installer). Instead
of doing that, I would rather integrate that logic into R itself to improve
the baseline installation process. The proposed API change would be
additive and would not break legacy code.

Package managers like pip (Python), conda (Python), yum (CentOS), apt
(Ubuntu), and apk (Alpine) are all "smart" enough to know (by their
defaults) when not to download a package again. By proposing this change,
I'm essentially asking that R follow some of the same conventions and best
practices that other package managers have adopted over the decades.

I assumed this list is used to discuss proposals like this to the R
codebase. If I'm on the wrong list, please let me know.

P.S. If this change happened, it would be interesting to study the effect
it has on the bandwidth across all CRAN mirrors. A significant drop would
turn into actual $$ saved.

Josh Bradley


On Fri, Nov 8, 2019 at 5:00 AM Duncan Murdoch <[hidden email]> wrote:

> [...]

Re: improving the performance of install.packages

R devel mailing list
While developing a package, I often run install.packages() on it many times
in a session without updating its version number.  How would your proposed
change affect this workflow?
Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Fri, Nov 8, 2019 at 11:56 AM Joshua Bradley <[hidden email]> wrote:

> [...]

Re: improving the performance of install.packages

Pages, Herve
In reply to this post by Joshua Bradley
Since we are on this topic, another area of improvement is when
install.packages() downloads hundreds of packages only to realize later
that many of them actually fail to install because one of the packages
they depend on (directly or indirectly) failed to install.

Cheers,
H.


On 11/8/19 11:55, Joshua Bradley wrote:

> [...]

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [hidden email]
Phone:  (206) 667-5791
Fax:    (206) 667-1319

Re: improving the performance of install.packages

Pages, Herve
In reply to this post by R devel mailing list
I guess you would just use force=TRUE.

H.

On 11/8/19 12:06, William Dunlap via R-devel wrote:

> [...]

Re: improving the performance of install.packages

Avraham Adler
In reply to this post by R devel mailing list
Exactly. After every major commit I want to check that the package works.

Also, besides package development, there are other reasons why one would
install packages over themselves. For example, rebuilding from source after
changing options in Makevars[.win]. The package hasn’t been updated but
recompilation is desired.

Avi

On Fri, Nov 8, 2019 at 3:07 PM William Dunlap via R-devel <[hidden email]> wrote:

> [...]

Re: improving the performance of install.packages

Gabriel Becker-2
In reply to this post by Joshua Bradley
Hi Josh,

There are a few issues I can think of with this. The primary one is that
CRAN(/Bioconductor) is not the only place one can install packages from. I
might have version x.y.z of a package installed that was, at the time, a
development version I got from GitHub, or installed locally, etc. Hell, I
might have a later devel version but want the CRAN version. Not common,
sure, but it will likely happen often enough that install.packages not doing
that for me when I tell it to is probably bad.

Currently (though there has been some discussion of changing this) packages
do not remember where they were installed from, so R wouldn't know whether the
version you have is actually the same one as on the repository you
pointed install.packages to. If that were changed and we knew that
we were getting the byte-identical package from the actual same source, I
think this would be a nice addition, though without it I think it would be
right a high, but not high enough, proportion of the time.

> R will build the package from source (depending on what OS you're using)
> twice by default. This becomes especially burdensome when people are using
> big packages (i.e. lots of depends) and someone has a script with:
>
> install.packages("tidyverse")
> ...
> ... later on down the script
> ...
> install.packages("dplyr")
>

I mean, IMHO and as I think Duncan was alluding to, that's straight up an
error by the script author. I think it's a few of them, actually, but it's at
least one. An understandable one, sure, but that's still what it is. Scripts
(which are meant to be run more than once, generally) usually shouldn't
really be calling install.packages in the first place, but if they do, they
should certainly not be installing umbrella packages and the packages they
bring with them separately.

Even having one vectorized call to install.packages where all the packages
are installed would prevent this issue, including in the case where the
user doesn't understand the purpose of the tidyverse package. Though the
installation would still occur every time the script was run.
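
The single vectorized call described above could be sketched as follows. This is only an illustration: install_once is a hypothetical helper name, and its install argument exists solely so the installer can be swapped out (for example in a dry run); by default it is utils::install.packages.

```r
# Hypothetical helper: gather every package a script needs and hand them
# to the installer in ONE call, so no package is requested twice.
install_once <- function(..., install = utils::install.packages) {
  pkgs <- unique(c(...))          # drop repeated names before installing
  if (length(pkgs) > 0L) {
    install(pkgs)                 # a single vectorized call
  }
  invisible(pkgs)
}
```

With this, install_once("tidyverse", "dplyr") issues one install.packages() call for both names instead of two separate calls. As noted above, it still reinstalls on every run of the script.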


The last thing to note is that there are at least 2 packages which provide
a function which does this already (install.load and remotes), so people
can get this functionality if they need it.


On Fri, Nov 8, 2019 at 11:56 AM Joshua Bradley <[hidden email]> wrote:

>
>
> I assumed this list is used to discuss proposals like this to the R
> codebase. If I'm on the wrong list, please let me know.
>

This is the right place to discuss things like this. Thanks for starting
the conversation.

Best,
~G


Re: improving the performance of install.packages

Duncan Murdoch-2
In reply to this post by Joshua Bradley
On 08/11/2019 2:55 p.m., Joshua Bradley wrote:
> I could do this...and I have before. This brings up a more fundamental
> question though. You're asking me to write code that changes the logic of
> the installation process (i.e. writing my own package installer). Instead
> of doing that, I would rather integrate that logic into R itself to improve
> the baseline installation process. This api proposal change would be
> additive and would not break legacy code.

That's not true.  The current behaviour is equivalent to force=TRUE; I
believe the proposal was to change the default to force=FALSE.

If you didn't change the default, it wouldn't help your example:  the
badly written script would run with force=TRUE, and wouldn't benefit at all.

Duncan Murdoch


Re: improving the performance of install.packages

R devel mailing list
Suppose update.packages("pkg") installed "pkg" if it were not already
installed, in addition to its current behavior of installing "pkg" if "pkg"
is installed but a newer version is available.  The OP could then use
update.packages() all the time instead of install.packages() the first time
and update.packages() subsequent times.
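
The behaviour suggested here can be approximated in user code. The sketch below is an assumption-laden illustration, not how update.packages() works: plan_install, version_table and install_or_update are hypothetical names, and the decision rule (install if absent, or if the repository offers a later version) is simply a reading of the paragraph above.

```r
# Decide which of `pkgs` need installing: absent ones, plus installed ones
# for which a later version is available.
# `installed` and `available` are named character vectors of version strings.
plan_install <- function(pkgs, installed, available) {
  needs_install <- vapply(pkgs, function(p) {
    if (!(p %in% names(installed))) return(TRUE)    # not installed yet
    if (!(p %in% names(available))) return(FALSE)   # nothing to update from
    utils::compareVersion(available[[p]], installed[[p]]) > 0
  }, logical(1))
  pkgs[needs_install]
}

# Turn the matrix returned by installed.packages() or available.packages()
# into a named vector of versions.
version_table <- function(m) stats::setNames(m[, "Version"], rownames(m))

# Hypothetical wrapper; available.packages() queries the configured
# repositories, so this function needs network access when called.
install_or_update <- function(pkgs, ...) {
  installed <- version_table(utils::installed.packages())
  available <- version_table(utils::available.packages())
  todo <- plan_install(pkgs, installed, available)
  if (length(todo) > 0L) utils::install.packages(todo, ...)
  invisible(todo)
}
```

Keeping the decision in plan_install, separate from the calls that touch the library and the network, makes the rule easy to check on made-up version tables.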

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Fri, Nov 8, 2019 at 2:51 PM Duncan Murdoch <[hidden email]>
wrote:

> On 08/11/2019 2:55 p.m., Joshua Bradley wrote:
> > I could do this...and I have before. This brings up a more fundamental
> > question though. You're asking me to write code that changes the logic of
> > the installation process (i.e. writing my own package installer). Instead
> > of doing that, I would rather integrate that logic into R itself to
> improve
> > the baseline installation process. This api proposal change would be
> > additive and would not break legacy code.
>
> That's not true.  The current behaviour is equivalent to force=TRUE; I
> believe the proposal was to change the default to force=FALSE.
>
> If you didn't change the default, it wouldn't help your example:  the
> badly written script would run with force=TRUE, and wouldn't benefit at
> all.
>
> Duncan Murdoch
>
> >
> > Package managers like pip (python), conda (python), yum (CentOS), apt
> > (Ubuntu), and apk (Alpine) are all "smart" enough to know (by their
> > defaults) when to not download a package again. By proposing this change,
> > I'm essentially asking that R follow some of the same conventions and
> best
> > practices that other package managers have adopted over the decades.
> >
> > I assumed this list is used to discuss proposals like this to the R
> > codebase. If I'm on the wrong list, please let me know.
> >
> > P.S. if this change happened, it would be interesting to study the effect
> > it has on the bandwidth across all CRAN mirrors. A significant drop would
> > turn into actual $$ saved
> >
> > Josh Bradley
> >
> >
> > On Fri, Nov 8, 2019 at 5:00 AM Duncan Murdoch <[hidden email]>
> > wrote:
> >
> >> On 08/11/2019 2:06 a.m., Joshua Bradley wrote:
> >>> Hello,
> >>>
> >>> Currently if you install a package twice:
> >>>
> >>> install.packages("testit")
> >>> install.packages("testit")
> >>>
> >>> R will build the package from source (depending on what OS you're using)
> >>> twice by default. This becomes especially burdensome when people are using
> >>> big packages (i.e. lots of depends) and someone has a script with:
> >>>
> >>> install.packages("tidyverse")
> >>> ...
> >>> ... later on down the script
> >>> ...
> >>> install.packages("dplyr")
> >>>
> >>> In this case, "dplyr" is part of the tidyverse and will install twice. As
> >>> the primary "package manager" for R, it should not install a package twice
> >>> (by default) when it can be so easily checked. Indeed, many people resort
> >>> to writing a few lines of code to filter out already-installed packages. An
> >>> r-help post from 2010 proposed a solution to improving the default
> >>> behavior, by adding "force=FALSE" as an API addition to install.packages
> >>> (https://stat.ethz.ch/pipermail/r-help/2010-May/239492.html).
> >>>
> >>> Would the R-core devs still consider this proposal?
> >>
> >> Whether or not they'd do it, it's easy for you to do it.
> >>
> >> install.packages <- function(pkgs, ..., force = FALSE) {
> >>     if (!force) {
> >>       pkgs <- Filter(Negate(requireNamespace), pkgs)
> >>     }
> >>     utils::install.packages(pkgs, ...)
> >> }
> >>
> >> You might want to make this more elaborate, e.g. doing update.packages()
> >> on the ones that exist.  But really, isn't the problem with the script
> >> you're using, which could have done a simple test before forcing a slow
> >> install?
> >>
> >> Duncan Murdoch
> >>

Re: improving the performance of install.packages

Pages, Herve
In reply to this post by Gabriel Becker-2
Hi Gabe,

Keeping track of where a package was installed from would be a nice
feature. However it wouldn't be as reliable as comparing hashes to
decide whether a package needs re-installation or not.
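
As a rough sketch of the hash comparison, assuming the caller still has the archive a package was installed from (R does not record this itself, as noted in the message quoted below): archive_changed() is a hypothetical helper, and both file paths are supplied by the caller. It relies only on tools::md5sum().

```r
# Hypothetical helper: TRUE when two package archives differ, judged by
# their MD5 checksums. Both paths are supplied by the caller.
archive_changed <- function(installed_from, candidate) {
  sums <- unname(tools::md5sum(c(installed_from, candidate)))
  if (anyNA(sums)) stop("could not read one of the archives")
  sums[1] != sums[2]
}

# Usage with two hypothetical local files:
# if (archive_changed("old/testit.tar.gz", "new/testit.tar.gz")) {
#   utils::install.packages("new/testit.tar.gz", repos = NULL, type = "source")
# }
```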

H.

On 11/8/19 12:37, Gabriel Becker wrote:

> Hi Josh,
>
> There are a few issues I can think of with this. The primary one is that
> CRAN(/Bioconductor) is not the only place one can install packages from. I
> might have version x.y.z of a package installed that was, at the time, a
> development version I got from github, or installed locally, etc. Hell I
> might have a later devel version but want the CRAN version. Not common,
> sure, but will likely happen often enough that install.packages not doing
> that for me when I tell it to is probably bad.
>
> Currently (though there has been some discussion of changing this) packages
> do not remember where they were installed from, so R wouldn't know if the
> version you have is actually fully the same one on the repository you
> pointed install.packages to or not.  If that were changed  and we knew that
> we were getting the byte identical package from the actual same source, I
> think this would be a nice addition, though without it I think it would be
> right a high but not high enough proportion of the time.
>
>> R will build the package from source (depending on what OS you're using)
>> twice by default. This becomes especially burdensome when people are using
>> big packages (i.e. lots of depends) and someone has a script with:
>>
>> install.packages("tidyverse")
>> ...
>> ... later on down the script
>> ...
>> install.packages("dplyr")
>
> I mean, IMHO and as I think Duncan was alluding to, that's straight up an
> error by the script author. I think it's a few of them, actually, but it's at
> least one. An understandable one, sure, but that's still what it is. Scripts
> (which are meant to be run more than once, generally) usually shouldn't
> really be calling install.packages in the first place, but if they do, they
> should certainly not be installing umbrella packages and the packages they
> bring with them separately.
>
> Even having one vectorized call to install.packages where all the packages
> are installed would prevent this issue, including in the case where the
> user doesn't understand the purpose of the tidyverse package. Though the
> installation would still occur every time the script was run.
>
>
> The last thing to note is that there are at least 2 packages which provide
> a function which does this already (install.load and remotes), so people
> can get this functionality if they need it.
>
>
> On Fri, Nov 8, 2019 at 11:56 AM Joshua Bradley <[hidden email]> wrote:
>
>>
>>
>> I assumed this list is used to discuss proposals like this to the R
>> codebase. If I'm on the wrong list, please let me know.
>>
>
> This is the right place to discuss things like this. Thanks for starting
> the conversation.
>
> Best,
> ~G
>

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [hidden email]
Phone:  (206) 667-5791
Fax:    (206) 667-1319

Re: improving the performance of install.packages

Duncan Murdoch-2
In reply to this post by R devel mailing list
On 08/11/2019 6:02 p.m., William Dunlap wrote:
> Suppose update.packages("pkg") installed "pkg" if it were not already
> installed, in addition to its current behavior of installing "pkg" if
> "pkg" is installed but a newer version is available.  The OP could then
> use update.packages() all the time instead of install.packages() the
> first time and update.packages() subsequent times.

That makes more sense to me than the "force = FALSE" proposal.

Duncan Murdoch


Re: improving the performance of install.packages

Henrik Bengtsson-5
In reply to this post by Pages, Herve
I believe introducing a backward compatible force=TRUE is a good
start, even if we're not ready for making force=FALSE the default at
this point.  It would help simplify quite-common instructions like:

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(...)

to

install.packages("BiocManager", force=FALSE)
BiocManager::install(...)

and more so when installing lots of packages conditionally, e.g.

if (!requireNamespace("foo", quietly = TRUE)) install.packages("foo")
if (!requireNamespace("bar", quietly = TRUE)) install.packages("bar")
...

to

install.packages(c("foo", "bar", ...), force = FALSE)

Before deciding on making force=FALSE the new default, I think it
would be valuable to play the devil's advocate and explore and
identify all possible downsides of such a default, e.g. breaking
existing instructions, downstream package code that uses
install.packages() internally, and so on.
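
To make the backward-compatible variant concrete, here is a minimal sketch under stated assumptions: install_packages() is a hypothetical wrapper, not the real utils::install.packages() signature; `force` is the argument proposed in this thread; `installed` and `installer` exist only so the logic can be exercised without touching a real library.

```r
# Hypothetical wrapper around utils::install.packages(). The default
# force = TRUE keeps today's behaviour; force = FALSE skips packages that
# are already installed.
install_packages <- function(pkgs, ..., force = TRUE,
                             installed = rownames(utils::installed.packages()),
                             installer = utils::install.packages) {
  if (!force) pkgs <- setdiff(pkgs, installed)  # drop what is already there
  if (length(pkgs)) installer(pkgs, ...)
  invisible(pkgs)                               # names actually passed on
}

# Usage (commented out because it would install packages):
# install_packages(c("foo", "bar"), force = FALSE)
```

Because the default stays at force = TRUE, existing calls behave as before, and only callers that opt in with force = FALSE skip installed packages.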

/Henrik

PS. Although the idea of having update.packages() install missing
packages is not bad, I'm not a fan of it, solely because it risks
installation instructions starting to use update.packages() instead,
which will certainly confuse those who don't know the history (think
require() vs library()).


Re: improving the performance of install.packages

Pages, Herve
In reply to this post by Pages, Herve
Actually there is one gotcha here: even if a package has not changed
(i.e. same exact hash), there are situations where you want to reinstall
it because one package it depends on has changed. This is because some
of the stuff that gets cached at installation time (e.g. method table)
can become stale and needs to be resynced.
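
A small sketch of how the affected packages could be identified, assuming a dependency table is already at hand as a named list (in practice it might be derived from tools::package_dependencies()). stale_after_change() is a hypothetical helper, and the package names in the usage example are made up.

```r
# Hypothetical helper: `deps` maps each package name to its direct
# dependencies. Returns every package that depends, directly or
# indirectly, on `changed`, i.e. the candidates for reinstallation.
stale_after_change <- function(changed, deps) {
  stale <- character()
  frontier <- changed
  while (length(frontier)) {
    # packages with at least one dependency in the current frontier
    hit <- names(deps)[vapply(deps, function(d) any(d %in% frontier), logical(1))]
    frontier <- setdiff(hit, c(stale, changed))  # keep only newly found ones
    stale <- c(stale, frontier)
  }
  stale
}

# Usage with a made-up dependency table:
deps <- list(infra = character(), pkgB = "infra", pkgC = "pkgB", other = character())
stale_after_change("infra", deps)
```

Here pkgB depends on infra directly and pkgC only through pkgB, so both are reported, while other is not.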

We sometimes have to deal with this kind of situation in Bioconductor
when we make changes to some infrastructure packages. To keep package
caches from getting out of sync on the user's machine after the user gets the
new version of the infrastructure package, we also bump the versions of
all the reverse deps for which the cache needs to be resynced. A side
effect of the version bumps is to also trigger build and propagation of
new Windows and Mac binaries for the reverse deps affected by the
change, which is good, because they also need to be rebuilt and
reinstalled. This is an ugly situation but luckily a rare one and it
generally happens in BioC devel only.

H.



Re: improving the performance of install.packages

Pages, Herve
In reply to this post by Henrik Bengtsson-5
Sounds like a very reasonable approach to me.

H.


Re: improving the performance of install.packages

Duncan Murdoch-2
In reply to this post by Henrik Bengtsson-5
On 08/11/2019 6:17 p.m., Henrik Bengtsson wrote:

> I believe introducing a backward compatible force=TRUE is a good
> start, even if we're not ready for making force=FALSE the default at
> this point.  It would help simplify quite-common instructions like
>
> if (!requireNamespace("BiocManager", quietly = TRUE))
>    install.packages("BiocManager")
> BiocManager::install(...)
>
> to
>
> install.packages("BiocManager", force=FALSE)
> BiocManager::install(...)

If simplifying instructions is the goal, it would be even simpler to
just install it unconditionally:

install.packages("BiocManager")

Unlike dplyr (the original example in this thread), BiocManager is a
tiny package with no compiling needed, so it hardly needs any time to
install.

And as previously mentioned, the backward compatible force=TRUE wouldn't
help with the bad script at all.  In fact, the bad script could be fixed
simply by realizing that

install.packages("tidyverse")

means it's actually a bad idea to also include

install.packages("dplyr")

because the former would install dplyr if and only if it was not already
installed.  So it seems to me that fixing the bad script (by deleting
one line) is the solution to the problem, not fixing R with a multistage
series of revisions, tests, etc.

Duncan Murdoch

>
> and more so when installing lots of packages conditionally, e.g.
>
> if (requireNamespace("foo")) install.packages("foo")
> if (requireNamespace("bar")) install.packages("bar")
> ...
>
> to
>
> install.packages(c("foo", "bar", ...), force = FALSE)
>
> Before deciding on making force=FALSE the new default, I think it
> would be valuable to play the devil's advocate and explore and
> identify all possible downsides of such a default, e.g. breaking
> existing instructions, downstream package code that uses
> install.packages() internally, and so on.
>
> /Henrik
>
> PS. Although the idea of having update.packages() install missing
> packages is not bad, I don't think I'm a not a fan for the sole
> purpose of risking installation instructions starting using
> update.packages() instead, which will certainly confuse those who
> don't know the history (think require() vs library()).
>


Re: improving the performance of install.packages

Joshua Bradley
Just to clarify the expected behavior I had in mind when proposing the
force argument.

force = TRUE would mean you "force" an install no matter what (this
matches the current behavior of the command).

force = FALSE would mean a package is installed only if it is not found
in the local R library on your system. If it is already installed, do
nothing and return as if a successful install occurred.
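
For illustration, the semantics described above could be sketched as a small wrapper. This is only a sketch: install.packages() has no force argument today, and the names install_packages_force and installer are hypothetical. It checks installed packages by name only, so it does not address the version or source concerns raised earlier in the thread.

```r
# Hypothetical helper: base R's install.packages() has no `force` argument.
# `installer` is injectable so the sketch can be tried without a network.
install_packages_force <- function(pkgs, force = TRUE,
                                   installer = utils::install.packages, ...) {
  if (!force) {
    # Names of every package found in the libraries on .libPaths()
    installed <- rownames(utils::installed.packages())
    # Keep only the packages that are not installed yet
    pkgs <- setdiff(pkgs, installed)
  }
  if (length(pkgs) > 0L) {
    installer(pkgs, ...)
  }
  # Return (invisibly) the packages that were passed to the installer
  invisible(pkgs)
}
```

With force = FALSE, a call such as install_packages_force(c("tidyverse", "dplyr"), force = FALSE) would hand the installer only the names that are missing from the library, and would do nothing at all if both are present.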



On Fri, Nov 8, 2019, 7:27 PM Duncan Murdoch <[hidden email]>
wrote:

> On 08/11/2019 6:17 p.m., Henrik Bengtsson wrote:
> > I believe introducing a backward compatible force=TRUE is a good
> > start, even if we're not ready for making force=FALSE the default at
> > this point.  It would help simplify quite-common instructions like
> >
> > if (!requireNamespace("BiocManager"))
> >    install.packages("BiocManager")
> > BiocManager::install(...)
> >
> > to
> >
> > install.packages("BiocManager", force=FALSE)
> > BiocManager::install(...)
>
> If simplifying instructions is the goal, it would be even simpler to
> just install it unconditionally:
>
> install.packages("BiocManager")
>
> Unlike dplyr (the original example in this thread), BiocManager is a
> tiny package with no compiling needed, so it hardly needs any time to
> install.
>
> And as previously mentioned, the backward compatible force=TRUE wouldn't
> help with the bad script at all.  In fact, the bad script could be fixed
> simply by realizing that
>
> install.packages("tidyverse")
>
> means it's actually a bad idea to also include
>
> install.packages("dplyr")
>
> because the former would install dplyr if and only if it was not already
> installed.  So it seems to me that fixing the bad script (by deleting
> one line) is the solution to the problem, not fixing R with a multistage
> series of revisions, tests, etc.
>
> Duncan Murdoch

Re: improving the performance of install.packages

hadley wickham
If this is the behaviour you are looking for, you might like to try
pak (https://pak.r-lib.org)

# Create a temporary library
path <- tempfile()
dir.create(path)
.libPaths(path)

pak::pkg_install("scales")
#> → Will install 8 packages:
#>   colorspace (1.4-1), labeling (0.3), munsell (0.5.0), R6 (2.4.0),
#>   RColorBrewer (1.1-2), Rcpp (1.0.2), scales (1.0.0), viridisLite (0.3.0)
#>
#> → Will download 2 CRAN packages (4.7 MB), cached: 6 (3.69 MB).
#>
#> ✔ Installed colorspace 1.4-1 [139ms]
#> ✔ Installed labeling 0.3 [206ms]
#> ✔ Installed munsell 0.5.0 [288ms]
#> ✔ Installed R6 2.4.0 [375ms]
#> ✔ Installed RColorBrewer 1.1-2 [423ms]
#> ✔ Installed Rcpp 1.0.2 [472ms]
#> ✔ Installed scales 1.0.0 [511ms]
#> ✔ Installed viridisLite 0.3.0 [569ms]
#> ✔ 1 + 7 pkgs | kept 0, updated 0, new 8 | downloaded 2 (4.7 MB) [2.8s]

pak::pkg_install("scales")
#> ✔ No changes needed
#> ✔ 1 + 7 pkgs | kept 7, updated 0, new 0 | downloaded 0 (0 B) [855ms]

remove.packages(c("Rcpp", "munsell"))
pak::pkg_install("scales")
#> → Will install 2 packages:
#>   munsell (0.5.0), Rcpp (1.0.2)
#>
#> → All 2 packages (4.88 MB) are cached.
#>
#> ✔ Installed munsell 0.5.0 [75ms]
#> ✔ Installed Rcpp 1.0.2 [242ms]
#> ✔ 1 + 7 pkgs | kept 6, updated 0, new 2 | downloaded 0 (0 B) [1.5s]




--
http://hadley.nz

Re: improving the performance of install.packages

Dirk Eddelbuettel

Joshua,

Doing this well "horizontally" (across different OSs, even if for just one
domain, like CRAN and R) is difficult.

We have decent "vertical" solutions (within one OS / distro) for (at least
some) use / deployment cases, as I show in a brief blog post and video here:

  http://dirk.eddelbuettel.com/blog/2019/06/09#022_rocker_and_ppas

  https://www.youtube.com/watch?v=qIjWirNma-8&t=19s

Installing either 'tidyverse' or 'rstan' reduces to a single 'apt-get
install' command invocation which installs everything needed in a minute or
two. In a vertical stack, we can control for other OS-specific dependencies
which is powerful.  But it doesn't span across OSs. Covering installations
both "horizontally" and "vertically" is hard.

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | [hidden email]
