Creating a vignette which depends on a non-distributable file

January Weiner-3
Dear all,

I am writing a vignette that requires a file which I am not allowed to
distribute, but which the user can easily download manually. Moreover, it
is not possible to download this file automatically from R: downloading
requires a (free) registration that seems to work only through a browser.
(I'm talking here about the MSigDB from the Broad Institute,
http://www.broadinstitute.org/gsea/msigdb/index.jsp).

In the vignette, I tell the user to download the file and then show how it
can be parsed and used in R. Thus, I can compile the vignette only if this
file is present in the vignettes/ directory of the package. However, it
would then get included in the package -- which I am not allowed to do.

What should I do?

(1) Finding an alternative to MSigDB is not a solution -- there simply is
no alternative.
(2) I could enter the code (and the results) in a verbatim environment
instead of using Sweave. This has obvious drawbacks (for one thing, it
would look inconsistent).
(3) I could build the vignette outside of the package and put it into the
inst/doc directory. This also has obvious drawbacks.
(4) Leaving this example out defeats the purpose of my package.

I am tending towards solution (2). What do you think?

Kind regards,

j.



--
-------- January Weiner --------------------------------------

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Creating a vignette which depends on a non-distributable file

Henrik Bengtsson-4
On May 14, 2015 15:04, "January Weiner" <[hidden email]> wrote:

> (2) I could enter the code (and the results) in a verbatim environment
> instead of using Sweave. This has obvious drawbacks (for one thing, it
> would look inconsistent).
>
> I am tending towards solution (2). What do you think?

It's not clear how big a static piece you're talking about, but maybe you
could set it up so that (2) is used as a fallback, i.e. have the vignette
include a static/pre-generated piece (clearly marked as such) only if the
external dependency is not available.
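
A minimal sketch of that fallback switch, in base R. The file name
"msigdb.xml" is only a placeholder for whatever the manually downloaded
file is called, and the variable names are made up for illustration:

```r
# Placeholder name for the manually downloaded file (an assumption, not the
# real MSigDB file name).
msigdb_file <- "msigdb.xml"

# TRUE only if the user has put the file next to the vignette source.
have_msigdb <- file.exists(msigdb_file)

# Decide which branch of the vignette to show.
vignette_mode <- if (have_msigdb) "live" else "static"

if (have_msigdb) {
  message("Found ", msigdb_file, ": running the example on the real file.")
} else {
  message("File not found: showing pre-generated output, marked as such.")
}
```

With knitr, chunk options are evaluated as R expressions, so a chunk could
carry eval=have_msigdb and its static counterpart eval=!have_msigdb.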

Just a thought

Henrik

Re: Creating a vignette which depends on a non-distributable file

Martin Morgan-2
On 05/14/2015 04:33 PM, Henrik Bengtsson wrote:

> On May 14, 2015 15:04, "January Weiner" <[hidden email]> wrote:
>> (2) I could enter the code (and the results) in a verbatim environment
>> instead of using Sweave. This has obvious drawbacks (for one thing, it
>> would look inconsistent).

Use the chunk option eval=FALSE instead of placing the code in a verbatim
environment. See ?RweaveLatex if you're compiling a PDF vignette from Rnw, or
the knitr documentation for Rmd vignettes processed to HTML (much nicer for
users of your vignette, in my opinion).

A common pattern is to process chunks 1, 2, 3, 4, and then there is a 'leap of
faith' in chunk 5 (with eval=FALSE) and a second chunk (maybe with echo=FALSE,
eval=TRUE) that reads the _result_ that would have been produced by chunk 5 from
a serialized instance into the R session for processing in chunks 6, 7, 8...
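
As a rough sketch of that pattern in base R (the parser name, the file names
and the toy gene sets below are invented for illustration; the real object
would come from parsing the downloaded file):

```r
# Chunk 5 is displayed but not run. In Sweave its header would be
#   <<chunk5, eval=FALSE>>=
# and its body something like (hypothetical parser and file name):
#   result <- parse_msigdb("msigdb.xml")

# --- Done once by the maintainer, where the restricted file is available ---
# Toy stand-in for the object that chunk 5 would have produced.
result <- list(SET_A = c("g1", "g2", "g3"), SET_B = c("g2", "g4"))
rds_path <- file.path(tempdir(), "chunk5_result.rds")  # hypothetical name
saveRDS(result, rds_path)

# --- Hidden chunk in the vignette (echo=FALSE, eval=TRUE) ---
# Reads the serialized result so that later chunks have something to work on.
result_loaded <- readRDS(rds_path)

# --- Chunks 6, 7, 8... continue as if chunk 5 had been evaluated ---
set_sizes <- vapply(result_loaded, length, integer(1))
print(set_sizes)
```

In a real package the .rds file would sit somewhere the vignette can find it
at build time, rather than in tempdir() as in this sketch.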

Also, while it often makes sense to analyse an entire data set as part of a
typical workflow, for illustrative purposes a much smaller subset or
simulated data may be enough; again, one strategy would be to illustrate the
problematic steps with simulated data, and then resume the narrative with the
analyzed full data.
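
For instance, simulated gene sets could stand in for the real collection in
the problematic steps. This is only a sketch: the set names, sizes and gene
identifiers are arbitrary and not modelled on MSigDB:

```r
# Fixed seed so the vignette output is reproducible between builds.
set.seed(42)

# Arbitrary universe of made-up gene identifiers.
gene_universe <- sprintf("GENE%03d", 1:200)

# Five simulated gene sets of 20 genes each, drawn without replacement.
sim_sets <- lapply(1:5, function(i) sample(gene_universe, 20))
names(sim_sets) <- sprintf("SIM_SET_%d", 1:5)

# A quick look at what was generated.
set_sizes <- vapply(sim_sets, length, integer(1))
print(set_sizes)
```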

A secondary consideration may be that if your package _requires_ MSigDB to
function, then it can't be automatically tested by repository build machines --
you'll want to have unit tests or other approaches to ensure that 'bit rot' does
not set in without you being aware of it.
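
One way to keep such checks from failing on build machines is to run them
only where the file is present. A base-R sketch, assuming the file location
is passed through an environment variable (MSIGDB_FILE is an invented name)
and using a deliberately trivial check:

```r
# Hypothetical environment variable pointing at the downloaded file;
# it is empty on machines that do not have the file.
msigdb_file <- Sys.getenv("MSIGDB_FILE", unset = "")

# Returns TRUE if the check ran, FALSE if it was skipped.
run_msigdb_check <- function(path) {
  if (!nzchar(path) || !file.exists(path)) {
    message("MSigDB file not available: skipping this check.")
    return(invisible(FALSE))
  }
  # Trivial placeholder check: the file can be read and is not empty.
  first_lines <- readLines(path, n = 5)
  stopifnot(length(first_lines) > 0)
  invisible(TRUE)
}

ran <- run_msigdb_check(msigdb_file)
```

The maintainer would set the variable locally, so the check runs there and
is skipped everywhere else.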

If this is a Bioconductor package, then it's appropriate to ask on the
Bioconductor devel mailing list.

   http://bioconductor.org/developers/

http://bioconductor.org/packages/BiocStyle/ might be your friend for producing
stylish vignettes.

Martin

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

Re: Creating a vignette which depends on a non-distributable file

January Weiner-3
Dear Martin,

thank you for the food for thought. My package does not depend on MSigDB
(it implements something better than MSigDB), but being able to work with
MSigDB (for comparative purposes) is important. Also, Bioconductor makes
sense only if you really want to take advantage of the Bioconductor
structures / tools, which I don't.

However, I find your suggestion of eval=FALSE and data subsets very good.
I will implement it, using hidden sections to simulate the output. Thanks!

Kind regards,

j.


On 15 May 2015 at 01:50, Martin Morgan <[hidden email]> wrote:

> Use the chunk option eval=FALSE instead of placing the code in a
> verbatim environment. See ?RweaveLatex if you're compiling a PDF vignette
> from Rnw, or the knitr documentation for Rmd vignettes processed to HTML
> (much nicer for users of your vignette, in my opinion).