Issue with data() function

Issue with data() function

R devel mailing list
I found an issue with the data() command this evening when working on the survival package.

1. I have a lot of data sets in the package, almost all used in at least one vignette,
help file, or test.  As a space-saving measure, I have bundled many of them together,
i.e., the file data/cancer.rda contains 19 data sets, many of them small. The resulting
file (using xz compression) is quite a bit smaller than the individual ones.  (I still get
a warning note about size from R CMD check, but I'm no longer 2x the limit.)

2. Consider the lung data set.  All of these fail:
    data(lung)
    data("lung")
    data(lung, package="survival")

  a. The lung.Rd file had \usage{data(lung)}; that error was not caught by R CMD check. 
(Several other .Rd files as well.)

  b. In broader examples for teaching, I sometimes load data from other packages, e.g.,
data(aidssi, package="mstate").  But this does not work for survival.  (The larger
survival data sets that are in separate .rda files can be found.)

  c. What does work is survival::lung.  Might it be useful to add a comment to data.Rd to
this effect?


3. Creating a separate package 'survivaldata' is of course one route, and is suggested in
the "Writing R Extensions" guide.  But this is not possible since survival is a
recommended package: it can't load any non-recommended package for its tests or
vignettes.  Longer term, perhaps there is a way around this constraint?
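
Item 2c above notes that survival::lung works even when data(lung) does not. A minimal sketch of a lookup that goes by object name first and only then falls back to data() might look like the following. The helper fetch_dataset() is hypothetical (it is not part of R or of survival), and the built-in datasets package stands in for survival so that the example runs anywhere.

```r
# Hypothetical helper (not part of R or survival): fetch one data set by
# object name, regardless of which file under data/ it was saved in.
fetch_dataset <- function(name, package) {
  # First try the route that pkg::name uses; it covers lazy-loaded data sets.
  value <- tryCatch(getExportedValue(package, name), error = function(e) NULL)
  if (!is.null(value)) return(value)

  # Fall back to data(), which matches on the file name under data/.
  env <- new.env()
  suppressWarnings(utils::data(list = name, package = package, envir = env))
  if (!exists(name, envir = env, inherits = FALSE)) {
    stop("data set '", name, "' not found in package '", package, "'")
  }
  get(name, envir = env, inherits = FALSE)
}

# The datasets package is used here only because it ships with every R install.
cars_df <- fetch_dataset("cars", "datasets")
str(cars_df)
```

Whether the first branch succeeds for a given package depends on how that package stores its data, so this is a sketch under that assumption, not a general guarantee.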

Terry T.

--
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
[hidden email]

"TERR-ree THUR-noh"



______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Issue with data() function

Duncan Murdoch-2
On 23/10/2020 9:25 p.m., Therneau, Terry M., Ph.D. via R-devel wrote:

> I found an issue with the data() command this evening when working on the survival package.
>
> 1. I have a lot of data sets in the package, almost all used in at least one vignette,
> help file, or test.  As a space saving measure, I have bundled many of them together,
> i.e., the file data/cancer.rda contains 19 data sets, many of them small. The resulting
> file (using xz compression) is quite a bit smaller than the individual ones.  (I still get
> a warning note about size from R CMD check, but I'm no longer 2x the limit.)
>
> 2. Consider the lung data set.  All of these fail:
>      data(lung)
>      data("lung")
>      data(lung, package="survival")
>
>    a. The lung.Rd file had \usage{data(lung)}; that error was not caught by R CMD check.
> (Several other .Rd files as well.)
>
>    b. In broader examples for teaching, I sometimes load data from other packages, e.g.,
> data(aidssi, package="mstate").  But this does not work for survival.  (The larger
> survival data sets that are in separate .rda files can be found.)
>
>    c. What does work is survival::lung.  Might it be useful to add a comment to data.Rd to
> this effect?

You don't describe how this dataset is being included in your package.
Have you moved it from data/lung.rda to data/cancer.rda?  Currently (in
survival 3.2-7) each of these works for me:

  library(survival); data(lung)

  library(survival); data("lung")

  # Without library(survival):
  data(lung, package="survival")

I think if the lung dataset is now being included in cancer.rda, you'd need

   data(cancer, package="survival")

or equivalent to load it (and the rest of the datasets there).
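
The behaviour described here, where data() finds a file by its name and not by the objects stored inside it, can be reproduced without any package. The sketch below assumes a throwaway directory with a data/ subdirectory; the names toy_bundle, small_a, and small_b are made up for illustration. It relies on data() also searching ./data of the working directory when no package is given.

```r
# Throwaway directory laid out like a package source tree: <dir>/data/*.rda.
# All names here (toy_bundle, small_a, small_b) are invented for this sketch.
proj <- tempfile("data-lookup-")
dir.create(file.path(proj, "data"), recursive = TRUE)

small_a <- data.frame(x = 1:3)
small_b <- data.frame(y = letters[1:3])
save(small_a, small_b, file = file.path(proj, "data", "toy_bundle.rda"))

# With no 'package' argument, data() also looks in ./data of the working directory.
old_wd <- setwd(proj)

by_file <- new.env()
utils::data(toy_bundle, envir = by_file)   # matches the file name toy_bundle.rda
loaded <- ls(by_file)                      # both objects arrive together

by_object <- new.env()
msg <- tryCatch(
  {
    utils::data(small_a, envir = by_object)  # no file named small_a.* exists
    NA_character_
  },
  warning = function(w) conditionMessage(w)
)

setwd(old_wd)
print(loaded)
print(msg)
```

Asking for the bundle name loads every object in the file, while asking for an object name that has no matching file produces a warning and loads nothing.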

>
>
> 3. Creating a separate package 'survivaldata' is of course one route, and is suggested in
> the "Writing R Extensions" guide.  But this is not possible since survival is a
> recommended package: it can't load any non-recommended package for its tests or
> vignettes.  Longer term, perhaps there is a way around this constraint?

Maybe the solution is to put your datasets into the "datasets" package,
or make "survivaldata" a recommended package, or just leave things as
they are and ignore the warnings about package size.  I think that's a
negotiation you should have with R Core.

Duncan Murdoch


Re: Issue with data() function

Dirk Eddelbuettel

On 24 October 2020 at 05:28, Duncan Murdoch wrote:
| they are and ignore the warnings about package size.  I think that's a
| negotiation you should have with R Core.

s/R Core/CRAN/  ?

Dirk

--
https://dirk.eddelbuettel.com | @eddelbuettel | [hidden email]


Re: Issue with data() function

Duncan Murdoch-2
On 24/10/2020 2:00 p.m., Dirk Eddelbuettel wrote:
>
> On 24 October 2020 at 05:28, Duncan Murdoch wrote:
> | they are and ignore the warnings about package size.  I think that's a
> | negotiation you should have with R Core.
>
> s/R Core/CRAN/  ?

Yes, for that part.  The other suggestions need R Core agreement.

Duncan Murdoch


Re: Issue with data() function

R devel mailing list
Duncan and others:  I was not being careful with my description.  This concerned tests of
version 3.2-8, not yet on CRAN, in which I was trying some size-limiting measures.   My
apologies for not making this clear.

   - I feel mild pressure to make the survival package smaller, per CRAN guidelines, and
shrinking the data appears to be one way to approach that.  So the real point of the query
is my attempt to do so.   (I am much more resistant to shrinking the extensive test suite
or the vignettes.)
   - The survival package has a lot of small data sets, and bundling them into a
single .rda file does save space, but it causes some issues with data().   The overall
tarball goes from 7480 to 6100 blocks (as reported by ls -s).
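
The bundling described above can be tried on a small scale. The sketch below is only an illustration: three built-in data frames stand in for the survival data sets, and the file names are invented. It saves them once as separate xz-compressed files and once as a single bundle, then prints both totals so the sizes can be compared on the machine at hand.

```r
# Three small built-in data frames stand in for a package's data sets.
d1 <- head(mtcars, 10)
d2 <- head(iris, 10)
d3 <- head(airquality, 10)

out_dir <- tempfile("rda-demo-")
dir.create(out_dir)

# One xz-compressed file per data set.
single_files <- file.path(out_dir, c("d1.rda", "d2.rda", "d3.rda"))
save(d1, file = single_files[1], compress = "xz")
save(d2, file = single_files[2], compress = "xz")
save(d3, file = single_files[3], compress = "xz")

# All three data sets in one xz-compressed bundle.
bundle_file <- file.path(out_dir, "bundle.rda")
save(d1, d2, d3, file = bundle_file, compress = "xz")

size_separate <- sum(file.size(single_files))
size_bundle <- file.size(bundle_file)
cat("separate files:", size_separate, "bytes\n")
cat("one bundle:    ", size_bundle, "bytes\n")

# load() returns the names of the objects it restored.
bundle_contents <- load(bundle_file, envir = new.env())
print(bundle_contents)
```

How much is saved will depend on the number and size of the data sets, so the figures printed here say nothing about the survival tarball itself.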

   Terry

