Citation of R packages

Citation of R packages

John Maindonald
The bibtex citations provided by citation() do not
work all that well in cases where there is no printed
document to reference:
(1) A version field is needed, as the note field is
required for other purposes: I am currently using it to
sort out nuances that cannot be sorted out in the
author list (author, compiler, implementor of R version,
contributor, ...) and maybe to give a cross-reference
to a book or paper that is somehow relevant.
(2) Maybe the author field should be more nuanced, or
maybe ...
(3) In compiling a list of packages, name order seems
preferable, and one wants the title first (achieved by
relocating the format.title field in the manual FUNCTION
in the .bst file).
(4) "manual" does not seem an ideal name for the class
if there is no manual.

Maybe what is needed is a package or suchlike class,
and several alternative .bst files that handle the needed
listings.

I know at least one other person who is wrestling with
this, and others on this list must be wrestling with it.

John Maindonald             email: [hidden email]
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Mathematical Sciences Institute, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Citation of R packages

Friedrich.Leisch
>>>>> On Mon, 30 Jan 2006 10:06:52 +1100 (EST),
>>>>> John Maindonald (JM) wrote:

  > The bibtex citations provided by citation() do not
  > work all that well in cases where there is no printed
  > document to reference:

That's why there is a warning at the end that they will need manual
editing ... IMHO they at least save you some typing effort in many
cases.

  > (1) A version field is needed, as the note field is
  > required for other purposes: I am currently using it to
  > sort out nuances that cannot be sorted out in the
  > author list (author, compiler, implementor of R version,
  > contributor, ...) and maybe to give a cross-reference
  > to a book or paper that is somehow relevant.

Why should a reference cross-reference another reference? Could you
give an example?

  > (2) Maybe the author field should be more nuanced, or
  > maybe ...

Author fields of bibtex entries have a strict format (names separated
by "and"); what do you mean by "more nuanced"?
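To make the "names separated by and" format concrete, here is a minimal sketch. It assumes a current version of R, not the 2006 one discussed here, in which person() takes given and family names and format() can print just those parts (both from the utils package). The two names are the leaps authors mentioned later in this thread.

```r
# Assumes a recent R: person() and its format() method live in utils,
# which is attached by default.
authors <- c(person("Thomas", "Lumley"), person("Alan", "Miller"))

# Print each person as "Given Family", then join the names with " and ",
# the separator BibTeX expects between names in an author field.
author_field <- paste(format(authors, include = c("given", "family")),
                      collapse = " and ")
cat("author = {", author_field, "}\n", sep = "")
```

Anything that is not a name (for example "using Fortran code by") has no place in this field, so it has to go somewhere else, such as the note field.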

  > (3) In compiling a list of packages, name order seems
  > preferable, and one wants the title first (achieved by
  > relocating the format.title field in the manual FUNCTION
  > in the .bst file).
  > (4) "manual" does not seem an ideal name for the class
  > if there is no manual.

A package always has a "reference manual": the concatenated help pages
certainly qualify as such and can be downloaded in PDF format from
CRAN. The ISBN rules even allow one to assign an ISBN to the online
help of a software package, which can then also serve as the ISBN of
the *software itself* (which we did for base R).

  > Maybe what is needed is a package or suchlike class,
  > and several alternative .bst files that handle the needed
  > listings.

  > I know at least one other person who is wrestling with
  > this, and others on this list must be wrestling with it.

I am certainly open to discussion and any suggestions for
improvements, but they must be within the standard bibtex entry types;
we cannot write our own entry types and .bst files. Many journals
require the use of their own (or standard) bibtex styles, and the
entries we produce must work with those. If R creates nonstandard
bibtex entries, even more manual work will be necessary in many
cases.

I have no definitive bibtex reference at hand, but the natbib style
files (a very popular collection of bibtex styles, and one I
definitely want to be compatible with) define

 article
 book
 booklet
 conference  (= alias for inproceedings)
 inbook
 incollection
 inproceedings
 manual
 mastersthesis
 misc
 phdthesis
 proceedings
 techreport
 unpublished

which coincide with the choices the Emacs bibtex mode offers. Of
these, only "manual", "misc" and "unpublished" seem appropriate for
packages, and the description suggests using manual for citing
software manuals, but the definitions of those three are very similar
anyway.

Maybe you could give an example of what your candidate bibtex entry
for packages should look like?

Best,
Fritz

--
-------------------------------------------------------------------
                        Friedrich Leisch
Institut für Statistik                     Tel: (+43 1) 58801 10715
Technische Universität Wien                Fax: (+43 1) 58801 10798
Wiedner Hauptstraße 8-10/1071
A-1040 Wien, Austria             http://www.ci.tuwien.ac.at/~leisch

Re: Citation of R packages

John Maindonald
On 5 Feb 2006, at 2:27 AM, [hidden email] wrote:

>>>>>> On Mon, 30 Jan 2006 10:06:52 +1100 (EST),
>>>>>> John Maindonald (JM) wrote:
>
>> The bibtex citations provided by citation() do not
>> work all that well in cases where there is no printed
>> document to reference:
>
> That's why there is a warning at the end that they will need manual
> editing ... IMHO they at least save you some typing effort in many
> cases.

They are certainly a useful start.

>> (1) A version field is needed, as the note field is
>> required for other purposes: I am currently using it to
>> sort out nuances that cannot be sorted out in the
>> author list (author, compiler, implementor of R version,
>> contributor, ...) and maybe to give a cross-reference
>> to a book or paper that is somehow relevant.
>
> Why should a reference cross-reference another reference? Could you
> give an example?

Where there is a published paper or a book (such as MASS), or a manual
for which a url can be given, my decision was to include that in the
main list of references, but not to include references there that were
references to the package itself, which as you suggest below can be a
reference to the concatenated help pages.

It seemed anyway useful to have a separate list of packages.  For
consistency, these were always references to the package, with a
cross-reference to any relevant document in the references to papers.

>> (2) Maybe the author field should be more nuanced, or
>> maybe ...
>
> Author fields of bibtex entries have a strict format (names separated
> by "and"); what do you mean by "more nuanced"?

Those named in the list of authors may be any combination of: the
authors of an R package, the authors of an original S version, the
person or persons responsible for an R port, the authors of the Fortran
code, compiler(s), and contributors of ideas.

For John Fox's car, citation() gives the following:
     author = {John Fox. I am grateful to Douglas Bates and David  
Firth and Michael Friendly and Gregor Gorjanc and Georges Monette and  
Henric Nilsson and Brian Ripley and Sanford Weisberg and and Achim  
Zeleis for various suggestions and contributions.},

For Rcmdr:
     author = {John Fox and with contributions from Michael Ash and  
Philippe Grosjean and Martin Maechler and Dan Putler and and Peter  
Wolf.},

For car, maybe John Fox should be identified as author.  For Rcmdr,  
maybe the other persons that are named should be added?

For leaps:
     author = {Thomas Lumley using Fortran code by Alan Miller},

It seems reasonable to cite Lumley and Miller as authors.  Should  
there be a note that identifies Miller as the contributor of the  
Fortran code?

Should the name(s) of porters (usually from S) be included as
author(s)? Or should their contribution be acknowledged in the note
field? Or ...
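One way to record who did what, sketched for the leaps example under loudly stated assumptions: later versions of R let person() carry a role (such as "aut" for author or "ctb" for contributor) and a free-text comment. The policy below, putting everyone with role "aut" into the author field and turning comments into the note field, is a hypothetical convention invented for this sketch; it is not something citation() or BibTeX does by itself.

```r
# Assumes a recent R with role-aware person() in utils.
# Hypothetical convention: role "aut" -> author field; comments -> note field.
people <- c(
  person("Thomas", "Lumley", role = "aut"),
  person("Alan", "Miller", role = "aut", comment = "Fortran code")
)

# people[i] is a one-person object, so $role and $comment give plain vectors.
is_author <- vapply(seq_along(people),
                    function(i) "aut" %in% people[i]$role, logical(1))
notes <- vapply(seq_along(people), function(i) {
  cm <- people[i]$comment
  if (is.null(cm)) "" else
    paste(cm, "by", format(people[i], include = c("given", "family")))
}, character(1))

author_field <- paste(format(people[is_author], include = c("given", "family")),
                      collapse = " and ")
note_field <- paste(notes[nzchar(notes)], collapse = "; ")
cat("author = {", author_field, "}\nnote = {", note_field, "}\n", sep = "")
```

Whether Miller belongs under "aut" or "ctb" is exactly the judgment being discussed here; the code only records whichever choice the package author makes.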

Possibilities are to cite all those individuals as authors, or to cite
John Fox only, in either case with no additional information in the
note field or with the note field used to explain who did what. The
citation() function leaves it unclear who is to be acknowledged as an
author, and in fact

>> (3) In compiling a list of packages, name order seems
>> preferable, and one wants the title first (achieved by
>> relocating the format.title field in the manual FUNCTION
>> in the .bst file).
>> (4) "manual" does not seem an ideal name for the class
>> if there is no manual.
>
> A package always has a "reference manual": the concatenated help pages
> certainly qualify as such and can be downloaded in PDF format from
> CRAN. The ISBN rules even allow one to assign an ISBN to the online
> help of a software package, which can then also serve as the ISBN of
> the *software itself* (which we did for base R).

I'd prefer some consistency in the way that R packages are referenced.
Thus, if the reference for one package is to the concatenated help pages,
do it that way for all of them.

>> Maybe what is needed is a package or suchlike class,
>> and several alternative .bst files that handle the needed
>> listings.
>
>> I know at least one other person who is wrestling with
>> this, and others on this list must be wrestling with it.
>
> I am certainly open to discussion and any suggestions for
> improvements, but they must be within the standard bibtex entry types;
> we cannot write our own entry types and .bst files. Many journals
> require the use of their own (or standard) bibtex styles, and the
> entries we produce must work with those. If R creates nonstandard
> bibtex entries, even more manual work will be necessary in many
> cases.
>
> I have no definitive bibtex reference at hand, but the natbib style
> files (a very popular collection of bibtex styles, and one I
> definitely want to be compatible with) define
>
>  article
>  book
>  booklet
>  conference  (= alias for inproceedings)
>  inbook
>  incollection
>  inproceedings
>  manual
>  mastersthesis
>  misc
>  phdthesis
>  proceedings
>  techreport
>  unpublished
>
> which coincide with the choices the Emacs bibtex mode offers. Of
> these, only "manual", "misc" and "unpublished" seem appropriate for
> packages, and the description suggests using manual for citing
> software manuals, but the definitions of those three are very similar
> anyway.
>
> Maybe you could give an example of what your candidate bibtex entry
> for packages should look like?

It will depend on context. The requirement for a paper will be
different from that for a book.

Here's what I've done for boot:

@Manual{boot-package,
     title = {boot: Bootstrap R (S-Plus) Functions (Canty)},
     author = "{\noopsort{boot}}{Canty, A. and Ripley, B.}",
     key  = {boot},
     year = {2005},
     note = {(Version 1.2-24). S original by A.~Canty; R port by B.~Ripley.
             See further \citet{Canty}}
   }

Maybe I should either omit the version number or include it along
with the title.
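A third option for the version number is to give it a field of its own. The sketch below assumes a recent R, where bibentry() and role-aware person() are in the utils package (neither existed in this form in 2006), and rebuilds the boot entry above from its stated details. BibTeX ignores any field that a style does not use, so an extra field does not break standard .bst files; those styles simply do not print it. The field name "version" is an assumption of this sketch and is not part of the standard BibTeX entry types. Role "trl" (translator) is the code R's person() documentation gives for code translated to R from another language, typically S, which fits an R port.

```r
# Assumes a recent R: bibentry() and person() with roles are in utils.
boot_ref <- bibentry(
  bibtype = "Manual",
  key     = "boot-package",
  title   = "boot: Bootstrap R (S-Plus) Functions (Canty)",
  author  = c(person("A.", "Canty", role = "aut"),   # S original
              person("B.", "Ripley", role = "trl")), # R port
  year    = "2005",
  version = "1.2-24",  # nonstandard field: styles that do not know it ignore it
  note    = "S original by A. Canty; R port by B. Ripley"
)

# toBibtex() writes out every field, including the nonstandard one.
bib <- toBibtex(boot_ref)
writeLines(bib)
```

The note field is then free for the who-did-what explanation, and a style that does know a version field can print it without any change to the entry.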

John


Re: Citation of R packages

Friedrich.Leisch
>>>>> On Fri, 10 Feb 2006 21:01:44 +1100,
>>>>> John Maindonald (JM) wrote:

[...]

  > Where there is a published paper or a book (such as MASS), or a
  > manual for which a url can be given, my decision was to include
  > that in the main list of references, but not to include references
  > there that were references to the package itself, which as you
  > suggest below can be a reference to the concatenated help pages.

The CITATION file of a package may contain as many entries as the
author wants, including both a reference to the help pages and one to
the book (or whatever).


  > It seemed anyway useful to have a separate list of packages.  For
  > consistency, these were always references to the package, with a
  > cross-reference to any relevant document in the references to papers.

  >>> (2) Maybe the author field should be more nuanced, or
  >>> maybe ...
  >>
  >> Author fields of bibtex entries have a strict format (names separated
  >> by "and"); what do you mean by "more nuanced"?

  > Those named in the list of authors may be any combination of: the
  > authors of an R package, the authors of an original S version, the
  > person or persons responsible for an R port, the authors of the
  > Fortran code, compiler(s), and contributors of ideas.

  > For John Fox's car, citation() gives the following:
  >      author = {John Fox. I am grateful to Douglas Bates and David  
  > Firth and Michael Friendly and Gregor Gorjanc and Georges Monette and  
  > Henric Nilsson and Brian Ripley and Sanford Weisberg and and Achim  
  > Zeleis for various suggestions and contributions.},

  > For Rcmdr:
  >      author = {John Fox and with contributions from Michael Ash and  
  > Philippe Grosjean and Martin Maechler and Dan Putler and and Peter  
  > Wolf.},

  > For car, maybe John Fox should be identified as author.  For Rcmdr,  
  > maybe the other persons that are named should be added?

  > For leaps:
  >      author = {Thomas Lumley using Fortran code by Alan Miller},

  > It seems reasonable to cite Lumley and Miller as authors.  Should  
  > there be a note that identifies Miller as the contributor of the  
  > Fortran code?

  > Should the name(s) of porters (usually from S) be included as
  > author(s)? Or should their contribution be acknowledged in the note
  > field? Or ...

  > Possibilities are to cite all those individuals as authors, or to cite
  > John Fox only, in either case with no additional information in the
  > note field or with the note field used to explain who did what. The
  > citation() function leaves it unclear who is to be acknowledged as an
  > author, and in fact


Umm, the problem there is not the citation() function, but that the
authors of all those packages obviously have not included in their
packages a CITATION file, which would override the default (extracted
from the DESCRIPTION file).

E.g., package flexclust has DESCRIPTION

Package: flexclust
Version: 0.8-1
Date: 2006-01-11
Author: Friedrich Leisch, parts based on code by Evgenia Dimitriadou

but

****
R> citation("flexclust")

To cite package flexclust in publications use:

  Friedrich Leisch. A Toolbox for K-Centroids Cluster Analysis.
  Computational Statistics and Data Analysis, 2006. Accepted for
  publication.

A BibTeX entry for LaTeX users is

  @Article{,
    author = {Friedrich Leisch},
    title = {A Toolbox for K-Centroids Cluster Analysis},
    journal = {Computational Statistics and Data Analysis},
    year = {2006},
    note = {Accepted for publication},
  }
****

because the CITATION file overrides the DESCRIPTION file. Writing a
CITATION file is of course also intended for those cases where a
proper reference cannot be auto-generated from the DESCRIPTION file.
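As a sketch only: in later versions of R a CITATION file holds calls to bibentry() (2006-era files used citEntry() instead), and, as noted earlier, it may contain several entries. The code below rebuilds the flexclust article entry shown above and adds a second, hypothetical Manual entry for the help pages. That second entry's title is a placeholder, not flexclust's real CITATION or DESCRIPTION content.

```r
# Assumes a recent R, where bibentry() and person() are in utils.
article <- bibentry(
  bibtype = "Article",
  title   = "A Toolbox for K-Centroids Cluster Analysis",
  author  = person("Friedrich", "Leisch"),
  journal = "Computational Statistics and Data Analysis",
  year    = "2006",
  note    = "Accepted for publication"
)

# Hypothetical second entry citing the help pages; the title is a
# placeholder, not taken from the real flexclust package.
manual <- bibentry(
  bibtype = "Manual",
  title   = "flexclust: Placeholder Package Title",
  author  = person("Friedrich", "Leisch"),
  year    = "2006",
  note    = "R package version 0.8-1"
)

# A CITATION file may list any number of entries; c() combines them.
refs <- c(article, manual)
writeLines(toBibtex(refs))
```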


  >>> (3) In compiling a list of packages, name order seems
  >>> preferable, and one wants the title first (achieved by
  >>> relocating the format.title field in the manual FUNCTION
  >>> in the .bst file).
  >>> (4) "manual" does not seem an ideal name for the class
  >>> if there is no manual.
  >>
  >> A package always has a "reference manual": the concatenated help pages
  >> certainly qualify as such and can be downloaded in PDF format from
  >> CRAN. The ISBN rules even allow one to assign an ISBN to the online
  >> help of a software package, which can then also serve as the ISBN of
  >> the *software itself* (which we did for base R).

  > I'd prefer some consistency in the way that R packages are referenced.
  > Thus, if the reference for one package is to the concatenated help pages,
  > do it that way for all of them.

But we recommend that package authors (try to) get their work into
reviewed journals such as JSS, JCGS or CSDA, and package authors then
usually prefer that the article be cited. Unfortunately, many academic
institutions value paper publications more highly than software.
Citing the help pages is mainly intended as a substitute when no
journal article is available.

Best,
Fritz

Re: Citation of R packages

John Maindonald
Even if a CITATION file is included, there is an issue of what to put
in it. Authorship of a book or paper is not always the simple matter it
might appear to be. With an R package, it can be far from simple. We
are, surely, trying to adapt a tool that was designed for different
purposes.

1. I'd like to see the definition of a new BibTeX entry type that has
fields for additional author details and a version number. There is
surely some mechanism for getting agreement on a new entry type.

2. In any case, there's a message here for package maintainers to include
CITATION files that reflect what they want to appear in any citation,
with citation("lattice") perhaps serving as a suitable model?

John.


On 11 Feb 2006, at 5:36 AM, [hidden email] wrote:

> [...]