popular R packages


popular R packages

Jeroen Ooms.
I would like to get some idea of which R-packages are popular, and what R is used for in general. Are there any statistics available on which R packages are downloaded often, or is there something like a package-survey? Something similar to http://popcon.debian.org/ maybe? Any tips are welcome!

Re: popular R packages

Gabor Grothendieck
This function will show which other packages depend on a particular
package:

> dep <- function(pkg, AP = available.packages()) {
+    pkg <- paste("\\b", pkg, "\\b", sep = "")
+    cat("Depends:", rownames(AP)[grep(pkg, AP[, "Depends"])], "\n")
+    cat("Suggests:", rownames(AP)[grep(pkg, AP[, "Suggests"])], "\n")
+ }
> dep("zoo")
Depends: AER BootPR FinTS PerformanceAnalytics RBloomberg
StreamMetabolism TSfame TShistQuote VhayuR dyn dynlm fda fxregime
lmtest meboot party quantmod sandwich sde strucchange tripEstimation
tseries xts
Suggests: TSMySQL TSPostgreSQL TSSQLite TSdbi TSodbc UsingR Zelig
gsubfn playwith pscl tframePlus
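
As a rough popularity proxy, the same idea can be turned into a ranking by counting, for every package, how many others list it under Depends. The following is a minimal sketch rather than the function above: `count_revdeps` and the toy matrix `AP_toy` are hypothetical names, and the toy data stands in for the real `available.packages()` result so that the example runs offline.

```r
# Count how many packages list each package in their "Depends" field.
# AP is a character matrix shaped like the result of available.packages():
# one row per package, with a "Depends" column of comma-separated entries.
count_revdeps <- function(AP) {
  deps <- AP[, "Depends"]
  deps <- deps[!is.na(deps)]
  # Split on commas, then drop version constraints such as "(>= 2.4.0)"
  # and any surrounding whitespace.
  parts <- unlist(strsplit(deps, ","))
  parts <- trimws(sub("\\(.*\\)", "", parts))
  # R itself is not a package, so leave it out of the tally.
  parts <- parts[nzchar(parts) & parts != "R"]
  sort(table(parts), decreasing = TRUE)
}

# Hypothetical toy data standing in for available.packages().
AP_toy <- matrix(
  c("R (>= 2.4.0), zoo", "zoo, sandwich", "R (>= 2.5.0)", NA),
  ncol = 1,
  dimnames = list(c("xts", "lmtest", "zoo", "sandwich"), "Depends")
)

revdeps <- count_revdeps(AP_toy)
print(revdeps)
```

With network access, `count_revdeps(available.packages())` would apply the same tally to a CRAN mirror. `tools::package_dependencies()` with `reverse = TRUE` is a maintained alternative for looking up reverse dependencies of specific packages.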


On Sat, Mar 7, 2009 at 2:57 PM, Jeroen Ooms <[hidden email]> wrote:

>
> I would like to get some idea of which R-packages are popular, and what R is
> used for in general. Are there any statistics available on which R packages
> are downloaded often, or is there something like a package-survey? Something
> similar to http://popcon.debian.org/ maybe? Any tips are welcome!
>
> -----
> Jeroen Ooms * Dept. of Methodology and Statistics * Utrecht University
>
> Visit  http://www.jeroenooms.com www.jeroenooms.com  to explore some of my
> current projects.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: popular R packages

David Winsemius
In reply to this post by Jeroen Ooms.
When the question "How many R users are there?" arises, the consensus
seems to be that there is no valid method to address it. The
thread "R-business case" from 2004 can be found here:
https://stat.ethz.ch/pipermail/r-help/2004-March/047606.html

I did not see any material revision to that conclusion during the  
recent discussion of the New York Times article on the r-challenge to  
SAS.

Gmane tracks the volume of r-help activity (I realize this is not what
you asked for):
http://www.gmane.org/info.php?group=gmane.comp.lang.r.general

The distribution of r-packages is, well  ... distributed:
http://cran.r-project.org/mirrors.html

At least one of the participants in the 2004 thread suggested that it  
would be a "good thing" to track the numbers of downloads by package.  
I have not heard of any such system being installed in the mirror  
software and I see nothing that suggests data gathering in the CRAN  
Mirror How-to:
http://cran.r-project.org/mirror-howto.html

On the other hand, I am not part of R-core, so you must await a more
authoritative opinion, since a 5-year-old thread and amateur
speculation are not much of a leg to stand on.

There are lexicographic packages for R. One approach to a de novo
analysis would be some sort of natural language analysis of the
r-help archives, counting either package names that are not English
words, or occurrences of the words "library" or "package" in close
proximity to package names that overlap the 30,000 common English words.
That would risk inflating the counts of packages with the least
adequate documentation or a paucity of good worked examples, but there
are many readers of this list who suspect that new users don't look at
the documentation, so who knows?
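
The simpler half of this idea, counting explicit `library(...)` or `require(...)` calls in message bodies, can be sketched as follows. This is only an illustration under assumptions: `msgs` is a hypothetical character vector standing in for archive messages (nothing here downloads the r-help archives), and `count_library_calls` is an invented helper name.

```r
# Tally package names that appear inside library(...) or require(...) calls.
count_library_calls <- function(msgs) {
  # Match library(pkg), library("pkg"), require('pkg'), and so on,
  # stopping at the end of the package name.
  pattern <- "\\b(library|require)\\(\\s*[\"']?[A-Za-z][A-Za-z0-9.]*"
  calls <- unlist(regmatches(msgs, gregexpr(pattern, msgs, perl = TRUE)))
  if (length(calls) == 0) return(table(character(0)))
  # Strip the function name, opening parenthesis, and optional quote.
  pkgs <- sub("^(library|require)\\(\\s*[\"']?", "", calls, perl = TRUE)
  sort(table(pkgs), decreasing = TRUE)
}

# Hypothetical messages standing in for archive posts.
msgs <- c(
  "Try library(zoo) and then library(lme4) before fitting.",
  "I used require(\"zoo\") but got an error.",
  "No package mentioned here."
)

counts <- count_library_calls(msgs)
print(counts)
```

Matching only explicit calls sidesteps the problem of package names that are also common English words, at the cost of missing posts that mention a package in prose.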

--
David Winsemius


David Winsemius, MD
Heritage Laboratories
West Hartford, CT


Re: popular R packages

Thomas Adams
I don't think "At least one of the participants in the 2004 thread
suggested that it would be a "good thing" to track the numbers of
downloads by package." is reasonable because I download R packages for 2
home computers (laptop & desktop) and 2 at work (1 Linux & 1 Mac). There
must be many such cases…

Tom

--
Thomas E Adams
National Weather Service
Ohio River Forecast Center
1901 South State Route 134
Wilmington, OH 45177

EMAIL: [hidden email]

VOICE: 937-383-0528
FAX: 937-383-0033


Re: popular R packages

Tal Galili
I agree with Thomas, over the years I have installed R on at least 5
computers.

BTW: does anyone know how the website statistics of r-project are
being analyzed?
Since I can't see any "google analytics" or other tracking code on the main
website, I am guessing someone might be running some log-file analyzer - but
I'd rather hear that than assume.

On Sun, Mar 8, 2009 at 12:45 AM, Thomas Adams <[hidden email]> wrote:

> I don't think "At least one of the participants in the 2004 thread
> suggested that it would be a "good thing" to track the numbers of downloads
> by package." is reasonable because I download R packages for 2 home
> computers (laptop & desktop) and 2 at work (1 Linux & 1 Mac). There must be
> many such cases…
>
> Tom

--
----------------------------------------------


My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
www.talgalili.com
www.biostatistics.co.il


Re: popular R packages

David Winsemius
In reply to this post by Thomas Adams
Quite so. It certainly is the case that Dirk Eddelbuettel suggested it
would be very desirable, and I think Dirk's track record speaks for
itself. I never said (and I am sure Dirk never intended) that one
could take the raw numbers as a basis for blandly asserting that  
<nnnn> copies of <ttt> package are currently installed.

When I update packages, the automated process takes hold and I go for  
a cup of coffee. I only have at the moment two computers with R  
installed and have not updated any binary packages on Windoze in over  
a year.  Nonetheless, I do think the relative numbers of package  
downloads might be interpretable, or at the very least, the basis for  
discussions over beer.

--
David Winsemius


On Mar 7, 2009, at 5:45 PM, Thomas Adams wrote:

> I don't think "At least one of the participants in the 2004 thread  
> suggested that it would be a "good thing" to track the numbers of  
> downloads by package." is reasonable because I download R packages  
> for 2 home computers (laptop & desktop) and 2 at work (1 Linux & 1  
> Mac). There must be many such cases…
>
> Tom

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


Re: popular R packages

Jeroen Ooms.
In reply to this post by Tal Galili
>
> I agree with Thomas, over the years I have installed R on at least 5
> computers.
>

I don't see why per-machine statistics would not be useful. If you have
installed a package on five machines, you probably use it a lot, and it is
more important to you than packages that you installed only once.

Furthermore, I don't think the distribution of packages across mirrors has
to be problematic. I guess downloads are only slightly related to the
specific mirror, so download statistics from one of the popular mirrors
would do for me.

Of course these statistics are never perfect, but they could be
informative...
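
If a mirror operator were willing to share access logs, a per-package tally could be a few lines of R. This is a minimal sketch, assuming request paths of the form `/src/contrib/<pkg>_<version>.tar.gz`; the sample log lines, the client addresses, the version numbers, and the helper name `tally_downloads` are all invented for illustration and do not come from any real mirror.

```r
# Count source-package downloads per package from web-server log lines.
tally_downloads <- function(log_lines) {
  # Capture the package name from paths like /src/contrib/zoo_1.5-5.tar.gz
  pattern <- "/src/contrib/([A-Za-z][A-Za-z0-9.]*)_[0-9][0-9.-]*\\.tar\\.gz"
  # regmatches() with regexpr() keeps only the lines that match.
  m <- regmatches(log_lines, regexpr(pattern, log_lines))
  pkgs <- sub(pattern, "\\1", m)
  sort(table(pkgs), decreasing = TRUE)
}

# Hypothetical log lines in a common access-log layout.
log_lines <- c(
  '192.0.2.1 - - [07/Mar/2009:14:57:00 +0000] "GET /src/contrib/zoo_1.5-5.tar.gz HTTP/1.1" 200 512',
  '192.0.2.2 - - [07/Mar/2009:15:01:10 +0000] "GET /src/contrib/lme4_0.999375-28.tar.gz HTTP/1.1" 200 512',
  '192.0.2.1 - - [07/Mar/2009:15:02:30 +0000] "GET /src/contrib/zoo_1.5-5.tar.gz HTTP/1.1" 200 512',
  '192.0.2.3 - - [07/Mar/2009:15:05:00 +0000] "GET /index.html HTTP/1.1" 200 512'
)

downloads <- tally_downloads(log_lines)
print(downloads)
```

The tally counts requests, not users or machines, so the caveats raised earlier in the thread still apply to whatever numbers come out.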


Re: popular R packages

Wacek Kusnierczyk
In reply to this post by Tal Galili
i have kept r installed on more than ten computers during the past few
years, some of them running win + more than one linux distro, all of
them having r, most often installed from a separate download.

i know of many cases where students download r for the purpose of a
course in statistics -- often an introductory course for students who
otherwise have little to do with stats. some of them do it more than
once during the semester, and many of them never use r again.

taking into account that basic statistics courses are taught to most
university students and that r is surely the most popular free
statistical computing environment, download-based usage estimates may be
a bit optimistic, unless 'usage' is taken to include 'learn-pass-forget'.

vQ




Re: popular R packages

Spencer Graves-3
      I just did RSiteSearch("library(xxx)") with xxx = the names of 6
packages familiar to me, with the following numbers of hits:


hits package

 169 lme4
 165 nlme
   6 fda
   4 maps
   2 FinTS
   2 DierckxSpline
     

      Software could be written to (1) extract the names of current
packages from CRAN then (2) perform queries similar to this on all such
packages and summarize the results.  I don't have the time now to write
code for this, but I've written similar code before for step (1); it
can be found in "scripts/TsayFiles.R" in the "FinTS" package on CRAN.
For step (2), Sundar Dorai-Raj wrote code that is included in the
preliminary "RSiteSearch" package available from R-Forge via
'install.packages("RSiteSearch", repos="http://r-forge.r-project.org")'.

      Code to do this could probably be written (a) in a matter of
seconds by many of those in the R Core team or (b) in a matter of hours
by virtually any reader of this list using the examples I just cited.  
And it could provide numbers without a need to convince others to keep
download statistics and make them available later.
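
The two steps could be wired together roughly as below. This is a sketch, not the FinTS or RSiteSearch code mentioned above: `rank_by_hits` and `fake_hits` are hypothetical names, the hit counter is passed in as a function because base R's `RSiteSearch()` opens the results in a browser rather than returning a count, and the stub counter simply reuses the six numbers reported in this message.

```r
# Step (2): apply a hit-counting function to each package name and rank.
rank_by_hits <- function(pkgs, count_hits) {
  hits <- vapply(pkgs, count_hits, numeric(1))
  out <- data.frame(hits = hits, package = pkgs, stringsAsFactors = FALSE)
  # Most hits first; ties broken alphabetically by package name.
  out <- out[order(-out$hits, out$package), ]
  rownames(out) <- NULL
  out
}

# Step (1) would normally be: pkgs <- rownames(available.packages())
# A fixed vector keeps this example offline.
pkgs <- c("fda", "lme4", "maps", "nlme", "FinTS", "DierckxSpline")

# Stub counter returning the hit counts quoted above; a real version would
# query a search service for the string sprintf("library(%s)", pkg).
reported <- c(lme4 = 169, nlme = 165, fda = 6, maps = 4,
              FinTS = 2, DierckxSpline = 2)
fake_hits <- function(pkg) reported[[pkg]]

ranking <- rank_by_hits(pkgs, fake_hits)
print(ranking)
```

Keeping the counter as an argument means the ranking logic can be tested without network access and reused with whichever search backend turns out to be available.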

      Hope this helps.
      Spencer Graves    


Re: popular R packages

sue@xlsolutions-corp.com
In reply to this post by Jeroen Ooms.
 Hi Spencer,

 XLSolutions is currently analyzing r-help archived questions to rank
packages for the upcoming R-PLUS 3.3 Professional version and we will be
happy to share the outcome with interested parties. Please email
[hidden email]


 Regards -
 Sue Turner
 Senior Account Manager
 XLSolutions Corporation
 North American Division
 1700 7th Ave
 Suite 2100
 Seattle, WA 98101
 Phone: 206-686-1578
 Email: [hidden email]
 web: www.xlsolutions-corp.com



--- On Sat, 3/7/09, Spencer Graves <[hidden email]> wrote:

> From: Spencer Graves <[hidden email]>
> Subject: Re: [R] popular R packages
> To: "Wacek Kusnierczyk" <[hidden email]>
> Cc: [hidden email], "Jeroen Ooms" <[hidden email]>, "Thomas Adams" <[hidden email]>
> Date: Saturday, March 7, 2009, 5:22 PM
> I just did RSiteSearch("library(xxx)") with xxx =
> the names of 6 packages familiar to me, with the following
> numbers of hits:
>
> hits package
>
> 169 lme4
> 165 nlme
>   6 fda
>   4 maps
>   2 FinTS
>   2 DierckxSpline

Re: popular R packages

Jim Lemon
In reply to this post by Jeroen Ooms.
Hi all,
I'm kind of amazed at the answers suggested for the relatively simple
question, "How many times has each R package been downloaded?". Some
have veered off in another direction, like working out how many packages
a package depends upon, or whether someone downloads more than one copy.
The response about ranking packages by the number of questions asked
about them may be interesting, but may not relate very well at all to
popularity in terms of downloads. If people were constantly asking
questions about one of the packages I maintain, I would be working on
the help pages to improve them, not basking in the inferred glory of
having a popular package. There is one way that the download count would
be very useful for package maintainers, if no one else. Take as an
example the package concord, which has not been maintained for a year or
more, since its content was merged into the irr package. If I knew that
no one downloaded concord any more, I would surely petition those in
charge of the archive to remove it or at least transfer it to the
package museum. No point in having ever more packages on CRAN if they
are never downloaded.

Jim


Re: popular R packages

Emmanuel Charpentier
In reply to this post by David Winsemius
On Sat, 07 Mar 2009 18:04:24 -0500, David Winsemius wrote :

[ Snip ... ]
> Nonetheless, I do think the relative numbers of package downloads might
> be interpretable, or at the very least, the basis for discussions over
> beer.

*Anything* might be the basis for discussions over beer (obvious
corollary to Thermogoddamics' second principle....).

More seriously : I don't think relative numbers of package downloads can
be interpreted in any reasonable way, because reasons for package
download have a very wide range from curiosity ("what's this ?"), fun
(think "fortunes"...), to vital need (think lme4 if/when a consensus on
denominator DFs can be reached :-)...). What can you infer in good faith
from such a mess ?

                                        Emmanuel Charpentier


Re: popular R packages

hadley wickham
> More seriously : I don't think relative numbers of package downloads can
> be interpreted in any reasonable way, because reasons for package
> download have a very wide range from curiosity ("what's this ?"), fun
> (think "fortunes"...), to vital need (think lme4 if/when a consensus on
> denominator DFs can be reached :-)...). What can you infer in good faith
> from such a mess ?

So when we have messy data with measurement error, we should just give
up?  Doesn't sound very statistical! ;)

Hadley


--
http://had.co.nz/


Re: popular R packages

Gabor Grothendieck
On Sun, Mar 8, 2009 at 10:49 AM, hadley wickham <[hidden email]> wrote:

>> More seriously : I don't think relative numbers of package downloads can
>> be interpreted in any reasonable way, because reasons for package
>> download have a very wide range from curiosity ("what's this ?"), fun
>> (think "fortunes"...), to vital need (think lme4 if/when a consensus on
>> denominator DFs can be reached :-)...). What can you infer in good faith
>> from such a mess ?
>
> So when we have messy data with measurement error, we should just give
> up?  Doesn't sound very statistical! ;)
>

Also I would think that the rankings would be meaningful since
the factors that cause the absolute numbers to be off would affect
all packages equally.
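
A small check of this argument, under the strong assumption that the inflation is the same multiplicative factor for every package. The package counts and factors below are made up purely for illustration.

```r
# Hypothetical "true" user counts for three packages.
true_users <- c(lme4 = 300, zoo = 120, fortunes = 40)

# Assume every user downloads each package on the same number of machines.
inflation <- 4
downloads <- true_users * inflation

# Ranks are unchanged by a common positive factor...
same_rank <- identical(rank(-true_users), rank(-downloads))

# ...but they can change if the factor differs by package
# (for example, many one-off curiosity downloads of a single package).
uneven <- true_users * c(lme4 = 1, zoo = 1, fortunes = 10)
changed <- !identical(rank(-true_users), rank(-uneven))

print(c(same_rank = same_rank, changed = changed))
```

Whether real download data looks more like the first case or the second is exactly the point in dispute in this thread.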


Re: popular R packages

Duncan Murdoch
In reply to this post by hadley wickham
On 08/03/2009 10:49 AM, hadley wickham wrote:
>> More seriously : I don't think relative numbers of package downloads can
>> be interpreted in any reasonable way, because reasons for package
>> download have a very wide range from curiosity ("what's this ?"), fun
>> (think "fortunes"...), to vital need (think lme4 if/when a consensus on
>> denominator DFs can be reached :-)...). What can you infer in good faith
>> from such a mess ?
>
> So when we have messy data with measurement error, we should just give
> up?  Doesn't sound very statistical! ;)

I think the situation is worse than messy.  If a client comes in with
data that doesn't address the question they're interested in, I think
they are better served to be told that, than to be given an answer that
is not actually valid.  They should also be told how to design a study
that actually does address their question.

You (and others) have mentioned Google Analytics as a possible way to
address the quality of data; that's helpful.  But analyzing bad data
will just give bad conclusions.

Duncan Murdoch


Re: popular R packages

Barry Rowlingson
> I think the situation is worse than messy.  If a client comes in with data
> that doesn't address the question they're interested in, I think they are
> better served to be told that, than to be given an answer that is not
> actually valid.  They should also be told how to design a study that
> actually does address their question.
>
> You (and others) have mentioned Google Analytics as a possible way to
> address the quality of data; that's helpful.  But analyzing bad data will
> just give bad conclusions.

 As long as we say 'package Foo is the most downloaded package on
CRAN', and not 'package Foo is the most used package for R', we can
leave it to the user to decide if the latter conclusion follows from
the former. In the absence of actual usage data I would think it a
good approximation. Not that I would risk my life on it.

 Pop music charts are now based on download counts, but I wouldn't
believe they represent the songs that are listened to the most times.
Nor would I go so far as to believe they represent the quality of the
songs...

 Should R have a 'Would you like to tell CRAN every time you do
library(foo) so we can do usage counts (no personal data is
transmitted blah blah) ?'? I don't think so....

Barry


Re: popular R packages

Ted.Harding-2
In reply to this post by Duncan Murdoch
On 08-Mar-09 15:14:03, Duncan Murdoch wrote:

> On 08/03/2009 10:49 AM, hadley wickham wrote:
>>> More seriously : I don't think relative numbers of package downloads
>>> can be interpreted in any reasonable way, because reasons for
>>> package download have a very wide range from curiosity ("what's
>>> this ?"), fun (think "fortunes"...), to vital need (think lme4
>>> if/when a consensus on denominator DFs can be reached :-)...).
>>> What can you infer in good faith from such a mess ?
>>
>> So when we have messy data with measurement error, we should just
>> give up?  Doesn't sound very statistical! ;)
>
> I think the situation is worse than messy.  If a client comes in with
> data that doesn't address the question they're interested in, I think
> they are better served to be told that, than to be given an answer that
> is not actually valid.  They should also be told how to design a study
> that actually does address their question.
>
> You (and others) have mentioned Google Analytics as a possible way to
> address the quality of data; that's helpful.  But analyzing bad data
> will just give bad conclusions.
> Duncan Murdoch

The population of R users (which we would need to sample in order
to obtain good data) is probably more elusive than a fish population
in the ocean -- only partially visible at best, and with an unknown
proportion invisible.

At least in Fisheries research, there are long established capture
techniques (from trawling to netting to electro-fishing to ... )
which can be deployed, for research purposes, in such a way as to
potentially reach all members of a target population, with at least
a moderately good approximation to random sampling. What have we
for R?

Come to think of it, electro-fishing, ...

Suppose R were released with 2 types of cookie embedded in base R.
Each type is randomly configured, when R is first run, to be Active
or Inactive (probability of activation to be decided at the design
stage ... ). Type 1, if active, on a certain date generates an
event which brings it to the notice of R-Core (e.g. by clandestine
email or by inducing a bug report). Type 2 acts similarly on a later
date. If Type 2 acts, it carries with it information as to whether
there was a Type 1 action along with whether, apparently, the Type 1
action "succeeded".

We then have, in effect, an analogue of the Mark-Recapture technique
of population estimation (along with the usual questions about
equal catchability and so forth).
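
For readers unfamiliar with the technique, the simplest mark-recapture
estimate is the Lincoln-Petersen estimator, N = n1 * n2 / m. The sketch below
maps it onto the two-cookie scheme; the function name, the simulated
population of 100000 installations, and the 5% activation probability are
assumptions chosen purely for illustration, and the simulation takes equal
catchability for granted.

```r
# Lincoln-Petersen estimate of population size from two "capture" occasions.
#   n1: installations that reported via the Type 1 cookie (the "marked" set)
#   n2: installations that reported via the Type 2 cookie (the second sample)
#   m : Type 2 reports that also record a Type 1 action (the "recaptures")
lincoln_petersen <- function(n1, n2, m) {
  stopifnot(m > 0, m <= min(n1, n2))
  n1 * n2 / m
}

# Simulated check with made-up numbers: each cookie type is independently
# active with probability 0.05 on every installation.
set.seed(1)
N <- 100000
type1 <- runif(N) < 0.05
type2 <- runif(N) < 0.05
estimate <- lincoln_petersen(sum(type1), sum(type2), sum(type1 & type2))
print(estimate)
```

If the two cookie types were not independent, or some installations could
never report at all, the estimate would be biased, which is the equal
catchability caveat.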

However, since this sort of thing (which I am not proposing seriously,
only for the sake of argument) is undoubtedly unethical (and would
do R's reputation no good if it came to light), I tentatively conclude
that the population of R users is likely to remain as elusive as ever.

Best wishes to all,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[hidden email]>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Mar-09                                       Time: 16:11:44
------------------------------ XFMail ------------------------------


Re: popular R packages

Tal Galili
Hi Ted,

Thinking about your direction, another idea came to mind:
The next time a major release is made (there is one scheduled quite soon
actually), the core team could add a "survey" to the download page of the
R base package asking just one question:
"please click here if this is the first computer you are downloading this
package for".
This, combined with the fact that when serving a user we can obtain their IP
address (which gives geo information), could give a pretty nice rough
estimate of how many "major release downloaders" the R community has.



Tal



--
----------------------------------------------


My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
www.talgalili.com
www.biostatistics.co.il


Re: popular R packages

Spencer Graves-3
In reply to this post by Ted.Harding-2
      Is this another discussion of what data might be collected and
analyzed, and what could and could not be said if we only had such data?

      Has anyone but me produced any actual data?  If so, I missed it.
Hadley mentioned the 'fortunes' package.  My earlier methodology,
"RSiteSearch('library(fortunes)')", produced 40 hits for 'fortunes',
compared to 169 for 'lme4' and 2 for 'DierckxSpline'.
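
      That methodology can be sketched as follows. The package list simply
reuses the three names above; RSiteSearch() is the real function in the utils
package, but it opens the results in a web browser rather than returning a
hit count, so the counts still have to be read off each results page, and
this sketch makes no claim about what the search service returns today.

```r
# Build one search string per package, following the 'library(<pkg>)' pattern.
pkgs <- c("fortunes", "lme4", "DierckxSpline")
queries <- sprintf("library(%s)", pkgs)
print(queries)

# RSiteSearch() launches a browser, so only run it in an interactive session.
if (interactive()) {
  for (q in queries) utils::RSiteSearch(q)
}
```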

      With anything like this, it would be wise to approach the problem
from many different perspectives, recognizing that the strengths of one
approach can help improve our understanding of what other analyses say
about the question at hand.

      Happy Sunday.
      Spencer Graves    



Re: popular R packages

Duncan Murdoch
In reply to this post by Barry Rowlingson
On 08/03/2009 12:08 PM, Barry Rowlingson wrote:

>> I think the situation is worse than messy.  If a client comes in with data
>> that doesn't address the question they're interested in, I think they are
>> better served to be told that, than to be given an answer that is not
>> actually valid.  They should also be told how to design a study that
>> actually does address their question.
>>
>> You (and others) have mentioned Google Analytics as a possible way to
>> address the quality of data; that's helpful.  But analyzing bad data will
>> just give bad conclusions.
>
>  As long as we say 'package Foo is the most downloaded package on
> CRAN', and not 'package Foo is the most used package for R', we can
> leave it to the user to decide if the latter conclusion follows from
> the former.

But we don't even have that data, since CRAN is distributed across lots
of mirrors.

Duncan Murdoch

>  In the absence of actual usage data I would think it a
> good approximation. Not that I would risk my life on it.
>
>  Pop music charts are now based on download counts, but I wouldn't
> believe they represent the songs that are listened to the most times.
> Nor would I go so far as to believe they represent the quality of the
> songs...
>
>  Should R have a 'Would you like to tell CRAN every time you do
> library(foo) so we can do usage counts (no personal data is
> transmitted blah blah) ?'? I don't think so....
>
> Barry
