|
Hi All,
I've been fiddling around with various ways to estimate the popularity of R, SAS, SPSS, Stata, JMP, Minitab, Statistica, Systat, BMDP, S-PLUS, R-PLUS and Revolution R. It's not an easy task. You can see what I've come up with so far at http://r4stats.com/popularity . I'm sure people will have plenty of ideas on how to improve this, so please let me know what you think.

Cheers,
Bob

=========================================================
Bob Muenchen (pronounced Min'-chen), Manager
Research Computing Support
Voice: (865) 974-5230
Email: [hidden email]
Web: http://oit.utk.edu/research, News: http://oit.utk.edu/research/news.php
Feedback: http://oit.utk.edu/feedback/
=========================================================

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code. |
|
On 20.06.2010 15:31, Muenchen, Robert A (Bob) wrote:
> I've been fiddling around with various ways to estimate the popularity
> of R, SAS, SPSS, Stata, JMP, Minitab, Statistica, Systat, BMDP, S-PLUS,
> R-PLUS and Revolution R. It's not an easy task. You can see what I've
> come up with so far at http://r4stats.com/popularity . I'm sure people
> will have plenty of ideas on how to improve this, so please let me know
> what you think.

Your analysis is quite web-based. But defining what "popular" means is, I believe, hard. R is open source and very broad in its applications, so of course it generates much more e-mail and web traffic: there are many different uses and users.

SPSS and Stata, for example, are closed and very specialized. You also get support directly from the company and do not necessarily need a mailing list. Does this mean that they are less popular? I'd say no.

So the question I would raise here is whether this is a fair comparison. I know that in a sizeable subset of statistics such as panel econometrics, Stata is by far the leader, and that in time series econometrics research it is Eviews and Gauss. I would say that in the industry that I know, plus in econometrics research, those programs are much more widespread or "popular". To measure their popularity, I would say an industry- and education-wide questionnaire should be used.

Plus the list is not sufficient, so I would also name Matlab, Gauss, Ox and Eviews, from my own area of interest (econometrics), as "popular" proprietary software.

I do not deny that R is becoming more popular, but I doubt whether mailing lists and search requests are enough to prove this hypothesis.

My 2 cents,
Stefan |
|
On Jun 20, 2010, at 10:24 AM, Stefan Grosse wrote:

> SPSS and Stata for example are closed and very specialized.

I suspect proponents of their use would actively dispute the "very specialized" description.

> You get support also directly from the company and do not necessarily
> need a mailing list. Does this mean that they are less popular? I'd say no.

I was under the impression that both SAS and Stata actively support their two mailing lists, but the SAS FAQ disputes this impression regarding SAS.

> I do not deny that R is becoming more popular, but I doubt whether
> mailing lists and search requests are enough to prove this hypothesis.

Certainly there are additional factors that might influence the absolute number of postings to a particular mailing list. The SAS mailing list/newsgroup, SAS-L/comp.soft-sys.sas, has a well-established Internet presence. Each one probably has a particular culture. (I was stunned to see the low number of daily posts to comp.soft-sys.sas when I just looked at the last week on Google Groups.) I didn't think either the SAS or the Stata lists had any sort of published or informal effort to steer users in the direction of R-ing the FM, searching-before-posting, or admonishments to RT-FAQ. However, now that I look, it does appear that the Statalist FAQ makes an effort similar to that of the r-help Posting Guide. There may be differences in the degree and clarity of the documentation as well. The Stata distribution includes a medium-sized library. All of that said, ..., the relative frequency of postings would seem to be less subject to such influences.

The SAS curve, with its peak in 2006-2008 and significantly lower numbers in more recent years, contrasted with the steady increase in R and Stata, would seem to reflect a material shift. Agreed, you cannot say that R passed SAS in number of active users, or that SAS has the same number of users as Stata. The flatness of SPSS also appears meaningful. And within the R/S world the differences in the activity on Snews and rhelp are likewise pretty dramatic.

--
David Winsemius, MD
West Hartford, CT |
|
In reply to this post by Stefan Grosse-2
>-----Original Message-----
>From: [hidden email] [mailto:[hidden email]] On Behalf Of Stefan Grosse
>Sent: Sunday, June 20, 2010 10:25 AM
>To: [hidden email]
>Subject: Re: [R] Popularity of R, SAS, SPSS, Stata...
>
>Your analysis is quite web-based. But to define what popular means is -
>I believe - hard.

Stefan,

I agree with all your points. What I have so far is nowhere near the big picture, but it's a start. When you install some software, it asks whether you mind it reporting usage stats back to its home site. I know that sort of thing has been discussed before on R-help. I'd love to see that added so we would have a better estimate of R's user base.

Cheers,
Bob |
|
> I agree with all your points. What I have so far is nowhere near the big
> picture, but it's a start. When you install some software it asks if you
> mind it reporting usage stats back to its home site. I know that sort of
> thing has been discussed before on R-help. I'd love to see that added so
> we would have a better estimate of R's user base.

I wonder if there are any capture-recapture type methodologies for estimating open-source software usage? Another idea would be to combine with some other known numbers, e.g. book sales, conference attendance etc. You'd need personal information to link the data sets together.

Hadley

PS. It would also be interesting to see the contributions of the R-SIG mailing lists and other specialised R-related mailing lists. My feeling is that there is not a lot of overlap between the members of the ggplot2 mailing list and R-help.

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/ |
|
In reply to this post by David Winsemius
>-----Original Message-----
>From: [hidden email] [mailto:[hidden email]] On Behalf Of David Winsemius
>Sent: Sunday, June 20, 2010 1:05 PM
>To: Stefan Grosse
>Cc: [hidden email]
>Subject: Re: [R] Popularity of R, SAS, SPSS, Stata...
>
>On Jun 20, 2010, at 10:24 AM, Stefan Grosse wrote:
>
>> SPSS and Stata for example are closed and very specialized.
>
>I suspect proponents of their use would actively dispute the "very
>specialized" description.

Here at UT, SPSS is dominant across a wide range of departments, with around 3,600 users. The older professors never stopped programming in it, while the many programming-phobic students love its point-and-click interface. SAS is also used widely, with about 800 users, many of them driven by class requirements. When it comes to dissertation time, many switch over to SPSS. Stata has around 120 users, concentrated in just a few departments. With R it's hard to tell, as we don't get local counts and R users tend not to need much consulting support.

Cheers,
Bob |
|
In reply to this post by Hadley Wickham-2
>I wonder if there are any capture-recapture type methodologies for
>estimating open-source software usage? Another idea would be to
>combine with some other known numbers, e.g. book sales, conference
>attendance etc. You'd need personal information to link the data sets
>together.
>
>Hadley

This totally cracked me up! I'm envisioning going into one of our computer labs, tossing a net over an unsuspecting student, and then tagging their ear with a code that represents which stat package they're using. Then release and later recapture. What percent did we get? That's what the profs I deal with do with animals to estimate populations.

Conference attendance might be easy to get if I remember to contact the people running them. Does anyone know how many we expect at UseR 2010? I recall SAS conferences with 3,500, but data analysis is a tiny part of that conference. I also heard someone say that they took it to Hawaii one year to REDUCE the attendance as it had grown so large. Sounds crazy to me, but if there are attempts to manage the figures, that could muck up the interpretation. Well, all these approaches have their own problems, so that's just another "limitation of the study." I think SPSS Directions has more like 500, but it's all focused on some sort of analysis.

I did try to count books at Amazon and papers published via Google Scholar. Those searches are devilishly difficult for SAS, let alone for the letter R!

An easy one to get should be the number of list subscribers. I'll try to get those figures. Anyone know it for R-help?

Cheers,
Bob |
|
On 20-Jun-10 19:07:21, Muenchen, Robert A (Bob) wrote:
>>I wonder if there are any capture-recapture type methodologies for
>>estimating open-source software usage? Another idea would be to
>>combine with some other known numbers, e.g. book sales, conference
>>attendance etc. You'd need personal information to link the data sets
>>together.
>>
>>Hadley
>
> This totally cracked me up! I'm envisioning going into one of our
> computer labs, tossing a net over an unsuspecting student, and then
> tagging their ear with a code that represents which stat package
> they're using. Then release and later recapture. What percent did
> we get? That's what the profs I deal with do with animals to estimate
> populations.

I've given thought in the past to the question of estimating the R user base, and came to the conclusion that it is impossible to get an estimate of the number of users that one could trust (or even put anything like a margin of error to).

I think one could get a number which represented a moderately informative lower bound -- just count the number of different email addresses that have ever posted to the R-help list. This will of course include people who post (or have posted) from more than one email address, and people who tried R for a while and then dropped it, but my feeling is that these are likely to be outweighed by the number of people who have used R but have never posted (for example students who are getting their R help from their instructors, people using R in a corporate context who are discouraged from posting to public lists, etc.).

The number of subscribers to R-help (currently about 10200) is a definite lower bound for the number of R users, but many users post to R-help without being subscribed.

I would expect that the total number of different email addresses that have posted to R-help would be considerably larger than 10200.

I don't think a "Mark-Recapture" approach is feasible.

Further, I don't know how one might take account of the fact that some installations of R (e.g. on a corporate or institutional or departmental server) may each be used by several users.

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[hidden email]>
Fax-to-email: +44 (0)870 094 0861
Date: 20-Jun-10 Time: 20:41:43
------------------------------ XFMail ------------------------------ |
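The lower-bound count suggested above (distinct e-mail addresses that have ever posted) could be sketched roughly as follows. This assumes the list archive is available locally in mbox format with unobfuscated `From` headers, which may well not be true of the public R-help archives; the function names and the file path are hypothetical.

```python
import mailbox
from email.utils import parseaddr


def count_distinct_posters(from_headers):
    """Count distinct sender addresses in an iterable of From header values.

    Addresses are lower-cased so that "Ted@Example.org" and
    "ted@example.org" are counted once. People who post from several
    different addresses are still counted several times.
    """
    addresses = set()
    for header in from_headers:
        _name, address = parseaddr(header)
        if address:
            addresses.add(address.lower())
    return len(addresses)


def from_headers_in_mbox(path):
    """Yield the From header of every message in a local mbox file."""
    for message in mailbox.mbox(path):
        yield message.get("From", "")


if __name__ == "__main__":
    # Hypothetical headers standing in for a real archive.
    headers = [
        "Ted Harding <Ted.Harding@example.org>",
        "ted.harding@example.org",
        "Bob <bob@example.edu>",
    ]
    print(count_distinct_posters(headers))
    # With a real archive (path is an assumption):
    # print(count_distinct_posters(from_headers_in_mbox("r-help.mbox")))
```

As noted in the message above, such a count overstates in one direction (multiple addresses per person, people who dropped R) and understates in the other (users who never post), so it is only indicative.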
|
In reply to this post by Muenchen, Robert A (Bob)
Bob,
I have no idea whether it is realistic, but if you look for the papers that used R or SAS (or anything), you might get better results by searching for the way R and SAS are cited.

It looks to me that what I'm saying is not clear, so here is an example. To cite R in a paper you have to write it this way:

    > citation("base")
    To cite R in publications use:

      R Development Core Team (2009). R: A language and environment for
      statistical computing. R Foundation for Statistical Computing, Vienna,
      Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

So instead of searching for "R", searching for "R Development Core Team" might give better results. The same goes for SAS or any other software.

If that doesn't help, just forget it!

Ivan

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Institut und Museum
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
[hidden email]

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php |
|
In reply to this post by Ted.Harding-2
> I've given thought in the past to the question of estimating the R
> user base, and came to the conclusion that it is impossible to get
> an estimate of the number of users that one could trust (or even
> put anything like a margin of error to).

I find it hard to believe that it should be harder to estimate the number of whales than the number of R users. Sure there's a definitional problem of exactly what an R user is, but there must be some way to come up with some useful estimates. What about snowball sampling with R-help as an initial frame?

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/ |
|
How about getting statistics of downloads of R-base from the different CRAN mirrors? This should (in principle) allow one to estimate the total number of people who intended to use R at some point in their life. It may even be possible to analyze those numbers for temporal trends, since the release date of each R version is known.

Christos |
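A rough sketch of how such download records might be summarised over time. Nothing in the thread says that CRAN mirrors expose logs in this form: the record layout (a download date plus an anonymised client identifier) and the function name are purely hypothetical. Counting distinct clients per month, instead of raw downloads, is one way to reduce double counting from repeated downloads.

```python
from collections import defaultdict
from datetime import date


def distinct_downloaders_per_month(records):
    """Summarise download records as distinct clients per calendar month.

    records: iterable of (download_date, client_id) tuples, where
    download_date is a datetime.date and client_id is any hashable,
    anonymised identifier (hypothetical layout).
    Returns a dict mapping (year, month) to the number of distinct clients,
    in chronological order.
    """
    clients_by_month = defaultdict(set)
    for download_date, client_id in records:
        key = (download_date.year, download_date.month)
        clients_by_month[key].add(client_id)
    return {month: len(ids) for month, ids in sorted(clients_by_month.items())}


if __name__ == "__main__":
    # Hypothetical records: client "a" downloads twice in May.
    records = [
        (date(2010, 5, 1), "a"),
        (date(2010, 5, 2), "a"),
        (date(2010, 5, 3), "b"),
        (date(2010, 6, 1), "a"),
    ]
    print(distinct_downloaders_per_month(records))
```

Even with deduplication, a download is not a user: one person may install R on several machines, and one server installation may serve many people.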
|
In reply to this post by Ted.Harding-2
>-----Original Message-----
>From: [hidden email] [mailto:[hidden email]] On Behalf Of Ted Harding
>Sent: Sunday, June 20, 2010 3:42 PM
>To: [hidden email]
>Subject: Re: [R] Popularity of R, SAS, SPSS, Stata...
>
>I've given thought in the past to the question of estimating the R
>user base, and came to the conclusion that it is impossible to get
>an estimate of the number of users that one could trust (or even
>put anything like a margin of error to).
>
>I think one could get a number which represented a moderately
>informative lower bound -- just count the number of different email
>addresses that have ever posted to the R-help list. This will of
>course include people who post (or have posted) from more than one
>email address, and people who tried R for a while and then dropped
>it, but my feeling is that these are likely to be outweighed by the
>number of people who have used R but have never posted (for example
>students who are getting their R help from their instructors, people
>using R in a corporate context who are discouraged from posting to
>public lists, etc.).

Ted, that's a very interesting suggestion. Do you know of a practical way of getting that count?

>The number of subscribers to R-help (currently about 10200) is
>a definite lower bound for the number of R users, but many users
>post to R-help without being subscribed.

10,200 is quite an amazing number! Here are the numbers of subscribers to the other lists:

SAS-L      3,251
SPSSX-L    2,103
Statalist  1,847
S-PLUS     haven't figured out how to get this yet

How did you get the R-help figure?

>I would expect that the total number of different email addresses
>that have posted to R-help would be considerably larger than 10200.
>
>I don't think a "Mark-Recapture" approach is feasible.
>
>Further, I don't know how one might take account of the fact that
>some installations of R (e.g. on a corporate or institutional
>or departmental server) may each be used by several users.

The server question in particular intrigues me. Research organizations are stuffed with high performance clusters. The cost of all the commercial packages is just incredible. Even at the heavily discounted rate academia gets, they're still unaffordable. However, if queried we'd find the commercial packages on them, but limited to 4 out of 2,500 nodes! You might see the reverse in industry, with one mainframe copy of SAS serving hundreds of users.

Cheers,
Bob |
|
In reply to this post by Ivan Calandra
>-----Original Message-----
>From: [hidden email] [mailto:[hidden email]]
>On Behalf Of Ivan Calandra
>Sent: Sunday, June 20, 2010 3:47 PM
>To: [hidden email]
>Subject: Re: [R] Popularity of R, SAS, SPSS, Stata...
>
>Bob,
>
>I have no idea whether it is realistic, but if you look for the papers
>that used R or SAS (or anything), you might get better results by
>searching for the way R and SAS are cited.

Hi Ivan, that was what I tried when more generic keywords failed. However, almost no one seems to use that citation. For example, in 2009, only 28 papers contain "R Foundation" and 61 contain Bioconductor, which uses R. One single paper contains both. I appreciate the idea though!

Thanks,
Bob

>It looks to me that what I'm saying is not clear, so here is an example.
>To cite R in a paper you have to write it this way:
>> citation("base")
>To cite R in publications use:
>  R Development Core Team (2009). R: A language and environment for
>  statistical computing. R Foundation for Statistical Computing, Vienna,
>  Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.
>
>So instead of searching for "R", searching for "R Development Core Team"
>might give better results. The same goes for SAS or any other software.
>
>If that doesn't help, just forget it!
>
>Ivan
>
>On 20 June 2010 at 21:07, Muenchen, Robert A (Bob) wrote:
>
>>> I wonder if there are any capture-recapture type methodologies for
>>> estimating open-source software usage? Another idea would be to
>>> combine with some other known numbers, e.g. book sales, conference
>>> attendance etc. You'd need personal information to link the data sets
>>> together.
>>>
>>> Hadley
>>
>> This totally cracked me up! I'm envisioning going into one of our
>> computer labs, tossing a net over an unsuspecting student, and then
>> tagging their ear with a code that represents which stat package
>> they're using. Then release and later recapture. What percent did we
>> get? That's what the profs I deal with do with animals to estimate
>> populations.
>>
>> Conference attendance might be easy to get if I remember to contact
>> the people running them. Does anyone know how many we expect at UseR
>> 2010? I recall SAS conferences with 3,500 but data analysis is a tiny
>> part of that conference. I also heard someone say that they took it to
>> Hawaii one year to REDUCE the attendance as it had grown so large.
>> Sounds crazy to me, but if there are attempts to manage the figures,
>> that could muck up the interpretation. Well, all these approaches have
>> their own problems, so that's just another "limitation of the study."
>> I think SPSS Directions has more like 500 but it's all focused on some
>> sort of analysis.
>>
>> I did try to count books at Amazon and papers published via Google
>> Scholar. Those searches are devilishly difficult for SAS let alone for
>> the letter R!
>>
>> An easy one to get should be the number of list subscribers. I'll try
>> to get those figures. Anyone know it for R-help?
>>
>> Cheers,
>> Bob
>>
>>> PS. It would also be interesting to see the contributions of the
>>> R-SIG mailing lists and other specialised R related mailing lists.
>>> My feeling is that there is not a lot of overlap between the members
>>> of the ggplot2 mailing list and R-help.
>>>
>>> --
>>> Assistant Professor / Dobelman Family Junior Chair
>>> Department of Statistics / Rice University
>>> http://had.co.nz/
>
>--
>Ivan CALANDRA
>PhD Student
>University of Hamburg
>Biozentrum Grindel und Zoologisches Institut und Museum
>Martin-Luther-King-Platz 3
>D-20146 Hamburg, GERMANY
>+49(0)40 42838 6231
>[hidden email]
>
>**********
>http://www.for771.uni-bonn.de
>http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
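The tag, release and recapture procedure joked about above corresponds to the classic Lincoln-Petersen estimator, so a minimal sketch may help make the arithmetic concrete. The function names and the sample numbers are hypothetical, chosen only for illustration; nobody in the thread ran such a study. The estimate also assumes a closed population in which every individual is equally likely to be caught, which is exactly what is hard to guarantee for software users.

```python
def lincoln_petersen(marked_first, caught_second, recaptured):
    """Classic Lincoln-Petersen estimate of population size: N ~ M * C / R."""
    if recaptured <= 0:
        raise ValueError("need at least one recaptured individual")
    return marked_first * caught_second / recaptured


def chapman(marked_first, caught_second, recaptured):
    """Chapman's bias-corrected variant, defined even when recaptured == 0."""
    return (marked_first + 1) * (caught_second + 1) / (recaptured + 1) - 1


# Hypothetical numbers: tag 50 students, later sample 40, find 10 already tagged.
estimate = lincoln_petersen(50, 40, 10)
corrected = chapman(50, 40, 10)
print(estimate, round(corrected, 1))
```

With few recaptures the two estimates can differ noticeably, which is one reason the corrected form is often preferred for small samples.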
|
In reply to this post by Hadley Wickham-2
>-----Original Message-----
>From: [hidden email] [mailto:[hidden email]]
>On Behalf Of Hadley Wickham
>
>... What about snowball
>sampling with R-help as an initial frame?

That's an interesting idea! I could put together a two-item web survey:

1. What stat package do you use?
2. What's your main email address?

If they choose R, I could optionally ask what their favorite packages are. I might be able to get that on a web survey this week if it doesn't get too crazy.

Bob

>Hadley
>
>--
>Assistant Professor / Dobelman Family Junior Chair
>Department of Statistics / Rice University
>http://had.co.nz/
|
>-----Original Message-----
>From: [hidden email] [mailto:[hidden email]]
>On Behalf Of Muenchen, Robert A (Bob)
>Sent: Sunday, June 20, 2010 6:43 PM
>To: Hadley Wickham; [hidden email]
>Cc: [hidden email]
>Subject: Re: [R] Popularity of R, SAS, SPSS, Stata...
>
>>-----Original Message-----
>>From: [hidden email] [mailto:[hidden email]]
>>On Behalf Of Hadley Wickham
>>
>>... What about snowball
>>sampling with R-help as an initial frame?
>
>That's an interesting idea! I could put together a Two-item web survey:
>
>1. What stat package do you use?
>2. What's your main email address

P.S. The email address was an attempt to keep people from "stuffing the ballot box" but, on the other hand, it could turn people off. I guess the number of blank fields would tell us which. Also, stat package choice would have to be a "check all that apply" question.

>If they choose R, I could optionally ask what their favorite packages
>are. I might be able to get that on a web survey this week if it doesn't
>get too crazy.
>
>Bob
>
>>Hadley
>>
>>--
>>Assistant Professor / Dobelman Family Junior Chair
>>Department of Statistics / Rice University
>>http://had.co.nz/
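As a rough illustration of how the two-item survey described above might be tallied, here is a minimal sketch. It assumes responses arrive as (email, packages) pairs; the function name, the data layout and the sample addresses are all hypothetical. Repeat submissions from the same address are dropped, blank addresses are counted separately (they cannot be de-duplicated), and each response may name several packages.

```python
from collections import Counter


def tally_packages(responses):
    """Count package mentions, keeping one response per email address.

    `responses` is a list of (email, packages) pairs. A blank email is
    allowed and always counted, since it cannot be de-duplicated.
    """
    seen = set()
    counts = Counter()
    blank = 0
    for email, packages in responses:
        key = email.strip().lower()
        if not key:
            blank += 1
        elif key in seen:
            continue  # repeat submission from the same address
        else:
            seen.add(key)
        counts.update(set(packages))  # "check all that apply"
    return counts, blank


# Made-up responses for illustration only.
sample = [
    ("a@example.org", ["R", "SAS"]),
    ("A@Example.org ", ["R"]),        # same person again, ignored
    ("", ["SPSS"]),                   # blank address, still counted
    ("b@example.org", ["R", "Stata"]),
]
counts, blank = tally_packages(sample)
print(dict(counts), blank)
```

The number of blank addresses returned alongside the counts is the quantity suggested above for judging whether asking for an email address puts people off.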
|
In reply to this post by Muenchen, Robert A (Bob)
On 20/06/2010 6:36 PM, Muenchen, Robert A (Bob) wrote:
>> -----Original Message-----
>> From: [hidden email] [mailto:[hidden email]]
>> On Behalf Of Ivan Calandra
>> Sent: Sunday, June 20, 2010 3:47 PM
>> To: [hidden email]
>> Subject: Re: [R] Popularity of R, SAS, SPSS, Stata...
>>
>> Bob,
>>
>> I have no idea whether it is realistic, but if you look for the papers
>> that used R or SAS (or anything), you might get better results by
>> searching for the way R and SAS are cited.
>
> Hi Ivan, that was what I tried when more generic keywords failed.
> However, almost no one seems to use that citation. For example, in 2009,
> only 28 papers contain "R Foundation" and 61 contain Bioconductor, which
> uses R. One single paper contains both. I appreciate the idea though!

If you use Web of Science, then the abbreviation for the author in the standard citation for R is R DEV COR TEAM. Doing a search for citations to that author in 2009 or 2010 finds 249 papers. Variations on the spelling that I see include

R DEV C3R TEAM
R DEV CAR GROUP
R DEV CAR TEAM
R DEV CIR TEAM
R DEV COD TEAM
R DEV COR
R DEV COR T
R DEV COR TEA
R DEV COR TEAM
R DEV COR TEAM C
R DEV COR TEAM CO
R DEV COR TEAM FD
R DEV COR TEAM OR
R DEV COR TEAM R
R DEV COR TEAM RD
R DEV COR TEAM VI
R DEV COR TEAMR
R DEV COR TEMA
R DEV COR TRAM
R DEV CORE TEAM
R DEV CORETEAM
R DEV CORR TEAM
R DEV CORT TEAM
R DEV CPR TEAM
R DEV CT
R DEV TEAM
R DEVCOR TEAM
R DEVELOPMENTCORE

Not all of those might really be R. For example, there's probably a north Atlantic codfishing team named R DEV COD TEAM. But most of them are, and they lead to 289 cited papers in 2009/10.

Duncan Murdoch
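The spelling variants listed above suggest approximate string matching rather than exact search. The following is only a sketch of that idea using Python's standard `difflib`; it is not how Web of Science itself works, and the function name and the 0.8 threshold are assumptions picked for illustration.

```python
from difflib import SequenceMatcher

# Canonical cited-author string, as given in the message above.
CANONICAL = "R DEV COR TEAM"


def looks_like_r_citation(author, threshold=0.8):
    """Return True if `author` is a near match for the canonical string."""
    cleaned = " ".join(author.upper().split())  # normalise case and spacing
    return SequenceMatcher(None, cleaned, CANONICAL).ratio() >= threshold


# A few of the variants quoted above, plus an unrelated (made-up) author.
candidates = ["R DEV COR TEMA", "R DEV C3R TEAM", "R DEV CORE TEAM", "SMITH J"]
matched = [name for name in candidates if looks_like_r_citation(name)]
print(matched)
```

A similarity threshold cannot tell a typo from a genuinely different author, so an entry like R DEV COD TEAM would still need a manual look, and heavily truncated forms such as R DEV CT can fall below the threshold and be missed.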
|
In reply to this post by Hadley Wickham-2
On 20-Jun-10 19:49:43, Hadley Wickham wrote:
>> I've given thought in the past to the question of estimating the R
>> user base, and came to the conclusion that it is impossible to get
>> an estimate of the number of users that one could trust (or even
>> put anything like a margin of error to).
>
> I find it hard to believe that it should be harder to estimate the
> number of whales than the number of R users. Sure there's a
> definitional problem of exactly what an R user is, but there must be
> some way to come up with some useful estimates. What about snowball
> sampling with R-help as an initial frame?
>
> Hadley

Whales are a different kettle of fish! They are much more directly observable, in principle, than are R-users. For one thing, a whale has to come to the surface to breathe every so often, and if you are in a ship nearby you can see it happen.

There have been many research ships out in the oceans in known whale areas looking out for just that, and planning their transects so as to be able to scale up their observed data into population estimates. In many cases individual whales can be recognised (by markings or by notches on the fins), enabling a kind of passive mark-recapture. Also, active mark-recapture is carried out, with tags being planted into the animals and recovered later (though this was a sounder method prior to the moratorium on commercial whaling). In addition, catch per unit effort (or observations per unit effort) data can be used to estimate abundance. Data have been available on sex and age. These days, responder beacons can be planted as tags, and their numbers within visually observed whale groups determined.

Data from such sources, and others, can be combined with analysis of population-dynamics models, thus improving the quality of the estimates.

R-users are not so easy to study! For one thing, they don't all come up to breathe; they can do that in the darkest depths and not be seen. Their population dynamics is obscure.

The big problem with any sort of survey or "sample" of R users is that the target population is only partially visible, and seeking responses to any kind of survey is subject to non-response (including failure to target) bias from an intangible and therefore unknown number of users.

The idea of a "snowball sample" came up when this same topic was discussed back in 2000. Go to

  https://stat.ethz.ch/pipermail/r-help/2000-June/thread.html

and find the thread (and the various side-threads) which starts with a message

  "[R] # of users of R, and biological examples of the use of R"

from Ramon Diaz-Uriarte (Tue Jun 20 10:21:37 CEST 2000). Searching that month (June 2000) of archives for the word "users" in the Subject will find them all (and nothing else).

The snowball was proposed by John Logsdon

  "[R] # of users of R" (Wed Jun 21 11:59:34 CEST 2000)

John and I discussed the snowball idea at some length off-list, and that is when I came to the conclusion (for reasons such as the above) that although it had some mileage, and could provide information supplementary to other methods, the extent of its potential reach into the unknown was, well, unknowable ... [with acknowledgement to Donald Rumsfeld].

In response to the question from Bob Muenchen as to "How did you get the R-help figure?" (of email addresses subscribed to R-help): since I am one of the list moderators, I can log in and access the subscriber list. As of today, the numbers are:

   4629  Non-digested Members of R-help
   5560  Digested Members of R-help
   (190  private members not shown)
   ----
  10379

(A few more than the number I picked up some days ago.)

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[hidden email]>
Fax-to-email: +44 (0)870 094 0861
Date: 21-Jun-10 Time: 02:00:45
------------------------------ XFMail ------------------------------
|
At 09:01 PM 6/20/2010, Ted Harding wrote:
>On 20-Jun-10 19:49:43, Hadley Wickham wrote:
>Whales are a different kettle of fish! They are much more directly
>observable, in principle, than are R-users. For one thing, a whale
>has to come to the surface to breathe every so often, and if you
>are in a ship nearby you can see it happen.
><snip>

One thing both whales and R users have in common is that, when you sight one, you say "R, Matey! Thar she blows!"

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [hidden email]
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
|
In reply to this post by Christos Argyropoulos
You would probably need to talk to the people maintaining the mirrors, as you would need access to the download statistics. I presume that the IP addresses of the people who download are also stored somewhere, so you could possibly georeference the download statistics.

Flame mode on

Then you could analyze everything with SAS and email the results to the SAS corporation to see if they appreciate the (not so subtle) irony.

Flame mode off

Seriously though, I'd expect the download statistics to uncover a substantially larger user base than the figures you initially posted suggest.

Christos

> Subject: RE: [R] Popularity of R, SAS, SPSS, Stata...
> Date: Sun, 20 Jun 2010 21:11:14 -0400
> From: [hidden email]
> To: [hidden email]
>
> >-----Original Message-----
> >From: [hidden email] [mailto:[hidden email]]
> >On Behalf Of Christos Argyropoulos
> >Sent: Sunday, June 20, 2010 6:26 PM
> >To: [hidden email]
> >Subject: Re: [R] Popularity of R, SAS, SPSS, Stata...
> >
> >How about getting statistics of downloads of the R-base from the
> >different CRAN mirrors ?
>
> I'd love to do that but there are lots of mirrors. Does anyone know of a
> way to automate such a search? Thanks, Bob
>
> >This should (in principle) allow one to estimate the total # of people
> >who intended to use R at some point in their life.
> >
> >It may even be possible to analyze those numbers for temporal trends
> >since the day of release of each R version is known.
> >
> >Christos
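To make the download-statistics suggestion above a little more concrete, here is a minimal sketch that counts distinct client addresses per R version in a web server log. It assumes the mirror keeps logs in the common Apache log layout and that base downloads have file names such as `R-2.11.1.tar.gz` or `R-2.11.1-win32.exe`; the sample lines, paths and addresses are made up, and a real mirror's logs may look different.

```python
import re

# Apache common-log-style line: client address first, request in quotes.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "GET (\S+) [^"]*" (\d{3})')
# Assumed naming of base distribution files (source tarball, Windows installer).
R_BASE = re.compile(r'/R-(\d+\.\d+\.\d+)(?:\.tar\.gz|-win32\.exe)$')


def unique_downloaders(lines):
    """Map R version -> set of client addresses with a successful download."""
    by_version = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group(3) != "200":
            continue  # skip malformed lines and failed requests
        address, path = m.group(1), m.group(2)
        version = R_BASE.search(path)
        if version:
            by_version.setdefault(version.group(1), set()).add(address)
    return by_version


# Made-up log lines for illustration only.
sample_log = [
    '192.0.2.1 - - [20/Jun/2010:10:00:00 +0000] "GET /src/base/R-2/R-2.11.1.tar.gz HTTP/1.1" 200 12345',
    '192.0.2.1 - - [20/Jun/2010:10:05:00 +0000] "GET /src/base/R-2/R-2.11.1.tar.gz HTTP/1.1" 200 12345',
    '198.51.100.7 - - [20/Jun/2010:11:00:00 +0000] "GET /bin/windows/base/R-2.11.1-win32.exe HTTP/1.1" 200 54321',
    '198.51.100.7 - - [21/Jun/2010:11:00:00 +0000] "GET /bin/windows/base/R-2.10.1-win32.exe HTTP/1.1" 404 0',
    '203.0.113.9 - - [21/Jun/2010:12:00:00 +0000] "GET /web/packages/ggplot2/index.html HTTP/1.1" 200 999',
]
result = unique_downloaders(sample_log)
print({version: len(addresses) for version, addresses in result.items()})
```

Distinct addresses are only a proxy for distinct people: shared proxies, dynamic addresses and repeat installs on several machines all blur the count, so figures of this kind would need the same caution as the other measures discussed in the thread.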
|
In reply to this post by Duncan Murdoch-2
Duncan Murdoch wrote:
> On 20/06/2010 6:36 PM, Muenchen, Robert A (Bob) wrote:
>>
>>> -----Original Message-----
>>> From: [hidden email] [mailto:[hidden email]]
>>> On Behalf Of Ivan Calandra
>>> Sent: Sunday, June 20, 2010 3:47 PM
>>> To: [hidden email]
>>> Subject: Re: [R] Popularity of R, SAS, SPSS, Stata...
>>>
>>> Bob,
>>>
>>> I have no idea whether it is realistic, but if you look for the papers
>>> that used R or SAS (or anything), you might get better results by
>>> searching for the way R and SAS are cited.
>>
>> Hi Ivan, that was what I tried when more generic keywords failed.
>> However, almost no one seems to use that citation. For example, in
>> 2009, only 28 papers contain "R Foundation" and 61 contain
>> Bioconductor, which uses R. One single paper contains both. I
>> appreciate the idea though!
>
> If you use Web of Science, then the abbreviation for the author in the
> standard citation for R is R DEV COR TEAM. Doing a search for
> citations to that author in 2009 or 2010 finds 249 papers. Variations
> on the spelling that I see include
>
> R DEV C3R TEAM
> R DEV CAR GROUP
> R DEV CAR TEAM
> R DEV CIR TEAM
> R DEV COD TEAM
> R DEV COR
> R DEV COR T
> R DEV COR TEA
> R DEV COR TEAM
> R DEV COR TEAM C
> R DEV COR TEAM CO
> R DEV COR TEAM FD
> R DEV COR TEAM OR
> R DEV COR TEAM R
> R DEV COR TEAM RD
> R DEV COR TEAM VI
> R DEV COR TEAMR
> R DEV COR TEMA
> R DEV COR TRAM
> R DEV CORE TEAM
> R DEV CORETEAM
> R DEV CORR TEAM
> R DEV CORT TEAM
> R DEV CPR TEAM
> R DEV CT
> R DEV TEAM
> R DEVCOR TEAM
> R DEVELOPMENTCORE
>
> Not all of those might really be R. For example, there's probably a
> north Atlantic codfishing team named R DEV COD TEAM. But most of them
> are, and they lead to 289 cited papers in 2009/10.
>
> Duncan Murdoch

That sounds a bit low. Last I checked R DEV COR TEAM, for ALL publication years, it came up with about 13000 references within 511 different misspellings of the R manual reference (& a few more).

Papers currently being registered tend to reference the version of R that was used when the research was done, and with review delays etc. that can be a few years back.

Another matter is that software citation varies widely by field. Of the above 13000 references, I think about 3000 were from ecology (or was it environmental science?). In economics, or indeed in mathematical statistics, the tradition is to cite methods, but not software. (And one "sinner" is the R Journal, in which it would be absurd to have every paper cite R...)

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: [hidden email]  Priv: [hidden email]
