Today's GNU R tutorial in http://how-to.linuxcareer.com/a-quick-gnu-r-tutorial-to-statistical-models-and-graphics points out how bad statistical practice is being further perpetuated, by virtue of "significance stars" still being the default in printed output from lm models.
Frank Harrell
Department of Biostatistics, Vanderbilt University |
Dear Frank,
I'd like to second your implicit motion to make options(show.signif.stars=FALSE) the default.

Thanks for raising this point.

John

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel |
FWIW, that has been my default setting for years in my .Rprofile.
If there is some agreement on this from R Core, it would seem that version 3.0.0 would be a reasonable breakpoint for this change in default behavior.

Regards,

Marc Schwartz |
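For readers who have not used this option, here is a minimal sketch of what the setting discussed above does. It assumes the built-in `cars` dataset purely for illustration; the helper `has_legend` is a hypothetical name introduced here, not something from the thread.

```r
# A line like this could be placed in ~/.Rprofile to change the per-user default.
old <- options(show.signif.stars = FALSE)

# Small illustrative model on a built-in dataset (an assumption for this sketch).
fit <- lm(dist ~ speed, data = cars)

# Capture the printed summary under the option, and with stars forced on.
without_stars <- capture.output(print(summary(fit)))
with_stars    <- capture.output(print(summary(fit), signif.stars = TRUE))

# The "Signif. codes" legend line is only printed when stars are shown.
has_legend <- function(lines) any(grepl("Signif. codes", lines, fixed = TRUE))
has_legend(without_stars)
has_legend(with_stars)

# Restore whatever the session had before.
options(old)
```

The `signif.stars` argument of the print method overrides the option for a single call, so anyone who wants the stars can still ask for them explicitly.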
In reply to this post by Frank Harrell
There are only a few things in R where we override the global defaults on a departmental level -- we really don't like to do so. But "show.signif.stars" is one of the 3. The other 2, if you are curious: set stringsAsFactors=FALSE and make NA included by default in the output of table. We've been overriding both of these for 10+ years.

Terry Therneau |
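The other two overrides mentioned above can be approximated per call, as in the sketch below. This is only an illustration: the thread does not say how the departmental override is implemented, and the data frame `d` is made up. In R 4.0.0 and later, `stringsAsFactors = FALSE` is already the default for `data.frame()`.

```r
# Passing stringsAsFactors explicitly avoids relying on the session default.
d <- data.frame(id = 1:4,
                grp = c("a", "b", NA, "a"),
                stringsAsFactors = FALSE)
class(d$grp)

# table() silently drops NA by default ...
counts_default <- table(d$grp)

# ... while useNA = "ifany" adds an NA category whenever NAs are present.
counts_with_na <- table(d$grp, useNA = "ifany")

counts_default
counts_with_na
```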
In reply to this post by Frank Harrell
Thanks for bringing this up, Frank.
Since many of us are "educators," I'd like to suggest a bolder approach. Discontinue even offering the stars as an option. Sadly, we can't stop reporting p-values, as the world expects them, but does R need to cater to that attitude by offering star display? For that matter, why not have R report confidence intervals as a default?

Many years ago, I wrote a short textbook on stat, and included a substantial section on the dangers of significance testing. All three internal reviewers liked it, but the funny part is that all three said, "I agree with this, but no one else will." :-)

Norm |
Changing the default for show.signif.stars should be sufficient to ensure that, if people are going to get themselves into trouble, they will have to do it on purpose. It's just a visual cue; removing it will not remove the underlying issue, namely blind acceptance of unlikely null models and distributions.

For any complex problem, there is a solution that is simple, elegant, and wrong. As grants and careers can depend on these magic numbers, Upton Sinclair might save everyone some trouble... It is difficult to get a man to understand something, when his salary depends upon his not understanding.

stringsAsFactors, however, is responsible for an endless stream of mildly irritating misunderstandings, and defaulting that to FALSE would be very nice.

Just my $0.02. Defaults are one of the most powerful forces in the universe.

Also, I liked your book.

--
*A model is a lie that helps you see the truth.*
Howard Skipper<http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf> |
To clarify, I favor changing the defaults for stringsAsFactors and show.signif.stars to FALSE in R-3.0.0, and view any attempt to remove either functionality as a seemingly simple but fundamentally misguided idea.

This is just my opinion, of course. The change could easily be accompanied by a startup notice or release notes indicating that the changes have been made, and can be reverted to past behavior if the user so desires. Perhaps more users will investigate the various settings, as a happy side effect.

My thanks to everyone who spends time supporting and working on R-core.

--
*A model is a lie that helps you see the truth.*
Howard Skipper<http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf> |
In reply to this post by Frank Harrell
I appreciate Tim's comments.
I myself have a "social science" paper coming out soon in which I felt forced to use p-values, given their ubiquity. However, I also told readers of the paper that confidence intervals are much more informative and I do provide them.

As I said earlier, there is no avoiding that, and R needs to report p-values for that reason. Instead, the question is what to do about the stars; I proposed eliminating them altogether. Star-crazed users know how to determine them themselves from the p-values, but deleting them from R would send a message.

I did say my proposal was "bold," which really meant I was suggesting that R do SOMETHING to send that message, not necessarily star elimination. One such "something" would be the proposal I made, which would be to add confidence intervals to the output. This too could be just an option, but again offering that option would send a message. Indeed, I would suggest that the help page explain that confidence intervals are more informative. (The help page could make a similar statement regarding the stars.)

When I pitch R to people, I say that in addition to the large function and library base and the nice graphics capabilities, R is above all Statistically Correct--it's written by statisticians who know what they are doing, rather than some programmer simply implementing a formula from a textbook. I know that a lot of people feel this is one of R's biggest strengths. Given that, one might argue that R should do what it can to help users engage in good statistical practice. I think this was Frank's point.

Norm |
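As a point of reference for the confidence-interval proposal, intervals for `lm` coefficients can already be obtained with `confint()`; they are just not part of the default printed summary. The sketch below assumes the built-in `cars` dataset and a 95% level, both chosen only for illustration.

```r
# Illustrative model on a built-in dataset (an assumption for this sketch).
fit <- lm(dist ~ speed, data = cars)

# Coefficient table as printed by summary(): estimates, SEs, t values, p-values.
est <- coef(summary(fit))

# Confidence intervals for the same coefficients.
ci <- confint(fit, level = 0.95)

# Side-by-side view of point estimates and their intervals.
cbind(Estimate = est[, "Estimate"], ci)
```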
Great discussion. Tim's Sinclair quote is priceless and relates to the non-reproducible research done in some quarters. Norm's wish to remove stars altogether is entirely consistent with good statistical practice and would make a statement that R base adheres to good practice. I don't think it will work to add confidence intervals because models can have nonlinear or interaction terms, and the reference cell for a factor variable may not be what the analyst chooses for a comparison group.
I would like for us to find a way to, over time, implement Norm's wish to de-emphasize P-values in general. The harm done by P-values is immeasurable. Frank
Frank Harrell
Department of Biostatistics, Vanderbilt University |
In reply to this post by Tim Triche, Jr.
On 13-02-09 3:49 PM, Tim Triche, Jr. wrote:
> To clarify, I favor changing the defaults for stringsAsFactors and
> show.signif.stars to FALSE in R-3.0.0, and view any attempt to remove
> either functionality as a seemingly simple but fundamentally misguided idea.

Both of these were discussed by R Core. I think it's unlikely the default for stringsAsFactors will be changed (some R Core members like the current behaviour), but it's fairly likely the show.signif.stars default will change. (That's if someone gets around to it: I personally don't care about that one. P-values are commonly used statistics, and the stars are just a simple graphical display of them. I find some p-values to be useful, and the display to be harmless.)

I think it's really unlikely the more extreme changes (i.e. dropping show.signif.stars completely, or dropping p-values) will happen.

Regarding stringsAsFactors: I'm not going to defend keeping it as is, I'll let the people who like it defend it. What I will likely do is make a few changes so that character vectors are automatically changed to factors in modelling functions, so that operating with stringsAsFactors=FALSE doesn't trigger silly warnings.

Duncan Murdoch |
Duncan Murdoch <murdoch.duncan <at> gmail.com> writes:
[snip]
> Regarding stringsAsFactors: I'm not going to defend keeping it as is,
> I'll let the people who like it defend it.

Would someone (anyone) like to come forward and give us a defense of stringsAsFactors=TRUE -- even someone who doesn't personally like it but would like to play devil's advocate?

> What I will likely do is
> make a few changes so that character vectors are automatically changed
> to factors in modelling functions, so that operating with
> stringsAsFactors=FALSE doesn't trigger silly warnings.
>
> Duncan Murdoch

[apologies for snipping context: "gmane made me do it"] |
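To make the quoted point about modelling functions concrete, the sketch below fits the same model with a character predictor and with an explicit factor. The data are invented for illustration. In current versions of R, as far as I am aware, `lm()` treats the character column as a factor when building the model matrix, so the two fits should agree.

```r
# Made-up data with a character grouping variable kept as character.
d <- data.frame(
  y = c(1.0, 1.2, 2.1, 2.3, 3.0, 3.4),
  g = c("a", "a", "b", "b", "c", "c"),
  stringsAsFactors = FALSE
)

# Character predictor used directly in the formula.
fit_chr <- lm(y ~ g, data = d)

# Same model with the conversion made explicit.
fit_fac <- lm(y ~ factor(g), data = d)

# Coefficient names differ ("gb" vs "factor(g)b"), so compare values only.
all.equal(unname(coef(fit_chr)), unname(coef(fit_fac)))
```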
On 12.02.2013 14:54, Ben Bolker wrote:
> Would someone (anyone) like to come forward and give us a defense
> of stringsAsFactors=TRUE -- even someone who doesn't personally like
> it but would like to play devil's advocate?

Sure: I will have to change all my scripts, my teaching examples, my book, and lots of code examples for research and particularly consulting jobs.

Personally, I think having stringsAsFactors=TRUE is not too bad for read.table() but less useful for data.frame().

And since you ask for the devil's advocate already, related to the subject line: Removing stars is horrible for consulting: With all those people from biology, medicine and other fields who even ask us questions in terms of significance stars that are obviously very common for them. Many of them will certainly ask us for the stars, and ask us to switch to another software product once they do not get them from R. They may not be interested in being taught about the advantages or disadvantages of p-values or stars.

There are different use cases of R, and I want to keep stars for consulting tasks where things have to be delivered within minutes. I am happy with or without for teaching, where I have the time and can easily talk about the sense and nonsense of p-values.

Best,
Uwe |
Uwe, I've been consulting for decades and have never once been asked for such stars. And when a clinical researcher puts a sentence in a study protocol that P<0.05 will be considered "significant" I get them to take it out.
Frank
Frank Harrell
Department of Biostatistics, Vanderbilt University |
In reply to this post by Uwe Ligges-3
On 12/02/2013 9:20 AM, Uwe Ligges wrote:
> Sure:
> I will have to change all my scripts, my teaching examples, my book, and
> lots of code examples for research and particularly consulting jobs.

Could you post an example of a non-trivial one? (By trivial, I mean one that says "data.frame() converts character vectors to factors". Obviously that would need to change. I mean one that just assumes current behaviour, and would be broken by the change.)

Duncan Murdoch |
In reply to this post by Frank Harrell
I think that we should use P < .03 (which approximates the probability of 5 consecutive heads) for assigning significance!
Ravi |
In reply to this post by Frank Harrell
On 12.02.2013 15:42, Frank Harrell wrote:
> Uwe I've been consulting for decades and have never once been asked for such
> stars.

Honestly: the last time I was asked was last week. And when I answered (in another case a few months ago) "OK, I can add you another 5 stars for p values smaller than 0.5", they did not find it too funny.

Best,
Uwe |
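For anyone unsure what the stars encode: they are a binning of p-values at fixed cutpoints, the ones listed in the "Signif. codes" legend of the printed output. A minimal sketch using `symnum()` with those cutpoints follows; the p-values are invented for illustration.

```r
# Made-up p-values, one per conventional bin.
p <- c(0.0004, 0.004, 0.03, 0.07, 0.5)

# Cutpoints and symbols as shown in the "Signif. codes" legend.
stars <- symnum(p, corr = FALSE, na = FALSE,
                cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                symbols   = c("***", "**", "*", ".", " "))

# Pair each p-value with the symbol it would receive.
data.frame(p = p, stars = as.character(stars))
```

Changing `cutpoints` and `symbols` is all it takes to produce a different star scheme, which is what makes the thresholds look arbitrary.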
They are "reaching for the stars". Pardon my jest, but I couldn't resist.
Ravi |
In reply to this post by Uwe Ligges-3
On 13-02-12 09:20 AM, Uwe Ligges wrote:
> On 12.02.2013 14:54, Ben Bolker wrote:
>> Duncan Murdoch <murdoch.duncan <at> gmail.com> writes:
>>
>> [snip]
>>>
>>> Regarding stringsAsFactors: I'm not going to defend keeping it as is,
>>> I'll let the people who like it defend it.
>>
>> Would someone (anyone) like to come forward and give us a defense
>> of stringsAsFactors=TRUE -- even someone who doesn't personally like
>> it but would like to play devil's advocate?
>
> Sure:
> I will have to change all my scripts, my teaching examples, my book, and
> lots of code examples for research and particularly consulting jobs.
>
> Personally, I think having stringsAsFactors=TRUE is not too bad for
> read.table() but less useful for data.frame().
>
> And since you ask for the devil's advocate already, related to the
> subject line: Removing stars is horrible for consulting: With all those
> people from biology, medicine and other fields who even ask us questions
> in term of significance stars that are obviously very common for them.
> Many of them will certainly ask us for the stars, and ask us to switch
> to another software product once they do not get it from R. They may not
> be interested in being taught about the advantages or disadvantages of
> p-values or stars.
>
> There are different use cases of R, and I want to keep stars for
> consulting tasks where things have to be delivered within minutes. I am
> happy with or without for teaching, where I have the time and can easily
> talk about the sense and nonsense of p-values.
>
> Best,
> Uwe

Thanks, Uwe.
Now let me go one step farther.

Can you (or anyone) give a good argument **other than backward compatibility** for keeping the stringsAsFactors=TRUE argument on data.frame()?

I appreciate your distinction between data.frame() and read.table()'s use of stringsAsFactors, and I can see that there is some point for quick-and-dirty interactive use in setting all non-numeric variables to factors (arguing that wanting non-numerics as factors is somewhat more common than wanting them as strings).

It might be nice to add an optional stringsAsFactors (and check.names) argument to transform(): I've had to write my own Transform() function to allow the defaults to be overridden, since transform() calls data.frame() with the defaults. (Setting the stringsAsFactors option globally would work, although not for check.names.)

Ben Bolker

>>> What I will likely do is
>>> make a few changes so that character vectors are automatically changed
>>> to factors in modelling functions, so that operating with
>>> stringsAsFactors=FALSE doesn't trigger silly warnings.
>>>
>>> Duncan Murdoch
>>
>> [apologies for snipping context: "gmane made me do it"]
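The transform() limitation Ben describes can be illustrated with a small sketch. His own Transform() function is not shown in the thread, so the helper below (Transform2, a hypothetical name) is only an assumption about what such a wrapper might look like: it mimics the general approach of transform() but forwards stringsAsFactors and check.names to data.frame(). Both arguments are set explicitly so the result does not depend on the session's defaults.

```r
# Hypothetical helper (NOT Ben's actual Transform()): evaluate the
# tag = value expressions inside `data`, overwrite existing columns,
# and append new ones via data.frame() with caller-chosen settings.
Transform2 <- function(data, ..., stringsAsFactors = FALSE,
                       check.names = FALSE) {
  # Evaluate the expressions with the columns of `data` in scope.
  e <- eval(substitute(list(...)), data, parent.frame())
  tags <- names(e)
  inx <- match(tags, names(data))
  matched <- !is.na(inx)
  # Columns that already exist are replaced in place.
  if (any(matched)) data[inx[matched]] <- e[matched]
  # New columns go through data.frame(), where the two arguments apply.
  if (!all(matched)) {
    data <- do.call("data.frame",
                    c(list(data), e[!matched],
                      list(stringsAsFactors = stringsAsFactors,
                           check.names = check.names)))
  }
  data
}

# A data frame whose column name is not syntactically valid.
df <- data.frame(`body mass` = c(1.2, 3.4), check.names = FALSE,
                 stringsAsFactors = FALSE)

# New character column stays character; the odd column name is kept.
out <- Transform2(df, label = c("low", "high"), heavy = `body mass` > 2)

# Same call, but asking for factor conversion of the new column.
as_fac <- Transform2(df, label = c("low", "high"), stringsAsFactors = TRUE)

str(out)
str(as_fac)
```

The design choice here is simply to make both settings explicit arguments of the wrapper rather than relying on a global option, which, as Ben notes, would not cover check.names anyway.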
On 12.02.2013 16:40, Ben Bolker wrote:
> On 13-02-12 09:20 AM, Uwe Ligges wrote:
> [snip]
>
> Thanks, Uwe.
> Now let me go one step farther.
>
> Can you (or anyone) give a good argument **other than backward
> compatibility** for keeping the stringsAsFactors=TRUE argument on
> data.frame()?

No, I cannot,
Uwe
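Uwe's earlier point in this thread, that stars suit some use cases (quick consulting output) and not others (teaching), does not have to be settled by a single global default. A minimal sketch, assuming the built-in cars data set purely for illustration: the stars can be switched per call through the signif.stars argument of the print method for summary.lm objects, or per session through the show.signif.stars option.

```r
# Illustrative model on a built-in data set (the choice of data is an
# assumption made only to have something to print).
fit <- lm(dist ~ speed, data = cars)
s <- summary(fit)

# Per-call control: ask the print method for stars, or for none.
with_stars    <- capture.output(print(s, signif.stars = TRUE))
without_stars <- capture.output(print(s, signif.stars = FALSE))

# Per-session control: change the option, print, then restore it.
old <- options(show.signif.stars = FALSE)
print(s)
options(old)
```

The per-call form leaves everyone else's defaults alone, which may be the least contentious way to serve both audiences.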
On 12/02/2013 10:40 AM, Ben Bolker wrote:
> On 13-02-12 09:20 AM, Uwe Ligges wrote:
> [snip]
>
> Thanks, Uwe.
> Now let me go one step farther.
>
> Can you (or anyone) give a good argument **other than backward
> compatibility** for keeping the stringsAsFactors=TRUE argument on
> data.frame()?

I can, under two assumptions:

1. We keep stringsAsFactors=TRUE on read.table().
2. We keep the stringsAsFactors argument in data.frame().

Under those assumptions, it would just be confusing to have opposite defaults.

(Just in case someone hasn't read all of this thread: I'd be happier to have the default be FALSE in both cases, but not until 3.1.x. For 3.0.x I think I'd just change the default value of default.stringsAsFactors() to FALSE, so people could easily get the old behaviour.)

Duncan Murdoch
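Duncan's "opposite defaults" concern can be made concrete with a short sketch. The inline CSV text and column names are invented for illustration, and stringsAsFactors is passed explicitly everywhere so the outcome does not depend on whichever defaults the running R version happens to use.

```r
# Invented sample data, supplied as text instead of a file.
csv_text <- "id,group\n1,a\n2,b\n3,a"

# Same setting on both functions: the results have matching column classes.
read_chr  <- read.table(text = csv_text, header = TRUE, sep = ",",
                        stringsAsFactors = FALSE)
built_chr <- data.frame(id = 1:3, group = c("a", "b", "a"),
                        stringsAsFactors = FALSE)
same_classes <- identical(sapply(read_chr, class), sapply(built_chr, class))

# Opposite settings: the "same" data now differs in type depending on
# how it entered the session, which is the confusion Duncan describes.
read_fac <- read.table(text = csv_text, header = TRUE, sep = ",",
                       stringsAsFactors = TRUE)
mixed_classes <- c(read.table = class(read_fac$group),
                   data.frame = class(built_chr$group))

print(same_classes)
print(mixed_classes)
```

Passing the argument explicitly in scripts is also one way to stay insulated from a future change of default in either function.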