A comment about R:


A comment about R:

Peter Muhlberger
I'm someone who from time to time comes to R to do applied stats for social
science research.  I think the R language is excellent--much better than
Stata for writing complex statistical programs.  I am thrilled that I can do
complex stats readily in R--sem, maximum likelihood, bootstrapping, some
Bayesian analysis.  I wish I could make R my main statistical package, but
find that a few stats that are important to my work are difficult to find or
produce in R.  Before I list some examples, I recognize that people view R
not as a statistical package but rather as a statistical programming
environment.  That said, however, it seems, from my admittedly limited
perspective, that it would be fairly easy to make a few adjustments to R
that would make it a lot more practical and friendly for a broader range of
people--including people like me who from time to time want to do
statistical programming but more often need to run canned procedures.  I'm
not a statistician, so I don't want to have to learn everything there is to
know about common procedures I use, including how to write them from
scratch.  I want to be able to focus my efforts on more novel problems w/o
reinventing the wheel.  I would also prefer not to have to work through a
couple books on R or S+ to learn how to meet common needs in R.  If R were
extended a bit in the direction of helping people like me, I wonder whether
it would not acquire a much broader audience.  Then again, these may just be
the rantings of someone not sufficiently familiar w/ R or the community of
stat package users--so take my comments w/ a grain of salt.

Some examples of statistics I typically use that are difficult to find and/or
produce in a usefully formatted way in R--

Ex. 1)  Wald tests of linear hypotheses after max. likelihood or even after
a regression.  A search for "Wald" turns up nothing in my standard R
installation.  There's no comment in the lm help or optim help about what function
to use for hypothesis tests.  I know that statisticians prefer likelihood
ratio tests, but Wald tests are still useful and indeed crucial for
first-pass analysis.  After searching with Google for some time, I found
several Wald functions in various contributed R packages I did not have
installed.  One point of confusion was which of them was relevant to my needs.  This
took some time to resolve.  I concluded, perhaps on insufficient evidence,
that package car's Wald test would be most helpful.  To use it, however, one
has to put together a matrix for the hypotheses, which can be arduous for a
many-term regression or a complex hypothesis.  In comparison, in Stata one
simply states the hypothesis in symbolic terms.  I also don't know for
certain that this function in car will work or work properly w/ various
kinds of output, say from lm or from optim.  To be sure, I'd need to run
time-consuming tests comparing it with Stata output or examine the
function's code.  In Stata the test is easy to find, and there's no
uncertainty about where it can be run or its accuracy.  Simply having a
comment or "see also" in lm help or mle or optim help pointing the user to
the right Wald function would be of enormous help.
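For instance, to test even a single equality I apparently need something like
the following (a sketch; I'm taking the argument names from car's help, and
the model and data frame are hypothetical):

    library(car)
    fit <- lm(y ~ x1 + x2 + x3, data = dat)  # 'dat' stands in for real data
    # H0: x1 = x2, written as one row of a hypothesis matrix;
    # columns follow coef(fit): (Intercept), x1, x2, x3
    H <- matrix(c(0, 1, -1, 0), nrow = 1)
    linear.hypothesis(fit, H, rhs = 0)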

Ex. 2) Getting neat output of a regression with Huberized variance matrix.
I frequently have to run regressions w/ robust variances.  In Stata, one
simply adds the word "robust" to the end of the command or
"cluster(cluster.variable)" for a cluster-robust error.  In R, there are two
functions, robcov (in package Design) and hccm (in package car).  I had to
run tests to figure out what the
relationship is between them and between them and Stata (robcov w/o cluster
gives hccm's hc0; hccm's hc1 is equivalent to Stata's 'robust' w/o cluster;
etc.).  A single sentence in hccm's help saying something to the effect that
statisticians prefer hc3 for most types of data might save me from having to
scramble through the statistical literature to try to figure out which of
these I should be using.  A few sentences on what the differences are
between these methods would be even better.  Then, there's the problem of
output.  Given that hc1 or hc3 are preferred for non-clustered data, I'd
need to be able to get regression output of the form summary(lm) out of
hccm, for any practical use.  Getting this, however, would require
programming my own function.  Huberized t-statistics for regressions are a
commonplace need; an R oriented a little more toward everyday use would not
require programming them.  Also, I'm not sure yet how well any
of the existing functions handle missing data.
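For the record, the equivalences I worked out look roughly like this (a
sketch; assumes the Design and car packages, and that I have the
correspondences right):

    library(Design); library(car)
    f <- ols(y ~ x1 + x2, data = dat, x = TRUE, y = TRUE)  # 'dat' is a placeholder
    robcov(f)                                    # Huber-White; matches hccm's hc0
    hccm(lm(y ~ x1 + x2, data = dat), type = "hc1")   # matches Stata's 'robust'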

Ex. 3)  I need to do bootstrapping w/ clustered data, again a common
statistical need.  I wasted a good deal of time reading the help contents of
boot and bootstrap, only to conclude that I'd need to write my own, probably
inefficient, function to bootstrap clustered data if I were to use boot.
It's odd that boot can't handle this more directly.  After more digging, I
learned that bootcov in package Design would handle the cluster bootstrap
and save the parameters.  I wouldn't have found this if I had not needed
bootcov for another purpose.  Again, maybe a few words in the boot help
saying that 'for clustered data, you could use bootcov or program a function
in boot' would be very helpful.  I still don't know whether I can feed the
results of bootcov back into functions in the boot package for further
analysis.
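What I ended up with looks roughly like this (a sketch; assumes Design, with
'id' standing in for my cluster variable):

    library(Design)
    f <- ols(y ~ x1 + x2, data = dat, x = TRUE, y = TRUE)
    fb <- bootcov(f, cluster = dat$id, B = 1000)  # resamples whole clusters
    fb$var                                        # bootstrap covariance matrix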

My 2 bits for what they're worth,

Peter


Re: A comment about R:

Fox, John
Dear Peter,

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Peter
> Muhlberger
> Sent: Wednesday, January 04, 2006 2:43 PM
> To: rhelp
> Subject: [R] A comment about R:
>

. . .
 

> Ex. 1)  Wald tests of linear hypotheses after max. likelihood
> or even after a regression.  A search for "Wald" turns up
> nothing in my standard R installation.  There's no comment in the
> lm help or optim help about what function to use for
> hypothesis tests.  I know that statisticians prefer
> likelihood ratio tests, but Wald tests are still useful and
> indeed crucial for first-pass analysis.  After searching with
> Google for some time, I found several Wald functions in
> various contributed R packages I did not have installed.  One point
> of confusion was which of them was relevant to my needs.  This
> took some time to resolve.  I concluded, perhaps on
> insufficient evidence, that package car's Wald test would be
> most helpful.  To use it, however, one has to put together a
> matrix for the hypotheses, which can be arduous for a
> many-term regression or a complex hypothesis.  
> In comparison,
> in Stata one simply states the hypothesis in symbolic terms.  
> I also don't know for certain that this function in car will
> work or work properly w/ various kinds of output, say from lm
> or from optim.  To be sure, I'd need to run time-consuming
> tests comparing it with Stata output or examine the
> function's code.  In Stata the test is easy to find, and
> there's no uncertainty about where it can be run or its
> accuracy.  Simply having a comment or "see also" in lm help
> or mle or optim help pointing the user to the right Wald
> function would be of enormous help.
>


The reference, I believe, is to the linear.hypothesis() function, which has
methods for lm and glm objects. [To see what kinds of objects
linear.hypothesis is suitable for, use the command
methods(linear.hypothesis).] For lm objects, you get an F-test by default.
Note that the Anova() function, also in car, can more conveniently compute
Wald tests for certain kinds of hypotheses. More generally, however, I'd be
interested in your suggestions for an alternative method of specifying
linear hypotheses. There is currently no method for mle objects, but adding
one is a good idea, and I'll do that when I have a chance. (In the meantime,
it's very easy to compute Wald tests from the coefficients and the
hypothesis and coefficient-covariance matrices. Writing a small function to
do so, without the bells and whistles of something like linear.hypothesis(),
should not be hard.)  Indeed, the ability to do this kind of thing easily is
what I see as the primary advantage of working in a statistical computing
environment like R -- or Stata.
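A minimal version might look like this (a sketch, not something in car; b is
the coefficient vector, V its covariance matrix, L the hypothesis matrix, and
rhs the right-hand side):

    wald <- function(b, V, L, rhs = 0) {
        d <- L %*% b - rhs              # departure from the hypothesis
        stat <- as.numeric(t(d) %*% solve(L %*% V %*% t(L)) %*% d)
        p <- pchisq(stat, df = nrow(L), lower.tail = FALSE)
        c(chisq = stat, df = nrow(L), p = p)
    }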

> Ex. 2) Getting neat output of a regression with Huberized
> variance matrix.
> I frequently have to run regressions w/ robust variances.  In
> Stata, one simply adds the word "robust" to the end of the
> command or "cluster(cluster.variable)" for a cluster-robust
> error.  In R, there are two functions, robcov and hccm.  I
> had to run tests to figure out what the relationship is
> between them and between them and Stata (robcov w/o cluster
> gives hccm's hc0; hccm's hc1 is equivalent to Stata's
> 'robust' w/o cluster; etc.).  A single sentence in hccm's
> help saying something to the effect that statisticians prefer
> hc3 for most types of data might save me from having to
> scramble through the statistical literature to try to figure
> out which of these I should be using.  A few sentences on
> what the differences are between these methods would be even
> better.  Then, there's the problem of output.  Given that hc1
> or hc3 are preferred for non-clustered data, I'd need to be
> able to get regression output of the form summary(lm) out of
> hccm, for any practical use.  Getting this, however, would
> require programming my own function.  Huberized t-statistics
> for regressions are a commonplace need; an R oriented a little
> more toward everyday use would not require programming them.
> Also, I'm not sure yet how well any of the
> existing functions handle missing data.
>

I think that we have a philosophical difference here: I don't like giving
advice in documentation. An egregious extended example of this, in my
opinion, is the SPSS documentation. The hccm() function uses hc3 as the
default, which is an implicit recommendation, but more usefully, in my view,
points to Long and Ervin's American Statistician paper on the subject, which
does give advice and which is quite accessible. As well, and more generally,
the car package is associated with a book (my R and S-PLUS Companion to
Applied Regression), which gives advice, though, admittedly, tersely in this
case.

The Anova() function with argument white=TRUE will give you F-tests
corresponding to the t-tests to which you refer (though it will combine df
for multiple-df terms in the model). To get the kind of summary you
describe, you could use something like

mysummary <- function(model){
        # coefficient table using a heteroscedasticity-consistent
        # covariance matrix; assumes hccm() from the car package
        coef <- coef(model)
        se <- sqrt(diag(hccm(model)))
        t <- coef/se
        p <- 2*pt(abs(t), df=model$df.residual, lower.tail=FALSE)
        table <- cbind(coef, se, t, p)
        rownames(table) <- names(coef)
        colnames(table) <- c("Estimate", "Std. Error", "t value", "Pr(>|t|)")
        table
}

Again, it's not time-consuming to write simple functions like this for one's
own use, and the ability to do so is a strength of R, in my view.

I'm not sure what you mean about handling missing data: functions like
hccm(), linear.hypothesis(), and Anova() start with a model object for which
missing data have already been handled.

Regards,
 John


Re: A comment about R:

Achim Zeileis
As John and I seem to have written our replies in parallel, here are some
more clarifying remarks:

> Note that the Anova() function, also in car, can more conveniently compute
> Wald tests for certain kinds of hypotheses. More generally, however, I'd be
> interested in your suggestions for an alternative method of specifying
> linear hypotheses.

My understanding was that Peter just wants to eliminate various elements
from terms(obj), which is what waldtest() in lmtest supports. If some
other way of specifying nested models is required, I'd also be interested
in that.

> The Anova() function with argument white=TRUE will give you F-tests
> corresponding to the t-tests to which you refer (though it will combine df
> for multiple-df terms in the model). To get the kind of summary you
> describe, you could use something like
>
> mysummary <- function(model){
>         coef <- coef(model)
>         se <- sqrt(diag(hccm(model)))
>         t <- coef/se
>         p <- 2*pt(abs(t), df=model$df.residual, lower.tail=FALSE)
>         table <- cbind(coef, se, t, p)
>         rownames(table) <- names(coef)
>         colnames(table) <- c("Estimate", "Std. Error", "t value", "Pr(>|t|)")
>         table
> }

This is supported out of the box in coeftest() in lmtest.
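For example (a sketch; assumes the lmtest and sandwich packages and a
hypothetical lm fit 'fit'):

    library(lmtest)
    library(sandwich)
    coeftest(fit, vcov = vcovHC(fit, type = "HC3"))  # HC3 robust t tests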
Z


Re: A comment about R:

Frank Harrell
In reply to this post by Fox, John
John Fox wrote:

> Dear Peter,
>
>
>>-----Original Message-----
>>From: [hidden email]
>>[mailto:[hidden email]] On Behalf Of Peter
>>Muhlberger
>>Sent: Wednesday, January 04, 2006 2:43 PM
>>To: rhelp
>>Subject: [R] A comment about R:
>>
>
>
> . . .
>  
>
>>Ex. 1)  Wald tests of linear hypotheses after max. likelihood
>>or even after a regression. . . .  Simply having a comment or
>>"see also" in lm help or mle or optim help pointing the user
>>to the right Wald function would be of enormous help.

The Design package's anova.Design and contrast.Design make many Wald
tests very easy.  contrast( ) will allow you to test all kinds of
hypotheses by stating which differences in predicted values you are
interested in.
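For instance (a sketch; 'dat', 'treat', and 'age' are hypothetical):

    library(Design)
    dd <- datadist(dat); options(datadist = "dd")
    f <- ols(y ~ treat + age, data = dat)
    contrast(f, list(treat = "B"), list(treat = "A"))  # Wald test of B vs. A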

Frank Harrell



--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University


Re: A comment about R:

Peter Muhlberger
In reply to this post by Achim Zeileis
On 1/5/06 11:27 AM, "Achim Zeileis" <[hidden email]> wrote:

> As John and I seem to have written our replies in parallel, here are some
> more clarifying remarks:
>> Note that the Anova() function, also in car, can more conveniently compute
>> Wald tests for certain kinds of hypotheses. More generally, however, I'd be
>> interested in your suggestions for an alternative method of specifying
>> linear hypotheses.
> My understanding was that Peter just wants to eliminate various elements
> from terms(obj), which is what waldtest() in lmtest supports. If some
> other way of specifying nested models is required, I'd also be interested
> in that.


My two most immediate problems were a) to test whether a set of coefficients
were jointly zero (as Achim suggests, though the complication here is that
the varcov matrix is bootstrapped), but also b) to test whether the average
of a set of coefficients was equal to zero.  At other points in time, I
remember having had to test more complex linear hypotheses involving joint
combinations of equality, non-zero, and 'averages.'  The Stata interface for
linear hypothesis tests is amazingly straightforward.  For example, after a
regression, I could use the following to test the joint hypothesis that
v1=v2 and the average (or sum) of v3 through v5 is zero and .75v6+.25v7 is
zero:

test v1=v2
test v3+v4+v5=0, accum
test .75*v6+.25*v7=0, accum

I don't even have to set up a matrix for my test ];-) !  The output would
show not merely the joint test of all the hypotheses but the tests along the
way, one for each line of commands.  I vaguely remember that the hypothesis
testing command after an ml run is much the same, and that cross-equation
hypothesis tests simply involve adding an equation indicator to the terms.
I can get huberized var-cov matrices simply by adding "robust" to the
regression command.  I believe there's also a command that will huberize a
var-cov matrix after the fact.  Subsequent hypothesis tests would be on the
huberized matrix.
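As I understand it, the matrix version of that same joint test in R would
look something like this (a sketch; assumes car, with the fitted coefficients
ordered (Intercept), v1, ..., v7):

    Rmat <- rbind(c(0, 1, -1, 0, 0, 0, 0,    0),     # v1 = v2
                  c(0, 0,  0, 1, 1, 1, 0,    0),     # v3 + v4 + v5 = 0
                  c(0, 0,  0, 0, 0, 0, 0.75, 0.25))  # .75*v6 + .25*v7 = 0
    linear.hypothesis(fit, Rmat, rhs = c(0, 0, 0))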

I won't claim to know what's good for R or the R community, but it would be
nice for me and perhaps others if there were a comparable straightforward
command as in Stata that could meet a variety of needs.  I need to play w/
the commands that have been suggested to me by you guys recently, but I'm
looking at a multitude of commands, none of which, I suspect, has the
flexibility and ease of use of the above Stata commands, at least for the
kind of applications I'd like.  Perhaps the point of R isn't to serve as a
package for a wider set of non-statisticians, but if it wishes to develop in
that direction, facilities like this may be helpful.  It's interesting that
Achim points out that a function John suggests is already available in R--an
indication that even R experts don't have a complete handle on everything in
R even on a relatively straightforward topic like hypothesis tests.

John is no doubt right that editorializing about statistics would be out of
place on an R help page.  But when I have gone to statistical papers, many
have been difficult to access & not very helpful for practical concerns.
I'm glad to hear that Long and Ervin's paper is helpful, but there's a
goodly list of papers mentioned in help.  Perhaps something that would be
useful is some way of highlighting on a help page which reference is most
helpful for practical concerns?

Again, thanks for all the great input from everyone!

Peter


Re: A comment about R:

Achim Zeileis
Peter:

> My two most immediate problems were a) to test whether a set of coefficients
> were jointly zero (as Achim suggests, though the complication here is that
> the varcov matrix is bootstrapped), but also b) to test whether the average

This can be tested with both waldtest() and linear.hypothesis() when
you've got the bootstrapped vcov estimator of your choice available. It
can be conveniently plugged into both functions (either as a vcov matrix
or as a function extracting the vcov matrix from the fitted model object).
There is some discussion about this in the vignette accompanying the
sandwich package.
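For example (a sketch; 'vc.boot' stands for a bootstrapped covariance matrix
you have already computed for the fit):

    library(lmtest)
    waldtest(fit, "v1", vcov = vc.boot)  # Wald test of dropping v1, with vc.boot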

> of a set of coefficients was equal to zero.  At other points in time, I
> remember having had to test more complex linear hypotheses involving joint
> combinations of equality, non-zero, and 'averages.'  The Stata interface for
> linear hypothesis tests is amazingly straightforward.  For example, after a
> regression, I could use the following to test the joint hypothesis that
> v1=v2 and the average (or sum) of v3 through v5 is zero and .75v6+.25v7 is
> zero:
>
> test v1=v2
> test v3+v4+v5=0, accum
> test .75*v6+.25*v7=0, accum

Mmmh, should be possible to derive the restriction matrix from this
together with the terms structure...I'll think about this.

> I don't even have to set up a matrix for my test ];-) !  The output would
> show not merely the joint test of all the hypotheses but the tests along the
> way, one for each line of commands.  I vaguely remember the hypothesis
> testing command after an ml run is much the same and cross-equation
> hypothesis tests simply involve adding an equation indicator to the terms.
> I can get huberized var-cov matrices simply by adding "robust" to the
> regression command.

Whether you find this simple or not depends on what you might want to
have. Personally, I always find it very limiting if I've only got a switch
to choose one or another vcov matrix when there is a multitude of vcov
matrices in use in the literature. What if you wanted to do HC3
instead of the HC0 that is offered by Eviews...or HC4...or HAC...or
something bootstrapped...or...
In my view, this is the strength of many implementations in R: you can make
programs very modular so that the user can easily extend the software or
re-use it for other purposes. The price you pay for that is that it is not
as easy to use as point-and-click software that offers some standard tools.
Of course, both sides have advantages and disadvantages.

> I won't claim to know what's good for R or the R community, but it would be
> nice for me and perhaps others if there were a comparable straightforward
> command as in Stata that could meet a variety of needs.  I need to play w/
> the commands that have been suggested to me by you guys recently, but I'm
> looking at a multitude of commands none of which I suspect have the
> flexibility and ease of use of the above Stata commands, at least for the
> kind of applications I'd like.  Perhaps the point of R isn't to serve as a
> package for a wider set of non-statisticians, but if it wishes to develop in
> that direction, facilities like this may be helpful.

The point of R is hard to determine; R itself does not wish this or that.
It is an open-source project driven by many contributors. If
there are people out there who want to use R for the social sciences, they
are free to contribute to the project. And in this particular case, I
think that there has been some activity in the last one or two years
aimed at providing tools for econometrics and quantitative methods in the
social and political sciences.
However, you won't be very happy with R when you want R to be Stata. If
you want Stata, use it.

> It's interesting that
> Achim points out that a function John suggests is already available in R--an
> indication that even R experts don't have a complete handle on everything in
> R even on a relatively straightforward topic like hypothesis tests.

In fairness to John, this functionality became available rather recently.
And it's not surprising that John knows his car package better and that
I'm more familiar with my lmtest package. Therefore, it's very natural to
think first of how you would do a certain task using your own package...in
particular given that you specifically asked about car.

> John is no doubt right that editorializing about statistics would be out of
> place on an R help page.  But when I have gone to statistical papers, many
> have been difficult to access & not very helpful for practical concerns.
> I'm glad to hear that Long and Ervin's paper is helpful, but there's a
> goodly list of papers mentioned in help.

I would consider this an advantage, not a drawback. It's the user's
responsibility to know what he/she is doing.

Best wishes,
Z


Re: A comment about R:

Leif Kirschenbaum-4
In reply to this post by Peter Muhlberger
A few thoughts about R vs SAS:
I started learning SAS 8 years ago at IBM; I believe it was version 6.10.
I started with R 7 months ago.

Learning curve:
  I think I can do everything in R after 7 months that I could do in SAS after about 4 years.

Bugs:
  I suffered through several SAS version changes, 7.0, 7.1, 7.2, 8.0, 9.0 (I may have misquoted some version numbers). Every version change gave me headaches, as every version release (of an expensive commercially produced software set) had bugs which upset or crashed previously working code. I had code which ran fine under Windows 2000 and terribly under Windows XP. Most bugs I found were noted by SAS, but never fixed.
  With R I have encountered very few bugs, except for an occasional crash of R, which I usually ascribe to some bug in Windows XP.

Help:
  SAS help was OK. As others have mentioned, there is too much. I even had the set of printed manuals on my desk (stretching 4 feet or so), which were quite impenetrable. I had almost no support from colleagues: even within IBM the number of advanced SAS users was small.
  With R this mailing list has been of great help: for almost every issue I copy some program and save it as an "R hint xxxx" file.
--> A REQUEST
I would appreciate a few more program examples on the help pages for some functions. For instance, "?Control" tells me about "if(cond) cons.expr  else  alt.expr"; however, an example like
   if (i == 1) {
     print("one")
   } else if (i == 2) {
     print("two")
   } else if (i > 2) {
     print("bigger than two")
   }
at the end of that help section would have been very helpful for me a few months ago.

Functions:
  Writing my own functions in SAS meant using macros, usually depending heavily on macro substitution. Learning SAS's macro language, especially macro substitution, was very difficult, and it took me years to be able to write complicated functions. The situation is quite different in R. Some functions I have written by dint of copying code from other people's packages, which has been very helpful.
  I wanted to generate arbitrary k-values (the k-multiplier of sigma for a given alpha, beta, and N to establish confidence limits around a mean for small populations). I had a table from a years-old microfiche book giving values but wanted to generate my own. I had to find the correct integrals to approximate the k-values and then write two SAS macros which iterated to the desired tolerance to generate values. I would guess that there is either an R base function or a package which will do this for me (when I need to start generating AQL tables). Given the utility of these numbers, I was disappointed with SAS.
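  For the one-sided normal case there is a short base-R route (a sketch, assuming the usual noncentral-t formula k = qt(1 - alpha, df = N - 1, ncp = sqrt(N) * qnorm(p)) / sqrt(N)):
    kfactor <- function(N, p = 0.90, alpha = 0.05) {
      # k such that mean + k*s bounds proportion p with confidence 1 - alpha
      qt(1 - alpha, df = N - 1, ncp = sqrt(N) * qnorm(p)) / sqrt(N)
    }
    kfactor(10)   # e.g. N = 10, 90% coverage, 95% confidence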

Data manipulation:
  All SAS data is in 2-dimensional datasets, which was very frustrating after having used variables, arrays, and matrices in BASIC, APL, FORTRAN, C, Pascal, and LabVIEW. SAS allows you to access only 1 row of a dataset at a time, which was terribly, horribly, incomprehensibly frustrating. There were so many problems where I had to work around this SAS paradigm.
  In R, I can access all the elements of a matrix/dataframe at once, and I can use arrays with more than 2 dimensions. In fact, the limitations of SAS I had ingrained over 7.5 years have sometimes made me forget how easily I can do something in R, like detect when a value in a column of a dataframe changes:
  # TRUE wherever column icol differs from the row above (FALSE for row 1)
  DF$marker <- c(FALSE, DF[2:nrow(DF), icol] != DF[1:(nrow(DF)-1), icol])
This was hard to do in SAS...and even after years it was sometimes buggy, keeping variable values from previous iterations of a SAS program.
  One very nice advantage with SAS is that after data is saved in libraries, there is a GUI showing all the libraries and the datasets inside the libraries with sizes and dates. While we can save Rdata objects in an external file, the base package doesn't seem to have the same capabilities as SAS.
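  A rough base-R approximation of that bookkeeping (a sketch; the object and file names here are hypothetical):
    save(DF, file = "analysis.RData")                  # write objects to an external file
    file.info(list.files(pattern = "\\.RData$"))[, c("size", "mtime")]   # sizes and dates
    load("analysis.RData")                             # restore the objects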

Graphics:
  SAS graphics were quite mediocre, and generating customized labels was cumbersome. Porting code from one Windows platform to another produced unpredictable and sometimes unworkable results.
  It has been easier in R: I anticipate that I will be able to port R Windows code to *NIX and generate the same graphics.

Batch commands:
  I am working on porting some of my R code to our *NIX server to generate reports and graphs on a scheduled basis. Although a few at IBM did this with SAS, I would have found doing this fairly daunting.


-Leif

-----------------------------
 Leif Kirschenbaum, Ph.D.
 Senior Yield Engineer
 Reflectivity
 [hidden email]


Re: A comment about R:

Frank Harrell
Leif Kirschenbaum wrote:

> A few thoughts about R vs SAS:
> . . .
> -Leif

Leif,

Those are excellent points.  I'm especially glad you mentioned data
manipulation.  I find that R is far ahead of SAS in this respect
although most people are shocked to hear me say that.  We are doing all
our data manipulation (merging, recoding, etc.) in R for pharmaceutical
research.  The ability to deal with lists of data frames also helps us a
great deal when someone sends us a clinical trial database made of 50
SAS datasets.

Frank



--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University


Re: A comment about R:

JohnDee
In reply to this post by Achim Zeileis
On Thursday 05 January 2006 12:13, Achim Zeileis wrote:

> . . . snip
> Whether you find this simple or not depends on what you might want to
> have. Personally, I always find it very limiting if I've only got a switch
> to choose one or another vcov matrix when there is a multitude of vcov
> matrices in use in the literature. What if you wanted to do HC3
> instead of the HC0 that is offered by Eviews...or HC4...or HAC...or
> something bootstrapped...or...
> In my view, this is the strength of many implementations in R: you can make
> programs very modular so that the user can easily extend the software or
> re-use it for other purposes. The price you pay for that is that it is not
> as easy to use as point-and-click software that offers some standard tools.
> Of course, both sides have advantages and disadvantages.
> . . .snip

Stata's ADO scripting language has the ability to access intermediate steps
and local variables used by various commands.  These are typically held in
memory until they are purged.  The difference between Stata and R is more
that Stata has been streamlined into an application, the nuts and bolts
hidden away, the rivet heads countersunk and polished, so that unless you
really need to use them, they aren't visible.  It only LOOKS like you are
constrained to the readily available results of specific commands.  Stata
output will tend to look very much like the standard output one becomes
accustomed to in undergraduate stat courses.  

R assumes you _will_ want access to the nuts and bolts, and that you don't much
care about visible rivets if the system is both accurate and functional.  R is
much more a programming environment in that sense.  It is an important
difference.  There is going to be a continuing growth in users of R as
companies see cost savings in open-source software.  They will often be people
who happily dragged .xls files into SPSS for analysis and then printed the
resulting reports.  (Personally, I became a strong believer in statistical
analysis packages after receiving a _negative_ variance in Excel once upon a
time.  I don't see how that could even be possible, but apparently it was a
known issue.  Some ad hoc experimentation then demonstrated that no
spreadsheet was all that precise.)

One place where R and Stata have a great deal in common is in the manner in
which graphs and charts are formatted.  Stata is perhaps slightly less
byzantine, but only slightly.  Both systems emphasize flexibility and quality
graphics at the price of having to learn what you are doing.  That said, you
can still do a lot more with R in some areas than Stata, especially in
spatial graphics and analysis.

JD


Re: A comment about R -> Link to a technical report from ATS, UCLA

Naji Nassar
In reply to this post by Frank Harrell
Hi all,

UCLA's ATS Statistical Consulting Group has just released a very interesting
paper comparing SPSS, SAS & Stata as statistical packages: "Perhaps the
most notable exception to this discussion is R."
http://www.ats.ucla.edu/stat/technicalreports/
It makes interesting reading for this thread.

Best regards
Naji


Re: A comment about R -> Link to a technical report from ATS, UCLA

Peter Dalgaard
Naji <[hidden email]> writes:

> Hi all,
>
> UCLA's ATS Statistical Consulting Group has just released a very interesting
> paper comparing SPSS, SAS & Stata as statistical packages: "Perhaps the
> most notable exception to this discussion is R."
> http://www.ats.ucla.edu/stat/technicalreports/
> It makes interesting reading for this thread.

In fact, if you trace the thread back to its root, this is what
started it...

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - ([hidden email])                  FAX: (+45) 35327907
