Re: A comment about R


John Maindonald
Quoting from Thomas's message -

>  "On the question of which system really is easier to learn I can  
> only comment that this isn't the only question   where education,  
> as a field, would benefit from some good randomized controlled  
> trials."

A Randomized Controlled Trial?:
Doing such trials would be a 30-year project.   The entry criterion  
might be at least a pass score on a test that was designed to  
identify students with the potential to be reasonable statistical  
practitioners.  (To make this work, coaching at a summer camp might  
be a necessary preliminary.)  Students would be introduced to  
whatever system at various times in their educational development --  
ages 11, 14, 18 or 24.  For each age/system combination, there'd be a  
variety of dose levels(!).  Half would be introduced via the GUI and  
half via the command line.  Outcome measures would be (1) liking for  
the system; (2) quality of analysis, on several analysis tasks of a  
type that are likely to arise in several different analysis areas.  
Assessments would be made in early career and in mid-career.  
Analyses would of course be done using both SAS Proc Mixed and lmer()
in lme4.  There'd be bound to be enough missing data to make the  
design unbalanced, hence allowing plenty of room for argument about  
the informativeness of the missingness, and about the adequacy of the  
degrees of freedom approximation, or whether an approach that uses a  
df approximation was even worth considering.

What happens with those who decide, of their own accord or from  
necessity, to learn a system additional to the one to which they were  
assigned? (This may itself be an outcome.)  Should there be control  
for exposure to another language?

The more one thinks about it, the worse the design problem gets.  The  
situation is a bit different from the teaching of reading, where high  
quality randomized trials can and should be done, notwithstanding the  
complications of controlling for teacher effects.  As always, it is  
however insightful to think about the randomized trial that would be  
required.

I can envisage a simple randomized trial, still extending over some  
years, where the outcome measure is the quality of statistical  
analysis, on problems that meet the criteria given above.


The height of the bar:
For proper comparison of ease of doing analyses, a staged set of  
analysis problems is required, from cases where most would agree that  
a t-test or chi-square test or CI or ... "answers" the question of
interest, through to a variety of realistic regression problems.  
Agreement on some minimal set of steps needed to do an adequate  
analysis would be a necessary part of the process.  This ensures that
the goalposts are always at the same height. Such an exercise could
be highly insightful, and a useful contribution to the public  
scientific good.


Research questions:
To a smaller or larger extent, R is a component of a research  
exercise in the development of statistical computational abilities.  
Perhaps to the majority of users on this list, it is primarily an  
effective tool for the handling of statistical and other scientific  
computing tasks.  Some see these two goals as somewhat distinct (at  
the boundaries, they obviously are); others see a large  overlap.

In any case, this latter role has enormous importance, actual and  
potential, for the scientific community, and indeed for any area  
(especially business) where there is a continual and insistent demand  
to make sense of data.  A variety of research questions that warrant  
attention:

(1) Who should learn R?

[In my view R is such a versatile tool for scientific computing that  
anyone contemplating a career in science, and who expects to do their
own computations that have a substantial data analysis component,  
should learn R.  The only serious competitors, in my view and  
depending on the area of application, are Genstat, Stata, and Matlab  
-- Genstat for the analysis of designed experiments and for the  
quality of its GUI, Stata for the reasons given by others, and Matlab  
for signal processing.  SAS may be important for its efficiency in
certain types of batch processing with large data sets, and because
of the extent of existing large SAS repositories; SPSS may be
important because of the extent of existing large SPSS data
repositories. Some comment is also needed on S-PLUS.  I am of course
ignoring the skill investment that many researchers have made in  
these other packages.  While this has somehow to be factored in, it  
surely has limited relevance to assessing priorities for those who  
are currently starting out.]

(2) R has clearly reduced the time lag between the development of new  
theory, and availability of the associated methodology to statistical  
practitioners.  It has also, incidentally, raised the bar for  
commercial statistical software systems.  What are the implications  
for statistical research, and for professional practice and training?

(3) Should learners use a GUI, or the command line, in getting started?

[A major issue for GUIs is documentation of steps in an analysis.  
This will become increasingly important as more journals demand, as I  
hope will happen, publication of Sweave or other reproducible  
versions of analyses.  Some ultimate familiarity with the command  
line may in the medium term be essential.]

(4) When should students start learning R?

[Students should get their first exposure to a high-level programming  
language, in the style of R then Python or Octave, at age 11-14.  
There are now good alternatives to the former use of Fortran or  
Pascal, languages which have for good reason dropped out of favour  
for learning experience. They should start on R while their minds are  
still malleable, and long before they need it for serious research use.]

(5) What are the traps, in using R, for relative novices?

[Mechanisms are needed for identifying traps that routinely catch  
novices (even novices who may be quite sophisticated statistically),  
with a program to tackle these, in the medium to long term.]

(6) Default output requires (continuing) careful scrutiny from a  
"what will encourage good statistical practice" perspective.

(7) What, more widely, should go on the wish list?

John Maindonald             email: [hidden email]
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.



On 4 Jan 2006, at 10:00 PM, [hidden email] wrote:

> From: Thomas Lumley <[hidden email]>
> Date: 4 January 2006 6:23:18 AM
> To: Peter Dalgaard <[hidden email]>
> Cc: [hidden email], Patrick Burns <[hidden email]>
> Subject: Re: [R] A comment about R:
>
>
> On Tue, 3 Jan 2006, Peter Dalgaard wrote:
>> One thing that is often overlooked, and hasn't yet been mentioned in
>> the thread, is how much *simpler* R can be for certain completely
>> basic tasks of practical or pedagogical relevance: Calculate a simple
>> derived statistic, confidence intervals from estimate and SE,
>> percentage points of the binomial distribution - using dbinom or from
>> the formula, take the sum of each of 10 random samples from a set of
>> numbers, etc. This is where other packages get stuck in the
>> procedure+dataset mindset.
>
> Some of these things are actually fairly straightforward in Stata.  
> For example, Stata will give confidence intervals and tests for  
> linear combinations of coefficients and even (using symbolic  
> differentiation and the delta method) for nonlinear combinations.  
> The first is available in packages in R, the second is in "S  
> Programming" but doesn't seem to be packaged.
>
> <snip>
>
> Now, I still prefer R both for data analysis and (even more so) for  
> programming. There are some things that it is genuinely difficult  
> to program in Stata -- and as evidence that this isn't just my  
> ignorance of the best approaches, the language was substantially  
> reworked in both versions 8 and 9 to allow the vendor to implement
> better graphics and linear mixed models.
>
> On the question of which system really is easier to learn I can  
> only comment that this isn't the only question where education, as  
> a field, would benefit from some good randomized controlled trials.
>
> -thomas
>
> Thomas Lumley Assoc. Professor, Biostatistics
> [hidden email] University of Washington, Seattle
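
As an illustration of the basic tasks Peter Dalgaard lists in the quoted message above, here is a minimal R sketch. The estimate, standard error, binomial parameters and the set of numbers are all made up for illustration, and the confidence interval uses a normal approximation, which is an assumption and not something stated in the thread.

```r
# Hypothetical estimate and standard error, chosen only for illustration
est <- 2.5
se  <- 0.4

# 95% confidence interval from estimate and SE (normal approximation assumed)
ci <- est + c(-1, 1) * qnorm(0.975) * se
print(ci)

# Binomial probabilities two ways: dbinom() and the textbook formula
n <- 10
p <- 0.3
k <- 0:n
from_dbinom  <- dbinom(k, size = n, prob = p)
from_formula <- choose(n, k) * p^k * (1 - p)^(n - k)
print(all.equal(from_dbinom, from_formula))

# Sum of each of 10 random samples (size 5, with replacement) from a set of numbers
set.seed(1)  # fixed seed so the run is repeatable
x <- c(3, 8, 1, 9, 4, 7)
sums <- replicate(10, sum(sample(x, size = 5, replace = TRUE)))
print(sums)
```

Each task is one or two lines typed at the prompt, with no dataset or procedure to set up first, which seems to be the point being made about the "procedure+dataset mindset".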

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: A comment about R

Patrick Burns
John Maindonald wrote:

> ...
>
>(4) When should students start learning R?
>
>[Students should get their first exposure to a high-level programming  
>language, in the style of R then Python or Octave, at age 11-14.  
>There are now good alternatives to the former use of Fortran or  
>Pascal, languages which have for good reason dropped out of favour  
>for learning experience. They should start on R while their minds are  
>still malleable, and long before they need it for serious research use.]
>  
>

I think 11-14 years old might better be halved.  Kids are
playing very complicated video games barely after they
learn to walk.

R is a quite reasonable programming language for children.
You don't need to worry about low-level issues, and it is
easy to produce graphics with it.

Patrick Burns
[hidden email]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")


Re: A comment about R

Liaw, Andy
From: Patrick Burns

>
> John Maindonald wrote:
>
> > ...
> >
> >(4) When should students start learning R?
> >
> >[Students should get their first exposure to a high-level programming
> >language, in the style of R then Python or Octave, at age 11-14.
> >There are now good alternatives to the former use of Fortran or
> >Pascal, languages which have for good reason dropped out of favour
> >for learning experience. They should start on R while their minds are
> >still malleable, and long before they need it for serious research use.]
> >  
> >
>
> I think 11-14 years old might better be halved.  Kids are
> playing very complicated video games barely after they
> learn to walk.

My kids (7 and 5 years old) barely get an hour on video games a week, and I
can see that they lag behind their peers at the games (though I don't feel
sorry for that).  I hope I won't be accused of `endangering welfare of
children'...
 
> R is a quite reasonable programming language for children.
> You don't need to worry about low-level issues, and it is
> easy to produce graphics with it.

Any suggestion on how to go about getting kids that young on (R)
programming?

Cheers,
Andy

Re: A comment about R

Gabor Grothendieck
On 1/5/06, Liaw, Andy <[hidden email]> wrote:

> From: Patrick Burns
> >
> > John Maindonald wrote:
> >
> > > ...
> > >
> > >(4) When should students start learning R?
> > >
> > >[Students should get their first exposure to a high-level programming
> > >language, in the style of R then Python or Octave, at age 11-14.
> > >There are now good alternatives to the former use of Fortran or
> > >Pascal, languages which have for good reason dropped out of favour
> > >for learning experience. They should start on R while their minds are
> > >still malleable, and long before they need it for serious research use.]
> > >
> > >
> >
> > I think 11-14 years old might better be halved.  Kids are
> > playing very complicated video games barely after they
> > learn to walk.
>
> My kids (7 and 5 years old) barely get an hour on video games a week, and I
> can see that they lag behind their peers at the games (though I don't feel
> sorry for that).  I hope I won't be accused of `endangering welfare of
> children'...
>
> > R is a quite reasonable programming language for children.
> > You don't need to worry about low-level issues, and it is
> > easy to produce graphics with it.
>
> Any suggestion on how to go about getting kids that young on (R)
> programming?

I have introduced a number of computer software tools to my nephew
who is a teenager.  I think the key item is motivation and attention
span -- which is short.  They will want to get results fast and want
results to be of interest to them.

I have taught him elements of HTML, javascript and R.  In retrospect,
the most successful was HTML and to a lesser extent javascript.
When I asked him which of the three he wanted to learn more of
after not having done it for a while it was javascript.

The advantage of starting with HTML is that it's relatively simple, and within
one or two sessions the learner will be able to put together
web pages for themselves, so it's obviously useful and they can
be creative almost immediately. Also that leads naturally to javascript
and one can download lots of fancy mouse tails and other
motivating javascript snippets.

Previously we did it in person but now we are in different cities and
do it via instant messaging.  We started with javascript (which of the
three was the one he favored to get back into) again but
found that it was difficult to communicate javascript over instant
messaging so we tried R instead.

Because R is interactive one can easily discuss a line at a time and
include it right in the instant messaging dialogue, so in that mode
I found R was possible to communicate whereas javascript was difficult.  There
are some nice graphics demos in R which are motivating, although
I think the javascript mouse tails are still more appealing to someone
that age.


Re: A comment about R

Gregory Snow

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Liaw, Andy
> Sent: Thursday, January 05, 2006 6:26 AM
> To: 'Patrick Burns'; John Maindonald
> Cc: [hidden email]
> Subject: Re: [R] A comment about R

[snip]

> Any suggestion on how to go about getting kids that young on
> (R) programming?

For those of us in the US look at:
http://www.amstat.org/education/index.cfm?fuseaction=adoptas

I expect that some of the other stats organizations have similar
Adopt-A-School programs.

Last year I was in my daughter's 3rd grade class helping with a party
when I noticed a large posterboard that had the heights in inches of all
the students.  Since I had run out of apple juice to pour and was getting
a little bored, I went over to the chalkboard next to it and made a
quick stem-and-leaf plot of the data.  The teacher was interested in
what I had done and came over and had me explain the stem-and-leaf plot
to her (she had used the data to talk about averages (mean and median,
but not by that name) and spread (general concept, not computing
anything)).

My other daughter (6) also brought home some homework to show me, where they
had been given candy hearts (it was in February) and they had colored in
boxes corresponding to the colors of their hearts to make a basic bar
graph.  I showed her how I could do the same thing on my laptop using R
(I even colored the bars to match her graph and used the symbol font
with text to put colored hearts under the bars like hers had).  She was
impressed enough to make me print the graph so she could show her
teacher.
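
For anyone wanting to try the same two classroom displays, a minimal sketch in base R might look like the following. The heights and heart colors are invented for illustration (they are not the class data), and the sketch only colors the bars; it does not attempt the symbol-font hearts under the bars.

```r
# Made-up heights in inches for a small class (not the real posterboard data)
heights <- c(48, 50, 51, 51, 52, 53, 53, 54, 54, 54, 55, 56, 57, 58, 60)

# Stem-and-leaf plot, printed as text in the console
stem(heights)

# Made-up candy-heart colors, one entry per heart
hearts <- c("pink", "yellow", "pink", "purple", "green",
            "pink", "yellow", "orange", "green", "pink")

# Count the hearts of each color; table() orders the colors alphabetically
counts <- table(hearts)
print(counts)

# Bar graph with each bar drawn in its own color
# (every color name used here is a valid R color name)
barplot(counts, col = names(counts),
        ylab = "Number of hearts", main = "Candy hearts by color")
```

stem() needs no graphics device at all, so it works even in a plain terminal session; barplot() opens the default device.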

Those are 2 opportunities that I should have followed up on more; now I
just need to get things in gear and do a more formal adopting of their
school.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[hidden email]
(801) 408-8111
