R programming style


R programming style

David Scott-6

I am aware of one (unofficial) guide to style for R programming:
http://www1.maths.lth.se/help/R/RCC/
from Henrik Bengtsson.

Can anyone provide further pointers to good style?

Views on Bengtsson's ideas would interest me as well.

David Scott



_________________________________________________________________
David Scott Department of Statistics, Tamaki Campus
  The University of Auckland, PB 92019
  Auckland 1142,    NEW ZEALAND
Phone: +64 9 373 7599 ext 86830 Fax: +64 9 373 7000
Email: [hidden email]

Graduate Officer, Department of Statistics
Director of Consulting, Department of Statistics

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: R programming style

Bernard Leemon
I just got a copy of
A First Course in Statistical Programming with R by W. John Braun and Duncan
J. Murdoch (Cambridge). At Amazon:
 http://www.amazon.com/First-Course-Statistical-Programming-R/dp/0521694248/

The first couple of chapters cover base R that most everyone would know before
wanting to program, but the later chapters on programming itself seem
pretty good so far.

gary mcclelland
colorado

On Mon, Feb 11, 2008 at 3:47 AM, David Scott <[hidden email]> wrote:

>
> I am aware of one (unofficial) guide to style for R programming:
> http://www1.maths.lth.se/help/R/RCC/
> from Henrik Bengtsson.
>
> Can anyone provide further pointers to good style?
>
> Views on Bengtsson's ideas would interest me as well.
>
> David Scott

Re: R programming style

Roland Rau-3
In reply to this post by David Scott-6
Hi,

I think using Emacs+ESS [1,2] is always a good starting point for a
clear layout with consistent and meaningful indentation.

I don't know how other people think about it, but in my opinion,
"Elements of Programming Style" by Kernighan and Plauger is still an
interesting read -- although their programs are either Fortran or PL/1
and the book itself is 30 years old or so. Of course, I am not always
successful but at least I try to incorporate their 'mantras':
- write clearly, don't be too clever [3]
- say what you mean, simply and directly
- use library functions
- write clearly -- don't sacrifice clarity for "efficiency"
- let the machine do the dirty work
- parenthesize to avoid ambiguity
- 10.0 times 0.1 is hardly ever 1.0
- ...
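
The last mantra is easy to check in R. A minimal sketch (the variable names are only illustrative, and the cases are chosen for this note, not taken from the book): exact comparison with `==` fails for results that look obviously equal, while `all.equal()` compares within a small tolerance.

```r
# Exact comparison of doubles can fail even when the maths says "equal".
a <- 0.1 + 0.2
exact_equal <- (a == 0.3)                 # exact comparison of stored values
near_equal  <- isTRUE(all.equal(a, 0.3))  # comparison within a tolerance

# The same effect with a square root.
b <- sqrt(2)^2
exact_equal_sqrt <- (b == 2)
near_equal_sqrt  <- isTRUE(all.equal(b, 2))

print(c(exact_equal, near_equal, exact_equal_sqrt, near_equal_sqrt))
```

Wrapping `all.equal()` in `isTRUE()` matters because `all.equal()` returns a character description, not `FALSE`, when the values differ.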

I hope this helps?

Best,
Roland


[1] http://www.gnu.org/software/emacs/
[2] http://ess.r-project.org/
[3] I guess this is what Kernighan meant in his famous(?) quote:
"Everyone knows that debugging is twice as hard as writing a program in
the first place. So if you're as clever as you can be when you write it,
how will you ever debug it?"
(http://en.wikiquote.org/wiki/Brian_W._Kernighan )






David Scott wrote:

> I am aware of one (unofficial) guide to style for R programming:
> http://www1.maths.lth.se/help/R/RCC/
> from Henrik Bengtsson.
>
> Can anyone provide further pointers to good style?
>
> Views on Bengtsson's ideas would interest me as well.
>
> David Scott

Re: R programming style

barry rowlingson
Roland Rau wrote:

> Hi,
>
> I think using Emacs+ESS [1,2] is always a good starting point for a
> clear layout with consistent and meaningful indentation.
>
> I don't know how other people think about it, but in my opinion,
> "Elements of Programming Style" by Kernighan and Plauger is still an
> interesting read -- although their programs are either Fortran or PL/1
> and the book itself is 30 years old or so. Of course, I am not always
> successful but at least I try to incorporate their 'mantras':
> - write clearly, don't be too clever [3]
> - say what you mean, simply and directly
> - use library functions
> - write clearly -- don't sacrifice clarity for "efficiency"
> - let the machine do the dirty work
> - parenthesize to avoid ambiguity
> - 10.0 times 0.1 is hardly ever 1.0
> - ...

  Reminiscent of The Zen Of Python, which you get by typing 'import
this' at the python prompt:

 >>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

  [Note that Guido van Rossum, inventor of Python, is Dutch]

Barry


Re: R programming style

Earl F. Glynn
In reply to this post by David Scott-6
"David Scott" <[hidden email]> wrote in message
news:[hidden email]...
>
> Can anyone provide further pointers to good style?

While not written for R specifically, the book "Code Complete:  A Practical
Handbook of Software Construction" (2nd Edition) discusses a number of good
concepts for writing good code in any language:
http://www.amazon.com/Code-Complete-Practical-Handbook-Construction/dp/0735619670

In particular, Part IV "Statements" gives a number of useful suggestions by
type of statement, e.g., straight-line code, conditionals, loops, ...

There are some practices used in R that I think should be improved.  For
example, many years ago I was taught in a software engineering class that
the use of  "magic numbers" was a bad practice, yet we find magic numbers
used in R in many places.

Instead of using "1" or "2" in an "apply", I'll write something like this
trying for some sort of mnemonic

apply(x, BY.ROW<-1, sum)
or
apply(z, BY.COL<-2, mean)


I find BY.ROW or BY.COL to be more mnemonic than the magic numbers 1 and 2.
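
A variant of the same idea, as a sketch only: define the constants once instead of assigning inside each call. `BY_ROW` and `BY_COL` are hypothetical names chosen for this illustration; they are not part of R.

```r
# Hypothetical named constants for apply()'s MARGIN argument.
BY_ROW <- 1L
BY_COL <- 2L

x <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns, filled column-wise

row_totals <- apply(x, BY_ROW, sum)    # one value per row
col_means  <- apply(x, BY_COL, mean)   # one value per column

print(row_totals)
print(col_means)
```

Defining the names up front avoids creating a variable as a side effect of the function call, which is what an assignment written inside the argument list does.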

The "sides" 1, 2, 3, and 4 in an axis statement should have some sort of
mnemonic definition, too, perhaps:

axis(BOTTOM<-1, ...)

But I believe I was ostracized in this E-mail list the last time I suggested
such mnemonics instead of magic numbers.
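
For the axis sides, one possible sketch uses a named vector as a lookup. `SIDE` is a hypothetical name invented for this note; the numbering itself (1 = below, 2 = left, 3 = above, 4 = right) is what `?axis` documents.

```r
# Hypothetical lookup for the `side` argument of axis().
SIDE <- c(bottom = 1L, left = 2L, top = 3L, right = 4L)

pdf(NULL)                   # null device, so no file is written
plot(1:10, axes = FALSE)    # draw the points without any axes
axis(SIDE[["bottom"]])      # same as axis(1)
axis(SIDE[["left"]])        # same as axis(2)
invisible(dev.off())
```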

efg
Earl F. Glynn
Bioinformatics
Stowers Institute for Medical Research


Re: R programming style

Scillieri, John
I second that: Code Complete is a great book! For anyone interested in
improving their code, no matter what language (it has a C++/Java-type
focus but is definitely applicable to R), it would be a good
place to start.

I've read some negative reviews claiming that everything he writes is
'obvious' (use good variable names, short concise functions, limit
nested conditionals, etc) but on more than one occasion I've gone back
over the book and thought of new places to improve my code.

HTH,
John

-----Original Message-----
From: [hidden email] [mailto:[hidden email]]
On Behalf Of Earl F. Glynn
Sent: Monday, February 11, 2008 2:30 PM
To: [hidden email]
Subject: Re: [R] R programming style

"David Scott" <[hidden email]> wrote in message
news:[hidden email]...
>
> Can anyone provide further pointers to good style?

While not written for R specifically, the book "Code Complete:  A
Practical Handbook of Software Construction" (2nd Edition) discusses a
number of good concepts for writing good code in any language:
http://www.amazon.com/Code-Complete-Practical-Handbook-Construction/dp/0735619670

In particular, Part IV "Statements" gives a number of useful suggestions
by type of statement, e.g., straight-line code, conditionals, loops, ...

There are some practices used in R that I think should be improved.  For
example, many years ago I was taught in a software engineering class
that the use of  "magic numbers" was a bad practice, yet we find magic
numbers used in R in many places.

Instead of using "1" or "2" in an "apply", I'll write something like
this trying for some sort of mnemonic

apply(x, BY.ROW<-1, sum)
or
apply(z, BY.COL<-2, mean)


I find BY.ROW or BY.COL to be more mnemonic than the magic numbers 1 and
2.

The "sides" 1, 2, 3, and 4 in an axis statement should have some sort of
mnemonic definition, too, perhaps:

axis(BOTTOM<-1, ...)

But I believe I was ostracized in this E-mail list the last time I
suggested such mnemonics instead of magic numbers.

efg
Earl F. Glynn
Bioinformatics
Stowers Institute for Medical Research


Re: R programming style

Roland Rau-3
In reply to this post by Earl F. Glynn
Hi,

Earl F. Glynn wrote:
> Instead of using "1" or "2" in an "apply", I'll write something like this
> trying for some sort of mnemonic
>
> apply(x, BY.ROW<-1, sum)
> or
> apply(z, BY.COL<-2, mean)
>
I think it makes sense to use those "magic numbers" in the given case.
Please let me give you several arguments:

- In such a setting, I'd probably also use more mnemonic functions:
rowMeans
rowSums
colMeans
colSums

- The numbering of the MARGINs (the name of the second argument) is what
I remember from maths: index 1 is for rows, index 2 is for columns, ... So I
don't think the numbering is counter-intuitive. For sure, you have to
check the help page at least once. But this is also the case for using
mnemonic arguments.

- The first argument in apply() is an array which is not restricted to
two dimensions. For example, if you are working with three dimensions,
how would you specify it? BY.LAYER? Maybe, but then four dimensions or
five dimensions?[1]
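
The first and third arguments can be illustrated with a small sketch (the data here are made up for this note): the dedicated helpers agree with `apply()`, and for higher-dimensional arrays `MARGIN` just continues the numbering of the dimensions.

```r
x <- matrix(1:6, nrow = 2)

# The dedicated helpers give the same answers as apply() with a margin.
same_rows <- isTRUE(all.equal(rowSums(x),  apply(x, 1, sum)))
same_cols <- isTRUE(all.equal(colMeans(x), apply(x, 2, mean)))

# With a 3-d array, MARGIN refers to the dimensions by position.
a <- array(1:24, dim = c(2, 3, 4))
layer_totals <- apply(a, 3, sum)          # one total per slice along dim 3
row_by_layer <- apply(a, c(1, 3), sum)    # collapse dim 2 only: a 2 x 4 matrix

print(c(same_rows, same_cols))
print(layer_totals)
print(dim(row_by_layer))
```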

Please don't consider this as a personal criticism. I am sure that
users' criticism improves R. But using mnemonics instead of the margins
in the apply() case is not a convincing example, I think. Maybe you have
another example?

Best,
Roland

[1] If you are curious whether there are practical applications of four- or
five-dimensional arrays, I can write to you off-list how useful they were
in real world projects.


Re: R programming style

Therneau, Terry M., Ph.D.
In reply to this post by David Scott-6
David Scott asked
 "Views on Bengtsson's ideas would interest me as well."
 
 I have only one serious disagreement with their suggestions
 
   "6.3.2 In general, the use of comments should be minimized by making the code
self-documenting by appropriate name choices and an explicit logical structure".
   
   The phrase "self-documenting code" is the description of a popular illusion.  
Variable names that are obvious today will not be so when you look at the same
code 3 years from now, whether you make them long, short, or in between.  I find
that each time I fix a reported bug in the survival code, I end up adding both
the fix and 3-4 new blocks of comments. These mostly represent features that
were "obvious" when I wrote the code; but I have just spent 20-40 minutes
reconstructing my understanding of the feature.  ("I see what the code is doing,
but why on earth did I want to do that?")  
   Every comment, no matter how obvious, will be appreciated by future readers
of your code.  And that includes yourself.
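
   A small sketch of the kind of comment meant here, one that records why rather than what. The function, the data, and the -999 convention are all invented for this illustration; they do not come from the survival code.

```r
# Hypothetical example: the logger and its -999 code are made up.
summarise_readings <- function(x) {
  # Why: the (imaginary) logger writes -999 when a sensor drops out.
  # Keeping those values would pull the mean far below any real reading.
  valid <- x[x != -999]

  # Why: mean(numeric(0)) is NaN; return NA instead so that an empty input
  # reads as "missing" rather than as an arithmetic failure.
  if (length(valid) == 0) return(NA_real_)

  mean(valid)
}

print(summarise_readings(c(10, 12, -999, 14)))
```

   Neither comment could be recovered from the variable names alone, however carefully they were chosen.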
   
 Minor disagreements:
   1. They recommend an indent of 2 spaces; I much prefer 4.  Perhaps it's older
eyes, but I have trouble seeing the structure of larger blocks when the offset
is so small.

   2. I do not like mixed case for function names.  As a user, I now have to
remember not just the name of the function, but the pattern of capitalization
(often idiosyncratic) that the author chose to use.  Within a function I have no
complaint; anything that improves readability is a good thing.
   
   Agreements: 98% of what they say.
   
    Terry Therneau


Re: R programming style

hadley wickham
On Feb 12, 2008 9:07 AM, Terry Therneau <[hidden email]> wrote:

> David Scott asked
>  "Views on Bengtsson's ideas would interest me as well."
>
>  I have only one serious disagreement with their suggestions
>
>    "6.3.2 In general, the use of comments should be minimized by making the code
> self-documenting by appropriate name choices and an explicit logical structure".
>
>    The phrase "self-documenting code" is the description of a popular illusion.
> Variable names that are obvious today will not be so when you look at the same
> code 3 years from now, whether you make them long, short, or in between.  I find
> that each time I fix a reported bug in the survival code, I end up adding both
> the fix and 3-4 new blocks of comments. These mostly represent features that
> were "obvious" when I wrote the code; but I have just spent 20-40 minutes
> reconstructing my understanding of the feature.  ("I see what the code is doing,
> but why on earth did I want to do that?")
>    Every comment, no matter how obvious, will be appreciated by future readers
> of your code.  And that includes yourself.

See http://steve-yegge.blogspot.com/2008/02/portrait-of-n00b.html for
a fairly well reasoned discussion of some of these issues.

Hadley

--
http://had.co.nz/
