Quantcast

On R performance

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

On R performance

Justin Talbot-2
I've been working on an R performance academic project for the last
couple years which has involved writing an interpreter for R from
scratch and a JIT for R vector operations.

With the recent comments on Julia, I thought I'd share some thoughts
from my experience since they differ substantially from the common
speculation on R performance.

I went into the project thinking that R would be slow for the commonly
cited reasons: NAs, call-by-value, immutable values, ability to
dynamically add/remove variables from environments, etc. But this is
largely *not* true. It does require being somewhat clever, but most of
the cost of these features can be either eliminated or moved to
uncommon cases that won't affect most code. And there's plenty of room
for innovation here. The history of Javascript runtimes over the last
decade has shown that dramatic performance improvements are possible
even for difficult languages.

This is good news. I think we can keep essentially everything that
people like about R and still achieve great performance.

So why is R performance poor now? I think the fundamental reason is
related to software engineering: R is nearly impossible to experiment
with, so no one tries out new performance techniques on it. There are
two main issues here:

1) The R Language Definition doesn't get enough love. I could point
out plenty of specific problems, omissions, etc., but I think the
high-level problem is that the Language Definition currently conflates
three things: 1) the actual language definition, 2) the definition of
what is more properly the standard library, and 3) the implementation.
This conflation hides how simple the R/S language actually is and, by
assuming that the current implementation is the only implementation,
obscures performance improvements that could be made by changing the
implementation.

2) The R core implementation (e.g. everything in src/main) is too big.
There are ~900 functions listed in names.c. This has got to be simply
unmanageable. If one were to change the SEXP representation, how many
internal functions would have to be checked and updated? This is a
severe hinderance on improving performance.

I see little value is debating changes to the language semantics until
we've addressed this low hanging fruit and at least tried to make the
current R/S semantics run fast.

Justin

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: On R performance

Dominick Samperi-3
On Thu, Mar 8, 2012 at 2:06 PM, Justin Talbot <[hidden email]> wrote:

> I've been working on an R performance academic project for the last
> couple years which has involved writing an interpreter for R from
> scratch and a JIT for R vector operations.
>
> With the recent comments on Julia, I thought I'd share some thoughts
> from my experience since they differ substantially from the common
> speculation on R performance.
>
> I went into the project thinking that R would be slow for the commonly
> cited reasons: NAs, call-by-value, immutable values, ability to
> dynamically add/remove variables from environments, etc. But this is
> largely *not* true. It does require being somewhat clever, but most of
> the cost of these features can be either eliminated or moved to
> uncommon cases that won't affect most code. And there's plenty of room
> for innovation here. The history of Javascript runtimes over the last
> decade has shown that dramatic performance improvements are possible
> even for difficult languages.
>
> This is good news. I think we can keep essentially everything that
> people like about R and still achieve great performance.
>
> So why is R performance poor now? I think the fundamental reason is
> related to software engineering: R is nearly impossible to experiment
> with, so no one tries out new performance techniques on it. There are
> two main issues here:
>
> 1) The R Language Definition doesn't get enough love. I could point
> out plenty of specific problems, omissions, etc., but I think the
> high-level problem is that the Language Definition currently conflates
> three things: 1) the actual language definition, 2) the definition of
> what is more properly the standard library, and 3) the implementation.
> This conflation hides how simple the R/S language actually is and, by
> assuming that the current implementation is the only implementation,
> obscures performance improvements that could be made by changing the
> implementation.
>
> 2) The R core implementation (e.g. everything in src/main) is too big.
> There are ~900 functions listed in names.c. This has got to be simply
> unmanageable. If one were to change the SEXP representation, how many
> internal functions would have to be checked and updated? This is a
> severe hinderance on improving performance.
>
> I see little value is debating changes to the language semantics until
> we've addressed this low hanging fruit and at least tried to make the
> current R/S semantics run fast.

Isn't R much like Lisp under the covers? Afterall, it evolved from Scheme.
Hasn't there been a great deal of work done on optimizing Lisp over the
last 30 years? This suggests that instead of dropping the R/S semantics
and moving to another language like Julia, the proposals of Ross Ihaka
and Duncan Temple Lang could be followed to provide the familiar
R/S syntax on top of an optimized Lisp engine.

One could view the R language as "syntactic sugar" for Lisp and focus
on optimizing the Lisp engine, in the same way that functional languages
are viewed as syntactic sugar for the lambda calculus.

Another possibility is to implement R/S on top of an optimized virtual
machine like the JVM, LLVM, etc.

Of course, no matter what strategy is followed a foreign function
interface will be very important to leverage the existing base of
C/C++/Fortran numerical and graphics libs.

Dominick

> Justin
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: On R performance

Dirk Eddelbuettel
In reply to this post by Justin Talbot-2

Justin,

On 8 March 2012 at 11:06, Justin Talbot wrote:
| I've been working on an R performance academic project for the last
| couple years which has involved writing an interpreter for R from
| scratch and a JIT for R vector operations.

Cool.  I think John mention that once or twice and I promptly forgot.

Can you share some numbers?
 
| So why is R performance poor now? I think the fundamental reason is
| related to software engineering: R is nearly impossible to experiment
| with, so no one tries out new performance techniques on it. There are

Did you compare notes with the CXXR project by Andrew Runnalls and his
student(s)?  See http://www.cs.kent.ac.uk/projects/cxxr/

| I see little value is debating changes to the language semantics until
| we've addressed this low hanging fruit and at least tried to make the
| current R/S semantics run fast.

Fully agree.  

I'd add that helping expand R via the FFI also works, though it is of course
not as easy on the end user as making the core faster.

Dirk

--
"Outside of a dog, a book is a man's best friend. Inside of a dog, it is too
dark to read." -- Groucho Marx

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: On R performance

Justin Talbot-2
In reply to this post by Dominick Samperi-3
>
> Isn't R much like Lisp under the covers? Afterall, it evolved from Scheme.
> Hasn't there been a great deal of work done on optimizing Lisp over the
> last 30 years? This suggests that instead of dropping the R/S semantics
> and moving to another language like Julia, the proposals of Ross Ihaka
> and Duncan Temple Lang could be followed to provide the familiar
> R/S syntax on top of an optimized Lisp engine.
>

I think R started off as a Lisp-like language, but since adopting S
semantics, it has diverged quite a ways. I think it's better to think
of R as a combination of two languages: a dynamically-typed high-level
language, much like Javascript or Lua, and an array language, like
APL. I think those are the right places to be looking to see how to
make R fast. Fortunately, all three of those languages have had a lot
of performance work done already that R could just steal from
wholesale.

> Another possibility is to implement R/S on top of an optimized virtual
> machine like the JVM, LLVM, etc.
>

I like this in theory. But in practice, I'm not sure how well it would
work for R. JVM implementations of dynamic languages, like JRuby and
Jython run marginally faster (30-40%) than their C interpreters. You
do get the Java ecosystem, which is nice, but the performance
improvements probably aren't enough to make it worthwhile. And, of
course, R already has a pretty good Java connection story.

LLVM is a better option; I know there's another group out there
looking at R on LLVM. But I'll just note that the really high
performance dynamic languages (e.g. Google's V8 implementation of
Javascript and Mike Pall's LuaJIT) are hand-rolled JITs. LLVM-based
implementations of dynamic languages, like Unladen Swallow, have not
been particularly successful. It remains to be seen how well R would
map to LLVM.

Justin

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: On R performance

Justin Talbot-2
In reply to this post by Dirk Eddelbuettel
>
> On 8 March 2012 at 11:06, Justin Talbot wrote:
> | I've been working on an R performance academic project for the last
> | couple years which has involved writing an interpreter for R from
> | scratch and a JIT for R vector operations.
>
> Cool.  I think John mention that once or twice and I promptly forgot.
>
> Can you share some numbers?
>

Sure, I'll give a quick summary. We're writing a paper on it right now
which will have more details.

We currently execute scalar R code (non-vectorized) through an
interpreter we wrote from scratch. We haven't put a whole lot of time
into it; it supports most of the important R semantics, but does not
yet implement most of the functions in src/names.c, which limits the
scope of real world code we can run. On a set of microbenchmarks
(careful what you conclude from microbenchmarks!) it runs about 4-5x
faster than Luke's bytecode interpreter.

The interpreter is still about 3-4x slower than LuaJIT's interpreter,
probably the fastest dynamic language interpreter out there, so there
is room for further improvement, but not a lot. (Lua is a much cleaner
language from the performance standpoint and LuaJIT's interpreter is
written in assembly. We don't anticipate doing that anytime soon.)

We execute vectorized code through a JIT that generates SSE code and
can parallelize across multiple cores. Performance here depends
greatly on the vector size and number of vectors since our performance
gain primarily comes from eliminating memory accesses. For long
vectors (1M+ elements) we've seen gains from about 5x-50x on a single
core for plausible workloads. We don't have good numbers on the
parallelization yet, but we have seen linear scalability out to 32
cores for a couple of our workloads. Scalability is clearly very task
dependent and we don't expect to get large numbers across the board.

One implication of having a JIT is that we now implement a lot of
functionality at the R level rather than in C functions. For example,
we implement matrix-vector multiplication as:

r <- 0
for(i in 1L:ncol(m)) {
   r <- r + m[,i]*v[[i]]
}

This beats R's built-in matrix-vector multiplication by a factor of 2
for "large" matrices (at least one dimension larger than 1000 or so)
and will parallelize without any more work from the programmer. With
more work to squeeze out our JIT overheads this could be effective
even for much smaller matrices.


> | So why is R performance poor now? I think the fundamental reason is
> | related to software engineering: R is nearly impossible to experiment
> | with, so no one tries out new performance techniques on it. There are
>
> Did you compare notes with the CXXR project by Andrew Runnalls and his
> student(s)?  See http://www.cs.kent.ac.uk/projects/cxxr/
>

I haven't talked to them, but I should! Looking at their slides it
looks like their approach will be effective at making the R core more
extensible, but it's somewhat antagonistic to pure interpreter
performance. I didn't see any performance numbers, but I would guess
that CXXR runs somewhat slower than the current interpreter.

The key for being able to experiment with performance is for the core
code to be small and well defined, not necessarily extensible.

> | I see little value is debating changes to the language semantics until
> | we've addressed this low hanging fruit and at least tried to make the
> | current R/S semantics run fast.
>
> Fully agree.
>
> I'd add that helping expand R via the FFI also works, though it is of course
> not as easy on the end user as making the core faster.
>

FFI is extremely important and Rcpp is a great step forward. I'll just
note that FFI and performance interact. An FFI like .Call/.External
exposes too much of R's internal implementation details to users,
making it difficult to improve performance in the core while
maintaining backwards compatibility. It would be much better if R's
high-performance FFI were something like Rcpp itself, hiding almost
all implementation details from the user.

Just one example on FFIs. .Call/.External lets users get raw pointers
to vector data (e.g. NUMERIC_POINTER). This is fine and dandy as long
as all implementations store vectors contiguously in memory. But, some
implementations may not want this. For example, Clojure gets
high-performance updates to its immutable arrays by storing them in a
tree data structure instead of flat in memory. This would be a nice
technique to port to R, but it breaks .Call packages. A better FFI
choice would have used something like NUMERIC_ELEMENT(x,i) to hide the
details of how element i is looked up in vector x. This would have
been just as fast for current packages while leaving a forward path
for more performance improvements.

Justin

Justin

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: On R performance

Dominick Samperi-3
In reply to this post by Justin Talbot-2
On Fri, Mar 9, 2012 at 11:39 AM, Justin Talbot <[hidden email]> wrote:

>>
>> Isn't R much like Lisp under the covers? Afterall, it evolved from Scheme.
>> Hasn't there been a great deal of work done on optimizing Lisp over the
>> last 30 years? This suggests that instead of dropping the R/S semantics
>> and moving to another language like Julia, the proposals of Ross Ihaka
>> and Duncan Temple Lang could be followed to provide the familiar
>> R/S syntax on top of an optimized Lisp engine.
>>
>
> I think R started off as a Lisp-like language, but since adopting S
> semantics, it has diverged quite a ways. I think it's better to think
> of R as a combination of two languages: a dynamically-typed high-level
> language, much like Javascript or Lua, and an array language, like
> APL. I think those are the right places to be looking to see how to
> make R fast. Fortunately, all three of those languages have had a lot
> of performance work done already that R could just steal from
> wholesale.

Thanks for the clarification Justin. What about the S4 classes
and methods? The design resembles CLOS, and currently this
is interpreted R code. Have you addressed performance issues
associated with this? What relative impact does this have compared
with other optimizations like vectorization?

Thanks,
Dominick

>> Another possibility is to implement R/S on top of an optimized virtual
>> machine like the JVM, LLVM, etc.
>>
>
> I like this in theory. But in practice, I'm not sure how well it would
> work for R. JVM implementations of dynamic languages, like JRuby and
> Jython run marginally faster (30-40%) than their C interpreters. You
> do get the Java ecosystem, which is nice, but the performance
> improvements probably aren't enough to make it worthwhile. And, of
> course, R already has a pretty good Java connection story.
>
> LLVM is a better option; I know there's another group out there
> looking at R on LLVM. But I'll just note that the really high
> performance dynamic languages (e.g. Google's V8 implementation of
> Javascript and Mike Pall's LuaJIT) are hand-rolled JITs. LLVM-based
> implementations of dynamic languages, like Unladen Swallow, have not
> been particularly successful. It remains to be seen how well R would
> map to LLVM.
>
> Justin

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: On R performance

Antonio Piccolboni
In reply to this post by Justin Talbot-2
On Fri, Mar 9, 2012 at 8:39 AM, Justin Talbot <[hidden email]>wrote:

>
> > Another possibility is to implement R/S on top of an optimized virtual
> > machine like the JVM, LLVM, etc.
> >
>
>
>
Somebody is pursuing that already: https://code.google.com/p/renjin/


Antonio

The RHadoop project, https://github.com/RevolutionAnalytics/RHadoop

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: On R performance

Justin Talbot-2
In reply to this post by Dominick Samperi-3
>
> Thanks for the clarification Justin. What about the S4 classes
> and methods? The design resembles CLOS, and currently this
> is interpreted R code. Have you addressed performance issues
> associated with this? What relative impact does this have compared
> with other optimizations like vectorization?
>

Sorry for the delay in my response. My posts keep getting stuck in moderation.

I'll be honest that I haven't looked at S4 performance yet. That's the
big part of R's semantics that I haven't implemented yet. I chose to
delay this part largely because the language still feels very unstable
around the object systems.

I know R takes a lot of flak for having multiple incompatible object
systems. It's frustrating that that one object system hasn't come
dominate--they all have their pluses and minuses. However, one
response is to point out that Javascript is in the same situation, but
has still been very successful. There are lots of different OO
libraries for Javascript, each one taking a slightly different tack
(see this review from my office mate
https://github.com/njoubert/inheritance.js/blob/master/INHERITANCE.md
(caution some strong language)). I think part of the reason this has
worked out for Javascript is that none of the object systems are
considered part of the core language, leaving both the language
implementers and the OO library designers flexibility to experiment
without blocking each other.

R has taken the opposite approach...incorporating multiple object
systems into the core language, with the associated maintenance load
on R-Core...and I'm not sure that it's been as profitable. Perhaps
there are good reasons for this though. I'll admit that I haven't
thought through this area much.

Justin

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Loading...