Package compiler - efficiency problem

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Package compiler - efficiency problem

Karol Podemski
 Dear R team,

I am a co-author and maintainer of one of R packages distributed by R-forge
(gEcon). One of gEcon package users found a strange behaviour of package (R
froze for couple of minutes) and reported it to me. I traced the strange
behaviour to compiler package. I attach short demonstration of the problem
to this mail (demonstration makes use of compiler and tictoc packages only).

In short, the compiler package has problems in compiling large functions -
their compilation and execution may take much longer than direct execution
of an uncompiled function. Such functions are generated by gEcon package as
they describe steady state for economy.

I am curious if you are aware of such problems and plan to handle the
efficiency issues. On one of the boards I saw that there were efficiency
issues in rpart package but they have been resolved. Or would you advise to
turn off JIT on package load (package heavily uses such long functions
generated whenever a new model is created)?

Best regards,
Karol Podemski

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

possible_bug_report.tar.gz (118K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Package compiler - efficiency problem

Tomas Kalibera
Dear Karol,

thank you for the report. I can reproduce that the function from you
example takes very long to compile and I can see where most time is
spent. The compiler is itself written in R and requires a lot of
resources for large functions (foo() has over 16,000 lines of code,
nearly 1 million of instructions/operands, 45,000 constants). In
particular a lot of time is spent in garbage collection and in finding a
unique set of constants. Some optimizations of the compiler may be
possible, but it is unlikely that functions this large will compile fast
any soon. For non-generated code, we now have the byte-compilation on
installation by default which at least removes the compile overhead from
runtime. Even though the compiler is slow, it is important to keep in
mind that in principle, with any compiler there will be functions where
compilation would not be improve performance (when the compile time is
included or not).

I think it is not a good idea to generate code for functions like foo()
in R (or any interpreted language). You say that R's byte-code compiler
produces code that runs 5-10x faster than when the function is
interpreted by the AST interpreter (uncompiled), which sounds like a
good result, but I believe that avoiding code generation would be much
faster than that, apart from drastically reducing code size and
therefore compile time. The generator of these functions has much more
information than the compiler - it could be turned into an interpreter
of these functions and compute their values on the fly.

A significant source of inefficiency of the generated code are
element-wise operations, such as

r[12] <- -vv[88] + vv[16] * (1 + ppff[1307])
...

r[139] <- -vv[215] + vv[47] * (1 + ppff[1434])

(these could be vectorized, which would reduce code size and improve
interpretation speed; and make it somewhat readable). Most of the code
lines in the generated functions seem to be easily vectorizable.

Compilers and interpreters necessarily use some heuristics or optimize
at some code patterns. Optimizing for generated code may be tricky as it
could even harm performance of usual code. And, I would much rather
optimize the compiler for the usual code.

Indeed, a pragmatic solution requiring the least amount of work would be
to disable compilation of these generated functions. There is not a
documented way to do that and maybe we could add it (and technically it
is trivial), but I have been reluctant so far - in some cases,
compilation even of these functions may be beneficial - if the speedup
is 5-10x and we run very many times. But once the generated code
included some pragma preventing compilation, it won't be ever compiled.
Also, the trade-offs may change as the compiler evolves, perhaps not in
this case, but in other where such pragma may be used.

Well so the short answer would be that these functions should not be
generated in the first place. If it were too much work rewriting,
perhaps the generator could just be improved to produce vectorized
operations.

Best
Tomas

On 12.8.2018 21:31, Karol Podemski wrote:

>   Dear R team,
>
> I am a co-author and maintainer of one of R packages distributed by R-forge
> (gEcon). One of gEcon package users found a strange behaviour of package (R
> froze for couple of minutes) and reported it to me. I traced the strange
> behaviour to compiler package. I attach short demonstration of the problem
> to this mail (demonstration makes use of compiler and tictoc packages only).
>
> In short, the compiler package has problems in compiling large functions -
> their compilation and execution may take much longer than direct execution
> of an uncompiled function. Such functions are generated by gEcon package as
> they describe steady state for economy.
>
> I am curious if you are aware of such problems and plan to handle the
> efficiency issues. On one of the boards I saw that there were efficiency
> issues in rpart package but they have been resolved. Or would you advise to
> turn off JIT on package load (package heavily uses such long functions
> generated whenever a new model is created)?
>
> Best regards,
> Karol Podemski
>
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Package compiler - efficiency problem

Karol Podemski
Dear Thomas,

thank you for prompt response and taking interest in this issue. I really
appreciate your compiler project and efficiency gains in usual case. I am
aware of limitations of interpreted languages too and because of that even
when writing my first mail I had a hunch that it is not that easy to
address this problem.  As you mentioned optimisation of compiler for
handling non-standard code may be tricky and harmful for usual code. The
question is if gEcon is the only package that may face the same issue
because of compilation.

The functions generated by gEcon are systems of non-linear equations
defining the equilibrium of an economy (see
http://gecon.r-forge.r-project.org/files/gEcon-users-guide.pdf  if you want
to learn a bit how we obtain it). The rows, you suggested to vectorise, are
indeed vectorisable because they define equilibrium for similiar markets
(e.g. production and sale of beverages and food) but do not have to be
vectorisable in general case. So that not to delve into too much details I
will stop here in description of how the equations originate. However, I
would like to point that similiar large systems of linear equations may
arise in other fields ( https://en.wikipedia.org/wiki/Steady_state ) and
there may be other packages that generate similar large systems (e.g.
network problems like hydraulic networks). In that case, reports such as
mine may help you to assess the scale of the problems.

Thank you for suggestions for improvement in our approach, i am going to
discuss them with other package developers.

Regards,
Karol Podemski

pon., 13 sie 2018 o 18:02 Tomas Kalibera <[hidden email]>
napisał(a):

> Dear Karol,
>
> thank you for the report. I can reproduce that the function from you
> example takes very long to compile and I can see where most time is spent.
> The compiler is itself written in R and requires a lot of resources for
> large functions (foo() has over 16,000 lines of code, nearly 1 million of
> instructions/operands, 45,000 constants). In particular a lot of time is
> spent in garbage collection and in finding a unique set of constants. Some
> optimizations of the compiler may be possible, but it is unlikely that
> functions this large will compile fast any soon. For non-generated code, we
> now have the byte-compilation on installation by default which at least
> removes the compile overhead from runtime. Even though the compiler is
> slow, it is important to keep in mind that in principle, with any compiler
> there will be functions where compilation would not be improve performance
> (when the compile time is included or not).
>
> I think it is not a good idea to generate code for functions like foo() in
> R (or any interpreted language). You say that R's byte-code compiler
> produces code that runs 5-10x faster than when the function is interpreted
> by the AST interpreter (uncompiled), which sounds like a good result, but I
> believe that avoiding code generation would be much faster than that, apart
> from drastically reducing code size and therefore compile time. The
> generator of these functions has much more information than the compiler -
> it could be turned into an interpreter of these functions and compute their
> values on the fly.
>
> A significant source of inefficiency of the generated code are
> element-wise operations, such as
>
> r[12] <- -vv[88] + vv[16] * (1 + ppff[1307])
> ...
>
> r[139] <- -vv[215] + vv[47] * (1 + ppff[1434])
>
> (these could be vectorized, which would reduce code size and improve
> interpretation speed; and make it somewhat readable). Most of the code
> lines in the generated functions seem to be easily vectorizable.
>
> Compilers and interpreters necessarily use some heuristics or optimize at
> some code patterns. Optimizing for generated code may be tricky as it could
> even harm performance of usual code. And, I would much rather optimize the
> compiler for the usual code.
>
> Indeed, a pragmatic solution requiring the least amount of work would be
> to disable compilation of these generated functions. There is not a
> documented way to do that and maybe we could add it (and technically it is
> trivial), but I have been reluctant so far - in some cases, compilation
> even of these functions may be beneficial - if the speedup is 5-10x and we
> run very many times. But once the generated code included some pragma
> preventing compilation, it won't be ever compiled. Also, the trade-offs may
> change as the compiler evolves, perhaps not in this case, but in other
> where such pragma may be used.
>
> Well so the short answer would be that these functions should not be
> generated in the first place. If it were too much work rewriting, perhaps
> the generator could just be improved to produce vectorized operations.
>
> Best
> Tomas
> On 12.8.2018 21:31, Karol Podemski wrote:
>
>  Dear R team,
>
> I am a co-author and maintainer of one of R packages distributed by R-forge
> (gEcon). One of gEcon package users found a strange behaviour of package (R
> froze for couple of minutes) and reported it to me. I traced the strange
> behaviour to compiler package. I attach short demonstration of the problem
> to this mail (demonstration makes use of compiler and tictoc packages only).
>
> In short, the compiler package has problems in compiling large functions -
> their compilation and execution may take much longer than direct execution
> of an uncompiled function. Such functions are generated by gEcon package as
> they describe steady state for economy.
>
> I am curious if you are aware of such problems and plan to handle the
> efficiency issues. On one of the boards I saw that there were efficiency
> issues in rpart package but they have been resolved. Or would you advise to
> turn off JIT on package load (package heavily uses such long functions
> generated whenever a new model is created)?
>
> Best regards,
> Karol Podemski
>
>
>
> [hidden email] mailing listhttps://stat.ethz.ch/mailman/listinfo/r-devel
>
>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Package compiler - efficiency problem

Iñaki Ucar
Karol,

If I understood correctly, functions like "foo" are automatically
generated by gEcon's model parser. For such a long function, and
depending on how many times you need to call it, it may make more
sense to generate C++ code instead (including the 'for' loop). Then
you can use Rcpp::sourceCpp, or Rcpp::cppFunction, to compile it and
run it from R.

Iñaki

El vie., 17 ago. 2018 a las 0:47, Karol Podemski
(<[hidden email]>) escribió:

>
> Dear Thomas,
>
> thank you for prompt response and taking interest in this issue. I really
> appreciate your compiler project and efficiency gains in usual case. I am
> aware of limitations of interpreted languages too and because of that even
> when writing my first mail I had a hunch that it is not that easy to
> address this problem.  As you mentioned optimisation of compiler for
> handling non-standard code may be tricky and harmful for usual code. The
> question is if gEcon is the only package that may face the same issue
> because of compilation.
>
> The functions generated by gEcon are systems of non-linear equations
> defining the equilibrium of an economy (see
> http://gecon.r-forge.r-project.org/files/gEcon-users-guide.pdf  if you want
> to learn a bit how we obtain it). The rows, you suggested to vectorise, are
> indeed vectorisable because they define equilibrium for similiar markets
> (e.g. production and sale of beverages and food) but do not have to be
> vectorisable in general case. So that not to delve into too much details I
> will stop here in description of how the equations originate. However, I
> would like to point that similiar large systems of linear equations may
> arise in other fields ( https://en.wikipedia.org/wiki/Steady_state ) and
> there may be other packages that generate similar large systems (e.g.
> network problems like hydraulic networks). In that case, reports such as
> mine may help you to assess the scale of the problems.
>
> Thank you for suggestions for improvement in our approach, i am going to
> discuss them with other package developers.
>
> Regards,
> Karol Podemski
>
> pon., 13 sie 2018 o 18:02 Tomas Kalibera <[hidden email]>
> napisał(a):
>
> > Dear Karol,
> >
> > thank you for the report. I can reproduce that the function from you
> > example takes very long to compile and I can see where most time is spent.
> > The compiler is itself written in R and requires a lot of resources for
> > large functions (foo() has over 16,000 lines of code, nearly 1 million of
> > instructions/operands, 45,000 constants). In particular a lot of time is
> > spent in garbage collection and in finding a unique set of constants. Some
> > optimizations of the compiler may be possible, but it is unlikely that
> > functions this large will compile fast any soon. For non-generated code, we
> > now have the byte-compilation on installation by default which at least
> > removes the compile overhead from runtime. Even though the compiler is
> > slow, it is important to keep in mind that in principle, with any compiler
> > there will be functions where compilation would not be improve performance
> > (when the compile time is included or not).
> >
> > I think it is not a good idea to generate code for functions like foo() in
> > R (or any interpreted language). You say that R's byte-code compiler
> > produces code that runs 5-10x faster than when the function is interpreted
> > by the AST interpreter (uncompiled), which sounds like a good result, but I
> > believe that avoiding code generation would be much faster than that, apart
> > from drastically reducing code size and therefore compile time. The
> > generator of these functions has much more information than the compiler -
> > it could be turned into an interpreter of these functions and compute their
> > values on the fly.
> >
> > A significant source of inefficiency of the generated code are
> > element-wise operations, such as
> >
> > r[12] <- -vv[88] + vv[16] * (1 + ppff[1307])
> > ...
> >
> > r[139] <- -vv[215] + vv[47] * (1 + ppff[1434])
> >
> > (these could be vectorized, which would reduce code size and improve
> > interpretation speed; and make it somewhat readable). Most of the code
> > lines in the generated functions seem to be easily vectorizable.
> >
> > Compilers and interpreters necessarily use some heuristics or optimize at
> > some code patterns. Optimizing for generated code may be tricky as it could
> > even harm performance of usual code. And, I would much rather optimize the
> > compiler for the usual code.
> >
> > Indeed, a pragmatic solution requiring the least amount of work would be
> > to disable compilation of these generated functions. There is not a
> > documented way to do that and maybe we could add it (and technically it is
> > trivial), but I have been reluctant so far - in some cases, compilation
> > even of these functions may be beneficial - if the speedup is 5-10x and we
> > run very many times. But once the generated code included some pragma
> > preventing compilation, it won't be ever compiled. Also, the trade-offs may
> > change as the compiler evolves, perhaps not in this case, but in other
> > where such pragma may be used.
> >
> > Well so the short answer would be that these functions should not be
> > generated in the first place. If it were too much work rewriting, perhaps
> > the generator could just be improved to produce vectorized operations.
> >
> > Best
> > Tomas
> > On 12.8.2018 21:31, Karol Podemski wrote:
> >
> >  Dear R team,
> >
> > I am a co-author and maintainer of one of R packages distributed by R-forge
> > (gEcon). One of gEcon package users found a strange behaviour of package (R
> > froze for couple of minutes) and reported it to me. I traced the strange
> > behaviour to compiler package. I attach short demonstration of the problem
> > to this mail (demonstration makes use of compiler and tictoc packages only).
> >
> > In short, the compiler package has problems in compiling large functions -
> > their compilation and execution may take much longer than direct execution
> > of an uncompiled function. Such functions are generated by gEcon package as
> > they describe steady state for economy.
> >
> > I am curious if you are aware of such problems and plan to handle the
> > efficiency issues. On one of the boards I saw that there were efficiency
> > issues in rpart package but they have been resolved. Or would you advise to
> > turn off JIT on package load (package heavily uses such long functions
> > generated whenever a new model is created)?
> >
> > Best regards,
> > Karol Podemski
> >
> >
> >
> > [hidden email] mailing listhttps://stat.ethz.ch/mailman/listinfo/r-devel
> >
> >
> >
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



--
Iñaki Ucar

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Package compiler - efficiency problem

Tomas Kalibera
In reply to this post by Karol Podemski
Dear Karol,

I don't understand the models behind these function, but I can tell that
the code generated is very inefficient. The AST interpreter will be very
inefficient performing each scalar computation with all boxing,
allocations, function calls. The byte-code compiler removes some of the
boxing and allocation. While it could certainly compile faster, it will
always be taking long compiling such functions with so many commands: so
many expressions to track, so many source references to map, for so
little computation. The same code could be much more efficient if it
used vectorized operations and loops. The compiler cannot infer the
loops and vector operations from the code - it is not that smart and I
doubt it could easily be for R, but such optimizations could certainly
be done easily at a higher level, when optimizing computation within the
model, not within R with all its complicated semantics. I think you
could hope for ~100x speedups compared to current generated code running
with R AST interpreter.

So I think it might be worth thinking about writing an interpreter for
the model (the generator would compute function values on the fly,
without actually generating code). If that was too slow, it might pay
off to generate some intermediate representation for the model that
would be faster to interpret. If that was too hard, then perhaps
generating the code from the model in a smarter way (use vector
operations, loops). It is ok to do that opportunistically - only when
possible. With compilers, this is normal, optimizations often take
advantage of certain patterns in the code if they are present. If you
had more specific questions how to optimize the code feel free to ask
(also offline).

Certainly I don't want the existence of the byte-code compiler to
require you to switch from R to C/C++, that would be exactly the
opposite of what the compiler is aiming for. If it turns out you really
need a way to disable compilation of these generated functions (so they
run much slower, but you don't have to wait for them to compile), we
will provide it and using a hack/workaround it is already possible in
existing versions of R, with all the drawbacks I mentioned previously.

Best
Tomas


On 08/17/2018 12:43 AM, Karol Podemski wrote:

> Dear Thomas,
>
> thank you for prompt response and taking interest in this issue. I
> really appreciate your compiler project and efficiency gains in usual
> case. I am aware of limitations of interpreted languages too and
> because of that even when writing my first mail I had a hunch that it
> is not that easy to address this problem.  As you mentioned
> optimisation of compiler for handling non-standard code may be tricky
> and harmful for usual code. The question is if gEcon is the only
> package that may face the same issue because of compilation.
>
> The functions generated by gEcon are systems of non-linear equations
> defining the equilibrium of an economy (see
> http://gecon.r-forge.r-project.org/files/gEcon-users-guide.pdf if you
> want to learn a bit how we obtain it). The rows, you suggested to
> vectorise, are indeed vectorisable because they define equilibrium for
> similiar markets (e.g. production and sale of beverages and food) but
> do not have to be vectorisable in general case. So that not to delve
> into too much details I will stop here in description of how the
> equations originate. However, I would like to point that similiar
> large systems of linear equations may arise in other fields (
> https://en.wikipedia.org/wiki/Steady_state ) and there may be other
> packages that generate similar large systems (e.g. network problems
> like hydraulic networks). In that case, reports such as mine may help
> you to assess the scale of the problems.
>
> Thank you for suggestions for improvement in our approach, i am going
> to discuss them with other package developers.
>
> Regards,
> Karol Podemski
>
> pon., 13 sie 2018 o 18:02 Tomas Kalibera <[hidden email]
> <mailto:[hidden email]>> napisał(a):
>
>     Dear Karol,
>
>     thank you for the report. I can reproduce that the function from
>     you example takes very long to compile and I can see where most
>     time is spent. The compiler is itself written in R and requires a
>     lot of resources for large functions (foo() has over 16,000 lines
>     of code, nearly 1 million of instructions/operands, 45,000
>     constants). In particular a lot of time is spent in garbage
>     collection and in finding a unique set of constants. Some
>     optimizations of the compiler may be possible, but it is unlikely
>     that functions this large will compile fast any soon. For
>     non-generated code, we now have the byte-compilation on
>     installation by default which at least removes the compile
>     overhead from runtime. Even though the compiler is slow, it is
>     important to keep in mind that in principle, with any compiler
>     there will be functions where compilation would not be improve
>     performance (when the compile time is included or not).
>
>     I think it is not a good idea to generate code for functions like
>     foo() in R (or any interpreted language). You say that R's
>     byte-code compiler produces code that runs 5-10x faster than when
>     the function is interpreted by the AST interpreter (uncompiled),
>     which sounds like a good result, but I believe that avoiding code
>     generation would be much faster than that, apart from drastically
>     reducing code size and therefore compile time. The generator of
>     these functions has much more information than the compiler - it
>     could be turned into an interpreter of these functions and compute
>     their values on the fly.
>
>     A significant source of inefficiency of the generated code are
>     element-wise operations, such as
>
>     r[12] <- -vv[88] + vv[16] * (1 + ppff[1307])
>     ...
>
>     r[139] <- -vv[215] + vv[47] * (1 + ppff[1434])
>
>     (these could be vectorized, which would reduce code size and
>     improve interpretation speed; and make it somewhat readable). Most
>     of the code lines in the generated functions seem to be easily
>     vectorizable.
>
>     Compilers and interpreters necessarily use some heuristics or
>     optimize at some code patterns. Optimizing for generated code may
>     be tricky as it could even harm performance of usual code. And, I
>     would much rather optimize the compiler for the usual code.
>
>     Indeed, a pragmatic solution requiring the least amount of work
>     would be to disable compilation of these generated functions.
>     There is not a documented way to do that and maybe we could add it
>     (and technically it is trivial), but I have been reluctant so far
>     - in some cases, compilation even of these functions may be
>     beneficial - if the speedup is 5-10x and we run very many times.
>     But once the generated code included some pragma preventing
>     compilation, it won't be ever compiled. Also, the trade-offs may
>     change as the compiler evolves, perhaps not in this case, but in
>     other where such pragma may be used.
>
>     Well so the short answer would be that these functions should not
>     be generated in the first place. If it were too much work
>     rewriting, perhaps the generator could just be improved to produce
>     vectorized operations.
>
>     Best
>     Tomas
>
>     On 12.8.2018 21:31, Karol Podemski wrote:
>>       Dear R team,
>>
>>     I am a co-author and maintainer of one of R packages distributed by R-forge
>>     (gEcon). One of gEcon package users found a strange behaviour of package (R
>>     froze for couple of minutes) and reported it to me. I traced the strange
>>     behaviour to compiler package. I attach short demonstration of the problem
>>     to this mail (demonstration makes use of compiler and tictoc packages only).
>>
>>     In short, the compiler package has problems in compiling large functions -
>>     their compilation and execution may take much longer than direct execution
>>     of an uncompiled function. Such functions are generated by gEcon package as
>>     they describe steady state for economy.
>>
>>     I am curious if you are aware of such problems and plan to handle the
>>     efficiency issues. On one of the boards I saw that there were efficiency
>>     issues in rpart package but they have been resolved. Or would you advise to
>>     turn off JIT on package load (package heavily uses such long functions
>>     generated whenever a new model is created)?
>>
>>     Best regards,
>>     Karol Podemski
>>
>>
>>     ______________________________________________
>>     [hidden email] <mailto:[hidden email]>  mailing list
>>     https://stat.ethz.ch/mailman/listinfo/r-devel
>


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Package compiler - efficiency problem

Karol Podemski
Dear Tomas, Inaki and the rest of R-devel team,

thank you for your explainations and suggestions. I talked with gEcon
development team and we decided to change our implementation along the
lines you suggested.

Best regards,
Karol Podemski


pt., 17 sie 2018 o 13:38 Tomas Kalibera <[hidden email]>
napisał(a):

> Dear Karol,
>
> I don't understand the models behind these function, but I can tell that
> the code generated is very inefficient. The AST interpreter will be very
> inefficient performing each scalar computation with all boxing,
> allocations, function calls. The byte-code compiler removes some of the
> boxing and allocation. While it could certainly compile faster, it will
> always be taking long compiling such functions with so many commands: so
> many expressions to track, so many source references to map, for so little
> computation. The same code could be much more efficient if it used
> vectorized operations and loops. The compiler cannot infer the loops and
> vector operations from the code - it is not that smart and I doubt it could
> easily be for R, but such optimizations could certainly be done easily at a
> higher level, when optimizing computation within the model, not within R
> with all its complicated semantics. I think you could hope for ~100x
> speedups compared to current generated code running with R AST interpreter.
>
> So I think it might be worth thinking about writing an interpreter for the
> model (the generator would compute function values on the fly, without
> actually generating code). If that was too slow, it might pay off to
> generate some intermediate representation for the model that would be
> faster to interpret. If that was too hard, then perhaps generating the code
> from the model in a smarter way (use vector operations, loops). It is ok to
> do that opportunistically - only when possible. With compilers, this is
> normal, optimizations often take advantage of certain patterns in the code
> if they are present. If you had more specific questions how to optimize the
> code feel free to ask (also offline).
>
> Certainly I don't want the existence of the byte-code compiler to require
> you to switch from R to C/C++, that would be exactly the opposite of what
> the compiler is aiming for. If it turns out you really need a way to
> disable compilation of these generated functions (so they run much slower,
> but you don't have to wait for them to compile), we will provide it and
> using a hack/workaround it is already possible in existing versions of R,
> with all the drawbacks I mentioned previously.
>
> Best
> Tomas
>
>
> On 08/17/2018 12:43 AM, Karol Podemski wrote:
>
> Dear Thomas,
>
> thank you for prompt response and taking interest in this issue. I really
> appreciate your compiler project and efficiency gains in usual case. I am
> aware of limitations of interpreted languages too and because of that even
> when writing my first mail I had a hunch that it is not that easy to
> address this problem.  As you mentioned optimisation of compiler for
> handling non-standard code may be tricky and harmful for usual code. The
> question is if gEcon is the only package that may face the same issue
> because of compilation.
>
> The functions generated by gEcon are systems of non-linear equations
> defining the equilibrium of an economy (see
> http://gecon.r-forge.r-project.org/files/gEcon-users-guide.pdf  if you
> want to learn a bit how we obtain it). The rows, you suggested to
> vectorise, are indeed vectorisable because they define equilibrium for
> similiar markets (e.g. production and sale of beverages and food) but do
> not have to be vectorisable in general case. So that not to delve into too
> much details I will stop here in description of how the equations
> originate. However, I would like to point that similiar large systems of
> linear equations may arise in other fields (
> https://en.wikipedia.org/wiki/Steady_state ) and there may be other
> packages that generate similar large systems (e.g. network problems like
> hydraulic networks). In that case, reports such as mine may help you to
> assess the scale of the problems.
>
> Thank you for suggestions for improvement in our approach, i am going to
> discuss them with other package developers.
>
> Regards,
> Karol Podemski
>
> pon., 13 sie 2018 o 18:02 Tomas Kalibera <[hidden email]>
> napisał(a):
>
>> Dear Karol,
>>
>> thank you for the report. I can reproduce that the function from you
>> example takes very long to compile and I can see where most time is spent.
>> The compiler is itself written in R and requires a lot of resources for
>> large functions (foo() has over 16,000 lines of code, nearly 1 million of
>> instructions/operands, 45,000 constants). In particular a lot of time is
>> spent in garbage collection and in finding a unique set of constants. Some
>> optimizations of the compiler may be possible, but it is unlikely that
>> functions this large will compile fast any soon. For non-generated code, we
>> now have the byte-compilation on installation by default which at least
>> removes the compile overhead from runtime. Even though the compiler is
>> slow, it is important to keep in mind that in principle, with any compiler
>> there will be functions where compilation would not be improve performance
>> (when the compile time is included or not).
>>
>> I think it is not a good idea to generate code for functions like foo()
>> in R (or any interpreted language). You say that R's byte-code compiler
>> produces code that runs 5-10x faster than when the function is interpreted
>> by the AST interpreter (uncompiled), which sounds like a good result, but I
>> believe that avoiding code generation would be much faster than that, apart
>> from drastically reducing code size and therefore compile time. The
>> generator of these functions has much more information than the compiler -
>> it could be turned into an interpreter of these functions and compute their
>> values on the fly.
>>
>> A significant source of inefficiency of the generated code are
>> element-wise operations, such as
>>
>> r[12] <- -vv[88] + vv[16] * (1 + ppff[1307])
>> ...
>>
>> r[139] <- -vv[215] + vv[47] * (1 + ppff[1434])
>>
>> (these could be vectorized, which would reduce code size and improve
>> interpretation speed; and make it somewhat readable). Most of the code
>> lines in the generated functions seem to be easily vectorizable.
>>
>> Compilers and interpreters necessarily use some heuristics or optimize at
>> some code patterns. Optimizing for generated code may be tricky as it could
>> even harm performance of usual code. And, I would much rather optimize the
>> compiler for the usual code.
>>
>> Indeed, a pragmatic solution requiring the least amount of work would be
>> to disable compilation of these generated functions. There is not a
>> documented way to do that and maybe we could add it (and technically it is
>> trivial), but I have been reluctant so far - in some cases, compilation
>> even of these functions may be beneficial - if the speedup is 5-10x and we
>> run very many times. But once the generated code included some pragma
>> preventing compilation, it won't be ever compiled. Also, the trade-offs may
>> change as the compiler evolves, perhaps not in this case, but in other
>> where such pragma may be used.
>>
>> Well so the short answer would be that these functions should not be
>> generated in the first place. If it were too much work rewriting, perhaps
>> the generator could just be improved to produce vectorized operations.
>>
>> Best
>> Tomas
>> On 12.8.2018 21:31, Karol Podemski wrote:
>>
>>  Dear R team,
>>
>> I am a co-author and maintainer of one of R packages distributed by R-forge
>> (gEcon). One of gEcon package users found a strange behaviour of package (R
>> froze for couple of minutes) and reported it to me. I traced the strange
>> behaviour to compiler package. I attach short demonstration of the problem
>> to this mail (demonstration makes use of compiler and tictoc packages only).
>>
>> In short, the compiler package has problems in compiling large functions -
>> their compilation and execution may take much longer than direct execution
>> of an uncompiled function. Such functions are generated by gEcon package as
>> they describe steady state for economy.
>>
>> I am curious if you are aware of such problems and plan to handle the
>> efficiency issues. On one of the boards I saw that there were efficiency
>> issues in rpart package but they have been resolved. Or would you advise to
>> turn off JIT on package load (package heavily uses such long functions
>> generated whenever a new model is created)?
>>
>> Best regards,
>> Karol Podemski
>>
>>
>>
>> [hidden email] mailing listhttps://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel