Quantstrat - running applyStrategy in a loop

James Hirschorn-3
I plan to try it out myself, but I wanted to check here if running
applyStrategy in a loop, while looping over different dates, will work? I
could not find any examples of this.

There are 2 reasons for wanting to do this: First of all, one could have a
couple of years of tick data, which is too big to fit in memory for each
symbol. Of course, I am assuming that the orders placed by the strategy are
sparse enough so that the order_book generated by applyStrategy can still
fit in memory.

The second reason is that if this loop could moreover be run in parallel,
then there could potentially be a 500x speed up for two years of data.

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.

Re: Quantstrat - running applyStrategy in a loop

Ilya Kipnis
So I'll let others correct me if I'm wrong, but the way I see it is so long
as you can encapsulate your script in a function, you can wrap it inside a
foreach loop, and return the results of each iteration, or just output
various files as part of the instructions of said function.

Re: Quantstrat - running applyStrategy in a loop

braverock
On Sun, 2018-08-19 at 17:16 -0400, James Hirschorn wrote:

> I plan to try it out myself, but I wanted to check here if running
> applyStrategy in a loop, while looping over different dates, will
> work? I could not find any examples of this.
>
> There are 2 reasons for wanting to do this: First of all, one could
> have a couple of years of tick data, which is too big to fit in
> memory for each symbol. Of course, I am assuming that the orders
> placed by the strategy are sparse enough so that the order_book
> generated by applyStrategy can still fit in memory.
>
> The second reason is that if this loop could moreover be run in
> parallel, then there could potentially be a 500x speed up for two
> years of data.

James,

The answer is 'it depends'.

There is a parallel version of applyStrategy in the sandbox on github.
I haven't touched it in several years, so I wouldn't trust that code. I
mention it as an example of what is theoretically possible.  A better
example, which is already parallelized and much more highly utilized,
is apply.paramset().

First, to expand on Ilya's answer, let's talk about what *is* possible.

It is possible to wrap a foreach loop over applyStrategy that would
separate symbols to different workers (though your hypothesized 500x
speedup would require *at least* 500 worker nodes, spread out over
several physical machines, using something like doRedis, which we have
tested up to around 200 workers).  This assumes that each symbol is
completely independent, and that there is no interaction on things like
trade sizing or capital or risk among the symbols.  The simplest way to
do this would be to create separate portfolios per symbol, so that each
worker is completely independent.  See examples of a different kind of
splitting and parallelization in apply.paramset() (which is also used
in walk forward testing).
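
A minimal sketch of the per-symbol split, assuming the symbols really are
independent. It is not code from quantstrat: run_per_symbol() and run_one()
are hypothetical names, and base R's parallel::mclapply() stands in for the
foreach loop so that the example runs without extra packages. The actual
strategy run is left as a placeholder.

```r
library(parallel)

# Run one user-supplied function per symbol, each with its own portfolio name.
# `run_one` is a hypothetical callback; in a real run it would initialise a
# separate portfolio/account/order book for `portfolio` (initPortf(),
# initAcct(), initOrders()) and then call applyStrategy() for that symbol.
run_per_symbol <- function(symbols, run_one, cores = 1L) {
  results <- mclapply(symbols, function(sym) {
    # One portfolio per symbol keeps the workers fully independent.
    run_one(symbol = sym, portfolio = paste0("portf.", sym))
  }, mc.cores = cores)  # mc.cores > 1 forks workers (not available on Windows)
  names(results) <- symbols
  results
}

# Placeholder worker: only reports what it was asked to run.
# Workers are separate processes, so anything needed afterwards
# must be returned as a value rather than left in worker-side state.
demo_run_one <- function(symbol, portfolio) {
  list(symbol = symbol, portfolio = portfolio)
}

demo_results <- run_per_symbol(c("AAA", "BBB"), demo_run_one, cores = 1L)
```

Each worker only sees its own symbol and portfolio name, which is the
independence assumption described above. Any shared sizing, capital, or risk
logic would break it.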

It is also possible, and we commonly do this, to segment the dates that
you want to run applyStrategy over.  As you hypothesized, a simple loop
over date regions, loading different non-conflicting time series, may
be applied to successively run each date range.  This, as you noted,
works well when even 64, 128, or 512GB+ of RAM is not enough for all of
your data.  We've made a number of changes over the years to make
quantstrat more memory efficient, but copies are still made when
unavoidable, state is kept between the various nested apply* functions,
and RAM use basically grows throughout the run of a strategy
evaluation.  So segmenting the use of market data by Dates can help,
though you may need to discard some intermediary results (like portions
of the order book) to make everything fit.  
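
A minimal sketch of such a date-segmented loop, in base R only. The names
make_date_chunks(), run_in_segments() and run_chunk() are hypothetical;
run_chunk() is where loading one segment of market data and calling
applyStrategy() would go. Segments are run strictly in date order.

```r
# Split [from, to] into consecutive, non-overlapping date ranges.
# Assumes `from` falls early enough in the month that stepping by month is
# well defined (e.g. the 1st).
make_date_chunks <- function(from, to, by = "month") {
  from <- as.Date(from)
  to <- as.Date(to)
  starts <- seq(from, to, by = by)
  ends <- c(starts[-1] - 1, to)  # each chunk ends the day before the next starts
  data.frame(start = starts, end = ends)
}

# Run `run_chunk` on each segment in chronological order, keeping only the
# (small) value it returns so the segment's market data can be freed.
run_in_segments <- function(chunks, run_chunk) {
  results <- vector("list", nrow(chunks))
  for (i in seq_len(nrow(chunks))) {
    results[[i]] <- run_chunk(chunks$start[i], chunks$end[i])
    gc()  # prompt R to release memory held by the finished segment
  }
  results
}

# Placeholder for "load data for this range, then run the strategy".
demo_run_chunk <- function(start, end) {
  data.frame(start = start, end = end, n_days = as.integer(end - start) + 1L)
}

chunks <- make_date_chunks("2018-01-01", "2018-03-31")
segment_results <- run_in_segments(chunks, demo_run_chunk)
```

Returning a compact summary per segment, rather than the full market data or
order book, is what keeps memory bounded across the loop.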

In the first example of parallelizing by symbol, RAM is your most
likely issue still, since even very large machines rarely have more
than about 16GB per core/thread.

You still have some wrinkles here.  Again, you need to assess whether
there is any interaction.  Transactions cannot be added to a portfolio
out of order, as the P&L is (potentially) dependent on prior
transactions.  So you may again need to create multiple portfolios and
stitch the different period P&L together yourself.
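
A minimal sketch of the stitching step, assuming each period was run in its
own portfolio, the periods do not overlap, and no position is carried across
a boundary, so that period P&L is simply additive. stitch_pnl() and the
date/pnl column names are hypothetical, not part of blotter or quantstrat.

```r
# Combine per-period P&L tables (columns: date, pnl) into one ordered series.
stitch_pnl <- function(pnl_list) {
  all_pnl <- do.call(rbind, pnl_list)
  all_pnl <- all_pnl[order(all_pnl$date), , drop = FALSE]
  # Overlapping periods would double count P&L, so refuse them outright.
  if (anyDuplicated(all_pnl$date)) {
    stop("overlapping periods: duplicate dates found")
  }
  all_pnl$cum_pnl <- cumsum(all_pnl$pnl)  # running total across all periods
  rownames(all_pnl) <- NULL
  all_pnl
}

# Two illustrative periods, deliberately supplied out of order.
p1 <- data.frame(date = as.Date(c("2018-01-02", "2018-01-03")), pnl = c(10, -4))
p2 <- data.frame(date = as.Date("2018-02-01"), pnl = 5)
stitched <- stitch_pnl(list(p2, p1))
```

Sorting happens on the already-computed period results, not on transactions,
so this does not run into the ordering restriction on adding transactions to
a single portfolio.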

So, in the 'don't do that' camp: don't try to apply transactions out of
order; the trade blotter won't allow it.

In the 'should work' camp are several variations of splitting your
computational problem so that it is amenable to looping and/or
parallelization, described above.

Regards,

Brian

--
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock

Re: Quantstrat - running applyStrategy in a loop

James Hirschorn-3
Thanks for the very detailed reply!

I couldn't find a parallel applyStrategy in the sandbox. Is it still there,
and if so what is the filename?

In any case, if I understood correctly it should instead be modelled on
apply.paramset().

You mentioned that you commonly segment the dates that you run
applyStrategy over. If you have an example, could you please point it out?

I will report back once I attempt a parallelization.

Yes, I have run into problems with out-of-order transactions in two
different situations. One of them was when using the delay argument of
ruleSignal, as you had suggested in an SO answer. But that is off topic for
this thread...

Regards,
James
