Dealing with large datasets in quantmod


Dealing with large datasets in quantmod

Gabriele Vivinetto [Public address]
Hello to the mailing list.
I'm a newbie in R, and this is my first post.
I've evaluated quantmod using EOD data from Yahoo, and everything went fine.
I have a MySQL database containing tick-by-tick data (in a format
suitable for quantmod's getSymbols.MySQL) and I have tried to use these data.
Using a table with a small subset of the data (1,000 rows) there is no
problem.
But if I try to use a table with all the records (~6 million rows), R is
very slow and memory hungry (simply put, it crashes every time
after loading the data...).
As a workaround I've modified the getSymbols.MySQL function to accept
from= and to= parameters, so the SQL SELECT gives R only a desired subset
of the data, but using more than 100k records is still a pain.
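(For illustration, a minimal standalone sketch of the same idea, pushing the
date filter into the SELECT with DBI/RMySQL rather than modifying
getSymbols.MySQL itself; the table and column names below are placeholders,
not real ones:)

## Sketch: push the date filter into the SQL so only the needed rows
## ever reach R. "tickdata" and its columns ("tradetime", "price",
## "volume") are made-up names.
library(DBI)
library(RMySQL)
library(xts)

con <- dbConnect(MySQL(), dbname = "ticks", user = "user",
                 password = "pass", host = "localhost")

from <- "2011-12-01"
to   <- "2012-01-01"

qry <- sprintf("SELECT tradetime, price, volume FROM tickdata
                WHERE tradetime >= '%s' AND tradetime < '%s'
                ORDER BY tradetime", from, to)

df <- dbGetQuery(con, qry)
dbDisconnect(con)

## Build an xts object from the subset only
x <- xts(df[, c("price", "volume")],
         order.by = as.POSIXct(df$tradetime, tz = "UTC"))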
Does anyone have a workaround or suggestions for using large datasets with
quantmod?

Thank you!
--
Gabriele Vivinetto
http://www.gabrielevivinetto.it


Re: Dealing with large datasets in quantmod

braverock
On Thu, 2012-01-12 at 19:46 +0100, Gabriele Vivinetto [Public address]
wrote:

> Hello to the mailing list.
> I'm a newbie in R, and this is my first post.
> I've evaluated quantmod using EOD data from Yahoo, and everything went fine.
> I have a MySQL database containing tick-by-tick data (in a format
> suitable for quantmod's getSymbols.MySQL) and I have tried to use these data.
> Using a table with a small subset of the data (1,000 rows) there is no
> problem.
> But if I try to use a table with all the records (~6 million rows), R is
> very slow and memory hungry (simply put, it crashes every time
> after loading the data...).
> As a workaround I've modified the getSymbols.MySQL function to accept
> from= and to= parameters, so the SQL SELECT gives R only a desired subset
> of the data, but using more than 100k records is still a pain.
> Does anyone have a workaround or suggestions for using large datasets with
> quantmod?
>
> Thank you!

I routinely use xts on tick data series with tens or hundreds of
millions of rows. I also have a lot of RAM (16-48 GB) per machine.

Some things that will affect how much RAM the xts object uses are the
number of columns in your data and whether you are using a numeric or
character xts object.

We just ran a quick test here: 17.5M rows in one column take about a
third of a GB of RAM.
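(A rough way to reproduce that kind of estimate on synthetic data, using only
base R and xts; one numeric column plus the POSIXct index is two doubles, i.e.
16 bytes, per row:)

## Synthetic check of xts memory use for one numeric column:
## ~17.5M values of data (8 bytes each) plus the POSIXct index (8 bytes each)
library(xts)

n   <- 17.5e6
idx <- seq(as.POSIXct("2011-01-01", tz = "UTC"), by = 1, length.out = n)
x   <- xts(rnorm(n), order.by = idx)

print(object.size(x), units = "MB")   # on the order of a third of a GB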

In a 32-bit environment, R is limited to about 3 GB of RAM, so this may
be part of your problem.
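(A quick way to check which build you are running, using standard base R:)

## Standard base-R checks: 8 means a 64-bit build, 4 means 32-bit
.Machine$sizeof.pointer
R.version$arch          # e.g. "x86_64" on 64-bit builds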

Lastly, you don't say which functions you are calling that respond slowly.

Regards,

   - Brian

--
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock


Re: Dealing with large datasets in quantmod

Paul Gilbert


On 12-01-12 01:46 PM, Gabriele Vivinetto [Public address] wrote:

> Hello to the mailing list.
> I'm a newbie in R, and this is my first post.
> I've evaluated quantmod using EOD data from Yahoo, and everything went fine.
> I have a MySQL database containing tick-by-tick data (in a format
> suitable for quantmod's getSymbols.MySQL) and I have tried to use these data.
> Using a table with a small subset of the data (1,000 rows) there is no
> problem.
> But if I try to use a table with all the records (~6 million rows), R is
> very slow and memory hungry (simply put, it crashes every time
> after loading the data...).

You first need to determine whether this is the MySQL connection or your
memory, OS, and R version. If you are not running a 64-bit version of R,
you may simply be hitting the limit of 32-bit machines. If you are not
running on a machine with a lot of physical memory, then you will be
using swap, which will be slow. (These limits might also bite on the
server side.) You should probably monitor with something like top to get
a better idea of what is going wrong. If it really is the MySQL
connection that is the problem, then you may need to look at the size of
the chunks returned by the request.
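(For example, one way to experiment with chunk sizes is to fetch the result
set incrementally through DBI rather than in one shot; the table and column
names below are placeholders, as above:)

## Fetch the result set in fixed-size chunks so memory use grows gradually
## and can be watched (e.g. with top) while the query runs.
library(DBI)
library(RMySQL)

con <- dbConnect(MySQL(), dbname = "ticks", user = "user",
                 password = "pass", host = "localhost")

res <- dbSendQuery(con,
  "SELECT tradetime, price, volume FROM tickdata ORDER BY tradetime")

chunks <- list()
repeat {
  chunk <- dbFetch(res, n = 100000)        # 100k rows per fetch
  if (nrow(chunk) == 0) break
  chunks[[length(chunks) + 1]] <- chunk
}
dbClearResult(res)
dbDisconnect(con)

df <- do.call(rbind, chunks)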

Regards,
Paul

> As a workaround I've modified the getSymbols.MySQL function to accept
> from= and to= parameters, so the SQL SELECT gives R only a desired subset
> of the data, but using more than 100k records is still a pain.
> Does anyone have a workaround or suggestions for using large datasets with
> quantmod?
>
> Thank you!
