Has anyone come across situations where the export of historical data from IB's API has missing data points that occur randomly?
I did the following to download data for the Select Sector SPDR ETFs:

    library(IBrokers)
    tws <- twsConnect()
    contract <- twsEquity('XHB', 'SMART', 'ISLAND')
    XHB <- reqHistory(tws, Contract = contract)

You can split the data by day as follows:

    XHBsplit <- split(XHB, f = "days")

Then you can cycle through each element of XHBsplit and count its data points with dim(). There should be 390 one-minute data points on full trading days. (You can ignore the half day after Thanksgiving. I would delete that day from the record anyhow, as I'm not interested in trading half days.)

By default the above retrieval fetches one year's worth of minute data. I received all of the data for XLE but noticed that XHB had fewer data points. I pulled up the XHB chart to check whether those missing data points showed up there, but all was well on the TWS chart. According to IB, backfilling on the TWS chart uses the same data export framework as reqHistoricalData, so I decided to re-download XHB. This time the missing data points were different from those in the previous download.

I used IB's TwsDde Excel file to cross-verify, and the data is present via the Excel API. It could be that the problem did not surface because TwsDde limits the export of 1-minute data to two days' worth. I'm speculating here, but I do know that downloading via reqHistory produces data with missing data points that appear to occur randomly.

The other thing I noticed was that the data pulled by reqHistory begins at 9:30:00 and ends at 15:59:00, while the same data pulled with TwsDde begins at 9:31:00 and ends at 16:00:00:

    2010-06-18 15:58:00 15.85 15.85 15.84 15.84 1569 15.843 0 337
    2010-06-18 15:59:00 15.84 15.85 15.81 15.81 3518 15.828 0 527
    2010-06-21 09:30:00 16.04 16.10 16.03 16.09  240 16.047 0  47
    2010-06-21 09:31:00 16.09 16.09 16.07 16.08  226 16.081 0 119
    2010-05-17 09:31:00 18.00 18.03 18.00 18.02  115 18.020 0  39
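For readers without a TWS connection, the per-day count check described above can be sketched in base R. This is an illustrative sketch, not the poster's code: the helper names (`count_bars_per_day`, `flag_short_days`, `make_session`) are hypothetical, the timestamps are synthetic stand-ins for `index(XHB)`, and it assumes a regular 09:30-16:00 US/Eastern session of 390 one-minute bars stamped at the start of each minute.

```r
# Sketch: count 1-minute bars per trading day and flag days below the
# expected 390. Base R only; `stamps` stands in for index(XHB).
count_bars_per_day <- function(stamps, tz = "America/New_York") {
  # Group by calendar date in exchange time, not the machine's local time.
  days <- format(stamps, "%Y-%m-%d", tz = tz)
  table(days)
}

flag_short_days <- function(stamps, expected = 390L, tz = "America/New_York") {
  counts <- count_bars_per_day(stamps, tz)
  counts[counts < expected]
}

# Hypothetical helper: one full session of start-of-minute stamps, 09:30 to 15:59.
make_session <- function(day, tz = "America/New_York") {
  start <- as.POSIXct(paste(day, "09:30:00"), tz = tz)
  seq(start, by = 60, length.out = 390)
}

# Synthetic data: two full sessions, then drop two bars from the second one.
stamps <- c(make_session("2010-07-16"), make_session("2010-07-19"))
stamps <- stamps[-c(400, 500)]

short <- flag_short_days(stamps)
print(short)
```

Half days such as the day after Thanksgiving will also be flagged by this check and, as the poster says, can simply be excluded.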
On Wed, May 18, 2011 at 2:33 PM, algotr8der <[hidden email]> wrote:
> Has anyone come across situations where the export of historical data from
> IB's API has missing data points that occur randomly?
>

I'm not too sure about 'randomly', but I have seen something similar in terms of missing dates/times/contracts on occasion.

> I did the following to download data for the Select SPDR etfs:
>
> > tws <- twsConnect()
> > contract <- twsEquity('XHB','SMART','ISLAND')
> > reqHistory(tws, Contract=contract) -> XHB
>
> By default this is set to retrieve 1 years worth of minute data. I received
> all of the data for XLE but noticed that XHB has fewer data points. I pulled
> up the chart of XHB to examine whether those missing data points showed up
> on the chart but all was well on the TWS Chart. As per IB, the backfilling
> on the TWS Chart uses the same data export framework as that used by
> reqHistoricalData. So I decided to re-download XHB. This time the missing
> data points were different from the previous download.
>

reqHistory isn't much more than an lapply over/around the max download limit per call. Maybe you could send me the output of your request off-list, so I can see whether I get the same issue. Another thing that would help debugging is to run this on the IBGateway and send me a copy of the log file. setServerLogLevel(tws, 5) might do the same.

I would also argue with IB that they aren't using the same framework for the backfills: since you can do more in TWS than the API allows, something *is* different even at the user level.

> I used IB's TswDde Excel file to cross verify and I noticed that the data is
> present using the Excel API. It could be that the problem did not surface
> because TwsDde limits the export of 1 minute data to 2 days worth of data.
> I'm speculating here but I do know that downloading via reqHistory produces
> data with missing data points that occur randomly.
>

The Excel variant uses ActiveX, and I suspect it isn't really the same as the socket version (Java, IBrokers, etc.). Test using the distributed Java example program (or write one); that would be more apples to apples.

> The other thing I noticed was that the data pulled by reqHistory begins at
> 9:30:00 and ends at 15:59:00 while the same using TswDde begins at 9:31:00
> and ends at 16:00:00.
>
> 2010-06-18 15:58:00 15.85 15.85 15.84 15.84 1569 15.843 0 337
> 2010-06-18 15:59:00 15.84 15.85 15.81 15.81 3518 15.828 0 527
> 2010-06-21 09:30:00 16.04 16.10 16.03 16.09 240 16.047 0 47
> 2010-06-21 09:31:00 16.09 16.09 16.07 16.08 226 16.081 0 119
> 2010-05-17 09:31:00 18.00 18.03 18.00 18.02 115 18.020 0 39
>

This is a potential indication of the differences internal to the socket vs. ActiveX versions. From the log I get 20100611 14:59:00 as the last data stamp. That is how bars get printed by the API as well: they use the time from the start of the minute, not the following one. It is dumb, as this can introduce a lookahead bias if you aren't paying attention, and it causes havoc if you are merging with other data sources. The point is, IBrokers isn't doing anything to the timestamp; it comes from TWS/IBG that way. You can set the output to be in POSIX seconds since the epoch, though I am not sure what that would do in terms of stamps. I'll check...

Best,
Jeff

> --
> View this message in context:
> http://r.789695.n4.nabble.com/IBrokers-reqHistory-results-in-missing-random-data-tp3533694p3533694.html
> Sent from the Rmetrics mailing list archive at Nabble.com.
>
> _______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions
> should go.

--
Jeffrey Ryan
[hidden email]

www.lemnica.com
www.esotericR.com
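Jeff's point about start-of-minute stamps can be made concrete with a minimal base-R sketch (not part of the original exchange). It assumes, as described in the thread, that each socket-API bar is labelled with the start of its minute; adding one bar length relabels bars by their end time, which lines up with the 09:31-16:00 range the poster saw from TwsDde and means a bar's close is only treated as known at its label. `to_bar_end` is a hypothetical helper, not an IBrokers function.

```r
# Sketch: relabel start-of-bar timestamps (as delivered by TWS/IBG, per the
# thread) as end-of-bar timestamps before merging with other sources.
to_bar_end <- function(stamps, bar_seconds = 60) {
  # POSIXct arithmetic is in seconds, so this shifts each label by one bar.
  stamps + bar_seconds
}

tz <- "America/New_York"
# Last bar of one session and first bar of the next, as stamped by reqHistory.
start_stamps <- as.POSIXct(c("2010-06-18 15:59:00", "2010-06-21 09:30:00"), tz = tz)
end_stamps <- to_bar_end(start_stamps)
print(format(end_stamps, "%Y-%m-%d %H:%M:%S", tz = tz))
```

Shifting the other data source back by one bar would work equally well; the important thing is that every series uses a single convention before merging.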
On 5/18/11 4:25 PM, Jeffrey Ryan wrote:
> On Wed, May 18, 2011 at 2:33 PM, algotr8der <[hidden email]> wrote:
>
>> Has anyone come across situations where the export of historical data from
>> IB's API has missing data points that occur randomly?
>>
> I'm not too sure about 'randomly' but I have seen something similar in
> terms of missing dates/times/contracts on occasion.
>
>> I did the following to download data for the Select SPDR etfs:
>>
>>> tws <- twsConnect()
>>> contract <- twsEquity('XHB','SMART','ISLAND')
>>> reqHistory(tws, Contract=contract) -> XHB
>> By default this is set to retrieve 1 years worth of minute data. I received
>> all of the data for XLE but noticed that XHB has fewer data points. I pulled
>> up the chart of XHB to examine whether those missing data points showed up
>> on the chart but all was well on the TWS Chart. As per IB, the backfilling
>> on the TWS Chart uses the same data export framework as that used by
>> reqHistoricalData. So I decided to re-download XHB. This time the missing
>> data points were different from the previous download.
>>
> reqHistory isn't much more than an lapply over/around the max download limit
> per call. Maybe you could send me off-list the output of your request to
> see if I get the same issue. Another thing to help debug is to run this on
> the IBGateway - and send me a copy of the log file. setServerLogLevel(tws,
> 5) might do the same as well.
>
> I also would argue with IB that they aren't using the same framework for the
> backfills - since you can do more in the TWS than the API allows - something
> *is* different even at the user level.

I share your feeling that something *is* different between the two frameworks. I had a long discussion with one of IB's API representatives today about that but did not make much progress. I have to write up a ticket, but I thought I would investigate further first. So I executed reqHistory() using the IBGateway as you suggested. I will upload the logs in a follow-up post, as the API log is rather large and I don't want to clog people's inboxes.

This time when I downloaded XHB, the following dates (see below) had incomplete data. The number below each date is the count of individual data points present for that day. The day after Thanksgiving should be the only exception, as it is a half trading day.

    testXHB <- split(XHB, f = "days")
    N <- length(testXHB)
    for (i in seq_len(N)) {
      print(index(testXHB[[i]])[1])   # first timestamp of the day
      print(dim(testXHB[[i]])[1])     # number of 1-minute bars that day
    }

    [1] "2010-07-19 09:30:00 EDT"
    [1] 389
    [1] "2010-09-08 09:30:00 EDT"
    [1] 389
    [1] "2010-10-22 09:30:00 EDT"
    [1] 389
    [1] "2010-10-26 09:30:00 EDT"
    [1] 389
    [1] "2010-11-17 09:30:00 EST"
    [1] 389
    [1] "2010-11-26 09:30:00 EST"
    [1] 210
    [1] "2010-11-30 09:30:00 EST"
    [1] 389
    [1] "2010-12-30 09:30:00 EST"
    [1] 389
    [1] "2010-12-31 09:30:00 EST"
    [1] 389
    [1] "2011-02-14 09:30:00 EST"
    [1] 389
    [1] "2011-03-11 09:30:00 EST"
    [1] 386
    [1] "2011-04-14 09:30:00 EDT"
    [1] 389
    [1] "2011-04-25 09:30:00 EDT"
    [1] 387

>> I used IB's TswDde Excel file to cross verify and I noticed that the data
>> is present using the Excel API. It could be that the problem did not surface
>> because TwsDde limits the export of 1 minute data to 2 days worth of data.
>> I'm speculating here but I do know that downloading via reqHistory produces
>> data with missing data points that occur randomly.
>>
> The excel variant uses ActiveX - and I suspect it isn't really the same as
> the socket version (Java, IBrokers, etc). Test using the distributed Java
> example program (or write one). That would be more apples to apples.

compare to the ActiveX. Will provide further feedback.

>> The other thing I noticed was that the data pulled by reqHistory begins at
>> 9:30:00 and ends at 15:59:00 while the same using TswDde begins at 9:31:00
>> and ends at 16:00:00.
>>
>> 2010-06-18 15:58:00 15.85 15.85 15.84 15.84 1569 15.843 0 337
>> 2010-06-18 15:59:00 15.84 15.85 15.81 15.81 3518 15.828 0 527
>> 2010-06-21 09:30:00 16.04 16.10 16.03 16.09 240 16.047 0 47
>> 2010-06-21 09:31:00 16.09 16.09 16.07 16.08 226 16.081 0 119
>> 2010-05-17 09:31:00 18.00 18.03 18.00 18.02 115 18.020 0 39
>>
> This is a potential indication of the differences internal to the socket vs.
> activeX. From the log I get 20100611 14:59:00 as the last data stamp. That
> is how bars get printed by the API as well - they use the time from the
> start of the minute, not the following one. It is dumb - as this can then
> introduce a lookahead bias if you aren't aware/paying attention. Or if you
> are merging with other data sources it causes havoc as well. Point is,
> IBrokers isn't doing anything to the timestamp - it is coming from the
> TWS/IBG that way. You can set the output to be in POSIX seconds since the
> epoch, though I am not too sure what that would do in terms of stamps. I'll
> check ...
>
> Best,
> Jeff

follow-up with IB on this.
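To give IB a precise list, or to compare two downloads, it can help to enumerate which minutes are missing on a short day rather than only counting them. A minimal base-R sketch, assuming a regular session of 390 start-of-minute bars from 09:30 US/Eastern; `missing_minutes` is a hypothetical helper and the data below are synthetic, not the poster's.

```r
# Sketch: list the expected 1-minute stamps that are absent on a given day.
missing_minutes <- function(stamps, day, tz = "America/New_York",
                            open = "09:30:00", n_bars = 390L) {
  start <- as.POSIXct(paste(day, open), tz = tz)
  expected <- seq(start, by = 60, length.out = n_bars)
  got <- stamps[format(stamps, "%Y-%m-%d", tz = tz) == day]
  # Compare as numeric seconds so time-zone attributes cannot affect matching.
  expected[!(as.numeric(expected) %in% as.numeric(got))]
}

# Synthetic day with bars 70 and 170 removed (10:39 and 12:19).
tz <- "America/New_York"
full <- seq(as.POSIXct("2011-04-26 09:30:00", tz = tz), by = 60, length.out = 390)
gappy <- full[-c(70, 170)]
gaps <- missing_minutes(gappy, "2011-04-26")
print(format(gaps, "%H:%M", tz = tz))
```

For a half day, pass a smaller `n_bars`.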
|
In reply to this post by algotr8der
I must have spoken too soon. Here is what I have discovered.
1) The time issue is not present when you use TwsDde, but it is present when you use TwsActiveX. When you export historical data using TwsActiveX, the first 1-minute intraday bar occurs at 09:30:00 and the last bar at 15:59:00. The same does not occur with TwsDde.
2) The missing data issue occurs with both TwsDde and TwsActiveX.

I haven't been able to use the distributed Java API client yet because I've had technical issues, which I am trying to sort out at the moment.
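Since the gaps reportedly differed from one reqHistory pull to the next, one pragmatic workaround (not proposed in the thread, and no substitute for IB fixing the underlying issue) is to pull the same range twice, take the union of timestamps, and check that overlapping bars agree. A base-R sketch; the data frames with `time` and `close` columns and the helpers `combine_downloads` and `disagreements` are hypothetical stand-ins, not IBrokers objects.

```r
# Sketch: merge two pulls of the same series, filling each one's gaps from
# the other, and report timestamps where the two pulls disagree.
combine_downloads <- function(a, b) {
  ka <- as.numeric(a$time)
  kb <- as.numeric(b$time)
  # Bars present in the second pull but missing from the first.
  extra <- b[!(kb %in% ka), , drop = FALSE]
  out <- rbind(a, extra)
  out <- out[order(as.numeric(out$time)), , drop = FALSE]
  rownames(out) <- NULL
  out
}

disagreements <- function(a, b, col = "close") {
  idx <- match(as.numeric(a$time), as.numeric(b$time))
  both <- !is.na(idx)
  # Timestamps present in both pulls whose values differ.
  a$time[both][a[[col]][both] != b[[col]][idx[both]]]
}

# Synthetic example: ten bars; pull 1 lacks bars 3 and 7, pull 2 lacks bar 5.
tz <- "America/New_York"
times <- seq(as.POSIXct("2010-07-19 09:30:00", tz = tz), by = 60, length.out = 10)
closes <- 20 + (0:9) / 100
first <- data.frame(time = times[-c(3, 7)], close = closes[-c(3, 7)])
second <- data.frame(time = times[-5], close = closes[-5])

combined <- combine_downloads(first, second)
print(nrow(combined))
print(length(disagreements(first, second)))
```

A bar missing from both pulls stays missing, so this only narrows the problem; if the gaps really are random per request, a third pull would narrow it further.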
I am told there is a bug on IB's end. I have asked for further detail. I will provide further information as I become aware.
|
On Tue, May 24, 2011 at 10:59:35AM -0700, algotr8der wrote:
| I am told there is a bug on IB's end. I have asked for further detail. I will
| provide further information as I become aware.

You might want to try ibfetch - I use this to download historical and realtime
data from IB into csv files daily. I haven't seen issues like the ones you
describe.

http://www.gaffa.net/stuff/ibfetch-0.2.tar.gz

Example:

    $ ibfetch -i 20 -l 2 -s 20110402 -S '1 min' AUD.JPY ESc1

Kostas
For one, if this uses the sockets there would be zero difference. IBrokers is a raw translation (open to read!) of the socket protocol, and it would simply fail if it were incorrect (it is not). Unless you are running on the same data, at the same time, with the same requests, your conclusion has no basis in fact; it is nothing more than conjecture.

Second, linking to a binary(?!) without some context around it (a Google search and directories of the .net domain turn up nothing) is about as useless as saying nothing. It certainly has nothing to do with R, nor is it a solution or insight, except for those naive enough to run it. Your email and name aren't anywhere in my records of contributors to R-sig-finance, or R.

I'd suggest this has nothing to do with an R solution, and nothing to do with R at all. There are myriad ways to accomplish such requests; the non-R ones aren't suitable for the thread in question.

Best,
Jeff

On Sat, May 28, 2011 at 10:18 AM, Kostas Evangelinos <[hidden email]> wrote:

> On Tue, May 24, 2011 at 10:59:35AM -0700, algotr8der wrote:
> | I am told there is a bug on IB's end. I have asked for further detail. I will
> | provide further information as I become aware.
>
> You might want to try ibfetch - I use this to download historical and realtime
> data from IB into csv files daily. I haven't seen issues like the ones you
> describe.
>
> http://www.gaffa.net/stuff/ibfetch-0.2.tar.gz
>
> Example:
> $ ibfetch -i 20 -l 2 -s 20110402 -S '1 min' AUD.JPY ESc1
>
> Kostas

--
Jeffrey Ryan
[hidden email]

www.lemnica.com
www.esotericR.com
>> For one, if this uses the sockets there would be zero difference. IBrokers
>> is a raw translation (open to read!) of the socket protocol, and simply
>> would fail if incorrect (it is not). Unless you are running on the same
>> data, at the same time, with the same requests your conclusion has no basis
>> in fact - as it is nothing more than conjecture.

Hi Jeff, I must not have been clear in my latest post in this thread; sincere apologies for that. When I said 'I am told there is a bug on IB's end', I meant IB = Interactive Brokers, not IBrokers. Sorry for the confusion. This is something internal to Interactive Brokers, so it has nothing to do with the IBrokers R package.

>> Second, linking to a binary(?!) without some context around it (google
>> search and directories of .net domain provide nothing) is about as useless
>> as saying nothing. Certainly nothing to do with R, or even a
>> solution/insight - except for those naive enough to run it. Your email and
>> name aren't anywhere in my records of contributors to R-sig-finance, or R.

I'm not sure what you are trying to say in the above paragraph.

>> I'd suggest this has nothing to do with an R solution and nothing to do with
>> R at all. There are myriad ways to accomplish requests - all of the others
>> aren't suitable to the thread in question.

I agree that this has nothing to do with an R solution, which is why I posted an update indicating that the problem occurs with Interactive Brokers' own tools:

1) The time issue is not present when you use TwsDde, but it is present when you use TwsActiveX. When you export historical data using TwsActiveX, the first 1-minute intraday bar occurs at 09:30:00 and the last bar at 15:59:00. The same does not occur with TwsDde.
2) The missing data issue occurs with both TwsDde and TwsActiveX.

I thought that issue #2 might have been due to 'poor quality' data rather than a problem with the export mechanism. However, the Interactive Brokers technical support rep I was working with indicated on several occasions that he did not have missing data during his testing with the same tools, for the same time periods and the same symbol. This is his *claim*, and not something I can independently verify; all I can say is that in my testing there are gaps in the data exported from Interactive Brokers using their tools.

All the best.
AT
