# For loop gets exponentially slower as dataset gets larger...


I am running R 2.1.1 in a Microsoft Windows XP environment.

I have a data set with three columns (date_, id, and price) and ~2 million rows. It contains daily prices for several thousand stocks; if a stock did not trade on a particular date, its price is set to NA. The data is sorted by stock code (id) and then by date_.

I wish to add a fourth column, next_price, defined as follows:

- If the current price is not NA, next_price is the current price.
- If the current price is NA, next_price is the next price at which the security with the same ID trades.
- If the stock does not trade again, next_price is NA.

I wrote the following loop to calculate next_price. It works as intended, but I have one problem: with only 10,000 rows of data the calculation is very fast, yet when I run the loop on the full 2 million rows it seems to take ~1 second per row.

Why is this happening? What can I do to speed up the calculation on the full 2 million rows?
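The next_price rule above can also be computed without a row-by-row loop. What follows is a minimal vectorized sketch, assuming the rows are already sorted by stock and then by date, that the stock ID and price are passed as two plain vectors, and that the IDs contain no NA. The function name `next_price_vec` is hypothetical. For every row it finds the position of the next observed price at or after that row (a "next observation carried backward"), then sets the result to NA whenever that next price belongs to a different stock. It relies only on whole-vector operations (`rev`, `cumsum`, indexing), so its cost should grow roughly linearly with the number of rows.

```r
# Hypothetical helper (not from the post): vectorized next_price.
# Assumes rows are sorted by stock ID, then date, and that id has no NA.
next_price_vec <- function(id, price) {
  n <- length(price)
  pos <- seq_len(n)
  pos[is.na(price)] <- NA             # row index of each observed price
  # For every row, find the index of the next observed price at or after it:
  # reverse, carry the last non-NA index forward, then reverse back.
  r <- rev(pos)
  ok <- !is.na(r)
  nxt <- rev(c(NA_integer_, r[ok])[cumsum(ok) + 1])
  out <- price[nxt]                   # NA where no later price exists at all
  # If the next observed price belongs to a different stock, the current
  # stock never trades again, so next_price must be NA.
  out[!is.na(nxt) & id[nxt] != id] <- NA
  out
}
```

Assuming, as the loop in the post does, that column 2 holds the stock ID and column 3 the price, usage would be something like `data$next_price <- next_price_vec(data[, 2], data[, 3])`.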
(I am not running low on memory, but my CPU is maxed out at 100%.)

Here is my code and some sample data:

```r
# Sort by stock code, then by date
data <- data[order(data$code, data$date_), ]
l <- dim(data)[1]   # number of rows
w <- 3              # column holding the price

# Last row: next_price is its own price (NA if it did not trade)
data[l, w + 1] <- data[l, w]

# Walk backwards so that row i + 1 already has its next_price when row i is processed
for (i in (l - 1):1) {
  data[i, w + 1] <- ifelse(!is.na(data[i, w]),
                           data[i, w],                # traded on this date: use this price
                           ifelse(data[i, 2] == data[i + 1, 2],
                                  data[i + 1, w + 1], # same stock: take the next row's next_price
                                  NA))                # last row of this stock: no later trade
}
```

| date | id | price | next_price |
|---|---|---|---|
| 6/24/2005 | 1635 | 444.7838 | 444.7838 |
| 6/27/2005 | 1635 | 448.4756 | 448.4756 |
| 6/28/2005 | 1635 | 455.4161 | 455.4161 |
| 6/29/2005 | 1635 | 454.6658 | 454.6658 |
| 6/30/2005 | 1635 | 453.9155 | 453.9155 |
| 7/1/2005 | 1635 | 453.3153 | 453.3153 |
| 7/4/2005 | 1635 | NA | 453.9155 |
| 7/5/2005 | 1635 | 453.9155 | 453.9155 |
| 7/6/2005 | 1635 | 453.0152 | 453.0152 |
| 7/7/2005 | 1635 | 452.8651 | 452.8651 |
| 7/8/2005 | 1635 | 456.0163 | 456.0163 |
| 12/19/2005 | 1635 | 442.6982 | 442.6982 |
| 12/20/2005 | 1635 | 446.5159 | 446.5159 |
| 12/21/2005 | 1635 | 452.4714 | 452.4714 |
| 12/22/2005 | 1635 | 451.074 | 451.074 |
| 12/23/2005 | 1635 | 454.6453 | 454.6453 |
| 12/27/2005 | 1635 | NA | NA |
| 12/28/2005 | 1635 | NA | NA |
| 12/1/2003 | 1881 | 66.1562 | 66.1562 |
| 12/2/2003 | 1881 | 64.9192 | 64.9192 |
| 12/3/2003 | 1881 | 66.0078 | 66.0078 |
| 12/4/2003 | 1881 | 65.8098 | 65.8098 |
| 12/5/2003 | 1881 | 64.1275 | 64.1275 |
| 12/8/2003 | 1881 | 64.8697 | 64.8697 |
| 12/9/2003 | 1881 | 63.5337 | 63.5337 |
| 12/10/2003 | 1881 | 62.9399 | 62.9399 |

______________________________________________
Mailing list: https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
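If you prefer to keep an explicit loop, one commonly suggested change is to run it over plain vectors and assign the result to the data frame once at the end. Assigning into a data frame one cell at a time (`data[i, w+1] <- ...`) goes through the `[<-.data.frame` method on every iteration; this is widely reported to be far slower than assigning into a plain vector, and depending on the R version it may copy the column or the whole data frame on each call, which would make the total work grow much faster than linearly with the number of rows. The sketch below keeps the same backward pass as the posted loop and assumes, like the post, that the stock ID is in column 2 and the price in column 3; the function name `next_price_loop` is hypothetical. It also uses `if`/`else` instead of `ifelse()`, the usual choice when testing a single value.

```r
# Hypothetical helper (not from the post): the same backward pass as the
# posted loop, but reading and writing plain vectors instead of a data frame.
# Assumes rows are sorted by stock ID, then date, and that id has no NA.
next_price_loop <- function(id, price) {
  n <- length(price)
  out <- rep(NA_real_, n)        # preallocate the result once
  if (n == 0) return(out)
  out[n] <- price[n]             # last row: its own price (NA if missing)
  if (n > 1) {
    for (i in (n - 1):1) {
      if (!is.na(price[i])) {
        out[i] <- price[i]       # traded on this date: use this price
      } else if (id[i] == id[i + 1]) {
        out[i] <- out[i + 1]     # same stock: carry the next value back
      }                          # otherwise the stock never trades again: NA
    }
  }
  out
}
```

The result would then be stored with a single assignment, e.g. `data$next_price <- next_price_loop(data[, 2], data[, 3])`.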