# Loop: noob question


## Loop: noob question

Hi,

Can someone help me out to create a (for?) loop for the following procedure:

    x=rnorm(250,0,0.02)
    library(timeSeries)
    x=timeSeries(x)
    P=1000
    loss=-P*x
    loss
    v95=quantile(loss,0.95)
    v95

I would like to generate 100 000 v95 values.
Also, can anybody name a source where I could read up on basic programming in R?

Thank you.

## Re: Loop: noob question

This is a textbook example of when NOT to use a loop in R: rather, make a function that does what you want and use the replicate function to do it repeatedly.

    f <- function() {
      # 0.95 quantile of the loss -1000 * x, taken via the 0.05 quantile of x
      -1000 * quantile(rnorm(250, 0, 0.02), 0.05)
    }

    x <- replicate(1e5, f())

There are your desired numbers.

Some general coding principles: firstly, you don't need to convert to a time series: quantile immediately undoes that, so you've just wasted the time doing the coercions in each direction. Secondly, for a positive constant c the quantiles of c*X are exactly c times the quantiles of X, so you can cut down on the multiplications needed by doing the multiplication after taking the quantile. Watch the sign, though: a negative multiplier flips the probability level, so the 0.95 quantile of -1000*x is -1000 times the 0.05 quantile of x.

Specific to R: don't use loops unless entirely necessary (and it's hardly ever necessary) -- replicate is great for repeated operations, apply is great for looping "over" things.

More broadly, what do you intend to do with the v95 values? There are probably much more efficient ways to get the desired numbers, or even closed-form results. I believe this idea is widely studied as VaR (value at risk) in finance.

Feel free to write back if we can be of more help.

Hope this helps,

Michael Weylandt

On Fri, Aug 5, 2011 at 11:35 AM, UnitRoot <[hidden email]> wrote:
> Hi,
> Can someone help me out to create a (for?) loop for the following
> procedure:
> [snip]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
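For readers who want to check the numbers, here is a minimal sketch of the replicate() approach under the question's assumptions (250 draws from a normal with sd 0.02, P = 1000). The helper name `one_v95`, the seed, and the reduced repetition count `n_rep` are illustrative choices, not something from the thread. Because the loss is just a scaled normal, its theoretical 0.95 quantile is 1000 * 0.02 * qnorm(0.95), which gives a rough sanity check on the simulated values.

```r
set.seed(42)    # arbitrary seed, only for reproducibility
P <- 1000
n_rep <- 2000   # reduced from 1e5 so the sketch runs quickly

# One simulated 0.95 quantile of the loss -P * x
one_v95 <- function() {
  x <- rnorm(250, mean = 0, sd = 0.02)
  unname(quantile(-P * x, 0.95))
}

v95 <- replicate(n_rep, one_v95())

# Closed-form 0.95 quantile of loss ~ N(0, (P * 0.02)^2)
v95_theory <- P * 0.02 * qnorm(0.95)

# A negative multiplier flips the probability level:
# the 0.95 quantile of -P * x equals -P times the 0.05 quantile of x
x <- rnorm(250, 0, 0.02)
flip_ok <- isTRUE(all.equal(unname(quantile(-P * x, 0.95)),
                            -P * unname(quantile(x, 0.05))))

c(simulated_mean = mean(v95), theory = v95_theory)
```

The simulated mean should land near the theoretical value, though a sample quantile from only 250 draws is slightly biased, so an exact match is not expected.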

## Re: Loop: noob question

In reply to this post by UnitRoot

Hey, no problem! We all have to start somewhere, welcome to R!

The structure of the for loop is as follows. First, let's define the number you want, say

    vector.size = 100000

Then allocate an empty vector to store the results. You could do this like so:

    V95.Vector <- c()

But I recommend you do it like so:

    V95.Vector = rep(NA, vector.size)

Basically what this does is make a vector of NA's (of length vector.size) for you to store into; that way, if something goes wrong in your loop it will be easier to catch. In my opinion this is better to use when you know the size of your vector from the start, which you do.

Now, let's allocate a variable for the counting, say 'loop.count'. So we have:

    library(timeSeries)  # load once, before the loop: once read in, it's there to stay,
                         # and calling it on every pass would make the loop much slower
    P = 1000             # also set before the loop, since it doesn't change

    for (loop.count in 1:vector.size) {  # count from 1 to the number we defined, vector.size
      x = rnorm(250, 0, 0.02)            # generate data
      x = timeSeries(x)                  # convert to a timeSeries object
      loss = -P * x                      # calc
      v95 = quantile(loss, 0.95)         # quantile
      V95.Vector[loop.count] = v95       # store into the index of this loop
    }

And that should be it. As far as relevant reading goes, Peter Dalgaard's Introductory Statistics with R is very good if you do not know other programming languages. Crawley's R Book is the Bible as it were, and is very, very good. Electronic copies are available.

Good luck!
Ken

On Fri, Aug 5, 2011 at 11:35 AM, UnitRoot <[hidden email]> wrote:
> Hi,
> Can someone help me out to create a (for?) loop for the following
> procedure:
> [snip]
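As a compact variant of the loop above, the sketch below preallocates with `numeric()` and skips the timeSeries conversion, which another reply in this thread points out is not needed for `quantile()`. The names `n_rep` and `v95`, the seed, and the reduced repetition count are assumptions made for illustration only.

```r
set.seed(1)      # arbitrary seed for reproducibility
n_rep <- 1000    # reduced from 100000 to keep the run short
P <- 1000

v95 <- numeric(n_rep)              # preallocate the result vector
for (i in seq_len(n_rep)) {        # seq_len() is safe even when n_rep is 0
  x <- rnorm(250, mean = 0, sd = 0.02)
  v95[i] <- quantile(-P * x, 0.95) # plain numeric vector, no timeSeries
}

summary(v95)
```

Preallocating matters more as `n_rep` grows, because growing a vector one element at a time forces repeated copying.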

## Re: Loop: noob question

In reply to this post by Michael Weylandt

Michael:

I'm sorry, but this "advice" is wrong. replicate() **IS** essentially a loop: it uses sapply(), which is basically an interpreted loop (with suitable caveats that R experts can provide).

The correct advice is: whenever possible, move the loops down to underlying C code by vectorizing. In the example below, one can partly do this by removing the random number generation from the loop and structuring the result as a matrix:

    rmat <- matrix(rnorm(2.5e7, nrow=250))  ## each column is a sample of 250

Now one must use apply(), which is itself a loop, for quantile():

    out <- apply(rmat,2,quantile, probs = .95)

Unfortunately, in this case, it won't make much difference, as random number generation is very fast anyway and the looping for quantile() is the same either way. In fact, both versions took almost the same time on my computer (replicate() was actually 3 seconds faster --- 48 vs 51 seconds -- perhaps because sapply() is slightly more efficient than apply()).

As here (I think), one often cannot vectorize and must loop in interpreted code, and for this replicate() is fine and yields nice clean code. But it's still a loop.

Cheers,
Bert

On Fri, Aug 5, 2011 at 9:15 AM, R. Michael Weylandt <[hidden email]> wrote:
> This is a textbook of when NOT to use a loop in R: rather make a function
> that does what you want and use the replicate function to do it repeatedly.
> [snip]

--
"Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions."
-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
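To make the "replicate() is a loop" point concrete, here is a small sketch. The first half compares replicate() with an explicit sapply() call under the same seed; the second half runs the matrix-plus-apply() variant at a reduced size. The seed, the small sizes, and the variable names are illustrative assumptions, not values from the thread.

```r
# replicate() evaluates its expression once per repetition, via sapply()
set.seed(123)
r1 <- replicate(5, mean(rnorm(10)))
set.seed(123)
r2 <- sapply(integer(5), function(i) mean(rnorm(10)))
same_loop <- identical(r1, r2)   # same draws, same order, same result

# Matrix variant at reduced size: one rnorm() call, then a loop over columns
n_rep <- 2000
rmat <- matrix(rnorm(250 * n_rep), nrow = 250)   # each column is one sample
out <- apply(rmat, 2, quantile, probs = 0.95)

same_loop
length(out)
```

Only the random number generation is vectorized in the second half; quantile() is still called once per column from interpreted code.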

## Re: Loop: noob question

In reply to this post by Ken Hutchison

On Fri, Aug 5, 2011 at 9:20 AM, Ken H <[hidden email]> wrote:
[snip]
> And that should be it, as far as relevant reading
> Peter Dalgaard's Introductory Statistics with R is very good if you do not
> know other programming languages.

I would strongly second this. It is a very nice book. What book to read depends a bit on exactly what your goals are---data manipulation? Statistics (and then what kind)? Programming? etc.

For statistics beyond Peter Dalgaard's book, I like John Fox's Applied Regression with Companion to Applied Regression (which uses R and is also the 'car' package).

I have been pretty happy with Phil Spector's book Data Manipulation with R.

For graphics in R I would suggest ggplot2 by Hadley Wickham or lattice by Deepayan Sarkar (they are both books and packages).

For programming I would look at S Programming by Venables & Ripley.

> Crawley's R Book is the Bible as it were, and is very very good. Electronic
> copies are available.

The R Book is very large, but it has some problems in my opinion. It uses some styles that are often okay but can cause problems (e.g., using attach, using function names for data). I would turn elsewhere first. All of the other books I recommended (except Data Manipulation with R) are written by people who also develop and maintain R Core or substantial R packages (i.e., they are experts in what they are talking about).

Cheers,

Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

## Re: Loop: noob question

In reply to this post by Bert Gunter

Bert,

You are absolutely correct: I was wrong not to vectorize in this case.

I am surprised, however, by your remark that sapply() (or really lapply()) is faster than apply() -- is there a reason for this? I would have guessed that the major difference between the two would have been memory management, since replicate only holds 250 numbers at a time, rather than 2.5e7. I don't know how memory management is implemented on the C level for R, so I might be entirely wrong about that though.

For UnitRoot: if you go with Bert's code, there's a little typo:

    rmat <- matrix(rnorm(2.5e7, nrow=250))  ## each column is a sample of 250

Move a parenthesis:

    rmat <- matrix(rnorm(2.5e7), nrow=250)  ## each column is a sample of 250

Michael Weylandt

On Fri, Aug 5, 2011 at 12:51 PM, Bert Gunter <[hidden email]> wrote:
> Michael:
>
> I'm sorry, but this "advice" is wrong. replicate() **IS** essentially
> a loop: it uses sapply(), which is basically an interpreted loop (with
> suitable caveats that R experts can provide).
> [snip]
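The memory question can be put in rough numbers. The sketch below is back-of-the-envelope arithmetic only: it assumes 8 bytes per double and ignores R's per-object overhead and any temporary copies, so actual usage will be somewhat higher. The variable names are made up for illustration.

```r
bytes_per_double <- 8
n_per_sample <- 250
n_rep <- 1e5

# Full matrix: all 250 * 1e5 = 2.5e7 draws held at once
full_mib <- bytes_per_double * n_per_sample * n_rep / 2^20

# One sample at a time: only 250 draws are live at any moment
one_kib <- bytes_per_double * n_per_sample / 2^10

cat(sprintf("full matrix: about %.1f MiB\n", full_mib))
cat(sprintf("one sample:  about %.2f KiB\n", one_kib))
```

On this estimate the matrix needs roughly 190 MiB before apply() does any work, while the replicate() version keeps only a couple of KiB of draws alive per iteration plus the growing vector of results.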

## Re: Loop: noob question

Inline below.

On Fri, Aug 5, 2011 at 10:23 AM, R. Michael Weylandt <[hidden email]> wrote:
> Bert,
>
> You are absolutely correct: I was wrong not to vectorize in this case.

No. That wasn't my point at all. In this case, vectorizing doesn't seem to help because you still must do a loop (via *ply) in R. My point was only that replicate() is another form of a loop and that, when possible, vectorizing is preferable. Here I don't think it's possible. However, I interpreted your advice as a general prescription, and that was the problematic part.

> I am surprised, however, by your remark that sapply() (or really lapply())
> is faster than apply() -- is there a reason for this?

I apologize for being unclear. I **do not know** this to be true, although one might guess it to be so by looking at the apply() code, which has some overhead in it before it runs its own R-level loop. However, I defer to experts for confirmation -- or not.

Cheers,
Bert

> [snip]
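One way to settle the apply() versus sapply()/vapply() question for a given machine is simply to time both on the same data. The sketch below does that at a reduced size; the seed, the sizes, and the variable names are assumptions for illustration, and the timings it prints will differ from run to run and from machine to machine.

```r
set.seed(7)
n_rep <- 2000
rmat <- matrix(rnorm(250 * n_rep), nrow = 250)   # each column is one sample

t_apply <- system.time(
  q_apply <- apply(rmat, 2, quantile, probs = 0.95)
)
t_vapply <- system.time(
  q_vapply <- vapply(seq_len(ncol(rmat)),
                     function(j) quantile(rmat[, j], probs = 0.95),
                     FUN.VALUE = numeric(1))
)

# Both routes loop over columns in interpreted code and agree numerically
same_result <- isTRUE(all.equal(unname(q_apply), unname(q_vapply)))
rbind(apply = t_apply, vapply = t_vapply)
```

Since both versions call quantile() once per column, any difference in the timings reflects bookkeeping around the loop rather than the quantile computation itself.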

## Re: Loop: noob question

In reply to this post by Joshua Wiley-2

That's a good point; Josh is correct. It's the R Bible because it's the size of the Bible and serves as a very good reference. I agree that it is definitely not a first-blush kind of book. I second the regression book, it is excellent.

Cryer and Chan's Time Series Analysis with Applications in R is pretty good if you're into that kind of thing...

I did not know about the data manipulation or graphics books; I'll definitely be checking those out.

Thanks for the info,
Ken

On Fri, Aug 5, 2011 at 1:16 PM, Joshua Wiley <[hidden email]> wrote:
> I would strongly second this.  It is a very nice book.  What book to
> read depends a bit exactly what your goals are---data manipulation?
> Statistics (and then what kind)?  Programming?  etc.
> [snip]

## Re: Loop: noob question

I think that Josh may have inadvertently left out Venables and Ripley's MASS (the book), which is what I would choose if I were marooned on a desert island with my computer, a power supply, and unlimited mai tais. Also Mozart's piano concerti, if I had to limit my music to just one composer ( :-) )

I think John Chambers's latest book is also worth a look. Hard to find a more authoritative author than he (or V & R, for that matter).

-- Bert

On Fri, Aug 5, 2011 at 10:50 AM, Ken H <[hidden email]> wrote:
> That's a good point Josh is correct,
> Its the R Bible because its the size of the Bible and serves as a very good
> reference.
> [snip]

Bert Gunter
Genentech Nonclinical Biostatistics

## Re: Loop: noob question

 In reply to this post by Bert Gunter I don't think the "use replicate" answer is necessarily "wrong". Making a matrix of all the random numbers you plan on using and then using apply on it requires that you have memory for all those random numbers.  Generating a batch of random numbers, running quantile on them, then discarding the random numbers (saving only the output of quantile) will save quite a bit of memory and lets you increase the number of repetitions more than the first method allows.  E.g., here are three ways of doing the same simulation with 1000 reps:  > set.seed(1) ; t0 <- system.time(z0 <- replicate(10^3, quantile(rnorm(10^4), c(0.25, 0.75))))  > set.seed(1) ; t1 <- system.time(z1 <- vapply(seq_len(10^3), function(i)quantile(rnorm(10^4), c(0.25, 0.75)), FUN.VALUE=numeric(2)))  > set.seed(1) ; t2 <- system.time(z2 <- apply(matrix(rnorm(10^4 * 10^3), ncol=10^3), 2, function(col)quantile(col, c(0.25,0.75))))  > identical(z0,z1)  [1] TRUE  > identical(z0,z2)  [1] TRUE  > rbind(t0, t1, t2)     user.self sys.self elapsed user.child sys.child  t0      2.53     0.01    3.92         NA        NA  t1      2.33     0.00    3.22         NA        NA  t2      2.94     0.10    3.85         NA        NA Their times are not much different.  
However, the matrix approach fails on my 32-bit Windows machine when you ask for 10000 repetitions:  > set.seed(1) ; t0 <- system.time(z0 <- replicate(10^4, quantile(rnorm(10^4), c(0.25, 0.75))))  > set.seed(1) ; t1 <- system.time(z1 <- vapply(seq_len(10^4), function(i)quantile(rnorm(10^4), c(0.25, 0.75)), FUN.VALUE=numeric(2)))  > set.seed(1) ; t2 <- system.time(z2 <- apply(matrix(rnorm(10^4 * 10^4), ncol=10^4), 2, function(col)quantile(col, c(0.25,0.75))))  Error: cannot allocate vector of size 762.9 Mb  In addition: Warning messages:  1: In matrix(rnorm(10^4 * 10^4), ncol = 10^4) :    Reached total allocation of 1535Mb: see help(memory.size)  2: In matrix(rnorm(10^4 * 10^4), ncol = 10^4) :    Reached total allocation of 1535Mb: see help(memory.size)  3: In matrix(rnorm(10^4 * 10^4), ncol = 10^4) :    Reached total allocation of 1535Mb: see help(memory.size)  4: In matrix(rnorm(10^4 * 10^4), ncol = 10^4) :    Reached total allocation of 1535Mb: see help(memory.size)  Timing stopped at: 18.25 0.3 21.8  > identical(z0,z1)  [1] TRUE  > rbind(t0, t1)     user.self sys.self elapsed user.child sys.child  t0     23.95     0.02   30.52         NA        NA  t1     25.90     0.01   30.75         NA        NA (I like vapply() because it throws an error if your function returns something other than what you said it would.  It also gives a reasonable result when the input has length zero.) This list gets a lot of questions on the most efficient ways of doing things.  The answer often depends on the size, shape, or other nature of the data (or machine) you are working on. I see a lot of requests for loopless solutions.  Here is one for the above problem, but it takes about 5 times as long to run as the above three loopy solutions.   

    > set.seed(1) ; t3 <- system.time({ tmp <- matrix(rnorm(10^4 * 10^3), ncol=10^3);
                                        tmp[] <- tmp[order(col(tmp), tmp)] ;
                                        z3 <- tmp[round(c(1/4,3/4)*nrow(tmp)),]})

The bottom line is that if you are concerned with efficiency, you ought to learn how to measure it and get used to running tests. Seeing how the run time scales with various measures of your data is important. system.time() is a good tool in base R and the rbenchmark package makes it easy to compare various approaches.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of Bert Gunter
> Sent: Friday, August 05, 2011 9:51 AM
> To: R. Michael Weylandt
> Cc: [hidden email]; UnitRoot
> Subject: Re: [R] Loop: noob question
>
> Michael:
>
> I'm sorry, but this "advice" is wrong. replicate() **IS** essentially a loop: it uses sapply(), which is basically an interpreted loop (with suitable caveats that R experts can provide).
>
> The correct advice is: whenever possible, move the loops down to underlying C code by vectorizing. In the example below, one can partly do this by removing the random number generation from the loop and structuring the result as a matrix:
>
> rmat <- matrix(rnorm(2.5e7), nrow=250)  ## each column is a sample of 250
>
> Now one must use apply() -- which is a loop -- for quantile()
>
> out <- apply(rmat,2,quantile, probs = .95)
>
> Unfortunately, in this case, it won't make much difference, as random number generation is very fast anyway and the looping for quantile() is the same either way. In fact, both versions took almost the same time on my computer (replicate() was actually 3 seconds faster --- 48 vs 51 seconds -- perhaps because sapply() is slightly more efficient than apply() ).
>
> As here (I think), one often cannot vectorize and must loop in interpreted code, and for this replicate() is fine and yields nice clean code. But it's still a loop.
>
> Cheers,
> Bert
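
As a rough sketch of the batch-and-discard approach described above, applied to the original v95 question: each call draws one sample of 250, keeps a single quantile, and drops the sample. The names `n_reps`, `n_obs`, `one_v95` and `matrix_mb` are made up for illustration, and `n_reps` is kept small here so the example runs quickly; this is one possible arrangement, not the only reasonable one.

```r
set.seed(1)
n_reps <- 1000   # hypothetical; the original question asked for 1e5
n_obs  <- 250    # observations per sample, as in the original post
P      <- 1000   # position size, as in the original post

# Draw one sample, return only its 95% loss quantile; the sample itself
# becomes garbage as soon as the function returns.
one_v95 <- function(i) {
  x <- rnorm(n_obs, mean = 0, sd = 0.02)
  unname(quantile(-P * x, 0.95))
}

# system.time() wraps the whole run so the cost can be compared across sizes.
timing <- system.time(
  v95 <- vapply(seq_len(n_reps), one_v95, FUN.VALUE = numeric(1))
)

# vapply() checks every result against FUN.VALUE: a length-2 result is an error.
bad <- tryCatch(
  vapply(1:3, function(i) c(1, 2), FUN.VALUE = numeric(1)),
  error = function(e) "error"
)

# With zero-length input, vapply() still returns a result of the declared type.
empty <- vapply(integer(0), one_v95, FUN.VALUE = numeric(1))

# Memory needed to hold every draw at once as doubles (8 bytes each), in Mb.
matrix_mb <- function(n_obs, n_reps) n_obs * n_reps * 8 / 2^20
matrix_mb(1e4, 1e4)   # about 762.9, the size in the allocation error above
```

Rerunning with a few different values of `n_reps` and comparing `timing["elapsed"]` gives a quick feel for how the run time scales before committing to the full 1e5 repetitions.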

## Re: Loop: noob question

In reply to this post by Ken Hutchison

On Aug 5, 2011, at 1:50 PM, Ken H wrote:

> That's a good point, Josh is correct.
> It's the R Bible because it's the size of the Bible and serves as a very good reference.

Some people apparently think so [assuming here that you are referring to Crawley]. My experience is less favorable. When I tried to use it, it often left me without a workable answer or gave misleading advice because of its loose use of terminology. It frequently generates confusing questions to this list because of its reliance on attach().

I thought MASS was a better book to learn R from. Yes, I know it's not supposed to be an introductory book, but it has enough to use it as such when used in combination with the help pages. (Every(noob)ody does use the help pages, right?)

> I agree that it is definitely not a first blush kind of book. I second the regression book, it is excellent.
> Cryer and Chan Time Series Analysis with Applications in R is pretty good if you're into that kind of thing...
> Did not know about the data manipulation or graphics books, I'll definitely be checking those out.
>
> Thanks for the info,
> Ken
>
> On Fri, Aug 5, 2011 at 1:16 PM, Joshua Wiley <[hidden email]> wrote:
>
>> On Fri, Aug 5, 2011 at 9:20 AM, Ken H <[hidden email]> wrote:
>> [snip]
>>> And that should be it, as far as relevant reading Peter Dalgaard's Introductory Statistics with R is very good if you do not know other programming languages.
>>
>> I would strongly second this. It is a very nice book. What book to read depends a bit exactly what your goals are---data manipulation? Statistics (and then what kind)? Programming? etc.
>>
>> For statistics beyond Peter Dalgaard's book, I like John Fox's Applied Regression with Companion to Applied Regression (which uses R and is also the 'car' package).
>>
>> I have been pretty happy with Phil Spector's book Data Manipulation with R.
>>
>> For graphics in R I would suggest ggplot2 by Hadley Wickham or lattice by Deepayan Sarkar (they are both books and packages).
>>
>> For programming I would look at S Programming by Venables & Ripley.
>>
>>> Crawley's R Book is the Bible as it were, and is very very good. Electronic copies are available.
>>
>> The R Book is very large, but it has some problems in my opinion. It uses some styles that are often okay, but can cause problems (e.g., using attach, using function names for data). I would turn elsewhere first. All of the other books I recommended (except Data Manipulation with R) are written by people who also develop and maintain R Core or substantial R packages (i.e., they are experts in what they are talking about).
>>
>> Cheers,
>>
>> Josh
>>
>> --
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> Programmer Analyst II, ATS Statistical Consulting Group
>> University of California, Los Angeles
>> https://joshuawiley.com/

David Winsemius, MD
West Hartford, CT
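
For readers wondering why attach() keeps coming up as a source of confusion, here is a small sketch of two common surprises. The data frame `trial` and its columns are made up for illustration, and with() is shown only as one commonly used alternative, not as the recommendation of any of the books discussed above.

```r
# Hypothetical data frame, for illustration only.
trial <- data.frame(resp = c(1, 2, 3), dose = c(10, 20, 30))

# with() evaluates the expression inside the data frame and leaves the
# search path untouched.
inside <- with(trial, sum(resp * dose))

# Surprise 1: attach() puts a copy of the columns on the search path, so a
# later change to the data frame is not visible through the attached names.
attach(trial)
trial$dose <- trial$dose * 2
stale <- dose          # the attached copy, still the old values
fresh <- trial$dose    # the updated column
detach(trial)

# Surprise 2: an existing variable of the same name masks the attached
# column; R only prints a message at attach time.
dose <- c(0, 0, 0)
attach(trial)
masked <- dose         # the earlier variable, not trial$dose
detach(trial)
rm(dose)
```

Both surprises disappear when the data frame is named explicitly, either through with() or through `trial$dose`.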