Loop: noob question

Loop: noob question

UnitRoot
Hi,
Can someone help me out to create a (for?) loop for the following procedure:

x=rnorm(250,0,0.02)                                
library(timeSeries)                                  
x=timeSeries(x)                                      
P=1000                                                
loss=-P*x                                              
loss                                                      
v95=quantile(loss,0.95)                            
v95

I would like to generate 100 000 v95 values.
Also, can anybody name a source where I could read up basic programming in R?
Thank you.

Re: Loop: noob question

Michael Weylandt
This is a textbook example of when NOT to use a loop in R: instead, write a function
that does what you want and use the replicate function to call it repeatedly.

f <- function(){
  # 95% quantile of the loss -1000*x; the negative multiplier maps p to 1 - p
  return(-1000*quantile(rnorm(250,0,0.02),0.05))
}

x = replicate(1e5,f())

There are your desired numbers.

Some general coding principles: firstly, you don't need to convert to a time
series: quantile immediately undoes that, so you've just wasted the time
doing the coercions in each direction. Secondly, for a positive constant c the
quantiles of c*X are exactly c times the quantiles of X; for a negative c the
p-quantile of c*X is c times the (1-p)-quantile of X. Either way you can cut
down on the multiplications needed by doing the -1000 multiplication after
taking the quantile.
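
A quick sanity check of the scaling rule may help here. With a negative constant the sample is mirrored, so the p-quantile of c*X equals c times the (1-p)-quantile of X. The sketch below assumes R's default quantile type; the seed and variable names are arbitrary and not from the original post.

```r
set.seed(42)                       # arbitrary seed, only for reproducibility
x <- rnorm(250, mean = 0, sd = 0.02)

# Positive constant: the p-quantile scales directly.
pos_direct <- quantile(1000 * x, 0.95)
pos_scaled <- 1000 * quantile(x, 0.95)

# Negative constant: the sample is mirrored, so p maps to 1 - p.
neg_direct <- quantile(-1000 * x, 0.95)
neg_scaled <- -1000 * quantile(x, 0.05)

# Compare values only; the names ("95%" vs "5%") differ by construction.
all.equal(unname(pos_direct), unname(pos_scaled))
all.equal(unname(neg_direct), unname(neg_scaled))
```

Both comparisons should agree up to floating-point rounding.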

Specific to R: don't use loops unless entirely necessary (and it's hardly
ever necessary) -- replicate is great for repeated operations, apply is
great for looping "over" things.

More broadly, what do you intend to do with the v95 values? There are
probably much more efficient ways to get the desired numbers or even closed
form results. I believe this idea is widely studied as VaR in finance.
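
As a rough illustration of the closed-form remark: if the returns really are N(0, sigma^2), the population 95% quantile of the loss -P*x is P * sigma * qnorm(0.95). The sketch below compares that number with the average of simulated sample quantiles. The replication count of 2000 and the variable names are assumptions chosen to keep the run short, not values from the thread.

```r
P     <- 1000
sigma <- 0.02

# Population 95% quantile of the loss under the normal assumption (about 32.9).
var95_closed <- P * sigma * qnorm(0.95)

# Simulated sample quantiles, each from 250 draws as in the original post.
set.seed(1)
var95_sim <- replicate(2000, quantile(-P * rnorm(250, 0, sigma), 0.95))

c(closed_form = var95_closed, simulated_mean = mean(var95_sim))
```

The simulated values scatter around the closed-form figure; their spread is the sampling variability of a quantile estimated from 250 observations.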

Feel free to write back if we can be of more help.

Hope this helps,

Michael Weylandt

On Fri, Aug 5, 2011 at 11:35 AM, UnitRoot <[hidden email]> wrote:

> Hi,
> Can someone help me out to create a (for?) loop for the following
> procedure:
>
> x=rnorm(250,0,0.02)
> library(timeSeries)
> x=timeSeries(x)
> P=1000
> loss=-P*x
> loss
> v95=quantile(loss,0.95)
> v95
>
> I would like to generate 100 000 v95 values.
> Also, can anybody name a source where I could read up basic programming in
> R?
> Thank you.
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Loop-noob-question-tp3721500p3721500.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


Re: Loop: noob question

Ken Hutchison
In reply to this post by UnitRoot
Hey, no problem! We all have to start somewhere, welcome to R!
The structure of the for loop is as follows.
First, let's define the number of values you want, say
> vector.size = 100000
Next, allocate a vector to store the results. You could start from an empty
vector like so
> V95.Vector<-c()
but I recommend you do it like so
> V95.Vector=rep(NA,vector.size)
Basically what this does is make a vector of NA's (of length vector.size) for
you to store into; that way, if something goes wrong in your loop, it will be
easier to catch. In my opinion this is better when you know the size of your
vector from the start, which you do.
Now, let's name a variable for the counting, say 'loop.count'.
So we have:
> library(timeSeries) # Load the package once, before the loop; calling it on every pass only slows the loop down.
> P=1000 # Set this before the loop too, since it doesn't change.
> for( loop.count in 1:vector.size) # Count from 1 to the number we defined, vector.size
{
  x=rnorm(250,0,0.02) # Generate data
  x=timeSeries(x) # Convert to a timeSeries object
  loss=-P*x # Calculate the losses
  v95=quantile(loss,0.95) # 95% quantile of the losses
  V95.Vector[loop.count]=v95 # Store into the index of this loop.
}
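
For comparison, here is a minimal base-R sketch of the same loop with the time-series conversion dropped, as suggested elsewhere in the thread. The names n_reps and v95, the seed, and the reduced count of 1000 are illustrative assumptions; set n_reps to 100000 for the full run.

```r
n_reps <- 1000             # reduced for a quick run; the thread asks for 100000
P      <- 1000             # constant, so it is set once outside the loop
v95    <- numeric(n_reps)  # preallocate the result vector

set.seed(123)              # arbitrary seed for reproducibility
for (i in seq_len(n_reps)) {
  x      <- rnorm(250, mean = 0, sd = 0.02)  # one sample of 250 returns
  v95[i] <- quantile(-P * x, 0.95)           # 95% quantile of the losses
}

summary(v95)
```

seq_len(n_reps) is used instead of 1:n_reps so that the loop body is skipped cleanly if n_reps is ever zero.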
   And that should be it. As far as relevant reading goes,
Peter Dalgaard's Introductory Statistics with R is very good if you do not
know other programming languages.
Crawley's R Book is the Bible as it were, and is very very good. Electronic
copies are available.
              Good luck!
                Ken

Re: Loop: noob question

Bert Gunter
In reply to this post by Michael Weylandt
Michael:

I'm sorry, but this "advice" is wrong. replicate() **IS** essentially
a loop: it uses sapply(), which is basically an interpreted loop (with
suitable caveats that R experts can provide).
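
One way to check the claim about replicate() is to look at its definition directly. The sketch below prints the function and searches the deparsed source for a call to sapply(); the variable name uses_sapply is illustrative, and the exact definition may differ between R versions.

```r
# Print the definition of replicate() as shipped with the running R version.
print(replicate)

# Search the deparsed source text for a call to sapply().
uses_sapply <- any(grepl("sapply", deparse(replicate), fixed = TRUE))
uses_sapply
```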

The correct advice is: whenever possible, move the loops down to
underlying C code by vectorizing. In the example below, one can partly
do this by removing the random number generation from the loop and
structuring the result as a matrix:

rmat <- matrix(rnorm(2.5e7, nrow=250))  ## each column is a sample of 250

Now one must use apply() -- which is a loop -- for quantile()

out <- apply(rmat,2,quantile, probs = .95)

Unfortunately, in this case, it won't make much difference, as random
number generation is very fast anyway and the looping for quantile()
is the same either way. In fact, both versions took almost the same
time on my computer (replicate() was actually 3 seconds faster --- 48
vs 51 seconds -- perhaps because sapply() is slightly more efficient
than apply() ).

As here (I think), one often cannot vectorize and must loop in
interpreted code, and for this replicate() is fine and yields nice
clean code. But it's still a loop.

Cheers,
Bert

On Fri, Aug 5, 2011 at 9:15 AM, R. Michael Weylandt
<[hidden email]> wrote:

[snip]



--
"Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions."

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics


Re: Loop: noob question

Joshua Wiley-2
In reply to this post by Ken Hutchison
On Fri, Aug 5, 2011 at 9:20 AM, Ken H <[hidden email]> wrote:
[snip]
>   And that should be it, as far as relevant reading
> Peter Dalgaard's Introductory Statistics with R is very good if you do not
> know other programming languages.

I would strongly second this.  It is a very nice book.  What book to
read depends a bit on exactly what your goals are---data manipulation?
Statistics (and then what kind)?  Programming?  etc.

For statistics beyond Peter Dalgaard's book, I like John Fox's Applied
Regression with Companion to Applied Regression (which uses R and is
also the 'car' package).

I have been pretty happy with Phil Spector's book Data Manipulation with R.

For graphics in R I would suggest ggplot2 by Hadley Wickham or lattice
by Deepayan Sarkar (they are both books and packages).

For programming I would look at S Programming by Venables & Ripley.

> Crawley's R Book is the Bible as it were, and is very very good. Electronic
> copies are available.

The R Book is very large, but it has some problems in my opinion.  It
uses some styles that are often okay, but can cause problems (e.g.,
using attach, using function names for data).  I would turn elsewhere
first.  All of the other books I recommended (except Data Manipulation
with R) are written by people who also develop and maintain R Core or
substantial R packages (i.e., they are experts in what they are
talking about).

Cheers,

Josh


--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/


Re: Loop: noob question

Michael Weylandt
In reply to this post by Bert Gunter
Bert,

You are absolutely correct: I was wrong not to vectorize in this case.

I am surprised, however, by your remark that sapply() (or really lapply())
is faster than apply() -- is there a reason for this? I would have guessed
that the major difference between the two would have been memory management,
since replicate only holds 250 numbers at a time, rather than 2.5e7. I don't
know how memory management is implemented on the C level for R, so I might be
entirely wrong about that though.

For UnitRoot: if you go with Bert's code, there's a little typo:

rmat <- matrix(rnorm(2.5e7, nrow=250))  ## each column is a sample of 250

move a parenthesis:

rmat <- matrix(rnorm(2.5e7), nrow=250)  ## each column is a sample of 250
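
To put rough numbers on the memory question and to check that the two approaches agree, here is a small-scale sketch using the corrected matrix() call. The reduced size of 1000 columns and the names via_replicate and via_matrix are assumptions for illustration; the full problem would use 1e5 columns.

```r
n_obs  <- 250
n_reps <- 1000   # reduced from 1e5 so the example runs quickly

# replicate(): only one sample of 250 numbers exists at a time.
set.seed(1)
via_replicate <- replicate(n_reps, quantile(rnorm(n_obs), 0.95))

# Matrix approach: all n_obs * n_reps numbers are held in memory at once.
set.seed(1)
rmat       <- matrix(rnorm(n_obs * n_reps), nrow = n_obs)
via_matrix <- apply(rmat, 2, quantile, probs = 0.95)

# Same seed and same draw order, so the results should match.
all.equal(unname(via_replicate), unname(via_matrix))

# Size of the reduced matrix, and of the full 250 x 1e5 matrix of doubles
# (8 bytes each, roughly 190 MB before any copies are made).
format(object.size(rmat), units = "MB")
8 * 250 * 1e5 / 2^20
```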

Michael Weylandt

On Fri, Aug 5, 2011 at 12:51 PM, Bert Gunter <[hidden email]> wrote:

[snip]

Re: Loop: noob question

Bert Gunter
Inline below.

On Fri, Aug 5, 2011 at 10:23 AM, R. Michael Weylandt
<[hidden email]> wrote:
> Bert,
>
> You are absolutely correct: I was wrong not to vectorize in this case.
>

No. That wasn't my point at all. In this case, vectorizing doesn't
seem to help because you still must do a loop (via *ply) in R. My
point was only that replicate() is another form of a loop and that,
when possible, vectorizing is preferable. Here I don't think it's
possible. However, I interpreted your advice as a general
prescription, and that was the problematic part.

> I am surprised, however, by your remark that sapply() (or really lapply())
> is faster than apply() -- is there a reason for this?

I apologize for being unclear. I **do not know** this to be true,
although one might guess it to be so by looking at the apply() code,
which has some overhead in it before calling vapply() to do the actual
loop. However, I defer to experts for confirmation -- or not.
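
Anyone who wants to test the guess on their own machine could time the two side by side on the same matrix. This is only a sketch: the matrix size of 250 x 2000 is an arbitrary assumption, and timings will vary with hardware and R version, so no result is claimed here.

```r
set.seed(1)
rmat <- matrix(rnorm(250 * 2000), nrow = 250)

# Column-wise quantiles via apply().
t_apply <- system.time(
  a <- apply(rmat, 2, quantile, probs = 0.95)
)

# The same computation via sapply() over column indices.
t_sapply <- system.time(
  s <- sapply(seq_len(ncol(rmat)), function(j) quantile(rmat[, j], 0.95))
)

rbind(t_apply, t_sapply)

# Both routes should return the same values.
all.equal(unname(a), unname(s))
```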

Cheers,
Bert


Re: Loop: noob question

Ken Hutchison
In reply to this post by Joshua Wiley-2
That's a good point; Josh is correct.
It's the R Bible because it's the size of the Bible and serves as a very good
reference. I agree that it is definitely not a first blush kind of book. I
second the regression book, it is excellent.
Cryer and Chan Time Series Analysis with Applications in R is pretty good if
you're into that kind of thing...
Did not know about the data manipulation or graphics books, I'll definitely
be checking those out.

  Thanks for the info,
          Ken


On Fri, Aug 5, 2011 at 1:16 PM, Joshua Wiley <[hidden email]> wrote:

[snip]

Re: Loop: noob question

Bert Gunter
I think that Josh may have inadvertently left out Venables's and
Ripley's MASS (the book), which is what I would choose if I were
marooned on a desert island with my computer, a power supply, and
unlimited mai tais. Also Mozart's Piano Concerti if I had to limit my
music to just one composer ( :-)  )

I think John Chambers's latest book is also worth a look. Hard to find
a more authoritative author than he (or V & R, for that matter).

-- Bert

On Fri, Aug 5, 2011 at 10:50 AM, Ken H <[hidden email]> wrote:

[snip]

Re: Loop: noob question

William Dunlap
In reply to this post by Bert Gunter
I don't think the "use replicate" answer is necessarily "wrong".
Making a matrix of all the random numbers you plan on using
and then using apply on it requires that you have
memory for all those random numbers.  Generating
a batch of random numbers, running quantile on them,
then discarding the random numbers (saving only
the output of quantile) will save quite a bit of memory
and lets you increase the number of repetitions more
than the first method allows.  E.g., here are three ways
of doing the same simulation with 1000 reps:

 > set.seed(1) ; t0 <- system.time(z0 <- replicate(10^3, quantile(rnorm(10^4), c(0.25, 0.75))))
 > set.seed(1) ; t1 <- system.time(z1 <- vapply(seq_len(10^3), function(i)quantile(rnorm(10^4), c(0.25, 0.75)), FUN.VALUE=numeric(2)))
 > set.seed(1) ; t2 <- system.time(z2 <- apply(matrix(rnorm(10^4 * 10^3), ncol=10^3), 2, function(col)quantile(col, c(0.25,0.75))))
 > identical(z0,z1)
 [1] TRUE
 > identical(z0,z2)
 [1] TRUE
 > rbind(t0, t1, t2)
    user.self sys.self elapsed user.child sys.child
 t0      2.53     0.01    3.92         NA        NA
 t1      2.33     0.00    3.22         NA        NA
 t2      2.94     0.10    3.85         NA        NA

Their times are not much different.  However, the matrix
approach fails on my 32-bit Windows machine when you ask
for 10000 repetitions:

 > set.seed(1) ; t0 <- system.time(z0 <- replicate(10^4, quantile(rnorm(10^4), c(0.25, 0.75))))
 > set.seed(1) ; t1 <- system.time(z1 <- vapply(seq_len(10^4), function(i)quantile(rnorm(10^4), c(0.25, 0.75)), FUN.VALUE=numeric(2)))
 > set.seed(1) ; t2 <- system.time(z2 <- apply(matrix(rnorm(10^4 * 10^4), ncol=10^4), 2, function(col)quantile(col, c(0.25,0.75))))
 Error: cannot allocate vector of size 762.9 Mb
 In addition: Warning messages:
 1: In matrix(rnorm(10^4 * 10^4), ncol = 10^4) :
   Reached total allocation of 1535Mb: see help(memory.size)
 2: In matrix(rnorm(10^4 * 10^4), ncol = 10^4) :
   Reached total allocation of 1535Mb: see help(memory.size)
 3: In matrix(rnorm(10^4 * 10^4), ncol = 10^4) :
   Reached total allocation of 1535Mb: see help(memory.size)
 4: In matrix(rnorm(10^4 * 10^4), ncol = 10^4) :
   Reached total allocation of 1535Mb: see help(memory.size)
 Timing stopped at: 18.25 0.3 21.8
 > identical(z0,z1)
 [1] TRUE
 > rbind(t0, t1)
    user.self sys.self elapsed user.child sys.child
 t0     23.95     0.02   30.52         NA        NA
 t1     25.90     0.01   30.75         NA        NA
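
The 762.9 Mb in the error message above is simply the size of the 10^4 x 10^4 matrix of doubles that the apply() version has to hold all at once. A quick check of the arithmetic, assuming 8 bytes per double and that R reports 1 Mb as 2^20 bytes (the variable names are only for illustration):

```r
# Rough memory needed for the full matrix of random numbers
n_values <- 10^4 * 10^4          # 10^4 draws per repetition, 10^4 repetitions
bytes_per_double <- 8            # R stores numeric values as 8-byte doubles
size_mb <- n_values * bytes_per_double / 2^20
size_mb                          # about 762.9

# The replicate()/vapply() versions only need one batch of draws at a time
batch_mb <- 10^4 * bytes_per_double / 2^20
batch_mb                         # well under 0.1 Mb
```

So the batch-at-a-time versions need only a tiny fraction of the memory, which is why they keep working when the number of repetitions grows.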

(I like vapply() because it throws an error if your function
returns something other than what you said it would.  It also
gives a reasonable result when the input has length zero.)
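
A minimal sketch of those two vapply() properties, for anyone who has not seen them. The helper name two_quantiles and the sample size of 100 are made up for this illustration:

```r
two_quantiles <- function(i) quantile(rnorm(100), c(0.25, 0.75))

# Zero-length input: sapply() falls back to an empty list,
# while vapply() still returns something numeric
empty_s <- sapply(integer(0), two_quantiles)
empty_v <- vapply(integer(0), two_quantiles, FUN.VALUE = numeric(2))
class(empty_s)        # "list"
is.numeric(empty_v)   # TRUE

# Wrong shape: the function returns 3 values but we promised 2,
# so vapply() stops with an error instead of silently returning something odd
bad <- tryCatch(
  vapply(1:3, function(i) quantile(rnorm(100), c(0.25, 0.5, 0.75)),
         FUN.VALUE = numeric(2)),
  error = function(e) conditionMessage(e)
)
is.character(bad)     # TRUE: an error message was captured, not a result
```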

This list gets a lot of questions on the most efficient ways
of doing things.  The answer often depends on the size, shape, or
other nature of the data (or machine) you are working on.

I see a lot of requests for loopless solutions.  Here is one
for the above problem, but it takes about 5 times as long to run
as the above three loopy solutions.
  > set.seed(1) ; t3 <- system.time({ tmp <- matrix(rnorm(10^4 * 10^3), ncol=10^3);
                                      tmp[] <- tmp[order(col(tmp), tmp)] ;
                                      z3 <- tmp[round(c(1/4,3/4)*nrow(tmp)),]})
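
For readers puzzling over the order(col(tmp), tmp) line: it sorts the values within each column in a single call, after which a quantile is (roughly) just a row of the sorted matrix. Here is a small sketch on a toy matrix; m and sorted are illustrative names. Note that picking a row yields an order statistic, whereas quantile() by default interpolates between neighbouring order statistics, so z3 should be close to z0 but not necessarily identical to it.

```r
set.seed(1)
m <- matrix(rnorm(12), ncol = 3)   # 4 rows, 3 columns

# order() sorts first by column index, then by value, so each column's
# entries come out in increasing order, one column after another
sorted <- m
sorted[] <- m[order(col(m), m)]

# Same result as sorting each column in an explicit apply() loop
all(sorted == apply(m, 2, sort))   # TRUE

# An order statistic (second-smallest value per column) next to the
# interpolated median that quantile() returns by default
sorted[2, ]
apply(m, 2, quantile, probs = 0.5)
```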

The bottom line is that if you are concerned with efficiency, you
ought to learn how to measure it and get used to running tests. Seeing
how the run time scales with various measures of your data is important.
system.time() is a good tool in base R, and the rbenchmark package
makes it easy to compare various approaches.
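
As a rough sketch of what such a scaling test might look like with base R only: the helper time_reps, the repetition counts, and the sample size of 10^3 are arbitrary choices for this illustration, and the timings will differ from machine to machine.

```r
# Elapsed time for 'reps' repetitions of the quantile simulation
time_reps <- function(reps, n = 10^3) {
  system.time(
    replicate(reps, quantile(rnorm(n), c(0.25, 0.75)))
  )[["elapsed"]]
}

reps <- c(100, 200, 400)
elapsed <- vapply(reps, time_reps, FUN.VALUE = numeric(1))

# If the run time is roughly linear in reps, elapsed should about
# double from one row to the next
data.frame(reps = reps, elapsed = elapsed)
```

Repeating the same exercise while varying n instead of reps shows how the cost grows with the size of each sample.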

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of Bert Gunter
> Sent: Friday, August 05, 2011 9:51 AM
> To: R. Michael Weylandt
> Cc: [hidden email]; UnitRoot
> Subject: Re: [R] Loop: noob question
>
> Michael:
>
> I'm sorry, but this "advice" is wrong. replicate() **IS** essentially
> a loop: it uses sapply(), which is basically an interpreted loop (with
> suitable caveats that R experts can provide).
>
> The correct advice is: whenever possible, move the loops down to
> underlying C code by vectorizing. In the example below, one can partly
> do this by removing the random number generation from the loop and
> structuring the result as a matrix:
>
> rmat <- matrix(rnorm(2.5e7), nrow=250)  ## each column is a sample of 250
>
> Now one must use apply() -- which is a loop -- for quantile()
>
> out <- apply(rmat,2,quantile, probs = .95)
>
> Unfortunately, in this case, it won't make much difference, as random
> number generation is very fast anyway and the looping for quantile()
> is the same either way. In fact, both versions took almost the same
> time on my computer (replicate() was actually 3 seconds faster --- 48
> vs 51 seconds -- perhaps because sapply() is slightly more efficient
> than apply() ).
>
> As here (I think), one often cannot vectorize and must loop in
> interpreted code, and for this replicate() is fine and yields nice
> clean code. But it's still a loop.
>
> Cheers,
> Bert
>


Re: Loop: noob question

David Winsemius
In reply to this post by Ken Hutchison

On Aug 5, 2011, at 1:50 PM, Ken H wrote:

> That's a good point; Josh is correct.
> It's the R Bible because it's the size of the Bible and serves as a
> very good reference.

Some people apparently think so (assuming here that you are referring
to Crawley). My experience is less favorable. When I tried to use it,
it often left me without a workable answer or gave misleading advice
because of its loose use of terminology. It frequently generates
confusing questions to this list because of its reliance on attach().
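
For anyone wondering why attach() leads to confusion, here is one common pitfall as a minimal sketch. The data frame dat and its column x are made up for this illustration:

```r
dat <- data.frame(x = 1:3)

attach(dat)
x <- x * 10        # creates a new global x; the data frame is NOT modified
detach(dat)

dat$x              # still 1 2 3
x                  # 10 20 30

# with() evaluates an expression inside the data frame without attaching it
with(dat, mean(x))
```

The assignment looks as though it updates the data, but it only creates a separate copy in the workspace, which then masks the column the next time the data frame is attached.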

I thought MASS (Venables & Ripley's "Modern Applied Statistics with S")
was a better book to learn R from. Yes, I know it's not supposed to be
an introductory book, but it has enough to serve as one when used in
combination with the help pages. (Every(noob)ody does use the help
pages, right?)


> I agree that it is definitely not a first-blush kind of book. I
> second the regression book; it is excellent.
> Cryer and Chan's Time Series Analysis with Applications in R is
> pretty good if you're into that kind of thing...
> Did not know about the data manipulation or graphics books; I'll
> definitely be checking those out.
>
>  Thanks for the info,
>          Ken
>
>
> On Fri, Aug 5, 2011 at 1:16 PM, Joshua Wiley  
> <[hidden email]> wrote:
>
>> On Fri, Aug 5, 2011 at 9:20 AM, Ken H <[hidden email]> wrote:
>> [snip]
>>> And that should be it. As far as relevant reading goes,
>>> Peter Dalgaard's Introductory Statistics with R is very good if
>>> you do not know other programming languages.
>>
>> I would strongly second this.  It is a very nice book.  What book to
>> read depends a bit exactly what your goals are---data manipulation?
>> Statistics (and then what kind)?  Programming?  etc.
>>
>> For statistics beyond Peter Dalgaard's book, I like John Fox's
>> Applied Regression with Companion to Applied Regression (which uses
>> R and is also the 'car' package).
>>
>> I have been pretty happy with Phil Spector's book Data Manipulation
>> with R.
>>
>> For graphics in R I would suggest ggplot2 by Hadley Wickham or
>> lattice by Deepayan Sarkar (they are both books and packages).
>>
>> For programming I would look at S Programming by Venables & Ripley.
>>
>>> Crawley's R Book is the Bible, as it were, and is very, very good.
>>> Electronic copies are available.
>>
>> The R Book is very large, but it has some problems in my opinion. It
>> uses some styles that are often okay, but can cause problems (e.g.,
>> using attach, using function names for data). I would turn elsewhere
>> first. All of the other books I recommended (except Data Manipulation
>> with R) are written by people who also develop and maintain R Core or
>> substantial R packages (i.e., they are experts in what they are
>> talking about).
>>
>> Cheers,
>>
>> Josh
>>
>>
>> --
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> Programmer Analyst II, ATS Statistical Consulting Group
>> University of California, Los Angeles
>> https://joshuawiley.com/
>>

David Winsemius, MD
West Hartford, CT
