|
This post was updated on .
Hi,
Reading about a "Heads and Tails" game in Introduction to Probability (Example 1.4, pp. 5-8). You toss a coin 40 times. If heads, Peter wins $1, tails, he loses $1. I think I can do that ok with: winnings <- sum(sample(c(-1,1), 40, replace=TRUE)) But I have to do it 10,000 times and I have to record and collect the winnings. In other languages, I would probably use a for loop and "push" each winnings into some sort of collective array or vector. However, for loops seem to be discouraged in R and it seems the same can be said for "pushing" a calculation onto a vector. So, can someone give me some guidance on how to collect 10,000 winnings? The second part of the game asks us to keep track of how often Peter is in the lead during each game. So, I'll probably use: cum_winnings <- cumsum(sample(c(-1,1), 40, replace=TRUE)) Obviously, he is in the lead at any moment his cumulative winnings are positive. But the game requires that we also do something at the moment the cumulative winnings are zero. (1) if the previous cumulative sum was nonnegative, then the zero counts a "staying in the lead." So, for example, during a single game, Peter might be in the lead for say 34 out of the 40 tosses. I must record the 34 and perform the game 9,999 more times, each time recording the number of times that Peter is in the lead. So again, any thoughts on how to do this without for loops and "pushing?" Thanks for the help. Great list. David Arnold College of the Redwoods Eureka, CA http://msemac.redwoods.edu/~darnold/index.php |
|
On Aug 3, 2012, at 5:55 PM, darnold wrote: > Hi, > > Reading about a "Heads and Tails" game in > http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/amsbook.mac.pdf > Introduction to Probability (Example 1.4, pp. 5-8). > > You toss a coin 40 times. If heads, Peter wins $1, tails, he loses > $1. I > think I can do that ok with: > > winnings <- sum(sample(c(-1,1), 40, replace=TRUE)) > > But I have to do it 10,000 times and I have to record and collect the > winnings. In other languages, I would probably use a for loop and > "push" > each winnings into some sort of collective array or vector. However, > for > loops seem to be discouraged in R and it seems the same can be said > for > "pushing" a calculation onto a vector. So, can someone give me some > guidance > on how to collect 10,000 winnings? What's wrong with: set.seed(123) # always good to make reproducible winnings <- sum(sample(c(-1,1), 10000, replace=TRUE)) > > The second part of the game asks us to keep track of how often Peter > is in > the lead during each game. Obviously, he is in the lead at any > moment his > cumulative winnings are positive. But the game requires that we also > do > something at the moment the cumulative winnings are zero. ?cumsum ?which > (1) if the > previous cumulative sum was nonnegative, then the zero counts a > "staying in > the lead." So, for example, during a single game, Peter might be in > the lead > for say 34 out of the 40 tosses. I must record the 34 and perform > the game > 9,999 more times, each time recording the number of times that Peter > is in > the lead. ? replicate > So again, any thoughts on how to do this without for loops and > "pushing?" > > Thanks for the help. Great list. > > David Arnold > -- David Winsemius, MD Alameda, CA, USA ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
David,
set.seed(123) # always good to make reproducible winnings <- sum(sample(c(-1,1), 10000, replace=TRUE)) Unfortunately, that's not the game. The game requires 40 flips of a coin. Then you have to play the game 10,000 times. D. |
|
On Aug 3, 2012, at 9:14 PM, darnold <[hidden email]> wrote: > David, > > set.seed(123) # always good to make reproducible > winnings <- sum(sample(c(-1,1), 10000, replace=TRUE)) > > Unfortunately, that's not the game. The game requires 40 flips of a coin. > > Then you have to play the game 10,000 times. > colSums(matrix(sample(c(-1, 1), 40*10000, TRUE), ncol = 10000)) or some such Michael > D. > > > > -- > View this message in context: http://r.789695.n4.nabble.com/Head-or-Tails-game-tp4639142p4639145.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
In reply to this post by David Arnold
HI,
You could try this: set.seed(112) list1<-vector("list",1000) for(i in 1:1000){ list1[[i]]<-sample(c(-1,1),40,replace=TRUE)} dat1<-do.call(rbind,lapply(list1,function(x) sum(x))) dat2<-matrix(dat1,ncol=20,byrow=TRUE) head(dat2) # [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] #[1,] -2 12 0 8 0 4 -8 2 -8 -4 -14 8 -4 -12 #[2,] 8 4 -4 4 12 -2 -6 0 -8 8 -8 12 2 2 #[3,] -2 8 2 -6 -12 -6 8 6 -4 4 4 8 6 10 #[4,] 6 6 0 4 10 -8 -4 -2 4 -10 -6 -2 4 2 #[5,] -6 -4 -6 -6 4 0 -14 -2 0 -8 -6 2 4 -8 #[6,] 0 10 0 0 -14 -2 2 0 6 -10 4 0 -4 4 # [,15] [,16] [,17] [,18] [,19] [,20] #[1,] 4 12 -6 4 10 8 #[2,] -6 0 4 8 0 12 #[3,] -2 -4 10 -10 6 -10 #[4,] 2 6 4 6 8 6 #[5,] 10 0 -6 -2 10 -8 #[6,] 0 -12 16 -2 2 4 #system time for 10,000 times system.time({ set.seed(112) list1<-vector("list",10000) for(i in 1:10000){ list1[[i]]<-sample(c(-1,1),40,replace=TRUE)} dat1<-do.call(rbind,lapply(list1,function(x) sum(x))) dat2<-matrix(dat1,ncol=200,byrow=TRUE) }) user system elapsed 0.112 0.000 0.111 A.K. ----- Original Message ----- From: darnold <[hidden email]> To: [hidden email] Cc: Sent: Friday, August 3, 2012 10:14 PM Subject: Re: [R] Head or Tails game David, set.seed(123) # always good to make reproducible winnings <- sum(sample(c(-1,1), 10000, replace=TRUE)) Unfortunately, that's not the game. The game requires 40 flips of a coin. Then you have to play the game 10,000 times. D. -- View this message in context: http://r.789695.n4.nabble.com/Head-or-Tails-game-tp4639142p4639145.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
In reply to this post by David Arnold
Use a loop to run over the 10,000 reps and in each rep use
rle(x>=0) to look at the runs of nonnegative numbers, postprocessing its output a bit to count them. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > -----Original Message----- > From: [hidden email] [mailto:[hidden email]] On Behalf > Of darnold > Sent: Friday, August 03, 2012 7:14 PM > To: [hidden email] > Subject: Re: [R] Head or Tails game > > David, > > set.seed(123) # always good to make reproducible > winnings <- sum(sample(c(-1,1), 10000, replace=TRUE)) > > Unfortunately, that's not the game. The game requires 40 flips of a coin. > > Then you have to play the game 10,000 times. > > D. > > > > -- > View this message in context: http://r.789695.n4.nabble.com/Head-or-Tails-game- > tp4639142p4639145.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
In reply to this post by David Arnold
Two possible solutions below.
On Fri, 3 Aug 2012, darnold wrote: > Hi, > > Reading about a "Heads and Tails" game in > http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/amsbook.mac.pdf > Introduction to Probability (Example 1.4, pp. 5-8). > > You toss a coin 40 times. If heads, Peter wins $1, tails, he loses $1. I > think I can do that ok with: > > winnings <- sum(sample(c(-1,1), 40, replace=TRUE)) > > But I have to do it 10,000 times and I have to record and collect the > winnings. In other languages, I would probably use a for loop and "push" > each winnings into some sort of collective array or vector. However, for > loops seem to be discouraged in R and it seems the same can be said for > "pushing" a calculation onto a vector. So, can someone give me some guidance > on how to collect 10,000 winnings? > > The second part of the game asks us to keep track of how often Peter is in > the lead during each game. Obviously, he is in the lead at any moment his > cumulative winnings are positive. But the game requires that we also do > something at the moment the cumulative winnings are zero. (1) if the > previous cumulative sum was nonnegative, then the zero counts a "staying in > the lead." So, for example, during a single game, Peter might be in the lead > for say 34 out of the 40 tosses. I must record the 34 and perform the game > 9,999 more times, each time recording the number of times that Peter is in > the lead. So again, any thoughts on how to do this without for loops and > "pushing?" In general, vectorizing means doing a bunch of calculations in each step. Keep in mind that apply and friends are basically convenience notation for loops (they are convenient, but are best applied after a bunch of vectorization has already been applied). In some cases multiple levels of loops can be vectorized together. I was only able to manage one level, but choosing the RIGHT level to vectorize is important... get as many calculations done at each step of code as possible to get the fastest speedup. The ideas: 1) Have "sample" do as many coin flips at once as possible. Fold the results into a matrix to make it look like multiple sets of results. 2) Use vector logical tests to perform the game logic. Use vector indexing if appropriate to look at "previous" results. 3) Use automatic conversion of logical results to integer for accumulating counts of successes. 4) Consider using cumsum or similar vectorized operations. 5) Since there are more games than moves in each game, consider simulating each step of all games vectorially to minimize the number of looping steps. The code: #------------------------------------ # Head or Tails game # Approach by vectorizing each game evaluation and looping over games loopbygame <- function( results ) { gameleadcount <- function(v) { pot <- cumsum( v ) sum( pot > 0 | c( FALSE, 0==pot[ -1 ] & 0 != pot[ -nrow( results ) ] ) ) } apply(results,2,gameleadcount) } # Approach by vectorizing one step from all games and looping over steps loopbymove <- function( results ) { pot <- rep( 0, ncol( results ) ) leadcount <- rep( 0, ncol( results ) ) for ( moveno in 1:nrow( results ) ) { oldpot <- pot pot <- pot + results[ moveno, ] leadcount <- leadcount + ( pot > 0 | 0 == pot & 0 != oldpot ) } leadcount } gamelen <- 40 gamecount <- 10000 set.seed( 123 ) results <- matrix( sample( c( -1, 1 ), gamelen*gamecount, replace=TRUE ) , ncol=gamecount ) system.time( ans1 <- loopbymove( results ) ) system.time( ans2 <-loopbygame( results ) ) all.equal( ans1, ans2 ) My results: > system.time( ans1 <- loopbymove( results ) ) user system elapsed 0.040 0.000 0.037 > system.time( ans2 <- loopbygame( results ) ) user system elapsed 0.224 0.000 0.219 > all.equal( ans1, ans2 ) [1] TRUE > Thanks for the help. Great list. You are welcome. Note that on this "great list" normally requests for "ideas" on optimization without a concrete, reproducible code example of some type are groused at for not showing sufficient research on your part and for essentially asking us to do your work for you. In this case I found the problem interesting enough to respond in spite of your lack of reproducible (even if inefficient) code. --------------------------------------------------------------------------- Jeff Newmiller The ..... ..... Go Live... DCN:<[hidden email]> Basics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/Batteries O.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
In reply to this post by David Arnold
Wow! Some great responses!
I am getting some great responses. I've only read David, Michael, and Dennis thus far, leading me to develop this result before reading further. lead <- function(x) { n <- length(x) count <- 0 if (x[1] >= 0) count <- count + 1 for (i in 2:n) { if (x[i] > 0 || (x[i] == 0 && x[i-1] >= 0 )) { count <- count + 1 } } count } games <- replicate(10000,sample(c(-1,1),40,replace=TRUE)) games_sum <- apply(games,2,sum) plot(table(games_sum)) games_lead <- apply(games,2,cumsum) games_lead <- apply(games_lead,2,lead) plot(table(games_lead)) Now I am going to read Arun, William, and Jeff's responses and see what other ideas are being proposed. Thanks everyone. D. |
|
In reply to this post by Michael Weylandt
Hi, system.time({ set.seed(111) colSums(matrix(sample(c(-1, 1), 40*10000, TRUE), ncol = 10000)) }) user system elapsed 0.032 0.012 0.041 system.time({ set.seed(112) list1<-vector("list",10000) for(i in 1:10000){ list1[[i]]<-sample(c(-1,1),40,replace=TRUE)} dat1<-do.call(rbind,lapply(list1,function(x) sum(x))) dat2<-matrix(dat1,ncol=200,byrow=TRUE) }) user system elapsed 0.112 0.000 0.111 #modified version system.time({ set.seed(112) list1<-vector("list",10000) for(i in 1:10000){ list1[[i]]<-sample(c(-1,1),40,replace=TRUE)} dat1<-unlist(lapply(list1,function(x) sum(x))) }) user system elapsed 0.092 0.000 0.092 It seems like Michael's solution is better in terms of the CPU utilization. I guess, the loop created the difference. A.K. ----- Original Message ----- From: Michael Weylandt <[hidden email]> To: darnold <[hidden email]> Cc: "[hidden email]" <[hidden email]> Sent: Friday, August 3, 2012 10:20 PM Subject: Re: [R] Head or Tails game On Aug 3, 2012, at 9:14 PM, darnold <[hidden email]> wrote: > David, > > set.seed(123) # always good to make reproducible > winnings <- sum(sample(c(-1,1), 10000, replace=TRUE)) > > Unfortunately, that's not the game. The game requires 40 flips of a coin. > > Then you have to play the game 10,000 times. > colSums(matrix(sample(c(-1, 1), 40*10000, TRUE), ncol = 10000)) or some such Michael > D. > > > > -- > View this message in context: http://r.789695.n4.nabble.com/Head-or-Tails-game-tp4639142p4639145.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
In reply to this post by David Arnold
> -----Original Message-----
> From: [hidden email] [mailto:r-help-bounces@r- > project.org] On Behalf Of darnold > Sent: Friday, August 03, 2012 9:18 PM > To: [hidden email] > Subject: Re: [R] Head or Tails game > > Wow! Some great responses! > > I am getting some great responses. I've only read David, Michael, and > Dennis > thus far, leading me to develop this result before reading further. > > lead <- function(x) { > n <- length(x) > count <- 0 > if (x[1] >= 0) count <- count + 1 > for (i in 2:n) { > if (x[i] > 0 || (x[i] == 0 && x[i-1] >= 0 )) { > count <- count + 1 > } > } > count > } > > games <- replicate(10000,sample(c(-1,1),40,replace=TRUE)) > > games_sum <- apply(games,2,sum) > plot(table(games_sum)) > > games_lead <- apply(games,2,cumsum) > games_lead <- apply(games_lead,2,lead) > plot(table(games_lead)) > > Now I am going to read Arun, William, and Jeff's responses and see what > other ideas are being proposed. > > Thanks everyone. > > D. > Here is another solution that doesn't need to define an additional function with an explicit loop. It seems to be considerably faster than the approach presented above. system.time({ set.seed(123) games <- matrix(sample(c(-1, 1), 40*10000, TRUE), ncol = 10000) games_sum <- apply(games,2,cumsum) games_lead <- colSums((games_sum > 0) | (games_sum==0 & games==-1)) }) user system elapsed 0.08 0.00 0.08 plot(table(games_sum[40,])) plot(table(games_lead)) Compare this with your solution system.time({ set.seed(123) games <- replicate(10000,sample(c(-1,1),40,replace=TRUE)) games_sum <- apply(games,2,sum) games_lead <- apply(games,2,cumsum) games_lead <- apply(games_lead,2,lead) }) user system elapsed 0.95 0.02 0.98 Hope this is helpful, Dan Daniel J. Nordlund Washington State Department of Social and Health Services Planning, Performance, and Accountability Research and Data Analysis Division Olympia, WA 98504-5204 ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
| Powered by Nabble | Edit this page |
