

Dear R users,
I need to run 1000 simulations to find maximum likelihood estimates, and I print my output as a vector. However, it is taking too long. Running 50 simulations at a time takes me 30 minutes, and when I once tried to run 200 simulations at once, I stopped after 2 hours and saw that only about 40 of them had finished in that time. Is there any way to make my simulations faster? Thank you in advance.
Here is part of my code. My data set is clustered, with 2 subjects in each of 100 groups. I generated 1000 such data sets and put them in a matrix, so each column represents one data set.
output1 <- vector("numeric", r)
output2 <- vector("numeric", r)
output3 <- vector("numeric", r)
output4 <- vector("numeric", r)

for (m in 1:r) {

  ## log-likelihood for data set m (column m of the simulated matrix)
  ll <- function(p) {
    cumhaz <- (time * exp(Zi * p[3])) * p[1]
    cumhaz <- aggregate(cumhaz, by = list(i), FUN = sum)
    lnhaz <- delta * log(exp(Zi * p[3]) * p[1])
    lnhaz <- aggregate(lnhaz, by = list(i), FUN = sum)
    lnhaz <- lnhaz[2:(r + 1)]
    cumhaz <- cumhaz[2:(r + 1)]
    lik <- r[m] * log(p[2]) -
      sum((di[, m] + 1/p[2]) * log(1 + cumhaz[, m] * p[2])) + sum(lnhaz[, m]) -
      n * log(gamma(1/p[2])) + sum(log(gamma(di[, m] + 1/p[2])))
    lik
  }

  initial <- c(1, 0.5, 0.5)
  estt <- nlm(ll, initial, hessian = TRUE)
  lambda <- estt$estimate[1]
  theta <- estt$estimate[2]
  beta <- estt$estimate[3]

  hessian <- estt$hessian
  cov <- solve(hessian)
  vtheta <- cov[2, 2]
  vbeta <- cov[3, 3]

  output1[m] <- theta
  output2[m] <- beta
  output3[m] <- vtheta
  output4[m] <- vbeta
}

theta <- output1
beta <- output2
vtheta <- output3
vbeta <- output4
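A sketch of one way such a clustered data set might be generated (the original generation code was not posted, so the exponential baseline hazard, binary covariate, gamma frailty, and all parameter values below are assumptions for illustration only):

```r
## Hypothetical data generation: 100 groups x 2 subjects, 1000 replicates.
set.seed(1)
n.groups <- 100; n.per <- 2; n.sim <- 1000
lambda0 <- 1; theta0 <- 0.5; beta0 <- 0.5        # illustrative "true" values
i <- rep(1:n.groups, each = n.per)               # group index, length 200

gen.one <- function() {
  # shared gamma frailty per group, mean 1, variance theta0
  w  <- rep(rgamma(n.groups, shape = 1/theta0, rate = 1/theta0), each = n.per)
  Zi <- rbinom(n.groups * n.per, 1, 0.5)         # binary covariate
  rexp(n.groups * n.per, rate = lambda0 * w * exp(beta0 * Zi))  # event times
}

times <- replicate(n.sim, gen.one())             # 200 x 1000; one data set per column
```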


On Fri, Oct 26, 2012 at 4:23 AM, stats12 < [hidden email]> wrote:
> Dear R users,
>
> I need to run 1000 simulations to find maximum likelihood estimates. I
> print my output as a vector. However, it is taking too long. I am running 50
> simulations at a time and it is taking me 30 minutes. Once I tried to run
> 200 simulations at once, after 2 hours I stopped it and saw that only about
> 40 of them are simulated in those 2 hours. Is there any way to make my
> simulations faster? (I can post my code if needed, I'm just looking for
> general ideas here). Thank you in advance.
>
Code would be nice: I struggle to think of a basic MLE fit that
would take ~36 s per iteration and scale so badly if you are writing
idiomatic R:
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

Finally, I note you're posting from Nabble. Please include context in
your reply -- I don't believe Nabble does this automatically, so
you'll need to include it manually. Most of the regular respondents on
this list don't use Nabble -- it is a _mailing list_, after all -- so
we don't get the forum view you do, only emails of the individual
posts. Combine that with the high volume of posts, and it's quite
difficult to trace a discussion if we don't all make sure to include
context.
RMW
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


use Rprof to profile your code to determine where time is being spent. This might tell you where to concentrate your effort.
Sent from my iPad
On Oct 25, 2012, at 23:23, stats12 < [hidden email]> wrote:
> Dear R users,
>
> I need to run 1000 simulations to find maximum likelihood estimates. I
> print my output as a vector. However, it is taking too long. I am running 50
> simulations at a time and it is taking me 30 minutes. Once I tried to run
> 200 simulations at once, after 2 hours I stopped it and saw that only about
> 40 of them are simulated in those 2 hours. Is there any way to make my
> simulations faster? (I can post my code if needed, I'm just looking for
> general ideas here). Thank you in advance.
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/how-to-make-simulation-faster-tp4647492.html
> Sent from the R help mailing list archive at Nabble.com.
>


Hi,
Thank you for your reply. I updated my post with the code. Also, about posting from Nabble: since I am a new user, I didn't know about that problem. If I post to the mailing list (r-help@r-project.org), would that get rid of it?
output1 <- vector("numeric", r)
output2 <- vector("numeric", r)
output3 <- vector("numeric", r)
output4 <- vector("numeric", r)

for (m in 1:r) {

  ## log-likelihood for data set m (column m of the simulated matrix)
  ll <- function(p) {
    cumhaz <- (time * exp(Zi * p[3])) * p[1]
    cumhaz <- aggregate(cumhaz, by = list(i), FUN = sum)
    lnhaz <- delta * log(exp(Zi * p[3]) * p[1])
    lnhaz <- aggregate(lnhaz, by = list(i), FUN = sum)
    lnhaz <- lnhaz[2:(r + 1)]
    cumhaz <- cumhaz[2:(r + 1)]
    lik <- r[m] * log(p[2]) -
      sum((di[, m] + 1/p[2]) * log(1 + cumhaz[, m] * p[2])) + sum(lnhaz[, m]) -
      n * log(gamma(1/p[2])) + sum(log(gamma(di[, m] + 1/p[2])))
    lik
  }

  initial <- c(1, 0.5, 0.5)
  estt <- nlm(ll, initial, hessian = TRUE)
  lambda <- estt$estimate[1]
  theta <- estt$estimate[2]
  beta <- estt$estimate[3]

  hessian <- estt$hessian
  cov <- solve(hessian)
  vtheta <- cov[2, 2]
  vbeta <- cov[3, 3]

  output1[m] <- theta
  output2[m] <- beta
  output3[m] <- vtheta
  output4[m] <- vbeta
}

theta <- output1
beta <- output2
vtheta <- output3
vbeta <- output4
RMW wrote
Code would be nice: I struggle to think of a basic MLE fit that
would take ~36 s per iteration and scale so badly if you are writing
idiomatic R:
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

Finally, I note you're posting from Nabble. Please include context in
your reply -- I don't believe Nabble does this automatically, so
you'll need to include it manually. Most of the regular respondents on
this list don't use Nabble -- it is a _mailing list_, after all -- so
we don't get the forum view you do, only emails of the individual
posts. Combine that with the high volume of posts, and it's quite
difficult to trace a discussion if we don't all make sure to include
context.


Thank you. I tried Rprof, and it looks like the aggregate function I am using is one of the functions that takes most of the time. What is the difference between self time and total time?
$by.total
                     total.time total.pct self.time self.pct
f                        925.92     99.98      5.16     0.56
<Anonymous>              925.92     99.98      0.06     0.01
nlm                      925.92     99.98      0.00     0.00
aggregate                885.66     95.64      0.28     0.03
aggregate.default        885.38     95.61      0.02     0.00
aggregate.data.frame     885.34     95.60     12.54     1.35
lapply                   817.06     88.23    183.76    19.84
FUN                      814.48     87.95     53.50     5.78
split                    425.06     45.90      4.50     0.49
split.default            420.56     45.41     14.16     1.53
factor                   408.84     44.15    347.12    37.48
unique                   235.32     25.41     19.38     2.09
sapply                   202.60     21.88      1.70     0.18
unlist                   181.40     19.59    132.28    14.28
simplify2array           148.70     16.06      1.10     0.12
"[[<-"                    41.24      4.45      1.62     0.17
"[[<-.data.frame"         39.62      4.28     31.42     3.39
"^"                       26.18      2.83     26.18     2.83
jholtman wrote
use Rprof to profile your code to determine where time is being spent. This might tell you where to concentrate your effort.
Sent from my iPad
On Oct 25, 2012, at 23:23, stats12 < [hidden email]> wrote:
> Dear R users,
>
> I need to run 1000 simulations to find maximum likelihood estimates. I
> print my output as a vector. However, it is taking too long. I am running 50
> simulations at a time and it is taking me 30 minutes. Once I tried to run
> 200 simulations at once, after 2 hours I stopped it and saw that only about
> 40 of them are simulated in those 2 hours. Is there any way to make my
> simulations faster? (I can post my code if needed, I'm just looking for
> general ideas here). Thank you in advance.
>
>
>


The code you posted was not runnable. 'r' and at least 'Zi' were missing.
The 'total time' is the elapsed time during which the given function was
anywhere on the call stack (itself or anything it called); the 'self
time' is how much time was actually spent executing in that function itself.
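A toy example (not from the original thread) makes the distinction concrete: a wrapper that only calls a worker accumulates a large total time but almost no self time, because the sampler always finds it on the stack but rarely executing its own code.

```r
inner <- function() { s <- 0; for (k in 1:2e6) s <- s + sqrt(k); s }
wrapper <- function() inner()        # does no work itself, just calls inner()

Rprof(tf <- tempfile())              # start the sampling profiler
invisible(wrapper())
Rprof(NULL)                          # stop profiling
summaryRprof(tf)$by.total            # 'wrapper': high total.time, near-zero self.time
```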
From your data, one of the big hitters is the "factor" function. This
is probably called from within "aggregate", since that is where most
of the "total time" is being spent. You may want to look at what you
are trying to do with "aggregate" and see if there is another way
of doing it.
You also seem to have an object "i" used in the aggregate calls that
is never defined.
There are probably other ways of speeding it up. Here is one that
compares 'aggregate' to 'sapply' to do the same thing:
> x <- as.numeric(1:1e6)
> z <- sample(100, 1e6, TRUE)
> system.time(r1 <- aggregate(x, list(z), sum))
   user  system elapsed
   3.62    0.05    3.67
> system.time(r2 <- sapply(split(x, z), sum))
   user  system elapsed
   0.56    0.00    0.56
So this is about a 6X increase in performance.
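Applied to the ll() in the posted code, the same split/sapply idea would replace both aggregate() calls. This is only a sketch (untested, since 'r', 'Zi', 'i', and 'di' were not supplied, and it keeps the likelihood expression as posted):

```r
## Sketch: group sums via split()/sapply() instead of aggregate().
## sapply over split returns a plain vector of per-group sums, so the
## data-frame subsetting after aggregate() is no longer needed.
ll <- function(p) {
  haz    <- exp(Zi * p[3]) * p[1]
  cumhaz <- sapply(split(time * haz, i), sum)     # per-group cumulative hazard
  lnhaz  <- sapply(split(delta * log(haz), i), sum)
  r[m] * log(p[2]) -
    sum((di[, m] + 1/p[2]) * log(1 + cumhaz * p[2])) + sum(lnhaz) -
    n * log(gamma(1/p[2])) + sum(log(gamma(di[, m] + 1/p[2])))
}
```

Since i, time, Zi, and delta do not change between nlm() iterations, the split indices could also be computed once outside ll() and reused; base R's rowsum() is another fast option for group sums.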
On Fri, Oct 26, 2012 at 9:08 AM, stats12 < [hidden email]> wrote:
> Hi,
>
> Thank you for your reply. I updated my post with the code. Also, about
> posting from Nabble, since I am a new user I didn't know about that problem.
> If I post to the mailing list ( [hidden email]), would it get rid of
> that problem?
>
>
> output1<-vector("numeric",length(1:r))
> output2<-vector("numeric",length(1:r))
> output3<-vector("numeric",length(1:r))
> output4<-vector("numeric",length(1:r))
>
> for (m in 1:r){
>
> ll<-function(p){
> cumhaz<-(time*exp(Zi*p[3]))*p[1]
> cumhaz<-aggregate(cumhaz,by=list(i),FUN=sum)
> lnhaz<-delta*log(exp(Zi*p[3])*p[1])
> lnhaz<-aggregate(lnhaz,by=list(i),FUN=sum)
> lnhaz<-lnhaz[2:(r+1)]
> cumhaz<-cumhaz[2:(r+1)]
> lik<-r[m]*log(p[2])-
> sum((di[,m]+1/p[2])*log(1+cumhaz[,m]*p[2]))+sum(lnhaz[,m])-
> n*log(gamma(1/p[2]))+sum(log(gamma(di[,m]+1/p[2])))
> lik
> }
>
> initial<-c(1,0.5,0.5)
> estt<-nlm(ll,initial,hessian=T)
> lambda<-estt$estimate[1]
> theta<-estt$estimate[2]
> beta<-estt$estimate[3]
>
> hessian<-estt$hessian
> cov<-solve(hessian)
> vtheta<-cov[2,2]
> vbeta<-cov[3,3]
>
> output1[m]<-theta
> output2[m]<-beta
> output3[m]<-vtheta
> output4[m]<-vbeta
> }
>
> theta<-as.vector(output1)
> beta<-as.vector(output2)
> vtheta<-as.vector(output3)
> vbeta<-as.vector(output4)
>
>
>
>
> Code would be nice: I struggle to think of a basic MLE fit that
> would take ~36 s per iteration and scale so badly if you are writing
> idiomatic R:
>
> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
>
> Finally, I note you're posting from Nabble. Please include context in
> your reply -- I don't believe Nabble does this automatically, so
> you'll need to include it manually. Most of the regular respondents on
> this list don't use Nabble -- it is a _mailing list_, after all -- so
> we don't get the forum view you do, only emails of the individual
> posts. Combine that with the high volume of posts, and it's quite
> difficult to trace a discussion if we don't all make sure to include
> context.
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/how-to-make-simulation-faster-tp4647492p4647541.html
> Sent from the R help mailing list archive at Nabble.com.
>

Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


You can get even better improvement using the 'data.table' package:
> require(data.table)
> system.time({
+     dt <- data.table(value = x, z = z)
+     r3 <- dt[
+         , list(sum = sum(value))
+         , keyby = z
+     ]
+ })
   user  system elapsed
   0.14    0.00    0.14
On Thu, Oct 25, 2012 at 11:23 PM, stats12 < [hidden email]> wrote:
> Dear R users,
>
> I need to run 1000 simulations to find maximum likelihood estimates. I
> print my output as a vector. However, it is taking too long. I am running 50
> simulations at a time and it is taking me 30 minutes. Once I tried to run
> 200 simulations at once, after 2 hours I stopped it and saw that only about
> 40 of them are simulated in those 2 hours. Is there any way to make my
> simulations faster? (I can post my code if needed, I'm just looking for
> general ideas here). Thank you in advance.
>
>
>

Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Thank you very much for saving me so much time. I ran 500 simulations in 20 minutes using the "sapply" approach. I'll try the "data.table" method for the rest of my simulations to get the results even faster. Thanks a lot again!
jholtman wrote
You can get even better improvement using the 'data.table' package:
> require(data.table)
> system.time({
+     dt <- data.table(value = x, z = z)
+     r3 <- dt[
+         , list(sum = sum(value))
+         , keyby = z
+     ]
+ })
   user  system elapsed
   0.14    0.00    0.14
On Thu, Oct 25, 2012 at 11:23 PM, stats12 < [hidden email]> wrote:
> Dear R users,
>
> I need to run 1000 simulations to find maximum likelihood estimates. I
> print my output as a vector. However, it is taking too long. I am running 50
> simulations at a time and it is taking me 30 minutes. Once I tried to run
> 200 simulations at once, after 2 hours I stopped it and saw that only about
> 40 of them are simulated in those 2 hours. Is there any way to make my
> simulations faster? (I can post my code if needed, I'm just looking for
> general ideas here). Thank you in advance.
>
>
>

Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

