Has the for loop been improved in R?


jpara3
Hi!

I am comparing lapply with a for loop, and I find that the for loop is faster than lapply.


What I have done:



n<-100000
set.seed(123)
x<-rnorm(n)
y<-x+rnorm(n)
rand.data<-data.frame(x,y)
k<-100
samples<-split(sample(1:n),rep(1:k,length=n))

res<-list()
t<-Sys.time()
for(i in 1:100){
  modelo<-lm(y~x,rand.data[-samples[[i]]])
  prediccion<-predict(modelo,rand.data[samples[[i]],])
  res[[i]] <- (prediccion - rand.data$y[samples[[i]]])

}
print(Sys.time()-t)

Which takes 8.042 seconds

and using lapply:

cv.fold.fun <- function(index){
   fit <- lm(y~x, data = rand.data[-samples[[index]],])
   pred <- predict(fit, newdata = rand.data[samples[[index]],])
   return((pred - rand.data$y[samples[[index]]])^2)
  }


t<-Sys.time()

nuevo<-lapply(seq(along = samples),cv.fold.fun)
print(Sys.time()-t)


Which takes 9.56 seconds.

So... has the for loop been improved in R?

Thanks!






______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Has the for loop been improved in R?

Jeff Newmiller
The lapply loop and the for loop have very similar speed characteristics. Differences seen are almost always due to how you use memory in the body of the loop. This fact is not new. You may be under the incorrect assumption that using lapply is somehow equivalent to "vectorization", which it is not.
--
Sent from my phone. Please excuse my brevity.
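
To illustrate the point about memory use and vectorization, here is a minimal sketch. It is not taken from the thread: the toy task (squaring numbers) and all variable names are assumptions chosen for illustration. It contrasts a loop that grows its result, a loop that writes into a preallocated vector, an apply-style call, and a truly vectorized expression.

```r
n <- 10000
x <- as.numeric(seq_len(n))

# 1. for loop that grows its result: every c() call copies the whole vector
grown <- numeric(0)
for (i in seq_len(n)) grown <- c(grown, x[i]^2)

# 2. for loop writing into a preallocated vector: no repeated copying
prealloc <- numeric(n)
for (i in seq_len(n)) prealloc[i] <- x[i]^2

# 3. vapply: tidier, but still one R-level function call per element
applied <- vapply(x, function(v) v^2, numeric(1))

# 4. vectorized: a single call, with the loop done in compiled code
vectorized <- x^2

# All four approaches produce the same values
stopifnot(
  isTRUE(all.equal(grown, prealloc)),
  isTRUE(all.equal(prealloc, applied)),
  isTRUE(all.equal(applied, vectorized))
)
```

Wrapping each variant in system.time() should show the gap between 1 and 2 widening as n increases, while 2 and 3 stay close to each other; only 4 avoids the per-element interpreter overhead.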


Re: Has the for loop been improved in R?

David Carlson
In reply to this post by jpara3
A Google search on "lapply vs for r" or "lapply vs loop r" might have saved you some trouble. Many people have debunked this myth. Strangely they all start out with "everyone knows" or "it is commonly said that." I'm sure someone must have said it, but no one seems to be able to provide an authoritative citation before proceeding to demonstrate that it is false.

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


Re: Has the for loop been improved in R?

Thierry Onkelinx
In reply to this post by Jeff Newmiller
Dear Jesus,

The difference is marginal when each code chunk does the same thing. Your
for loop does not yield the same output as the lapply call. Here is a
cleaned-up version of your code.

n<-10000
set.seed(123)
x<-rnorm(n)
y<-x+rnorm(n)
rand.data<-data.frame(x,y)
k<-100
samples <- split(sample(n), rep(seq_len(k),length=n))

library(microbenchmark)
microbenchmark(
  "for" = {
    res <- vector("list", length(samples))
    for(index in seq_along(samples)) {
      fit <- lm(y~x, data = rand.data[-samples[[index]],])
      pred <- predict(fit, newdata = rand.data[samples[[index]],])
      res[[index]] <- ((pred - rand.data$y[samples[[index]]])^2)  # index by the loop variable
    }
  },
  lapply = {
    cv.fold.fun <- function(index){
      fit <- lm(y~x, data = rand.data[-samples[[index]],])
      pred <- predict(fit, newdata = rand.data[samples[[index]],])
      return((pred - rand.data$y[samples[[index]]])^2)
    }
    lapply(seq_along(samples), cv.fold.fun)
  }
)

Unit: milliseconds
   expr      min       lq     mean   median       uq      max neval cld
    for 866.4196 897.3137 949.8155 926.1918 946.8390 1767.463   100   a
 lapply 837.7804 889.6620 947.2401 909.9946 939.6379 2476.415   100   a
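
Two things make the original for loop differ from the lapply version: it never squares the residuals, and it writes rand.data[-samples[[i]]] without the trailing comma. With a single index, a data frame is subset like a list, that is, by column, so no rows are removed from the training data. A small sketch of that second point, using a made-up five-row data frame (df and idx are illustrative names, not from the thread):

```r
df <- data.frame(x = 1:5, y = 6:10)
idx <- c(3, 4)  # hypothetical test-fold row indices

# No comma: list-style indexing selects columns. Columns 3 and 4 do not
# exist, so the negative index drops nothing and every row is kept.
no_comma <- df[-idx]

# With comma: row indexing, so rows 3 and 4 are removed.
with_comma <- df[-idx, ]

dim(no_comma)    # 5 rows, 2 columns
dim(with_comma)  # 3 rows, 2 columns
```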

Best regards,


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey


Re: Has the for loop been improved in R?

jholtman
If you run it under the profiler in RStudio, you will see that the 'lm'
call takes about 2 seconds longer inside the function, which might have to
do with resolving the reference. So it is probably the function call in
'lapply' vs. the in-line statement in the 'for' loop that accounts for the
difference. I have attached the output of the profiler.
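
For readers without RStudio, a similar profile can be collected with base R's Rprof() and summaryRprof(). The sketch below is an assumption about how one might set this up, not the procedure used to produce the attached profile; the iteration count and the helper name fit_once are made up for illustration.

```r
set.seed(123)
n <- 100000
rand.data <- data.frame(x = rnorm(n))
rand.data$y <- rand.data$x + rnorm(n)

# Hypothetical helper: the same lm() call, wrapped in a function
fit_once <- function() lm(y ~ x, data = rand.data)

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)                                    # start sampling the call stack
for (i in 1:30) fit <- lm(y ~ x, data = rand.data)  # in-line call
res <- lapply(1:30, function(i) fit_once())         # call wrapped in a function
Rprof(NULL)                                         # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.total)  # time attributed to each function, including callees
```

Comparing the rows for lm and fit_once in by.total gives a rough view of where the time goes; sampling profiles are noisy, so several runs are advisable before drawing conclusions.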


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Attachment: profile.png (28K)