R code for if-then-do code blocks

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

R code for if-then-do code blocks

R help mailing list-2
Hello All,

Season's greetings!

 Am trying to replicate some SAS code in R. The SAS code uses if-then-do code blocks. I've been trying to do likewise in R as that seems to be the most reliable way to get the same result.

Below is some toy data and some code that does work. There are some things I don't necessarily like about the code though. So I was hoping some people could help make it better. One thing I don't like is that the within function reverses the order of the computed columns such that test1:test5 becomes test5:test1. I've used a mutate to overcome that but would prefer not to have to do so.

 Another, perhaps very small thing, is the need to calculate an ID variable that becomes the basis for a grouping. 

I did considerable Internet searching for R code that conditionally computes blocks of code. I didn't find much though and so am wondering if my search terms were not sufficient or if there is some other reason. It occurred to me that maybe if-then-do code blocks like we often see in SAS as are frowned upon and therefore not much implemented.

I'd be interested in seeing more R-compatible approaches if this is the case. I've learned that it's a mistake to try and make R be like SAS. It's better to let R be R. Trouble is I'm not always sure how to do that.

Thanks,

Paul


d1 <- data.frame(workshop=rep(1:2,4),
                gender=rep(c("f","m"),each=4))

library(tibble)
library(plyr)

d2 <- d1 %>%
  rownames_to_column("ID") %>%
  mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
  ddply("ID",
        within,
        if (gender == "f" & workshop == 1) {
          test1 <- 1
          test1 <- 6 + test1
          test2 <- 2 + test1
          test4 <- 1
          test5 <- 1
        } else {
          test1 <- test2 <- test4 <- test5 <- 0
        })

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: R code for if-then-do code blocks

R help mailing list-2
Dear Paul,

R's power is that is works vectorised. Unlike SAS which is rowbased. Using
R in a SAS way will lead to very slow code.

Your examples can be written vectorised

d1 %>%
  rownames_to_column("ID") %>%
  mutate(
    test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
    test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
    test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
    test5 = test4
  )

Here is a speed comparison.

library(microbenchmark)
microbenchmark(
  vector = {d1 %>%
    rownames_to_column("ID") %>%
    mutate(
      test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
      test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
      test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
      test5 = test4
    ) },
  rowbased = {d1 %>%
  rownames_to_column("ID") %>%
  mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
  ddply("ID",
        within,
        if (gender == "f" & workshop == 1) {
          test1 <- 1
          test1 <- 6 + test1
          test2 <- 2 + test1
          test4 <- 1
          test5 <- 1
        } else {
          test1 <- test2 <- test4 <- test5 <- 0
        })}
)


Best regards,

Thierry

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
[hidden email]
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

<https://www.inbo.be>


Op ma 17 dec. 2018 om 16:30 schreef Paul Miller via R-help <
[hidden email]>:

> Hello All,
>
> Season's greetings!
>
>  Am trying to replicate some SAS code in R. The SAS code uses if-then-do
> code blocks. I've been trying to do likewise in R as that seems to be the
> most reliable way to get the same result.
>
> Below is some toy data and some code that does work. There are some things
> I don't necessarily like about the code though. So I was hoping some people
> could help make it better. One thing I don't like is that the within
> function reverses the order of the computed columns such that test1:test5
> becomes test5:test1. I've used a mutate to overcome that but would prefer
> not to have to do so.
>
>  Another, perhaps very small thing, is the need to calculate an ID
> variable that becomes the basis for a grouping.
>
> I did considerable Internet searching for R code that conditionally
> computes blocks of code. I didn't find much though and so am wondering if
> my search terms were not sufficient or if there is some other reason. It
> occurred to me that maybe if-then-do code blocks like we often see in SAS
> as are frowned upon and therefore not much implemented.
>
> I'd be interested in seeing more R-compatible approaches if this is the
> case. I've learned that it's a mistake to try and make R be like SAS. It's
> better to let R be R. Trouble is I'm not always sure how to do that.
>
> Thanks,
>
> Paul
>
>
> d1 <- data.frame(workshop=rep(1:2,4),
>                 gender=rep(c("f","m"),each=4))
>
> library(tibble)
> library(plyr)
>
> d2 <- d1 %>%
>   rownames_to_column("ID") %>%
>   mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
>   ddply("ID",
>         within,
>         if (gender == "f" & workshop == 1) {
>           test1 <- 1
>           test1 <- 6 + test1
>           test2 <- 2 + test1
>           test4 <- 1
>           test5 <- 1
>         } else {
>           test1 <- test2 <- test4 <- test5 <- 0
>         })
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: R code for if-then-do code blocks

Richard M. Heiberger
this can be dome even faster, and I think more easily read, using only base R

d1 <- data.frame(workshop=rep(1:2,4),
                gender=rep(c("f","m"),each=4))

## needed by vector and rowbased, not needed by rmh
library(tibble)
library(plyr)
library(magrittr)

microbenchmark(
  vector = {d1 %>%
    rownames_to_column("ID") %>%
    mutate(
      test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
      test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
      test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
      test5 = test4
    ) },
  rowbased = {d1 %>%
  rownames_to_column("ID") %>%
  mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
  ddply("ID",
        within,
        if (gender == "f" & workshop == 1) {
          test1 <- 1
          test1 <- 6 + test1
          test2 <- 2 + test1
          test4 <- 1
          test5 <- 1
        } else {
          test1 <- test2 <- test4 <- test5 <- 0
        })},
  rmh={
    data.frame(ID=rownames(d1),
               d1,
               test1=0,
               test2=0,
               test4=0,
               test5=0)
    myRowSubset <- d3$gender=="f" & d3$workshop==1
    test1 <- 1
    d3[myRowSubset, "test1"] <- test1 + 6
    d3[myRowSubset, "test2"] <- test1 + 6 + 2
    d3[myRowSubset, c("test4", "test5")] <- test1
  }
)

Unit: microseconds
     expr      min       lq      mean   median        uq        max neval cld
   vector 1281.994 1468.102  1669.266 1573.043  1750.354   3171.777   100  a
 rowbased 8131.230 8691.899 10894.700 9219.882 10435.642 133293.034   100   b
      rmh  925.571 1056.530  1167.568 1116.425  1221.457   1968.199   100  a
On Mon, Dec 17, 2018 at 12:15 PM Thierry Onkelinx via R-help
<[hidden email]> wrote:

>
> Dear Paul,
>
> R's power is that is works vectorised. Unlike SAS which is rowbased. Using
> R in a SAS way will lead to very slow code.
>
> Your examples can be written vectorised
>
> d1 %>%
>   rownames_to_column("ID") %>%
>   mutate(
>     test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
>     test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
>     test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
>     test5 = test4
>   )
>
> Here is a speed comparison.
>
> library(microbenchmark)
> microbenchmark(
>   vector = {d1 %>%
>     rownames_to_column("ID") %>%
>     mutate(
>       test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
>       test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
>       test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
>       test5 = test4
>     ) },
>   rowbased = {d1 %>%
>   rownames_to_column("ID") %>%
>   mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
>   ddply("ID",
>         within,
>         if (gender == "f" & workshop == 1) {
>           test1 <- 1
>           test1 <- 6 + test1
>           test2 <- 2 + test1
>           test4 <- 1
>           test5 <- 1
>         } else {
>           test1 <- test2 <- test4 <- test5 <- 0
>         })}
> )
>
>
> Best regards,
>
> Thierry
>
> ir. Thierry Onkelinx
> Statisticus / Statistician
>
> Vlaamse Overheid / Government of Flanders
> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
> FOREST
> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> [hidden email]
> Havenlaan 88 bus 73, 1000 Brussel
> www.inbo.be
>
> ///////////////////////////////////////////////////////////////////////////////////////////
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
> ///////////////////////////////////////////////////////////////////////////////////////////
>
> <https://www.inbo.be>
>
>
> Op ma 17 dec. 2018 om 16:30 schreef Paul Miller via R-help <
> [hidden email]>:
>
> > Hello All,
> >
> > Season's greetings!
> >
> >  Am trying to replicate some SAS code in R. The SAS code uses if-then-do
> > code blocks. I've been trying to do likewise in R as that seems to be the
> > most reliable way to get the same result.
> >
> > Below is some toy data and some code that does work. There are some things
> > I don't necessarily like about the code though. So I was hoping some people
> > could help make it better. One thing I don't like is that the within
> > function reverses the order of the computed columns such that test1:test5
> > becomes test5:test1. I've used a mutate to overcome that but would prefer
> > not to have to do so.
> >
> >  Another, perhaps very small thing, is the need to calculate an ID
> > variable that becomes the basis for a grouping.
> >
> > I did considerable Internet searching for R code that conditionally
> > computes blocks of code. I didn't find much though and so am wondering if
> > my search terms were not sufficient or if there is some other reason. It
> > occurred to me that maybe if-then-do code blocks like we often see in SAS
> > as are frowned upon and therefore not much implemented.
> >
> > I'd be interested in seeing more R-compatible approaches if this is the
> > case. I've learned that it's a mistake to try and make R be like SAS. It's
> > better to let R be R. Trouble is I'm not always sure how to do that.
> >
> > Thanks,
> >
> > Paul
> >
> >
> > d1 <- data.frame(workshop=rep(1:2,4),
> >                 gender=rep(c("f","m"),each=4))
> >
> > library(tibble)
> > library(plyr)
> >
> > d2 <- d1 %>%
> >   rownames_to_column("ID") %>%
> >   mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
> >   ddply("ID",
> >         within,
> >         if (gender == "f" & workshop == 1) {
> >           test1 <- 1
> >           test1 <- 6 + test1
> >           test2 <- 2 + test1
> >           test4 <- 1
> >           test5 <- 1
> >         } else {
> >           test1 <- test2 <- test4 <- test5 <- 0
> >         })
> >
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: R code for if-then-do code blocks

Richard M. Heiberger
I got another 10% savings with this example by using only one
subscripting adjustment.
I also fixed a typo in my previous posting (which didn't affect the timing).



microbenchmark(
 rmh={
    d3 <-data.frame(ID=rownames(d1),
               d1,
               test1=0,
               test2=0,
               test4=0,
               test5=0)
    myRowSubset <- d3$gender=="f" & d3$workshop==1
    test1 <- 1
    d3[myRowSubset, "test1"] <- test1 + 6
    d3[myRowSubset, "test2"] <- test1 + 6 + 2
    d3[myRowSubset, c("test4", "test5")] <- test1
  },
 rmh4={
   d4 <- data.frame(ID=rownames(d1),
                    d1,
                    test1=0,
                    test2=0,
                    test4=0,
                    test5=0)
   myRowSubset <- d4$gender=="f" & d4$workshop==1
   test1 <- 1
   d4[myRowSubset, c("test1", "test2", "test4", "test5")] <-
     matrix(test1 + c(6, 6+2, 0, 0), nrow=sum(myRowSubset), ncol=4, byrow=TRUE)
 }
)

Unit: microseconds
 expr     min       lq     mean   median       uq      max neval cld
  rmh 956.187 1183.304 1538.012 1617.985 1865.149 2177.071   100   b
 rmh4 850.729 1042.997 1380.842 1416.476 1700.307 2448.545   100  a


On Mon, Dec 17, 2018 at 12:49 PM Richard M. Heiberger <[hidden email]> wrote:

>
> this can be dome even faster, and I think more easily read, using only base R
>
> d1 <- data.frame(workshop=rep(1:2,4),
>                 gender=rep(c("f","m"),each=4))
>
> ## needed by vector and rowbased, not needed by rmh
> library(tibble)
> library(plyr)
> library(magrittr)
>
> microbenchmark(
>   vector = {d1 %>%
>     rownames_to_column("ID") %>%
>     mutate(
>       test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
>       test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
>       test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
>       test5 = test4
>     ) },
>   rowbased = {d1 %>%
>   rownames_to_column("ID") %>%
>   mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
>   ddply("ID",
>         within,
>         if (gender == "f" & workshop == 1) {
>           test1 <- 1
>           test1 <- 6 + test1
>           test2 <- 2 + test1
>           test4 <- 1
>           test5 <- 1
>         } else {
>           test1 <- test2 <- test4 <- test5 <- 0
>         })},
>   rmh={
>     data.frame(ID=rownames(d1),
>                d1,
>                test1=0,
>                test2=0,
>                test4=0,
>                test5=0)
>     myRowSubset <- d3$gender=="f" & d3$workshop==1
>     test1 <- 1
>     d3[myRowSubset, "test1"] <- test1 + 6
>     d3[myRowSubset, "test2"] <- test1 + 6 + 2
>     d3[myRowSubset, c("test4", "test5")] <- test1
>   }
> )
>
> Unit: microseconds
>      expr      min       lq      mean   median        uq        max neval cld
>    vector 1281.994 1468.102  1669.266 1573.043  1750.354   3171.777   100  a
>  rowbased 8131.230 8691.899 10894.700 9219.882 10435.642 133293.034   100   b
>       rmh  925.571 1056.530  1167.568 1116.425  1221.457   1968.199   100  a
> On Mon, Dec 17, 2018 at 12:15 PM Thierry Onkelinx via R-help
> <[hidden email]> wrote:
> >
> > Dear Paul,
> >
> > R's power is that is works vectorised. Unlike SAS which is rowbased. Using
> > R in a SAS way will lead to very slow code.
> >
> > Your examples can be written vectorised
> >
> > d1 %>%
> >   rownames_to_column("ID") %>%
> >   mutate(
> >     test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
> >     test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
> >     test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
> >     test5 = test4
> >   )
> >
> > Here is a speed comparison.
> >
> > library(microbenchmark)
> > microbenchmark(
> >   vector = {d1 %>%
> >     rownames_to_column("ID") %>%
> >     mutate(
> >       test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
> >       test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
> >       test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
> >       test5 = test4
> >     ) },
> >   rowbased = {d1 %>%
> >   rownames_to_column("ID") %>%
> >   mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
> >   ddply("ID",
> >         within,
> >         if (gender == "f" & workshop == 1) {
> >           test1 <- 1
> >           test1 <- 6 + test1
> >           test2 <- 2 + test1
> >           test4 <- 1
> >           test5 <- 1
> >         } else {
> >           test1 <- test2 <- test4 <- test5 <- 0
> >         })}
> > )
> >
> >
> > Best regards,
> >
> > Thierry
> >
> > ir. Thierry Onkelinx
> > Statisticus / Statistician
> >
> > Vlaamse Overheid / Government of Flanders
> > INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
> > FOREST
> > Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> > [hidden email]
> > Havenlaan 88 bus 73, 1000 Brussel
> > www.inbo.be
> >
> > ///////////////////////////////////////////////////////////////////////////////////////////
> > To call in the statistician after the experiment is done may be no more
> > than asking him to perform a post-mortem examination: he may be able to say
> > what the experiment died of. ~ Sir Ronald Aylmer Fisher
> > The plural of anecdote is not data. ~ Roger Brinner
> > The combination of some data and an aching desire for an answer does not
> > ensure that a reasonable answer can be extracted from a given body of data.
> > ~ John Tukey
> > ///////////////////////////////////////////////////////////////////////////////////////////
> >
> > <https://www.inbo.be>
> >
> >
> > Op ma 17 dec. 2018 om 16:30 schreef Paul Miller via R-help <
> > [hidden email]>:
> >
> > > Hello All,
> > >
> > > Season's greetings!
> > >
> > >  Am trying to replicate some SAS code in R. The SAS code uses if-then-do
> > > code blocks. I've been trying to do likewise in R as that seems to be the
> > > most reliable way to get the same result.
> > >
> > > Below is some toy data and some code that does work. There are some things
> > > I don't necessarily like about the code though. So I was hoping some people
> > > could help make it better. One thing I don't like is that the within
> > > function reverses the order of the computed columns such that test1:test5
> > > becomes test5:test1. I've used a mutate to overcome that but would prefer
> > > not to have to do so.
> > >
> > >  Another, perhaps very small thing, is the need to calculate an ID
> > > variable that becomes the basis for a grouping.
> > >
> > > I did considerable Internet searching for R code that conditionally
> > > computes blocks of code. I didn't find much though and so am wondering if
> > > my search terms were not sufficient or if there is some other reason. It
> > > occurred to me that maybe if-then-do code blocks like we often see in SAS
> > > as are frowned upon and therefore not much implemented.
> > >
> > > I'd be interested in seeing more R-compatible approaches if this is the
> > > case. I've learned that it's a mistake to try and make R be like SAS. It's
> > > better to let R be R. Trouble is I'm not always sure how to do that.
> > >
> > > Thanks,
> > >
> > > Paul
> > >
> > >
> > > d1 <- data.frame(workshop=rep(1:2,4),
> > >                 gender=rep(c("f","m"),each=4))
> > >
> > > library(tibble)
> > > library(plyr)
> > >
> > > d2 <- d1 %>%
> > >   rownames_to_column("ID") %>%
> > >   mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
> > >   ddply("ID",
> > >         within,
> > >         if (gender == "f" & workshop == 1) {
> > >           test1 <- 1
> > >           test1 <- 6 + test1
> > >           test2 <- 2 + test1
> > >           test4 <- 1
> > >           test5 <- 1
> > >         } else {
> > >           test1 <- test2 <- test4 <- test5 <- 0
> > >         })
> > >
> > > ______________________________________________
> > > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: R code for if-then-do code blocks

Gabor Grothendieck
In reply to this post by R help mailing list-2
There is some discussion of approaches to this here:

https://stackoverflow.com/questions/34096162/dplyr-mutate-replace-on-a-subset-of-rows/34096575#34096575


On Mon, Dec 17, 2018 at 10:30 AM Paul Miller via R-help
<[hidden email]> wrote:

>
> Hello All,
>
> Season's greetings!
>
>  Am trying to replicate some SAS code in R. The SAS code uses if-then-do code blocks. I've been trying to do likewise in R as that seems to be the most reliable way to get the same result.
>
> Below is some toy data and some code that does work. There are some things I don't necessarily like about the code though. So I was hoping some people could help make it better. One thing I don't like is that the within function reverses the order of the computed columns such that test1:test5 becomes test5:test1. I've used a mutate to overcome that but would prefer not to have to do so.
>
>  Another, perhaps very small thing, is the need to calculate an ID variable that becomes the basis for a grouping.
>
> I did considerable Internet searching for R code that conditionally computes blocks of code. I didn't find much though and so am wondering if my search terms were not sufficient or if there is some other reason. It occurred to me that maybe if-then-do code blocks like we often see in SAS as are frowned upon and therefore not much implemented.
>
> I'd be interested in seeing more R-compatible approaches if this is the case. I've learned that it's a mistake to try and make R be like SAS. It's better to let R be R. Trouble is I'm not always sure how to do that.
>
> Thanks,
>
> Paul
>
>
> d1 <- data.frame(workshop=rep(1:2,4),
>                 gender=rep(c("f","m"),each=4))
>
> library(tibble)
> library(plyr)
>
> d2 <- d1 %>%
>   rownames_to_column("ID") %>%
>   mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
>   ddply("ID",
>         within,
>         if (gender == "f" & workshop == 1) {
>           test1 <- 1
>           test1 <- 6 + test1
>           test2 <- 2 + test1
>           test4 <- 1
>           test5 <- 1
>         } else {
>           test1 <- test2 <- test4 <- test5 <- 0
>         })
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: R code for if-then-do code blocks

R help mailing list-2
Hi Gabor, Richard, and Thierry,

Thanks very much for your replies. Turns out I had already hit on Gabor's idea of "factor out" in writing an initial draft of the code converting from SAS to R. Below is the link Gabor sent describing this and other approaches.

https://stackoverflow.com/questions/34096162/dplyr-mutate-replace-on-a-subset-of-rows/34096575#34096575

At the end of this email are some new test data plus a snippet of my initial R code. The R code I have replicates the result from SAS but is quite verbose. That should be obvious from the snippet. I know I can make the code less verbose with a subsequent draft but wonder if I can simplify to the point where the factor out approach gets a fair test. I'd appreciate it if people could share some ways to make the factor out approach less verbose. I'd also like to see how well some of the other approaches might work with these data. I spent considerable time looking at the link Gabor sent as well as the other responses I received. The mutate_cond function in the link seems promising but it wasn't clear to me how I could avoid having to repeat the various conditions using that approach.

Thanks again.

Paul

library(magrittr)
library(dplyr)
 
test_data <-
  structure(
    list(
      intPatientId = c("3", "37", "48", "6", "6", "5"),
      intSurveySessionId = c(1L, 10996L, 19264L, 2841L, 28L, 34897L),
      a_CCMA02 = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_),
      a_CCMA69 = c(7, NA, 0, 2, NA, 0),
      a_CCMA70 = c(7, 0, NA, 10, NA, NA),
      a_CCMA72 = c(7, 2, 3, NA, NA, NA),
      CCMA2 = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,NA_integer_, NA_integer_),
      a_CCMA05 = c(NA, NA, NA, NA, NA, 0),
      a_CCMA43 = c(5, 0, 6, 5, NA, NA),
      a_CCMA44 = c(5, 0, 0, 5, 0, NA),
      CCMA5 = c(NA, NA, NA, NA, NA, 0)
    ),
    class = "data.frame",
    row.names = c(NA,-6L)
  )

factor_out <- test_data %>%
  mutate(
    CCMA2_cond = case_when(
      (is.na(a_CCMA02) | a_CCMA02 < 0 | a_CCMA02 > 10) &
        (!is.na(a_CCMA69) & between(a_CCMA69, 0, 10) &
           !is.na(a_CCMA70) & between(a_CCMA70, 0, 10) &
           !is.na(a_CCMA72) & between(a_CCMA72, 0, 10)) ~ "A",
      (is.na(a_CCMA02) | a_CCMA02 < 0 | a_CCMA02 > 10) &
        (is.na(a_CCMA69) | a_CCMA69 < 0 | a_CCMA69 >= 10) &
        !is.na(a_CCMA70) & between(a_CCMA70, 0, 10) &
        !is.na(a_CCMA72) & between(a_CCMA72, 0, 10) ~ "B",
      (is.na(a_CCMA02) | a_CCMA02 < 0 | a_CCMA02 > 10) &
        (is.na(a_CCMA70) | a_CCMA70 < 0 | a_CCMA70 >= 10) &
        between(a_CCMA69, 0, 10) & between(a_CCMA72, 0, 10) ~ "C",
      (is.na(a_CCMA02) | a_CCMA02 < 0 | a_CCMA02 > 10) &
        (is.na(a_CCMA72) | a_CCMA72 < 0 | a_CCMA72 >= 10) &
        between(a_CCMA69, 0, 10) & between(a_CCMA70, 0, 10) ~ "D")
  ) %>%
  mutate(
    CCMA2 = case_when(
      CCMA2_cond == "A" & 0.614 + (0.065 * a_CCMA69) + (-0.012 * a_CCMA70) + (0.504 * a_CCMA72) < 0  ~ 0,
      CCMA2_cond == "A" & 0.614 + (0.065 * a_CCMA69) + (-0.012 * a_CCMA70) + (0.504 * a_CCMA72) > 10 ~ 10,
      CCMA2_cond == "A" ~ 0.614 + (0.065 * a_CCMA69) + (-0.012 * a_CCMA70) + (0.504 * a_CCMA72),
      TRUE ~ as.double(CCMA2)
    ),
    CCMA2 = case_when(
      CCMA2_cond == "B" & 0.614 + (0.065 * (a_CCMA70 + a_CCMA72) / 2) + (-0.012 * a_CCMA70) + (0.504 * a_CCMA72) < 0  ~ 0,
      CCMA2_cond == "B" & 0.614 + (0.065 * (a_CCMA70 + a_CCMA72) / 2) + (-0.012 * a_CCMA70) + (0.504 * a_CCMA72) > 10 ~ 10,
      CCMA2_cond == "B" ~ 0.614 + (0.065 * (a_CCMA70 + a_CCMA72) / 2) + (-0.012 * a_CCMA70) + (0.504 * a_CCMA72),
      TRUE ~ as.double(CCMA2)
    ),
    CCMA2 = case_when(
      CCMA2_cond == "C" & 0.614 + (0.065 * a_CCMA69) + (-0.012 *(a_CCMA72 + a_CCMA69) / 2 ) + (0.504 * a_CCMA72) < 0  ~ 0,
      CCMA2_cond == "C" & 0.614 + (0.065 * a_CCMA69) + (-0.012 *(a_CCMA72 + a_CCMA69) / 2 ) + (0.504 * a_CCMA72) > 10 ~ 10,
      CCMA2_cond == "C" ~ 0.614 + (0.065 * a_CCMA69) + (-0.012 *(a_CCMA72 + a_CCMA69) / 2 ) + (0.504 * a_CCMA72),
      TRUE ~ as.double(CCMA2)
    ),
    CCMA2 = case_when(
      CCMA2_cond == "D" & 0.614 + (0.065 * a_CCMA69) + (-0.012 * a_CCMA70 ) + (0.504 *(a_CCMA70 + a_CCMA69) / 2) < 0  ~ 0,
      CCMA2_cond == "D" & 0.614 + (0.065 * a_CCMA69) + (-0.012 * a_CCMA70 ) + (0.504 *(a_CCMA70 + a_CCMA69) / 2) > 10 ~ 10,
      CCMA2_cond == "D" ~ 0.614 + (0.065 * a_CCMA69) + (-0.012 * a_CCMA70 ) + (0.504 *(a_CCMA70 + a_CCMA69) / 2),
      TRUE ~ as.double(CCMA2)
    )
  ) %>%
  select(-CCMA2_cond) %>%
  mutate(
    CCMA5_condA = if_else(
      (is.na(a_CCMA05) | a_CCMA05 < 0 | a_CCMA05 > 10),
      1, 0
    ),
    CCMA5 = ifelse(CCMA5_condA == 1 & between(a_CCMA43, 0, 10) & between(a_CCMA44, 0, 10),
                   0.216 + (0.257 * a_CCMA43) + (0.828 * a_CCMA44),
                   CCMA5),
    CCMA5 = ifelse(CCMA5_condA == 1 & between(a_CCMA43, 0, 10) & (is.na(a_CCMA44) | a_CCMA44 < 0 | a_CCMA44 > 10),
                   0.216 + (0.257 * a_CCMA43) + (0.828 * a_CCMA43),
                   CCMA5),
    CCMA5 = ifelse(CCMA5_condA == 1 & between(a_CCMA44, 0, 10) & (is.na(a_CCMA43) | a_CCMA43 < 0 | a_CCMA43 > 10),
                   0.216 + (0.257 * a_CCMA44) + (0.828 * a_CCMA44),
                   CCMA5),
    CCMA5 = ifelse(CCMA5_condA == 1 & !is.na(CCMA5) & CCMA5 < 0,
                   0,
                   CCMA5),
    CCMA5 = ifelse(CCMA5_condA == 1 & CCMA5 > 10,
                   10,
                   CCMA5)
  ) %>%
  select(-CCMA5_condA)

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.