

Hello All,
Season's greetings!
Am trying to replicate some SAS code in R. The SAS code uses ifthendo code blocks. I've been trying to do likewise in R as that seems to be the most reliable way to get the same result.
Below is some toy data and some code that does work. There are some things I don't necessarily like about the code though. So I was hoping some people could help make it better. One thing I don't like is that the within function reverses the order of the computed columns such that test1:test5 becomes test5:test1. I've used a mutate to overcome that but would prefer not to have to do so.
Another, perhaps very small thing, is the need to calculate an ID variable that becomes the basis for a grouping.
I did considerable Internet searching for R code that conditionally computes blocks of code. I didn't find much though and so am wondering if my search terms were not sufficient or if there is some other reason. It occurred to me that maybe ifthendo code blocks like we often see in SAS as are frowned upon and therefore not much implemented.
I'd be interested in seeing more Rcompatible approaches if this is the case. I've learned that it's a mistake to try and make R be like SAS. It's better to let R be R. Trouble is I'm not always sure how to do that.
Thanks,
Paul
d1 < data.frame(workshop=rep(1:2,4),
gender=rep(c("f","m"),each=4))
library(tibble)
library(plyr)
d2 < d1 %>%
rownames_to_column("ID") %>%
mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
ddply("ID",
within,
if (gender == "f" & workshop == 1) {
test1 < 1
test1 < 6 + test1
test2 < 2 + test1
test4 < 1
test5 < 1
} else {
test1 < test2 < test4 < test5 < 0
})
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Dear Paul,
R's power is that is works vectorised. Unlike SAS which is rowbased. Using
R in a SAS way will lead to very slow code.
Your examples can be written vectorised
d1 %>%
rownames_to_column("ID") %>%
mutate(
test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
test5 = test4
)
Here is a speed comparison.
library(microbenchmark)
microbenchmark(
vector = {d1 %>%
rownames_to_column("ID") %>%
mutate(
test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
test5 = test4
) },
rowbased = {d1 %>%
rownames_to_column("ID") %>%
mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
ddply("ID",
within,
if (gender == "f" & workshop == 1) {
test1 < 1
test1 < 6 + test1
test2 < 2 + test1
test4 < 1
test5 < 1
} else {
test1 < test2 < test4 < test5 < 0
})}
)
Best regards,
Thierry
ir. Thierry Onkelinx
Statisticus / Statistician
Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
[hidden email]
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be
///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a postmortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////
< https://www.inbo.be>
Op ma 17 dec. 2018 om 16:30 schreef Paul Miller via Rhelp <
[hidden email]>:
> Hello All,
>
> Season's greetings!
>
> Am trying to replicate some SAS code in R. The SAS code uses ifthendo
> code blocks. I've been trying to do likewise in R as that seems to be the
> most reliable way to get the same result.
>
> Below is some toy data and some code that does work. There are some things
> I don't necessarily like about the code though. So I was hoping some people
> could help make it better. One thing I don't like is that the within
> function reverses the order of the computed columns such that test1:test5
> becomes test5:test1. I've used a mutate to overcome that but would prefer
> not to have to do so.
>
> Another, perhaps very small thing, is the need to calculate an ID
> variable that becomes the basis for a grouping.
>
> I did considerable Internet searching for R code that conditionally
> computes blocks of code. I didn't find much though and so am wondering if
> my search terms were not sufficient or if there is some other reason. It
> occurred to me that maybe ifthendo code blocks like we often see in SAS
> as are frowned upon and therefore not much implemented.
>
> I'd be interested in seeing more Rcompatible approaches if this is the
> case. I've learned that it's a mistake to try and make R be like SAS. It's
> better to let R be R. Trouble is I'm not always sure how to do that.
>
> Thanks,
>
> Paul
>
>
> d1 < data.frame(workshop=rep(1:2,4),
> gender=rep(c("f","m"),each=4))
>
> library(tibble)
> library(plyr)
>
> d2 < d1 %>%
> rownames_to_column("ID") %>%
> mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
> ddply("ID",
> within,
> if (gender == "f" & workshop == 1) {
> test1 < 1
> test1 < 6 + test1
> test2 < 2 + test1
> test4 < 1
> test5 < 1
> } else {
> test1 < test2 < test4 < test5 < 0
> })
>
> ______________________________________________
> [hidden email] mailing list  To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide
> http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.
>
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


this can be dome even faster, and I think more easily read, using only base R
d1 < data.frame(workshop=rep(1:2,4),
gender=rep(c("f","m"),each=4))
## needed by vector and rowbased, not needed by rmh
library(tibble)
library(plyr)
library(magrittr)
microbenchmark(
vector = {d1 %>%
rownames_to_column("ID") %>%
mutate(
test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
test5 = test4
) },
rowbased = {d1 %>%
rownames_to_column("ID") %>%
mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
ddply("ID",
within,
if (gender == "f" & workshop == 1) {
test1 < 1
test1 < 6 + test1
test2 < 2 + test1
test4 < 1
test5 < 1
} else {
test1 < test2 < test4 < test5 < 0
})},
rmh={
data.frame(ID=rownames(d1),
d1,
test1=0,
test2=0,
test4=0,
test5=0)
myRowSubset < d3$gender=="f" & d3$workshop==1
test1 < 1
d3[myRowSubset, "test1"] < test1 + 6
d3[myRowSubset, "test2"] < test1 + 6 + 2
d3[myRowSubset, c("test4", "test5")] < test1
}
)
Unit: microseconds
expr min lq mean median uq max neval cld
vector 1281.994 1468.102 1669.266 1573.043 1750.354 3171.777 100 a
rowbased 8131.230 8691.899 10894.700 9219.882 10435.642 133293.034 100 b
rmh 925.571 1056.530 1167.568 1116.425 1221.457 1968.199 100 a
On Mon, Dec 17, 2018 at 12:15 PM Thierry Onkelinx via Rhelp
< [hidden email]> wrote:
>
> Dear Paul,
>
> R's power is that is works vectorised. Unlike SAS which is rowbased. Using
> R in a SAS way will lead to very slow code.
>
> Your examples can be written vectorised
>
> d1 %>%
> rownames_to_column("ID") %>%
> mutate(
> test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
> test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
> test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
> test5 = test4
> )
>
> Here is a speed comparison.
>
> library(microbenchmark)
> microbenchmark(
> vector = {d1 %>%
> rownames_to_column("ID") %>%
> mutate(
> test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
> test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
> test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
> test5 = test4
> ) },
> rowbased = {d1 %>%
> rownames_to_column("ID") %>%
> mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
> ddply("ID",
> within,
> if (gender == "f" & workshop == 1) {
> test1 < 1
> test1 < 6 + test1
> test2 < 2 + test1
> test4 < 1
> test5 < 1
> } else {
> test1 < test2 < test4 < test5 < 0
> })}
> )
>
>
> Best regards,
>
> Thierry
>
> ir. Thierry Onkelinx
> Statisticus / Statistician
>
> Vlaamse Overheid / Government of Flanders
> INSTITUUT VOOR NATUUR EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
> FOREST
> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> [hidden email]
> Havenlaan 88 bus 73, 1000 Brussel
> www.inbo.be
>
> ///////////////////////////////////////////////////////////////////////////////////////////
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a postmortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
> ///////////////////////////////////////////////////////////////////////////////////////////
>
> < https://www.inbo.be>
>
>
> Op ma 17 dec. 2018 om 16:30 schreef Paul Miller via Rhelp <
> [hidden email]>:
>
> > Hello All,
> >
> > Season's greetings!
> >
> > Am trying to replicate some SAS code in R. The SAS code uses ifthendo
> > code blocks. I've been trying to do likewise in R as that seems to be the
> > most reliable way to get the same result.
> >
> > Below is some toy data and some code that does work. There are some things
> > I don't necessarily like about the code though. So I was hoping some people
> > could help make it better. One thing I don't like is that the within
> > function reverses the order of the computed columns such that test1:test5
> > becomes test5:test1. I've used a mutate to overcome that but would prefer
> > not to have to do so.
> >
> > Another, perhaps very small thing, is the need to calculate an ID
> > variable that becomes the basis for a grouping.
> >
> > I did considerable Internet searching for R code that conditionally
> > computes blocks of code. I didn't find much though and so am wondering if
> > my search terms were not sufficient or if there is some other reason. It
> > occurred to me that maybe ifthendo code blocks like we often see in SAS
> > as are frowned upon and therefore not much implemented.
> >
> > I'd be interested in seeing more Rcompatible approaches if this is the
> > case. I've learned that it's a mistake to try and make R be like SAS. It's
> > better to let R be R. Trouble is I'm not always sure how to do that.
> >
> > Thanks,
> >
> > Paul
> >
> >
> > d1 < data.frame(workshop=rep(1:2,4),
> > gender=rep(c("f","m"),each=4))
> >
> > library(tibble)
> > library(plyr)
> >
> > d2 < d1 %>%
> > rownames_to_column("ID") %>%
> > mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
> > ddply("ID",
> > within,
> > if (gender == "f" & workshop == 1) {
> > test1 < 1
> > test1 < 6 + test1
> > test2 < 2 + test1
> > test4 < 1
> > test5 < 1
> > } else {
> > test1 < test2 < test4 < test5 < 0
> > })
> >
> > ______________________________________________
> > [hidden email] mailing list  To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/rhelp> > PLEASE do read the posting guide
> > http://www.Rproject.org/postingguide.html> > and provide commented, minimal, selfcontained, reproducible code.
> >
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list  To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


I got another 10% savings with this example by using only one
subscripting adjustment.
I also fixed a typo in my previous posting (which didn't affect the timing).
microbenchmark(
rmh={
d3 <data.frame(ID=rownames(d1),
d1,
test1=0,
test2=0,
test4=0,
test5=0)
myRowSubset < d3$gender=="f" & d3$workshop==1
test1 < 1
d3[myRowSubset, "test1"] < test1 + 6
d3[myRowSubset, "test2"] < test1 + 6 + 2
d3[myRowSubset, c("test4", "test5")] < test1
},
rmh4={
d4 < data.frame(ID=rownames(d1),
d1,
test1=0,
test2=0,
test4=0,
test5=0)
myRowSubset < d4$gender=="f" & d4$workshop==1
test1 < 1
d4[myRowSubset, c("test1", "test2", "test4", "test5")] <
matrix(test1 + c(6, 6+2, 0, 0), nrow=sum(myRowSubset), ncol=4, byrow=TRUE)
}
)
Unit: microseconds
expr min lq mean median uq max neval cld
rmh 956.187 1183.304 1538.012 1617.985 1865.149 2177.071 100 b
rmh4 850.729 1042.997 1380.842 1416.476 1700.307 2448.545 100 a
On Mon, Dec 17, 2018 at 12:49 PM Richard M. Heiberger < [hidden email]> wrote:
>
> this can be dome even faster, and I think more easily read, using only base R
>
> d1 < data.frame(workshop=rep(1:2,4),
> gender=rep(c("f","m"),each=4))
>
> ## needed by vector and rowbased, not needed by rmh
> library(tibble)
> library(plyr)
> library(magrittr)
>
> microbenchmark(
> vector = {d1 %>%
> rownames_to_column("ID") %>%
> mutate(
> test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
> test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
> test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
> test5 = test4
> ) },
> rowbased = {d1 %>%
> rownames_to_column("ID") %>%
> mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
> ddply("ID",
> within,
> if (gender == "f" & workshop == 1) {
> test1 < 1
> test1 < 6 + test1
> test2 < 2 + test1
> test4 < 1
> test5 < 1
> } else {
> test1 < test2 < test4 < test5 < 0
> })},
> rmh={
> data.frame(ID=rownames(d1),
> d1,
> test1=0,
> test2=0,
> test4=0,
> test5=0)
> myRowSubset < d3$gender=="f" & d3$workshop==1
> test1 < 1
> d3[myRowSubset, "test1"] < test1 + 6
> d3[myRowSubset, "test2"] < test1 + 6 + 2
> d3[myRowSubset, c("test4", "test5")] < test1
> }
> )
>
> Unit: microseconds
> expr min lq mean median uq max neval cld
> vector 1281.994 1468.102 1669.266 1573.043 1750.354 3171.777 100 a
> rowbased 8131.230 8691.899 10894.700 9219.882 10435.642 133293.034 100 b
> rmh 925.571 1056.530 1167.568 1116.425 1221.457 1968.199 100 a
> On Mon, Dec 17, 2018 at 12:15 PM Thierry Onkelinx via Rhelp
> < [hidden email]> wrote:
> >
> > Dear Paul,
> >
> > R's power is that is works vectorised. Unlike SAS which is rowbased. Using
> > R in a SAS way will lead to very slow code.
> >
> > Your examples can be written vectorised
> >
> > d1 %>%
> > rownames_to_column("ID") %>%
> > mutate(
> > test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
> > test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
> > test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
> > test5 = test4
> > )
> >
> > Here is a speed comparison.
> >
> > library(microbenchmark)
> > microbenchmark(
> > vector = {d1 %>%
> > rownames_to_column("ID") %>%
> > mutate(
> > test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
> > test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
> > test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
> > test5 = test4
> > ) },
> > rowbased = {d1 %>%
> > rownames_to_column("ID") %>%
> > mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
> > ddply("ID",
> > within,
> > if (gender == "f" & workshop == 1) {
> > test1 < 1
> > test1 < 6 + test1
> > test2 < 2 + test1
> > test4 < 1
> > test5 < 1
> > } else {
> > test1 < test2 < test4 < test5 < 0
> > })}
> > )
> >
> >
> > Best regards,
> >
> > Thierry
> >
> > ir. Thierry Onkelinx
> > Statisticus / Statistician
> >
> > Vlaamse Overheid / Government of Flanders
> > INSTITUUT VOOR NATUUR EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
> > FOREST
> > Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> > [hidden email]
> > Havenlaan 88 bus 73, 1000 Brussel
> > www.inbo.be
> >
> > ///////////////////////////////////////////////////////////////////////////////////////////
> > To call in the statistician after the experiment is done may be no more
> > than asking him to perform a postmortem examination: he may be able to say
> > what the experiment died of. ~ Sir Ronald Aylmer Fisher
> > The plural of anecdote is not data. ~ Roger Brinner
> > The combination of some data and an aching desire for an answer does not
> > ensure that a reasonable answer can be extracted from a given body of data.
> > ~ John Tukey
> > ///////////////////////////////////////////////////////////////////////////////////////////
> >
> > < https://www.inbo.be>
> >
> >
> > Op ma 17 dec. 2018 om 16:30 schreef Paul Miller via Rhelp <
> > [hidden email]>:
> >
> > > Hello All,
> > >
> > > Season's greetings!
> > >
> > > Am trying to replicate some SAS code in R. The SAS code uses ifthendo
> > > code blocks. I've been trying to do likewise in R as that seems to be the
> > > most reliable way to get the same result.
> > >
> > > Below is some toy data and some code that does work. There are some things
> > > I don't necessarily like about the code though. So I was hoping some people
> > > could help make it better. One thing I don't like is that the within
> > > function reverses the order of the computed columns such that test1:test5
> > > becomes test5:test1. I've used a mutate to overcome that but would prefer
> > > not to have to do so.
> > >
> > > Another, perhaps very small thing, is the need to calculate an ID
> > > variable that becomes the basis for a grouping.
> > >
> > > I did considerable Internet searching for R code that conditionally
> > > computes blocks of code. I didn't find much though and so am wondering if
> > > my search terms were not sufficient or if there is some other reason. It
> > > occurred to me that maybe ifthendo code blocks like we often see in SAS
> > > as are frowned upon and therefore not much implemented.
> > >
> > > I'd be interested in seeing more Rcompatible approaches if this is the
> > > case. I've learned that it's a mistake to try and make R be like SAS. It's
> > > better to let R be R. Trouble is I'm not always sure how to do that.
> > >
> > > Thanks,
> > >
> > > Paul
> > >
> > >
> > > d1 < data.frame(workshop=rep(1:2,4),
> > > gender=rep(c("f","m"),each=4))
> > >
> > > library(tibble)
> > > library(plyr)
> > >
> > > d2 < d1 %>%
> > > rownames_to_column("ID") %>%
> > > mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
> > > ddply("ID",
> > > within,
> > > if (gender == "f" & workshop == 1) {
> > > test1 < 1
> > > test1 < 6 + test1
> > > test2 < 2 + test1
> > > test4 < 1
> > > test5 < 1
> > > } else {
> > > test1 < test2 < test4 < test5 < 0
> > > })
> > >
> > > ______________________________________________
> > > [hidden email] mailing list  To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/rhelp> > > PLEASE do read the posting guide
> > > http://www.Rproject.org/postingguide.html> > > and provide commented, minimal, selfcontained, reproducible code.
> > >
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list  To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/rhelp> > PLEASE do read the posting guide http://www.Rproject.org/postingguide.html> > and provide commented, minimal, selfcontained, reproducible code.
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


In reply to this post by R help mailing list2
There is some discussion of approaches to this here:
https://stackoverflow.com/questions/34096162/dplyrmutatereplaceonasubsetofrows/34096575#34096575On Mon, Dec 17, 2018 at 10:30 AM Paul Miller via Rhelp
< [hidden email]> wrote:
>
> Hello All,
>
> Season's greetings!
>
> Am trying to replicate some SAS code in R. The SAS code uses ifthendo code blocks. I've been trying to do likewise in R as that seems to be the most reliable way to get the same result.
>
> Below is some toy data and some code that does work. There are some things I don't necessarily like about the code though. So I was hoping some people could help make it better. One thing I don't like is that the within function reverses the order of the computed columns such that test1:test5 becomes test5:test1. I've used a mutate to overcome that but would prefer not to have to do so.
>
> Another, perhaps very small thing, is the need to calculate an ID variable that becomes the basis for a grouping.
>
> I did considerable Internet searching for R code that conditionally computes blocks of code. I didn't find much though and so am wondering if my search terms were not sufficient or if there is some other reason. It occurred to me that maybe ifthendo code blocks like we often see in SAS as are frowned upon and therefore not much implemented.
>
> I'd be interested in seeing more Rcompatible approaches if this is the case. I've learned that it's a mistake to try and make R be like SAS. It's better to let R be R. Trouble is I'm not always sure how to do that.
>
> Thanks,
>
> Paul
>
>
> d1 < data.frame(workshop=rep(1:2,4),
> gender=rep(c("f","m"),each=4))
>
> library(tibble)
> library(plyr)
>
> d2 < d1 %>%
> rownames_to_column("ID") %>%
> mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
> ddply("ID",
> within,
> if (gender == "f" & workshop == 1) {
> test1 < 1
> test1 < 6 + test1
> test2 < 2 + test1
> test4 < 1
> test5 < 1
> } else {
> test1 < test2 < test4 < test5 < 0
> })
>
> ______________________________________________
> [hidden email] mailing list  To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.

Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1877GKXGROUP
email: ggrothendieck at gmail.com
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Hi Gabor, Richard, and Thierry,
Thanks very much for your replies. Turns out I had already hit on Gabor's idea of "factor out" in writing an initial draft of the code converting from SAS to R. Below is the link Gabor sent describing this and other approaches.
https://stackoverflow.com/questions/34096162/dplyrmutatereplaceonasubsetofrows/34096575#34096575At the end of this email are some new test data plus a snippet of my initial R code. The R code I have replicates the result from SAS but is quite verbose. That should be obvious from the snippet. I know I can make the code less verbose with a subsequent draft but wonder if I can simplify to the point where the factor out approach gets a fair test. I'd appreciate it if people could share some ways to make the factor out approach less verbose. I'd also like to see how well some of the other approaches might work with these data. I spent considerable time looking at the link Gabor sent as well as the other responses I received. The mutate_cond function in the link seems promising but it wasn't clear to me how I could avoid having to repeat the various conditions using that approach.
Thanks again.
Paul
library(magrittr)
library(dplyr)
test_data <
structure(
list(
intPatientId = c("3", "37", "48", "6", "6", "5"),
intSurveySessionId = c(1L, 10996L, 19264L, 2841L, 28L, 34897L),
a_CCMA02 = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_),
a_CCMA69 = c(7, NA, 0, 2, NA, 0),
a_CCMA70 = c(7, 0, NA, 10, NA, NA),
a_CCMA72 = c(7, 2, 3, NA, NA, NA),
CCMA2 = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,NA_integer_, NA_integer_),
a_CCMA05 = c(NA, NA, NA, NA, NA, 0),
a_CCMA43 = c(5, 0, 6, 5, NA, NA),
a_CCMA44 = c(5, 0, 0, 5, 0, NA),
CCMA5 = c(NA, NA, NA, NA, NA, 0)
),
class = "data.frame",
row.names = c(NA,6L)
)
factor_out < test_data %>%
mutate(
CCMA2_cond = case_when(
(is.na(a_CCMA02)  a_CCMA02 < 0  a_CCMA02 > 10) &
(!is.na(a_CCMA69) & between(a_CCMA69, 0, 10) &
!is.na(a_CCMA70) & between(a_CCMA70, 0, 10) &
!is.na(a_CCMA72) & between(a_CCMA72, 0, 10)) ~ "A",
(is.na(a_CCMA02)  a_CCMA02 < 0  a_CCMA02 > 10) &
(is.na(a_CCMA69)  a_CCMA69 < 0  a_CCMA69 >= 10) &
!is.na(a_CCMA70) & between(a_CCMA70, 0, 10) &
!is.na(a_CCMA72) & between(a_CCMA72, 0, 10) ~ "B",
(is.na(a_CCMA02)  a_CCMA02 < 0  a_CCMA02 > 10) &
(is.na(a_CCMA70)  a_CCMA70 < 0  a_CCMA70 >= 10) &
between(a_CCMA69, 0, 10) & between(a_CCMA72, 0, 10) ~ "C",
(is.na(a_CCMA02)  a_CCMA02 < 0  a_CCMA02 > 10) &
(is.na(a_CCMA72)  a_CCMA72 < 0  a_CCMA72 >= 10) &
between(a_CCMA69, 0, 10) & between(a_CCMA70, 0, 10) ~ "D")
) %>%
mutate(
CCMA2 = case_when(
CCMA2_cond == "A" & 0.614 + (0.065 * a_CCMA69) + (0.012 * a_CCMA70) + (0.504 * a_CCMA72) < 0 ~ 0,
CCMA2_cond == "A" & 0.614 + (0.065 * a_CCMA69) + (0.012 * a_CCMA70) + (0.504 * a_CCMA72) > 10 ~ 10,
CCMA2_cond == "A" ~ 0.614 + (0.065 * a_CCMA69) + (0.012 * a_CCMA70) + (0.504 * a_CCMA72),
TRUE ~ as.double(CCMA2)
),
CCMA2 = case_when(
CCMA2_cond == "B" & 0.614 + (0.065 * (a_CCMA70 + a_CCMA72) / 2) + (0.012 * a_CCMA70) + (0.504 * a_CCMA72) < 0 ~ 0,
CCMA2_cond == "B" & 0.614 + (0.065 * (a_CCMA70 + a_CCMA72) / 2) + (0.012 * a_CCMA70) + (0.504 * a_CCMA72) > 10 ~ 10,
CCMA2_cond == "B" ~ 0.614 + (0.065 * (a_CCMA70 + a_CCMA72) / 2) + (0.012 * a_CCMA70) + (0.504 * a_CCMA72),
TRUE ~ as.double(CCMA2)
),
CCMA2 = case_when(
CCMA2_cond == "C" & 0.614 + (0.065 * a_CCMA69) + (0.012 *(a_CCMA72 + a_CCMA69) / 2 ) + (0.504 * a_CCMA72) < 0 ~ 0,
CCMA2_cond == "C" & 0.614 + (0.065 * a_CCMA69) + (0.012 *(a_CCMA72 + a_CCMA69) / 2 ) + (0.504 * a_CCMA72) > 10 ~ 10,
CCMA2_cond == "C" ~ 0.614 + (0.065 * a_CCMA69) + (0.012 *(a_CCMA72 + a_CCMA69) / 2 ) + (0.504 * a_CCMA72),
TRUE ~ as.double(CCMA2)
),
CCMA2 = case_when(
CCMA2_cond == "D" & 0.614 + (0.065 * a_CCMA69) + (0.012 * a_CCMA70 ) + (0.504 *(a_CCMA70 + a_CCMA69) / 2) < 0 ~ 0,
CCMA2_cond == "D" & 0.614 + (0.065 * a_CCMA69) + (0.012 * a_CCMA70 ) + (0.504 *(a_CCMA70 + a_CCMA69) / 2) > 10 ~ 10,
CCMA2_cond == "D" ~ 0.614 + (0.065 * a_CCMA69) + (0.012 * a_CCMA70 ) + (0.504 *(a_CCMA70 + a_CCMA69) / 2),
TRUE ~ as.double(CCMA2)
)
) %>%
select(CCMA2_cond) %>%
mutate(
CCMA5_condA = if_else(
(is.na(a_CCMA05)  a_CCMA05 < 0  a_CCMA05 > 10),
1, 0
),
CCMA5 = ifelse(CCMA5_condA == 1 & between(a_CCMA43, 0, 10) & between(a_CCMA44, 0, 10),
0.216 + (0.257 * a_CCMA43) + (0.828 * a_CCMA44),
CCMA5),
CCMA5 = ifelse(CCMA5_condA == 1 & between(a_CCMA43, 0, 10) & (is.na(a_CCMA44)  a_CCMA44 < 0  a_CCMA44 > 10),
0.216 + (0.257 * a_CCMA43) + (0.828 * a_CCMA43),
CCMA5),
CCMA5 = ifelse(CCMA5_condA == 1 & between(a_CCMA44, 0, 10) & (is.na(a_CCMA43)  a_CCMA43 < 0  a_CCMA43 > 10),
0.216 + (0.257 * a_CCMA44) + (0.828 * a_CCMA44),
CCMA5),
CCMA5 = ifelse(CCMA5_condA == 1 & !is.na(CCMA5) & CCMA5 < 0,
0,
CCMA5),
CCMA5 = ifelse(CCMA5_condA == 1 & CCMA5 > 10,
10,
CCMA5)
) %>%
select(CCMA5_condA)
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.

