# Character (1a, 1b) to numeric

22 messages

## Character (1a, 1b) to numeric

Dear All,

I have a character vector representing histology stages, for example:

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
```

and this goes on to 3, 3a, etc., in varying order for each patient. I do of course have a pre-established classification available, which changes according to the histology criteria under assessment.

For plotting, I would like to convert xc to a numeric vector such as

```r
xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
```

Unfortunately I have no idea how to do that.

Thanks for any help, and apologies if I am missing the obvious way to do it.

JL
-- Verif30042020

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
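
The question notes that the classification changes with the histology criteria under assessment. A minimal sketch of one way to handle that, assuming each scheme can be written as a named numeric vector, keeps the schemes in a list and picks one at conversion time. The scheme names `criteria_A`/`criteria_B`, the values in `criteria_B`, and the helper `stage_to_numeric()` are all hypothetical.

```r
# Hypothetical schemes: one named numeric vector per set of histology criteria
schemes <- list(
  criteria_A = c("1" = 1, "1a" = 1.3, "1b" = 1.5, "1c" = 1.7,
                 "2" = 2, "2a" = 2.3, "2b" = 2.5, "2c" = 2.7),
  criteria_B = c("1" = 1, "1a" = 1.5, "2" = 2, "2a" = 2.5)
)

# Look up each code in the chosen scheme by name; codes absent
# from the scheme become NA. unname() drops the code labels.
stage_to_numeric <- function(codes, scheme) {
  unname(scheme[codes])
}

xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
stage_to_numeric(xc, schemes$criteria_A)
# [1] 1.0 1.3 1.5 1.7 2.0 2.3 2.5 2.7
```

Switching criteria then only means passing a different element of `schemes`; the conversion code itself does not change.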

## Re: Character (1a, 1b) to numeric

Dear Jean-Louis,

There must be many ways to do this. Here's one simple way (with no claim of optimality!):

```r
> xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
>
> set.seed(123) # for reproducibility
> x <- sample(xc, 20, replace=TRUE) # "data"
>
> names(xn) <- xc
> z <- xn[x]
>
> data.frame(z, x)
     z  x
1  2.5 2b
2  2.5 2b
3  1.5 1b
4  2.3 2a
5  1.5 1b
6  1.3 1a
7  1.3 1a
8  2.3 2a
9  1.5 1b
10 2.0  2
11 1.7 1c
12 2.3 2a
13 2.3 2a
14 1.0  1
15 1.3 1a
16 1.5 1b
17 2.7 2c
18 2.0  2
19 1.5 1b
20 1.5 1b
```

I hope this helps,
 John

  -----------------------------
  John Fox, Professor Emeritus
  McMaster University
  Hamilton, Ontario, Canada
  Web: http://socserv.mcmaster.ca/jfox

On Jul 10, 2020, at 1:50 PM, Jean-Louis Abitbol <[hidden email]> wrote: [...]
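
With the named-vector lookup, the result carries the stage codes as names, and a code missing from `xn` silently becomes `NA`. A short sketch, where the extra code `"3a"` is made up to trigger the missing case, strips the names with `unname()` and reports unknown codes before plotting:

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
names(xn) <- xc

x <- c("2b", "1", "3a")   # "3a" is deliberately absent from xn
z <- xn[x]                # named vector; the "3a" position is NA
z_plain <- unname(z)      # plain numeric vector: 2.5 1.0 NA

# List any codes the lookup did not recognise
unknown <- x[is.na(z)]
if (length(unknown) > 0) {
  message("Unknown stage codes: ", paste(unknown, collapse = ", "))
}
```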

## Re: [External] Character (1a, 1b) to numeric

In reply to this post by Jean-Louis Abitbol-2

```r
> xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
> testdata <- rep(c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c"), times=1:8)
> testdata
 [1] "1"  "1a" "1a" "1b" "1b" "1b" "1c" "1c" "1c" "1c" "2"  "2"  "2"  "2"  "2"
[16] "2a" "2a" "2a" "2a" "2a" "2a" "2b" "2b" "2b" "2b" "2b" "2b" "2b" "2c" "2c"
[31] "2c" "2c" "2c" "2c" "2c" "2c"
> ?match
> xn[match(testdata, xc)]
 [1] 1.0 1.3 1.3 1.5 1.5 1.5 1.7 1.7 1.7 1.7 2.0 2.0 2.0 2.0 2.0 2.3 2.3 2.3 2.3
[20] 2.3 2.3 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.7 2.7 2.7 2.7 2.7 2.7 2.7 2.7
```

On Fri, Jul 10, 2020 at 1:51 PM Jean-Louis Abitbol <[hidden email]> wrote: [...]
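
`match()` returns the position of each observed code in `xc`, so `xn[match(...)]` is a positional lookup: it works whether or not `xn` has names. A small sketch showing the intermediate index, with a made-up unmatched code `"3"`:

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)

obs <- c("2b", "1a", "3")
idx <- match(obs, xc)     # positions in xc: 7 2 NA ("3" is not in xc)
xn[idx]                   # 2.5 1.3 NA
```

An `NA` in `idx` is a convenient place to catch stage codes that are missing from the classification, before they reach a plot.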

## Re: Character (1a, 1b) to numeric

In reply to this post by Jean-Louis Abitbol-2

Obvious is in the eye of the beholder. Presuming your letters don't go beyond "i":

a) Lookup table:

```r
tbl <- read.table( text=
"OldCode  NewCode
1         1
1a        1.1
1b        1.2
1c        1.3
2         2
2a        2.1
2b        2.2
", as.is=TRUE, header=TRUE )
tblv <- setNames( tbl$NewCode, tbl$OldCode )
test <- c( "2", "1c", "2b" )
as.vector( tblv[ test ] )
```

b) String manipulation:

```r
n <- as.integer( sub( "[a-i]$", "", test ) )
d <- match( sub( "^\\d+", "", test ), letters[1:9] )
d[ is.na( d ) ] <- 0
n + d / 10
```

On July 10, 2020 10:50:18 AM PDT, Jean-Louis Abitbol <[hidden email]> wrote: [...]

-- Sent from my phone. Please excuse my brevity.
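
As written, option b) maps a, b, c to offsets of .1, .2, .3, whereas the question asked for .3, .5, .7. A hedged adaptation replaces `d / 10` with an indexed lookup. It assumes the offsets can simply be read off the question's `xn`; the `offsets` vector is that assumption, not part of the original reply.

```r
test <- c("2", "1c", "2b")

n <- as.integer(sub("[a-i]$", "", test))          # numeric stage: 2 1 2
d <- match(sub("^\\d+", "", test), letters[1:9])  # letter position: NA 3 2
d[is.na(d)] <- 0                                  # no letter -> position 0

# Offsets for: no letter, a, b, c (assumed from the question's xn)
offsets <- c(0, 0.3, 0.5, 0.7)
n + offsets[d + 1]
# [1] 2.0 1.7 2.5
```

Letters beyond c index past the end of `offsets` and give `NA`, which makes unlisted stages easy to spot.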

## Re: Character (1a, 1b) to numeric

In reply to this post by Fox, John

Here is a different approach:

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- as.numeric(gsub("a", ".3", gsub("b", ".5", gsub("c", ".7", xc))))
xn
# [1] 1.0 1.3 1.5 1.7 2.0 2.3 2.5 2.7
```

David L Carlson
Professor Emeritus of Anthropology
Texas A&M University

On Fri, Jul 10, 2020 at 1:10 PM Fox, John <[hidden email]> wrote: [...]
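
The nested `gsub()` calls hard-code one letter per call. A sketch of the same substitution idea driven by a named vector (`repl` is a hypothetical name), anchored with `$` so that only a trailing letter is replaced:

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")

# Hypothetical mapping from trailing letter to decimal string
repl <- c(a = ".3", b = ".5", c = ".7")

x <- xc
for (code in names(repl)) {
  # Replace only a trailing letter, e.g. "2b" -> "2.5"
  x <- sub(paste0(code, "$"), repl[[code]], x)
}
as.numeric(x)
# [1] 1.0 1.3 1.5 1.7 2.0 2.3 2.5 2.7
```

Adding a stage letter then means adding one entry to `repl` rather than another level of nesting.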

## Re: Character (1a, 1b) to numeric

... and continuing with this cute little thread...

I found the OP's specification a little imprecise: are your values always a string that begins with "some sort" of numeric value followed by "some sort" of alpha code? That is, could the numeric value be several digits and the alpha code several letters? Probably not, and the existing solutions you have been provided are almost certainly all you need. But for fun, assuming this more general specification, here is a general way to split your alphanumeric codes into numeric and alpha parts and then convert them, using a couple of sub()s.

```r
> set.seed(131)
> xc <- sample(c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c"), 15, replace = TRUE)
> nums <- sub("[[:alpha:]]+","",xc)  ## extract numeric part
> alph <- sub("\\d+","",xc)   ## extract alpha part
> codes <- letters[1:3] ## whatever alpha codes are used
> vals <- setNames(c(.3,.5,.7), codes) ## whatever numeric values to convert codes to
> xnew <- as.numeric(nums) + ifelse(alph == "",0, vals[alph])
> data.frame (xc = xc, xnew = xnew)
   xc xnew
1  1a  1.3
2   2  2.0
3  1c  1.7
4  1c  1.7
5  1b  1.5
6  1a  1.3
7   2  2.0
8   2  2.0
9  1a  1.3
10 1a  1.3
11 2c  2.7
12 1b  1.5
13 1b  1.5
14  1  1.0
15 1c  1.7
```

Echoing others, no claim for optimality in any sense.

Cheers,
Bert

On Fri, Jul 10, 2020 at 12:28 PM David Carlson <[hidden email]> wrote: [...]
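
The split-then-look-up steps above can be wrapped in a small function that also copes with multi-digit stages and multi-letter codes, the more general case described in the post. A sketch, where the function name `code_to_num()` and the `ab` entry in `vals` are hypothetical:

```r
# Split each code into its numeric and alpha parts, then add the
# offset looked up for the alpha part (0 when there is no letter).
code_to_num <- function(x, vals) {
  nums <- sub("[[:alpha:]]+", "", x)   # numeric part, e.g. "12ab" -> "12"
  alph <- sub("\\d+", "", x)           # alpha part,   e.g. "12ab" -> "ab"
  as.numeric(nums) + ifelse(alph == "", 0, vals[alph])
}

vals <- c(a = 0.3, b = 0.5, c = 0.7, ab = 0.4)  # "ab" is a made-up code
code_to_num(c("1", "2c", "12ab", "3"), vals)
# [1]  1.0  2.7 12.4  3.0
```

A suffix with no entry in `vals` yields `NA` rather than an error.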

## Re: Character (1a, 1b) to numeric

In reply to this post by David Carlson

Hi,

We've had several solutions, and I was curious about their relative efficiency. Here's a test with a moderately large data vector:

```r
> library("microbenchmark")
> set.seed(123) # for reproducibility
> x <- sample(xc, 1e4, replace=TRUE) # "data"
> microbenchmark(John = John <- xn[x],
+                Rich = Rich <- xn[match(x, xc)],
+                Jeff = Jeff <- {
+                 n <- as.integer( sub( "[a-i]$", "", x ) )
+                 d <- match( sub( "^\\d+", "", x ), letters[1:9] )
+                 d[ is.na( d ) ] <- 0
+                 n + d / 10
+                 },
+                David = David <- as.numeric(gsub("a", ".3",
+                                      gsub("b", ".5",
+                                           gsub("c", ".7", x)))),
+                times=1000L
+                )
Unit: microseconds
  expr       min        lq       mean     median         uq       max neval cld
  John   228.816   345.371   513.5614   503.5965   533.0635  10829.08  1000 a
  Rich   217.395   343.035   534.2074   489.0075   518.3260  15388.96  1000 a
  Jeff 10325.471 13070.737 15387.2545 15397.9790 17204.0115 153486.94  1000  b
 David 14256.673 18148.492 20185.7156 20170.3635 22067.6690  34998.95  1000   c
> all.equal(John, Rich)
[1] TRUE
> all.equal(John, David)
[1] "names for target but not for current"
> all.equal(John, Jeff)
[1] "names for target but not for current" "Mean relative difference: 0.1498243"
```

Of course, efficiency isn't the only consideration, and aesthetically (and no doubt subjectively) I prefer Rich Heiberger's solution. OTOH, Jeff's solution is more general in that it generates the correspondence between letters and numbers. The argument for Jeff's solution would, however, be stronger if it gave the desired answer.

Best,
 John

On Jul 10, 2020, at 3:28 PM, David Carlson <[hidden email]> wrote: [...]
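
The "names for target but not for current" messages mean the named result of `xn[x]` is being compared with unnamed results; the values themselves can still agree. (The extra "Mean relative difference" for Jeff's version is a genuine value difference, since its offsets are .1/.2/.3.) A sketch, using a smaller 100-element sample as an assumption, that separates the two issues with `all.equal()`'s `check.attributes` argument or with `unname()`:

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- setNames(c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7), xc)

set.seed(123)
x <- sample(xc, 100, replace = TRUE)

John  <- xn[x]   # named result
David <- as.numeric(gsub("a", ".3", gsub("b", ".5", gsub("c", ".7", x))))  # unnamed

all.equal(John, David)                            # reports only the names mismatch
all.equal(John, David, check.attributes = FALSE)  # TRUE: the values agree
isTRUE(all.equal(unname(John), David))            # same check via unname()
```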

## Re: Character (1a, 1b) to numeric

Many thanks to all. This help-list is wonderful. I used Rich Heiberger's solution with match() and found something to learn in each answer.

Off topic: I also very much enjoyed his 2008 paper on the graphical presentation of safety data.

Best wishes.

On Fri, Jul 10, 2020, at 10:02 PM, Fox, John wrote: [...]

-- Verif30042020

## Re: Character (1a, 1b) to numeric

In reply to this post by Bert Gunter-2

Dear Bert,

Wouldn't you know it: your contribution arrived just after I pressed "send" on my last message. So here's how your solution compares:

```r
> microbenchmark(John = John <- xn[x],
+                Rich = Rich <- xn[match(x, xc)],
+                Jeff = Jeff <- {
+                   n <- as.integer( sub( "[a-i]$", "", x ) )
+                   d <- match( sub( "^\\d+", "", x ), letters[1:9] )
+                   d[ is.na( d ) ] <- 0
+                   n + d / 10
+                },
+                David = David <- as.numeric(gsub("a", ".3",
+                                      gsub("b", ".5",
+                                           gsub("c", ".7", x)))),
+                Bert = Bert <- {
+                   nums <- sub("[[:alpha:]]+","",x)
+                   alph <- sub("\\d+","",x)
+                   as.numeric(nums) + ifelse(alph == "",0, vals[alph])
+                },
+                times=1000L
+                )
Unit: microseconds
  expr       min         lq       mean    median         uq       max neval  cld
  John   261.739   373.9765   599.9411   536.571   569.3750  14489.48  1000 a
  Rich   250.697   372.4450   542.3208   520.383   554.7215  10682.73  1000 a
  Jeff 10879.223 13477.7665 15647.7856 15549.255 17516.7420 146155.28  1000  b
 David 14337.510 18375.0100 20325.8796 20187.174 22161.0195  32575.31  1000    d
  Bert 12344.506 15753.2510 18024.2757 17702.838 19973.0465  32043.80  1000   c
> all.equal(John, Rich)
[1] TRUE
> all.equal(John, David)
[1] "names for target but not for current"
> all.equal(John, Jeff)
[1] "names for target but not for current" "Mean relative difference: 0.1498243"
> all.equal(John, Bert)
[1] "names for target but not for current"
```

To make the comparison fair, I moved the parts of the solutions that don't depend on the length of the data outside the benchmark. Your solution does have the virtue of providing the right answer.

Best,
 John

On Jul 10, 2020, at 3:54 PM, Bert Gunter <[hidden email]> wrote: [...]

## Re: Character (1a, 1b) to numeric

 Thanks! As I said, cute exercise.

Best,
Bert

## Re: Character (1a, 1b) to numeric

 In reply to this post by Jean-Louis Abitbol-2

This can be done very simply because vectors in R can have named elements, and can be indexed by strings.

```
> stage <- c("1" = 1, "1a" = 1.3, "1b" = 1.5, "1c" = 1.7,
+            "2" = 2, "2a" = 2.3, "2b" = 2.5, "2c" = 2.7,
+            "3" = 3, "3a" = 3.3, "3b" = 3.5, "3c" = 3.7)
> testdata <- rep(c("1", "1a", "1b", "1c",
+                   "2", "2a", "2b", "2c",
+                   "3", "3a", "3b", "3c"), times=c(1:6,6:1))
> stage[testdata]
  1  1a  1a  1b  1b  1b  1c  1c  1c  1c   2   2   2   2   2  2a  2a  2a  2a  2a
1.0 1.3 1.3 1.5 1.5 1.5 1.7 1.7 1.7 1.7 2.0 2.0 2.0 2.0 2.0 2.3 2.3 2.3 2.3 2.3
 2a  2b  2b  2b  2b  2b  2b  2c  2c  2c  2c  2c   3   3   3   3  3a  3a  3a  3b
2.3 2.5 2.5 2.5 2.5 2.5 2.5 2.7 2.7 2.7 2.7 2.7 3.0 3.0 3.0 3.0 3.3 3.3 3.3 3.5
 3b  3c
3.5 3.7
```
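A minimal sketch of two practical details of the named-vector lookup, reusing the first two stages of the `stage` table from the post above. The lookup result carries the stage codes as names, which `unname()` removes if a plain numeric vector is wanted for plotting; and a code missing from the table comes back as `NA` rather than as a wrong number, which makes unexpected stages easy to spot. The code `"4a"` is a made-up example of such an unknown stage.

```r
# Lookup table: stage code -> numeric plotting position
stage <- c("1" = 1, "1a" = 1.3, "1b" = 1.5, "1c" = 1.7,
           "2" = 2, "2a" = 2.3, "2b" = 2.5, "2c" = 2.7)
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")

# Index by name, then drop the names to get a plain numeric vector
xn <- unname(stage[xc])
xn
#> [1] 1.0 1.3 1.5 1.7 2.0 2.3 2.5 2.7

# Codes missing from the table ("4a" is made up) come back as NA
obs <- c("2b", "4a")
unknown <- obs[is.na(stage[obs])]
unknown
#> [1] "4a"
```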

## Re: Character (1a, 1b) to numeric

```r
xn <- as.numeric(sub("c",".7",sub("b",".5",sub("a",".3",xc))))
```
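The nested call above needs one more `sub()` per extra letter. A hedged sketch of the same substitution idea driven by a small named vector instead; the helper name `suffix_to_decimal` and the `map` vector are hypothetical, and the sketch assumes each code is a number followed by at most one trailing letter.

```r
# Hypothetical helper: one sub() per suffix letter, as in the nested call,
# but driven by a named vector so new letters need no extra nesting
suffix_to_decimal <- function(x, map) {
  for (s in names(map)) {
    # "$" anchors the letter at the end of the code
    x <- sub(paste0(s, "$"), map[[s]], x)
  }
  as.numeric(x)
}

map <- c(a = ".3", b = ".5", c = ".7")
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
suffix_to_decimal(xc, map)
#> [1] 1.0 1.3 1.5 1.7 2.0 2.3 2.5 2.7
```

Like the nested version, this only suits mappings in which each letter always adds the same decimal offset, whatever the stage number.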

## Re: Character (1a, 1b) to numeric

 In reply to this post by Jean-Louis Abitbol-2

Hello Jean-Louis,

Noting the subject line of your post, I thought the first answer would have been to encode the histology stages as factors and "unclass" them to obtain integers that can then be manipulated mathematically. You can get a lot of work done with the commands listed on the `factor` help page:

```r
?factor
# Simulated data: 36 samples, each with a value and a histology stage
samples <- 1:36
values <- runif(length(samples), min = 1, max = length(samples))
hist <- rep(c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c"), times = 1:8)
data1 <- data.frame("samples" = samples, "values" = values, "hist" = hist)
# Factor with the stages in their natural order; unclass() shows the integer codes
(data1$hist <- factor(data1$hist, levels = c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")))
unclass(data1$hist)

library(RColorBrewer); pal_1 <- brewer.pal(8, "Pastel2")
barplot(data1$values, beside = TRUE, col = pal_1[data1$hist])
plot(data1$hist, data1$values, col = pal_1)
pal_2 <- brewer.pal(8, "Dark2")
plot(unclass(data1$hist)/4, data1$values, pch = 19, col = pal_2[data1$hist])
group <- c(rep(0, 10), rep(1, 26)); data1$group <- group
library(lattice); dotplot(hist ~ values | group, data = data1, xlim = c(0, 36))
```

HTH, Bill.

W. Michels, Ph.D.
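To get from the factor back to the numeric positions asked for in the original question, the integer codes can index a vector of per-level values. A minimal sketch, assuming the levels are listed in the intended order and `pos` holds one plotting position per level (here the values from the original question); `x` is made-up example data.

```r
# Stage codes in their intended order (the factor levels)
lev <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
# One plotting position per level, in the same order as `lev`
pos <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)

x <- c("2b", "1", "1c", "2", "1a")   # made-up example data
f <- factor(x, levels = lev)

as.integer(f)                # integer code = position of each value in `lev`
#> [1] 7 1 4 5 2
xn <- pos[as.integer(f)]     # map each code to its numeric position
xn
#> [1] 2.5 1.0 1.7 2.0 1.3
```

A value that is not one of the listed levels becomes `NA` in the factor, and therefore `NA` in `xn`.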

## Re: Character (1a, 1b) to numeric

 In reply to this post by Eric Berger

The string index approach works with any mapping from stage names to stage numbers, not just regular ones. For example, if we had `"1" -> 1, "1a" -> 1.4, "1b" -> 1.6, "2" -> 2, "2a" -> 2.3, "2b" -> 2.7`, the 'sub' version would fail miserably, while the string index version would just work. The 'sub' version would also not work terribly well if the mapping were `"1" -> 1, "a1" -> 1.3, "b1" -> 1.5, "c1" -> 1.7` and so on.

The thing I like about the indexing approach is that it uses a fundamental operation of the language very directly. Anyone using R would do well to *master* what indexing can do for you.
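A small sketch of the argument above, using the irregular mapping from the example. The string index version returns the intended values, while a fixed letter-to-decimal `sub()` chain (here set up with the stage-1 offsets) cannot give `"2a"` and `"2b"` their different offsets. The letter-first codes illustrate the second case: a suffix-style `sub()` turns `"a1"` into `".31"`.

```r
# Irregular mapping: the same letter means a different offset per stage
stage <- c("1" = 1, "1a" = 1.4, "1b" = 1.6,
           "2" = 2, "2a" = 2.3, "2b" = 2.7)
x <- c("1a", "2a", "1b", "2b", "2")

# String indexing simply looks each code up
by_index <- unname(stage[x])
by_index
#> [1] 1.4 2.3 1.6 2.7 2.0

# A fixed letter -> decimal sub() chain holds only one offset per letter;
# with the stage-1 offsets, the lettered stage-2 codes come out wrong
by_sub <- as.numeric(sub("b", ".6", sub("a", ".4", x)))
by_sub
#> [1] 1.4 2.4 1.6 2.6 2.0

# Letter-first codes: indexing does not care where the letter sits...
stage2 <- c("1" = 1, "a1" = 1.3, "b1" = 1.5, "c1" = 1.7)
unname(stage2[c("b1", "1", "c1")])
#> [1] 1.5 1.0 1.7
# ...but a suffix-style sub() turns "a1" into ".31", i.e. 0.31
as.numeric(sub("a", ".3", "a1"))
#> [1] 0.31
```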

## Re: Character (1a, 1b) to numeric

 In reply to this post by R help mailing list-2

Hello Bill,

Thanks. That has indeed the advantage of keeping the histology classification on the plot instead of some arbitrary numeric scale.

Best wishes,
JL

## Re: Character (1a, 1b) to numeric

 Agreed. I meant to add this line (for unclassed factor levels 1 through 8):

> ((1:8 - 1)*(0.25))+1
[1] 1.00 1.25 1.50 1.75 2.00 2.25 2.50 2.75

Depending on the circumstance, you can also consider using dummy factors or even "NA" as a level; see the "factor" help page for details.

Best, Bill.

W. Michels, Ph.D.

On Sat, Jul 11, 2020 at 12:16 AM Jean-Louis Abitbol <[hidden email]> wrote:
>
> Hello Bill,
>
> Thanks.
>
> That has indeed the advantage of keeping the histology classification on the plot instead of some arbitrary numeric scale.
>
> Best wishes, JL
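Bill's factor idea can be sketched end to end in base R. This is a minimal sketch, assuming the full set of stage labels is known in advance and listed in classification order; the names `stage_levels`, `stage_values`, `custom` and `even` are illustrative, not from the thread. `as.integer()` on a factor returns each observation's level position, which can either index a table of custom values or feed Bill's evenly spaced formula.

```r
# Hypothetical lookup tables: stage labels in classification order,
# and the numeric value wanted for each label
stage_levels <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
stage_values <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)

x <- c("2b", "1", "1c", "2", "1a")      # example observations
f <- factor(x, levels = stage_levels)   # labels not in stage_levels become NA
codes <- as.integer(f)                  # level positions: 7 1 4 5 2

# Custom (uneven) spacing: index the value table by level position
custom <- stage_values[codes]           # 2.5 1.0 1.7 2.0 1.3

# Even spacing, as in Bill's formula: steps of 0.25 per level
even <- (codes - 1) * 0.25 + 1          # 2.50 1.00 1.75 2.00 1.25
```

The even-spacing formula only lands back on whole numbers at "2", "3", ... when every major stage has exactly four labels (the bare number plus a, b, c); the lookup-table version has no such restriction.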

## Re: Character (1a, 1b) to numeric

 In reply to this post by Richard O'Keefe-2

There are many ways to do what is requested, and some are fairly simple and robust. A simple switch statement will do if you write some code, but consider using a function from a package for simple vectors or factors. You could use the recode() or recode_factor() functions in package dplyr, or similar functions elsewhere, and type in the conversions like so:

library("dplyr")
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
sample <- rep(xc, each=3)
recode(sample,
       "1" = 1,
       "1a" = 1.3,
       "1b" = 1.5,
       "1c" = 1.7,
       "2" = 2,
       "2a" = 2.3,
       "2b" = 2.5,
       "2c" = 2.7)

That returns:

[1] 1.0 1.0 1.0 1.3 1.3 1.3 1.5 1.5 1.5 1.7 1.7 1.7 2.0 2.0 2.0 2.3 2.3 2.3 2.5 2.5 2.5 2.7 2.7 2.7

Using the original vectors directly would be a tad harder but doable, perhaps using some indirection. As has been noted, you need to be careful to match each item in full, from beginning to end, because matching a substring can produce odd results. If you add this code to the above, in a silly way, it works for a more general case:

library(glue)
converted <- sample
# Anchor each pattern with ^ and $ so only whole labels are replaced
for (i in seq_along(xc)) {
  converted <- sub(glue("^{xc[i]}$"), xn[i], converted)
}
result <- as.numeric(converted)

Returns:

> result
 [1] 1.0 1.0 1.0 1.3 1.3 1.3 1.5 1.5 1.5 1.7 1.7 1.7 2.0 2.0 2.0 2.3 2.3 2.3 2.5 2.5 2.5 2.7 2.7 2.7

Not necessarily efficient, but it works. You could use something like glue::glue() to build the arguments for recode() in more complex cases, and so on.

I think we have had enough solutions and methods posted, but there are likely many more, as there is rarely only one way to do things in R.
-----Original Message-----
From: R-help <[hidden email]> On Behalf Of Richard O'Keefe
Sent: Saturday, July 11, 2020 3:02 AM
To: Eric Berger <[hidden email]>
Cc: Jean-Louis Abitbol <[hidden email]>; R Project Help <[hidden email]>
Subject: Re: [R] Character (1a, 1b) to numeric

The string index approach works with any mapping from stage names to stage numbers, not just regular ones. For example, if we had
"1" -> 1, "1a" -> 1.4, "1b" -> 1.6
"2" -> 2, "2a" -> 2.3, "2b" -> 2.7
the 'sub' version would fail miserably while the string index version would just work. The 'sub' version would also not work terribly well if the mapping were
"1" -> 1, "a1" -> 1.3, "b1" -> 1.5, "c1" -> 1.7
and so on.

The thing I like about the indexing approach is that it uses a fundamental operation of the language very directly. Anyone using R would do well to *master* what indexing can do for you.
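Avi mentions a switch statement, and Richard O'Keefe's point is that exact-match lookup handles irregular mappings that no letter-to-decimal substitution rule can produce. Below is a minimal base-R sketch of both, assuming Richard's example values ("1a" -> 1.4, "1b" -> 1.6, "2a" -> 2.3, "2b" -> 2.7); the helper name `stage_to_num` is hypothetical. `switch()` handles one label at a time, so it is wrapped in `vapply()`; the trailing unnamed argument is the default returned for labels that are not in the table.

```r
# Richard's irregular example mapping as a named vector
stage <- c("1" = 1, "1a" = 1.4, "1b" = 1.6, "2" = 2, "2a" = 2.3, "2b" = 2.7)

# switch() version (hypothetical helper); NA_real_ is the default
stage_to_num <- function(s) {
  switch(s,
         "1" = 1, "1a" = 1.4, "1b" = 1.6,
         "2" = 2, "2a" = 2.3, "2b" = 2.7,
         NA_real_)
}

x <- c("2b", "1", "1a", "2a", "1b", "3")   # "3" is deliberately unmapped

by_switch <- vapply(x, stage_to_num, numeric(1), USE.NAMES = FALSE)
by_index  <- unname(stage[x])              # string indexing, names dropped

by_switch                      # 2.7 1.0 1.4 2.3 1.6 NA
identical(by_switch, by_index) # TRUE
```

For long vectors the named-vector index is probably the more natural choice, since it is a single vectorized operation; the switch() helper may be handier when a conversion needs per-label logic.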

## Re: Character (1a, 1b) to numeric

 In reply to this post by Richard O'Keefe-2

It might be easier to simply assign names to the numeric vector if you already have numeric and character vectors of the right lengths. Using Heiberger's vectors:

xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
names(xn) <- xc
testdata <- rep(c("1", "1a", "1b", "1c",
                  "2", "2a", "2b", "2c",
                  "3", "3a", "3b", "3c"), times=c(1:6,6:1))
xn[ testdata ]  # NA's when there's no match is a feature.

--
David.

On 7/10/20 7:08 PM, Richard O'Keefe wrote:
> This can be done very simply because vectors in R can have
> named elements, and can be indexed by strings.
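David notes that an NA for an unmatched label is a feature. With his `testdata`, stages "3" to "3c" have no entry in `xn`, so it can be worth listing the unmatched labels before plotting. A minimal base-R sketch reusing the thread's vectors; `z` and `missing_stages` are illustrative names:

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
names(xn) <- xc
testdata <- rep(c("1", "1a", "1b", "1c",
                  "2", "2a", "2b", "2c",
                  "3", "3a", "3b", "3c"), times = c(1:6, 6:1))

z <- unname(xn[testdata])   # numeric stages, NA where a label has no entry

# Labels present in the data but absent from the lookup table
missing_stages <- setdiff(unique(testdata), names(xn))
missing_stages              # "3" "3a" "3b" "3c"
sum(is.na(z))               # 10 observations (4 + 3 + 2 + 1)
```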

## Re: Character (1a, 1b) to numeric

 In reply to this post by Fox, John

On Sat, Jul 11, 2020 at 8:04 AM Fox, John <[hidden email]> wrote:
> We've had several solutions, and I was curious about their relative efficiency. Here's a test

Am I the only person on this mailing list who learnt to program with ASCII...?

In theory, the most ***efficient*** solution is to get the ASCII/UTF-8/etc. values and then use a simple (math) formula. No matching and no searching required...

Here's one possibility:

    xc <-  c ("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
    I <- (nchar (xc) == 2)
    xn <- as.integer (substring (xc, 1, 1) )
    # utf8ToInt() gives 97, 98, 99 for a, b, c, so the offsets are 0.25, 0.5, 0.75
    xn [I] <- xn [I] + (utf8ToInt (paste (substring (xc [I], 2, 2), collapse="") ) - 96) / 4
    xn
    # [1] 1.00 1.25 1.50 1.75 2.00 2.25 2.50 2.75

Unfortunately, this makes R look bad. The corresponding C implementation is simpler and presumably the performance winner.
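The ASCII version above assumes single-digit stage numbers (`substring(xc, 1, 1)`) and produces quarter steps (1.25, 1.5, 1.75) rather than the 0.3/0.5/0.7 offsets in the original question. Below is a minimal vectorized sketch that lifts both restrictions, assuming every label is digits optionally followed by one lowercase letter; the `offset` table and the label "10b" are hypothetical.

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c", "10b")  # "10b" is hypothetical

major  <- as.integer(sub("[a-z]$", "", xc))  # drop an optional trailing letter
suffix <- sub("^[0-9]+", "", xc)             # "" when there is no letter

# Offsets from the question's example; letters not listed give NA
offset <- c(a = 0.3, b = 0.5, c = 0.7)
minor  <- ifelse(suffix == "", 0, offset[suffix])

xn <- major + minor
xn   # 1.0 1.3 1.5 1.7 2.0 2.3 2.5 2.7 10.5
```

Swapping in `offset <- c(a = 0.25, b = 0.5, c = 0.75)` reproduces the quarter steps of the ASCII version while still handling multi-digit stage numbers.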