Character (1a, 1b) to numeric

Jean-Louis Abitbol-2
Dear All

I have a character vector representing histology stages, for example:
xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")

and this goes on to 3, 3a, etc., in various orders for each patient. I do of course have a pre-established classification available, which changes according to the histology criteria under assessment.

For plotting, I would like to convert xc to a numeric vector such as

xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)

Unfortunately I have no clue on how to do that.

Thanks for any help and apologies if I am missing the obvious way to do it.

JL
--
Verif30042020

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Character (1a, 1b) to numeric

Fox, John
Dear Jean-Louis,

There must be many ways to do this. Here's one simple way (with no claim of optimality!):

> xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
>
> set.seed(123) # for reproducibility
> x <- sample(xc, 20, replace=TRUE) # "data"
>
> names(xn) <- xc
> z <- xn[x]
>
> data.frame(z, x)
     z  x
1  2.5 2b
2  2.5 2b
3  1.5 1b
4  2.3 2a
5  1.5 1b
6  1.3 1a
7  1.3 1a
8  2.3 2a
9  1.5 1b
10 2.0  2
11 1.7 1c
12 2.3 2a
13 2.3 2a
14 1.0  1
15 1.3 1a
16 1.5 1b
17 2.7 2c
18 2.0  2
19 1.5 1b
20 1.5 1b

I hope this helps,
 John

  -----------------------------
  John Fox, Professor Emeritus
  McMaster University
  Hamilton, Ontario, Canada
  Web: http://socserv.mcmaster.ca/jfox


Re: [External] Character (1a, 1b) to numeric

Richard M. Heiberger
In reply to this post by Jean-Louis Abitbol-2
> xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
> testdata <- rep(c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c"), times=1:8)
> testdata
 [1] "1"  "1a" "1a" "1b" "1b" "1b" "1c" "1c" "1c" "1c" "2"  "2"  "2"  "2"  "2"
[16] "2a" "2a" "2a" "2a" "2a" "2a" "2b" "2b" "2b" "2b" "2b" "2b" "2b" "2c" "2c"
[31] "2c" "2c" "2c" "2c" "2c" "2c"
> ?match
> xn[match(testdata, xc)]
 [1] 1.0 1.3 1.3 1.5 1.5 1.5 1.7 1.7 1.7 1.7 2.0 2.0 2.0 2.0 2.0 2.3 2.3 2.3 2.3
[20] 2.3 2.3 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.7 2.7 2.7 2.7 2.7 2.7 2.7 2.7
>
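
One thing to watch with the match() lookup above: a code that is not in xc silently becomes NA in the result. A minimal sketch of a guarded version follows. The helper name stage_to_num and the example code "3z" are made up for illustration and do not come from the thread.

```r
# Lookup table from the original post
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)

# Hypothetical helper: convert stage codes, warning about unknown ones
stage_to_num <- function(codes, from = xc, to = xn) {
  idx <- match(codes, from)               # NA where a code is not in `from`
  unknown <- is.na(idx) & !is.na(codes)   # genuine NAs in the data are not flagged
  if (any(unknown)) {
    warning("unknown stage code(s): ",
            paste(unique(codes[unknown]), collapse = ", "))
  }
  to[idx]
}

res <- stage_to_num(c("2", "1c", "2b"))
# "3z" is a made-up code that is absent from the table, so it maps to NA
res_bad <- suppressWarnings(stage_to_num(c("1a", "3z")))
```

Whether an unknown code should warn, stop, or pass through as NA depends on how the classification is maintained; the warning here is only one possible choice.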


Re: Character (1a, 1b) to numeric

Jeff Newmiller
In reply to this post by Jean-Louis Abitbol-2
Obvious is in the eye of the beholder. Presuming your letters don't go beyond "i":

a) Lookup table:

tbl <- read.table( text=
"OldCode  NewCode
1         1
1a        1.1
1b        1.2
1c        1.3
2         2
2a        2.1
2b        2.2
", as.is=TRUE, header=TRUE )

tblv <- setNames( tbl$NewCode, tbl$OldCode )
test <- c( "2", "1c", "2b" )
as.vector( tblv[ test ] )

b) String manipulation:

n <- as.integer( sub( "[a-i]$", "", test ) )
d <- match( sub( "^\\d+", "", test ), letters[1:9] )
d[ is.na( d ) ] <- 0
n + d / 10
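
Note that the string manipulation in (b) maps a, b, c to .1, .2, .3 rather than to the .3, .5, .7 of the original question. A sketch of one way to keep the same parsing while using explicit offsets follows. The offsets vector is an assumption taken from the example values in the original post and would have to change with the classification in use.

```r
test <- c("2", "1c", "2b", "1a")

# Offsets for suffixes a, b, c, copied from the original post's example (assumed)
offsets <- c(a = 0.3, b = 0.5, c = 0.7)

n <- as.integer(sub("[a-z]$", "", test))   # leading number
s <- sub("^\\d+", "", test)                # trailing letter, "" if none
d <- unname(offsets[s])                    # NA for "" and for unknown letters
d[s == ""] <- 0                            # bare stage: no offset
res <- n + d                               # unknown letters stay NA
```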


Re: Character (1a, 1b) to numeric

David Carlson
In reply to this post by Fox, John
Here is a different approach:

xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- as.numeric(gsub("a", ".3", gsub("b", ".5", gsub("c", ".7", xc))))
xn
# [1] 1.0 1.3 1.5 1.7 2.0 2.3 2.5 2.7
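
The nested gsub() calls gain one level per letter. A sketch of the same substitution driven by a named vector, so that letters can be added without deeper nesting, might look like this. The name repl is illustrative, and the approach still assumes each code carries at most one suffix letter.

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")

# Suffix letter -> decimal part (values from the original post)
repl <- c(a = ".3", b = ".5", c = ".7")

out <- xc
for (ltr in names(repl)) {
  # fixed = TRUE treats the letter literally rather than as a regex
  out <- sub(ltr, repl[[ltr]], out, fixed = TRUE)
}
xn <- as.numeric(out)
xn
```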

David L Carlson
Professor Emeritus of Anthropology
Texas A&M University


Re: Character (1a, 1b) to numeric

Bert Gunter-2
... and continuing with this cute little thread...

I found the OP's specification a little imprecise -- are your values always
a string that begins with "some sort" of numeric value followed by "some
sort" of alpha code? That is, could the numeric value be several digits and
the alpha code several letters? Probably not, and the existing solutions
you have been provided are almost certainly all you need. But for fun,
assuming this more general specification, here is a general way to split
your alphanumeric codes up into numeric and alpha parts and then convert by
using a couple of sub() calls.

> set.seed(131)
> xc <- sample(c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c"), 15, replace = TRUE)
> nums <- sub("[[:alpha:]]+","",xc)  ## extract numeric part
> alph <- sub("\\d+","",xc)   ## extract alpha part
> codes <- letters[1:3] ## whatever alpha codes are used
> vals <- setNames(c(.3,.5,.7), codes) ## whatever numeric values to convert codes to
> xnew <- as.numeric(nums) + ifelse(alph == "",0, vals[alph])
> data.frame (xc = xc, xnew = xnew)
   xc xnew
1  1a  1.3
2   2  2.0
3  1c  1.7
4  1c  1.7
5  1b  1.5
6  1a  1.3
7   2  2.0
8   2  2.0
9  1a  1.3
10 1a  1.3
11 2c  2.7
12 1b  1.5
13 1b  1.5
14  1  1.0
15 1c  1.7

Echoing others, no claim for optimality in any sense.

Cheers,
Bert



Re: Character (1a, 1b) to numeric

Fox, John
In reply to this post by David Carlson
Hi,

We've had several solutions, and I was curious about their relative efficiency. Here's a test with a moderately large data vector:

> library("microbenchmark")
> set.seed(123) # for reproducibility
> x <- sample(xc, 1e4, replace=TRUE) # "data"
> microbenchmark(John = John <- xn[x],
+                Rich = Rich <- xn[match(x, xc)],
+                Jeff = Jeff <- {
+                 n <- as.integer( sub( "[a-i]$", "", x ) )
+                 d <- match( sub( "^\\d+", "", x ), letters[1:9] )
+                 d[ is.na( d ) ] <- 0
+                 n + d / 10
+                 },
+                David = David <- as.numeric(gsub("a", ".3",
+                                      gsub("b", ".5",
+                                           gsub("c", ".7", x)))),
+                times=1000L
+                )
Unit: microseconds
  expr       min        lq       mean     median         uq       max neval cld
  John   228.816   345.371   513.5614   503.5965   533.0635  10829.08  1000 a  
  Rich   217.395   343.035   534.2074   489.0075   518.3260  15388.96  1000 a  
  Jeff 10325.471 13070.737 15387.2545 15397.9790 17204.0115 153486.94  1000  b
 David 14256.673 18148.492 20185.7156 20170.3635 22067.6690  34998.95  1000   c
> all.equal(John, Rich)
[1] TRUE
> all.equal(John, David)
[1] "names for target but not for current"
> all.equal(John, Jeff)
[1] "names for target but not for current" "Mean relative difference: 0.1498243"

Of course, efficiency isn't the only consideration, and aesthetically (and no doubt subjectively) I prefer Rich Heiberger's solution. OTOH, Jeff's solution is more general in that it generates the correspondence between letters and numbers. The argument for Jeff's solution would, however, be stronger if it gave the desired answer.
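
The "names for target but not for current" messages arise because indexing the named vector xn by x carries the names along, while the gsub()-based results are unnamed. A sketch of comparing values only follows, using a small hand-written sample rather than the benchmark data. The object names by_name and by_gsub are illustrative.

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
names(xn) <- xc

x <- c("2b", "1", "1c", "2a")

by_name <- xn[x]   # result keeps the names of xn
by_gsub <- as.numeric(gsub("a", ".3", gsub("b", ".5", gsub("c", ".7", x))))

all.equal(by_name, by_gsub)           # reports the names mismatch
all.equal(unname(by_name), by_gsub)   # compares the values only
```

Dropping the names with unname() separates attribute differences from genuine value differences, such as the mean relative difference reported for Jeff's result.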

Best,
 John


Re: Character (1a, 1b) to numeric

Jean-Louis Abitbol-2
Many thanks to all. This help-list is wonderful.

I have used Rich Heiberger's solution using match() and found something to learn in each answer.

Off topic, I also very much enjoyed his 2008 paper on the graphical presentation of safety data.

Best wishes.



Re: Character (1a, 1b) to numeric

Fox, John
In reply to this post by Bert Gunter-2
Dear Bert,

Wouldn't you know it, but your contribution arrived just after I pressed "send" on my last message? So here's how your solution compares:

> microbenchmark(John = John <- xn[x],
+                Rich = Rich <- xn[match(x, xc)],
+                Jeff = Jeff <- {
+                   n <- as.integer( sub( "[a-i]$", "", x ) )
+                   d <- match( sub( "^\\d+", "", x ), letters[1:9] )
+                   d[ is.na( d ) ] <- 0
+                   n + d / 10
+                },
+                David = David <- as.numeric(gsub("a", ".3",
+                                      gsub("b", ".5",
+                                           gsub("c", ".7", x)))),
+                Bert = Bert <- {
+                   nums <- sub("[[:alpha:]]+","",x)  
+                   alph <- sub("\\d+","",x)  
+                   as.numeric(nums) + ifelse(alph == "",0, vals[alph])
+                },
+                times=1000L
+                )
Unit: microseconds
  expr       min         lq       mean    median         uq       max neval  cld
  John   261.739   373.9765   599.9411   536.571   569.3750  14489.48  1000 a  
  Rich   250.697   372.4450   542.3208   520.383   554.7215  10682.73  1000 a  
  Jeff 10879.223 13477.7665 15647.7856 15549.255 17516.7420 146155.28  1000  b  
 David 14337.510 18375.0100 20325.8796 20187.174 22161.0195  32575.31  1000    d
  Bert 12344.506 15753.2510 18024.2757 17702.838 19973.0465  32043.80  1000   c
> all.equal(John, Rich)
[1] TRUE
> all.equal(John, David)
[1] "names for target but not for current"
> all.equal(John, Jeff)
[1] "names for target but not for current" "Mean relative difference: 0.1498243"
> all.equal(John, Bert)
[1] "names for target but not for current"

To make the comparison fair, I moved the parts of the solutions that don't depend on the length of the data outside the benchmark. Your solution does have the virtue of providing the right answer.

Best,
 John

> On Jul 10, 2020, at 3:54 PM, Bert Gunter <[hidden email]> wrote:
>
> ... and continuing with this cute little thread...
>
> I found the OP's specification a little imprecise -- are your values always a string that begins with *some sort" of numeric value followed by "some sort" of alpha code? That is, could the numeric value be several digits and the alpha code several letters? Probably not, and the existing solutions you have been provided are almost certainly all you need. But for fun, assuming this more general specification, here is a general way to split your alphanumeric codes up into numeric and alpha parts and then convert by using a couple of sub() 's.
>
> > set.seed(131)
> > xc <- sample(c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c"), 15, replace = TRUE)
> > nums <- sub("[[:alpha:]]+","",xc)  ## extract numeric part
> > alph <- sub("\\d+","",xc)   ## extract alpha part
> > codes <- letters[1:3] ## whatever alpha codes are used
> > vals <- setNames(c(.3,.5,.7), codes) ## whatever numeric values to convert codes to
> > xnew <- as.numeric(nums) + ifelse(alph == "",0, vals[alph])
> > data.frame (xc = xc, xnew = xnew)
>    xc xnew
> 1  1a  1.3
> 2   2  2.0
> 3  1c  1.7
> 4  1c  1.7
> 5  1b  1.5
> 6  1a  1.3
> 7   2  2.0
> 8   2  2.0
> 9  1a  1.3
> 10 1a  1.3
> 11 2c  2.7
> 12 1b  1.5
> 13 1b  1.5
> 14  1  1.0
> 15 1c  1.7
>
> Echoing others, no claim for optimality in any sense.
>
> Cheers,
> Bert
>
>


Re: Character (1a, 1b) to numeric

Bert Gunter-2
Thanks! As I said, cute exercise.

Best,
Bert




On Fri, Jul 10, 2020 at 1:21 PM Fox, John <[hidden email]> wrote:

> Dear Bert,
>
> Wouldn't you know it, but your contribution arrived just after I pressed
> "send" on my last message? So here's how your solution compares:
>
> > microbenchmark(John = John <- xn[x],
> +                Rich = Rich <- xn[match(x, xc)],
> +                Jeff = Jeff <- {
> +                   n <- as.integer( sub( "[a-i]$", "", x ) )
> +                   d <- match( sub( "^\\d+", "", x ), letters[1:9] )
> +                   d[ is.na( d ) ] <- 0
> +                   n + d / 10
> +                },
> +                David = David <- as.numeric(gsub("a", ".3",
> +                                      gsub("b", ".5",
> +                                           gsub("c", ".7", x)))),
> +                Bert = Bert <- {
> +                   nums <- sub("[[:alpha:]]+","",x)
> +                   alph <- sub("\\d+","",x)
> +                   as.numeric(nums) + ifelse(alph == "",0, vals[alph])
> +                },
> +                times=1000L
> +                )
> Unit: microseconds
>   expr       min         lq       mean    median         uq       max neval  cld
>   John   261.739   373.9765   599.9411   536.571   569.3750  14489.48  1000 a
>   Rich   250.697   372.4450   542.3208   520.383   554.7215  10682.73  1000 a
>   Jeff 10879.223 13477.7665 15647.7856 15549.255 17516.7420 146155.28  1000  b
>  David 14337.510 18375.0100 20325.8796 20187.174 22161.0195  32575.31  1000    d
>   Bert 12344.506 15753.2510 18024.2757 17702.838 19973.0465  32043.80  1000   c
> > all.equal(John, Rich)
> [1] TRUE
> > all.equal(John, David)
> [1] "names for target but not for current"
> > all.equal(John, Jeff)
> [1] "names for target but not for current" "Mean relative difference: 0.1498243"
> > all.equal(John, Bert)
> [1] "names for target but not for current"
>
> To make the comparison fair, I moved the parts of the solutions that don't
> depend on the length of the data outside the benchmark. Your solution does
> have the virtue of providing the right answer.
>
> Best,
>  John
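
One possible reading of why the Jeff column disagrees with the others, inferred from the benchmark code above rather than stated in the thread: match() against letters[1:9] turns a, b, c into 1, 2, 3, so the suffixes contribute .1, .2, .3 instead of the requested .3, .5, .7. The sketch below assumes the xc and xn from the original question; `offsets` is a name introduced here purely for illustration.

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)

# Same steps as the Jeff entry in the benchmark
n <- as.integer(sub("[a-i]$", "", xc))            # leading number
d <- match(sub("^\\d+", "", xc), letters[1:9])    # a = 1, b = 2, c = 3, none = NA
d[is.na(d)] <- 0
jeff <- n + d / 10                                # suffixes add .1, .2, .3

# Hypothetical adjustment: look the offset up in a table instead of using d / 10
offsets <- c(0, 0.3, 0.5, 0.7)                    # no suffix, a, b, c
fixed <- n + offsets[d + 1]

print(rbind(jeff = jeff, fixed = fixed, wanted = xn))
```

With the lookup table in place the generated values agree with xn, at the cost of having to state the offsets explicitly.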
>
> > On Jul 10, 2020, at 3:54 PM, Bert Gunter <[hidden email]> wrote:
> >
> > ... and continuing with this cute little thread...
> >
> > I found the OP's specification a little imprecise -- are your values
> > always a string that begins with "some sort" of numeric value followed by
> > "some sort" of alpha code? That is, could the numeric value be several
> > digits and the alpha code several letters? Probably not, and the existing
> > solutions you have been provided are almost certainly all you need. But for
> > fun, assuming this more general specification, here is a general way to
> > split your alphanumeric codes up into numeric and alpha parts and then
> > convert by using a couple of sub()'s.
> >
> > > set.seed(131)
> > > xc <- sample(c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c"), 15, replace = TRUE)
> > > nums <- sub("[[:alpha:]]+","",xc)  ## extract numeric part
> > > alph <- sub("\\d+","",xc)   ## extract alpha part
> > > codes <- letters[1:3] ## whatever alpha codes are used
> > > vals <- setNames(c(.3,.5,.7), codes) ## whatever numeric values to convert codes to
> > > xnew <- as.numeric(nums) + ifelse(alph == "",0, vals[alph])
> > > data.frame (xc = xc, xnew = xnew)
> >    xc xnew
> > 1  1a  1.3
> > 2   2  2.0
> > 3  1c  1.7
> > 4  1c  1.7
> > 5  1b  1.5
> > 6  1a  1.3
> > 7   2  2.0
> > 8   2  2.0
> > 9  1a  1.3
> > 10 1a  1.3
> > 11 2c  2.7
> > 12 1b  1.5
> > 13 1b  1.5
> > 14  1  1.0
> > 15 1c  1.7
> >
> > Echoing others, no claim for optimality in any sense.
> >
> > Cheers,
> > Bert
> >
> >


Re: Character (1a, 1b) to numeric

Richard O'Keefe-2
In reply to this post by Jean-Louis Abitbol-2
This can be done very simply because vectors in R can have
named elements, and can be indexed by strings.

> stage <- c("1" = 1, "1a" = 1.3, "1b" = 1.5, "1c" = 1.7,
+            "2" = 2, "2a" = 2.3, "2b" = 2.5, "2c" = 2.7,
+            "3" = 3, "3a" = 3.3, "3b" = 3.5, "3c" = 3.7)

> testdata <- rep(c("1", "1a", "1b", "1c",
+                   "2", "2a", "2b", "2c",
+                   "3", "3a", "3b", "3c"), times=c(1:6,6:1))

> stage[testdata]
  1  1a  1a  1b  1b  1b  1c  1c  1c  1c   2   2   2   2   2  2a  2a  2a  2a  2a
1.0 1.3 1.3 1.5 1.5 1.5 1.7 1.7 1.7 1.7 2.0 2.0 2.0 2.0 2.0 2.3 2.3 2.3 2.3 2.3
 2a  2b  2b  2b  2b  2b  2b  2c  2c  2c  2c  2c   3   3   3   3  3a  3a  3a  3b
2.3 2.5 2.5 2.5 2.5 2.5 2.5 2.7 2.7 2.7 2.7 2.7 3.0 3.0 3.0 3.0 3.3 3.3 3.3 3.5
 3b  3c
3.5 3.7
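
A small sketch of the same lookup applied to data that contains a code missing from the table. The vector `obs` and the code "4x" are made up for illustration. Indexing a named vector by a name it does not have returns NA, which makes unmapped stages easy to spot, and unname() drops the labels if only the numbers are wanted.

```r
stage <- c("1" = 1, "1a" = 1.3, "1b" = 1.5, "1c" = 1.7,
           "2" = 2, "2a" = 2.3, "2b" = 2.5, "2c" = 2.7)

obs <- c("2b", "1", "4x", "1c")       # hypothetical data; "4x" is not in the table

num <- unname(stage[obs])             # numeric values without the name labels
unmapped <- unique(obs[is.na(num)])   # codes that had no entry in 'stage'

print(num)
print(unmapped)
```

Checking `unmapped` before plotting is one way to catch typos in the stage codes early.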



Re: Character (1a, 1b) to numeric

Eric Berger
xn <- as.numeric(sub("c",".7",sub("b",".5",sub("a",".3",xc))))
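
The one-liner assumes xc as defined in the original question, and that each code carries at most one of the suffix letters a, b, c. As a variation (not the form posted above), the same substitution can be written as a loop over a replacement table, which may be easier to extend; `repl` and `txt` are names introduced here for illustration.

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")

# Suffix letter -> decimal part, kept as text so it can be pasted into the code
repl <- c(a = ".3", b = ".5", c = ".7")

txt <- xc
for (s in names(repl)) {
  txt <- sub(s, repl[[s]], txt, fixed = TRUE)   # replace the first occurrence only
}
xn <- as.numeric(txt)

print(xn)
```

Codes with other letters would make as.numeric() return NA with a warning, so the table has to cover every suffix in the data.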




Re: Character (1a, 1b) to numeric

R help mailing list-2
In reply to this post by Jean-Louis Abitbol-2
Hello Jean-Louis,

Noting the subject line of your post I thought the first answer would
have been encoding histology stages as factors, and "unclass-ing" them
to obtain integers that then can be mathematically manipulated. You
can get a lot of work done with all the commands listed on the
"factor" help page:

?factor
samples <- 1:36
values <- runif(length(samples), min=1, max=length(samples))
hist <- rep(c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c"), times=1:8)
data1 <- data.frame("samples" = samples, "values" = values, "hist" = hist )
(data1$hist <- factor(data1$hist,
                      levels=c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")) )
unclass(data1$hist)  # integer codes 1..8, in the order given by 'levels'

library(RColorBrewer); pal_1 <- brewer.pal(8, "Pastel2")
barplot(data1$values, beside=TRUE, col=pal_1[data1$hist])
plot(data1$hist, data1$values, col=pal_1)
pal_2 <- brewer.pal(8, "Dark2")
plot(unclass(data1$hist)/4, data1$values, pch=19, col=pal_2[data1$hist] )
group <- c(rep(0,10),rep(1,26)); data1$group <- group
library(lattice); dotplot(hist ~ values | group, data=data1, xlim=c(0,36) )
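
The factor route can also be combined with the numeric positions from the original question. The sketch below is only one way to do it: `lev`, `pos` and the four observations in `h` are made up for illustration, and the plot is drawn only in an interactive session.

```r
lev <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
pos <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)   # plotting position for each level

h <- factor(c("2b", "1", "1c", "2"), levels = lev)   # hypothetical observations

# The integer code of each observation picks its position out of 'pos'
x <- pos[as.integer(h)]
print(x)

if (interactive()) {
  # Numeric positions on the axis, but labelled with the stage names
  plot(x, seq_along(x), xaxt = "n", xlab = "histology stage", ylab = "index")
  axis(1, at = pos, labels = lev)
}
```

This keeps the stage names on the axis while still spacing the points by the chosen numeric scale.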

HTH, Bill.

W. Michels, Ph.D.




On Fri, Jul 10, 2020 at 1:41 PM Jean-Louis Abitbol <[hidden email]> wrote:

>
> Many thanks to all. This help-list is wonderful.
>
> I have used Rich Heiberger's solution using match and found something to learn in each answer.
>
> Off topic, I also enjoyed very much his 2008 paper on the graphical presentation of safety data....
>
> Best wishes.
>
>
> On Fri, Jul 10, 2020, at 10:02 PM, Fox, John wrote:
> > Hi,
> >
> > We've had several solutions, and I was curious about their relative
> > efficiency. Here's a test with a moderately large data vector:
> >
> > > library("microbenchmark")
> > > set.seed(123) # for reproducibility
> > > x <- sample(xc, 1e4, replace=TRUE) # "data"
> > > microbenchmark(John = John <- xn[x],
> > +                Rich = Rich <- xn[match(x, xc)],
> > +                Jeff = Jeff <- {
> > +                 n <- as.integer( sub( "[a-i]$", "", x ) )
> > +                 d <- match( sub( "^\\d+", "", x ), letters[1:9] )
> > +                 d[ is.na( d ) ] <- 0
> > +                 n + d / 10
> > +                 },
> > +                David = David <- as.numeric(gsub("a", ".3",
> > +                                      gsub("b", ".5",
> > +                                           gsub("c", ".7", x)))),
> > +                times=1000L
> > +                )
> > Unit: microseconds
> >   expr       min        lq       mean     median         uq       max neval cld
> >   John   228.816   345.371   513.5614   503.5965   533.0635  10829.08  1000 a
> >   Rich   217.395   343.035   534.2074   489.0075   518.3260  15388.96  1000 a
> >   Jeff 10325.471 13070.737 15387.2545 15397.9790 17204.0115 153486.94  1000  b
> >  David 14256.673 18148.492 20185.7156 20170.3635 22067.6690  34998.95  1000   c
> > > all.equal(John, Rich)
> > [1] TRUE
> > > all.equal(John, David)
> > [1] "names for target but not for current"
> > > all.equal(John, Jeff)
> > [1] "names for target but not for current" "Mean relative difference: 0.1498243"
> >
> > Of course, efficiency isn't the only consideration, and aesthetically
> > (and no doubt subjectively) I prefer Rich Heiberger's solution. OTOH,
> > Jeff's solution is more general in that it generates the correspondence
> > between letters and numbers. The argument for Jeff's solution would,
> > however, be stronger if it gave the desired answer.
> >
> > Best,
> >  John
> >


Re: Character (1a, 1b) to numeric

Richard O'Keefe-2
In reply to this post by Eric Berger
The string index approach works with any mapping from stage names
to stage numbers, not just regular ones.  For example, if we had
"1" -> 1, "1a" -> 1.4, "1b" -> 1.6
"2" -> 2, "2a" -> 2.3, "2b" -> 2.7
the 'sub' version would fail miserably while the string index
version would just work.  The 'sub' version would also not work
terribly well if the mapping were
"1" -> 1, "a1" -> 1.3, "b1" -> 1.5, "c1" -> 1.7
and so on. The thing I like about the indexing approach is that
it uses a fundamental operation of the language very directly.

Anyone using R would do well to *master* what indexing can do
for you.
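
To make the first case concrete, here is a minimal sketch using the irregular mapping given above. The data vector `obs` is made up, and the 'sub' line is the a/b substitution from the earlier post applied unchanged.

```r
# Irregular mapping: the offset for "a" and "b" differs between stage 1 and stage 2
stage <- c("1" = 1, "1a" = 1.4, "1b" = 1.6,
           "2" = 2, "2a" = 2.3, "2b" = 2.7)

obs <- c("2a", "1b", "1", "2b", "1a")   # hypothetical data

by_index <- unname(stage[obs])                              # string indexing
by_sub   <- as.numeric(sub("b", ".5", sub("a", ".3", obs))) # fixed a/b offsets

print(data.frame(obs, by_index, by_sub))
```

The two columns agree only where the fixed offsets happen to match the table ("2a" and the codes without a suffix).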




Re: Character (1a, 1b) to numeric

Jean-Louis Abitbol-2
In reply to this post by R help mailing list-2
Hello Bill,

Thanks.

That indeed has the advantage of keeping the histology classification on the plot instead of some arbitrary numeric scale.

Best wishes, JL

On Sat, Jul 11, 2020, at 8:25 AM, William Michels wrote:

> Hello Jean-Louis,
>
> Noting the subject line of your post I thought the first answer would
> have been encoding histology stages as factors, and "unclass-ing" them
> to obtain integers that then can be mathematically manipulated. You
> can get a lot of work done with all the commands listed on the
> "factor" help page:
>
> ?factor
> samples <- 1:36
> values <- runif(length(samples), min=1, max=length(samples))
> hist <- rep(c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c"), times=1:8)
> data1 <- data.frame("samples" = samples, "values" = values, "hist" = hist )
> (data1$hist <- factor(data1$hist, levels=c("1", "1a", "1b", "1c", "2",
> "2a", "2b", "2c")) )
> unclass(data1$hist)
>
> library(RColorBrewer); pal_1 <- brewer.pal(8, "Pastel2")
> barplot(data1$value, beside=T, col=pal_1[data1$hist])
> plot(data1$hist, data1$value, col=pal_1)
> pal_2 <- brewer.pal(8, "Dark2")
> plot(unclass(data1$hist)/4, data1$value, pch=19, col=pal_2[data1$hist] )
> group <- c(rep(0,10),rep(1,26)); data1$group <- group
> library(lattice); dotplot(hist ~ values | group, data=data1, xlim=c(0,36) )
>
> HTH, Bill.
>
> W. Michels, Ph.D.
>
>
>
>
> On Fri, Jul 10, 2020 at 1:41 PM Jean-Louis Abitbol <[hidden email]> wrote:
> >
> > Many thanks to all. This help-list is wonderful.
> >
> > I have used Rich Heiberger solution using match and found something to learn in each answer.
> >
> > off topic, I also enjoyed very much his 2008 paper on the graphical presentation of safety data....
> >
> > Best wishes.
> >
> >
> > On Fri, Jul 10, 2020, at 10:02 PM, Fox, John wrote:
> > > Hi,
> > >
> > > We've had several solutions, and I was curious about their relative
> > > efficiency. Here's a test with a moderately large data vector:
> > >
> > > > library("microbenchmark")
> > > > set.seed(123) # for reproducibility
> > > > x <- sample(xc, 1e4, replace=TRUE) # "data"
> > > > microbenchmark(John = John <- xn[x],
> > > +                Rich = Rich <- xn[match(x, xc)],
> > > +                Jeff = Jeff <- {
> > > +                 n <- as.integer( sub( "[a-i]$", "", x ) )
> > > +                 d <- match( sub( "^\\d+", "", x ), letters[1:9] )
> > > +                 d[ is.na( d ) ] <- 0
> > > +                 n + d / 10
> > > +                 },
> > > +                David = David <- as.numeric(gsub("a", ".3",
> > > +                                      gsub("b", ".5",
> > > +                                           gsub("c", ".7", x)))),
> > > +                times=1000L
> > > +                )
> > > Unit: microseconds
> > >   expr       min        lq       mean     median         uq       max neval cld
> > >   John   228.816   345.371   513.5614   503.5965   533.0635  10829.08  1000 a
> > >   Rich   217.395   343.035   534.2074   489.0075   518.3260  15388.96  1000 a
> > >   Jeff 10325.471 13070.737 15387.2545 15397.9790 17204.0115 153486.94  1000  b
> > >  David 14256.673 18148.492 20185.7156 20170.3635 22067.6690  34998.95  1000   c
> > > > all.equal(John, Rich)
> > > [1] TRUE
> > > > all.equal(John, David)
> > > [1] "names for target but not for current"
> > > > all.equal(John, Jeff)
> > > [1] "names for target but not for current" "Mean relative difference:
> > > 0.1498243"
> > >
> > > Of course, efficiency isn't the only consideration, and aesthetically
> > > (and no doubt subjectively) I prefer Rich Heiberger's solution. OTOH,
> > > Jeff's solution is more general in that it generates the correspondence
> > > between letters and numbers. The argument for Jeff's solution would,
> > > however, be stronger if it gave the desired answer.
> > >
> > > Best,
> > >  John
> > >
> > > > On Jul 10, 2020, at 3:28 PM, David Carlson <[hidden email]> wrote:
> > > >
> > > > Here is a different approach:
> > > >
> > > > xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> > > > xn <- as.numeric(gsub("a", ".3", gsub("b", ".5", gsub("c", ".7", xc))))
> > > > xn
> > > > # [1] 1.0 1.3 1.5 1.7 2.0 2.3 2.5 2.7
> > > >
> > > > David L Carlson
> > > > Professor Emeritus of Anthropology
> > > > Texas A&M University
> > > >

--
Verif30042020


Re: Character (1a, 1b) to numeric

R help mailing list-2
Agreed, I meant to add this line (for unclassed factor levels 1-through-8):

> ((1:8 - 1)*(0.25))+1
[1] 1.00 1.25 1.50 1.75 2.00 2.25 2.50 2.75

Depending on the circumstance, you can also consider using dummy
factors or even "NA" as a level; see the "factor" help page for
details.
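
The factor route can be sketched in a few lines. This is only an illustration: the observations in `obs` are made up, and the quarter-step formula is the one shown above, so the result is 1.25, 1.5, 1.75 rather than the 1.3, 1.5, 1.7 originally requested.

```r
# Stage labels in their intended order (from the original post)
stages <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")

# Made-up observations, for illustration only
obs <- c("2b", "1", "1c", "2a", "1a")

# Encode as a factor with explicit levels, then take the integer codes
f <- factor(obs, levels = stages)
codes <- as.integer(f)        # same integers that unclass(f) carries

# Map codes 1..8 onto 1.00, 1.25, ..., 2.75
xq <- (codes - 1) * 0.25 + 1
xq
```

Keeping the factor around has the side benefit that plotting functions can still label the axis with the original stage names.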

Best, Bill.

W. Michels, Ph.D.



On Sat, Jul 11, 2020 at 12:16 AM Jean-Louis Abitbol <[hidden email]> wrote:

>
> Hello Bill,
>
> Thanks.
>
> That has indeed the advantage of keeping the histology classification on the  plot instead of some arbitrary numeric scale.
>
> Best wishes, JL
>
> On Sat, Jul 11, 2020, at 8:25 AM, William Michels wrote:
> > Hello Jean-Louis,
> >
> > Noting the subject line of your post I thought the first answer would
> > have been encoding histology stages as factors, and "unclass-ing" them
> > to obtain integers that then can be mathematically manipulated. You
> > can get a lot of work done with all the commands listed on the
> > "factor" help page:
> >
> > ?factor
> > samples <- 1:36
> > values <- runif(length(samples), min=1, max=length(samples))
> > hist <- rep(c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c"), times=1:8)
> > data1 <- data.frame("samples" = samples, "values" = values, "hist" = hist )
> > (data1$hist <- factor(data1$hist, levels=c("1", "1a", "1b", "1c", "2",
> > "2a", "2b", "2c")) )
> > unclass(data1$hist)
> >
> > library(RColorBrewer); pal_1 <- brewer.pal(8, "Pastel2")
> > barplot(data1$values, beside=TRUE, col=pal_1[data1$hist])
> > plot(data1$hist, data1$values, col=pal_1)
> > pal_2 <- brewer.pal(8, "Dark2")
> > plot(unclass(data1$hist)/4, data1$values, pch=19, col=pal_2[data1$hist] )
> > group <- c(rep(0,10),rep(1,26)); data1$group <- group
> > library(lattice); dotplot(hist ~ values | group, data=data1, xlim=c(0,36) )
> >
> > HTH, Bill.
> >
> > W. Michels, Ph.D.
> >
> >
> >
> >
> > On Fri, Jul 10, 2020 at 1:41 PM Jean-Louis Abitbol <[hidden email]> wrote:
> > >
> > > Many thanks to all. This help-list is wonderful.
> > >
> > > I have used Rich Heiberger solution using match and found something to learn in each answer.
> > >
> > > off topic, I also enjoyed very much his 2008 paper on the graphical presentation of safety data....
> > >
> > > Best wishes.
> > >
> > >
> > > On Fri, Jul 10, 2020, at 10:02 PM, Fox, John wrote:
> > > > Hi,
> > > >
> > > > We've had several solutions, and I was curious about their relative
> > > > efficiency. Here's a test with a moderately large data vector:
> > > >
> > > > > library("microbenchmark")
> > > > > set.seed(123) # for reproducibility
> > > > > x <- sample(xc, 1e4, replace=TRUE) # "data"
> > > > > microbenchmark(John = John <- xn[x],
> > > > +                Rich = Rich <- xn[match(x, xc)],
> > > > +                Jeff = Jeff <- {
> > > > +                 n <- as.integer( sub( "[a-i]$", "", x ) )
> > > > +                 d <- match( sub( "^\\d+", "", x ), letters[1:9] )
> > > > +                 d[ is.na( d ) ] <- 0
> > > > +                 n + d / 10
> > > > +                 },
> > > > +                David = David <- as.numeric(gsub("a", ".3",
> > > > +                                      gsub("b", ".5",
> > > > +                                           gsub("c", ".7", x)))),
> > > > +                times=1000L
> > > > +                )
> > > > Unit: microseconds
> > > >   expr       min        lq       mean     median         uq       max neval cld
> > > >   John   228.816   345.371   513.5614   503.5965   533.0635  10829.08  1000 a
> > > >   Rich   217.395   343.035   534.2074   489.0075   518.3260  15388.96  1000 a
> > > >   Jeff 10325.471 13070.737 15387.2545 15397.9790 17204.0115 153486.94  1000  b
> > > >  David 14256.673 18148.492 20185.7156 20170.3635 22067.6690  34998.95  1000   c
> > > > > all.equal(John, Rich)
> > > > [1] TRUE
> > > > > all.equal(John, David)
> > > > [1] "names for target but not for current"
> > > > > all.equal(John, Jeff)
> > > > [1] "names for target but not for current" "Mean relative difference: 0.1498243"
> > > >
> > > > Of course, efficiency isn't the only consideration, and aesthetically
> > > > (and no doubt subjectively) I prefer Rich Heiberger's solution. OTOH,
> > > > Jeff's solution is more general in that it generates the correspondence
> > > > between letters and numbers. The argument for Jeff's solution would,
> > > > however, be stronger if it gave the desired answer.
> > > >
> > > > Best,
> > > >  John
> > > >
> > > > > On Jul 10, 2020, at 3:28 PM, David Carlson <[hidden email]> wrote:
> > > > >
> > > > > Here is a different approach:
> > > > >
> > > > > xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> > > > > xn <- as.numeric(gsub("a", ".3", gsub("b", ".5", gsub("c", ".7", xc))))
> > > > > xn
> > > > > # [1] 1.0 1.3 1.5 1.7 2.0 2.3 2.5 2.7
> > > > >
> > > > > David L Carlson
> > > > > Professor Emeritus of Anthropology
> > > > > Texas A&M University
> > > > >
> --
> Verif30042020


Re: Character (1a, 1b) to numeric

R help mailing list-2
In reply to this post by Richard O'Keefe-2
There are many ways to do what is requested, and some are fairly simple and
robust. A simple switch() statement will do if you write your own code
(switch() handles one value at a time, so it needs a loop or sapply()), but
consider using a function from a package for simple vectors or factors.

You could use the recode() or recode_factor() functions in package dplyr or
other similar functions elsewhere and type in the conversions like so:

library("dplyr")

xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")

xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)

sample <- rep(xc, each=3)

recode(sample,
       "1" = 1,
       "1a" = 1.3,
       "1b" = 1.5,
       "1c" = 1.7,
       "2" = 2,
       "2a" = 2.3,
       "2b" = 2.5,
       "2c" = 2.7)

That returns:

[1] 1.0 1.0 1.0 1.3 1.3 1.3 1.5 1.5 1.5 1.7 1.7 1.7 2.0 2.0 2.0 2.3 2.3 2.3
2.5 2.5 2.5 2.7 2.7 2.7

To use the original vectors would be a tad harder but doable perhaps using
some indirection.
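
One possible form of that indirection, as a hedged sketch: build a named lookup vector from `xc` and `xn` and splice it into recode() with `!!!`, which the dplyr help page for recode() describes for named vectors. The names `obs` and `lookup` are mine, and the base-R fallback is only there so the snippet runs when dplyr is not installed.

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)

obs <- rep(xc, each = 3)       # same test data as above, under another name
lookup <- setNames(xn, xc)     # named numeric vector: names are the stage labels

if (requireNamespace("dplyr", quietly = TRUE)) {
  # Splice the name = value pairs into recode() instead of typing them out
  recoded <- dplyr::recode(obs, !!!lookup)
} else {
  # Base-R fallback: index the lookup vector by name
  recoded <- unname(lookup[obs])
}
recoded
```

Either branch keeps the mapping in one place, so changing the classification means editing `xc` and `xn` only.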

As has been noted, you need to be careful to match each item in its
entirety, from beginning to end, because matching a substring can produce
odd results. If you add this code to the above, it works, in a silly way,
for a more general case:

library(glue)

converted <- sample
for (i in seq_along(xc)) {
  # anchor with ^ and $ so that only whole items are replaced
  converted <- sub(glue("^{xc[i]}$"), xn[i], converted)
}

result <- as.numeric(converted)

Returns:

> result
 [1] 1.0 1.0 1.0 1.3 1.3 1.3 1.5 1.5 1.5 1.7 1.7 1.7 2.0 2.0 2.0 2.3 2.3 2.3
2.5 2.5 2.5 2.7 2.7 2.7

Not necessarily efficient but it works. You could use something like
glue::glue() to create the arguments you want to use for something like
recode() in more complex cases and so on.

I think we have had enough solutions and methods posted but there are likely
many more as there is rarely only one way to do things in R.

-----Original Message-----
From: R-help <[hidden email]> On Behalf Of Richard O'Keefe
Sent: Saturday, July 11, 2020 3:02 AM
To: Eric Berger <[hidden email]>
Cc: Jean-Louis Abitbol <[hidden email]>; R Project Help
<[hidden email]>
Subject: Re: [R] Character (1a, 1b) to numeric

The string index approach works with any mapping from stage names to stage
numbers, not just regular ones.  For example, if we had "1" -> 1, "1a" ->
1.4, "1b" -> 1.6, "2" -> 2, "2a" -> 2.3, "2b" -> 2.7, the 'sub' version would
fail miserably while the string index version would just work.  The 'sub'
version would also not work terribly well if the mapping were "1" -> 1, "a1"
-> 1.3, "b1" -> 1.5, "c1" -> 1.7 and so on. The thing I like about the
indexing approach is that it uses a fundamental operation of the language
very directly.

Anyone using R would do well to *master* what indexing can do for you.
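
To make the point concrete, here is a small sketch using the irregular mapping from the paragraph above. The observations in `obs` are invented for illustration; the 'sub' chain is the one posted earlier in the thread, cut down to the two letters that occur.

```r
# Irregular mapping: the same letter means different offsets in different stages
stage <- c("1" = 1, "1a" = 1.4, "1b" = 1.6,
           "2" = 2, "2a" = 2.3, "2b" = 2.7)

obs <- c("2a", "1b", "1", "2b", "1a")   # invented observations

# String indexing simply looks each label up
by_index <- unname(stage[obs])

# The 'sub' chain hard-codes one offset per letter, so it cannot follow the mapping
by_sub <- as.numeric(sub("b", ".5", sub("a", ".3", obs)))

data.frame(obs, by_index, by_sub)
```

The two columns agree only where the fixed offsets happen to coincide with the mapping (here "1" and "2a").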


On Sat, 11 Jul 2020 at 17:16, Eric Berger <[hidden email]> wrote:

> xn <- as.numeric(sub("c",".7",sub("b",".5",sub("a",".3",xc))))
>
>
> On Sat, Jul 11, 2020 at 5:09 AM Richard O'Keefe <[hidden email]> wrote:
>
>> This can be done very simply because vectors in R can have named
>> elements, and can be indexed by strings.
>>
>> > stage <- c("1" = 1, "1a" = 1.3, "1b" = 1.5, "1c" = 1.7,
>> +            "2" = 2, "2a" = 2.3, "2b" = 2.5, "2c" = 2.7,
>> +            "3" = 3, "3a" = 3.3, "3b" = 3.5, "3c" = 3.7)
>>
>> > testdata <- rep(c("1", "1a", "1b", "1c",
>> +                   "2", "2a", "2b", "2c",
>> +                   "3", "3a", "3b", "3c"), times=c(1:6,6:1))
>>
>> > stage[testdata]
>>   1  1a  1a  1b  1b  1b  1c  1c  1c  1c   2   2   2   2   2  2a  2a  2a  2a  2a
>> 1.0 1.3 1.3 1.5 1.5 1.5 1.7 1.7 1.7 1.7 2.0 2.0 2.0 2.0 2.0 2.3 2.3 2.3 2.3 2.3
>>  2a  2b  2b  2b  2b  2b  2b  2c  2c  2c  2c  2c   3   3   3   3  3a  3a  3a  3b
>> 2.3 2.5 2.5 2.5 2.5 2.5 2.5 2.7 2.7 2.7 2.7 2.7 3.0 3.0 3.0 3.0 3.3 3.3 3.3 3.5
>>  3b  3c
>> 3.5 3.7
>>

Re: Character (1a, 1b) to numeric

David Winsemius
In reply to this post by Richard O'Keefe-2
It might be easier to simply assign names to the numeric vector if you
already have numeric and character vectors of the right lengths. Using
Heiberger's vectors:


xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)

names(xn) <- xc

testdata <- rep(c("1", "1a", "1b", "1c",
                     "2", "2a", "2b", "2c",
                    "3", "3a", "3b", "3c"), times=c(1:6,6:1))

xn[ testdata ]  #  NA's when there's no match is a feature.
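
A short sketch of why the NA is useful: labels that are missing from the lookup vector show up as NA and can be listed before plotting. The data in `obs` are invented, and `unmatched` is just an illustrative name.

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
names(xn) <- xc

obs <- c("1a", "2", "3b", "2c")   # "3b" is deliberately absent from xc

z <- xn[obs]                      # NA wherever the label has no entry
unmatched <- unique(obs[is.na(z)])
unmatched                         # labels that still need a numeric value
```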

--
David.

On 7/10/20 7:08 PM, Richard O'Keefe wrote:

> This can be done very simply because vectors in R can have
> named elements, and can be indexed by strings.
>
>> stage <- c("1" = 1, "1a" = 1.3, "1b" = 1.5, "1c" = 1.7,
> +            "2" = 2, "2a" = 2.3, "2b" = 2.5, "2c" = 2.7,
> +            "3" = 3, "3a" = 3.3, "3b" = 3.5, "3c" = 3.7)
>
>> testdata <- rep(c("1", "1a", "1b", "1c",
> +                   "2", "2a", "2b", "2c",
> +                   "3", "3a", "3b", "3c"), times=c(1:6,6:1))
>
>> stage[testdata]
>   1  1a  1a  1b  1b  1b  1c  1c  1c  1c   2   2   2   2   2  2a  2a  2a  2a  2a
> 1.0 1.3 1.3 1.5 1.5 1.5 1.7 1.7 1.7 1.7 2.0 2.0 2.0 2.0 2.0 2.3 2.3 2.3 2.3 2.3
>  2a  2b  2b  2b  2b  2b  2b  2c  2c  2c  2c  2c   3   3   3   3  3a  3a  3a  3b
> 2.3 2.5 2.5 2.5 2.5 2.5 2.5 2.7 2.7 2.7 2.7 2.7 3.0 3.0 3.0 3.0 3.3 3.3 3.3 3.5
>  3b  3c
> 3.5 3.7
>

Re: Character (1a, 1b) to numeric

Abby Spurdle
In reply to this post by Fox, John
On Sat, Jul 11, 2020 at 8:04 AM Fox, John <[hidden email]> wrote:
> We've had several solutions, and I was curious about their relative efficiency. Here's a test

Am I the only person on this mailing list who learnt to program with ASCII...?

In theory, the most ***efficient*** solution is to get the
ASCII/UTF8/etc values.
Then use a simple (math) formula.
No matching, no searching, required ...

Here's one possibility:

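    xc <-  c ("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")

    # assumes a one-digit stage, optionally followed by one lower-case letter
    I <- (nchar (xc) == 2)
    xn <- as.integer (substring (xc, 1, 1) )
    # code points: "a" is 97, so a, b, c become 1/4, 2/4, 3/4
    xn [I] <- xn [I] + (utf8ToInt (paste (substring (xc [I], 2, 2), collapse="") ) - 96) / 4
    xn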

Unfortunately, this makes R look bad.
The corresponding C implementation is simpler and presumably the
performance winner.
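
For comparison, a variant of the same code-point idea that lands on the values asked for in the original post. Dividing by 4 gives quarter steps (1.25, 1.5, 1.75); the linear formula 0.1 + 0.2 * k, with k = 1, 2, 3 for a, b, c, gives 0.3, 0.5, 0.7 instead. This is a sketch under the same assumption of a one-digit stage followed by at most one lower-case letter; `has_suffix` and `k` are illustrative names.

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")

has_suffix <- nchar(xc) == 2
xn <- as.numeric(substring(xc, 1, 1))

# Position of each suffix letter in the alphabet, from its code point
suffixes <- paste(substring(xc[has_suffix], 2, 2), collapse = "")
k <- utf8ToInt(suffixes) - utf8ToInt("a") + 1L

# a -> +0.3, b -> +0.5, c -> +0.7
xn[has_suffix] <- xn[has_suffix] + 0.1 + 0.2 * k
round(xn, 1)
```

Like the original, this only works while the offsets follow a formula; an arbitrary classification still needs a lookup table.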


Re: Character (1a, 1b) to numeric

Jim Lemon-4
I'll admit that I cut my teeth on ASCII, but I worried about your
reliance on that ancient typographic ordering. I wrote a little
function:

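al2num_sub<-function(x) {
 # split into single characters, e.g. "1a" becomes "1" "a"
 xspl<-unlist(strsplit(x,""))
 if(length(xspl) > 1)
  xspl<-paste(xspl[1],which(letters==xspl[2]),sep=".")
 return(xspl)
}
# pass the function itself to sapply(); the result is a character vector
sapply(xc,al2num_sub)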

that does the trick with ASCII, but there was a nagging worry that it
wouldn't work for any ordering apart from the Roman alphabet.
Unfortunately I couldn't find any way to substitute something for
"letters" that would allow me to plug in a more general solution like:

alpha.set<-c("letters","greek",...)

Maybe someone else can crack that one.
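
One way this might be approached, as a rough sketch rather than a tested general solution: pass the alphabet in as an ordinary character vector and use match() to find each suffix's position. The function name `al2num`, its `alphabet` argument and the three-letter `greek` vector are all hypothetical; suffixes that are not in the alphabet come back as NA, and the tenths encoding only makes sense for up to nine suffixes.

```r
# Hypothetical helper: `alphabet` is any character vector giving the suffix order
al2num <- function(x, alphabet = letters) {
  stage  <- as.numeric(sub("[^0-9]+$", "", x))  # leading digits
  suffix <- sub("^[0-9]+", "", x)               # trailing symbol, "" if none
  pos    <- match(suffix, alphabet)             # NA for unknown suffixes
  pos[suffix == ""] <- 0                        # no suffix: add nothing
  stage + pos / 10
}

# First three Greek lower-case letters, written as Unicode escapes
greek <- c("\u03b1", "\u03b2", "\u03b3")

roman_res <- al2num(c("1", "1a", "2c"))
greek_res <- al2num(c("1", "1\u03b1", "2\u03b3"), alphabet = greek)
roman_res
greek_res
```

Both calls give the same numbers here, since only the position of the suffix within the supplied alphabet matters.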

Jim
