Hello,
I have a file which contains a column with age, which is represented in the following two patterns:

1. "007/A", "007/a" or "7 /a" ... In this case A or a means year, and I would like to extract only the numeric value (e.g. 7 in the above case) if this pattern exists in a line of the file.

2. "004/M" or "004/m", where M or m means month ... For these lines I would like to first extract the numeric value of the month (e.g. 4) and then convert it into years, which would be 0.33 (i.e. 4 divided by 12).

Can anyone help?

Thank you
On Mar 18, 2012, at 10:44 AM, irene wrote:

> 2. "004/M" or "004/m" where M or m means month ...... for these lines I
> would like to first extract the numeric value of Month eg. 4 and then
> convert it into a value of years, which would be 0.33 eg 4 divided by 12.
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Extracting-numbers-from-a-character-variable-of-different-types-tp4482248p4482248.html
> Sent from the R help mailing list archive at Nabble.com.

I thought it easier to get to months as an initial step:

> dfrm <- read.table(text="'007/A'\n'007/a' \n '7 /a '\n '004/M'\n'004/m'")
> dfrm$agenew <- sub("(^\\d+\\s*)(/)([aA])","\\1 * 12", dfrm$V1)
> dfrm$agenew2 <- sub("(^\\d+\\s*)(/)([mM])","\\1", dfrm$agenew)
> dfrm$agenew2
[1] "007 * 12" "007 * 12" "7  * 12 " "004"      "004"
> eval(parse(text=dfrm$agenew2))
[1] 4
> sapply(dfrm$agenew2, function(x) eval(parse(text=x)) )
007 * 12 007 * 12  7 * 12       004      004
      84       84       84        4        4

David Winsemius, MD
West Hartford, CT

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
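The transcript above shows eval(parse(text=...)) returning only 4 while the sapply() version returns all five values. A minimal sketch of why, assuming the same strings the sub() calls produce (the names exprs, parsed, last_only and months are illustrative, not from the thread): parse() turns the character vector into five separate expressions, and eval() on that expression vector returns only the value of the last one.

```r
# Strings like those produced by the sub() calls in the post
# (whitespace simplified for illustration)
exprs <- c("007 * 12", "007 * 12", "7 * 12", "004", "004")

# parse() yields one expression per element
parsed <- parse(text = exprs)
n_expr <- length(parsed)

# eval() evaluates each expression in turn but returns only the last value
last_only <- eval(parsed)

# Evaluating element by element keeps every result
months <- vapply(exprs, function(s) eval(parse(text = s)),
                 numeric(1), USE.NAMES = FALSE)
```

Here vapply() plays the role of sapply() in the post; it only adds a check that each result is a single number.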
On Mar 18, 2012, at 11:37 AM, David Winsemius wrote:

> I thought it easier to get to months as an initial step:
>
> > dfrm <- read.table(text="'007/A'\n'007/a' \n '7 /a '\n '004/M'\n'004/m'")

Thinking about it further, it's easier (and clearer) to do it as years:

> dfrm$agenew3 <- sub("[mM]", "12", dfrm$V1)
> dfrm$agenew3 <- sub("[aA]", "1", dfrm$agenew3)
> sapply(dfrm$agenew3, function(x) eval(parse(text=x)) )
    007/1     007/1     7 /1     004/12    004/12
7.0000000 7.0000000 7.0000000 0.3333333 0.3333333

David Winsemius, MD
West Hartford, CT
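The same years conversion can also be written without eval(parse()), which may be preferable if the column could contain unexpected text. This is only a sketch under that assumption; the function name age_in_years is hypothetical and not part of the thread.

```r
# Hypothetical helper: convert strings like "007/A" or "004/m" to years
age_in_years <- function(x) {
  x <- trimws(as.character(x))
  # Leading run of digits; as.numeric() ignores the leading zeros
  value <- as.numeric(sub("^(\\d+).*$", "\\1", x))
  # Unit letter after the slash, upper-cased so "a"/"A" and "m"/"M" match
  unit <- toupper(sub("^\\d+\\s*/\\s*([aAmM]).*$", "\\1", x))
  # Years divide by 1, months by 12; anything else becomes NA
  divisor <- ifelse(unit == "M", 12, ifelse(unit == "A", 1, NA))
  value / divisor
}

ages <- age_in_years(c("007/A", "007/a", "7 /a ", "004/M", "004/m"))
```

Strings that do not follow either pattern come back as NA (with a coercion warning from as.numeric()) rather than being evaluated as code.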
In reply to this post by irene
Assume your year value is

x<-"007/A"

You want to replace all non-numeric characters (i.e. letters and punctuation) and all zeros with nothing.

gsub('[[:alpha:]]|[[:punct:]]|0','',x)

Let's say you have a vector with both month and year values (you can separate them). Now we need to identify the cells that have a month or year indicator:

x<-c("007/A","007/a","003/M","003/m")

grep("/A|/a",x) #cells in x with year information
grep("/M|/m",x) #cells in x with month information

To remove all characters, punctuation, and 0s from x, do:

gsub('[[:alpha:]]|[[:punct:]]|0','',x)

which you can also do specifically for the cells that identify months and years, respectively:

years<-gsub('[[:alpha:]]|[[:punct:]]|0','',x[grep("/A|/a",x)]) #years
years
months<-gsub('[[:alpha:]]|[[:punct:]]|0','',x[grep("/M|/m",x)]) #months
months

Convert the resulting character vectors into numeric vectors with as.numeric(as.character(years)), for example.

HTH,
Daniel
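One thing to watch with the pattern above: it deletes every 0, not only the leading ones, so an age such as "010/a" would come out as "1". A small sketch of an alternative, assuming ages of 10 or more can occur in the data (the example values and variable names are made up for illustration): keep all digits and let as.numeric() deal with the leading zeros.

```r
x <- c("007/A", "010/a", "003/M", "100/m")

# Pattern from the post: removes letters, punctuation and every zero
all_zeros_removed <- gsub('[[:alpha:]]|[[:punct:]]|0', '', x)

# Alternative: drop everything that is not a digit
digits_only <- gsub("[^[:digit:]]", "", x)

# as.numeric() reads "007" as 7 and keeps "010" as 10
values <- as.numeric(digits_only)
```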
On Mar 18, 2012, at 3:17 PM, Daniel Malter <[hidden email]> wrote:

> which you can also do specifically for the cells that identify months and
> years, respectively:
>
> years<-gsub('[[:alpha:]]|[[:punct:]]|0','',x[grep("/A|/a",x)])

The problem with this approach is that the years vector becomes disjoint from the months vector. It doesn't lend itself well to data.frame operations.

--
David
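To illustrate the point about data.frame operations, here is a hedged sketch that keeps every row in place by using a logical mask from grepl() instead of subsetting with grep(). The data frame and column names (ages, raw, value, is_month, years) are assumptions for the example, not from the thread.

```r
ages <- data.frame(raw = c("007/A", "007/a", "003/M", "003/m"),
                   stringsAsFactors = FALSE)

# Numeric part of each entry, same length and order as the input
ages$value <- as.numeric(gsub("[^[:digit:]]", "", ages$raw))

# TRUE where the unit after the slash is M or m
ages$is_month <- grepl("/\\s*[mM]", ages$raw)

# Months are divided by 12; years are kept as they are
ages$years <- ifelse(ages$is_month, ages$value / 12, ages$value)
```

Because nothing is subset, the converted ages stay aligned with the other columns of the data frame.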