Extracting numbers from a character variable of different types


irene
Hello,

I have a file which contains a column with age, which is represented in one of the following two patterns:

1. "007/A" or "007/a" or "7 /a" ... In this case A or a means year, and I would like to extract only the numeric value, e.g. 7 in the above case, if this pattern exists in a line of the file.

2. "004/M" or "004/m", where M or m means month ... For these lines I would like to first extract the numeric value of the month, e.g. 4, and then convert it into a value in years, which would be 0.33, i.e. 4 divided by 12.

Can anyone help?

Thank you
Re: Extracting numbers from a character variable of different types

David Winsemius

On Mar 18, 2012, at 10:44 AM, irene wrote:

> Hello,
>
> I have a file which contains a column with age, which is represented in
> one of the following two patterns:
>
> 1. "007/A" or "007/a" or "7 /a" ... In this case A or a means year, and I
> would like to extract only the numeric value, e.g. 7 in the above case, if
> this pattern exists in a line of the file.
>
> 2. "004/M" or "004/m", where M or m means month ... For these lines I
> would like to first extract the numeric value of the month, e.g. 4, and
> then convert it into a value in years, which would be 0.33, i.e. 4 divided
> by 12.

I thought it easier to get to months as an initial step:

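 > dfrm <- read.table(text="'007/A'\n'007/a' \n '7 /a '\n '004/M'\n'004/m'")
 > # Rewrite year entries as "<n> * 12" so that they evaluate to months
 > dfrm$agenew <- sub("(^\\d+\\s*)(/)([aA])","\\1 * 12", dfrm$V1)
 > # Month entries keep only the leading number
 > dfrm$agenew2 <- sub("(^\\d+\\s*)(/)([mM])","\\1", dfrm$agenew)
 > dfrm$agenew2
[1] "007 * 12" "007 * 12" "7  * 12 " "004"      "004"
 > # Evaluating the whole vector at once returns only the last value ...
 > eval(parse(text=dfrm$agenew2))
[1] 4
 > # ... so evaluate each element separately
 > sapply(dfrm$agenew2, function(x) eval(parse(text=x)) )
007 * 12 007 * 12 7  * 12       004      004
       84       84       84        4        4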
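
The months calculation above can also be sketched without eval(parse()), by capturing the number and the unit letter separately. This is only an illustration, not the code posted in this thread: the helper name age_in_months is made up, and it assumes every valid entry consists of digits, optional spaces, a slash, and one of a/A/m/M.

```r
# Hypothetical helper (not from the thread): convert entries such as
# "007/A" or "004/m" to months without evaluating text as R code.
age_in_months <- function(x) {
  x <- trimws(as.character(x))
  # Capture group 1 = digits, capture group 2 = unit letter
  parts <- regmatches(x, regexec("^(\\d+)\\s*/\\s*([aAmM])$", x))
  sapply(parts, function(p) {
    if (length(p) != 3) return(NA_real_)   # entry did not match the pattern
    n <- as.numeric(p[2])                  # as.numeric() ignores leading zeros
    if (tolower(p[3]) == "a") n * 12 else n
  })
}

age_in_months(c("007/A", "007/a", "7 /a ", "004/M", "004/m"))
```

Entries that do not fit the assumed pattern come back as NA rather than raising an error, which may or may not be what is wanted for a real file.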



David Winsemius, MD
West Hartford, CT

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Extracting numbers from a character variable of different types

David Winsemius

On Mar 18, 2012, at 11:37 AM, David Winsemius wrote:

> I thought it easier to get to months as an initial step:
>
> > dfrm <- read.table(text="'007/A'\n'007/a' \n '7 /a '\n '004/M'\n'004/m'")

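On further thought, it is easier (and clearer) to work in years directly:

 > # Turn the unit letter into a divisor: months divide by 12, years by 1
 > dfrm$agenew3 <- sub("[mM]", "12", dfrm$V1)
 > dfrm$agenew3 <- sub("[aA]", "1", dfrm$agenew3)
 > # Evaluate each "number/divisor" string separately
 > sapply(dfrm$agenew3, function(x) eval(parse(text=x)) )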
     007/1     007/1     7 /1     004/12    004/12
7.0000000 7.0000000 7.0000000 0.3333333 0.3333333
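
Because eval(parse()) runs whatever text it is handed, it may be worth checking that each entry has the expected shape before substituting, especially when the column comes from a file. The following is a sketch under that assumption; the sample vector, the pattern, and the variable names are illustrative and not part of the original posts.

```r
# Sketch (names and pattern are assumptions, not from the thread):
# only evaluate entries that look like "<digits> / <a|A|m|M>".
age <- c("007/A", "7 /a ", "004/m", "12 months")
ok  <- grepl("^\\s*\\d+\\s*/\\s*[aAmM]\\s*$", age)

expr <- sub("[mM]", "12", age[ok])   # months: divide by 12
expr <- sub("[aA]", "1",  expr)      # years: divide by 1

years <- rep(NA_real_, length(age))  # entries that fail the check stay NA
years[ok] <- sapply(expr, function(e) eval(parse(text = e)), USE.NAMES = FALSE)
years
```

The result has one element per input entry, so it can be assigned straight back to a data frame column.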

--

David Winsemius, MD
West Hartford, CT

Re: Extracting numbers from a character variable of different types

Daniel Malter
In reply to this post by irene
Assume your year value is

x<-"007/A"

You want to replace all non-numeric characters (i.e. letters and punctuation) and all zeros with nothing.

gsub('[[:alpha:]]|[[:punct:]]|0','',x)

Let's say you have a vector with both month and year values (you can separate them). Now we need to identify the cells that have a month or year indicator

x<-c("007/A","007/a","003/M","003/m")

grep("/A|/a",x) #cells in x with year information
grep("/M|/m",x) #cells in x with month information

To remove all letters, punctuation, and 0s from x, do:

gsub('[[:alpha:]]|[[:punct:]]|0','',x)

which you can also do specifically for the cells that identify months and years, respectively:

years<-gsub('[[:alpha:]]|[[:punct:]]|0','',x[grep("/A|/a",x)]) #years
years
months<-gsub('[[:alpha:]]|[[:punct:]]|0','',x[grep("/M|/m",x)]) #months
months

Convert the resulting character vectors into numeric vectors with as.numeric(as.character(years)), for example.
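
One caveat with the pattern above: the "|0" alternative removes every zero, not only leading ones, so a value such as "010/A" would come out as "1". Since as.numeric() already ignores leading zeros, a variant that keeps the zeros may be safer. A small sketch, with an example vector chosen purely for illustration:

```r
# Example values chosen to include zeros that are not leading zeros
x <- c("007/A", "010/a", "003/M", "100/m")

# Deleting every 0 changes the values that contain one
gsub("[[:alpha:]]|[[:punct:]]|0", "", x)

# Keeping the zeros lets as.numeric() drop only the leading ones
digits <- gsub("[[:alpha:]]|[[:punct:]]", "", x)
values <- as.numeric(digits)
values
```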

HTH,
Daniel



Re: Extracting numbers from a character variable of different types

David Winsemius


On Mar 18, 2012, at 3:17 PM, Daniel Malter <[hidden email]> wrote:

> Assume your year value is
>
> x<-"007/A"
>
> You want to replace all non-numeric characters (i.e. letters and
> punctuation) and all zeros with nothing.
>
> gsub('[[:alpha:]]|[[:punct:]]|0','',x)
>
> Let's say you have a vector with both month and year values (you can
> separate them). Now we need to identify the cells that have a month or year
> indicator
>
> x<-c("007/A","007/a","003/M","003/m")
>
> grep("/A|/a",x) #cells in x with year information
> grep("/M|/m",x) #cells in x with month information
>
> To remove all characters, punctuation, and 0s from x, do:
>
> gsub('[[:alpha:]]|[[:punct:]]|0','',x)
>
> which you can also do specifically for the cells that identify months and
> years, respectively:
>
> years<-gsub('[[:alpha:]]|[[:punct:]]|0','',x[grep("/A|/a",x)])

The problem with this approach is that the years vector becomes separated from the months vector, so neither lines up with the rows of the original data any more. It doesn't lend itself well to data.frame operations.
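
One way to keep the results aligned with the rows is to compute a logical month indicator over the whole column and convert in place. This is a minimal sketch; the data frame d and the column names age and age_years are invented for the example.

```r
# Illustrative data frame; the names d, age, and age_years are assumptions
d <- data.frame(age = c("007/A", "003/M", "007/a", "003/m"),
                stringsAsFactors = FALSE)

num      <- as.numeric(gsub("[^0-9]", "", d$age))  # digits only, one per row
is_month <- grepl("/\\s*[mM]", d$age)              # TRUE where the unit is months

# One result per row, in the original order
d$age_years <- ifelse(is_month, num / 12, num)
d
```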

--
David
Sent from my iPhone


Re: Extracting numbers from a character variable of different types

irene
In reply to this post by David Winsemius
It worked perfectly!

Thank you