

Hi,
If I have a string of consecutive chars followed by consecutive numbers and then chars, like "absdfds0213451ab", how can I separate the consecutive chars from the consecutive numbers?
grep doesn't seem to be helpful:
grep("[a-z]","absdfds0213451ab", ignore.case=TRUE)
[1] 1
grep("[0-9]","absdfds0213451ab", ignore.case=TRUE)
[1] 1
Thanks
Carol
Hello,
Maybe see ?gregexpr and ?regmatches. Something like the following.
m1 <- gregexpr("[a-z]+","absdfds0213451ab")
regmatches("absdfds0213451ab", m1)
m2 <- gregexpr("[0-9]+","absdfds0213451ab")
regmatches("absdfds0213451ab", m2)
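For reference, a minimal sketch of what these calls return, plus a variant that is an assumption on my part rather than part of the suggestion above: a single pattern with alternation, which keeps the letter and digit runs in their original order. The variable names are illustrative only.

```r
x <- "absdfds0213451ab"

# Runs of lowercase letters only
letter_runs <- regmatches(x, gregexpr("[a-z]+", x))[[1]]

# Runs of digits only
digit_runs <- regmatches(x, gregexpr("[0-9]+", x))[[1]]

# One pass with alternation: every run, in the order it occurs in x
all_runs <- regmatches(x, gregexpr("[a-z]+|[0-9]+", x))[[1]]

print(letter_runs)  # "absdfds" "ab"
print(digit_runs)   # "0213451"
print(all_runs)     # "absdfds" "0213451" "ab"
```

regmatches() returns a list with one element per input string, hence the [[1]] when there is only one string.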
Hope this helps,
Rui Barradas
grep() will only tell you that a pattern is present. You want to use gsub() or similar with back references to return parts of the vector.
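To illustrate: grep() returns the indices of the elements that match, which is why a length-one input gives 1 for both patterns. A small sketch, where the three-element vector is made up for illustration:

```r
x <- c("absdfds0213451ab", "12345", "xyz")

# Indices of the elements that contain at least one digit
idx <- grep("[0-9]", x)

# The same test as a logical vector, one value per element
has_digit <- grepl("[0-9]", x)

# The matching elements themselves rather than their positions
hits <- grep("[0-9]", x, value = TRUE)

print(idx)        # 1 2
print(has_digit)  # TRUE TRUE FALSE
print(hits)       # "absdfds0213451ab" "12345"
```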
Will they ALWAYS appear in that pattern (letters, numbers, letters) or is there some level of variation?
If they will always appear as in your example, then one approach is:
> strsplit(gsub("([a-z]+)([0-9]+)([a-z]+)", "\\1 \\2 \\3", "absdfds0213451ab"), " ")
[[1]]
[1] "absdfds" "0213451" "ab"
The inner gsub() returns the 3 parts separated by spaces, and that space is then used as the split argument to strsplit().
If there will be some variation, you can use multiple calls to gsub() or similar, each getting either the letters or the numbers.
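One possible reading of that, as a rough sketch: one gsub() call deletes everything that is not a letter, another deletes everything that is not a digit. The second input string is made up, and note that this approach glues separate letter runs together.

```r
x <- c("absdfds0213451ab", "abc123")

# Delete every non-letter, leaving only the letters
letters_part <- gsub("[^a-z]", "", x, ignore.case = TRUE)

# Delete every non-digit, leaving only the digits
digits_part <- gsub("[^0-9]", "", x)

print(letters_part)  # "absdfdsab" "abc"
print(digits_part)   # "0213451" "123"
```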
Regards,
Marc Schwartz
I'd propose something along these lines:
result <- gsub("^([[:alpha:]]+)([[:digit:]]+)([[:alpha:]]+)$",
"\\1 \\2 \\3", "absdfds0213451ab")
If you have lots of these strings, you can convert all of them and then
do.call("rbind", strsplit(result, " "))
or some such.
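A minimal sketch of the many-strings case, assuming the three groups are joined by a single space so that strsplit() has something to split on. The second string and the variable names are made up for illustration.

```r
strings <- c("absdfds0213451ab", "xy42z")

# Insert a space between the letter, digit and letter groups
spaced <- gsub("^([[:alpha:]]+)([[:digit:]]+)([[:alpha:]]+)$",
               "\\1 \\2 \\3", strings)

# Split on the space and stack the pieces: one row per input string
parts <- do.call("rbind", strsplit(spaced, " "))

print(parts)
```

Strings that do not fit the letters, digits, letters shape pass through gsub() unchanged, so they should be checked for separately before stacking.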
Best,
Uwe Ligges
There is some level of variation: either chars followed by numbers, or chars, numbers, chars.
Perhaps I should use gsub as you all suggested, and if the string is composed of chars followed by numbers, will it return the 3rd part empty?
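A hedged sketch of how that could be checked. It swaps in regexec() with regmatches() instead of gsub(), and makes the third group optional by using * instead of +; both choices are assumptions for illustration, not something proposed earlier in the thread.

```r
x <- c("absdfds0213451ab", "abc123")

# Third group uses * so it may match an empty string
pattern <- "^([a-z]+)([0-9]+)([a-z]*)$"

# regexec() records the full match and each parenthesised group
m <- regmatches(x, regexec(pattern, x))

# Drop the full match, keeping only the three groups
parts <- lapply(m, function(p) p[-1])

print(parts[[1]])  # "absdfds" "0213451" "ab"
print(parts[[2]])  # "abc" "123" ""
```

With this pattern the chars-then-numbers string yields an empty string as its third part, so every input gives exactly three pieces.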
Regards,
Carol
If you have some variation in the order, e.g. numbers followed by chars:
library(stringr)
v1 <- c("absdfds0213451ab", "123abcs4145")
pattern <- c("[A-Za-z]+", "\\d+")
do.call(`Map`,c(c,lapply(pattern, function(.pat) str_extract_all(v1, .pat))))
#[[1]]
#[1] "absdfds" "ab" "0213451"
#[[2]]
#[1] "abcs" "123" "4145"
A.K.
