separate numbers from chars in a string


carol white
Hi,
If I have a string of consecutive chars followed by consecutive numbers and then chars, like "absdfds0213451ab", how can I separate the consecutive chars from the consecutive numbers?

grep doesn't seem to be helpful

grep("[a-z]","absdfds0213451ab", ignore.case=T)
[1] 1


 grep("[0-9]","absdfds0213451ab", ignore.case=T)
[1] 1

Thanks

Carol


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: separate numbers from chars in a string

Rui Barradas
Hello,

Maybe see ?gregexpr and ?regmatches. Something like the following:


m1 <- gregexpr("[a-z]+","absdfds0213451ab")
regmatches("absdfds0213451ab", m1)

m2 <- gregexpr("[0-9]+","absdfds0213451ab")
regmatches("absdfds0213451ab", m2)
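
If the order of the pieces matters, one possible variation (a sketch, not part of the original reply) is to match letter runs and digit runs with a single alternation, so the tokens come back in the order they appear in the string. The names x, tokens and is_number are only illustrative.

```r
x <- "absdfds0213451ab"

# One pass: each match is either a run of letters or a run of digits
tokens <- regmatches(x, gregexpr("[A-Za-z]+|[0-9]+", x))[[1]]
print(tokens)

# Flag which tokens consist only of digits
is_number <- grepl("^[0-9]+$", tokens)
print(is_number)
```

Because gregexpr() scans left to right, this keeps letters and digits interleaved as in the input, which two separate calls cannot do.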


Hope this helps,

Rui Barradas

Re: separate numbers from chars in a string

Marc Schwartz-3
In reply to this post by carol white
On Jul 30, 2014, at 3:13 PM, carol white <[hidden email]> wrote:

> Hi,
> If I have a string of consecutive chars followed by consecutive numbers and then chars, like "absdfds0213451ab", how to separate the consecutive chars from consecutive numbers?
>
> grep doesn't seem to be helpful
>
> grep("[a-z]","absdfds0213451ab", ignore.case=T)
> [1] 1
>
>
>  grep("[0-9]","absdfds0213451ab", ignore.case=T)
> [1] 1
>
> Thanks
>
> Carol


grep() only tells you which elements of the vector contain the pattern (here, element 1); it does not return the matched text. You want to use gsub() or similar with back references to return parts of the vector.
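
To make the difference concrete, here is a small sketch on a made-up three-element vector (the vector and variable names are assumptions for illustration only). It contrasts the index returned by grep() with the text returned by regmatches().

```r
x <- c("absdfds0213451ab", "onlyletters", "12345")

# grep() returns the indices of the elements that contain a digit
idx <- grep("[0-9]", x)
print(idx)

# grepl() returns one TRUE/FALSE per element
has_digit <- grepl("[0-9]", x)
print(has_digit)

# regexpr() + regmatches() return the matched text itself
# (elements without a match are dropped from the result)
first_run <- regmatches(x, regexpr("[0-9]+", x))
print(first_run)
```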

Will they ALWAYS appear in that pattern (letters, numbers, letters) or is there some level of variation?

If they will always appear as in your example, then one approach is:

> strsplit(gsub("([a-z]+)([0-9]+)([a-z]+)", "\\1 \\2 \\3", "absdfds0213451ab"), " ")
[[1]]
[1] "absdfds" "0213451" "ab"    


The initial gsub() returns the 3 parts separated by a space, which is then used as the split argument to strsplit().
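
As an alternative sketch that avoids the intermediate space-separated string, base R's regexec() can return the capture groups directly. This is not what the reply above does; it is a different technique, shown under the assumption that every string follows the letters, numbers, letters layout. The second test string is made up.

```r
x <- c("absdfds0213451ab", "xyz42q")

# regexec() records the full match plus each parenthesised group
m <- regmatches(x, regexec("^([a-z]+)([0-9]+)([a-z]+)$", x))

# Drop the first entry (the full match) to keep only the three groups
parts <- lapply(m, function(p) p[-1])
print(parts)
```

A string that does not match the pattern yields character(0) for that element, which makes failures easy to spot.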

If there will be some variation, you can use multiple calls to gsub() or similar, each getting either the letters or the numbers.

Regards,

Marc Schwartz

Re: separate numbers from chars in a string

Uwe Ligges-3
In reply to this post by carol white


On 30.07.2014 22:13, carol white wrote:

> Hi,
> If I have a string of consecutive chars followed by consecutive numbers and then chars, like "absdfds0213451ab", how to separate the consecutive chars from consecutive numbers?
>
> grep doesn't seem to be helpful
>
> grep("[a-z]","absdfds0213451ab", ignore.case=T)
> [1] 1
>
>
>   grep("[0-9]","absdfds0213451ab", ignore.case=T)
> [1] 1


I'd propose something along these lines:

result <- gsub("^([[:alpha:]]+)([[:digit:]]+)([[:alpha:]]+)$",
"\\1-\\2-\\3", "absdfds0213451ab")

If you have lots of these strings, you can convert all of them and then

do.call("rbind", strsplit(result, "-"))

or some such.
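
A minimal sketch of that idea applied to several strings at once. The extra input strings and the column names are made up for illustration, and the sketch assumes every string matches the pattern and none contains a "-" of its own.

```r
x <- c("absdfds0213451ab", "xy12z", "q7rst")

# Insert "-" between the three captured groups of every string
result <- gsub("^([[:alpha:]]+)([[:digit:]]+)([[:alpha:]]+)$",
               "\\1-\\2-\\3", x)

# Split on "-" and stack the pieces into a character matrix, one row per string
parts <- do.call("rbind", strsplit(result, "-"))
colnames(parts) <- c("prefix", "digits", "suffix")
print(parts)
```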

Best,
Uwe Ligges



Re: separate numbers from chars in a string

carol white
In reply to this post by Marc Schwartz-3
There is some level of variation: either chars followed by numbers, or chars, numbers, chars.


Perhaps I should use gsub as you all suggested; if the string is composed of chars followed by numbers, will it return the 3rd part empty?

Regards,

Carol


Re: separate numbers from chars in a string

arun kirshna
In reply to this post by carol white
If the order of the numbers and chars varies, you can extract each kind separately:

library(stringr)

v1 <- c("absdfds0213451ab", "123abcs4145")
pattern <- c("[A-Za-z]+", "\\d+")

# For each string, collect all letter runs first, then all digit runs
do.call(`Map`, c(c, lapply(pattern, function(.pat) str_extract_all(v1, .pat))))
#[[1]]
#[1] "absdfds" "ab"      "0213451"

#[[2]]
#[1] "abcs" "123"  "4145"

A.K.



Re: separate numbers from chars in a string

Uwe Ligges-3
In reply to this post by carol white


On 31.07.2014 04:46, carol white wrote:
> There are some level of variation either chars followed by numbers or chars, numbers, chars
>
>
> Perhaps, I should use gsub as you suggested all and if the string is composed of chars followed by numbers, it will return the 3rd part empty?


Please read about regular expressions and describe your problem
accurately. If the last strings are not always present, use * rather
than + at the very end of the regular expression.
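
A small sketch of that change, applied to the anchored pattern from earlier in the thread (the second input string is an assumed example of the "chars followed by numbers" case). It shows that the third group simply comes back empty.

```r
x <- c("absdfds0213451ab", "absdfds0213451")

# Only the last quantifier changes from + to *, so trailing letters are optional
out <- gsub("^([[:alpha:]]+)([[:digit:]]+)([[:alpha:]]*)$", "\\1-\\2-\\3", x)
print(out)

# strsplit() drops the empty piece after a trailing separator,
# so the two strings yield a different number of parts
n_parts <- sapply(strsplit(out, "-"), length)
print(n_parts)
```

Because the number of pieces now differs between strings, binding them row-wise with rbind() would recycle values with a warning; padding the shorter results first is one way around that.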

Best,
Uwe Ligges


Re: separate numbers from chars in a string

Marc Schwartz-3

On Jul 31, 2014, at 3:17 AM, Uwe Ligges <[hidden email]> wrote:

>
>
> On 31.07.2014 04:46, carol white wrote:
>> There are some level of variation either chars followed by numbers or chars, numbers, chars
>>
>>
>> Perhaps, I should use gsub as you suggested all and if the string is composed of chars followed by numbers, it will return the 3rd part empty?
>
>
> Please read about regular expressions and describe your problem accurately. If the last strings are not always present, use * rather than + at the very end of the regular expression.
>
> Best,
> Uwe Ligges


Carol,

As Uwe notes, reviewing the documentation for ?regex and the examples in ?gsub can be helpful. There are also online regex resources such as:

  http://www.regular-expressions.info

The question is how much variation might be present. If there will always be at most 3 possible components, then as Uwe indicated, using '*' instead of '+' allows for the possibility that one or more components are absent. '*' matches zero or more repetitions of the preceding element, whereas '+' requires at least one.

> strsplit(gsub("([a-z]*)([0-9]*)([a-z]*)", "\\1 \\2 \\3", "absdfds0213451ab"), " ")
[[1]]
[1] "absdfds" "0213451" "ab"  

> strsplit(gsub("([a-z]*)([0-9]*)([a-z]*)", "\\1 \\2 \\3", "absdfds0213451"), " ")
[[1]]
[1] "absdfds" "0213451"

> strsplit(gsub("([a-z]*)([0-9]*)([a-z]*)", "\\1 \\2 \\3", "0213451ab"), " ")
[[1]]
[1] ""        "0213451" "ab"  
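
The reason the leading letters can be missing in the last example is that '*' is allowed to match the empty string. A minimal sketch with regexpr() on a made-up string "ab12" (names are illustrative) shows where each quantifier matches:

```r
s <- "ab12"

# "[0-9]*" succeeds immediately at position 1 with a zero-length match
m_star <- regexpr("[0-9]*", s)
print(c(start = as.integer(m_star), length = attr(m_star, "match.length")))

# "[0-9]+" needs at least one digit, so it matches "12" starting at position 3
m_plus <- regexpr("[0-9]+", s)
print(c(start = as.integer(m_plus), length = attr(m_plus, "match.length")))
```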


Using the 3 back references in the regex above will limit the parsing to up to 3 possible components. If you may have more than 3 you can increase the back reference sequence to some maximum number. However that can get tedious, so you may want to consider multiple passes using strsplit() to extract letters during one pass and then numbers during a second, or write a function to encapsulate that process.

Here are examples using strsplit():

# Get the numbers, using letters as the split
> strsplit("absdfds0213451ab", split = "[a-z]+")
[[1]]
[1] ""        "0213451"

> strsplit("absdfds0213451ab4567", split = "[a-z]+")
[[1]]
[1] ""        "0213451" "4567"  


# Get the letters, using numbers as the split
> strsplit("absdfds0213451ab", split = "[0-9]+")
[[1]]
[1] "absdfds" "ab"    

> strsplit("0213451ab", split = "[0-9]+")
[[1]]
[1] ""   "ab"

> strsplit("0213451ab123xyz789lmn", split = "[0-9]+")
[[1]]
[1] ""    "ab"  "xyz" "lmn"
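
One way the two passes might be wrapped in a function, as suggested above. This is only a sketch: split_letters_digits is a hypothetical name, and it assumes the strings contain nothing but ASCII letters and digits. Empty strings produced by a leading separator are dropped.

```r
# Hypothetical helper: returns, for each string, its letter runs and digit runs
split_letters_digits <- function(x) {
  lapply(x, function(s) {
    # Pass 1: split on letters to get the numbers
    digits <- strsplit(s, split = "[A-Za-z]+")[[1]]
    # Pass 2: split on numbers to get the letters
    chars <- strsplit(s, split = "[0-9]+")[[1]]
    # Remove the "" that appears when the string starts with a separator
    list(letters = chars[nzchar(chars)], digits = digits[nzchar(digits)])
  })
}

res <- split_letters_digits(c("absdfds0213451ab", "0213451ab123xyz789lmn"))
print(res[[2]])
```

Note that this keeps letters and numbers in separate vectors, so the original interleaving is not recorded.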


Regards,

Marc

