Searching for Enumerated Items using str_count() from the stringr package


Searching for Enumerated Items using str_count() from the stringr package

Dan Abner
Hi all,

I have a large number of text strings that I need to search for enumerated
items. However, I am receiving the error message below, even though I thought
I had properly escaped the special character, the closing parenthesis:


> Count<-str_count(text3,keywords)
Error in stri_count_regex(string, pattern, opts_regex = opts(pattern)) :
  Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX)


===

Here is example code:


text1<-"This is a list:
1) Number 1
2) Etc
3) Etc"

text2<-"This is NOT a list:
Blah, blah, blah
Blah, blah, blah"

text3<-c(text1,text2)
text3

{keywords<-c(paste(0:9,"\\)"),paste(0:9,"\\)",sep=""),
paste(0:9,"."),paste(0:9,".",sep=""),"-","*")}

keywords

Count<-str_count(text3,keywords)

===

I am looking for Count<-c(3,0)

Any suggestions?

Thanks!

Dan


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Searching for Enumerated Items using str_count() from the stringr package

Tóth Dénes


On 09/28/2017 10:25 PM, Dan Abner wrote:

> {keywords<-c(paste(0:9,"\\)"),paste(0:9,"\\)",sep=""),
> paste(0:9,"."),paste(0:9,".",sep=""),"-","*")}
>

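Please read the regular expression docs carefully, see ?regexp.
The syntax error comes from the bare "*" in 'keywords': a quantifier with
nothing before it is not a valid pattern. Note also that an unescaped "."
matches any character. Beyond that, you do not want to pass a multi-element
vector as 'keywords' here, because the patterns are recycled against the
strings element by element rather than each being tried on every string.
Use a single pattern instead: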

library(stringi)
stri_count_regex(text3, "[0-9]+\\) ")

or:

stri_count_regex(text3, "[[:digit:]]+\\) ")
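
For readers without stringi installed, here is a minimal sketch of the same idea in base R. It is an illustration only: count_matches() is a made-up helper name, and base R's PCRE engine (perl = TRUE) stands in for the ICU engine that stringi uses. It also shows that a bare "*" is rejected as a pattern, which is most likely what triggered the original syntax error.

```r
# Hypothetical helper (not part of stringr/stringi): count the matches of ONE
# regular expression in each element of a character vector, using base R only.
count_matches <- function(strings, pattern) {
  hits <- gregexpr(pattern, strings, perl = TRUE)
  # gregexpr() reports -1 when nothing matches, so count positive starts only
  vapply(hits, function(h) sum(h > 0), integer(1))
}

text1 <- "This is a list:\n1) Number 1\n2) Etc\n3) Etc"
text2 <- "This is NOT a list:\nBlah, blah, blah\nBlah, blah, blah"
text3 <- c(text1, text2)

# One pattern, applied to every string: digits, a literal ")", then a space
counts <- count_matches(text3, "[0-9]+\\) ")
print(counts)

# A bare "*" has nothing to repeat, so the regex engine refuses to compile it
star_is_invalid <- inherits(
  suppressWarnings(
    tryCatch(grepl("*", "a - b * c", perl = TRUE), error = function(e) e)
  ),
  "error"
)
print(star_is_invalid)

# Escaped, the asterisk is an ordinary literal character
star_count <- count_matches("a - b * c", "\\*")
print(star_count)
```

If literal bullets such as "-" or "*" should be counted too, they can be added to the single pattern as escaped alternatives instead of being passed as separate vector elements.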

BTW, I do not understand why one would use the stringr package when it is
just a wrapper around the stringi package.

Regards,
Denes





--
Dr. Tóth Dénes, Managing Director
Kogentum Kft.
Tel.: 06-30-2583723
Web: www.kogentum.hu


Re: Searching for Enumerated Items using str_count() from the stringr package

Tóth Dénes


On 09/29/2017 12:02 AM, Tóth Dénes wrote:

> You should carefully read the docs, see ?regexp.
> You really do not want to pass a multi-element vector as 'keywords' in
> this case, but instead:
>
> stri_count_regex(text3, "[0-9]+\\) ")
>
> or:
>
> stri_count_regex(text3, "[[:digit:]]+\\) ")
>

Ah, now I see what you were after: the enumerations are not in a standard
format, so "1) " may also appear as "1)", "1." or "1 .".

In this case:
text <- "1)Hello\n2.Hi\n3 .Cheers"
keywords <- "[0-9]+(\\)| *?\\.)"
stri_count_regex(text, keywords)

Note the '|' sign in the pattern. It means OR in this context. Read
literally, the regular expression above says: one or more digits, followed
either by a closing parenthesis, or by any number of spaces (including none)
and then a dot.
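
To see what each branch of the alternation actually picks up, the matches can be extracted rather than just counted. The sketch below uses base R (regmatches() and gregexpr() with perl = TRUE) in place of stringi so that it runs without extra packages. The second half is an assumption on my part and not something from the original question: if the strings can also contain decimals such as "3.14", anchoring the pattern to the start of a line is one way to avoid counting them.

```r
text <- "1)Hello\n2.Hi\n3 .Cheers"

# Digits, then EITHER ")" OR optional spaces followed by a literal "."
pattern <- "[0-9]+(\\)| *\\.)"

# Extract every match from the single input string
matched <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
print(matched)

# Illustrative caveat: the unanchored pattern also hits the "3." in "3.14"
prose <- "Pi is 3.14\n1) one\n2. two"
loose <- regmatches(prose, gregexpr(pattern, prose, perl = TRUE))[[1]]
print(loose)

# Optional refinement (an assumption, not part of the original answer):
# (?m) makes "^" match at the start of every line, so only markers that
# begin a line (after optional spaces or tabs) are kept
anchored_pattern <- "(?m)^[ \\t]*[0-9]+(\\)| *\\.)"
strict <- regmatches(prose, gregexpr(anchored_pattern, prose, perl = TRUE))[[1]]
print(strict)
```

Whether the anchored form is appropriate depends on whether enumerations in the real data always start a line; if they can appear mid-line, the unanchored pattern is the safer starting point.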

HTH,
Denes
