POS counting number of verbs

POS counting number of verbs

R help mailing list-2
Hi all,
I have 16630 messages in my data frame and I would like to count the number of verbs in each message. To do so, I have the following code:

> str(tar)
'data.frame': 16630 obs. of  2 variables:
$ Message            : Factor w/ 13412 levels "","'alter database  datafile' needs to be executed",..: 11163 1 9715 10110 9683 11364 12952 2242 7153 6907 ...
$ group                   : Factor w/ 16630 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...

> tagPOS <- function(x, ...) {
+     s <- as.String(x)
+     word_token_annotator <- Maxent_Word_Token_Annotator()
+     a2 <- Annotation(1L, "sentence", 1L, nchar(s))
+     a2 <- annotate(s, word_token_annotator, a2)
+     a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
+     a3w <- a3[a3$type == "word"]
+     POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
+     POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
+     list(POStagged = POStagged, POStags = POStags)
+ }
> count_verbs <- function(x) {
+     pos_tags <- tagPOS(x)$POStags
+     sum(grepl("VB", pos_tags))
+ }
> library(dplyr)
> tar %>%
+     group_by(group) %>%
+     summarise(num_verbs = count_verbs(Message))
And here is the error I get:

Error in summarise_impl(.data, dots) :
  Evaluation error: no word token annotations found.

Does anyone know about this error? Thanks for any help.
Elahe

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: POS counting number of verbs

R help mailing list-2
Hi Elahe,

First, please post in plain text and make sure the code in your message
is properly formatted.

Second, please state which libraries you are using.

Third, your problem appears to be with empty messages: an empty string
yields no word tokens, which triggers the error. This can be resolved
as follows.

 >>>>>>>>>>>
tar <- data.frame(
   Message=c("", "'alter database  datafile' needs to be executed"),
   group=c("1","2"),
   stringsAsFactors=TRUE)
str(tar)

library(openNLP)
library(NLP)

tagPOS <- function(x, ...) {
   s <- as.String(x)
   # Empty message: nothing to tokenize, so return early with no tags
   if(s=="") return(list())
   word_token_annotator <- Maxent_Word_Token_Annotator()
   # Treat the whole message as a single sentence
   a2 <- Annotation(1L, "sentence", 1L, nchar(s))
   a2 <- annotate(s, word_token_annotator, a2)
   a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
   # Keep only the word annotations and pull out their POS tags
   a3w <- a3[a3$type == "word"]
   POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
   POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
   list(POStagged = POStagged, POStags = POStags)
}

count_verbs <-function(x) {
   pos_tags <- tagPOS(x)$POStags
   sum(grepl("VB", pos_tags))
}

library(dplyr)

tar %>% group_by(group) %>% summarise(num_verbs = count_verbs(Message))
<<<<<<<<<<<<<<<<
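
To see why the early return gives a count of 0 rather than another
error, the counting step can be tried on its own in base R. This is only
a sketch: the tag vector is hand-written rather than produced by
openNLP, and count_verb_tags() and is_blank() are hypothetical helper
names that are not part of the code above. It also assumes that
whitespace-only messages should be treated like empty ones, which the
s=="" check alone does not cover.

```r
# Illustration only: the tag vectors below are hand-written stand-ins for
# what tagPOS() returns; no openNLP call is made here.

# Same counting idea as count_verbs(), applied to a ready-made tag vector.
# (Hypothetical helper name.)
count_verb_tags <- function(pos_tags) {
  # Penn Treebank verb tags all start with "VB" (VB, VBD, VBG, VBN, VBP, VBZ)
  sum(grepl("^VB", pos_tags))
}

# Tags for a non-empty message (made-up example values)
tags <- c("NN", "VBZ", "TO", "VB", "VBN")
n_verbs <- count_verb_tags(tags)        # three tags start with "VB"

# For an empty message tagPOS() returns list(), so $POStags is NULL,
# grepl() returns logical(0) and sum() of that is 0.
empty_tags <- list()$POStags
n_empty <- count_verb_tags(empty_tags)

# Hypothetical helper: flag messages that are empty or whitespace-only.
# as.character() is needed because Message is a factor.
is_blank <- function(x) !nzchar(trimws(as.character(x)))
msgs <- factor(c("", "'alter database  datafile' needs to be executed", "   "))
blank <- is_blank(msgs)

print(n_verbs)
print(n_empty)
print(blank)
```

Something like sum(is_blank(tar$Message)) would then show how many of
the 16630 rows are affected before running the tagger.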

Rgds,
Robert

On 05/11/18 12:38, Elahe chalabi via R-help wrote:

> Hi all,
> I have 16630 Messages in my data frame and I would like to count
> number of verbs in each message, to do so I have the following code:
>
> > str(tar)
> 'data.frame': 16630 obs. of  2 variables:
> $ Message : Factor w/ 13412 levels "","'alter database  datafile' needs to be executed",..: 11163 1 9715 10110 9683 11364 12952 2242 7153 6907 ...
> $ group   : Factor w/ 16630 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
>
> > tagPOS <- function(x, ...) {
> +     s <- as.String(x)
> +     word_token_annotator <- Maxent_Word_Token_Annotator()
> +     a2 <- Annotation(1L, "sentence", 1L, nchar(s))
> +     a2 <- annotate(s, word_token_annotator, a2)
> +     a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
> +     a3w <- a3[a3$type == "word"]
> +     POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
> +     POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
> +     list(POStagged = POStagged, POStags = POStags)
> + }
> > count_verbs <- function(x) {
> +     pos_tags <- tagPOS(x)$POStags
> +     sum(grepl("VB", pos_tags))
> + }
> > library(dplyr)
> > tar %>%
> +     group_by(group) %>%
> +     summarise(num_verbs = count_verbs(Message))
>
> And here is the error I get:
> Error in summarise_impl(.data, dots) :
>   Evaluation error: no word token annotations found.
>
> Does anyone know about this error? Thanks for any help.
> Elahe
