Sentence Splitting using R's openNLP library is not efficient


anshuji

Could anyone suggest an efficient way (or code) to split text into sentences in R?

I'm currently using the openNLP library for this, and it takes several hours to process 8,000+ records of Twitter posts/comments.

 

Below is my R code:

 

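# Give the JVM 4 GB of heap; this must be set before the Java-backed packages load
options(java.parameters = "-Xmx4g")

library("NLP")
library("openNLPdata")
library("openNLP")

# Create the sentence annotator once and reuse it for every record
sentence_token_annotator <- Maxent_Sent_Token_Annotator()

# Split a single text into its sentences
convert_text_to_sentences <- function(text) {
  text <- as.String(text)
  sentence.boundaries <- annotate(text, sentence_token_annotator)
  sentences <- text[sentence.boundaries]
  return(sentences)
}

# Time the split over all records
system.time(textofcomment_list <- lapply(data_all$TEXT, convert_text_to_sentences))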
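
For comparison, a rough baseline that avoids Java entirely might be a plain base-R regex split. This is only a sketch and not the openNLP Maxent model: `split_sentences_regex` is a hypothetical helper name, the sample texts are made up, and the rule (split on whitespace that follows `.`, `!` or `?`) will mis-split abbreviations such as "Dr." and will not split text that lacks terminal punctuation.

```r
# Hypothetical helper (not part of openNLP or NLP): naive regex sentence splitter.
# Splits on runs of whitespace that directly follow '.', '!' or '?'.
split_sentences_regex <- function(text) {
  text <- trimws(as.character(text))
  # Missing or empty input yields no sentences
  if (is.na(text) || !nzchar(text)) return(character(0))
  strsplit(text, "(?<=[.!?])\\s+", perl = TRUE)[[1]]
}

# Made-up sample records standing in for data_all$TEXT
sample_text <- c(
  "Great phone! Battery lasts two days. Would buy again?",
  "No punctuation here"
)

sentences_list <- lapply(sample_text, split_sentences_regex)
print(sentences_list)
```

Wrapping the `lapply` call in `system.time()` on the same records would show how much of the runtime is attributable to the openNLP annotator itself, though the sentence boundaries from the two approaches will not always agree.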

 

Thanks in advance

 
