Help with cleaning a corpus




I created a corpus and started cleaning it with this piece of code:

txt <- tm_map(txt, removeWords, stopwords("spanish"))
txt <- tm_map(txt, stripWhitespace)
txt <- tm_map(txt, tolower)
txt <- tm_map(txt, removeNumbers)
txt <- tm_map(txt, removePunctuation)

But something happened: some of the documents in the corpus became empty, which is a problem when I try to build a document-term matrix with tf-idf weighting.
Is there any way to automatically remove a document if it becomes empty?

Or, to do it manually, how could I get the length of every document?
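
For illustration, here is a rough sketch of one way both things might be done with the tm package. It is not a definitive answer. The toy corpus, the helper `doc_chars`, and the names `doc_lengths` and `txt_nonempty` are all made up for this example. It also assumes a recent tm version, where `tolower` has to be wrapped in `content_transformer()`, and it lowercases first so that capitalized stopwords are caught as well.

```r
library(tm)

# Toy corpus standing in for `txt` (an assumption for this sketch).
# The second document holds only stopwords, digits and punctuation,
# so nothing but whitespace is left after cleaning.
txt <- VCorpus(VectorSource(c("El gato come pescado fresco",
                              "y 123 de la !!!")))

txt <- tm_map(txt, content_transformer(tolower))
txt <- tm_map(txt, removeWords, stopwords("spanish"))
txt <- tm_map(txt, removeNumbers)
txt <- tm_map(txt, removePunctuation)
txt <- tm_map(txt, stripWhitespace)

# Hypothetical helper: number of non-whitespace characters in a document.
doc_chars <- function(doc) {
  sum(nchar(gsub("\\s+", "", content(doc))))
}

# Length of every document in the corpus.
doc_lengths <- sapply(txt, doc_chars)

# Keep only the documents that still have some content.
txt_nonempty <- tm_filter(txt, FUN = function(doc) doc_chars(doc) > 0)
```

If this works as intended, `doc_lengths` can be inspected to see which documents became empty, and `txt_nonempty` is the corpus that would be passed on to `DocumentTermMatrix()`.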

Hope you can help me! Thanks a lot.