text mining

rgui
Hi,

I have a problem when indexing the corpus. I used the following syntax:

> Setwd ("c :/....")
> Library (tm)
> Txt = Corpus (DirSource ("."); readerControl = list (language = "frensh"))

I get the following error message:

Warning messages:
1: In readLines(y, encoding = x$Encoding) :
  incomplete final line found on './n3.txt'
2: In readLines(y, encoding = x$Encoding) :
  incomplete final line found on './n32.

Another question:
how can I read other document types (.pdf, .html, ...) using the tm package?

Thanks very much for your help.

Re: text mining

Duncan Murdoch-2
On 30/05/2011 6:17 AM, rgui wrote:
> Hi,
>
> I have a problem when indexing the corpus. I used the following syntax:
>
> >  Setwd ("c :/....")
> >  Library (tm)
> >  Txt = Corpus (DirSource ("."); readerControl = list (language = "frensh"))
>
Capitalization is important in R, so when asking a question, please cut
and paste what you actually did.  In this case, it doesn't matter.
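A corrected version of the call in the question, as a minimal sketch: R function names are lower case (`setwd`, `library`), arguments are separated by a comma rather than a semicolon, and the language is given as a code such as `"fr"`. It assumes a reasonably recent tm package is installed; the temporary directory and the two file names are hypothetical stand-ins for the elided `c:/....` path, which is passed to `DirSource()` directly instead of going through `setwd()`.

```r
library(tm)

# Hypothetical stand-in for the elided "c:/...." directory: a temporary
# folder holding two small text files.
corpus_dir <- file.path(tempdir(), "corpus_fr")
dir.create(corpus_dir, showWarnings = FALSE)
writeLines("Bonjour tout le monde.", file.path(corpus_dir, "n1.txt"))
writeLines("Un deuxieme document.", file.path(corpus_dir, "n2.txt"))

# Point DirSource() at the directory instead of calling setwd();
# note the comma (not a semicolon) between the two arguments.
txt <- Corpus(DirSource(corpus_dir, encoding = "UTF-8"),
              readerControl = list(language = "fr"))
length(txt)  # one document per file
```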

> I get the following error message:
>
> Warning messages:
> 1: In readLines(y, encoding = x$Encoding) :
>    incomplete final line found on './n3.txt'
> 2: In readLines(y, encoding = x$Encoding) :
>    incomplete final line found on './n32.

Those are warnings, not errors.   readLines gives those warnings when
the last line of the file stops abruptly, rather than having an end of
line marker.  On Unix systems this usually signals a problem with the
file.  Windows is more tolerant, so many editors don't bother to add the
final marker.
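The warning is easy to reproduce in base R. The sketch below writes a file without a trailing newline, shows that `readLines(..., warn = FALSE)` reads it correctly while silencing the warning, and rewrites it with `writeLines()`, which terminates every line. The file names and the `add_final_newlines()` helper are hypothetical; rewriting files in place assumes they are in the session's native encoding, so check that (and keep a copy) before running it on real data.

```r
# Hypothetical file whose last line has no end-of-line marker
# (cat() writes exactly what it is given, with no trailing newline).
path <- tempfile(fileext = ".txt")
cat("first line\nlast line without newline", file = path)

# 1. Reproduce the warning. The message text depends on the locale
#    (in English: "incomplete final line found on ...").
w <- tryCatch({ readLines(path); NULL }, warning = function(w) w)
print(conditionMessage(w))

# 2. Silence it: the content is read correctly either way.
lines <- readLines(path, warn = FALSE)
print(lines)

# 3. Repair the file: writeLines() ends every line, including the last,
#    with a newline, so later reads no longer warn.
writeLines(lines, path)
fixed <- readLines(path)

# Hypothetical helper applying the same repair to every .txt file in a
# directory (assumes each file fits in memory).
add_final_newlines <- function(dir) {
  for (f in list.files(dir, pattern = "\\.txt$", full.names = TRUE)) {
    writeLines(readLines(f, warn = FALSE), f)
  }
  invisible(NULL)
}
```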

> Another question:
> how can I read other document types (.pdf, .html, ...) using the
> tm package?

I think you need to convert them to text first (by some tool outside of
R), but I might be wrong.
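A rough sketch of the convert-first approach for HTML, using base R only: strip the tags with a regular expression, decode a few common entities, and save the result as a .txt file that `DirSource()` can pick up. This is a swapped-in technique, not something tm provides; regex stripping ignores scripts, comments and malformed markup, so a proper HTML parser is more robust. `html_to_text()` and the file names are hypothetical. For PDF, an external converter such as `pdftotext` is the usual route, and current versions of tm also provide a `readPDF()` reader that relies on such an engine; neither is shown here.

```r
# Hypothetical HTML page standing in for one of the documents.
html_file <- tempfile(fileext = ".html")
writeLines(c("<html><body>",
             "<h1>Titre</h1>",
             "<p>Un paragraphe &amp; du texte.</p>",
             "</body></html>"), html_file)

# Crude HTML-to-text conversion (hypothetical helper, not a real parser).
html_to_text <- function(path) {
  raw <- paste(readLines(path, warn = FALSE), collapse = "\n")
  txt <- gsub("<[^>]*>", " ", raw)              # replace tags with spaces
  txt <- gsub("&nbsp;", " ", txt, fixed = TRUE)
  txt <- gsub("&lt;", "<", txt, fixed = TRUE)
  txt <- gsub("&gt;", ">", txt, fixed = TRUE)
  txt <- gsub("&amp;", "&", txt, fixed = TRUE)  # decode &amp; last to avoid double-decoding
  txt <- gsub("[[:space:]]+", " ", txt)         # collapse runs of whitespace
  trimws(txt)
}

# Write the plain-text version next to the original, ready for DirSource().
txt_file <- sub("\\.html$", ".txt", html_file)
writeLines(html_to_text(html_file), txt_file)
readLines(txt_file)
```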

Duncan Murdoch

> Thanks very much for your help.
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/text-mining-tp3560367p3560367.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
