Mining non-English text

saikiransunny
I am new to R programming and am trying to mine this PDF file:
http://164.100.180.82/Rollpdf/AC276/S24A276P001.pdf. The file is in a
non-English language, and I can't figure out how to proceed. I'm also
not sure how to extract information from a PDF file in the first place,
so please help!

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Mining non-English text

Loris Bennett-2
saikiran putta <[hidden email]> writes:

> I am new to R programming and trying to mine this pdf file
> http://164.100.180.82/Rollpdf/AC276/S24A276P001.pdf. This pdf file is in
> non-English language and I'm not able to figure out how to proceed. And,
> I'm not even sure how to extract information from a PDF file, so please
> help!

This has nothing to do with R as such, but the command-line program
pdftotext might help you get going. It is available for Linux and,
apparently, for Windows, and it can deal with various encodings.
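
If you want to drive it from R, something along these lines might work.
It is only a rough sketch: it assumes pdftotext (from Poppler or Xpdf) is
on your PATH and that the PDF has already been saved locally under the
name used below. The helper names pdftotext_args and pdf_to_lines are
made up for this example. The -enc and -layout options are standard
pdftotext flags, but whether the output is usable depends on how the PDF
was produced; a scanned PDF with no text layer would need OCR instead.

```r
# Build the argument vector for pdftotext.
# -enc sets the output text encoding; -layout tries to preserve the
# physical layout of the page (useful for tabular documents).
pdftotext_args <- function(pdf, txt, enc = "UTF-8", layout = TRUE) {
  c("-enc", enc, if (layout) "-layout", pdf, txt)
}

# Hypothetical helper: convert a PDF to text and return its lines.
pdf_to_lines <- function(pdf,
                         txt = paste0(tools::file_path_sans_ext(pdf), ".txt")) {
  if (!nzchar(Sys.which("pdftotext"))) {
    stop("pdftotext not found on the PATH")
  }
  # shQuote protects file names containing spaces.
  status <- system2("pdftotext", args = shQuote(pdftotext_args(pdf, txt)))
  if (status != 0) {
    stop("pdftotext failed with exit status ", status)
  }
  readLines(txt, encoding = "UTF-8", warn = FALSE)
}

# Assumption: the file from the original post was downloaded beforehand.
pdf <- "S24A276P001.pdf"
if (file.exists(pdf) && nzchar(Sys.which("pdftotext"))) {
  lines <- pdf_to_lines(pdf)
  print(head(lines))
}
```

Once the text is in a character vector, the usual string tools in R
(grepl, regmatches, strsplit and so on) can be applied to it.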

Cheers,

Loris

--
This signature is currently under construction.