It seems that R can’t find the pdfinfo and pdftotext executables, but I’m not
sure why this would be the case, given that the Xpdf files were copied into
‘C:\Program Files’ (I’m using 64-bit Windows 7).
I’m aware that the ‘pdf_text’ function from the pdftools package can extract
text from a PDF file and return it as a string. But I was after something that
can convert a PDF (i.e. transaction data) into a data frame without regular
expressions. Is the tm package capable of doing this conversion? Are there any
other alternatives to these methods?
Your expertise in resolving this problem would be highly appreciated.
This is neither the Xpdf support forum nor the Windows Setup Program Reinvention support group... and you really need to read and follow the Posting Guide for the R mailing lists.
FWIW I would guess that you need to learn about environment variables and in particular about the PATH variable. There are subtleties about when and how they get defined that are OS-specific and certainly off topic here that may trip you up along the way. Alternatively, you may read the Xpdf documentation or a how-to blog about Xpdf that gives you a recipe, but again that is not about R. Once you can start a CMD shell and run the command directly then you are most of the way to getting R to invoke it.
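To make the PATH point concrete, here is a minimal sketch of how one might check from inside R whether the tools are visible. It assumes the Xpdf binaries live in `C:/Program Files/Xpdf/bin64`, as described in the original post; `xpdf_dir` and `add_to_path` are hypothetical names used only for illustration, so adjust them to your own install. Note that PATH entries are directories, so an entry that ends in `pdftotext.exe` will not be searched the way you expect.

```r
# Assumed install location, taken from the original post; adjust as needed.
xpdf_dir <- "C:/Program Files/Xpdf/bin64"

# Hypothetical helper: append a directory (not an .exe file) to PATH
# for the current R session only.
add_to_path <- function(dir) {
  sep <- .Platform$path.sep  # ";" on Windows, ":" elsewhere
  dir <- normalizePath(dir, winslash = "\\", mustWork = FALSE)
  entries <- strsplit(Sys.getenv("PATH"), sep, fixed = TRUE)[[1]]
  if (!dir %in% entries) {
    Sys.setenv(PATH = paste(c(entries, dir), collapse = sep))
  }
  invisible(dir)
}

add_to_path(xpdf_dir)

# Sys.which() returns "" for any program it cannot find on the PATH.
found <- Sys.which(c("pdfinfo", "pdftotext"))
print(found)

if (all(nzchar(found))) {
  # -v asks the Xpdf tools to print their version information.
  system2("pdftotext", "-v")
}
```

If `Sys.which()` still returns empty strings, the directory is probably wrong or the binaries are not there. A change made with `Sys.setenv()` lasts only for the current R session, so a permanent fix still has to be made in the Windows environment settings.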
>Hi R users,
>I’m having some issues trying to extract text from a PDF file using tm
>Here are the steps that were carried out:
>1. Downloaded and installed the following programs:
>- Xpdf (Copied the ‘bin32’, ‘bin64’, ‘doc’ folders into ‘C:\Program
>Files\Xpdf’ directory; also added C:\Program
>C:\Program Files\Xpdf\bin64\pdftotext.exe in existing PATH
>2. Used the following scripts and the corresponding error messages:
># Directory where PDF files are stored
>>cname <- getwd()
>>Corpus(DirSource(cname), readerControl=list(reader = readPDF))
>Error in system2("pdftotext", c(control$text, shQuote(x), "-"), stdout
>'"pdftotext"' not found
> In addition: Warning message:
>running command '"pdfinfo" "C:\Users\R_Files\XXX.pdf"' had status 127
> FALSE FALSE
>It seems that R can’t find the pdfinfo and pdftotext executables, but I’m not
>sure why this would be the case, given that the Xpdf files were copied into
>‘C:\Program Files’ (I’m using 64-bit Windows 7).
>I’m aware that the ‘pdf_text’ function from the pdftools package can extract
>text from a PDF file and return it as a string. But I was after something that
>can convert a PDF (i.e. transaction data) into a data frame without regular
>expressions. Is the tm package capable of doing this conversion? Are there any
>other alternatives to these methods?
>Your expertise in resolving this problem would be highly appreciated.
>[hidden email] mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.