Help for pdf conversion

Thomas Subia-2
Colleagues,

I'm trying to convert a pdf to a text file with the following code.

# pdf to excel
library(pdftools) # pdf to excel library
# set working directory
setwd("C:/Users")
# input pdf
txt <- pdf_text("C:/Users/10619.pdf")
cat(txt[1])
write.table(cat(txt[1]),file="10619.txt",sep= "\t",row.names =TRUE,col.names =FALSE)

When I examine the contents of cat(txt[1]) on the console, everything I need is displayed in the format I need.

However, when I execute write.table(cat(txt[1]),file="10619.txt",sep= "\t",row.names =TRUE,col.names =FALSE) and examine the output, the file does not match what cat(txt[1]) displays.
I suspect that the arguments sep= "\t",row.names =TRUE,col.names =FALSE might be the cause of the error.

How can one output the contents of cat(txt[1]) and retain its format?

Thomas Subia




______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Help for pdf conversion

Jim Lemon-4
Hi Thomas,
Perhaps you should be doing something like writeLines(txt[1],...) or just:

sink("10619.txt")
cat(txt[1])
sink()
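
For reference, a minimal sketch of why the original call misbehaves and what the writeLines() route could look like. cat() prints to the console and returns NULL invisibly, so write.table() is handed NULL rather than the page text. The string below is a made-up stand-in for txt[1], and the temporary file paths are assumptions for illustration only:

```r
# Made-up stand-in for txt[1]; pdf_text() returns one string per page
page <- "Part No   Qty\nA-100        3\nB-200       12"

# cat() prints its arguments and returns NULL invisibly,
# so write.table(cat(page), ...) never receives the text itself
res <- cat(page, "\n")
is.null(res)

# writeLines() writes the string as-is, embedded newlines included
out_file <- tempfile(fileext = ".txt")   # hypothetical output path
writeLines(page, out_file)

# cat() can also write straight to a file, without sink()
out_file2 <- tempfile(fileext = ".txt")  # hypothetical output path
cat(page, "\n", sep = "", file = out_file2)

# Read both files back to check that the layout survived
back  <- readLines(out_file)
back2 <- readLines(out_file2)
```

With a real PDF, txt[1] would take the place of page and "10619.txt" the place of the temporary path.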

Jim


Re: Help for pdf conversion

Thomas Subia-2
Jim,

That works well!
Thanks again for your help!

Thomas Subia
