Hi everybody,
Does anyone know how to convert the content of a PDF to text?
Normally, in the method "function handleDocumentInfo($DocInfo)", I get the content of the website with the instruction: $content = $DocInfo->content. But if the content comes from a PDF file, I have to convert it from PDF to text first.
Best regards.
Jorge von Rudno
Hi Jorge,
just do a Google search; there are several options.
If you are on a Linux machine, you can use Ghostscript, for example (on Windows too, I think).
Some distributions even ship with a tool for this, for instance: http://manpages.ubuntu.com/manpages/precise/man1/pdf2txt.1.html
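As a rough sketch of the two options above (not a tested recipe for your setup): the function below tries pdf2txt first and falls back to Ghostscript's txtwrite device. The function name pdf_to_text and the exit codes are made up for illustration, and it assumes at least one of the two tools is installed and on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: convert a PDF file to plain text.
# Usage: pdf_to_text INPUT.pdf OUTPUT.txt
pdf_to_text() {
    p2t_in="$1"
    p2t_out="$2"

    # Fail early if the input file does not exist.
    if [ ! -f "$p2t_in" ]; then
        echo "pdf_to_text: no such file: $p2t_in" >&2
        return 1
    fi

    if command -v pdf2txt >/dev/null 2>&1; then
        # pdf2txt (from pdfminer) writes the extracted text to the -o file.
        pdf2txt -o "$p2t_out" "$p2t_in"
    elif command -v gs >/dev/null 2>&1; then
        # Ghostscript's txtwrite device extracts the text of each page.
        gs -q -dNOPAUSE -dBATCH -sDEVICE=txtwrite \
           -sOutputFile="$p2t_out" "$p2t_in"
    else
        # Neither converter is available on this machine.
        echo "pdf_to_text: neither pdf2txt nor gs found" >&2
        return 2
    fi
}
```

You would call it like "pdf_to_text document.pdf document.txt" (both file names are placeholders). From the crawler you would first have to save $DocInfo->content to a temporary file and then run the command on it, for example with PHP's shell_exec().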
Last edit: Anonymous 2014-01-08
Hi,
Thanks a lot for your answer. At the moment I am trying a tool called "class.pdf2text.pdf". Anyway, I will check your suggestion.
Best regards.
Jorge von Rudno