Sorry if this subject is more MS-DOS oriented than anything else.
I am trying to extract the text out of some pdf files on a Windows PC.
Using the command prompt (the MS-DOS console), I have managed to "pipe" (à la the UNIX | operator) the output of ExtractText into a text file with this command:
java -classpath H:\TMP\PDF\Multivalent20040415.jar tool.doc.ExtractText -verbose SAMPLE.pdf | TEXTO.txt
The extracted text shows up in a notepad file, and I can then copy and paste it into a text editor to remove the funky return characters. Fine. Thanks. Love it.
I wonder if anyone knows how to automate it a little more so that it works on a folder full of pdfs. The above command only "pipes" the first pdf file into the text file, seemingly ignoring the rest.
Also, are there any other file formats available for output? I have tried .rtf, .doc, and .htm to no avail.
MS-DOS does have its own scripting tools, which you use to build batch files (.BAT). They are not particularly extensive or elegant compared to Unix shell scripts, but you should be able to do what you want here.
The following URL has some brief guidelines on how to use the handful of DOS scripting commands. http://www.cs.ntu.edu.au/homepages/bea/home/subjects/ith305/description.html
The FOR command will probably do what you want it to.
Instead of sending the output to a pipe (|), redirect it to a file with >.
(I'm actually surprised the pipe worked)
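Putting the two suggestions together, a batch file along these lines might do it. This is only a rough sketch, not tested against Multivalent: it assumes a Windows NT-family command prompt (cmd.exe on NT/2000/XP, since the %%~nF modifier is not available in the older COMMAND.COM), that the batch file is run from the folder holding the pdfs, and that ExtractText writes the text to standard output. The file name extract_all.bat is made up for illustration; the jar path and options are copied from the original command.

```bat
@echo off
rem extract_all.bat (hypothetical name): run from the folder that contains the pdfs.
rem Loops over every .pdf in the current folder and redirects the output of
rem ExtractText into a .txt file with the same base name.
rem %%F is the current file name; %%~nF is that name without its extension.
for %%F in (*.pdf) do (
    java -classpath H:\TMP\PDF\Multivalent20040415.jar tool.doc.ExtractText -verbose "%%F" > "%%~nF.txt"
)
```

SAMPLE.pdf would then end up as SAMPLE.txt in the same folder. Note that > only captures standard output, so any messages the tool prints to standard error would still appear in the console.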