Currently using pdftoxml.win32.1.2.7.exe to extract tabular content from PDFs containing thousands of pages. The product works very well but I have one request.
The output files are given a fixed name (pageNum-???.xml), which is fine, but I send the output to another program (cross-platform) which must process the files in order. Because the page numbers are not zero-padded, the files do not sort consistently by name (pageNum-100.xml is processed before pageNum-8.xml).
A command line switch or switches to designate the file naming convention would be cool, something on the order of pfx=pageNum, numPic=0000 such that the resulting filenames would be
pageNum-0100.xml and pageNum-0009.xml, ensuring that they are sorted correctly.
I currently use a Java program to modify the filenames, but it would be nice if I didn't have to.
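For anyone with the same problem in the meantime, the renaming step can be sketched in Java roughly as follows. This is a minimal illustration, not the actual program mentioned above: the class name PadPageNames, the helper paddedName, and the fixed width of 4 digits are all assumptions, and it assumes the output files are named pageNum-N.xml and sit in a single directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PadPageNames {
    // Matches names such as pageNum-8.xml; group 1 is the page number.
    private static final Pattern PAGE_FILE = Pattern.compile("pageNum-(\\d+)\\.xml");

    /** Returns the zero-padded name, or null if the name does not match the pattern. */
    public static String paddedName(String name, int width) {
        Matcher m = PAGE_FILE.matcher(name);
        if (!m.matches()) {
            return null;
        }
        long page = Long.parseLong(m.group(1));
        // Numbers wider than 'width' digits are left at their full length.
        return String.format("pageNum-%0" + width + "d.xml", page);
    }

    public static void main(String[] args) throws IOException {
        // Directory to process: first argument, or the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        int width = 4; // assumed padding width

        // Collect the candidates first so renaming does not disturb the listing.
        List<Path> candidates = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "pageNum-*.xml")) {
            for (Path file : files) {
                candidates.add(file);
            }
        }

        for (Path file : candidates) {
            String current = file.getFileName().toString();
            String target = paddedName(current, width);
            if (target != null && !target.equals(current)) {
                // Files.move throws if the target name already exists.
                Files.move(file, file.resolveSibling(target));
            }
        }
    }
}
```

Run it with the output directory as the only argument, for example `java PadPageNames C:\output`. After that, a plain alphabetical sort of the filenames matches page order, as long as no page number needs more than four digits.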
Thanks
Anonymous