and applied to:
which was downloaded on:
produces what looks like a <TOKEN>...</TOKEN> element for each word.
For example, the attachment shows a portion of the XML output after
running it through xmlindent.
Could pdf2xml be modified so that words on the same line are concatenated
in a single say,
The code here:
does that; hence, it must be possible.
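
To make the request concrete, here is a minimal sketch of the kind of grouping I have in mind. It assumes each token carries its text plus an x and y position; the Token struct, the joinTokensByLine function and the tolerance parameter are all hypothetical names of mine and are not taken from pdf2xml or from the code linked above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token record: one word plus its position on the page.
// These names are illustrative only; they are not pdf2xml's own types.
struct Token {
    std::string text;
    double x;  // horizontal position of the word
    double y;  // vertical position (baseline) of the word
};

// Group tokens whose y values lie within `tolerance` of the first token
// of the group, then join each group's words in increasing x order.
// Returns one string per detected line, in increasing y order.
std::vector<std::string> joinTokensByLine(std::vector<Token> tokens,
                                          double tolerance = 1.0) {
    assert(tolerance >= 0.0);

    // Order by y so that tokens of the same line become adjacent.
    std::stable_sort(tokens.begin(), tokens.end(),
                     [](const Token& a, const Token& b) { return a.y < b.y; });

    std::vector<std::string> lines;
    auto first = tokens.begin();
    while (first != tokens.end()) {
        // Extend the group while tokens stay close to the group's first y.
        auto last = first + 1;
        while (last != tokens.end() && last->y - first->y <= tolerance) {
            ++last;
        }

        // Within one line, order the words by their x position.
        std::sort(first, last,
                  [](const Token& a, const Token& b) { return a.x < b.x; });

        std::string line;
        for (auto it = first; it != last; ++it) {
            if (!line.empty()) line += ' ';
            line += it->text;
        }
        lines.push_back(line);
        first = last;
    }
    return lines;
}
```

Each returned string could then be written out as one element per line instead of one TOKEN element per word. The fixed tolerance is a simplification; real pages with mixed font sizes or rotated text would probably need something smarter.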
Also, f1040.pdf has many PDF form fields which don't appear in the
resulting .xml file produced by pdf2xml. Could pdf2xml be modified to
produce some type of xform fields, something like that shown here:
Thanks for all the work on this.
I'm a pretty good C++ programmer and I'm trying to understand PDF;
hence, maybe I could provide some help with these features.