Herve Dejean - 2013-12-16

Lines roughly correspond to TEXT tags. Concatenating the content of the TOKEN elements inside a TEXT element recreates the line. TOKEN elements are generated because they carry the typographical information for each token.
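
As a rough illustration of rebuilding lines from the output, here is a minimal Python sketch. It assumes the XML uses elements named TEXT and TOKEN as described above; the sample document, its PAGE wrapper, its attribute names, and the helper name `lines_from_xml` are made up for the example and are not taken from real pdf2xml output. Joining tokens with a single space is also an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real pdf2xml output has more attributes and structure.
SAMPLE = """<PAGE>
  <TEXT>
    <TOKEN font-size="10">Hello</TOKEN>
    <TOKEN font-size="10" bold="yes">world</TOKEN>
  </TEXT>
  <TEXT>
    <TOKEN font-size="10">Second</TOKEN>
    <TOKEN font-size="10">line</TOKEN>
  </TEXT>
</PAGE>"""


def lines_from_xml(xml_text, sep=" "):
    """Return one string per TEXT element, built from its TOKEN contents."""
    root = ET.fromstring(xml_text)
    lines = []
    for text_el in root.iter("TEXT"):
        # Collect the text of each TOKEN, skipping empty ones.
        tokens = [(tok.text or "").strip() for tok in text_el.iter("TOKEN")]
        lines.append(sep.join(t for t in tokens if t))
    return lines


if __name__ == "__main__":
    for line in lines_from_xml(SAMPLE):
        print(line)
```

If you need the typographical information, read the attributes on each TOKEN (`tok.attrib`) instead of only its text.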

Re: forms: pdf2xml only extracts the information found in the PDF. Your PDF form is a set of text and graphical elements. The form structure is not given explicitly in the file; it has to be reconstructed from those elements.