Question about PDF

Jack Park
    Jack Park - 2002-03-14

    As far as I can tell, the PDF reader currently just opens a document and displays it. It isn't opened in a way that would let a Save As write it back out as, say, HTML.

    The folks at the Greenstone project have a Perl script that rewrites PDF files into an XML file, and they archive that file.

    One potential feature I see for Multivalent is the ability to transform documents of all kinds into some common XML format.

    Any comments?
    • Tom Phelps

      Tom Phelps - 2002-03-15

      PDF, like all the other document formats, exists at runtime as a document tree that has both geometric information and as much structural information as the media adaptor was able to preserve and/or impute. That tree doesn't bridge the semantic gap between layout-oriented formats such as PDF and structure-based formats such as XML, but it does provide the same information that xpdf gives to the pdftohtml converter used by Greenstone.
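
As a rough illustration of the idea, a tree that carries both structure and geometry can be walked and written out as XML. This is only a sketch under stated assumptions: the `Node` class, its fields, and the element names are hypothetical and are not Multivalent's actual API.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape


@dataclass
class Node:
    """Hypothetical document-tree node; not Multivalent's real class."""
    name: str                  # structural role, e.g. "page", "line", "word"
    bbox: tuple                # geometry as (x, y, width, height)
    text: str = ""             # leaf content; empty for internal nodes
    children: list = field(default_factory=list)


def to_xml(node, indent=0):
    """Serialize a node and its subtree, keeping geometry as attributes."""
    pad = "  " * indent
    x, y, w, h = node.bbox
    attrs = f'x="{x}" y="{y}" w="{w}" h="{h}"'
    if not node.children:
        # Leaf: emit the escaped text between the open and close tags.
        return f"{pad}<{node.name} {attrs}>{escape(node.text)}</{node.name}>"
    inner = "\n".join(to_xml(child, indent + 1) for child in node.children)
    return f"{pad}<{node.name} {attrs}>\n{inner}\n{pad}</{node.name}>"


# A made-up one-line page, purely for illustration.
page = Node("page", (0, 0, 612, 792), children=[
    Node("line", (72, 72, 200, 12), children=[
        Node("word", (72, 72, 40, 12), "Hello"),
        Node("word", (116, 72, 60, 12), "R&D"),
    ]),
])
print(to_xml(page))
```

Output of this kind is still layout-oriented: it records where each word sits on the page, but nothing in it says which lines form a heading or a paragraph. That missing layer is the semantic gap described above.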



