As far as I can tell, the PDF reader currently just opens a document and displays it. The document isn't opened in a way that would let a Save As write it back out as, say, HTML.
The folks at the Greenstone project (http://www.greenstone.org) have a Perl script that converts PDF files to XML, and they archive the resulting XML file.
One potential feature I see for Multivalent would be the ability to transform documents of all kinds into some common XML format.
PDF, like all the other document formats, exists at runtime as a document tree that has both geometric information and as much structural information as the media adaptor was able to preserve and/or impute. The tree doesn't bridge the semantic gap between layout-oriented formats such as PDF and structure-based formats such as XML, but it does provide the same information that xpdf gives to the pdftohtml converter used by Greenstone.
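To make the idea of a document tree carrying both geometry and structure concrete, here is a minimal sketch in Python. It is not Multivalent's actual API: the `Node` class, its fields, the `to_xml` function, and the element and attribute names are all hypothetical, chosen only to show how such a tree could be walked and written out as XML.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape


@dataclass
class Node:
    """Hypothetical tree node: a structural name plus a bounding box."""
    name: str                      # structural role, e.g. "page" or "line"
    bbox: tuple                    # (x, y, width, height) in page units
    text: str = ""                 # only leaf nodes carry text in this sketch
    children: list = field(default_factory=list)


def to_xml(node, indent=0):
    """Serialize a node and its descendants as indented XML."""
    pad = "  " * indent
    x, y, w, h = node.bbox
    attrs = f'x="{x}" y="{y}" w="{w}" h="{h}"'
    if not node.children:
        # Leaf: emit the escaped text between the open and close tags.
        return f"{pad}<{node.name} {attrs}>{escape(node.text)}</{node.name}>"
    # Interior node: recurse into children one indent level deeper.
    lines = [f"{pad}<{node.name} {attrs}>"]
    lines += [to_xml(child, indent + 1) for child in node.children]
    lines.append(f"{pad}</{node.name}>")
    return "\n".join(lines)


# Made-up example tree: one page containing one line of text.
page = Node("page", (0, 0, 612, 792), children=[
    Node("line", (72, 72, 200, 12), text="Hello & welcome"),
])
print(to_xml(page))
```

A writer like this only dumps what the tree already holds. It would preserve layout and whatever structure the adaptor recovered, but it would not invent semantic markup that the source format never had.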