From: David G. <go...@us...> - 2002-09-24 02:33:30
Adam Chodorowski wrote:
> In the project I'm using docutils in, we've decided to convert the
> ReST sources to XML (with the docutils DTD) and then use some XSLT
> tools to transform this into different formats. [#]_
...
> .. [#] The reason for this, as opposed to writing a new docutils
>    writer, is mainly because we have a developer that has a lot
>    of experience with XML/XSLT and no experience with docutils
>    or Python: we believe that this approach will give us results
>    faster, basically.

That's fair and reasonable.  Whatever works!

> However, we discovered that the output from "quicktest.py -r" is not
> valid XML.

That's probably not the best tool to use.  Note the name,
"quicktest.py".  It's meant for quick & dirty output for parser
testing purposes.  There's no Reader or Writer involved, and no
transforms are applied.  See
http://docutils.sf.net/docs/tools.html#quicktest-py .

> What's the best way to fix these issues? Perhaps a dedicated XML
> writer is necessary (I hope not...)?

One already exists: docutils-xml.py.  Try it instead of quicktest.py.

I just tried running it on tools/test.txt and had some trouble.  Fell
victim to a bug in Python 2.2's StringIO.py.  Moral: upgrade to 2.2.1!
With 2.2.1 in place, it worked without problem.

I've added an encoding attribute to the XML declaration
(``<?xml version="1.0" encoding="utf-8"?>``) and a
``<!DOCTYPE document ...`` declaration.  It would be easy enough to
add an XSL stylesheet declaration as well; I'll wait for the need to
arise.

> There are two problems with the output:
>
> 1. The text contents are not encoded / decoded in any way;
>    everything is just output verbatim. This is a problem if the
>    input file is ISO-8859-1 and XML is supposed to be UTF-8. Some
>    --input-encoding and --output-encoding options would be nice.

Patches are welcome, but I suspect that once you try docutils-xml.py
you'll forget all about quicktest.py.

> 2. The "xml:space" attribute is used incorrectly. In the output one
>    can find "xml:space=1"; the only valid values from xml:space are
>    "default" and "preserved".

An oversight, now fixed ("preserve", actually).  Thanks.

The snapshot has been updated:
http://docutils.sf.net/docutils-snapshot.tgz

-- 
David Goodger  <go...@us...>  Open-source projects:
  - Python Docutils: http://docutils.sourceforge.net/
    (includes reStructuredText: http://docutils.sf.net/rst.html)
  - The Go Tools Project: http://gotools.sourceforge.net/
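
For anyone who wants to check their own output for the two problems raised in this message, here is a minimal sketch in modern Python using only the standard library. It is not part of Docutils: the function names `invalid_xml_space` and `latin1_to_utf8` are hypothetical, the sample documents are made up, and the bad attribute is assumed to have been written as ``xml:space="1"`` (quoted), since an unquoted value would not parse at all.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predeclared "xml:" prefix to this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
# The only values the XML specification allows for xml:space.
VALID_XML_SPACE = {"default", "preserve"}


def invalid_xml_space(xml_text):
    """Return (tag, value) pairs for elements with a disallowed xml:space value."""
    root = ET.fromstring(xml_text)
    problems = []
    for element in root.iter():
        value = element.get(XML_SPACE)
        if value is not None and value not in VALID_XML_SPACE:
            problems.append((element.tag, value))
    return problems


def latin1_to_utf8(data):
    """Re-encode ISO-8859-1 bytes as UTF-8 bytes (hypothetical helper)."""
    return data.decode("iso-8859-1").encode("utf-8")


# Made-up sample documents, not real Docutils output.
bad = '<document><literal_block xml:space="1">x</literal_block></document>'
good = '<document><literal_block xml:space="preserve">x</literal_block></document>'

print(invalid_xml_space(bad))
print(invalid_xml_space(good))
print(latin1_to_utf8(b"caf\xe9"))
```

The first function only inspects attribute values; it does not validate the document against the docutils DTD, which would need a validating parser.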