From: Adam C. <ad...@ch...> - 2002-09-23 16:36:14
Hi.

In the project I'm using docutils in, we've decided to convert the ReST
sources to XML (with the docutils DTD) and then use some XSLT tools to
transform this into different formats. [#]_

However, we discovered that the output from "quicktest.py -r" is not
valid XML. There are two problems with the output:

1. The text contents are not encoded / decoded in any way; everything
   is just output verbatim. This is a problem if the input file is
   ISO-8859-1 and XML is supposed to be UTF-8. Some --input-encoding
   and --output-encoding options would be nice.

2. The "xml:space" attribute is used incorrectly. In the output one
   can find "xml:space=1"; the only valid values for xml:space are
   "default" and "preserved".

What's the best way to fix these issues? Perhaps a dedicated XML writer
is necessary (I hope not...)?

.. [#] The reason for this, as opposed to writing a new docutils
   writer, is mainly because we have a developer that has a lot of
   experience with XML/XSLT and no experience with docutils or Python:
   we believe that this approach will give us results faster, basically.

---
Adam Chodorowski <ad...@ch...>

If you shit marbles, you have stomach problems. Eat more fiber, such as
asbestos and crispbread.
-- Brad S / Datormagaizin
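The encoding concern in item 1 can be illustrated with a minimal Python sketch, assuming the source bytes are ISO-8859-1 and the XML output should be UTF-8. The function name ``transcode`` and the sample string are hypothetical, chosen only for illustration; this is not how docutils itself handles encodings.

```python
def transcode(data, input_encoding="iso-8859-1", output_encoding="utf-8"):
    """Decode bytes from input_encoding and re-encode them as output_encoding."""
    text = data.decode(input_encoding)    # bytes -> Unicode text
    return text.encode(output_encoding)   # Unicode text -> bytes


# Hypothetical sample: a word with two non-ASCII characters, stored as ISO-8859-1.
latin1_bytes = "kn\xe4ckebr\xf6d".encode("iso-8859-1")
utf8_bytes = transcode(latin1_bytes)
```

Writing the ISO-8859-1 bytes out verbatim under a UTF-8 declaration would produce invalid byte sequences, which is why an explicit decode/encode step is needed somewhere in the pipeline.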
From: David G. <go...@us...> - 2002-09-24 02:33:30
Adam Chodorowski wrote:
> In the project I'm using docutils in, we've decided to convert the
> ReST sources to XML (with the docutils DTD) and then use some XSLT
> tools to transform this into different formats. [#]_
...
> .. [#] The reason for this, as opposed to writing a new docutils
>    writer, is mainly because we have a developer that has a lot
>    of experience with XML/XSLT and no experience with docutils
>    or Python: we believe that this approach will give us results
>    faster, basically.

That's fair and reasonable. Whatever works!

> However, we discovered that the output from "quicktest.py -r" is not
> valid XML.

That's probably not the best tool to use. Note the name,
"quicktest.py". It's meant for quick & dirty output for parser testing
purposes. There's no Reader or Writer involved, and no transforms are
applied. See http://docutils.sf.net/docs/tools.html#quicktest-py .

> What's the best way to fix these issues? Perhaps a dedicated XML
> writer is necessary (I hope not...)?

One already exists: docutils-xml.py. Try it instead of quicktest.py.

I just tried running it on tools/test.txt and had some trouble. Fell
victim to a bug in Python 2.2's StringIO.py. Moral: upgrade to 2.2.1!
With 2.2.1 in place, it worked without problem.

I've added an encoding attribute to the XML declaration (``<?xml
version="1.0" encoding="utf-8"?>``) and a ``<!DOCTYPE document ...``
declaration. It would be easy enough to add an XSL stylesheet
declaration as well; I'll wait for the need to arise.

> There are two problems with the output:
>
> 1. The text contents are not encoded / decoded in any way;
>    everything is just output verbatim. This is a problem if the
>    input file is ISO-8859-1 and XML is supposed to be UTF-8. Some
>    --input-encoding and --output-encoding options would be nice.

Patches are welcome, but I suspect that once you try docutils-xml.py
you'll forget all about quicktest.py.

> 2. The "xml:space" attribute is used incorrectly. In the output one
>    can find "xml:space=1"; the only valid values for xml:space are
>    "default" and "preserved".

An oversight, now fixed ("preserve", actually). Thanks. The snapshot
has been updated: http://docutils.sf.net/docutils-snapshot.tgz

--
David Goodger <go...@us...>
Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
  (includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
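As a rough way to check for the xml:space problem discussed above, the following Python sketch lists any xml:space values other than "default" and "preserve". The helper name ``invalid_xml_space_values`` and the sample document are hypothetical; the sample quotes the bad value so that it is at least well-formed, which may not match the actual quicktest.py output.

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is predefined and always bound to this namespace.
XML_NS = "http://www.w3.org/XML/1998/namespace"
VALID_XML_SPACE = {"default", "preserve"}


def invalid_xml_space_values(xml_text):
    """Return the xml:space values in xml_text that the XML spec does not allow."""
    key = "{%s}space" % XML_NS
    root = ET.fromstring(xml_text)
    return [el.get(key) for el in root.iter()
            if el.get(key) is not None and el.get(key) not in VALID_XML_SPACE]


# Hypothetical sample: one bad value ("1") and one good value ("preserve").
sample = ('<document>'
          '<literal_block xml:space="1">a  b</literal_block>'
          '<literal_block xml:space="preserve">c  d</literal_block>'
          '</document>')
bad_values = invalid_xml_space_values(sample)
```

An empty result means every xml:space attribute found uses one of the two permitted values.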
From: Adam C. <ad...@ch...> - 2002-09-24 08:49:10
On Mon, 23 Sep 2002 22:37:04 -0400
David Goodger <go...@us...> wrote:

[...]
> > What's the best way to fix these issues? Perhaps a dedicated XML
> > writer is necessary (I hope not...)?
>
> One already exists: docutils-xml.py. Try it instead of quicktest.py.

For some weird reason I managed to miss that one. :-) Sorry about
that...

> I just tried running it on tools/test.txt and had some trouble. Fell
> victim to a bug in Python 2.2's StringIO.py. Moral: upgrade to 2.2.1!
> With 2.2.1 in place, it worked without problem.

Good thing to know.

> I've added an encoding attribute to the XML declaration (``<?xml
> version="1.0" encoding="utf-8"?>``) and a ``<!DOCTYPE document ...``
> declaration.

Thanks.

> It would be easy enough to add an XSL stylesheet
> declaration as well; I'll wait for the need to arise.

AFAIK, we currently don't have any such need. We won't be serving the
XML files dynamically or such, but rather do the transformation at
"build time".

[...]
> Patches are welcome, but I suspect that once you try docutils-xml.py
> you'll forget all about quicktest.py.

Yes, docutils-xml.py works much better. :-)

> > 2. The "xml:space" attribute is used incorrectly. In the output one
> >    can find "xml:space=1"; the only valid values for xml:space are
> >    "default" and "preserved".
>
> An oversight, now fixed ("preserve", actually). Thanks. The snapshot
> has been updated: http://docutils.sf.net/docutils-snapshot.tgz

Great!

---
Adam Chodorowski <ad...@ch...>

BTW, I made the statistics up. I read somewhere that 60% of statistics
are made up on the spot :-)
-- Phill Wooller
From: Adam C. <ad...@ch...> - 2002-09-24 09:47:20
On Mon, 23 Sep 2002 22:37:04 -0400
David Goodger <go...@us...> wrote:

[...]
> I've added an encoding attribute to the XML declaration (``<?xml
> version="1.0" encoding="utf-8"?>``) and a ``<!DOCTYPE document ...``
> declaration. It would be easy enough to add an XSL stylesheet
> declaration as well; I'll wait for the need to arise.

Hmmm, the DOCTYPE declaration is causing some problems for us. Using
Internet Explorer or XMLmind XML Editor, the first error is about
"SYSTEM" in the DOCTYPE::

    A string literal was expected, but no opening quote character was found.

Removing the SYSTEM, so you have::

    <!DOCTYPE document PUBLIC "+//IDN docutils.sourceforge.net//DTD Docutils \
    Generic//EN//XML" "http://docutils.sourceforge.net/spec/docutils.dtd">

This works better, but when it then tries to parse the DTD it also
complains::

    Invalid character in content model. Error processing resource \
    'http://docutils.sourceforge.net/spec/docutils.dtd'. Line 433, Position 56

    <!ELEMENT figure (image, ((caption, legend?) | legend) >

I tried running the generated XML file through the XML validator at
http://www.stg.brown.edu/service/xmlvalid/, and it resulted in a *lot*
of errors and warnings. :-(

---
Adam Chodorowski <ad...@ch...>

There are two major products that come from Berkeley: LSD and UNIX.
We don't believe this to be a coincidence.
-- Jeremy S. Anderson
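The DOCTYPE issue above can be reproduced with Python's bundled expat parser. This is a minimal sketch: the helper ``is_well_formed`` is hypothetical, and the "with SYSTEM" form is an assumption about what the generated declaration looked like, inferred from the error message rather than taken from the actual output.

```python
import xml.parsers.expat

PUBLIC_ID = "+//IDN docutils.sourceforge.net//DTD Docutils Generic//EN//XML"
SYSTEM_ID = "http://docutils.sourceforge.net/spec/docutils.dtd"


def is_well_formed(xml_text):
    """Return True if expat parses xml_text without a well-formedness error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError:
        return False
    return True


# Assumed original form: SGML-style, with a SYSTEM keyword after the public ID.
with_system = ('<!DOCTYPE document PUBLIC "%s" SYSTEM "%s"><document/>'
               % (PUBLIC_ID, SYSTEM_ID))
# XML form: the system literal follows the public ID directly.
without_system = ('<!DOCTYPE document PUBLIC "%s" "%s"><document/>'
                  % (PUBLIC_ID, SYSTEM_ID))

results = (is_well_formed(with_system), is_well_formed(without_system))
```

By default expat does not fetch the external DTD, so this only checks the syntax of the declaration, not the content of docutils.dtd.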
From: David G. <go...@us...> - 2002-09-25 03:05:49
Adam Chodorowski wrote:
> Hmmm, the DOCTYPE declaration is causing some problems for us.
> Using Internet Explorer or XMLmind XML Editor, the first error is
> about "SYSTEM" in the DOCTYPE::

I keep forgetting that the "SYSTEM" is not necessary. There are subtle
differences between SGML and XML.

> This works better, but when it then tries to parse the DTD it also
> complains::
>
>     Invalid character in content model. Error processing resource \
>     'http://docutils.sourceforge.net/spec/docutils.dtd'. Line 433,
>     Position 56
>
>     <!ELEMENT figure (image, ((caption, legend?) | legend) >

A missing close-parenthesis, now fixed. I also fixed some "%number;"
parameter entities that were mistakenly written "&number;". I've been
using the DTD as a convenient notation, but I've never validated it or
actually used the XML produced, so I'm not surprised there were bugs.

> I tried running the generated XML file through the XML validator at
> http://www.stg.brown.edu/service/xmlvalid/, and it resulted in a
> *lot* of errors and warnings. :-(

I wasn't aware of that resource. I'll try it on Docutils output and
see what it says. Of course, patches & fixes are always welcome!

--
David Goodger <go...@us...>
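The missing close-parenthesis can be spotted mechanically. Below is a small Python sketch that counts unclosed parentheses in a DTD element declaration; the function name ``unclosed_parens`` is hypothetical, and the "fixed" declaration is only a guess at the corrected content model, not a quote from the updated docutils.dtd.

```python
def unclosed_parens(declaration):
    """Return how many '(' in declaration are never closed.

    Raises ValueError if a ')' appears with no matching '('.
    """
    depth = 0
    for ch in declaration:
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError("unmatched ')'")
            depth -= 1
    return depth


# Declaration as reported in the error message (line 433 of the DTD).
broken = "<!ELEMENT figure (image, ((caption, legend?) | legend) >"
# Assumed correction: one more ')' to close the outer group.
fixed = "<!ELEMENT figure (image, ((caption, legend?) | legend)) >"

counts = (unclosed_parens(broken), unclosed_parens(fixed))
```

A nonzero count points at a content model that a validating parser will reject, as happened here.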
From: David G. <go...@us...> - 2002-10-03 22:29:08
After the following commands::

    cd docutils/tools
    docutils-xml.py test.txt test.xml

I tried running the result (located at
http://docutils.sf.net/tools/test.xml) through the XML validator at
http://www.stg.brown.edu/service/xmlvalid/. It picked up a content
model bug in the parser, which I fixed.

When I run the XML output through the validator now, it still reports
lots of warnings, but no errors. Examining the warnings carefully (and
referring to the XML spec) reveals that they're all bogus. So I'm
quite confident that Docutils is producing valid XML now.

--
David Goodger <go...@us...>