From: David G. <go...@py...> - 2003-03-02 16:51:02
Morten W. Petersen wrote:
> I'm trying to use the docutils package to format STX text as XHTML, and
> there are two issues.
>
> First, the text contains special characters, like æ, ø and å; this
> isn't rendered correctly, as "å være" turns into "Ã¥ vÃ¦re".

That looks like correct output, using the default output encoding of
UTF-8.  If you want a different output encoding, you need to specify it.

> I've figured that one needs to pass a settings object to the
> publish_string method with input_encoding set,

Docutils is probably guessing the input encoding correctly, using the
built-in heuristics in docutils.io.Input.decode.  Latin-1 is the final
fall-back if nothing else works.

> but I can't figure out how to create a settings object.  How is that done?

If you want to use publish_string, it's easier just to pass a dictionary
containing the settings you're interested in.  For example::

    import docutils.core

    output = docutils.core.publish_string(
        ..., settings_overrides={'input_encoding': 'latin-1',
                                 'output_encoding': 'latin-1'})

> Second, the published XHTML is a complete, valid document, but I only
> want the body of the XHTML returned.  How can I do this?

This has been discussed but never implemented; see the first two items of
<http://docutils.sf.net/spec/notes.html#html-writer>.  Also, the files in
<http://docutils.sf.net/sandbox/oliverr/ht> may be useful.

Without a new writer, this isn't possible with the publish_string
function.  If you have access to the Writer object (it's the .writer
attribute of a Publisher object), you can access the individual parts of
an HTML document.  To get access to the Publisher and Writer objects,
you'd have to use lower-level code, which publish_string shields you from.

--
David Goodger    http://starship.python.net/~goodger
Programmer/sysadmin for hire: http://starship.python.net/~goodger/cv
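To see why "Ã¥ vÃ¦re" is what correct UTF-8 output looks like when it is misread, here is a small stdlib-only Python sketch (no Docutils involved). It encodes the sample text as UTF-8, the default output encoding, and then decodes those bytes as Latin-1, which is presumably what the viewer did. The variable names are illustrative only.

```python
# Sample text containing the non-ASCII letters from the question.
text = "\u00e5 v\u00e6re"  # "å være"

# With UTF-8 output, each of these letters becomes two bytes.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'\xc3\xa5 v\xc3\xa6re'

# Reading those UTF-8 bytes as if they were Latin-1 yields the "garbled" text.
misread = utf8_bytes.decode("latin-1")
print(misread)  # Ã¥ vÃ¦re

# With Latin-1 output instead, each letter stays a single byte.
latin1_bytes = text.encode("latin-1")
print(latin1_bytes)  # b'\xe5 v\xe6re'
```

So the bytes Docutils produced were fine; either view them as UTF-8 or ask for Latin-1 output via the output_encoding setting.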
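If dropping down to the Publisher/Writer level is not an option, one crude workaround is to post-process the string that publish_string returns and keep only what sits between the body tags. The sketch below uses only Python's re module; extract_body is a hypothetical helper, not part of Docutils, and it assumes the output (already decoded to a str) contains exactly one body element.

```python
import re

# Hypothetical helper, not a Docutils API: match <body ...> ... </body>.
# DOTALL lets "." span newlines; the non-greedy group stops at the first </body>.
_BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.DOTALL | re.IGNORECASE)


def extract_body(xhtml: str) -> str:
    """Return the markup inside the body element, without surrounding whitespace."""
    match = _BODY_RE.search(xhtml)
    if match is None:
        raise ValueError("no <body> element found")
    return match.group(1).strip()


if __name__ == "__main__":
    page = ('<html><head><title>t</title></head>\n'
            '<body>\n<div class="document"><p>Hello</p></div>\n</body></html>')
    print(extract_body(page))  # <div class="document"><p>Hello</p></div>
```

Slicing the markup like this is more fragile than reading the individual parts from the writer, but it needs no access to the Publisher object.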