Morten W. Petersen wrote:
> I'm trying to use the docutils package to format STX text as XHTML, and
> there's two issues.
>
> First, the text contains special characters, like æ, ø and å; this
> isn't rendered correctly, as the "å være" turns into "Ã¥ vÃ¦re".
That looks like correct output, using the default output encoding of UTF-8.
If you want a different output encoding, you need to specify it.
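To illustrate why the output looks garbled, here is a small sketch using only the Python standard library. It takes the sample phrase from the question, encodes it as UTF-8, and then misreads those bytes as Latin-1, which is presumably what the program displaying the result is doing. The variable names are made up for the example.

```python
# The phrase from the question, as text.
text = "å være"

# UTF-8 (the default output encoding) uses two bytes for each of å and æ.
utf8_bytes = text.encode("utf-8")

# A viewer that assumes Latin-1 shows each of those bytes as its own character.
misread = utf8_bytes.decode("latin-1")

print(utf8_bytes)  # b'\xc3\xa5 v\xc3\xa6re'
print(misread)     # Ã¥ vÃ¦re
```

The bytes themselves are valid UTF-8; only the interpretation on the reading side is off.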
> I've figured that one needs to pass a settings objects to the
> publish_string method with input_encoding set,
Docutils is probably correctly guessing the input encoding, using its
built-in heuristics in docutils.io.Input.decode. Latin-1 is the final
fall-back if nothing else works.
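As a rough illustration of that kind of fall-back chain, here is a minimal sketch. It is not the actual code in docutils.io.Input.decode, which may try other candidates first; the function name decode_with_fallback and the candidate list are assumptions made for the example.

```python
def decode_with_fallback(data, encodings=("utf-8", "latin-1")):
    """Try each candidate encoding in order; return (text, encoding used)."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            # Not valid in this encoding; move on to the next candidate.
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Latin-1 bytes are not valid UTF-8, so the last candidate is used.
print(decode_with_fallback(b"\xe5 v\xe6re"))
# The same text stored as UTF-8 is accepted by the first candidate.
print(decode_with_fallback(b"\xc3\xa5 v\xc3\xa6re"))
```

Latin-1 works as a last resort because every byte value maps to a character, so decoding with it never fails.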
> but I can't figure out how to create a settings object. How is that done?
If you want to use publish_string, it's easier just to pass a dictionary
containing the settings you're interested in. For example::
    output = docutils.core.publish_string(
        ..., settings_overrides={'input_encoding': 'latin-1',
                                 'output_encoding': 'latin-1'})
> Second, the published XHTML is a valid document, I only want the body
> of the XHTML returned, how can I do this?
This has been discussed but never implemented; see the first two items of
<http://docutils.sf.net/spec/notes.html#html-writer>. Also, the files in
<http://docutils.sf.net/sandbox/oliverr/ht> may be useful.
Without a new writer, this isn't possible with the publish_string function.
If you have access to the Writer object (it's the .writer attribute of a
Publisher object), you can access the individual parts of an HTML document.
To get access to the Publisher & Writer objects, you'd have to use
lower-level code, which publish_string shields you from.
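For readers on a later Docutils release: those releases provide docutils.core.publish_parts, which returns the document parts as a dictionary without any hand-written Publisher code. The sketch below assumes such a release is installed and uses reStructuredText input. The helper name html_body_only is hypothetical, and the import is guarded so the snippet still loads when Docutils is absent.

```python
try:
    from docutils.core import publish_parts
except ImportError:  # Docutils missing, or too old to provide publish_parts
    publish_parts = None


def html_body_only(source):
    """Return only the body markup for a reStructuredText string."""
    if publish_parts is None:
        raise RuntimeError("docutils.core.publish_parts is not available")
    parts = publish_parts(source, writer_name="html")
    # 'body' holds the rendered content without the <html>, <head>
    # or <body> wrappers of the full document.
    return parts["body"]


if publish_parts is not None:
    print(html_body_only("Hello *world*"))
```

The returned dictionary also carries other entries (for example the complete document under 'whole'), so the same call can serve both cases.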
-- David Goodger http://starship.python.net/~goodger
Programmer/sysadmin for hire: http://starship.python.net/~goodger/cv