Re: [Docutils-users] publish_string, inline XHTML and international characters & settings

Morten W. Petersen wrote:
> I'm trying to use the docutils package to format STX text as XHTML, and
> there's two issues.
>
> First, the text contains special characters, like æ, ø and å; this
> isn't rendered correctly, as the "a være" turns into "Ã¥ vÃ¦re".

That looks like correct output, using the default output encoding of UTF-8.
If you want a different output encoding, you need to specify it.
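
A quick way to check this, sketched in plain Python without Docutils. The
sample phrase assumes the original text was "å være", which is only a guess
based on the bytes shown in the report. Encoding it as UTF-8 and then
misreading those bytes as Latin-1 reproduces the reported characters.

```python
# Assumption: the original phrase was "å være" (inferred from the report).
text = "å være"

# Docutils' default output encoding is UTF-8, so the output holds these bytes.
utf8_bytes = text.encode("utf-8")

# A viewer that assumes Latin-1 shows each UTF-8 byte as its own character.
misread = utf8_bytes.decode("latin-1")
print(misread)

# Reversing the misreading recovers the original text, so no data was lost.
recovered = misread.encode("latin-1").decode("utf-8")
print(recovered)
```

If the misread form matches what you see, the output itself is fine and only
the encoding assumed by the viewer is wrong.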

> I've figured that one needs to pass a settings object to the publish_string
> method with input_encoding set,

Docutils is probably correctly guessing the input encoding, using its
built-in heuristics in docutils.io.Input.decode.  Latin-1 is the final
fall-back if nothing else works.
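
The general idea of a fall-back chain can be sketched as follows. This is not
the code in docutils.io.Input.decode, whose actual heuristics differ;
guess_decode and its candidate list are hypothetical names used only for
illustration.

```python
def guess_decode(data, candidates=("utf-8", "latin-1")):
    """Try each candidate encoding in order; return (text, encoding used).

    Hypothetical helper, not part of Docutils. Latin-1 maps every byte
    to a character, so as the last candidate it never fails.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


# Latin-1 bytes are not valid UTF-8 here, so the fall-back is used.
print(guess_decode("å være".encode("latin-1")))

# Valid UTF-8 input is accepted by the first candidate.
print(guess_decode("å være".encode("utf-8")))
```

Passing input_encoding explicitly avoids relying on any such guessing.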

> but I can't figure out how to create a settings object.  How is that done?

If you want to use publish_string, it's easier just to pass a dictionary
containing the settings you're interested in.  For example::

    output = docutils.core.publish_string(
        ..., settings_overrides={'input_encoding': 'latin-1',
                                 'output_encoding': 'latin-1'})

> Second, the published XHTML is a valid document, I only want the body
> of the XHTML returned, how can I do this?

This has been discussed but never implemented; see the first two items of
<http://docutils.sf.net/spec/notes.html#html-writer>.  Also, the files in
<http://docutils.sf.net/sandbox/oliverr/ht> may be useful.

Without a new writer, this isn't possible with the publish_string function.
If you have access to the Writer object (it's the .writer attribute of a
Publisher object), you can access the individual parts of an HTML document.
To get access to the Publisher & Writer objects, you'd have to use
lower-level code, which publish_string shields you from.

-- David Goodger    http://starship.python.net/~goodger

Programmer/sysadmin for hire: http://starship.python.net/~goodger/cv