I thought I replied to this earlier, but I don't see it in the
archives, so this may be a duplicate.
Pierre-Yves Delens wrote:
> My Docfactory setting is on ISO-8859-1, and works fine for [accented
> characters] etc.
>
> BUT I failed to find anything on the web about :
> Out of range 256 Error
> when processing in DocFactory a document containing some special
> characters copied-pasted from Word documents or HTML or RTF messages
> : [oe ligature? 2 chars], special apostrophes, special '-' or
> quotes, carets, [copyright sign], [registered sign]
ISO-8859-1 is a limited encoding, only handling Unicode code points up
to 255. Some of the characters you show above (the oe ligature, the
typographic quotes and dashes) are beyond this range; the copyright
and registered signs are not. I'd guess your input encoding should be
a Windows-specific code page, cp1252 (Windows-1252) or something
similar. This link has info: <http://www.bbsinc.com/iso8859.html>.
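
To see which characters fit where, here is a minimal Python sketch. The
sample characters are my guess at what was pasted from Word, and the
`encodable` helper is just an illustration, not part of Docutils.

```python
# Characters commonly pasted from Word (assumed samples, not taken from
# the actual document): oe ligature, right single quote, en dash,
# copyright sign, registered sign.
samples = ["\u0153", "\u2019", "\u2013", "\u00a9", "\u00ae"]


def encodable(ch, codec):
    """Return True if the character can be encoded with the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Map each character to (fits in ISO-8859-1, fits in cp1252).
report = {
    ch: (encodable(ch, "iso-8859-1"), encodable(ch, "cp1252"))
    for ch in samples
}

for ch, (latin1_ok, cp1252_ok) in report.items():
    print("U+%04X  iso-8859-1=%s  cp1252=%s" % (ord(ch), latin1_ok, cp1252_ok))
```

The ligature, quote and dash fail for ISO-8859-1 but succeed for cp1252,
which is why I suspect a Windows code page.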
> Setting Docfactory on UTF-8 or 16 seems not to help.
For input or output?
> Are such strings supported by DocFactory, by Docutils standalone ?
All of Unicode is supported by Docutils. You just have to get the
encodings right.
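
As a rough illustration of what "getting the encodings right" means,
the sketch below decodes the same bytes twice. The byte string is made
up for the example, assuming the text was saved as cp1252.

```python
# Hypothetical bytes as a Windows editor might save them (cp1252).
raw = b"c\x9cur \x93quoted\x94 \x96 it\x92s"

# Declaring the wrong input encoding does not fail, but bytes 0x80-0x9F
# turn into invisible C1 control characters instead of real text.
wrong = raw.decode("iso-8859-1")

# Declaring the right input encoding gives the intended characters.
right = raw.decode("cp1252")

# Once the text is proper Unicode, any Unicode output encoding works.
utf8_bytes = right.encode("utf-8")

print(repr(wrong))
print(repr(right))
```

In Docutils terms, the first step corresponds to the input encoding
setting and the last to the output encoding setting.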
> What should i do, better than detecting faulty characters prior to
> F7-Processing, which is counterproductive ?
Determine what encoding your input text is really using.
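
If you can't find out from the source application, one crude approach
is to try strict decoding with a few candidates. This is only a sketch:
`guess_encoding` and the candidate list are my own assumptions, and
since ISO-8859-1 accepts any byte sequence it has to come last and
proves nothing by itself.

```python
def guess_encoding(data, candidates=("utf-8", "cp1252", "iso-8859-1")):
    """Return the first candidate codec that decodes `data` without error.

    Order matters: UTF-8 is strict, cp1252 rejects a handful of
    undefined bytes, and ISO-8859-1 accepts everything.
    """
    for codec in candidates:
        try:
            data.decode(codec)
            return codec
        except UnicodeDecodeError:
            continue
    return None


# Bytes 0x9C and 0x92 are not valid UTF-8 on their own, but they are
# the oe ligature and right single quote in cp1252.
print(guess_encoding(b"c\x9cur, it\x92s"))
print(guess_encoding("c\u0153ur".encode("utf-8")))
```

A guess like this should still be checked by eye against the original
text.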
--
David Goodger http://starship.python.net/~goodger
For hire: http://starship.python.net/~goodger/cv
Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)