From: David G. <go...@py...> - 2003-10-19 22:07:28
I thought I replied to this earlier, but I don't see it in the
archives, so this may be a duplicate.

Pierre-Yves Delens wrote:
> My Docfactory setting is on ISO-8859-1, and works fine for [accented
> characters] etc.
>
> BUT I failed to find anything on the web about :
> Out of range 256 Error
> when processing in DocFactory a document containing some special
> characters copied-pasted from Word documents or HTML or RTF messages
> : [oe ligature? 2 chars], special apostrophes, special '-' or
> quotes, carets, [copyright sign], [registered sign]

ISO-8859-1 is a limited encoding, only handling Unicode code points up
to 255. Some of the characters you show above are beyond this range.
I'd guess your input encoding should be a Windows-specific code page,
cp-1252 or something similar. This link has info:
<http://www.bbsinc.com/iso8859.html>.

> Setting Docfactory on UTF-8 or 16 seems not to help.

For input or output?

> Are such strings supported by DocFactory, by Docutils standalone ?

All of Unicode is supported by Docutils. You just have to get the
encodings right.

> What should i do, better than detecting faulty characters prior to
> F7-Processing, which is counterproductive ?

Determine what encoding your input text is really using.

--
David Goodger http://starship.python.net/~goodger
For hire: http://starship.python.net/~goodger/cv
Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
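P.S. To illustrate the point about code points above 255, here is a minimal Python sketch using only the standard codecs. The sample string and the helper name `unencodable` are made up for illustration; they are not part of Docutils or DocFactory, and the sample only approximates the kind of characters typically pasted from Word.

```python
# Hypothetical sample: oe ligature, curly apostrophe, en dash,
# curly double quotes, copyright sign, registered sign.
sample = "\u0153 \u2019 \u2013 \u201c\u201d \u00a9 \u00ae"


def unencodable(text, encoding):
    """Return the characters in text that the given encoding cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


# ISO-8859-1 only covers code points 0-255, so the Word-style
# punctuation and the oe ligature are rejected.
latin1_bad = unencodable(sample, "iso-8859-1")

# The Windows code page cp1252 has slots for all of them.
cp1252_bad = unencodable(sample, "cp1252")

print([hex(ord(ch)) for ch in latin1_bad])
print(cp1252_bad)

# Going the other way: bytes written by a Windows editor decode to the
# intended characters only if they are read as cp1252.
raw = b"\x9c\x92"
print(ascii(raw.decode("cp1252")))      # oe ligature, curly apostrophe
print(ascii(raw.decode("iso-8859-1")))  # C1 control characters instead
```

The copyright and registered signs sit below 256 and pass in both encodings; it is the ligature, the curly quotes, and the dash that fall outside ISO-8859-1.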