#62 Enhance Poco::TextEncoding functionality

  1. Add two virtual methods to Poco::TextEncoding and all its successors:

class TextEncoding {
public:
    virtual const char* canonicalName() const = 0;
    virtual bool isA(const std::string& encodingName) const = 0;
    // ...
};

canonicalName returns the "canonical" (i.e. standard) name of this encoding.
isA returns true if the requested name is one of the names of this encoding (for example, the Cyrillic Windows character set is known as windows-1251, cp1251 and cp-1251).
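
A minimal sketch of how a successor class might implement these two methods. The names TextEncodingSketch, Windows1251Sketch and equalsIgnoreCase are hypothetical and only mirror the proposal; this is not Poco's implementation. The case-insensitive comparison is an assumption, since the proposal does not say how names are compared.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the proposed interface (not the real Poco header).
class TextEncodingSketch {
public:
    virtual ~TextEncodingSketch() {}
    virtual const char* canonicalName() const = 0;
    virtual bool isA(const std::string& encodingName) const = 0;
};

// ASCII case-insensitive comparison (assumed; the proposal does not specify case rules).
inline bool equalsIgnoreCase(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i]))) return false;
    }
    return true;
}

// Example successor: one canonical name plus the aliases mentioned in the text.
class Windows1251Sketch : public TextEncodingSketch {
public:
    const char* canonicalName() const { return "windows-1251"; }

    bool isA(const std::string& encodingName) const {
        static const char* const names[] = { "windows-1251", "cp1251", "cp-1251" };
        for (std::size_t i = 0; i < sizeof(names) / sizeof(names[0]); ++i) {
            if (equalsIgnoreCase(encodingName, names[i])) return true;
        }
        return false;
    }
};

// Small self-check of the sketch.
inline void checkWindows1251Sketch() {
    Windows1251Sketch enc;
    assert(enc.isA("CP1251"));
    assert(!enc.isA("koi8-r"));
}
```

Keeping the alias list inside isA means a caller never needs to know which spelling of the name was used to look the encoding up.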

  2. Add five global static methods to Poco::TextEncoding. All these methods assume that all TextEncoding objects are thread-safe and stateless (this is true in practice, and I have not yet seen a case where the assumption would be wrong):

class TextEncoding {
public:
    static TextEncoding& byName(const std::string& encodingName, bool _throw = true);
    static void add(const std::string& encodingName, TextEncoding& encoding);
    static void remove(const std::string& encodingName);
    static TextEncoding& global(TextEncoding& encoding);
    static TextEncoding& global();
    // ...
};

byName returns the encoding object for the given encoding name.
add registers a new global encoding object under the given name.
remove unregisters the encoding object with the given name.
global with a parameter registers a new encoding object as the default encoding (as std::locale::global does) and returns the previous default encoding.
global without parameters returns the current default encoding.

All Poco encodings are registered by default, and UTF-8 is the global default encoding. Thus this subsystem does not need to be initialized explicitly.
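
The registry described above could be sketched roughly as follows. EncodingSketch and EncodingRegistrySketch are hypothetical names, and this is not the code from the patch. Several things are assumptions: the mutex, the exact-match lookup, the non-owning pointers, and the omission of the _throw flag (byName here always throws for an unknown name).

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <stdexcept>
#include <string>

// Hypothetical stateless encoding object; only its name matters in this sketch.
class EncodingSketch {
public:
    explicit EncodingSketch(const std::string& name) : _name(name) {}
    const std::string& canonicalName() const { return _name; }
private:
    std::string _name;
};

// Sketch of the five proposed static methods. The registry does not own the
// encodings (assumption); callers must keep registered objects alive.
class EncodingRegistrySketch {
public:
    static EncodingSketch& byName(const std::string& encodingName) {
        std::lock_guard<std::mutex> lock(guard());
        Table::iterator it = table().find(encodingName);
        if (it == table().end())
            throw std::runtime_error("unknown encoding: " + encodingName);
        return *it->second;
    }
    static void add(const std::string& encodingName, EncodingSketch& encoding) {
        std::lock_guard<std::mutex> lock(guard());
        table()[encodingName] = &encoding;
    }
    static void remove(const std::string& encodingName) {
        std::lock_guard<std::mutex> lock(guard());
        table().erase(encodingName);
    }
    // Installs a new default and returns the previous one, like std::locale::global.
    static EncodingSketch& global(EncodingSketch& encoding) {
        std::lock_guard<std::mutex> lock(guard());
        EncodingSketch* previous = current();
        current() = &encoding;
        return *previous;
    }
    static EncodingSketch& global() {
        std::lock_guard<std::mutex> lock(guard());
        return *current();
    }
private:
    typedef std::map<std::string, EncodingSketch*> Table;
    // Function-local statics give lazy setup, so no explicit initialization call is needed.
    static EncodingSketch& utf8() { static EncodingSketch e("UTF-8"); return e; }
    static Table builtins() { Table t; t["UTF-8"] = &utf8(); return t; }
    static Table& table() { static Table t = builtins(); return t; }
    static EncodingSketch*& current() { static EncodingSketch* p = &utf8(); return p; }
    static std::mutex& guard() { static std::mutex m; return m; }
};

// Usage: register an alias, look it up, then unregister it.
inline void registrySketchDemo() {
    static EncodingSketch cp1251("windows-1251");
    EncodingRegistrySketch::add("cp1251", cp1251);
    assert(&EncodingRegistrySketch::byName("cp1251") == &cp1251);
    EncodingRegistrySketch::remove("cp1251");
}
```

Because the built-in table is populated on first use, the sketch matches the statement that the subsystem needs no explicit initialization.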

  3. In XML::ParserEngine, add logic that uses this subsystem in addition to the existing encoding recognition rules (ParserEngine::handleUnknownEncoding).

1. All known TextEncoding objects are registered once with their names, and afterwards we only use them.
2. There is no need to remember which encoding we are working with. Its name can always be obtained through the general interface (when writing XML, for example).
3. It is easy to add new encodings to the library without rewriting the program code that uses them. Likewise, it is easy to add your own encodings in one place in the program code and then use them everywhere as if they were part of the library.
4. Information about encoding names and their alternatives becomes available in one place. I had to spend some time finding this information for the Russian encodings alone, and I cannot guarantee that I have found all variants. It is good to have this information collected in the library for general use.
5. If this patch is accepted, I can also add all the variants of Russian encodings that I have found, together with their implementations :-)

Sergey N. Yatskevich snc@begun.ru


  • Logged In: YES
    Originator: NO


    This sounds really useful, and I will incorporate it into the upcoming 1.3 release. I'd like to have a few minor API changes, though. But more on that later.

    Thank you,


  • Logged In: NO

    I am glad :-) I hope that the API changes will not be too significant, as we already use these functions in a number of programs.

    P.S. I forgot to mention that this patch must be applied after the fix-encoding patch.

    Sergey N. Yatskevich snc@begun.ru

  • Alex Fabijanic

    Logged In: YES
    Originator: NO

    I am in favor of this proposal - I actually wanted to ask for 1) because I need it for the servlet API. However, maybe a better place for 2) is Util, with the other global settings like Application::logger() and Application::config().


  • Logged In: NO

    I have two reasons for placing all of the functionality in TextEncoding:

    • I modeled this solution on std::locale :-), where all supported locales and the global functions for working with them are part of the base library.

    • Functionality 2) is used in the XML module (for handling input encodings), on which the Util module depends. Moving 2) into Util would create an ugly cyclic dependency between the XML and Util modules.

    Sergey N. Yatskevich snc@begun.ru

  • Logged In: YES
    Originator: NO

    I have added this to 1.3. The interface is mostly the same as in your proposal, with the exception of the add() member function, which has its two parameters swapped, and byName(), which does not have the second boolean argument (NULL references make me feel quite uncomfortable). Instead, there is now a find() member function. The implementation of TextEncoding is also slightly different. The changes will be in the SVN trunk soon.
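
The split between a throwing byName() and a separate find() can be illustrated with a small stand-alone sketch. The thread does not show the actual 1.3 signatures, so everything here is an assumption: the names CodecSketch, CodecPtr and CodecManagerSketch are hypothetical, and std::shared_ptr stands in for whatever pointer type the library uses. The sketch is not thread-safe.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical encoding object and pointer type for this sketch.
struct CodecSketch { std::string name; };
typedef std::shared_ptr<CodecSketch> CodecPtr;

class CodecManagerSketch {
public:
    // Swapped parameter order as described in the reply: encoding first, name second.
    void add(CodecPtr encoding, const std::string& name) { _table[name] = encoding; }

    // Returns an empty pointer when the name is unknown; the caller must check it.
    CodecPtr find(const std::string& name) const {
        std::map<std::string, CodecPtr>::const_iterator it = _table.find(name);
        return it == _table.end() ? CodecPtr() : it->second;
    }

    // Always returns a valid reference or throws, so no "null reference" can escape.
    CodecSketch& byName(const std::string& name) const {
        CodecPtr p = find(name);
        if (!p) throw std::out_of_range("encoding not found: " + name);
        return *p;
    }

private:
    std::map<std::string, CodecPtr> _table;
};

// Usage: find() for optional lookups, byName() when absence is an error.
inline void codecManagerSketchDemo() {
    CodecManagerSketch manager;
    CodecPtr cp1251(new CodecSketch());
    cp1251->name = "windows-1251";
    manager.add(cp1251, "cp1251");
    assert(manager.find("cp1251") == cp1251);
    assert(!manager.find("koi8-r"));
}
```

With this shape, callers that can tolerate a missing encoding test the pointer from find(), while byName() keeps the guarantee that a returned reference is always valid.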