Implementing a Character Set

Developer
Anonymous
2003-10-25
2004-01-07
  • Anonymous

    Anonymous - 2003-10-25

    Hi!

    Great piece of software! However, I read somewhere that it supports only the Latin-1 character set. I would also like it to support the Windows-1250 character set.

    What is the fastest way to enable this?

     
    • B Sizer

      B Sizer - 2003-10-27

      I expect it's not Latin-1 specific, but ASCII-specific. It will probably support whichever codeset your locale is currently set to. If you need it to support wide characters, you probably need to change the #define for TIXML_STRING from std::string to std::wstring, but I expect that will break other parts of the code.

       
    • Anonymous

      Anonymous - 2003-11-23

      It works with extended ASCII (European characters) with a small modification:

      line 234, tinyxmlparser.cpp:
      else if ( *p==' '/*isspace( *p )*/ )

      so, the alien characters (because *yes* we are aliens!) won't be skipped.

      -sbrt

       
      • B Sizer

        B Sizer - 2003-11-23

        However, doing that will probably break your code when it encounters a tab, newline, or carriage return in that position, no?

         
    • Anonymous

      Anonymous - 2003-12-03

      As this code is intended to skip the "empty" characters surrounding the meat of real words, the tab, newline, etc. just won't be skipped, which is not very important. Anyway, it can't make the code crash or corrupt the output XML.

      I've just tested my app ... and it works fine.

      Keep in mind that many people use XML for config files containing installation/current/whatever folders and filenames. Extended ASCII can appear in these paths and filenames on European/Asian systems ...

      It was a pain to find a small, stable, extended-ASCII-compliant XML I/O library ... Thanks to TinyXML (with the slight modification above). I used expat/sxp before and had to pad all Asian/European strings with two blank characters, otherwise it simply crashed.

      -sbrt

       
      • B Sizer

        B Sizer - 2003-12-05

        Ok, I just took a closer look at the code.

        It already checks for newlines and carriage returns before the isspace check, so you're ok there. However if you want your tabs to be properly converted then just add:

        || *p == '\t'

        to the carriage return checks above. I don't remember the code for a form feed but it's so rarely used that I doubt it matters.

        Incidentally, for most (if not all) users, the separate carriage return checks are unnecessary anyway, as they are covered by isspace().

        However, I would point out that if 'isspace' is treating your extended ASCII as spaces, there's probably something else wrong. For example, maybe you have the wrong locale set in tinyxml/the application using it. You may just need to call setlocale(LC_ALL, "french") for example, choosing the appropriate language/region string.
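
A minimal sketch of the explicit check discussed in this exchange, assuming the goal is to skip only space, tab, newline, and carriage return (the four characters XML 1.0 defines as whitespace) without consulting the locale. The names IsXmlWhiteSpace and SkipXmlWhiteSpace are hypothetical and are not taken from tinyxmlparser.cpp:

```cpp
#include <cassert>

// Hypothetical helper, not TinyXML's actual code: true only for the four
// whitespace characters defined by XML 1.0, regardless of the current locale.
inline bool IsXmlWhiteSpace(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Advances p past leading whitespace and returns the new position.
// Bytes above 0x7F (extended characters) are never skipped.
inline const char* SkipXmlWhiteSpace(const char* p)
{
    assert(p != nullptr);
    while (*p != '\0' && IsXmlWhiteSpace(*p))
        ++p;
    return p;
}
```

Because no locale-dependent function is called, the result does not depend on whether setlocale() has been called.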

         
    • Anonymous

      Anonymous - 2004-01-07

      Argh! You're right, and I've wasted hours just because of my ignorance about setlocale :-/

      But it is not required to specify "french", etc. The simplest approach is to call this at app startup:
      setlocale(LC_ALL,"");
      Which, according to MSDN:
      "Sets the locale to the default, which is the system-default ANSI code page obtained from the operating system."

      -sbrt
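
A rough sketch of the startup call described above, together with a wrapper around isspace(). The function names UseSystemLocale and IsSpaceInCurrentLocale are hypothetical. The cast to unsigned char is an addition not mentioned in the thread: the C standard leaves the behaviour undefined when a negative char value is passed to isspace(), which can happen for bytes above 0x7F when plain char is signed, so that may be worth ruling out as well:

```cpp
#include <cctype>
#include <clocale>

// Hypothetical startup helper: adopt the user's default locale.
// Returns false if the C library cannot load the requested locale.
inline bool UseSystemLocale()
{
    return std::setlocale(LC_ALL, "") != nullptr;
}

// Hypothetical wrapper: the cast keeps bytes above 0x7F from reaching
// isspace() as negative values when plain char is signed.
inline bool IsSpaceInCurrentLocale(char c)
{
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}
```

Calling UseSystemLocale() once before parsing should be enough; which extended characters then count as spaces depends on the locale the system reports.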

       
