
Extended ASCII codes

2005-09-06
2013-05-20
  • Leon Zandman

    Leon Zandman - 2005-09-06

    Hi,

    For my project I need to be able to read XML files containing French-language text. When the XML file contains the word "Français", it parses the c-cedille (a "c" character with a little tail on the bottom) wrongly. I tried replacing the XML-code with ç and ! and even ‡ but that didn't work. When I print the values the following appears:

    Fran┬ais

    Does anybody know how this is solved?

     
    • Leon Zandman

      Leon Zandman - 2005-09-06

      Oops, SourceForge also doesn't know how to handle this stuff and prints it out wrong.

      To sum up: I put a c-cedille character (entity code ‡) in my XML file and load it into TinyXML. When I then retrieve the value the character has changed into two characters. The second one is the actual c-cedille, but the first one is a totally different character. What's happening?

      My XML-file is UTF-8.

       
      • Leon Zandman

        Leon Zandman - 2005-09-14

        Can nobody answer my question?

         
        • Ellers

          Ellers - 2005-09-14

          I think everybody is expecting you to RTFM :)
          There are various posts on UTF-8, unicode etc.

          However, in the interests of world peace, I'll see what I can do...

          TinyXML doesn't support all entities; it does support &amp; and maybe some others, though I'm not 100% sure.

          TinyXML, being tiny, is by design not intended to support every encoding. There are other libraries for that.

          But it definitely works with UTF-8, and that's all you need. (Technically you might be able to make it work with 'extended ascii'==ISO8859-1 but I haven't tried that)

          The question is: is your document truly saved as UTF-8?

          I made a little demo xml file and put it here:
          http://ellerton.net/software/french.xml

          It views correctly in IE and it loads fine with tinyxml.

          When I run it through my dump.cpp program it displays on screen:

          Load: 1
          Document
          + Declaration
          + Element "utf"
          --+ Text: [Fran+ais]
           
          Tip: With encoding issues, only trust a hexdump of the memory.
          In the debugger it shows:

          cstring[4]    0xc3 ''
          cstring[5]    0xa7 ''

          which I'm pretty sure is the correct UTF-8 encoding of the ç letter.

          HTH
          Ellers

           
          • Leon Zandman

            Leon Zandman - 2005-09-20

            >  I think everybody is expecting you to RTFM :)

            Oops! I did search this forum, but didn't find anything useful. I knew that TinyXML didn't support all those HTML character entities (I looked in the source code). But I also can't get it to work using HEX codes.

            > The question is: is your document truly saved
            > as UTF-8?

            Yes, it is. The only difference is that I use DOS-format, whereas your file is UNIX-format. But that shouldn't be a problem for TinyXML.

            Apparently TinyXML works fine, but I have to change something in my main application. I use TinyXML to read XML language files that contain the various language text resources of my application. I've created a French XML language resource file, but when I use it my application shows weird text.

            To isolate the problem I created a small console application that reads the French XML language file and prints its contents to the console (a normal Windows 2000 command-prompt window). But I noticed the text was again displayed wrongly. For instance, when my XML file contains the code © (which should be a copyright character) my console window displays a circle with an "R" character in the middle. To get a copyright character in my console window I have to use ¸. So I concluded something must be wrong with TinyXML.

            Apparently this conclusion is wrong. TinyXML returns the right text, but my application is somehow not able to display it properly. Maybe I should do something extra to enable UTF-8 display?

             
            • Yves Berquin

              Yves Berquin - 2005-09-20

              Leon,

              You have to use a function to convert UTF-8 to ISO-8859-1, the character set used if you have a French Windows.
              Another way would be to specify an ISO-8859-1 character set in your XML header.

              TinyXML does not contain such a conversion library. You should use something like the ICONV library to make the conversion.

              Yves

               
