TinyXML, special characters and C++

Big Biggie
2004-07-23
2004-09-15
  • Big Biggie

    Big Biggie - 2004-07-23

    Hi,

    I'm using TinyXML in my C++ program. I'd like it to be able to load files in UTF-8 so people can use their own alphabet in the XML. The program reads the XML, then executes the programs it lists. Example:

    ***CODE***
    <mt39>
        <item>
            <program>path\to\program\to\run</program>
            <arguments>arguments</arguments>
            <hide>0</hide>
            <wait>0</wait>
        </item>
    </mt39>
    ***END CODE***

    If the XML is in UTF-8 and I put the "é" character somewhere in the text, it gives me two characters: "Ã" and "©" (copyright)... If the XML is in ANSI or ISO 8859-2 (Latin-2), then the correct character shows up...

    What should I do to be able to support those characters? What should I include in the C++ program??

    Thank you very much.

     
    • Lee Thomason

      Lee Thomason - 2004-07-23

      That is fully supported in the 2.3.1 beta. You have either 1) a bug, 2) an older version, or 3) an encoding issue.

      What version are you using? How are you invoking the parser? (Load? Parse? etc.)

      thanks,
      lee

       
      • Big Biggie

        Big Biggie - 2004-07-23

        I am using v2.3.1 (downloaded Sunday, July 18...)

        Here is what I use:

        ***CODE***
        TiXmlDocument runs(configFile);
        ...
        XML_Version = string(runs.FirstChild("mt39")->FirstChild("configuration")->FirstChild("version")->FirstChild()->ToText()->Value());
        ...
        Arguments = new string(element->FirstChild("arguments")->FirstChild()->ToText()->Value());
        ***END CODE***

        What should "Load" and "Parse" do?

        Thank you very much

         
    • Lee Thomason

      Lee Thomason - 2004-07-23

      LoadFile loads and parses a file from disk. Parse parses a string from memory. I'm assuming you have a LoadFile in there else you would have bigger problems than encoding. :)

      I suspect TinyXml is assuming your input is Legacy (a latin-1 variant on a western machine) instead of UTF-8. Either fix will work:

      1) LoadFile( TIXML_ENCODING_UTF8 )
      or
      2) add the following to your XML:
      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>

      That should fix it - tell me if it does!
      lee

       
      • Big Biggie

        Big Biggie - 2004-07-24

        Thanks lee for your response.

        I was already using <?xml version="1.0" encoding="UTF-8" standalone="yes"?> and it doesn't seem to change anything. TinyXML seems to detect the file encoding directly, without checking this declaration.

        I also tried this:
        ***CODE***
        TiXmlDocument runs(configFile);
        if (!runs.LoadFile(TIXML_ENCODING_UTF8)) {
            ...
        }
        ***END CODE***
        But it doesn't change anything...

        Thank you though...

         
    • Lee Thomason

      Lee Thomason - 2004-07-24

      *Please post a bug*, with the XML file you are using in a zip file. (It's very important to not "inline" the XML into the web page.)

      Thanks much,
      lee

       
    • Lee Thomason

      Lee Thomason - 2004-07-25

      Just to wrap up this thread: I looked at the example. TinyXml is working fine; this is a rendering problem.

      The tricky part to remember is that char*, in Windows, is an ISO-8859 encoding. Using TinyXml in UTF-8 mode means it will hand char* strings back as UTF-8. Handing UTF-8 strings to an OS that expects ISO-8859 will result in garbage on display.
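
      The mismatch described above can be illustrated with a minimal sketch. It assumes the display side expects ISO-8859-1 (Latin-1) and that the text only needs characters in that range. `Utf8ToLatin1` is a hypothetical helper, not part of TinyXML; it turns the UTF-8 bytes returned by `Value()` into single Latin-1 bytes and substitutes '?' for anything it cannot represent. On Windows, converting to UTF-16 with `MultiByteToWideChar` and `CP_UTF8` and calling the wide-character APIs is the more general route.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of TinyXML): converts UTF-8 text to
// ISO-8859-1 (Latin-1). Code points above U+00FF and malformed
// sequences are replaced with '?'.
std::string Utf8ToLatin1(const std::string& utf8) {
    std::string out;
    for (std::size_t i = 0; i < utf8.size();) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c < 0x80) {
            // Plain ASCII: one byte, copied unchanged.
            out += static_cast<char>(c);
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < utf8.size() &&
                   (static_cast<unsigned char>(utf8[i + 1]) & 0xC0) == 0x80) {
            // Two-byte sequence encoding U+0080..U+00FF.
            unsigned char c2 = static_cast<unsigned char>(utf8[i + 1]);
            out += static_cast<char>(((c & 0x03) << 6) | (c2 & 0x3F));
            i += 2;
        } else {
            // Outside Latin-1 or malformed: emit '?' and skip the
            // lead byte plus any continuation bytes that follow it.
            out += '?';
            i += 1;
            while (i < utf8.size() &&
                   (static_cast<unsigned char>(utf8[i]) & 0xC0) == 0x80) {
                ++i;
            }
        }
    }
    return out;
}
```

      For example, the UTF-8 bytes 0xC3 0xA9 (which a Latin-1 display shows as "Ã©") become the single byte 0xE9, which a Latin-1 display shows as "é".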

       
    • Anonymous

      Anonymous - 2004-09-14

      What is the solution to this problem? I have the exact same problem. When reading back Dutch characters from my XML I am getting garbage characters here and there. When I debug TinyXML, it is returning incorrect characters, so it is not the renderer that is producing the wrong output.

      If I load the XML in Notepad and save it with ANSI encoding, the output is correct, which shows that the XML source is correct. Do I need to set up the code page in the Win32 API?

      This is currently a HUGE problem for me as I am using several files with texts in different languages.

       
    • Lee Thomason

      Lee Thomason - 2004-09-15

      Actually, if you see the characters correctly in ANSI it means you are (probably) running latin-1 encoded text through TinyXml in UTF-8 mode.

      You have 3 options:
      1) Add encoding="ISO-8859-1" to the XML declaration, or
      2) Run TinyXml in Legacy mode (see the section "UTF-8" in the documentation), or
      3) Switch to using UTF-8 encoded text.

      Hope that helps!
      lee
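
      To decide between the options above, it helps to know which encoding a file is actually in. The following is a minimal sketch under stated assumptions, not a TinyXML feature: `LooksLikeUtf8` and `Latin1ToUtf8` are hypothetical helpers. The first does a structural check of the bytes (it does not catch every overlong or surrogate form), and the second re-encodes ISO-8859-1 bytes as UTF-8 for option 3. Short Latin-1 text can happen to pass the check, so treat the result as a hint rather than proof.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if the bytes are structurally
// valid UTF-8 (correct lead bytes followed by the right number of
// continuation bytes). Simplified: not a full validator.
bool LooksLikeUtf8(const std::string& s) {
    for (std::size_t i = 0; i < s.size();) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;  // continuation bytes expected after c
        if (c < 0x80) {
            extra = 0;
        } else if (c >= 0xC2 && c <= 0xDF) {
            extra = 1;
        } else if (c >= 0xE0 && c <= 0xEF) {
            extra = 2;
        } else if (c >= 0xF0 && c <= 0xF4) {
            extra = 3;
        } else {
            return false;  // stray continuation byte or invalid lead byte
        }
        if (extra > s.size() - i - 1) {
            return false;  // sequence is truncated
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) {
                return false;  // expected a continuation byte
            }
        }
        i += extra + 1;
    }
    return true;
}

// Hypothetical helper: re-encodes ISO-8859-1 (Latin-1) bytes as UTF-8.
// Bytes below 0x80 are copied; the rest become two-byte sequences.
std::string Latin1ToUtf8(const std::string& latin1) {
    std::string out;
    for (std::size_t i = 0; i < latin1.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(latin1[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

      If the check fails on a file that displays correctly as ANSI, the file is most likely Latin-1, which matches the diagnosis above: either declare it as such, load it in Legacy mode, or convert it to UTF-8 before parsing.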

       
