The follow-up about Dutch characters was posted by Anonymous on 2004-09-14.
Hi,
I'm using TinyXML in my C++ program. I'd like it to be able to load files in UTF-8 so people can use their own alphabet in the XML. The program reads the XML, then executes the programs it contains. Example:
***CODE:***
<mt39>
<item>
<program>path\to\program\to\run</program>
<arguments>arguments</arguments>
<hide>0</hide>
<wait>0</wait>
</item>
</mt39>
***END CODE***
If the XML is in UTF-8 and I put the "é" character somewhere in the text, it gives me two characters: "Ã" (an A with a tilde) and "©" (copyright). If the XML is in ANSI or ISO 8859-2 (Latin-2), the correct character shows up.
What should I do to support those characters? What should I include in the C++ program?
Thank you very much.
That is fully supported in the 2.3.1 beta. You have either 1) a bug, 2) an older version, or 3) an encoding issue.
What version are you using? How are you invoking the parser? (Load? Parse? etc.)
thanks,
lee
I am using v2.3.1 (downloaded Sunday, July 18...)
Here is what I use:
***CODE***
TiXmlDocument runs(configFile);
...
XML_Version = string(runs.FirstChild("mt39")->FirstChild("configuration")->FirstChild("version")->FirstChild()->ToText()->Value());
...
Arguments = new string(element->FirstChild("arguments")->FirstChild()->ToText()->Value());
***END CODE***
What should "Load" and "Parse" do?
Thank you very much
LoadFile loads and parses a file from disk. Parse parses a string from memory. I'm assuming you have a LoadFile in there; otherwise you would have bigger problems than encoding. :)
I suspect TinyXml is assuming your input is Legacy (a Latin-1 variant on a Western machine) instead of UTF-8. Either fix will work:
1) LoadFile( TIXML_ENCODING_UTF8 )
or
2) add the following to your XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
That should fix it - tell me if it does!
lee
Thanks lee for your response.
I was already using <?xml version="1.0" encoding="UTF-8" standalone="yes"?> and it doesn't seem to change anything. TinyXML seems to detect the file encoding directly, without checking this tag.
I also tried this:
TiXmlDocument runs(configFile);
if (!runs.LoadFile(TIXML_ENCODING_UTF8)) {
...
}
But it doesn't change anything...
Thank you though...
*Please post a bug*, with the XML file you are using in a zip file. (It's very important to not "inline" the XML into the web page.)
Thanks much,
lee
Just to wrap up this thread: I looked at the example. TinyXml is working fine; this is a rendering problem.
The tricky part to remember is that on Windows, char* strings are interpreted in the system's ANSI code page (an ISO-8859-style encoding, such as Windows-1252 on Western systems). In UTF-8 mode, TinyXml hands char* strings back as UTF-8. Handing UTF-8 strings to an OS that expects ISO-8859 text results in garbage display.
What is the solution to this problem? I have exactly the same problem. When reading back Dutch characters from my XML, I get garbage characters here and there. When I debug TinyXML, it is returning incorrect characters, so it is not the renderer that is at fault.
If I open the XML in Notepad and save it with ANSI encoding, the output is correct, which shows that the XML source is correct. Do I need to set up the code page in the Win32 API?
This is currently a HUGE problem for me, as I am using several files with texts in different languages.
Actually, if you see the characters correctly in ANSI it means you are (probably) running latin-1 encoded text through TinyXml in UTF-8 mode.
You have 3 options:
1) Add encoding="ISO-8859-1" to the XML declaration, or
2) Run TinyXml in Legacy mode. See the section "UTF-8" in the documentation.
3) Switch to using UTF-8 encoded text.
Hope that helps!
lee