Hi,
For my project I need to be able to read XML files containing French-language text. When the XML file contains the word "Français", it parses the c-cedille (a "c" character with a little tail on the bottom) wrongly. I tried replacing the character in the XML with the codes ç and ! and even ‡, but that didn't work. When I print the values the following appears:
Fran┬çais
Does anybody know how to solve this?
Oops, SourceForge also doesn't know how to handle this stuff and prints it out wrong.
To sum up: I put a c-cedille character (entity code ‡) in my XML file and load it into TinyXML. When I then retrieve the value the character has changed into two characters. The second one is the actual c-cedille, but the first one is a totally different character. What's happening?
My XML-file is UTF-8.
Can nobody answer my question?
I think everybody is expecting you to RTFM :)
There are various posts on UTF-8, Unicode, etc.
However, in the interests of world peace, I'll see what I can do...
TinyXML doesn't support all entities; it does support &amp;, and maybe some others though; I'm not 100% sure.
TinyXML, being tiny, is by design not intended to support every encoding. There are other libraries for that.
But it definitely works with UTF-8, and that's all you need. (Technically you might be able to make it work with "extended ASCII" == ISO-8859-1, but I haven't tried that.)
The question is: is your document truly saved as UTF-8?
I made a little demo XML file and put it here:
http://ellerton.net/software/french.xml
It views correctly in IE and it loads fine with TinyXML.
When I run it through my dump.cpp program it displays on screen:
Load: 1
Document
+ Declaration
+ Element "utf"
--+ Text: [Fran+ais]
Tip: With encoding issues, only trust a hexdump of the memory.
In the debugger it shows:
cstring[4] 0xc3 ''
cstring[5] 0xa7 ''
which I'm pretty sure is the correct UTF-8 encoding of the letter.
HTH
Ellers
> I think everybody is expecting you to RTFM :)
Oops! I did search this forum, but didn't find anything useful. I knew that TinyXML didn't support all those HTML character entities (I looked in the source code). But I also can't get it to work using HEX codes.
> The question is: is your document truly saved
> as UTF-8?
Yes, it is. The only difference is that I use DOS-format, whereas your file is UNIX-format. But that shouldn't be a problem for TinyXML.
Apparently TinyXML works fine, but I have to change something in my main application. I use TinyXML to read XML language files that contain the various language text resources of my application. I've created a French XML language resource file, but when I use it my application shows weird text.
To isolate the problem I created a small console application that reads the French XML language file and prints its contents to the console (a normal Windows 2000 command-prompt window). But I noticed the text was again displayed wrongly. For instance, when my XML file contains the code © (which should be a copyright character) my console window displays a circle with an "R" character in the middle. To get a copyright character in my console window I have to use ¸. So I concluded something must be wrong with TinyXML.
Apparently this conclusion is wrong. TinyXML returns the right text, but my application is somehow not able to display it properly. Maybe I should do something extra to enable UTF-8 display?
Leon,
You have to use a function to convert UTF-8 to ISO-8859-1, the character set used on a French Windows system.
Another way would be to specify an ISO-8859-1 character set in your XML header.
TinyXML does not contain such a conversion library. You should use something like the iconv library to make the conversion.
Yves