Hi,
I tried parsing a UTF-8 encoded XML file using the RSSReader.java example from the SourceForge website, and I get this exception:
"com.ximpleware.ParseException: UTF 8 encoding error: should never happen"
The line on which it fails is:
<description>The Pentagon has cultivated “military analysts� in a campaign to generate favorable news coverage of the Bush administration’s wartime performance.</description>
The problem is evidently caused by "â€".
I received this XML, labelled as UTF-8, as a response from a website. When I checked the original content on the website, I noticed that the curly quotes (“”) had been replaced with (â€), apparently because of the UTF-8 encoding.
The original content was:
<description>The Pentagon has cultivated “military analysts” in a campaign to generate favorable news coverage of the Bush administration’s wartime performance.</description>
Another example of the same exception occurs while parsing this line:
<nick>ప�రవీణ�</nick>
In this case the original content was in a regional Indian language (Kannada, I think).
Any idea why this exception occurs? What is the fix?
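For anyone who wants to reproduce the substitution described above, here is a minimal sketch of how a curly quote can turn into "â€"-style text. It assumes the UTF-8 bytes were decoded as a single-byte charset somewhere along the way, which is only a guess about what happened to this feed. The class and method names (MojibakeDemo, misdecode) are made up for illustration, and ISO-8859-1 is used because every Java runtime ships it.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    /** Encodes text as UTF-8, then (wrongly) decodes those bytes as ISO-8859-1. */
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+201D is the right curly quote; its UTF-8 form is the bytes E2 80 9D.
        String garbled = misdecode("\u201D");
        // Print each resulting character as a code point: one character per original byte.
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The first character that comes out is U+00E2 (â). A viewer that uses windows-1252 instead would show the second byte, 0x80, as €, which matches the "â€" seen in the feed.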
Those bytes do not look like valid UTF-8. According to the XML spec, if you don't declare an encoding, the default is UTF-8.
So you have to declare the file's actual encoding (e.g. ISO-8859-1) in the XML declaration.
The problem should then go away. Let me know whether it works or not.
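One way to confirm whether the file really is valid UTF-8 before handing it to the parser is a strict decode with the standard java.nio.charset classes. This is only a sketch and is not part of the parser library: the class and method names (Utf8Check, firstInvalidOffset) are hypothetical, and it assumes the whole document is already in a byte array.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Check {
    /**
     * Returns the offset of the first byte sequence that is not valid UTF-8,
     * or -1 if the whole array decodes cleanly.
     */
    static int firstInvalidOffset(byte[] data) {
        // REPORT makes the decoder stop at bad input instead of substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // UTF-8 never yields more chars than input bytes, so this cannot overflow.
        CharBuffer out = CharBuffer.allocate(data.length + 1);
        CoderResult result = decoder.decode(in, out, true);
        // On error the input position is left at the start of the bad sequence.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        byte[] good = "\u201Cmilitary analysts\u201D".getBytes(StandardCharsets.UTF_8);
        byte[] bad = {'a', (byte) 0xE2, (byte) 0x80, 'b'}; // truncated 3-byte sequence
        System.out.println(firstInvalidOffset(good));
        System.out.println(firstInvalidOffset(bad));
    }
}
```

If this reports an offset, the file is not UTF-8 at that point, and either the encoding declaration or the file itself needs to be corrected.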
Please see my comments on this in the Bugs section.
I think you were right about the encoding being incorrect. Thanks for the help, I really appreciate it.