
Non-English characters not supported

2009-04-29
2013-05-15
  • Pallav Sipani

    Pallav Sipani - 2009-04-29

    Hi,
      I tried parsing a UTF-8 encoded XML file using the RSSReader.java example from the SourceForge website, and I get an exception that reads:
       "com.ximpleware.ParseException: UTF 8 encoding error: should never happen"

    The line on which it fails is:

    <description>The Pentagon has cultivated “military analysts� in a campaign to generate favorable news coverage of the Bush administration’s wartime performance.</description>

    Evidently the problem is caused by "â€".

    I got the above UTF-8 encoded XML as a response from a website. When I checked the original content on the website, I noticed that the curly quotes (“ ”) had been replaced with (â€) due to the UTF-8 encoding.
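A likely explanation for the "â€" sequences (an assumption, not confirmed in this thread) is that UTF-8 bytes were decoded as Windows-1252 somewhere upstream of the file. This minimal sketch reproduces that mix-up using only the JDK; the names MojibakeDemo and misdecode are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode text as UTF-8, then (wrongly) decode those bytes as windows-1252.
    // Assumption: this is the kind of mix-up that produced the garbled feed text.
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // U+201C LEFT DOUBLE QUOTATION MARK is E2 80 9C in UTF-8; windows-1252
        // reads those three bytes as a-circumflex, euro sign, oe ligature.
        System.out.println(misdecode("\u201Cmilitary analysts\u201D"));
        // U+2019 RIGHT SINGLE QUOTATION MARK is E2 80 99; byte 99 is the trademark sign.
        System.out.println(misdecode("administration\u2019s"));
    }
}
```

On a console that displays Unicode, the second line should come out as "administrationâ€™s", matching the feed text. The closing quote (E2 80 9D) contains a byte with no Windows-1252 mapping, which may be why it shows up as � in the post.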

    The original content was:
    <description>The Pentagon has cultivated “military analysts” in a campaign to generate favorable news coverage of the Bush administration’s wartime performance.</description>

    Another example that triggers the same exception is parsing this line:

    <nick>ప�రవీణ�</nick>

    In this case the original content was in a regional Indian language (Telugu, judging by the script).

    Any idea why this exception occurs? What is the fix for it?
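The exception suggests that the bytes handed to the parser are not well-formed UTF-8. One way to confirm that before parsing is a strict decode with the JDK's CharsetDecoder that reports the byte offset of the first bad sequence. This is a sketch, not part of VTD-XML; the names Utf8Check and firstInvalidOffset are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {
    // Returns the byte offset of the first malformed UTF-8 sequence, or -1 if all bytes are valid.
    static int firstInvalidOffset(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never yields more chars than it has bytes, so this buffer cannot overflow.
        CharBuffer out = CharBuffer.allocate(bytes.length);
        CoderResult result = decoder.decode(in, out, true);
        // On an error, the input position is left at the start of the bad sequence.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Utf8Check feed.xml
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int offset = firstInvalidOffset(data);
        if (offset < 0) {
            System.out.println("Valid UTF-8");
        } else {
            System.out.printf("Invalid UTF-8 at byte offset %d (0x%02X)%n", offset, data[offset] & 0xFF);
        }
    }
}
```

For example, a curly quote saved as Windows-1252 is the single byte 0x93, which is never valid on its own in UTF-8, so a file containing it would be reported at that byte's offset.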

             

     
    • jimmy zhang

      jimmy zhang - 2009-04-29

      That doesn't look like valid UTF-8. According to the XML spec, if you don't declare the encoding, the default is UTF-8,
      so you have to declare the encoding the XML actually uses (e.g. ISO-8859-1),
      and the problem should go away. Let me know whether it works.
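One way to declare the encoding without otherwise touching the file is to rewrite (or add) the XML declaration. This is a minimal sketch under stated assumptions: the file uses an ASCII-compatible encoding such as ISO-8859-1, has no byte-order mark, and your VTD-XML version supports the encoding you declare. DeclareEncoding and declareEncoding are hypothetical names, and the helper replaces the whole declaration, so any standalone attribute is dropped.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeclareEncoding {
    // Matches an XML declaration at the very start of the document (used with lookingAt).
    private static final Pattern DECL = Pattern.compile("<\\?xml\\s[^?]*\\?>");

    // Hypothetical helper: makes the XML declaration name `encoding`, adding one if absent.
    // ISO-8859-1 maps every byte to one char and back, so the body bytes pass through unchanged.
    static byte[] declareEncoding(byte[] xml, String encoding) {
        String doc = new String(xml, StandardCharsets.ISO_8859_1);
        String decl = "<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>";
        Matcher m = DECL.matcher(doc);
        String fixed = m.lookingAt() ? decl + doc.substring(m.end()) : decl + doc;
        return fixed.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DeclareEncoding in.xml out.xml ISO-8859-1
        byte[] original = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), declareEncoding(original, args[2]));
    }
}
```

The ISO-8859-1 round trip is only a byte-preserving way to edit the header as text; it says nothing about the real encoding of the body, which you still have to know (or find out) before choosing the name to declare.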

       
    • Pallav Sipani

      Pallav Sipani - 2009-04-30

      I've posted my comments on this in the bugs section.

       
    • Pallav Sipani

      Pallav Sipani - 2009-05-08

      I think you were right that the encoding was wrong. Thanks for the help, I really appreciate it.

       
