International characters

  • Bob Sparks

    Bob Sparks - 2008-09-09

    Is there a way to tell the parser to handle international characters?

    If I parse this into a DOM:

    <td valign="top" class="tdbgsr"><b>Dur&#233;e du contrat :</b></td>

    and then read the text back out using the org.w3c.dom methods:

    if (curNode.getNodeType() == Node.TEXT_NODE && curNode.getNodeValue() != null && curNode.getNodeValue().trim().length() > 0) {
        String xx = curNode.getNodeValue().trim();
    }

    I get

       e du contrat :

    This suggests that the parser split the text into two nodes and dropped the accented "é",
    which was encoded as "&#233;".

    I got around this by replacing the "&#233;" with "é", but this seems hokey.
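
    For comparison, here is a minimal sketch of reading an element's text without depending on how the parser splits it into text nodes. It assumes the parser returns a DOM Level 3 document; the JDK's built-in XML parser stands in for the parser discussed in this thread, and the class and method names (CellText, boldText) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of the parser discussed in this thread.
final class CellText {

    // Parses a well-formed fragment with the JDK's XML parser (a stand-in)
    // and returns the trimmed text of the first <b> element.
    static String boldText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element b = (Element) doc.getElementsByTagName("b").item(0);
            // Merge adjacent text nodes, in case the parser split the text.
            b.normalize();
            // getTextContent() concatenates the text of all descendant nodes.
            return b.getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse fragment", e);
        }
    }

    public static void main(String[] args) {
        String cell = "<td valign=\"top\" class=\"tdbgsr\">"
                + "<b>Dur&#233;e du contrat :</b></td>";
        System.out.println(boldText(cell));
    }
}
```

    Checking each text node separately, as in the snippet above, only sees one piece at a time. Reading the text at the element level sidesteps that, provided the parser kept the character at all.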



  • Hoang Long Nguyen

    Have you found a solution for this yet? I have the same issue and haven't found a workaround. I tried

    Document d  =  _parser.parse( myString.getBytes(), "utf-8" );  

    but it seems that the UTF-8 character encoding doesn't work.

    Let me know if you have found a solution.


    -Hoang Long
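
    One thing worth checking in the call above: myString.getBytes() with no argument encodes with the platform default charset, which may not be UTF-8 on a given JVM, so the bytes can disagree with the "utf-8" label passed to the parser. Below is a minimal sketch of encoding explicitly; the class name Utf8Bytes is hypothetical, and whether this resolves the parser issue is an assumption, not something confirmed in this thread.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating explicit UTF-8 encoding.
final class Utf8Bytes {

    // Encodes the string as UTF-8 regardless of the platform default charset.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = encode("Dur\u00e9e du contrat :");
        // The accented character takes two bytes in UTF-8, so the byte
        // count is one more than the character count.
        System.out.println(bytes.length);
    }
}
```

    The resulting array could then be passed in place of myString.getBytes() in the parse call shown in the post.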

