Problem with setEncoding? Please help

Help
2005-07-07
2013-04-27
  • TheShowMustGoOn

    TheShowMustGoOn - 2005-07-07

    My code:
    Parser parser = new Parser ("http://www.google.co.th");
    parser.setEncoding("tis-620");

    TextExtractingVisitor visitor = new TextExtractingVisitor ();       
    parser.visitAllNodesWith (visitor);
    System.out.println (visitor.getExtractedText());

    This website is in Thai.
    The site declares charset=windows-874,
    but Java does not support it.

    The error message when I run the code is:

    unable to determine cannonical charset name for windows-874 - using ISO-8859-1

     
    • Derrick Oswald

      Derrick Oswald - 2005-07-08

      This has been a problem in the past.  The encoding/charset declared by the page is unsupported or unknown in Java, so the parser falls back to its default.  The solution that will probably be adopted is to provide a static accessor pair on the Page class to get/set the default charset, so that the fallback character set is the correct one.  You can code this yourself:

      In the Page class, add a static class variable, initialized to the original default:

          static String mDefaultCharset = DEFAULT_CHARSET;

      Create accessor methods to get and set it:

          static void setDefaultCharset (String charset) { mDefaultCharset = charset; }
          static String getDefaultCharset () { return (mDefaultCharset); }

      Then use this accessor in the getCharset method (line 259?):

          ret = getDefaultCharset (); // was DEFAULT_CHARSET

      Rebuild htmlparser.jar (ant task: jar; see the other build instructions).

      Then, set up the default in your program:

          // parser.setEncoding("tis-620");
          Page.setDefaultCharset ("tis-620");

      The error message will still be generated, but now it should say:

          unable to determine cannonical charset name for windows-874 - using tis-620
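The fallback idea above can be tried outside the library. This is only a minimal sketch, not the actual htmlparser source: `CharsetFallback` and `resolve` are hypothetical names standing in for the patched Page class and its getCharset logic, and the lookup uses the standard `java.nio.charset.Charset` API, which may not be what Page does internally.

```java
import java.nio.charset.Charset;

// Hypothetical stand-in for the patched Page class (assumption, not htmlparser code).
class CharsetFallback {
    // The original hard-coded fallback.
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    // Configurable fallback, initialized to the original default.
    static String mDefaultCharset = DEFAULT_CHARSET;

    static void setDefaultCharset(String charset) { mDefaultCharset = charset; }

    static String getDefaultCharset() { return mDefaultCharset; }

    // Returns the JVM's canonical name for the declared charset,
    // or the configurable fallback when the JVM does not know it.
    static String resolve(String declared) {
        try {
            return Charset.forName(declared).name();
        } catch (IllegalArgumentException e) {
            // Covers null, illegal, and unsupported charset names.
            return getDefaultCharset(); // was DEFAULT_CHARSET
        }
    }

    public static void main(String[] args) {
        setDefaultCharset("tis-620");
        // The result depends on which charsets the running JVM ships with.
        System.out.println(resolve("windows-874"));
    }
}
```

The point of the design is that the fallback is chosen by the application once, up front, instead of being fixed inside the library.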

      Of course you need to use the correct canonical name for the character set you want, which may not be "tis-620".
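One way to find out which names your JVM accepts is to ask `java.nio.charset.Charset` directly. The sketch below is an illustration under assumptions: `CharsetNames` and `canonicalName` are made-up helper names, the candidate names in `main` are just guesses to probe, and the answers vary between Java versions and installations.

```java
import java.nio.charset.Charset;

// Hypothetical helper for probing charset names (not part of htmlparser).
class CharsetNames {
    // Returns the canonical name the JVM uses for the given charset name
    // or alias, or null if this JVM does not support it.
    // Charset.isSupported throws for a null or syntactically illegal name.
    static String canonicalName(String name) {
        return Charset.isSupported(name) ? Charset.forName(name).name() : null;
    }

    public static void main(String[] args) {
        // Candidate names to probe; results depend on the running JVM.
        String[] candidates = {"tis-620", "windows-874", "x-windows-874"};
        for (String name : candidates) {
            System.out.println(name + " -> " + canonicalName(name));
        }
    }
}
```

Whichever candidate comes back non-null is a name you can pass to Page.setDefaultCharset.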

      If this works for you, let us know so it can be incorporated as a permanent solution.

       
    • TheShowMustGoOn

      TheShowMustGoOn - 2005-07-12

      Thanks for your answer.
      It works, but when I replace the URL with an HTML file on my local drive, I get no output at all.

      Could you give me another suggestion?

       
    • Derrick Oswald

      Derrick Oswald - 2005-07-12

      If there isn't an error saying it couldn't open the file, it's found the file and is processing it.
      If so there are nodes being returned.
      If there isn't any output I would check the logic of your visitor, assuming you are using one like in your original post.
      Use a debugger and break on visitTag() or visitStringNode() as appropriate.

       
    • TheShowMustGoOn

      TheShowMustGoOn - 2005-07-14

      Thanks very much for your help

       
