Big5 encoding problem

Help
2005-04-15
2013-04-27
  • mikeliu1976

    mikeliu1976 - 2005-04-15

    regards:

    I am using the current version (1.5) to parse the page at "http://www.yzu.edu.tw/"
    and write the output to test2.txt.

    Compiling with "javac ToHtmlDemoBig5.java" works.
    But when I run
    "java ToHtmlDemoBig5"

    the console says:
    unable to determine cannonical charset name for big5 - using ISO-8859-1

    Am I missing something important?
    Thank you.
    My code follows.
    ---------------------------------------------------------------------------
    import org.htmlparser.Parser;
    import org.htmlparser.util.NodeIterator;
    import org.htmlparser.util.ParserException;
    import java.io.*;

    public class ToHtmlDemoBig5
    {
        public static void main (String[] args) throws ParserException
        {
            // PrintWriter out= new PrintWriter();
            try
            {
                FileOutputStream htmlparserOutput = new FileOutputStream("test2.txt");
                PrintWriter out = null;
                out = new PrintWriter(htmlparserOutput);
                Parser parser = new Parser ("http://www.yzu.edu.tw/");
                parser.setEncoding("Big5");
                //StringBuffer html = new StringBuffer (4096);
                for (NodeIterator i = parser.elements(); i.hasMoreNodes();)
                {
                    // out.print (list.elementAt(i).toHtml ());
                    // html.append
                    out.print(i.nextNode().toHtml());
                }

                // System.out.println (html);

                out.close();
                htmlparserOutput.close();
            }
            catch(IOException e)
            {
            }
            // test2.close();
        }
    }
    ---------------------------------------------------------------------------

    The output is as follows. The Chinese characters are not displayed correctly.
    ---------------------------------------------------------------------------
    <omitted>
    <p>
    <a href="/arts/movie.htm?PHPSESSID=4b9482f30369aa1ee51c22f5de0d1412">???????</a><br>
    <a href="/life/channel.htm?PHPSESSID=4b9482f30369aa1ee51c22f5de0d1412">??????T</a><br>
    <a href="http://www.yzu.edu.tw/yzu/aa/wec/" target="_blank">?^?y????</a>
    <omitted>
    ---------------------------------------------------------------------------
    Any suggestions are welcome.
    Thank you.
    May goodness be with you all

     
    • Derrick Oswald

      Derrick Oswald - 2005-04-15

      It seems java.nio.charset.Charset.forName ("big5") has failed.
      Try using "Big5" (with the capitalization).
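
To see whether the lookup itself is the problem, it may help to ask the JVM directly which charset names it accepts. The Charset API documents names as case-insensitive, so if both "big5" and "Big5" come back unsupported, the JVM in use probably lacks the charset rather than the spelling being wrong. This is only a minimal sketch: the class name CharsetCheck and the helper canonicalName are made up for illustration and are not part of HTML Parser.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of HTML Parser.
public class CharsetCheck {

    /**
     * Returns the canonical name this JVM reports for the given label,
     * or null if the label is illegal or the charset is not available.
     */
    static String canonicalName(String label) {
        try {
            return Charset.forName(label).name();
        } catch (IllegalArgumentException e) {
            // Covers both IllegalCharsetNameException and
            // UnsupportedCharsetException, which extend it.
            return null;
        }
    }

    public static void main(String[] args) {
        String[] labels = {"big5", "Big5", "ISO-8859-1"};
        for (String label : labels) {
            String name = canonicalName(label);
            System.out.println(label + " -> "
                    + (name == null ? "not supported by this JVM" : name));
        }
    }
}
```

Running this on the same JVM that runs ToHtmlDemoBig5 shows whether Big5 is available there at all.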

       
    • mikeliu1976

      mikeliu1976 - 2005-04-17

      regards: 

      I changed my JDK version from 1.4.0 to 1.4.2_08.
      It works now.

      ---------------------------------------------------------------------------
      The console no longer says:
      unable to determine cannonical charset name for big5 - using ISO-8859-1
      ---------------------------------------------------------------------------

      Thank you for your patient reply and instructions.
      May goodness be with you all
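
One more thing that may matter on other machines: the code in the first post builds its PrintWriter directly on a FileOutputStream, and that constructor encodes with the platform default encoding, not the encoding set on the parser. If the default cannot represent Chinese characters, they are written as "?". The sketch below names the encoding explicitly. The class ExplicitEncodingWriter and its methods are hypothetical, and UTF-8 is used in the demo only because every JVM supports it; "Big5" could be passed instead where the JVM provides it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;

// Hypothetical helper class, not part of HTML Parser.
public class ExplicitEncodingWriter {

    /** Wraps the stream in a PrintWriter that encodes with the named charset. */
    static PrintWriter open(OutputStream out, String encoding)
            throws UnsupportedEncodingException {
        return new PrintWriter(new OutputStreamWriter(out, encoding));
    }

    /** Encodes text through the writer and returns the bytes that were written. */
    static byte[] encode(String text, String encoding) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintWriter writer = open(buffer, encoding);
        writer.print(text);
        writer.close(); // flushes the encoder into the buffer
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u4e2d"; // one Chinese character
        // ISO-8859-1 cannot represent it, so a replacement byte is written.
        System.out.println("ISO-8859-1 bytes: " + encode(text, "ISO-8859-1").length);
        // UTF-8 can represent it, so the character survives.
        System.out.println("UTF-8 bytes: " + encode(text, "UTF-8").length);
    }
}
```

In the original program the same idea would mean passing the FileOutputStream for test2.txt to open(...) with the encoding that the file should be read in.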

       
