Regards:
I am using the current version (1.5) of HTML Parser to parse the web page "http://www.yzu.edu.tw/"
and write the output to test2.txt.
Compiling with "javac ToHtmlDemoBig5.java" works fine.
But when I run
"java ToHtmlDemoBig5"
the console prints:
unable to determine cannonical charset name for big5 - using ISO-8859-1
Am I missing something important?
Thank you.
My code follows.
---------------------------------------------------------------------------
import org.htmlparser.Parser;
import org.htmlparser.util.NodeIterator;
import org.htmlparser.util.ParserException;
import java.io.*;

public class ToHtmlDemoBig5
{
    public static void main (String[] args) throws ParserException
    {
        try
        {
            // Write the parsed HTML to test2.txt (platform default encoding).
            FileOutputStream htmlparserOutput = new FileOutputStream ("test2.txt");
            PrintWriter out = new PrintWriter (htmlparserOutput);

            Parser parser = new Parser ("http://www.yzu.edu.tw/");
            // Ask the parser to decode the page as Big5.
            parser.setEncoding ("Big5");

            // Print every top-level node back out as HTML.
            for (NodeIterator i = parser.elements (); i.hasMoreNodes ();)
            {
                out.print (i.nextNode ().toHtml ());
            }

            out.close ();
            htmlparserOutput.close ();
        }
        catch (IOException e)
        {
            // Report I/O failures instead of silently swallowing them.
            e.printStackTrace ();
        }
    }
}
---------------------------------------------------------------------------
The output is as follows. The Chinese text is not displayed correctly.
---------------------------------------------------------------------------
<omitted>
<p>
<a href="/arts/movie.htm?PHPSESSID=4b9482f30369aa1ee51c22f5de0d1412">???????</a><br>
<a href="/life/channel.htm?PHPSESSID=4b9482f30369aa1ee51c22f5de0d1412">??????T</a><br>
<a href="http://www.yzu.edu.tw/yzu/aa/wec/" target="_blank">?^?y????</a>
<omitted>
---------------------------------------------------------------------------
Any suggestions are welcome.
Thank you.
May goodness be with you all.
It seems java.nio.charset.Charset.forName ("big5") has failed.
Try using "Big5" (with the capitalization).
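To check whether the running JVM recognizes a charset name at all, a small standalone probe along these lines may help. It is only a sketch: the class name `CharsetProbe` and the helper `canonicalNameOrFallback` are made up for illustration, and the fallback merely mimics what the warning message describes. It is not HTML Parser's own code.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper for illustration; not part of HTML Parser.
class CharsetProbe {

    // Returns the JVM's canonical name for the given charset name or alias,
    // or the supplied fallback when the JVM does not know the charset.
    static String canonicalNameOrFallback(String name, String fallback) {
        try {
            if (Charset.isSupported(name)) {
                return Charset.forName(name).name();
            }
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not legal in a charset name.
        }
        return fallback;
    }

    public static void main(String[] args) {
        String[] names = {"big5", "Big5", "iso-8859-1"};
        for (String name : names) {
            System.out.println(name + " -> "
                    + canonicalNameOrFallback(name, "ISO-8859-1 (fallback)"));
        }
        // The JDK version matters here, so print it alongside the results.
        System.out.println("java.version = " + System.getProperty("java.version"));
    }
}
```

The java.nio.charset.Charset documentation states that charset names are not case-sensitive, so if both "big5" and "Big5" come back as the fallback, the JVM in use probably lacks the charset, and changing the capitalization alone is unlikely to help.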
Regards:
I changed my JDK version from 1.4.0 to 1.4.2_08, and now it works.
---------------------------------------------------------------------------
The console no longer prints:
unable to determine cannonical charset name for big5 - using ISO-8859-1
---------------------------------------------------------------------------
Thank you for your patient reply and instructions.
May goodness be with you all.
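
For anyone hitting the same symptom, one plausible explanation for the question marks in the output is the ISO-8859-1 fallback itself: Big5 bytes decoded as ISO-8859-1 become unrelated Latin-1 characters, and those can turn into "?" when written with an encoding that cannot represent them. The sketch below illustrates that mechanism under stated assumptions. The class `Big5FallbackDemo` is hypothetical, US-ASCII stands in for whatever default encoding the PrintWriter used, and `StandardCharsets` requires Java 7 or later (it did not exist in the JDK 1.4 releases discussed in this thread).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration; not part of HTML Parser.
class Big5FallbackDemo {

    // The two characters U+4E2D U+6587 encoded as Big5 bytes.
    static final byte[] BIG5_BYTES = {(byte) 0xA4, (byte) 0xA4, (byte) 0xA4, (byte) 0xE5};

    // Decodes the bytes as ISO-8859-1: every byte becomes one character,
    // so the original Chinese text is lost.
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Encodes text with a charset that cannot represent it. String.getBytes
    // substitutes the charset's replacement byte for unmappable characters.
    static String writeAsAscii(String text) {
        byte[] encoded = text.getBytes(StandardCharsets.US_ASCII);
        return new String(encoded, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String wrong = decodeAsLatin1(BIG5_BYTES);
        System.out.println("chars after ISO-8859-1 decode: " + wrong.length());
        System.out.println("after writing as US-ASCII: " + writeAsAscii(wrong));
        // Big5 may be missing from a reduced JRE, so check before using it.
        if (Charset.isSupported("Big5")) {
            String right = new String(BIG5_BYTES, Charset.forName("Big5"));
            System.out.println("chars after Big5 decode: " + right.length());
        }
    }
}
```

Bytes in the ASCII range pass through this round trip unchanged, which would be consistent with the stray "T", "^" and "y" visible among the question marks in the posted output.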