[Htmlparser-user] A strange question?
From: hpq852 <hp...@gm...> - 2006-08-08 16:19:16
Hi all, I've run into a very strange problem. My code is very simple, as follows:
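import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;

public void doTest() throws Exception
{
    URL url = new URL("http://www.uume.com/play_CPRz8a2si4zK");
    StringBuffer sb = new StringBuffer();
    // Decode the page bytes as GB2312 while reading; the reader is closed automatically
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(url.openStream(), "GB2312")))
    {
        String line;
        while ((line = br.readLine()) != null)
        {
            sb.append(line);
            sb.append("\n");
        }
    }
    extractText2(sb.toString());
}

public String extractText2(String inputHtml) throws Exception
{
    // inputHtml is already a decoded Java String, so pass it in directly;
    // re-encoding it with getBytes() (platform default charset) and decoding as GB2312 garbles it
    Parser parser = Parser.createParser(inputHtml, "GB2312");
    HtmlUtils.registerTags(parser); // HtmlUtils is the poster's own class, not shown here
    NodeFilter tagNameFilter = new TagNameFilter("div");
    NodeList nodeList = parser.extractAllNodesThatMatch(tagNameFilter);
    String html = nodeList.toHtml();
    System.out.println(html);
    return html;
}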
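Wrapping an already decoded String as `new String(s.getBytes(), "GB2312")` re-encodes it with the platform's default charset and then decodes those bytes as GB2312; unless the default happens to match, the Chinese text is corrupted before the parser ever sees it. A minimal stdlib-only sketch of the effect, assuming a UTF-8 default (the default since Java 18); `CharsetRoundTrip` and `reDecode` are hypothetical names used only for illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Hypothetical helper: encode s with one charset, then decode the bytes with another
    static String reDecode(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        Charset gb2312 = Charset.forName("GB2312");
        String original = "中文";

        // Mismatched charsets: what new String(s.getBytes(), "GB2312") does
        // when the platform default is UTF-8
        String garbled = reDecode(original, StandardCharsets.UTF_8, gb2312);
        // Matched charsets: encoding and decoding agree, so the text survives
        String intact = reDecode(original, gb2312, gb2312);

        System.out.println(original.equals(garbled)); // false
        System.out.println(original.equals(intact));  // true
    }
}
```

Since the String read from the BufferedReader is already correctly decoded, the simplest fix is to hand it to the parser unchanged.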
I just want to get all of the div tags, so I used a TagNameFilter, but the output in the console is strange: it contains many repeated div tags with the same content.
I have tried many times and always get the same result, and I really don't know why. Could you please help me?
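A likely cause, assuming the page nests `div` elements (its markup is not shown here): `extractAllNodesThatMatch` with a `TagNameFilter` collects matches at every depth, so an outer div and every div inside it all land in the NodeList, and `toHtml()` on the outer div already prints its inner divs. The same content therefore appears once per enclosing div. The stdlib-only sketch below models this with a hypothetical `Element` tree rather than HTMLParser's own classes; `collectAll` mimics the recursive match, and `collectOutermost` keeps only top-level matches.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedDivDemo {
    // Hypothetical minimal element model standing in for HTMLParser's tag nodes
    static final class Element {
        final String name;
        final String text;
        final List<Element> children = new ArrayList<>();

        Element(String name, String text, Element... kids) {
            this.name = name;
            this.text = text;
            for (Element k : kids) children.add(k);
        }

        // Serialize this element together with everything nested inside it
        String toHtml() {
            StringBuilder sb = new StringBuilder("<" + name + ">" + text);
            for (Element c : children) sb.append(c.toHtml());
            return sb.append("</").append(name).append(">").toString();
        }
    }

    // Collect every matching element at any depth (pre-order)
    static void collectAll(Element e, String tag, List<Element> out) {
        if (e.name.equals(tag)) out.add(e);
        for (Element c : e.children) collectAll(c, tag, out);
    }

    // Collect only outermost matches: do not descend into an element that already matched
    static void collectOutermost(Element e, String tag, List<Element> out) {
        if (e.name.equals(tag)) {
            out.add(e);
            return;
        }
        for (Element c : e.children) collectOutermost(c, tag, out);
    }

    // Concatenate the HTML of each collected element, like printing a whole NodeList
    static String joinHtml(List<Element> list) {
        StringBuilder sb = new StringBuilder();
        for (Element e : list) sb.append(e.toHtml());
        return sb.toString();
    }

    public static void main(String[] args) {
        Element page = new Element("body", "",
            new Element("div", "A",
                new Element("div", "B",
                    new Element("div", "C"))));

        List<Element> all = new ArrayList<>();
        collectAll(page, "div", all);
        System.out.println(all.size()); // 3
        System.out.println(joinHtml(all));

        List<Element> outer = new ArrayList<>();
        collectOutermost(page, "div", outer);
        System.out.println(outer.size()); // 1
        System.out.println(joinHtml(outer));
    }
}
```

With three nested divs, printing the `collectAll` result emits the innermost div three times, which matches the "repeated div tags" symptom. In HTMLParser itself, one possible approach is to skip any matched node whose parent chain (via `getParent()`) already contains a div.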
Thanks and Best Regards
Jesse