[Htmlparser-user] A strange question?
From: hpq852 <hp...@gm...> - 2006-08-08 16:19:16
Hi All,

I have run into a very strange problem. My code is very simple, as follows:

    public void doTest() throws Exception {
        URL url = new URL("http://www.uume.com/play_CPRz8a2si4zK");
        InputStream in = url.openStream();
        BufferedReader br = new BufferedReader(new InputStreamReader(in, "GB2312"));
        String line = null;
        StringBuffer sb = new StringBuffer();
        while ((line = br.readLine()) != null) {
            sb.append(line);
            sb.append("\n");
        }
        extractText2(sb.toString());
    }

    public String extractText2(String inputHtml) throws Exception {
        Parser parser = Parser.createParser(new String(inputHtml.getBytes(),"GB2312"), "GB2312");
        HtmlUtils.registerTags(parser);
        NodeFilter tagNameFilter = new TagNameFilter("div");
        NodeList nodeList = parser.extractAllNodesThatMatch(tagNameFilter);
        System.out.println(nodeList.toHtml());
        return null;
    }

I just want to get all of the div tags, so I used a TagNameFilter. But the result printed to the console is strange: it includes many repeated div tags with the same content. I have tried many times and always get the same result, and I really don't know what the reason is.

Could you help me, please?

Thanks and Best Regards,
Jesse
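
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the page nests div tags inside other div tags, and extractAllNodesThatMatch collects matches recursively, then each inner div is returned once as its own node and again inside the HTML of every enclosing div. The sketch below illustrates that effect with a hypothetical stand-in tree (the Tag class and the collectAll, collectOutermost, sample, and join helpers are invented for illustration and are not part of HTMLParser).

```java
import java.util.ArrayList;
import java.util.List;

public class NestedDivDemo {

    /** Hypothetical stand-in for a parsed tag; NOT an HTMLParser class. */
    static final class Tag {
        final String name;
        final String text;
        final List<Tag> children = new ArrayList<>();

        Tag(String name, String text) {
            this.name = name;
            this.text = text;
        }

        Tag add(Tag child) {
            children.add(child);
            return this;
        }

        /** Serializes this tag together with all of its descendants. */
        String toHtml() {
            StringBuilder sb = new StringBuilder("<" + name + ">" + text);
            for (Tag child : children) {
                sb.append(child.toHtml());
            }
            return sb.append("</").append(name).append(">").toString();
        }
    }

    /** Recursive collection: a matching tag is added, then its children are searched as well. */
    static void collectAll(Tag node, String name, List<Tag> out) {
        if (node.name.equals(name)) {
            out.add(node);
        }
        for (Tag child : node.children) {
            collectAll(child, name, out);
        }
    }

    /** Outermost-only collection: stop descending once a match is found. */
    static void collectOutermost(Tag node, String name, List<Tag> out) {
        if (node.name.equals(name)) {
            out.add(node);
            return;
        }
        for (Tag child : node.children) {
            collectOutermost(child, name, out);
        }
    }

    /** Builds body > div("outer") > div("inner"). */
    static Tag sample() {
        return new Tag("body", "")
                .add(new Tag("div", "outer").add(new Tag("div", "inner")));
    }

    /** Concatenates the HTML of every collected tag, one per line. */
    static String join(List<Tag> tags) {
        StringBuilder sb = new StringBuilder();
        for (Tag t : tags) {
            sb.append(t.toHtml()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Tag> all = new ArrayList<>();
        collectAll(sample(), "div", all);
        System.out.print(join(all));        // the inner div is printed twice

        List<Tag> outermost = new ArrayList<>();
        collectOutermost(sample(), "div", outermost);
        System.out.print(join(outermost));  // the inner div is printed once
    }
}
```

If this is what is happening, printing each matched node's own text rather than the whole list's HTML, or keeping only the outermost matches, should remove the apparent duplicates.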