Re: [Htmlparser-user] Hi All
From: Somik R. <so...@ya...> - 2002-09-30 10:17:28
Hi Drew,

> I have a doubt... I am trying to extract only the text from the
> html pages, but I just could not get it. I have seen
> HTMLStringFilter.java, but I could not add it to the existing
> ones and run it, because in that one the HTML parser has only one
> argument passed, whereas the others also have the feedback. And
> if it should work, what feedback do we give? I mean a (T or i or s
> or l). And I guess the jar file does not have code for extracting
> the text.

Sorry about that - the web page hasn't been updated for a while. You
will need to create a feedback object. If you don't need feedback from
the parser, use the default one that we've provided in the
com.kizna.html.util package. Try this:

    // Below is some sample code to parse Yahoo.com and print only the
    // text information. This scanning will run faster, as there are no
    // scanners registered here.
    // Imports needed: java.util.Enumeration,
    // com.kizna.html.util.DefaultHTMLParserFeedback, plus the parser's
    // own HTMLParser, HTMLNode and HTMLStringNode classes.
    HTMLParser parser = new HTMLParser("http://www.yahoo.com",
                                       new DefaultHTMLParserFeedback());
    // In this example, none of the scanners need to be registered,
    // as a string node is not a tag to be scanned for.
    for (Enumeration e = parser.elements(); e.hasMoreElements();) {
        HTMLNode node = (HTMLNode) e.nextElement();
        if (node instanceof HTMLStringNode) {
            HTMLStringNode stringNode = (HTMLStringNode) node;
            System.out.println(stringNode.getText());
        }
    }

Let us know if you still face problems.

Regards,
Somik
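P.S. If you want the text collected into one string rather than printed node by node, the same loop can be wrapped in a helper. The following is only a rough sketch of that idea, not parser code: Node, StringNode, TagNode, extractText and sampleNodes are hypothetical stand-ins invented here so that the example compiles without the parser jar. With the real parser you would iterate over parser.elements() and test for HTMLStringNode, as in the snippet above.

```java
import java.util.Enumeration;
import java.util.Vector;

class TextOnlySketch {
    // Hypothetical stand-ins for the parser's node types (assumption:
    // they only mimic the shape of HTMLNode / HTMLStringNode).
    static class Node {}

    static class StringNode extends Node {
        private final String text;
        StringNode(String text) { this.text = text; }
        String getText() { return text; }
    }

    static class TagNode extends Node {}

    // Walk the enumeration and keep only the text of string nodes,
    // one line per node; every other node type is skipped.
    static String extractText(Enumeration<Node> nodes) {
        StringBuilder sb = new StringBuilder();
        while (nodes.hasMoreElements()) {
            Node node = nodes.nextElement();
            if (node instanceof StringNode) {
                sb.append(((StringNode) node).getText()).append('\n');
            }
        }
        return sb.toString();
    }

    // Hypothetical sample input: tag, text, tag, text.
    static Enumeration<Node> sampleNodes() {
        Vector<Node> v = new Vector<>();
        v.add(new TagNode());
        v.add(new StringNode("Hello"));
        v.add(new TagNode());
        v.add(new StringNode("world"));
        return v.elements();
    }

    public static void main(String[] args) {
        System.out.print(extractText(sampleNodes()));
    }
}
```

Keeping the instanceof test inside one helper means the calling code never has to know about tag nodes at all; it just receives the page text.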