Re: [Htmlparser-developer] Tags
From: Elliot H. <ell...@gm...> - 2010-09-10 17:53:34
I'm not sure exactly what you mean by "analyzes," but I think the answer to your question is: all of them. Here is an example that might help you get started. You'll want to make sure you understand the main interfaces provided in the API (e.g., Node, NodeFilter).

import org.htmlparser.Parser;
import org.htmlparser.filters.NodeClassFilter;
import org.htmlparser.lexer.Lexer;
import org.htmlparser.lexer.Page;
import org.htmlparser.tags.Html;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class Example {

    public static void main(String... params) {
        // Alternative: build the parser with an explicit charset.
        // Parser parser = getParser(getHtml(), "UTF-8");
        Parser parser = getParser(getHtml());

        try {
            // Collect every node that is an <html> tag and print it.
            NodeList list = parser.extractAllNodesThatMatch(new NodeClassFilter(Html.class));
            for (int i = 0; i < list.size(); i++) {
                Html html = (Html) list.elementAt(i);
                System.out.println(html.toString());
            }
        } catch (ParserException e) {
            e.printStackTrace();
        }
    }

    // Builds a parser over an in-memory string with an explicit charset.
    private static Parser getParser(String html, String charset) {
        return new Parser(new Lexer(new Page(html, charset)));
    }

    // Builds a parser over an in-memory string using the default settings.
    private static Parser getParser(String html) {
        Parser parser = new Parser();
        try {
            parser.setInputHTML(html);
        } catch (ParserException e) {
            e.printStackTrace();
        }
        return parser;
    }

    // Sample document, including a made-up tag to show that unknown tags are still parsed.
    private static String getHtml() {
        return new StringBuilder()
            .append("\n<html>")
            .append("\n\t<head>")
            .append("\n\t\t<title>Html Parser Example</title>")
            .append("\n\t</head>")
            .append("\n\t<body>")
            .append("\n\t\t<p>Hello <span>World</span>!</p>")
            .append("\n\t\t<thisIsAMadeUpTag name=\"don't try this at home!\">but html parser still understands it</thisIsAMadeUpTag>")
            .append("\n\t</body>")
            .append("\n</html>")
            .toString();
    }
}

On Fri, Sep 10, 2010 at 4:27 AM, Enrique Estelles <kik...@gm...> wrote:
> Hello,
>
> can anybody tell me which html tags HtmlParser analyzes in order to extract
> text from a web page???
>
> Thank you!!!
--
Elliot
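To make the "all of them" answer concrete, here is a minimal, JDK-only sketch of tag-agnostic text extraction: it collects whatever text sits between tags without ever looking at the tag names. This is not HtmlParser's implementation and uses none of its classes. The class name TagAgnosticText and the method extractText are hypothetical, and the scanner deliberately ignores comments, scripts, and entities.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy class for illustration only; not part of HtmlParser.
class TagAgnosticText {

    /** Collects the text found between tags, ignoring what the tags are called. */
    static List<String> extractText(String html) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideTag = false;
        char quote = 0; // active quote character inside an attribute value, else 0

        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (insideTag) {
                if (quote != 0) {
                    // Inside a quoted attribute value: only the matching quote ends it.
                    if (c == quote) {
                        quote = 0;
                    }
                } else if (c == '"' || c == '\'') {
                    quote = c;
                } else if (c == '>') {
                    insideTag = false;
                }
            } else if (c == '<') {
                // A tag starts: store the text gathered so far.
                flush(current, chunks);
                insideTag = true;
            } else {
                current.append(c);
            }
        }
        flush(current, chunks);
        return chunks;
    }

    // Adds the trimmed buffer to the list if it is non-empty, then clears it.
    private static void flush(StringBuilder current, List<String> chunks) {
        String text = current.toString().trim();
        if (!text.isEmpty()) {
            chunks.add(text);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello <span>World</span>!</p>"
            + "<thisIsAMadeUpTag name=\"don't try this at home!\">"
            + "but html parser still understands it</thisIsAMadeUpTag></body></html>";
        for (String chunk : extractText(html)) {
            System.out.println(chunk);
        }
    }
}
```

Running main prints four lines: "Hello", "World", "!", and "but html parser still understands it". The made-up tag contributes text just like the standard ones, and the apostrophe inside its quoted attribute does not confuse the scanner. For real pages, use the library's own parser as in the example above rather than a hand-rolled scanner like this.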