Re: [Htmlparser-developer] Tags
From: Elliot H. <ell...@gm...> - 2010-09-10 17:53:34
I don't know exactly what you mean by "analyzes," but I think the answer to
your question is all of them.
Here is an example that might help you get started. Make sure you understand
the main interfaces the API provides (e.g., Node and NodeFilter).
import org.htmlparser.Parser;
import org.htmlparser.filters.NodeClassFilter;
import org.htmlparser.lexer.Lexer;
import org.htmlparser.lexer.Page;
import org.htmlparser.tags.Html;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;
public class Example {

    public static void main(String... params) {
        // Alternative: build the parser from a Lexer/Page with an explicit charset.
        // Parser parser = getParser(getHtml(), "UTF-8");
        Parser parser = getParser(getHtml());
        try {
            // Collect every node that was parsed as an <html> tag.
            NodeList list = parser.extractAllNodesThatMatch(
                    new NodeClassFilter(Html.class));
            for (int i = 0; i < list.size(); i++) {
                Html html = (Html) list.elementAt(i);
                System.out.println(html.toString());
            }
        } catch (ParserException e) {
            e.printStackTrace();
        }
    }

    // Builds a parser over an in-memory page with the given charset.
    private static Parser getParser(String html, String charset) {
        return new Parser(new Lexer(new Page(html, charset)));
    }

    // Builds a parser and feeds it the HTML string directly.
    private static Parser getParser(String html) {
        Parser parser = new Parser();
        try {
            parser.setInputHTML(html);
        } catch (ParserException e) {
            e.printStackTrace();
        }
        return parser;
    }

    // Sample document, including a made-up tag.
    private static String getHtml() {
        return new StringBuilder()
                .append("\n<html>")
                .append("\n\t<head>")
                .append("\n\t\t<title>Html Parser Example</title>")
                .append("\n\t</head>")
                .append("\n\t<body>")
                .append("\n\t\t<p>Hello <span>World</span>!</p>")
                .append("\n\t\t<thisIsAMadeUpTag name=\"don't try this at home!\">"
                        + "but html parser still understands it</thisIsAMadeUpTag>")
                .append("\n\t</body>")
                .append("\n</html>")
                .toString();
    }
}
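
To make the "all of them" point a bit more concrete, here is a rough, self-contained sketch of the general idea behind filter-based extraction: walk the node tree, ask a filter about every node, and keep the matches. SketchNode, SketchFilter, extractAll and collectText are hypothetical stand-ins I made up for illustration. They are not HTMLParser's Node or NodeFilter, and this is not a claim about how the library is implemented internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterSketch {

    // Hypothetical stand-in for a parsed node; NOT org.htmlparser.Node.
    static final class SketchNode {
        final String tagName;             // null for a text node
        final String text;                // null for a tag node
        final List<SketchNode> children;

        private SketchNode(String tagName, String text, List<SketchNode> children) {
            this.tagName = tagName;
            this.text = text;
            this.children = children;
        }

        static SketchNode tag(String name, SketchNode... children) {
            return new SketchNode(name, null, Arrays.asList(children));
        }

        static SketchNode text(String text) {
            return new SketchNode(null, text, new ArrayList<SketchNode>());
        }

        boolean isText() {
            return tagName == null;
        }
    }

    // Hypothetical stand-in for a filter; NOT org.htmlparser.NodeFilter.
    interface SketchFilter {
        boolean accept(SketchNode node);
    }

    // Depth-first walk that keeps every node the filter accepts.
    static List<SketchNode> extractAll(SketchNode root, SketchFilter filter) {
        List<SketchNode> matches = new ArrayList<>();
        collect(root, filter, matches);
        return matches;
    }

    private static void collect(SketchNode node, SketchFilter filter, List<SketchNode> out) {
        if (filter.accept(node)) {
            out.add(node);
        }
        for (SketchNode child : node.children) {
            collect(child, filter, out);
        }
    }

    // Text is gathered from text nodes no matter which tag encloses them.
    static List<String> collectText(SketchNode root) {
        List<String> result = new ArrayList<>();
        for (SketchNode node : extractAll(root, SketchNode::isText)) {
            result.add(node.text);
        }
        return result;
    }

    // Mirrors <p>Hello <span>World</span>!</p> from the example above.
    static SketchNode sampleTree() {
        return SketchNode.tag("p",
                SketchNode.text("Hello "),
                SketchNode.tag("span", SketchNode.text("World")),
                SketchNode.text("!"));
    }

    public static void main(String[] args) {
        System.out.println(String.join("", collectText(sampleTree())));
    }
}
```

The filter here only asks "is this a text node?", so the enclosing tag name never matters. Swapping in a different filter (for instance one that checks tagName) would narrow the result to specific tags.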
On Fri, Sep 10, 2010 at 4:27 AM, Enrique Estelles <kik...@gm...> wrote:
> Hello,
>
> can anybody tell me which html tags HtmlParser analyzes in order to extract
> text from a web page???
>
> Thank you!!!
--
Elliot