The URL is http://www.time.com/time/election2004
I would like to get the story title, abstract and full story text by following the story's title URL.
for example:
story title: Who Stretches the Truth?
story abstract: TIME lays out the facts behind what both candidates said during the first match-up
story title URL: http://www.time.com/time/election2004/article/0,18471,709071,00.html
As I am just getting started with HTMLParser, could you please tell me how to achieve that? Thanks a lot.
The information you want is in the meta tags in the HTML header:
<meta name="HEAD" content="Who Stretches the Truth?">
<meta name="DESCRIPTION" content="TIME lays out the facts behind what both candidates said during the first match-up">
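To check that the title and abstract really can be read straight from those two meta tags, here is a minimal JDK-only sketch. It does not use HTMLParser: it swaps in javax.swing.text.html.parser.ParserDelegator, which ships with the JDK, and the class name MetaSketch and method metaContent are made up for illustration. The HTML string is a trimmed stand-in for the article's head, not a copy of the real page.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class MetaSketch {
    // Collects name -> content for every <meta name=... content=...> tag, in document order.
    static Map<String, String> metaContent(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <meta> is an empty tag, so it arrives here rather than in handleStartTag
                if (tag == HTML.Tag.META) {
                    Object name = attrs.getAttribute(HTML.Attribute.NAME);
                    Object content = attrs.getAttribute(HTML.Attribute.CONTENT);
                    if (name != null && content != null) {
                        result.put(name.toString(), content.toString());
                    }
                }
            }
        };
        try {
            // true = ignore any charset declared in a <meta http-equiv> tag
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
            + "<meta name=\"HEAD\" content=\"Who Stretches the Truth?\">"
            + "<meta name=\"DESCRIPTION\" content=\"TIME lays out the facts behind what both"
            + " candidates said during the first match-up\">"
            + "</head><body></body></html>";
        Map<String, String> meta = metaContent(html);
        System.out.println("title:    " + meta.get("HEAD"));
        System.out.println("abstract: " + meta.get("DESCRIPTION"));
    }
}
```

Looking the values up by their "name" attribute, rather than by position, means the result does not depend on which meta tag comes first in the page.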
A filter extracting these two would be:
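    import org.htmlparser.NodeFilter;
    import org.htmlparser.filters.AndFilter;
    import org.htmlparser.filters.HasAttributeFilter;
    import org.htmlparser.filters.OrFilter;
    import org.htmlparser.filters.TagNameFilter;

    NodeFilter filter =
        new OrFilter ( // accept a node that matches either of the following:
            new AndFilter ( // a node must satisfy both of the following:
                new TagNameFilter ("META"), // the tag name "META"
                new HasAttributeFilter ("name", "HEAD")), // an attribute "name" with the value "HEAD"
            new AndFilter (
                new TagNameFilter ("META"),
                new HasAttributeFilter ("name", "DESCRIPTION")));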
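One way to think about how these filters combine: each NodeFilter is essentially a yes/no test on a node, AndFilter requires both tests to pass, and OrFilter requires either. The sketch below models that with java.util.function.Predicate instead of the real HTMLParser classes; the Node record, FILTER and matching are hypothetical stand-ins (Java 16+ for the record), and the sample page is invented so that DESCRIPTION happens to come before HEAD.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FilterSketch {
    // Hypothetical stand-in for a parsed tag: its name plus its attributes.
    record Node(String tagName, Map<String, String> attributes) {}

    // Plays the role of TagNameFilter: matches on the tag name (case-insensitive here).
    static Predicate<Node> tagName(String name) {
        return n -> n.tagName().equalsIgnoreCase(name);
    }

    // Plays the role of HasAttributeFilter (attribute, value): the attribute must carry exactly this value.
    static Predicate<Node> hasAttribute(String attribute, String value) {
        return n -> value.equals(n.attributes().get(attribute));
    }

    // OrFilter (AndFilter (...), AndFilter (...)) written with Predicate.and / Predicate.or.
    static final Predicate<Node> FILTER =
        tagName("META").and(hasAttribute("name", "HEAD"))
            .or(tagName("META").and(hasAttribute("name", "DESCRIPTION")));

    // Keeps every matching node, walking the list front to back (i.e. in document order).
    static List<Node> matching(List<Node> nodes, Predicate<Node> filter) {
        return nodes.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical page in which DESCRIPTION happens to precede HEAD.
        List<Node> page = List.of(
            new Node("TITLE", Map.of()),
            new Node("META", Map.of("name", "DESCRIPTION", "content",
                "TIME lays out the facts behind what both candidates said during the first match-up")),
            new Node("META", Map.of("name", "HEAD", "content", "Who Stretches the Truth?")),
            new Node("META", Map.of("name", "KEYWORDS", "content", "election")));

        for (Node n : matching(page, FILTER)) {
            System.out.println(n.attributes().get("name") + ": " + n.attributes().get("content"));
        }
    }
}
```

On this invented page the loop prints the DESCRIPTION entry before the HEAD entry, because matches follow the page order rather than the order in which the filters were written.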
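The NodeList you get from parser.extractAllNodesThatMatch (filter) would contain these two nodes. They come back in the order they appear in the page, not the order of the filters, so index 0 is the title only if the HEAD meta tag precedes the DESCRIPTION one. (Parser and Tag are in org.htmlparser, NodeList in org.htmlparser.util.)
You could then get the text from them with:
    Parser parser = new Parser ("http://www.time.com/time/election2004/article/0,18471,709071,00.html");
    NodeList list = parser.extractAllNodesThatMatch (filter);
    String title = ((Tag)list.elementAt (0)).getAttribute ("content");
    String summary = ((Tag)list.elementAt (1)).getAttribute ("content"); // "abstract" is a reserved word in Java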
Derrick
Thank you very much! It works!