HTML Parser / Discussion / Help: pls tell me how to parse this web page

SpencerWHJ - 2004-10-06

The URL is http://www.time.com/time/election2004
I would like to get the story title, the abstract, and the full story text through the story's title URL.
for example:
story title: Who Stretches the Truth?
story abstract: TIME lays out the facts behind what both candidates said during the first match-up
Story titleURL: http://www.time.com/time/election2004/article/0,18471,709071,00.html

As I am new to htmlParser, could you please tell me how to achieve that? Thanks a lot.

- Derrick Oswald - 2004-10-09
  
  The information you want is in the meta tags in the HTML header:
  
  <meta name="HEAD" content="Who Stretches the Truth?">
  <meta name="DESCRIPTION" content="TIME lays out the facts behind what both candidates said during the first match-up">
  
  A filter extracting these two would be:
  
  NodeFilter filter =
     new OrFilter ( // get both nodes matching either of the following:
         new AndFilter ( // a node must be or have both the following:
            new TagNameFilter ("META"), // the name "META"
            new HasAttributeFilter ("name", "HEAD")), // an attribute "name" with the value "HEAD"
         new AndFilter (
            new TagNameFilter ("META"),
            new HasAttributeFilter ("name", "DESCRIPTION")));
  
  The NodeList you get from Parser.extractAllNodesThatMatch (filter) would contain these two nodes.
  You could then get the text from them with:
     String title = ((Tag)list.elementAt (0)).getAttribute ("content");
     String summary = ((Tag)list.elementAt (1)).getAttribute ("content"); // "abstract" is a reserved word in Java, so use another name
  
  Derrick
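
Reading the values by position 0 and 1 relies on the HEAD tag coming before the DESCRIPTION tag in the page, so looking the values up by the meta "name" may be more robust. With HTML Parser that would presumably mean checking getAttribute ("name") on each returned Tag. The sketch below only illustrates the lookup-by-name idea and is not HTML Parser code: it uses the standard library alone, with a simple regular expression that assumes double-quoted attributes in the order name, then content, as in the two tags quoted above. The class and method names (MetaByName, metaByName) are made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaByName {
    // Assumption: tags look like <meta name="..." content="...">, double-quoted,
    // with name before content. This is not a general HTML parser.
    private static final Pattern META = Pattern.compile(
        "<meta\\s+name=\"([^\"]*)\"\\s+content=\"([^\"]*)\"\\s*/?>",
        Pattern.CASE_INSENSITIVE);

    /** Maps each meta name (upper-cased) to its content; the first occurrence wins. */
    static Map<String, String> metaByName(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            String name = m.group(1).toUpperCase(Locale.ROOT);
            result.putIfAbsent(name, m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // DESCRIPTION deliberately comes first here to show that order does not matter.
        String html =
            "<html><head>\n"
            + "<meta name=\"DESCRIPTION\" content=\"TIME lays out the facts behind what both candidates said during the first match-up\">\n"
            + "<meta name=\"HEAD\" content=\"Who Stretches the Truth?\">\n"
            + "</head><body></body></html>";

        Map<String, String> meta = metaByName(html);
        String title = meta.get("HEAD");
        String summary = meta.get("DESCRIPTION");
        System.out.println("title:   " + title);
        System.out.println("summary: " + summary);
    }
}
```

Keying on the name means a missing tag shows up as a null value rather than as the wrong string in the wrong variable.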
  
- SpencerWHJ - 2004-10-11
  
  thank you very much! It works!
  