pls tell me how to parse this web page

Help
SpencerWHJ
2004-10-06
2013-04-27
  • SpencerWHJ

    SpencerWHJ - 2004-10-06

    the URL is http://www.time.com/time/election2004
    I would like to get the story title, the abstract, and the full story text through the story's title URL
    for example:
    story title: Who Stretches the Truth?
    story abstract: TIME lays out the facts behind what both candidates said during the first match-up
    Story titleURL: http://www.time.com/time/election2004/article/0,18471,709071,00.html

    As I am just getting started with htmlParser, could you please tell me how to achieve that? Thanks a lot

     
    • Derrick Oswald

      Derrick Oswald - 2004-10-09

      The information you want is in the meta tags in the HTML header:

      <meta name="HEAD" content="Who Stretches the Truth?">
      <meta name="DESCRIPTION" content="TIME lays out the facts behind what both candidates said during the first match-up">

      A filter extracting these two would be:

      NodeFilter filter =
         new OrFilter ( // match nodes that satisfy either of the following:
             new AndFilter ( // a node must satisfy both of the following:
                new TagNameFilter ("META"), // the name "META"
                new HasAttributeFilter ("name", "HEAD")), // an attribute "name" with the value "HEAD"
             new AndFilter (
                new TagNameFilter ("META"),
                new HasAttributeFilter ("name", "DESCRIPTION")));

      The NodeList you get from Parser.extractAllNodesThatMatch (filter) would contain these two nodes,
      in the order they appear in the page. You could then get the text from them with:
         String title = ((Tag)list.elementAt (0)).getAttribute ("content");
         // "abstract" is a reserved word in Java, so the variable needs another name
         String storyAbstract = ((Tag)list.elementAt (1)).getAttribute ("content");
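The positional lookup above only works if the HEAD tag comes before the DESCRIPTION tag in the page. As a rough sketch of a lookup that is keyed by the name attribute instead, here is a standard-library-only version using java.util.regex. This is not HTMLParser's API; the class and method names are made up for illustration, and the regex assumes double-quoted attributes with name written before content, as in the two tags quoted above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HTMLParser: collects <meta name=... content=...> pairs.
class MetaContentSketch {

    // Assumption: attributes are double-quoted and "name" precedes "content".
    private static final Pattern META = Pattern.compile(
        "<meta\\s+name=\"([^\"]*)\"\\s+content=\"([^\"]*)\"\\s*/?>",
        Pattern.CASE_INSENSITIVE);

    // Returns a map from upper-cased meta name to its content value.
    static Map<String, String> metaContentByName(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            result.put(m.group(1).toUpperCase(Locale.ROOT), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // The two meta tags quoted earlier in this thread, wrapped in a minimal page.
        String html = "<html><head>\n"
            + "<meta name=\"HEAD\" content=\"Who Stretches the Truth?\">\n"
            + "<meta name=\"DESCRIPTION\" content=\"TIME lays out the facts behind "
            + "what both candidates said during the first match-up\">\n"
            + "</head><body></body></html>";

        Map<String, String> meta = metaContentByName(html);
        System.out.println("title: " + meta.get("HEAD"));
        System.out.println("abstract: " + meta.get("DESCRIPTION"));
    }
}
```

With HTMLParser itself, the same idea would be to read each matched tag's name attribute and decide from that which content value is the title and which is the abstract, rather than relying on list position.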

      Derrick

       
    • SpencerWHJ

      SpencerWHJ - 2004-10-11

      thank you very much! It works!

       
