
Getting previous sibling tag (newbie)

Help
2005-09-13
2013-04-27
  • Shuvadeep Lahiri

    Hello all,
    I have found the <tr> tag within the first <table> of my page whose content matches a particular regular expression. The relevant part of my program is:

        Lexer lex = new Lexer(alltableTags[0].toHtml());
        Parser newparse = new Parser(lex);
        NodeList list = newparse.extractAllNodesThatMatch(
            new AndFilter(new TagNameFilter("TR"),
                          new HasChildFilter(new RegexFilter("\\d{1,}[.]\\d{0,2}"), true)));

    Now I want to extract all the <tr> tags within the table that precede this particular <tr> tag (the one contained in the list), or perhaps just the <tr> tag immediately before it.
    Could anybody help me with it?
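To check which cell texts the regular expression in the question actually selects, here is a small standalone sketch using plain java.util.regex rather than HTMLParser. It assumes the filter looks for the pattern anywhere in the text (find-style matching, not a whole-string match); the class name PriceRegexDemo and the sample strings are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: which strings contain a match for the
// pattern used in the question? Assumes find() semantics.
class PriceRegexDemo {
    // One or more digits, a literal '.', then zero to two digits.
    static final Pattern PRICE = Pattern.compile("\\d{1,}[.]\\d{0,2}");

    // True if the pattern occurs anywhere in the text.
    static boolean containsMatch(String text) {
        return PRICE.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] cells = {"Total: 12.50", "Qty 3", "7.", ".25", "no digits"};
        for (String cell : cells)
            System.out.println(cell + " -> " + containsMatch(cell));
    }
}
```

Note that "7." matches because the trailing \d{0,2} may match zero digits, while ".25" does not because at least one digit is required before the dot.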

     
    • Derrick Oswald

      Derrick Oswald - 2005-09-13

      You should be able to climb the tree by using getParent() until you find the <table> tag. From there you can search again with a filter that stops accepting nodes once it reaches your original <tr>,
      or you can just hunt manually through the node list.

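      import org.htmlparser.Node;
      import org.htmlparser.NodeFilter;

      // Accepts every node until it meets the bookmarked node, then rejects
      // that node and everything after it.
      public class BeforeFilter implements NodeFilter
      {
          Node mBookmark;
          boolean mTriggered;
          public BeforeFilter (Node node)
          {
              mBookmark = node;
              mTriggered = false;
          }
          public boolean accept (Node node)
          {
              boolean ret;

              ret = false;
              if (!mTriggered)
              {
                  if (node == mBookmark)
                      mTriggered = true; // reached the original <tr>: stop accepting
                  else
                      ret = true;        // still before it
              }

              return (ret);
          }
      }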

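      // row: the <tr> found by the regex search (first node in list)
      Tag row = (Tag)list.elementAt (0);
      NodeFilter filter = new AndFilter (
        new BeforeFilter (row),
        new TagNameFilter ("TR"));
      // climb to the enclosing <table>; getParent () returns a Node, so cast back
      Tag table = row;
      while (!table.getTagName ().equals ("TABLE"))
          table = (Tag)table.getParent ();
      NodeList priors = table.getChildren ().extractAllNodesThatMatch (filter, true);
      // the <tr> just before row (null if row is the first one)
      Node previous = (0 == priors.size ()) ? null : priors.elementAt (priors.size () - 1);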
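For the "hunt manually" route, the same idea can be written without a filter object: climb to the enclosing TABLE, then walk its subtree in document order, collecting TR nodes until the original row is reached. To keep the sketch standalone it uses a hypothetical SimpleNode class instead of HTMLParser's Node and NodeList types; with the real library the walk would go over the node lists returned by getChildren().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parsed HTML node (NOT HTMLParser's Node):
// a tag name such as "TR" (or "#text"), a parent link and children.
class SimpleNode {
    final String name;
    SimpleNode parent;
    final List<SimpleNode> children = new ArrayList<>();

    SimpleNode(String name) { this.name = name; }

    // Attach a child and return it, for compact tree building.
    SimpleNode add(SimpleNode child) {
        child.parent = this;
        children.add(child);
        return child;
    }
}

class PriorRows {
    // Climb from the row to the nearest enclosing TABLE (null if none).
    static SimpleNode enclosingTable(SimpleNode node) {
        SimpleNode n = node.parent;
        while (n != null && !n.name.equals("TABLE"))
            n = n.parent;
        return n;
    }

    // All TR nodes that come before the bookmark, in document order.
    static List<SimpleNode> rowsBefore(SimpleNode bookmark) {
        List<SimpleNode> out = new ArrayList<>();
        SimpleNode table = enclosingTable(bookmark);
        if (table != null)
            collect(table.children, bookmark, out);
        return out;
    }

    // Pre-order walk; returns true once the bookmark is reached so that
    // every enclosing level stops as well.
    private static boolean collect(List<SimpleNode> nodes, SimpleNode bookmark,
                                   List<SimpleNode> out) {
        for (SimpleNode n : nodes) {
            if (n == bookmark)
                return true;
            if (n.name.equals("TR"))
                out.add(n);
            if (collect(n.children, bookmark, out))
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        SimpleNode table = new SimpleNode("TABLE");
        SimpleNode tbody = table.add(new SimpleNode("TBODY"));
        SimpleNode r1 = tbody.add(new SimpleNode("TR"));
        tbody.add(new SimpleNode("#text"));   // whitespace between rows
        SimpleNode r2 = tbody.add(new SimpleNode("TR"));
        SimpleNode r3 = tbody.add(new SimpleNode("TR"));

        List<SimpleNode> before = rowsBefore(r3);
        System.out.println(before.size());                        // 2
        System.out.println(before.get(0) == r1);                  // true
        System.out.println(before.get(before.size() - 1) == r2);  // true: row just before r3
    }
}
```

The last element of the returned list is the row immediately before the original one, which covers the second part of the question.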

       
    • Shuvadeep Lahiri

      Thanks again for providing a wonderful solution. It works and eases my headaches.
            But I still think HTMLParser should have some built-in methods for conveniently traversing the preceding or following siblings of a tag you have already reached, so that we need not create a new class for it.
                 Thank you.
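A stateless helper of the kind requested here can be sketched outside the filter framework. This is not part of HTMLParser: HtmlNode and Siblings are hypothetical classes on a toy tree. The helper looks for the nearest earlier (or later) sibling with a given tag name and skips anything in between, because the gap between two rows of parsed HTML is often a whitespace text node rather than another tag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal tree node, standing in for a parsed HTML node.
class HtmlNode {
    final String name;   // tag name, or "#text" for text
    HtmlNode parent;
    final List<HtmlNode> children = new ArrayList<>();

    HtmlNode(String name) { this.name = name; }

    HtmlNode add(HtmlNode child) {
        child.parent = this;
        children.add(child);
        return child;
    }
}

class Siblings {
    // Nearest earlier sibling with the given tag name, skipping anything
    // else (such as whitespace text between rows); null if there is none.
    static HtmlNode previous(HtmlNode node, String tagName) {
        return scan(node, tagName, -1);
    }

    // The same search in the forward direction.
    static HtmlNode next(HtmlNode node, String tagName) {
        return scan(node, tagName, +1);
    }

    private static HtmlNode scan(HtmlNode node, String tagName, int step) {
        if (node.parent == null)
            return null;
        List<HtmlNode> kids = node.parent.children;
        int i = kids.indexOf(node);   // identity, since equals() is not overridden
        for (int j = i + step; j >= 0 && j < kids.size(); j += step)
            if (kids.get(j).name.equals(tagName))
                return kids.get(j);
        return null;
    }
}
```

Unlike the BeforeFilter approach, this only looks at siblings under the same parent, and because it keeps no state it can be called any number of times.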

       
      • Derrick Oswald

        Derrick Oswald - 2005-09-14

        So far, all the example filters are stateless. That is, the filter works the same way all the time, no matter what.

        Adding state, like the mTriggered member of the example I provided, means that you need to create a new one for the next parse, since the one you have just used is 'spent', like a bullet. And you have to create the whole filter tree again if it's part of a larger set.
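To make the 'spent' behaviour concrete, here is a minimal standalone sketch that uses java.util.function.Predicate in place of HTMLParser's NodeFilter (BeforePredicate and SpentFilterDemo are hypothetical names). Reusing one instance returns nothing on the second pass; building a fresh filter chain for every pass, here through a Supplier, is the "create a new one" workaround described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

// The BeforeFilter idea as a plain Predicate: accepts items until it
// sees the bookmark (compared by identity), then rejects everything.
class BeforePredicate<T> implements Predicate<T> {
    private final T bookmark;
    private boolean triggered = false;

    BeforePredicate(T bookmark) { this.bookmark = bookmark; }

    @Override
    public boolean test(T item) {
        if (triggered)
            return false;
        if (item == bookmark) {
            triggered = true;
            return false;
        }
        return true;
    }
}

class SpentFilterDemo {
    // Apply the filter to the items in order, keeping accepted ones.
    static <T> List<T> select(List<T> items, Predicate<T> filter) {
        List<T> out = new ArrayList<>();
        for (T item : items)
            if (filter.test(item))
                out.add(item);
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row1", "row2", "row3");
        String bookmark = rows.get(2);

        // Reusing one instance: the second pass finds nothing.
        Predicate<String> once = new BeforePredicate<>(bookmark);
        System.out.println(select(rows, once));   // [row1, row2]
        System.out.println(select(rows, once));   // []

        // Rebuilding the whole (composed) filter for every pass.
        Supplier<Predicate<String>> fresh =
            () -> new BeforePredicate<String>(bookmark).and(s -> s.startsWith("row"));
        System.out.println(select(rows, fresh.get()));   // [row1, row2]
        System.out.println(select(rows, fresh.get()));   // [row1, row2]
    }
}
```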

        The alternative is to provide a reset() method, or a begin()/end() pair of methods to allow state manipulation. The task then is to have the filters propagate the begin/end signals to their children (think of the AndFilter needing to tell all its predicates). So instead of a lightweight filter satisfying one method, the interface is three methods that have some crucial functionality behind them. This may be ameliorated with a base class handling the message chaining, but at the cost of another class with some type of list 'predicates' member variable.
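The reset() alternative might look roughly like this. None of these types exist in HTMLParser: ResettableFilter, CompositeFilter, AndAll, Before, StartsWith and Filters are hypothetical, and CompositeFilter plays the role of the base class with a 'predicates' list that forwards reset() to every child.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical filter interface with an explicit reset hook.
// This is NOT the HTMLParser NodeFilter interface.
interface ResettableFilter<T> {
    boolean accept(T item);
    void reset();   // return the filter to its initial state
}

// Base class that holds the child predicates and forwards reset() to each,
// so individual composite filters need not repeat the chaining.
abstract class CompositeFilter<T> implements ResettableFilter<T> {
    protected final List<ResettableFilter<T>> predicates;

    @SafeVarargs
    CompositeFilter(ResettableFilter<T>... predicates) {
        this.predicates = Arrays.asList(predicates);
    }

    @Override
    public void reset() {
        for (ResettableFilter<T> p : predicates)
            p.reset();
    }
}

// AND of all predicates; stops at the first one that rejects the item.
class AndAll<T> extends CompositeFilter<T> {
    @SafeVarargs
    AndAll(ResettableFilter<T>... predicates) { super(predicates); }

    @Override
    public boolean accept(T item) {
        for (ResettableFilter<T> p : predicates)
            if (!p.accept(item))
                return false;
        return true;
    }
}

// Stateful: accepts items until the bookmark (by identity) is seen.
class Before<T> implements ResettableFilter<T> {
    private final T bookmark;
    private boolean triggered = false;

    Before(T bookmark) { this.bookmark = bookmark; }

    @Override
    public boolean accept(T item) {
        if (triggered)
            return false;
        if (item == bookmark) {
            triggered = true;
            return false;
        }
        return true;
    }

    @Override
    public void reset() { triggered = false; }
}

// Stateless: reset() has nothing to do.
class StartsWith implements ResettableFilter<String> {
    private final String prefix;

    StartsWith(String prefix) { this.prefix = prefix; }

    @Override
    public boolean accept(String item) { return item.startsWith(prefix); }

    @Override
    public void reset() { }
}

class Filters {
    // Apply a filter to a list in order, keeping the accepted items.
    static <T> List<T> select(List<T> items, ResettableFilter<T> filter) {
        List<T> out = new ArrayList<>();
        for (T item : items)
            if (filter.accept(item))
                out.add(item);
        return out;
    }
}
```

In this sketch AndAll stops at the first predicate that rejects an item, so a stateful predicate placed after a stateless one only sees the items that got through; in the thread's example that would be harmless because the bookmark is itself a TR.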

        These enhancements have been considered and the lightweight implementation was chosen for simplicity. It could be revisited if demand warrants. Enter it in the Request For Enhancement page if you want (http://sourceforge.net/tracker/?group_id=24399&atid=381402).

         
      • Ian Macfarlane

        Ian Macfarlane - 2005-10-26
         
