
extracting stuff before/after a filter match

Help
zerodrift
2006-02-03
2013-04-27
  • zerodrift

    zerodrift - 2006-02-03

    I'd like to match items before or after a specific filter match, e.g., the next three lines after a StringFilter match. 

    The HTML code is:

    <P><CENTER>Loads are calculated from raw telemetry data and are approximate.</CENTER>
    <CENTER>The displayed values are NOT official PJM Loads.</CENTER>

    <BR><BR><BR>

    <P><CENTER><H2>Current PJM Transmission Limits</H2></CENTER>
    <P>Contingency WYLIERID500 KV  WYLIERID TRAN  5  (IROL) 
    <P>Monitor WYLIERID500 KV  WYLIERID TRAN  7  (IROL)   -> Redispatch

    </BODY>
    </HTML>

    I want to extract everything (without the HTML markup) after "Current PJM Transmission Limits". The text below the title may change, but the "Current PJM Transmission Limits" title itself will always be the same.

    Any help would be appreciated!

     
    • Derrick Oswald

      Derrick Oswald - 2006-02-05

      Yes, just use the children of the parent of the node you have.

      Two methods were recently added to AbstractNode (which TextNode inherits from) to handle this:

      getPreviousSibling() and getNextSibling()

      These are only available in the latest Integration Build.
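Derrick's suggestion (take the parent's children and look past the node you matched) can be illustrated without the library. The snippet below is a minimal sketch assuming a hypothetical SiblingNode class, not htmlparser's Node API: a next-sibling lookup finds the node's position in its parent's child list and returns the entry after it, and the walk stops after a limit, here three to match the "next three lines" in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal tree node for illustration; NOT part of htmlparser.
class SiblingNode {
    final String text;
    SiblingNode parent;
    final List<SiblingNode> children = new ArrayList<>();

    SiblingNode(String text) { this.text = text; }

    // Appends a child and returns this (the parent) so siblings can be chained.
    SiblingNode add(SiblingNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }

    // Next sibling = the entry after this node in the parent's child list,
    // or null if this is the last child or has no parent.
    SiblingNode nextSibling() {
        if (parent == null) return null;
        int i = parent.children.indexOf(this);
        return (i + 1 < parent.children.size()) ? parent.children.get(i + 1) : null;
    }
}

public class SiblingWalk {
    // Collects the text of up to `limit` siblings that follow `start`.
    static List<String> nextSiblings(SiblingNode start, int limit) {
        List<String> out = new ArrayList<>();
        for (SiblingNode n = start.nextSibling(); n != null && out.size() < limit; n = n.nextSibling()) {
            out.add(n.text);
        }
        return out;
    }

    public static void main(String[] args) {
        SiblingNode body = new SiblingNode("BODY");
        SiblingNode title = new SiblingNode("Current PJM Transmission Limits");
        body.add(title)
            .add(new SiblingNode("Contingency WYLIERID500 KV  WYLIERID TRAN  5  (IROL)"))
            .add(new SiblingNode("Monitor WYLIERID500 KV  WYLIERID TRAN  7  (IROL)"))
            .add(new SiblingNode("-> Redispatch"))
            .add(new SiblingNode("(anything further is ignored)"));
        System.out.println(nextSiblings(title, 3));
    }
}
```

With the real library the same loop shape applies to getNextSibling(), as long as each returned node is checked for null before it is used.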

       
      • zerodrift

        zerodrift - 2006-02-05

        Can you please supply an example snippet as a starter?

        It would be easier to understand.

        Here's where I'm stuck:

        ----
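        try
        {
            // URLTunnelReader/GetSecureConnection: presumably the poster's own helper, not part of htmlparser
            URLTunnelReader in = new URLTunnelReader();
            InputStream inStream = in.GetSecureConnection(args[0]);
            Page pg = new Page(inStream, "ISO-8859-1");
            Lexer lex = new Lexer(pg);
            Parser parser = new Parser(lex);
            StringFilter filter = new StringFilter();
            filter.setCaseSensitive(false);
            filter.setPattern("Current PJM Transmission Limits");
            NodeList list = parser.parse(filter);
            // stuck here: how do I get at the nodes that follow the match?
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }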

         
      • zerodrift

        zerodrift - 2006-02-05

        With this code:

        ----
        try
        {
            URLTunnelReader in = new URLTunnelReader();
            InputStream inStream = in.GetSecureConnection(args[0]);
            Page pg = new Page(inStream, "ISO-8859-1");
            Lexer lex = new Lexer(pg);
            Parser parser = new Parser(lex);
            StringFilter filter = new StringFilter();
            filter.setCaseSensitive(false);
            filter.setPattern("Current PJM Transmission Limits");
            NodeList list = parser.parse(filter);

            Node node = list.elementAt(0);          // the matched text node
            Node peernode = node.getNextSibling();
            System.out.println(peernode.getText());
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }

        ----

        and this HTML page:

        ====

        </TABLE>
        </CENTER>

        <P><CENTER>Loads are calculated from raw telemetry data and are approximate.</CENTER>
        <CENTER>The displayed values are NOT official PJM Loads.</CENTER>

        <BR><BR><BR>

        <P><CENTER><H2>Current PJM Transmission Limits</H2></CENTER>
        <P>Contingency LINE    500 KV  MTSTORM-PRUNTYTO         
        <P>Reacinf-ctg BED-BLA  -> Redispatch

        </BODY>
        </HTML>

        ====

        I get a NullPointerException.
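The exception most likely comes from the last line of that snippet: StringFilter matches the text node "Current PJM Transmission Limits", and that text node appears to be the only child of the <H2> element, so getNextSibling() returns null and peernode.getText() dereferences it. A common workaround is to check for null and, when a node has no next sibling, climb to its parent and try again. The sketch below assumes a hypothetical DocNode class and a simplified tree; it is not htmlparser's Node interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal node for illustration; NOT htmlparser's API.
class DocNode {
    final String name;
    DocNode parent;
    final List<DocNode> children = new ArrayList<>();

    DocNode(String name) { this.name = name; }

    // Appends a child and returns the child so nested structure can be built.
    DocNode add(DocNode child) {
        child.parent = this;
        children.add(child);
        return child;
    }

    DocNode nextSibling() {
        if (parent == null) return null;
        int i = parent.children.indexOf(this);
        return i + 1 < parent.children.size() ? parent.children.get(i + 1) : null;
    }
}

public class NextAfter {
    // Returns the first node after `n` that is not inside `n`: its next sibling
    // if it has one, otherwise the next sibling of the nearest ancestor that
    // has one. Returns null at the end of the document.
    static DocNode nextOutside(DocNode n) {
        for (DocNode cur = n; cur != null; cur = cur.parent) {
            DocNode sib = cur.nextSibling();
            if (sib != null) return sib;
        }
        return null;
    }

    public static void main(String[] args) {
        // Simplified: <BODY><CENTER><H2>title</H2></CENTER><P>...</P></BODY>
        DocNode body = new DocNode("BODY");
        DocNode center = body.add(new DocNode("CENTER"));
        DocNode h2 = center.add(new DocNode("H2"));
        DocNode title = h2.add(new DocNode("Current PJM Transmission Limits"));
        body.add(new DocNode("P"));

        System.out.println(title.nextSibling());      // null: sole child of H2
        System.out.println(nextOutside(title).name);  // P
    }
}
```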

         
    • sidhu

      sidhu - 2006-02-06

      Dear zerodrift,
      you can write a new filter:

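      import org.htmlparser.Node;
      import org.htmlparser.NodeFilter;

      // Accepts every node that comes after the first node matched by the wrapped filter.
      public class AfterFilter implements NodeFilter {
          NodeFilter filter;  // marks the starting point
          boolean ret;        // becomes true once the marker has been seen

          public AfterFilter(NodeFilter nFilter) {
              filter = nFilter;
              ret = false;
          }

          public boolean accept(Node node) {
              if (!ret && filter.accept(node)) {
                  ret = true;     // found the marker; the marker itself is excluded
                  return false;
              }
              return ret;         // false before the marker, true after it
          }
      }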

      Your code then becomes:
      ----
      try
      {
          URLTunnelReader in = new URLTunnelReader();
          InputStream inStream = in.GetSecureConnection(args[0]);
          Page pg = new Page(inStream, "ISO-8859-1");
          Lexer lex = new Lexer(pg);
          Parser parser = new Parser(lex);
          // match the title, then accept everything after it
          NodeFilter filter = new AfterFilter(new StringFilter("Current PJM Transmission Limits"));
          NodeList list = parser.extractAllNodesThatMatch(filter);
          for (int i = 0; i < list.size(); i++) {
              System.out.println(list.elementAt(i).toPlainTextString());
          }
      }
      catch (Exception e)
      {
          e.printStackTrace();
      }
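The AfterFilter idea is stateful: it rejects everything up to and including the first node the wrapped filter accepts, then accepts every later node. The same logic can be checked without htmlparser by wrapping a java.util.function.Predicate. The class below is a library-independent analogue (AfterPredicate and AfterDemo are hypothetical names), not sidhu's NodeFilter-based code, and the list of strings stands in for nodes visited in document order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Library-independent analogue of AfterFilter, using Predicate instead of NodeFilter.
class AfterPredicate<T> implements Predicate<T> {
    private final Predicate<T> marker;
    private boolean seen = false; // becomes true once the marker has matched

    AfterPredicate(Predicate<T> marker) { this.marker = marker; }

    @Override
    public boolean test(T item) {
        if (!seen && marker.test(item)) {
            seen = true;   // the marker itself is excluded
            return false;
        }
        return seen;       // false before the marker, true after it
    }
}

public class AfterDemo {
    // Keeps the items the predicate accepts, in their original order.
    static <T> List<T> select(List<T> items, Predicate<T> p) {
        List<T> out = new ArrayList<>();
        for (T item : items) {
            if (p.test(item)) out.add(item);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList(
            "Loads are calculated from raw telemetry data and are approximate.",
            "Current PJM Transmission Limits",
            "Contingency WYLIERID500 KV  WYLIERID TRAN  5  (IROL)",
            "Monitor WYLIERID500 KV  WYLIERID TRAN  7  (IROL)");
        Predicate<String> after =
            new AfterPredicate<>(s -> s.contains("Current PJM Transmission Limits"));
        System.out.println(select(texts, after));
    }
}
```

Because the filter keeps state, a fresh instance is needed for each parse; reusing one would accept everything from the start of the second document.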

       
    • sidhu

      sidhu - 2006-02-06

      Similarly, you can write a BeforeFilter.
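A "before" filter is the mirror image: accept items until the marker first matches, then reject the marker and everything after it. The sketch below again uses a Predicate-based analogue (BeforePredicate and BeforeDemo are hypothetical names) so it runs without htmlparser; the same test() body could go into an accept(Node) method of a class implementing NodeFilter, in the same way as the AfterFilter above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Library-independent analogue of a BeforeFilter.
class BeforePredicate<T> implements Predicate<T> {
    private final Predicate<T> marker;
    private boolean seen = false; // becomes true once the marker has matched

    BeforePredicate(Predicate<T> marker) { this.marker = marker; }

    @Override
    public boolean test(T item) {
        if (!seen && marker.test(item)) {
            seen = true;   // stop here; the marker itself is excluded
        }
        return !seen;      // true before the marker, false from the marker on
    }
}

public class BeforeDemo {
    // Keeps the items the predicate accepts, in their original order.
    static <T> List<T> select(List<T> items, Predicate<T> p) {
        List<T> out = new ArrayList<>();
        for (T item : items) {
            if (p.test(item)) out.add(item);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList(
            "Loads are calculated from raw telemetry data and are approximate.",
            "The displayed values are NOT official PJM Loads.",
            "Current PJM Transmission Limits",
            "Contingency WYLIERID500 KV  WYLIERID TRAN  5  (IROL)");
        Predicate<String> before =
            new BeforePredicate<>(s -> s.contains("Current PJM Transmission Limits"));
        System.out.println(select(texts, before));
    }
}
```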

       
      • zerodrift

        zerodrift - 2006-02-06

        Thanks Siddhu,

        But this still isn't 100% clear. Do you have a sample using getNextSibling()?

        I'm looking for something more intuitive... :-)

        Thanks,

         
        • zerodrift

          zerodrift - 2006-02-06

          This one seems to work:

                  try
                  {
                      URLTunnelReader in = new URLTunnelReader();
                      InputStream inStream = in.GetSecureConnection(args[0]);
                      Page pg = new Page(inStream, "ISO-8859-1");
                      Lexer lex = new Lexer(pg);
                      Parser parser = new Parser(lex);
                      NodeFilter filter = new StringFilter("Current PJM Transmission Limits");
                      NodeList list = parser.parse(filter);

                      // start from the element that contains the matched text
                      Node parentnode = list.elementAt(0).getParent();
                      System.out.println(parentnode.toPlainTextString());

                      // then print every sibling that follows that element
                      while (parentnode.getNextSibling() != null) {
                          parentnode = parentnode.getNextSibling();
                          System.out.println(parentnode.toPlainTextString());
                      }
                  }
                  catch (Exception e)
                  {
                      e.printStackTrace();
                  }
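The loop above prints every following sibling, including the whitespace-only text between tags. If only the first few lines are wanted (the "next three lines" from the original question), one option is to collect the plain text of each sibling and keep the first N non-blank lines. The sketch below is stdlib-only: the input strings stand in for what toPlainTextString() would return, and the firstLines helper is a hypothetical name, not a library method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FirstLines {
    // Splits each chunk into lines, trims them, drops blank ones,
    // and returns at most `limit` lines in their original order.
    static List<String> firstLines(List<String> chunks, int limit) {
        List<String> out = new ArrayList<>();
        if (limit <= 0) return out;
        for (String chunk : chunks) {
            for (String line : chunk.split("\\R")) { // \R matches \n, \r\n, \r
                String t = line.trim();
                if (!t.isEmpty()) {
                    out.add(t);
                    if (out.size() == limit) return out;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-ins for the plain text of the title element and its siblings.
        List<String> chunks = Arrays.asList(
            "Current PJM Transmission Limits", "\n",
            "Contingency LINE    500 KV  MTSTORM-PRUNTYTO\n",
            "Reacinf-ctg BED-BLA  -> Redispatch\n\n", "\n");
        System.out.println(firstLines(chunks, 3));
    }
}
```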

           
    • sidhu

      sidhu - 2006-02-07

      Dear zerodrift,
      there is nothing wrong with traversing the tree if it gets you the data you need.
      Sorry, I don't have code for a similar problem at the moment.

       
