Filtering a HTML Document

Help
2003-12-04
2003-12-12
  • Saleh matani

    Saleh matani - 2003-12-04

    I have a lot of HTML documents and need to strip the JavaScript links out of them (and maybe more).
    Does anybody have an example of how to do this?

    Q2: The parser is used to parse a URL or a local HTML document. I would like to know whether I can give the parser an HTML string and define a filter, so that I get back HTML code that does not contain the tags I filtered out.

    Thanks :)

     
    • Saleh matani

      Saleh matani - 2003-12-05

      How can I filter this HTML code:
      <html>
      <head>
      <title>test</title>
      </head>
      <body>
      <p><a href="http://www.google.de">http://www.google.de</a></p>
      <p>text</p>
      <p>text</p>
      <table border="1" width="100%">
        <tr>
          <td width="50%">table</td>
          <td width="50%">table</td>
        </tr>
      </table>
      <form method="POST" action="--WEBBOT-SELF--">
        <p><input type="button" value="Button" name="B3"></p>
      </form>
      <p> the text</p>
      </body>
      </html>

      into this HTML code:

      <p>text</p>
      <p>text</p>
      <table border="1" width="100%">
        <tr>
          <td width="50%">table</td>
          <td width="50%">table</td>
        </tr>
      </table>
        <p>&nbsp;</p>
      <p> the text</p>

      -----------------------------------------

      That means: parse the HTML page, remove the html tag, the title tag, the form tags, and the link tags, and return what is between <body> and </body>.

       
      • Derrick Oswald

        Derrick Oswald - 2003-12-06

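        You might try the NodeVisitor pattern. Create a class that extends NodeVisitor and overrides some of the methods:

        class MyVisitor extends NodeVisitor
        {
            // True while the parser is between <body> and </body>.
            boolean inbody = false;

            public void visitTag (Tag tag)
            {
                // Print start tags inside the body, except links.
                if (inbody)
                    if (!tag.getTagName().equals("A"))
                        System.out.println (tag.toHtml ());
                // Set the flag afterwards so <body> itself is not printed.
                if (tag.getTagName().equals("BODY"))
                    inbody = true;
            }

            public void visitEndTag (Tag tag)
            {
                // Clear the flag first so </body> itself is not printed.
                if (tag.getTagName().equals("BODY"))
                    inbody = false;
                // Print end tags inside the body, except </a>.
                if (inbody)
                    if (!tag.getTagName().equals("A"))
                        System.out.println (tag.toHtml ());
            }
        }

        Then run through all the nodes with:
            parser.visitAllNodesWith(new MyVisitor ());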

         
        • Derrick Oswald

          Derrick Oswald - 2003-12-06

          Note: MyVisitor has to extend NodeVisitor, not implement it.

           
    • Saleh matani

      Saleh matani - 2003-12-07

      Thank you for the help. That works, but it does not return the text along with the tags I need; I am getting back just the tags!

      I am getting this:

      <p></p>
      <p></p>
      <table border="1" width="100%">
      <tr>
      <td width="50%"></td>
      <td width="50%"></td>
      </tr>
      </table>
      <p></p>
      <p> </p>

      It should be this:

      <p>text</p>
      <p>text</p>
      <table border="1" width="100%">
      <tr>
      <td width="50%">table</td>
      <td width="50%">table</td>
      </tr>
      </table>
      <p>&nbsp;</p>
      <p> the text</p>

       
    • Derrick Oswald

      Derrick Oswald - 2003-12-12

      Add this method to the visitor:

          public void visitStringNode (StringNode stringNode)
          {
              // Print text only inside the body, so the <title> text is skipped.
              if (inbody)
                  System.out.println (stringNode.toHtml ());
          }
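
      For the second question (HTML string in, filtered HTML string out), the same flag logic can collect its output in a string instead of printing it. What follows is only a minimal stand-alone sketch in plain Java; it does not use the HTML Parser API at all. The regex tokenizer, the class name BodyFilterSketch, the inlink flag, and the list of dropped tags are assumptions made for illustration, and a regex is not a real HTML parser (comments, scripts, and malformed markup are not handled).

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-alone sketch: all names here are hypothetical and nothing below
// comes from the HTML Parser library.
class BodyFilterSketch
{
    // Matches either one tag (group 1 is "/" for end tags, group 2 is the
    // tag name) or a run of text up to the next "<".
    private static final Pattern TOKEN =
        Pattern.compile ("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>|[^<]+");

    // Tags dropped from the output besides links (assumed list).
    private static final Set<String> DROPPED = Set.of ("FORM", "INPUT");

    static String filter (String html)
    {
        StringBuilder out = new StringBuilder ();
        boolean inbody = false;   // true between <body> and </body>
        boolean inlink = false;   // true between <a> and </a>
        Matcher m = TOKEN.matcher (html);
        while (m.find ())
        {
            String name = m.group (2);
            if (name == null)
            {
                // Text token: keep it only inside the body and outside links.
                if (inbody && !inlink)
                    out.append (m.group ());
                continue;
            }
            name = name.toUpperCase ();
            boolean end = !m.group (1).isEmpty ();
            if (name.equals ("BODY"))
                inbody = !end;          // the body tags themselves are not kept
            else if (name.equals ("A"))
                inlink = !end;          // link tags and link text are not kept
            else if (inbody && !DROPPED.contains (name))
                out.append (m.group ());
        }
        return out.toString ();
    }

    public static void main (String[] args)
    {
        String html = "<html><head><title>test</title></head><body>"
            + "<p><a href=\"http://www.google.de\">link</a></p>"
            + "<p>text</p><form><p><input type=\"button\"></p></form>"
            + "</body></html>";
        System.out.println (filter (html));
    }
}
```

      For the sample in main this prints <p></p><p>text</p><p></p>. With the real library the equivalent idea would be to append to a StringBuilder inside the visitor callbacks rather than calling System.out.println.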

       
