
Simple HTML Transformation.

Help
2006-03-01
2013-04-27
  • Clive Haworth

    Clive Haworth - 2006-03-01

    Hi. I'm stuck. Here is an HTML snippet from a multi-part MIME email:

    <html>
    <body>
    <p>
    Hello
    <footer><p>Footer Text</footer>
    <p>
    Again
    </body>
    </html>

    I want to remove the 'footer' tag (and its text node), which could occur anywhere in the document. So I tried something like:

    void addNewFooter(Part part) throws Exception {

        Page page = new Page(part.getInputStream(), encoding);
        Lexer lexer = new Lexer(page);
        Parser parser = new Parser(lexer);
        try {
            log.info("Stripping HTML footer");
            NodeList complete = parser.parse(null);
            NodeList stripped = complete.extractAllNodesThatMatch(new NotFilter(new TagNameFilter("footer")));

    //        part.setContent(complete.toHtml(), "text/html; charset=" + encoding);
            log.info(complete.toHtml());
        } ....

    This doesn't work, and it isn't clear to me how to do it.

    Next I want to add a new 'footer' tag just before the closing </body> tag:

    <html>
    <body>
    <p>
    Hello
    <p>
    Again
    <footer><p>Footer Text</footer>
    </body>
    </html>

    I have no idea how to do this. I can see that you add nodes to node lists, but how exactly do I do it in this case?

    Do I create a node like this ..

        TagNode footerTag = new TagNode();
        footerTag.setTagName("footer");
        TextNode footerText = new TextNode("Footer Text");
        footerText.setParent(footerTag);

    ... and how do I put it in the correct place in the tree (node list)?

    Thanks for any help you can offer.

    Clive

     
    • Derrick Oswald

      Derrick Oswald - 2006-03-02

      removing:

      The NotFilter is bound to get you every node except the footer nodes; however, the result will be a flat (linear) list, not the document tree.

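      I would filter for the footer nodes and remove each one from its parent's list of children:

      NodeList footers = complete.extractAllNodesThatMatch (new TagNameFilter ("footer"), true);
      for (int i = 0; i < footers.size (); i++) {
          Node footer = footers.elementAt (i);
          Node parent = footer.getParent (); // null for top-level nodes, which live in 'complete'
          (null != parent ? parent.getChildren () : complete).remove (footer);
      }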
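For comparison, here is the same detach-from-parent idea sketched with the JDK's built-in W3C DOM (`javax.xml.parsers`, `org.w3c.dom`) rather than htmlparser, so it runs without extra jars. It assumes the input is well-formed XHTML (htmlparser is far more forgiving of messy HTML), and `FooterStripper` and its helper methods are hypothetical names. One DOM-specific catch: `getElementsByTagName()` returns a live list that shrinks as matches leave the tree, so the loop walks it backwards.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class; uses the JDK DOM, not the htmlparser API.
class FooterStripper {
    // Parses well-formed XHTML (an assumption: real email HTML may not be).
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        }
    }

    // Serializes the whole tree back to markup, without an XML declaration.
    static String toHtml(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Detaches every element with the given tag name from its parent.
    static int removeAll(Document doc, String tagName) {
        // Live list: walk backwards so removals never skip an entry.
        NodeList matches = doc.getElementsByTagName(tagName);
        int count = matches.getLength();
        for (int i = count - 1; i >= 0; i--) {
            Node match = matches.item(i);
            match.getParentNode().removeChild(match); // edit the tree, not a list
        }
        return count;
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><p>Hello</p>"
                + "<footer><p>Footer Text</p></footer><p>Again</p></body></html>");
        System.out.println("removed " + removeAll(doc, "footer"));
        System.out.println(toHtml(doc));
    }
}
```

Running `main` should report one removal and print markup containing only the "Hello" and "Again" paragraphs.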

      adding:

      The footerText needs to be added to the footerTag's children list:
        footerTag.getChildren ().add (footerText);

      Adding the footer just before the end of the <html> tag works the same way: a simple add() puts it at the end of the tag's children:

      HtmlTag html;

      ... get the html tag somehow
      html.getChildren ().add (my_new_footer);
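The "add() puts it at the end" behaviour has a direct counterpart in the JDK's W3C DOM, where `appendChild()` always adds a node as the last child. The sketch below uses that API (not htmlparser's `TagNode`/`NodeList`, whose details differ) and targets `<body>` rather than `<html>`, so the footer lands just before `</body>` as in Clive's desired output. It assumes well-formed XHTML input; `FooterAppender` and `appendFooter` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class; uses the JDK DOM, not the htmlparser API.
class FooterAppender {
    // Parses well-formed XHTML (an assumption, as above).
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        }
    }

    // Builds <footer><p>text</p></footer> and appends it as the last child
    // of <body>, i.e. immediately before </body> when serialized.
    static Element appendFooter(Document doc, String text) {
        Element p = doc.createElement("p");
        p.appendChild(doc.createTextNode(text)); // the text goes into p's children
        Element footer = doc.createElement("footer");
        footer.appendChild(p);                   // p goes into footer's children
        Node body = doc.getElementsByTagName("body").item(0);
        if (body == null) {
            throw new IllegalArgumentException("document has no <body>");
        }
        body.appendChild(footer);                // appendChild always adds at the end
        return footer;
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><p>Hello</p><p>Again</p></body></html>");
        appendFooter(doc, "Footer Text");
        Node last = doc.getElementsByTagName("body").item(0).getLastChild();
        System.out.println(last.getNodeName() + ": " + last.getTextContent());
    }
}
```

Note that the text node is attached by adding it to its parent's children, not by setting its parent pointer, which mirrors the `getChildren ().add (...)` advice.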

       
    • Clive Haworth

      Clive Haworth - 2006-03-02

      Ah. Got it. I thought extractAllNodesThatMatch() extracted a list unrelated to the main node list. It is basically just selecting which nodes to act on - maybe selectAllNodesThatMatch() would have been clearer, but what's in a name? Thanks for the help.

       
    • Clive Haworth

      Clive Haworth - 2006-03-02

      Nope. Not what I thought. Consider the following code:

              try {
                  Parser parser = new Parser("file:///clive.html");
                  NodeList root = parser.parse(null);
                  NodeList divs = root.extractAllNodesThatMatch(new NodeClassFilter(Div.class), true);

                  System.out.println("found " + divs.size() + " div tags");

                  for(int i = 0; i < divs.size(); i++) {
                      TagNode div = (TagNode) divs.elementAt(i);
                      String id = div.getAttribute("id");
                      if(id != null && id.equals("__footer__")) {
                          System.out.println("found footer: " + div);
                          if(divs.remove(div)) {
                              System.out.println("removed node");
                          }
                      }
                  }
                  System.out.println(root.toHtml());
              } catch(ParserException e) {
                  e.printStackTrace();
              }

      This removes the div node from the divs list, not from the root list; the two are obviously different lists.
      All I want to do is print out the HTML (all of it) without the footer div.

      If I try:

          root.remove(div)

      in place of:

          divs.remove(div)

      it doesn't find it (it returns false).

      How, exactly, do I do this?

      Regards
      Clive

       
    • Derrick Oswald

      Derrick Oswald - 2006-03-02

      You need:

        div.getParent ().getChildren ().remove (div);

      You need to go to the next node up the tree and remove the div node from
      its list of children.
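Clive's `root.remove(div)` returned false because the root list only holds the top-level nodes, while the div sits deeper in the tree. The JDK's W3C DOM enforces the same rule, so both attempts can be sketched with it (again not htmlparser; `FooterDivRemover` and its methods are hypothetical names, and the input is assumed to be well-formed XHTML): removing via the root element throws `NOT_FOUND_ERR`, while removing via the node's own parent succeeds.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class; uses the JDK DOM, not the htmlparser API.
class FooterDivRemover {
    // Parses well-formed XHTML (an assumption, as above).
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        }
    }

    // Returns the first <div> whose id attribute equals the given value, or null.
    // (Without a DTD, "id" is not declared as an ID, so getElementById won't work.)
    static Element findDivById(Document doc, String id) {
        NodeList divs = doc.getElementsByTagName("div");
        for (int i = 0; i < divs.getLength(); i++) {
            Element div = (Element) divs.item(i);
            if (id.equals(div.getAttribute("id"))) { // "" when absent, never null
                return div;
            }
        }
        return null;
    }

    // Mirrors the failing root.remove(div): the root owns only its direct children.
    static boolean removeFromRoot(Document doc, Node node) {
        try {
            doc.getDocumentElement().removeChild(node);
            return true;
        } catch (DOMException e) { // NOT_FOUND_ERR: not a direct child of the root
            return false;
        }
    }

    // The fix from the thread: go one level up and remove from the parent's children.
    static void removeFromParent(Node node) {
        node.getParentNode().removeChild(node);
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><p>Hello</p>"
                + "<div id=\"__footer__\"><p>Footer Text</p></div></body></html>");
        Element footer = findDivById(doc, "__footer__");
        System.out.println("removed via root: " + removeFromRoot(doc, footer));
        removeFromParent(footer);
        System.out.println("divs left: " + doc.getElementsByTagName("div").getLength());
    }
}
```

This should print `removed via root: false` followed by `divs left: 0`.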

       
    • Clive Haworth

      Clive Haworth - 2006-03-03

      Great. Now I get it! Thanks.

       
