
How to remove tags and their content?

Help
2005-01-28
2013-04-27
  • mikeliu1976

    mikeliu1976 - 2005-01-28

    Regards,

    How can I remove certain tags together with their content,
    and then save the result to a file?

    I am using the NodeVisitor pattern; my code follows.
    The spot marked ^^^^ is where I am stuck.
    Could someone give me a hand?

    Thank you,
    May God bless you all

    --------------------------------------------

    import org.htmlparser.Parser;
    import org.htmlparser.Tag;
    import org.htmlparser.tags.ScriptTag;
    import org.htmlparser.util.ParserException;
    import org.htmlparser.visitors.NodeVisitor;

    class MyVisitor extends NodeVisitor
    {
        public void visitTag (Tag tag)
        {
            if (tag instanceof ScriptTag)
            {
                // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                // I want to remove the ScriptTag and its content here.
            }
        }
    }

    public class ToHtmlDemo
    {
        public static void main (String[] args) throws ParserException
        {
            Parser parser = new Parser ("http://www.yzu.edu.tw");
            parser.visitAllNodesWith (new MyVisitor ());
        }
    }

     
    • Derrick Oswald

      Derrick Oswald - 2005-01-29

      You should be able to collect the nodes you want to remove. First gather the top-level nodes in a NodeList:

      NodeList list = new NodeList (); // top level nodes gathered
      NodeIterator iterator = parser.elements ();
      while (iterator.hasMoreNodes ())
          list.add (iterator.nextNode ());

      Then, once all nodes have been gathered, let your visitor visit each one:
      for (int i = 0; i < list.size (); i++)
          list.elementAt (i).accept (myvisitor);

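      Have your visitor record the nodes it matches (getMatchingNodes () stands
      for a method you would add to your own visitor), then run through them and
      unlink each one from its parent's list of children:
      NodeList matching_ones = myvisitor.getMatchingNodes ();
      for (int i = 0; i < matching_ones.size (); i++)
      {
          Node item = matching_ones.elementAt (i);
          ...
          // a top-level node has no parent, so remove it from list itself
          Node parent = item.getParent ();
          NodeList siblings = (null == parent) ? list : parent.getChildren ();
          for (int j = 0; j < siblings.size (); j++)
              if (item == siblings.elementAt (j))
              {
                  siblings.remove (j);
                  break;
              }
      }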
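The reply's key idea is to let the visitor only record matches and to unlink them afterwards, so no list is modified while it is being walked. Below is a minimal, library-free sketch of that collect-then-remove pattern. `ToyNode`, `collect`, `removeAll` and `render` are hypothetical names standing in for the org.htmlparser `Node`/`NodeList` classes; they are not part of that library.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an HTML node tree (hypothetical; NOT the org.htmlparser classes).
class CollectThenRemove {

    static class ToyNode {
        final String name;                          // tag name, or "#text" for text
        ToyNode parent;                             // null for top-level nodes
        final List<ToyNode> children = new ArrayList<>();

        ToyNode(String name) { this.name = name; }

        // Attach a child and return this node, so trees can be built by chaining.
        ToyNode add(ToyNode child) {
            child.parent = this;
            children.add(child);
            return this;
        }
    }

    // Pass 1: walk the tree and only record matching nodes; nothing is mutated here.
    static void collect(ToyNode node, String tagName, List<ToyNode> matches) {
        if (node.name.equalsIgnoreCase(tagName)) {
            matches.add(node);
            return; // the node's content goes with it, so there is no need to descend
        }
        for (ToyNode child : node.children)
            collect(child, tagName, matches);
    }

    // Pass 2: unlink each recorded node from its parent's children,
    // or from the top-level list when it has no parent.
    static void removeAll(List<ToyNode> topLevel, List<ToyNode> matches) {
        for (ToyNode item : matches) {
            List<ToyNode> siblings = (item.parent == null) ? topLevel : item.parent.children;
            for (int j = 0; j < siblings.size(); j++) {
                if (siblings.get(j) == item) { // identity comparison, as in the reply
                    siblings.remove(j);
                    break;
                }
            }
        }
    }

    // Render a node as name(child child ...) so the result is easy to check.
    static String render(ToyNode node) {
        if (node.children.isEmpty()) return node.name;
        StringBuilder sb = new StringBuilder(node.name).append('(');
        for (int i = 0; i < node.children.size(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(render(node.children.get(i)));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        ToyNode html = new ToyNode("html")
            .add(new ToyNode("head").add(new ToyNode("script").add(new ToyNode("#text"))))
            .add(new ToyNode("body").add(new ToyNode("p").add(new ToyNode("#text"))));
        List<ToyNode> topLevel = new ArrayList<>(List.of(html));

        List<ToyNode> matches = new ArrayList<>();
        for (ToyNode n : topLevel)
            collect(n, "SCRIPT", matches);
        removeAll(topLevel, matches);

        System.out.println(render(html));
    }
}
```

Running `main` should print `html(head body(p(#text)))`: the script element and its text child disappear while the rest of the tree is untouched.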

      Then print the list of all nodes in a similar manner:
      for (int i = 0; i < list.size (); i++)
          System.out.print (list.elementAt (i).toHtml ());
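The original question also asked how to save the result to a file rather than print it. A minimal sketch, assuming Java 11+ for `Files.writeString`: concatenate the HTML strings of the top-level nodes and write them out in one go. `SaveHtml.save` is a hypothetical helper that takes plain strings so it can be tried without the parser. The UTF-8 charset is also an assumption; a page that declares a different charset in its `<meta>` tag may need that one instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: joins HTML fragments and writes them to a file.
class SaveHtml {

    // Concatenate the fragments in order and write them (UTF-8 assumed).
    static void save(List<String> fragments, Path out) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (String html : fragments)
            sb.append(html);
        Files.writeString(out, sb, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("result", ".html");
        save(List.of("<html><body>", "<p>kept</p>", "</body></html>"), out);
        System.out.println("wrote " + out);
    }
}
```

With the parser, the fragments would be `list.elementAt (i).toHtml ()` for each `i`, gathered into a `List<String>` before calling `save`.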

       
    • mikeliu1976

      mikeliu1976 - 2005-01-29

      Regards,

      Many thanks for your reply.

      My code is below. It successfully removes the ScriptTag and its content,
      but it cannot remove the word "Script" when it appears inside an 'a' tag,
      i.e. in the following line:
      <a href="JavaScript:loadwindow(106,90);" style="font-family:Verdana;">&#20803;&#26234;Intranet</a>

      Could someone give me a hand?
      Thank you,
      May God be with you
      ---------------------------------------------------------------

      import org.htmlparser.NodeFilter;
      import org.htmlparser.Parser;
      import org.htmlparser.filters.NotFilter;
      import org.htmlparser.filters.TagNameFilter;
      import org.htmlparser.util.NodeIterator;
      import org.htmlparser.util.NodeList;
      import org.htmlparser.util.ParserException;

      public class ToHtmlDemoTest
      {
          public static void main (String[] args) throws ParserException
          {
              NodeList list = new NodeList ();
              // keep every node that is NOT a <script> tag
              NodeFilter filter = new NotFilter (new TagNameFilter ("Script"));

              Parser parser = new Parser ("http://www.yzu.edu.tw");

              // gather the top-level nodes
              NodeIterator iterator = parser.elements ();
              while (iterator.hasMoreNodes ())
                  list.add (iterator.nextNode ());

              // true = also apply the filter to the children of the nodes that are kept
              list.keepAllNodesThatMatch (filter, true);
              for (int i = 0; i < list.size (); i++)
                  System.out.print (list.elementAt (i).toHtml ());
          }
      }
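Why the link survives: `TagNameFilter` compares tag names only, so it removes `<script>` elements, whereas `JavaScript:loadwindow(...)` in the post is the value of an `href` attribute on an `<a>` tag, which a tag-name test never examines. The sketch below models this on a toy element tree; `El`, `keepAll`, `NOT_SCRIPT` and `hasJavaScriptHref` are hypothetical names, not org.htmlparser API. `keepAll` mirrors our understanding of `keepAllNodesThatMatch (filter, true)`: drop the nodes that fail the test, then recurse into the children of the ones that are kept. Adding a second condition on the `href` value removes the link as well.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Predicate;

// Toy element tree (hypothetical; NOT the org.htmlparser classes).
class RecursiveKeepFilter {

    static class El {
        final String tag;
        final Map<String, String> attrs = new LinkedHashMap<>();
        final List<El> children = new ArrayList<>();

        El(String tag) { this.tag = tag; }

        El attr(String key, String value) { attrs.put(key, value); return this; }

        El add(El child) { children.add(child); return this; }
    }

    // Remove nodes failing `keep`, then recurse into the survivors' children.
    static void keepAll(List<El> nodes, Predicate<El> keep) {
        nodes.removeIf(keep.negate());
        for (El n : nodes)
            keepAll(n.children, keep);
    }

    // A tag-name test only ever looks at the tag name...
    static final Predicate<El> NOT_SCRIPT = e -> !e.tag.equalsIgnoreCase("script");

    // ...so a "JavaScript:" URL inside <a href> needs its own attribute test.
    static boolean hasJavaScriptHref(El e) {
        String href = e.attrs.get("href");
        return href != null && href.trim().toLowerCase(Locale.ROOT).startsWith("javascript:");
    }

    public static void main(String[] args) {
        List<El> top = new ArrayList<>();
        top.add(new El("body")
            .add(new El("script"))
            .add(new El("a").attr("href", "JavaScript:loadwindow(106,90);"))
            .add(new El("a").attr("href", "http://www.yzu.edu.tw/")));

        keepAll(top, NOT_SCRIPT);
        System.out.println(top.get(0).children.size()); // 2: both links survive

        keepAll(top, NOT_SCRIPT.and(e -> !hasJavaScriptHref(e)));
        System.out.println(top.get(0).children.size()); // 1: only the http link is left
    }
}
```

Dropping the whole `<a>` element also drops its link text; clearing only the `href` attribute would be the gentler choice if the text should stay.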

       
