HTML Parser / Discussion / Help: problem getEndTag()

hayato - 2005-12-04

Hi,

What I'm trying to do is parse an HTML page for pre tags. When I find one, I want to get it and all its children in one string. All my previous approaches failed, so I tried using a NodeVisitor class to find the start index of the opening pre tag <pre> and the end index of the closing pre tag </pre>, so that I can just extract the string between those positions from the page. The code is:

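import org.htmlparser.Tag;
import org.htmlparser.visitors.NodeVisitor;

public class PreTagVisitor extends NodeVisitor
{
    public void visitTag (Tag tag)
    {
        // Only look at opening <pre> tags, not end tags or empty XML-style tags.
        if (!tag.isEmptyXmlTag() && !tag.isEndTag() && (tag.getTagName().equals("PRE"))) {
            // Print the tag name, this tag's start and end positions, and the end tag's name.
            System.out.println ("\n" + tag.getTagName () + " " + tag.getStartPosition() + " " + tag.getEndPosition () + " " + tag.getEndTag().getTagName());
        }
    }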
.....
The problem is: if I try it like that, I don't get any output on my sample HTML files containing well-formed <pre></pre> tags. If I delete the "tag.getEndTag().getTagName()" part, I do get output. I haven't found any hints about this problem in the documentation, so I would be very pleased if anyone could help me solve it or suggest a better strategy for my general problem.

Thanks a lot in advance.

greetz

hayato

- hayato - 2005-12-04
  
  A little correction:
  
  tag.getEndTag().getEndPosition() is what I wanted to call instead of tag.getEndTag().getTagName(), in order to find the index I was talking about.
  
- hayato - 2005-12-04
  
  Oh, and it works perfectly with <body> or <p> tags on my sample file:
  
  if (!tag.isEmptyXmlTag() && !tag.isEndTag() && (tag.getTagName().equals("BODY"))) {
      System.out.println ("\n" + tag.getTagName () + " " + tag.getStartPosition() + " " + tag.getEndPosition () + " " + tag.getEndTag().getEndPosition());
  }
  
  This works, but with the pre tags the algorithm just stops, without even throwing an exception or anything.
  
- hayato - 2005-12-04
  
  OK, I got it ^-^
  Registering a customized CompositeTag in a PrototypicalNodeFactory was the solution. It's good that the source code of the other tags was provided with the framework, otherwise I wouldn't have had a chance to get it working.
  
  greetz
  
  hayato
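
For the general goal from the first post (getting a pre element and everything inside it as one string by its start and end index), here is a minimal library-free sketch using only java.util.regex. It is not the approach that solved the thread and it does not use HTML Parser at all; the names PreExtractor and extractPreBlocks are hypothetical, chosen only for illustration. It assumes pre elements are not nested and that no attribute value or comment contains a ">" or a stray "</pre>", so it is no substitute for a real parser on messy pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HTML Parser.
final class PreExtractor {

    // An opening <pre ...> tag, the shortest possible body, then the closing </pre>.
    // CASE_INSENSITIVE accepts <PRE>; DOTALL lets the body span several lines.
    private static final Pattern PRE_BLOCK = Pattern.compile(
            "<pre\\b[^>]*>.*?</pre\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns every pre block, including its own tags and any child markup, in document order.
    static List<String> extractPreBlocks(String html) {
        List<String> blocks = new ArrayList<>();
        Matcher m = PRE_BLOCK.matcher(html);
        while (m.find()) {
            // m.start() is the index of '<' in the opening tag,
            // m.end() is the index just past '>' in the closing tag.
            blocks.add(html.substring(m.start(), m.end()));
        }
        return blocks;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>intro</p>"
                + "<PRE class=\"code\">int x = 1;\n<b>bold</b></PRE>"
                + "<p>outro</p></body></html>";
        for (String block : extractPreBlocks(html)) {
            System.out.println(block);
        }
    }
}
```

The two matcher offsets play the role of the start and end positions the visitor was meant to report, so the substring call is the "extract the string between those positions" step. Within HTML Parser itself, the registered-tag route described above remains the one confirmed to work in this thread.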
  