
Parser producing nonexistent </a> tag

Forum: Help

Creator: Abhimanyu

Created: 2006-05-18

Updated: 2013-04-27

Abhimanyu - 2006-05-18

Hi,

I have a simple HTML file. I'm writing a Java program to check that every open tag in the file is closed and that the tags are properly nested. To do this, I push each open tag onto a stack and, when I reach a close tag, pop the stack and compare.

Here is the file:

<HTML>
<TABLE border="0" cellpadding="0" cellspacing="0" height="77" width="275">
    <TBODY>
        <TR>
            <TD>
                <DIV style="MARGIN-LEFT: 12px; WIDTH: 50px; TEXT-ALIGN: center">
                    <STRONG style="FONT-SIZE: 12px; COLOR: #000000; FONT-FAMILY: arial,verdana,sans-serif">
    <BR/>Hi<BR/>Whatsup!

                    </STRONG>
                </DIV>
            </TD>
            <TD align="left" valign="bottom">
                <DIV style="MARGIN-LEFT: 12px; WIDTH: 200px; TEXT-ALIGN: left">
                    <FONT style="FONT-SIZE: 11px; color:#cc6633; FONT-FAMILY: arial,verdana,sans-serif">

                        We all want to party in life!

                    </FONT>
                </DIV>
            </TD>
        </TR>
    </TBODY>
</TABLE>
<a href="http://www.google.com" charset="dfdf">
<div style="position: relative;top:-77px;height:77px;width:275px;cursor:hand;-moz-opacity:0.01; opacity: 0.01; filter: alpha(opacity=1);background-color: blue;">
</div>
</a>
</HTML>

If you look at the last <a href ...> tag, it is clearly wrapped around the <div> and closed after it. But for some reason the NodeList returned by the Parser seems to contain a </a> before the div begins. Here is my program's output:

src.run:
     [java] Pushing : HTML
     [java] Pushing : TABLE
     [java] Pushing : TBODY
     [java] Pushing : TR
     [java] Pushing : TD
     [java] Pushing : DIV
     [java] Pushing : STRONG
     [java] Close tag : /STRONG Popping : STRONG
     [java] Close tag : /DIV Popping : DIV
     [java] Close tag : /TD Popping : TD
     [java] Pushing : TD
     [java] Pushing : DIV
     [java] Pushing : FONT
     [java] Close tag : /FONT Popping : FONT
     [java] Close tag : /DIV Popping : DIV
     [java] Close tag : /TD Popping : TD
     [java] Close tag : /TR Popping : TR
     [java] Close tag : /TBODY Popping : TBODY
     [java] Close tag : /TABLE Popping : TABLE
     [java] Pushing : a
     [java] Close tag : /a Popping : a
     [java] Pushing : div
     [java] Close tag : /div Popping : div
     [java] Close tag : /a Popping : HTML
     [java] Exception in thread "main" java.util.EmptyStackException
     [java]     at java.util.Stack.peek(Stack.java:79)
     [java]     at java.util.Stack.pop(Stack.java:61)
     [java]     at com.trilogy.ce.emailservice.validators.digitalimpact.EmailContentValidator.validateStructure(EmailContentValidator.java:65)
     [java]     at com.trilogy.ce.emailservice.validators.digitalimpact.EmailContentValidator.main(EmailContentValidator.java:112)

Can anyone tell me what the problem is? Thanks in advance.

Abhimanyu.

- Derrick Oswald - 2006-05-19
  
  The parser generates closing tags for classes derived from CompositeTag. These virtual nodes can be detected by checking whether getStartPosition() equals getEndPosition(). The closing tag can be obtained from the opening tag using getEndTag().
  
  When these closing tags are generated is controlled by the getEnders() and getEndTagEnders() methods on the subclass for each tag. For example, the <A> tag (LinkTag.java) has "A", "P", "DIV", "TD", "TR", "FORM" and "LI" as triggers for generating an end tag and popping the A tag off the parse stack.
  
  Hope this helps, even though it's confusing.
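
To illustrate the ender mechanism described above, here is a minimal sketch in plain Java. It does not use the HTML Parser library: the class `EnderSketch`, the `ENDERS` map and the `simulate` method are hypothetical names invented for this illustration, and only the ender set for A is taken from the reply above. The real scanner is more involved, so treat this as a simplified model rather than the library's algorithm.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

class EnderSketch {
    // Hypothetical stand-in for the per-tag ender lists.
    // The set for A is the one quoted in the reply above.
    static final Map<String, Set<String>> ENDERS =
        Map.of("A", Set.of("A", "P", "DIV", "TD", "TR", "FORM", "LI"));

    /**
     * Takes tag names as they appear in the source ("a", "div", "/div", "/a")
     * and returns the events a parser with ender rules might produce.
     * Inserted end tags are marked "(virtual)".
     */
    static List<String> simulate(List<String> sourceTags) {
        Deque<String> open = new ArrayDeque<>();
        List<String> events = new ArrayList<>();
        for (String raw : sourceTags) {
            String tag = raw.toUpperCase();
            if (tag.startsWith("/")) {
                String name = tag.substring(1);
                // A real close tag matches only if that element is still the innermost open one.
                if (name.equals(open.peek())) {
                    open.pop();
                    events.add("close " + name);
                } else {
                    events.add("stray close " + name);
                }
            } else {
                // An opening tag that is an ender for the innermost open element closes it implicitly.
                String top = open.peek();
                if (top != null && ENDERS.getOrDefault(top, Set.of()).contains(tag)) {
                    open.pop();
                    events.add("close " + top + " (virtual)");
                }
                open.push(tag);
                events.add("open " + tag);
            }
        }
        return events;
    }

    public static void main(String[] args) {
        // The tail of the posted file: <a> <div> </div> </a>
        simulate(List.of("a", "div", "/div", "/a")).forEach(System.out::println);
    }
}
```

Run against the tail of the posted file, this model reports a virtual close of A just before DIV opens and then treats the real </a> as stray, which matches the shape of the posted log.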
  
  - Abhimanyu - 2006-05-22
    
    Hey, thanks a lot. I figured that the Parser class was inserting an implicit </a> because of the div after it. What you said makes sense. Since I didn't need the functionality of Parser, I just used Lexer instead. Does the Lexer insert any implicit tags of its own?
    
    - Derrick Oswald - 2006-05-22
      
      The Lexer just returns undifferentiated TextNode, RemarkNode and TagNode objects. These do not derive from CompositeTag (which has the scanner that inserts the closing tags), so they have no inserted closing tags.
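
Since a flat tag stream has no inserted closing tags, the stack check from the original question can run on it directly. The following is a minimal sketch under stated assumptions: a crude regular expression stands in for the Lexer, and the names `TagBalanceSketch` and `firstProblem` are hypothetical. It also checks for an empty stack before popping, which avoids the EmptyStackException seen in the posted output.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagBalanceSketch {
    // Crude stand-in for a lexer: matches <name ...>, </name> and <name ... />.
    // It ignores comments, scripts and any '>' inside attribute values.
    private static final Pattern TAG =
        Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*?(/?)>");

    /** Returns null if every open tag is closed in order, else a description of the first problem. */
    static String firstProblem(String html) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            String name = m.group(2).toUpperCase();
            boolean isClose = !m.group(1).isEmpty();
            boolean selfClosing = !m.group(3).isEmpty();
            if (selfClosing) {
                continue; // e.g. <BR/> opens and closes itself
            }
            if (!isClose) {
                open.push(name);
                continue;
            }
            // Guard against popping an empty stack.
            if (open.isEmpty()) {
                return "close tag </" + name + "> with nothing open";
            }
            String expected = open.pop();
            if (!expected.equals(name)) {
                return "expected </" + expected + "> but found </" + name + ">";
            }
        }
        return open.isEmpty() ? null : "unclosed <" + open.peek() + ">";
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://www.google.com\"><div><BR/></div></a>";
        String problem = firstProblem(html);
        System.out.println(problem == null ? "balanced" : problem);
    }
}
```

Note that this sketch treats only tags written with a trailing slash, such as <BR/>, as self-closing. A bare <br> would be reported as unclosed, so a real checker would need a list of void elements.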
      
