Hi everybody,
I am new to HTML parsing and I have trouble understanding the meaning of
"composite tag". At first I thought it meant a tag written as
<tagname></tagname>, as opposed to the non-composite form
<tagname/>.
I wrote a short HTML file to check this. Here it is:
/<html>
<head></head>
<body>
<br/>
text just afer html
<br/>
<H1>Title 1 1</H1>
Text 1
<h2>Title 2</h2>
<p>Phrase 1 p1. Phrase 2 p1</p>
<br/>
<p>Phrase 1 p2. <strong>Phrase in <em>bold</em> </strong> 2 p2</p>
</body>
</html>/
Then I started my visitTag(Tag tag) method this way:
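/public void visitTag(Tag tag) {
    // Print the name of every start tag the visitor reaches.
    System.out.println("Visit tag : " + tag.getTagName());
    // Report whether the parser attached an end tag to this start tag.
    if (tag.getEndTag() != null) {
        System.out.println("getEndTag returns:" + tag.getEndTag().getTagName());
    } else {
        System.out.println("getEndTag returns null");
    }
}/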
I get this:
/Visit tag : HTML
getEndTag returns:HTML
Visit tag : HEAD
getEndTag returns:HEAD
visit endTag :/head
Visit tag : BODY
getEndTag returns:BODY
Visit tag : BR
getEndTag returns null
Visit tag : BR
getEndTag returns null
Visit tag : H1
getEndTag returns null
visit endTag :/H1
Visit tag : H2
getEndTag returns null
visit endTag :/h2
Visit tag : P
getEndTag returns null
visit endTag :/p
Visit tag : BR
getEndTag returns null
Visit tag : P
getEndTag returns null
Visit tag : STRONG
getEndTag returns null
Visit tag : EM
getEndTag returns null
visit endTag :/em
visit endTag :/strong
visit endTag :/p
visit endTag :/body
visit endTag :/html/
Apparently, getEndTag() returns null for the h1, h2, strong, em and p tags,
even though each of them has a closing tag in the file.
It seems I don't understand the word "composite"!
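To make my expectation concrete, here is a minimal standalone sketch of my first reading (a start tag "has an end tag" whenever a matching </tagname> appears later, and never when it is written <tagname/>). It does not use the parser library at all; the class name PairedTagSketch and the method report are made up for illustration, and the regex only handles simple markup like my test file.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any parser library.
class PairedTagSketch {
    // group 1 = "/" for an end tag, group 2 = tag name,
    // group 3 = "/" for a self-closing tag such as <br/>.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*?(/?)>");

    /** Lists every start tag in document order and whether a matching end tag follows. */
    static List<String> report(String html) {
        List<String> names = new ArrayList<>();
        List<Boolean> paired = new ArrayList<>();
        Deque<Integer> open = new ArrayDeque<>(); // indexes of still-open start tags
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            String name = m.group(2).toUpperCase(Locale.ROOT);
            boolean isEnd = !m.group(1).isEmpty();
            boolean selfClosing = !m.group(3).isEmpty();
            if (isEnd) {
                // Pair with the nearest open tag of the same name, if there is one.
                if (open.stream().anyMatch(i -> names.get(i).equals(name))) {
                    while (!names.get(open.peek()).equals(name)) {
                        open.pop(); // tags left unclosed inside stay unpaired
                    }
                    paired.set(open.pop(), true);
                }
            } else {
                names.add(name);
                paired.add(false);
                if (!selfClosing) {
                    open.push(names.size() - 1);
                }
            }
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            out.add(names.get(i) + (paired.get(i) ? " has an end tag" : " has no end tag"));
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<body><br/><H1>Title 1 1</H1>"
                + "<p>Phrase 1 p2. <strong>Phrase in <em>bold</em> </strong> 2 p2</p></body>";
        for (String line : report(html)) {
            System.out.println(line);
        }
    }
}
```

Under this reading only the <br/> tags would be reported as having no end tag, which is not what getEndTag() gives me for h1, h2, p, strong and em.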
Could somebody help me?