[Htmlparser-user] Not CompositeNode ("ADDRESS", "CENTER" TAG, etc...)
From: <ka...@ex...> - 2006-02-07 06:46:36
Hi, all.
I parsed an HTML file and built a DOM using
HTMLParser Version 1.6 (Integration Build Nov 12, 2005).
The "P" tag has its "P" end tag as a child.
(The same is true for "HEAD", "TITLE", "BODY", etc.)
On the other hand, "ADDRESS" produces two tags ("ADDRESS" and "/ADDRESS")
on the same level in the DOM.
(The same thing happens with the "CENTER" tag.)
I expected the ADDRESS tag to behave like the "P" tag, but it does not.
What is the reason?
How can I make the parser recognize the ADDRESS tag as a single
CompositeTag?
Thank you, all. Sorry for my poor English.
---------code-----------
import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class SampleHTMLParserJ {
    /**
     * HTMLParser sample
     *
     * @param args
     */
    public static void main(String[] args) {
        try {
            Parser parser = new Parser(
                    "file:///D:/data/test03.html");
            NodeList list = parser.parse(null);
            Node node = list.elementAt(0);
            System.out.println(node);
        } catch (ParserException e) {
            e.printStackTrace();
        }
    }
}
---------stdout-----------
Tag (0[0,0],57[0,57]): Html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ja"
Txt (57[0,57],60[1,1]): \n
Tag (60[1,1],66[1,7]): head
Txt (66[1,7],70[2,2]): \n
Tag (70[2,2],77[2,9]): title
Txt (77[2,9],88[2,20]): title title
End (88[2,20],96[2,28]): /title
Txt (96[2,28],99[3,1]): \n
End (99[3,1],106[3,8]): /head
Txt (106[3,8],109[4,1]): \n
Tag (109[4,1],115[4,7]): body
Txt (115[4,7],121[6,2]): \n\n
Tag (121[6,2],130[6,11]): address
Txt (130[6,11],137[6,18]): My name
End (137[6,18],147[6,28]): /address
Txt (147[6,28],151[7,2]): \n
Tag (151[7,2],159[7,10]): CENTER
Txt (159[7,10],165[7,16]): CENTER
End (165[7,16],174[7,25]): /CENTER
Txt (174[7,25],178[8,2]): \n
Tag (178[8,2],181[8,5]): p
Tag (181[8,5],220[8,44]): img src="welcome.gif" alt="welcome" /
End (220[8,44],224[8,48]): /p
Txt (224[8,48],230[10,2]): \n\n
Tag (230[10,2],234[10,6]): h1
Txt (234[10,6],238[10,10]): main
End (238[10,10],243[10,15]): /h1
Txt (243[10,15],247[11,2]): \n
Tag (247[11,2],253[11,8]): hr /
Txt (253[11,8],256[12,1]): \n
End (256[12,1],263[12,8]): /body
Txt (263[12,8],265[13,0]): \n
End (265[13,0],272[13,7]): /html
---------html-----------
<Html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ja">
<head>
<title>title title</title>
</head>
<body>
<address>My name</address>
<CENTER>CENTER</CENTER>
<p><img src="welcome.gif" alt="welcome" /></p>
<h1>main</h1>
<hr />
</body>
</html>
------------------
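To make the flat versus nested difference concrete, here is a small self-contained sketch. It is not HTMLParser code, and every name in it (NestingSketch, build, render, the token notation) is hypothetical. It only models my assumption about the behaviour: a start tag opens a new level when its name is in a set of known composite names, and otherwise its end tag stays a sibling. In HTMLParser itself I believe the corresponding set is the tag classes registered with the node factory, but I have not confirmed that.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class NestingSketch {
    /** A tree node: a label plus a (possibly empty) list of children. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
    }

    /**
     * Builds a tree from a flat token list. Tokens are "name" (start tag),
     * "/name" (end tag) or "#text" (text). Only start tags whose upper-cased
     * name is in `composite` open a new nesting level.
     */
    static Node build(List<String> tokens, Set<String> composite) {
        Node root = new Node("ROOT");
        Deque<Node> open = new ArrayDeque<>();
        open.push(root);
        for (String token : tokens) {
            Node node = new Node(token);
            // Every token becomes a child of the innermost open composite.
            open.peek().children.add(node);
            if (token.startsWith("/")) {
                // An end tag closes the innermost open tag only if names match.
                String name = token.substring(1);
                if (open.size() > 1 && open.peek().label.equalsIgnoreCase(name)) {
                    open.pop();
                }
            } else if (!token.startsWith("#")
                    && composite.contains(token.toUpperCase(Locale.ROOT))) {
                open.push(node);
            }
        }
        return root;
    }

    /** Renders the tree below root, one label per line, two spaces per level. */
    static String render(Node root) {
        StringBuilder out = new StringBuilder();
        append(root, "", out);
        return out.toString();
    }

    private static void append(Node parent, String indent, StringBuilder out) {
        for (Node child : parent.children) {
            out.append(indent).append(child.label).append('\n');
            append(child, indent + "  ", out);
        }
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "body", "address", "#My name", "/address", "p", "img", "/p", "/body");
        Set<String> known = new HashSet<>(Arrays.asList("BODY", "P"));
        System.out.println(render(build(tokens, known)));   // address stays flat
        known.add("ADDRESS");
        System.out.println(render(build(tokens, known)));   // address now nests
    }
}
```

With only BODY and P in the set, "address", "#My name" and "/address" all appear as direct children of "body", which matches what I see in the stdout above. After adding ADDRESS, the text and the end tag move under "address", which is the structure I expected.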