[Htmlparser-user] Not CompositeNode ("ADDRESS", "CENTER" TAG, etc...)
From: <ka...@ex...> - 2006-02-07 06:46:36
Hi, all.

I parsed an HTML file and built a DOM using HTMLParser Version 1.6 (Integration Build Nov 12, 2005).

The "P" tag has its "P" end tag as a child. (The same is true for "HEAD", "TITLE", "BODY", etc.)
On the other hand, "ADDRESS" appears as two separate tags ("ADDRESS" and "/ADDRESS") at the same level in the DOM. (The same thing happens with the "CENTER" tag.)

I expected the ADDRESS tag to behave like the "P" tag, but it does not.
What is the reason?
How can I make the parser recognize the ADDRESS tag as a single CompositeTag?

Thank you, all.
Sorry for my poor English.

---------code-----------
import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class SampleHTMLParserJ {

    /**
     * HTMLParser sample
     *
     * @param args
     */
    public static void main(String[] args) {
        try {
            Parser parser = new Parser("file:///D:/data/test03.html");
            // Parse the whole document (no filter) and print the root node
            NodeList list = parser.parse(null);
            Node node = list.elementAt(0);
            System.out.println(node);
        } catch (ParserException e) {
            e.printStackTrace();
        }
    }
}

---------stdout-----------
Tag (0[0,0],57[0,57]): Html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ja"
Txt (57[0,57],60[1,1]): \n
Tag (60[1,1],66[1,7]): head
Txt (66[1,7],70[2,2]): \n
Tag (70[2,2],77[2,9]): title
Txt (77[2,9],88[2,20]): title title
End (88[2,20],96[2,28]): /title
Txt (96[2,28],99[3,1]): \n
End (99[3,1],106[3,8]): /head
Txt (106[3,8],109[4,1]): \n
Tag (109[4,1],115[4,7]): body
Txt (115[4,7],121[6,2]): \n\n
Tag (121[6,2],130[6,11]): address
Txt (130[6,11],137[6,18]): My name
End (137[6,18],147[6,28]): /address
Txt (147[6,28],151[7,2]): \n
Tag (151[7,2],159[7,10]): CENTER
Txt (159[7,10],165[7,16]): CENTER
End (165[7,16],174[7,25]): /CENTER
Txt (174[7,25],178[8,2]): \n
Tag (178[8,2],181[8,5]): p
Tag (181[8,5],220[8,44]): img src="welcome.gif" alt="welcome" /
End (220[8,44],224[8,48]): /p
Txt (224[8,48],230[10,2]): \n\n
Tag (230[10,2],234[10,6]): h1
Txt (234[10,6],238[10,10]): main
End (238[10,10],243[10,15]): /h1
Txt (243[10,15],247[11,2]): \n
Tag (247[11,2],253[11,8]): hr /
Txt (253[11,8],256[12,1]): \n
End (256[12,1],263[12,8]): /body
Txt (263[12,8],265[13,0]): \n
End (265[13,0],272[13,7]): /html

---------html-----------
<Html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ja">
 <head>
  <title>title title</title>
 </head>
 <body>

  <address>My name</address>
  <CENTER>CENTER</CENTER>
  <p><img src="welcome.gif" alt="welcome" /></p>

  <h1>main</h1>
  <hr />
 </body>
</html>
------------------