Thread: [Htmlparser-cvs] htmlparser/src/org/htmlparser/tags package.html,1.20,1.21
From: Derrick O. <der...@us...> - 2005-04-24 17:48:36
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv5117/htmlparser/src/org/htmlparser/tags

Modified Files:
	package.html
Log Message:
Documentation revamp part three.
Reworked some JavaDoc descriptions.
Added "HTML Parser for dummies" introductory text.
Removed checkstyle.jar and fit.jar (and it's cruft).

Index: package.html
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/package.html,v
retrieving revision 1.20
retrieving revision 1.21
diff -C2 -d -r1.20 -r1.21
*** package.html 10 Apr 2005 23:20:45 -0000 1.20
--- package.html 24 Apr 2005 17:48:27 -0000 1.21
***************
*** 41,48 ****
  <p>The classes in this package have been added in an ad-hoc fashion, with the
  most useful ones having existed a long time, while some obvious ones are rather
! new. Please feel free to add your own, and register them with the
  {@link org.htmlparser.PrototypicalNodeFactory PrototypicalNodeFactory},
  and they will be treated like any other in-built tag. In fact tags do not need
  to reside in this package.</p>
  <p>If the tag can contain other nodes, i.e. {@.html <h1>My Heading</h1>}, then it
  should derive from (i.e. be a subclass of) {@link org.htmlparser.tags.CompositeTag}.
--- 41,51 ----
  <p>The classes in this package have been added in an ad-hoc fashion, with the
  most useful ones having existed a long time, while some obvious ones are rather
! new. Please feel free to add your own custom tags, and register them with the
  {@link org.htmlparser.PrototypicalNodeFactory PrototypicalNodeFactory},
  and they will be treated like any other in-built tag. In fact tags do not need
  to reside in this package.</p>
+ <br><b>Custom Tags</b>
+ <p>Creating custom tags is fairly straight forward. Simply copy one of the
+ simpler tags you find in this package and alter it as follows.
  <p>If the tag can contain other nodes, i.e. {@.html <h1>My Heading</h1>}, then it
  should derive from (i.e. be a subclass of) {@link org.htmlparser.tags.CompositeTag}.
***************
*** 51,59 ****
  and nodes between the start and end tag will be gathered into the list of children.
  Most of the tags in this package derive from CompositeTag, and that
! why the nodes returned from the Parser are nested.</p>
  <p>If it is a simple tag, i.e. {@.html <br>}, then it should derive from
  {@link org.htmlparser.nodes.TagNode TagNode}. See for example
  {@link org.htmlparser.tags.MetaTag} or {@link org.htmlparser.tags.ImageTag}.</p>
  <!-- Put @see and @since tags down here. -->
--- 54,89 ----
  and nodes between the start and end tag will be gathered into the list of children.
  Most of the tags in this package derive from CompositeTag, and that
! is why the nodes returned from the Parser are nested.</p>
  <p>If it is a simple tag, i.e. {@.html <br>}, then it should derive from
  {@link org.htmlparser.nodes.TagNode TagNode}. See for example
  {@link org.htmlparser.tags.MetaTag} or {@link org.htmlparser.tags.ImageTag}.</p>
+ <p>To be registered with {@link org.htmlparser.PrototypicalNodeFactory#registerTag},
+ and especially if it is a composite tag, the tag needs to implement
+ <code>getIds</code> which returns the UPPERCASE list of names for the tag
+ (usually only one), for example "HTML". If the tag can be smart enough to know
+ what other tags can't be contained within it, it should also implement
+ {@link org.htmlparser.nodes.TagNode#getEnders getEnders()} which returns the
+ list of other tags that should cause this tag to close itself, and
+ {@link org.htmlparser.nodes.TagNode#getEndTagEnders getEndTagEnders()} which
+ returns the list of end tags (i.e. {@.html </xxx>}), other than it's own name, that
+ should cause this tag to close itself. When these 'ender' lists cause a tag to
+ end before seeing it's own end tag, a virtual end tag is created and 'inserted'
+ at the location where the end tag should have been. These end tags can be
+ distinguished because their {@link org.htmlparser.Node#getStartPosition starting}
+ and {@link org.htmlparser.Node#getEndPosition ending} locations are the same
+ (i.e. they take up no character length in the HTML stream).
+ <p>For example, the {@.html <OPTION>} tag from a form can be prematurely ended by
+ any of {@.html <INPUT>}, {@.html <TEXTAREA>}, {@.html <SELECT>},
+ or another {@.html <OPTION>} tag. These are the tags in the getEnders() list.
+ It can also be prematurely ended by {@.html </SELECT>}, {@.html </FORM>},
+ {@.html </BODY>}, or {@.html </HTML>}. These are the tags in the
+ getEndTagEnders() list.
+ <p>Other than that any functionality is up to you. You should note that
+ {@link org.htmlparser.Node#doSemanticAction doSemanticAction()} is called after
+ the tag has been completely scanned (it has it's children and end tag), but before
+ its siblings further downstream have been scanned. If transformation is your purpose,
+ this is the opportunity to mess around with the content, for example to set the link URL,
+ or lowercase the tag name, or whatever.
  <!-- Put @see and @since tags down here. -->
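
Illustration (not part of the commit): the 'ender' behaviour described in the
added documentation can be modelled with a small standalone Java sketch. It
deliberately does not use any HTML Parser classes. EnderSketch, closesOption
and insertVirtualEnds are hypothetical names, and the input is simplified to a
list of tag names in which end tags carry a leading "/". The two lists mirror
the OPTION example from the text above; how the library itself implements this
is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Standalone model only: none of these types come from the HTML Parser library.
final class EnderSketch {
    // Stand-in for what getEnders() is described as returning for OPTION.
    static final List<String> OPTION_ENDERS =
            Arrays.asList("INPUT", "TEXTAREA", "SELECT", "OPTION");
    // Stand-in for what getEndTagEnders() is described as returning for OPTION.
    static final List<String> OPTION_END_TAG_ENDERS =
            Arrays.asList("SELECT", "FORM", "BODY", "HTML");

    /** True if this token should force a still-open OPTION to close. */
    static boolean closesOption(String token) {
        boolean isEndTag = token.startsWith("/");
        String name = (isEndTag ? token.substring(1) : token).toUpperCase(Locale.ROOT);
        return isEndTag ? OPTION_END_TAG_ENDERS.contains(name)
                        : OPTION_ENDERS.contains(name);
    }

    /** Copies the tokens, adding a virtual end tag wherever an OPTION is ended early. */
    static List<String> insertVirtualEnds(List<String> tokens) {
        List<String> out = new ArrayList<>();
        boolean optionOpen = false;
        for (String token : tokens) {
            if (optionOpen) {
                if (token.equalsIgnoreCase("/OPTION")) {
                    optionOpen = false;            // a real end tag closes it
                } else if (closesOption(token)) {
                    out.add("/OPTION (virtual)");  // an ender closes it early
                    optionOpen = false;
                }
            }
            out.add(token);
            if (token.equalsIgnoreCase("OPTION")) {
                optionOpen = true;
            }
        }
        return out;
    }
}
```

Under these assumptions, insertVirtualEnds(Arrays.asList("SELECT", "OPTION", "/SELECT"))
yields [SELECT, OPTION, /OPTION (virtual), /SELECT]: the </SELECT> end tag is in
the end-tag-ender list, so the open OPTION is closed just before it.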