[Htmlparser-cvs] htmlparser/src/org/htmlparser/filters AndFilter.java,1.2,1.3 CssSelectorNodeFilter.
From: Derrick O. <der...@us...> - 2005-04-10 23:21:38
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30655/htmlparser/src/org/htmlparser/filters

Modified Files:
    AndFilter.java CssSelectorNodeFilter.java HasAttributeFilter.java
    HasChildFilter.java HasParentFilter.java HasSiblingFilter.java
    LinkRegexFilter.java LinkStringFilter.java NodeClassFilter.java
    NotFilter.java OrFilter.java RegexFilter.java TagNameFilter.java
Log Message:
Documentation revamp part one.
Deprecated node decorators.
Added doSemanticAction for Text and Comment nodes.
Added missing sitecapturer scripts.
Fixed DOS batch files to work when called from any location.

Index: AndFilter.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/AndFilter.java,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** AndFilter.java 13 Feb 2005 20:36:00 -0000 1.2
--- AndFilter.java 10 Apr 2005 23:20:43 -0000 1.3
***************
*** 95,98 ****
--- 95,100 ----
       * Accept nodes that are acceptable to all of it's predicate filters.
       * @param node The node to check.
+      * @return <code>true</code> if all the predicate filters find the node
+      * is acceptable, <code>false</code> otherwise.
       */
      public boolean accept (Node node)

Index: RegexFilter.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/RegexFilter.java,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** RegexFilter.java 13 Feb 2005 20:36:00 -0000 1.2
--- RegexFilter.java 10 Apr 2005 23:20:43 -0000 1.3
***************
*** 54,58 ****
   * </pre>
   * which matches a date in yyyy-mm-dd format between 1900-01-01 and 2099-12-31,
!  * with a choice of five separators, dash, space, either slash or a period.
   * The year is matched by (19|20)\d\d which uses alternation to allow the
   * either 19 or 20 as the first two digits. The round brackets are mandatory.
--- 54,59 ----
   * </pre>
   * which matches a date in yyyy-mm-dd format between 1900-01-01 and 2099-12-31,
!  * with a choice of five separators, either a dash, a space, either kind of
!  * slash or a period.
   * The year is matched by (19|20)\d\d which uses alternation to allow the
   * either 19 or 20 as the first two digits. The round brackets are mandatory.
***************
*** 174,177 ****
--- 175,180 ----
       * Accept string nodes that match the regular expression.
       * @param node The node to check.
+      * @return <code>true</code> if the regular expression matches the
+      * text of the node, <code>false</code> otherwise.
       */
      public boolean accept (Node node)

Index: HasParentFilter.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/HasParentFilter.java,v
retrieving revision 1.6
retrieving revision 1.7
diff -C2 -d -r1.6 -r1.7
*** HasParentFilter.java 13 Feb 2005 20:36:00 -0000 1.6
--- HasParentFilter.java 10 Apr 2005 23:20:43 -0000 1.7
***************
*** 128,131 ****
--- 128,133 ----
       * filter.
       * @param node The node to check.
+      * @return <code>true</code> if the node has an acceptable parent,
+      * <code>false</code> otherwise.
       */
      public boolean accept (Node node)

Index: LinkStringFilter.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/LinkStringFilter.java,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** LinkStringFilter.java 6 Apr 2005 10:20:23 -0000 1.1
--- LinkStringFilter.java 10 Apr 2005 23:20:43 -0000 1.2
***************
*** 38,42 ****
--- 38,49 ----
  public class LinkStringFilter implements NodeFilter
  {
+     /**
+      * The pattern to search for in the link.
+      */
      protected String mPattern;
+
+     /**
+      * Flag indicating case sensitive/insensitive search.
+      */
      protected boolean mCaseSensitive;

Index: LinkRegexFilter.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/LinkRegexFilter.java,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** LinkRegexFilter.java 6 Apr 2005 10:20:23 -0000 1.1
--- LinkRegexFilter.java 10 Apr 2005 23:20:43 -0000 1.2
***************
*** 40,43 ****
--- 40,46 ----
  public class LinkRegexFilter implements NodeFilter
  {
+     /**
+      * The regular expression to use on the link.
+      */
      protected Pattern mRegex;

***************
*** 47,51 ****
       * @param regexPattern The pattern to match.
       */
!     public LinkRegexFilter (String regexPattern) throws Exception
      {
          this (regexPattern, true);
--- 50,54 ----
       * @param regexPattern The pattern to match.
       */
!     public LinkRegexFilter (String regexPattern)
      {
          this (regexPattern, true);
***************
*** 58,62 ****
       * @param caseSensitive Specifies case sensitivity for the matching process.
       */
!     public LinkRegexFilter (String regexPattern, boolean caseSensitive) throws Exception
      {
          if (caseSensitive)
--- 61,65 ----
       * @param caseSensitive Specifies case sensitivity for the matching process.
       */
!     public LinkRegexFilter (String regexPattern, boolean caseSensitive)
      {
          if (caseSensitive)

Index: OrFilter.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/OrFilter.java,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** OrFilter.java 13 Feb 2005 20:36:00 -0000 1.2
--- OrFilter.java 10 Apr 2005 23:20:43 -0000 1.3
***************
*** 93,96 ****
--- 93,98 ----
       * Accept nodes that are acceptable to any of it's predicate filters.
       * @param node The node to check.
+      * @return <code>true</code> if any of the predicate filters find the node
+      * is acceptable, <code>false</code> otherwise.
       */
      public boolean accept (Node node)

Index: HasChildFilter.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/HasChildFilter.java,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** HasChildFilter.java 13 Feb 2005 20:36:00 -0000 1.3
--- HasChildFilter.java 10 Apr 2005 23:20:43 -0000 1.4
***************
*** 34,37 ****
--- 34,40 ----
  /**
   * This class accepts all tags that have a child acceptable to the filter.
+  * It can be set to operate recursively, that is perform a scan down
+  * through the node heirarchy in a breadth first traversal looking for any
+  * descendant that matches the predicate filter (which stops the search).
   */
  public class HasChildFilter
***************
*** 123,126 ****
--- 126,131 ----
       * Accept tags with children acceptable to the filter.
       * @param node The node to check.
+      * @return <code>true</code> if the node has an acceptable child,
+      * <code>false</code> otherwise.
       */
      public boolean accept (Node node)

Index: HasSiblingFilter.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/HasSiblingFilter.java,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** HasSiblingFilter.java 13 Feb 2005 20:36:00 -0000 1.1
--- HasSiblingFilter.java 10 Apr 2005 23:20:43 -0000 1.2
***************
*** 85,88 ****
--- 85,90 ----
       * Accept tags with a sibling acceptable to the filter.
       * @param node The node to check.
+      * @return <code>true</code> if the node has an acceptable sibling,
+      * <code>false</code> otherwise.
       */
      public boolean accept (Node node)

Index: CssSelectorNodeFilter.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/CssSelectorNodeFilter.java,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** CssSelectorNodeFilter.java 17 Jul 2004 13:45:04 -0000 1.4
--- CssSelectorNodeFilter.java 10 Apr 2005 23:20:43 -0000 1.5
***************
*** 72,75 ****
--- 72,79 ----
      private NodeFilter therule;

+     /**
+      * Create a Cascading Style Sheet node filter.
+      * @param selector The selector expression.
+      */
      public CssSelectorNodeFilter(String selector)
      {
***************
*** 79,85 ****
      }

!     public boolean accept(Node n)
      {
!         return therule.accept(n);
      }

--- 83,95 ----
      }

!     /**
!      * Accept nodes that match the selector expression.
!      * @param node The node to check.
!      * @return <code>true</code> if the node matches,
!      * <code>false</code> otherwise.
!      */
!     public boolean accept(Node node)
      {
!         return therule.accept(node);
      }

***************
*** 235,241 ****
              if ("~=".equals(rel) && val != null)
                  n = new AttribMatchFilter(unescape(attrib),
!                     "\\b"
!                     + val.replaceAll("([^a-zA-Z0-9])", "\\\\$1")
!                     + "\\b");
              else if ("=".equals(rel) && val != null)
                  n = new HasAttributeFilter(attrib, val);
--- 245,251 ----
              if ("~=".equals(rel) && val != null)
                  n = new AttribMatchFilter(unescape(attrib),
!                     "\\b"
!                     + val.replaceAll("([^a-zA-Z0-9])", "\\\\$1")
!                     + "\\b");
              else if ("=".equals(rel) && val != null)
                  n = new HasAttributeFilter(attrib, val);
***************
*** 249,252 ****
--- 259,268 ----
      }

+     /**
+      * Replace escape sequences in a string.
+      * @param escaped The string to examine.
+      * @return The argument with escape sequences replaced by their
+      * equivalent character.
+      */
      public static String unescape(String escaped)
      {
***************
*** 258,262 ****
              if (m.group(1) != null)
                  m.appendReplacement(result,
!                     String.valueOf((char)Integer.parseInt(m.group(1), 16)));
              else if (m.group(2) != null)
                  m.appendReplacement(result, m.group(2));
--- 274,278 ----
              if (m.group(1) != null)
                  m.appendReplacement(result,
!                     String.valueOf((char)Integer.parseInt(m.group(1), 16)));
              else if (m.group(2) != null)
                  m.appendReplacement(result, m.group(2));

Index: TagNameFilter.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/TagNameFilter.java,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** TagNameFilter.java 13 Feb 2005 20:36:00 -0000 1.4
--- TagNameFilter.java 10 Apr 2005 23:20:43 -0000 1.5
***************
*** 87,90 ****
--- 87,92 ----
       * The end tags are available on the enclosing non-end tag.
       * @param node The node to check.
+      * @return <code>true</code> if the tag name matches,
+      * <code>false</code> otherwise.
       */
      public boolean accept (Node node)

Index: HasAttributeFilter.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/HasAttributeFilter.java,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -d -r1.5 -r1.6
*** HasAttributeFilter.java 13 Feb 2005 20:36:00 -0000 1.5
--- HasAttributeFilter.java 10 Apr 2005 23:20:43 -0000 1.6
***************
*** 119,122 ****
--- 119,124 ----
       * Accept tags with a certain attribute.
       * @param node The node to check.
+      * @return <code>true</code> if the node has the attribute
+      * (and value if that is being checked too), <code>false</code> otherwise.
       */
      public boolean accept (Node node)

Index: NodeClassFilter.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/NodeClassFilter.java,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** NodeClassFilter.java 13 Feb 2005 20:36:00 -0000 1.2
--- NodeClassFilter.java 10 Apr 2005 23:20:43 -0000 1.3
***************
*** 78,81 ****
--- 78,83 ----
       * Accept nodes that are assignable from the class provided in the constructor.
       * @param node The node to check.
+      * @return <code>true</code> if the node is the right class,
+      * <code>false</code> otherwise.
       */
      public boolean accept (Node node)

Index: NotFilter.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/NotFilter.java,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** NotFilter.java 13 Feb 2005 22:45:47 -0000 1.3
--- NotFilter.java 10 Apr 2005 23:20:43 -0000 1.4
***************
*** 85,88 ****
--- 85,90 ----
       * Accept nodes that are not acceptable to the predicate filter.
       * @param node The node to check.
+      * @return <code>true</code> if the node is not acceptable to the
+      * predicate filter, <code>false</code> otherwise.
       */
      public boolean accept (Node node)
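
Note on the RegexFilter.java javadoc change: the reworded comment describes a yyyy-mm-dd date pattern with five possible separators, used consistently within one date. The full expression is not visible in the hunk, so the pattern below is an assumption reconstructed from the prose (year as (19|20)\d\d, one captured separator, then a back-reference to it). The class name DateRegexSketch and the method isDate are hypothetical and use only java.util.regex, not the htmlparser classes.

```java
import java.util.regex.Pattern;

// Illustrative only: not taken from RegexFilter.java.
final class DateRegexSketch {
    // Assumed pattern. Group 1 is the century (19 or 20), group 2 captures one
    // of the five separators (dash, space, backslash, slash, period), and \2
    // requires the same separator between month and day.
    static final Pattern DATE = Pattern.compile(
        "(19|20)\\d\\d([- \\\\/.])(0[1-9]|1[012])\\2(0[1-9]|[12][0-9]|3[01])");

    // True when the whole string is a date in the assumed format.
    static boolean isDate(String text) {
        return DATE.matcher(text).matches();
    }
}
```

The round brackets around 19|20 matter because alternation has the lowest precedence; without them the expression would read as "19" or "20 followed by everything else".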
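
Note on the "~=" branch in CssSelectorNodeFilter.java (whitespace-only change in this commit): the expression wraps the attribute value in \b word boundaries after backslash-escaping every non-alphanumeric character. A minimal standalone sketch of that construction follows. The class WordMatchSketch and its methods are hypothetical, and it assumes the resulting pattern is applied with find(); how AttribMatchFilter actually applies it is not shown in the diff.

```java
import java.util.regex.Pattern;

// Illustrative only: mirrors the string expression seen in the diff.
final class WordMatchSketch {
    // Builds \b<escaped value>\b, escaping each character outside [a-zA-Z0-9]
    // with a backslash so it is matched literally.
    static Pattern wordPattern(String val) {
        return Pattern.compile(
            "\\b" + val.replaceAll("([^a-zA-Z0-9])", "\\\\$1") + "\\b");
    }

    // Assumption: a match anywhere in the attribute text counts.
    static boolean containsWord(String attributeText, String val) {
        return wordPattern(val).matcher(attributeText).find();
    }
}
```

With this sketch, containsWord("menu nav-item active", "nav-item") is true, while containsWord("navbar", "nav") is false because no word boundary follows "nav".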
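
Note on unescape() in CssSelectorNodeFilter.java: only two statements of the method body appear in the hunks, so the following is a sketch of the same appendReplacement technique under stated assumptions, not the committed implementation. The escape pattern (a backslash followed by one to six hex digits, or a backslash followed by any single character) is assumed, as is the class name UnescapeSketch. Matcher.quoteReplacement is added here so that an escaped "$" or "\" is not reinterpreted by appendReplacement; the diff does not show such a guard.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: the pattern is an assumption, not from the source file.
final class UnescapeSketch {
    // Group 1: hex code after a backslash. Group 2: any other escaped char.
    private static final Pattern ESCAPE =
        Pattern.compile("\\\\([0-9a-fA-F]{1,6})|\\\\(.)");

    static String unescape(String escaped) {
        StringBuffer result = new StringBuffer(escaped.length());
        Matcher m = ESCAPE.matcher(escaped);
        while (m.find()) {
            if (m.group(1) != null) {
                // Hex escape: replace with the character for that code.
                String ch = String.valueOf((char) Integer.parseInt(m.group(1), 16));
                m.appendReplacement(result, Matcher.quoteReplacement(ch));
            } else if (m.group(2) != null) {
                // Simple escape: keep the character, drop the backslash.
                m.appendReplacement(result, Matcher.quoteReplacement(m.group(2)));
            }
        }
        m.appendTail(result);
        return result.toString();
    }
}
```

The cast to char limits this sketch to code points that fit in 16 bits.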