[Htmlparser-cvs] htmlparser/src/org/htmlparser/visitors HtmlPage.java,1.32,1.33 LinkFindingVisitor.j
From: <der...@us...> - 2003-09-10 03:54:15
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors
In directory sc8-pr-cvs1:/tmp/cvs-serv24483/src/org/htmlparser/visitors

Modified Files:
	HtmlPage.java LinkFindingVisitor.java NodeVisitor.java
	ObjectFindingVisitor.java StringFindingVisitor.java
	TagFindingVisitor.java TextExtractingVisitor.java
	UrlModifyingVisitor.java package.html
Log Message:
Add style checking target to ant build script:
  ant checkstyle
It uses a jar from http://checkstyle.sourceforge.net which is dropped in the lib directory.
The rules are in the file htmlparser_checks.xml in the src directory.

Added lexerapplications package with Tabby as the first app.
It performs whitespace manipulation on source files to follow the style rules.
This reduced the number of style violations to roughly 14,000.

There are a few issues with the style checker that need to be resolved before it should be taken too seriously. For example:
- It thinks all method arguments should be final, even if they are modified by the code (which the compiler frowns on).
- It complains about long lines, even when there is no possibility of wrapping the line, e.g. a URL in a comment that's more than 80 characters long.
- It considers all naked integers as 'magic numbers', even when they are obvious, e.g. the 4 corners of a box.
- It complains about whitespace following braces, even in array initializers, e.g. X[][] = { {a, b}, { } }

But it points out some really interesting things, even if you don't agree with the style guidelines, so it's worth a look.

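For readers who have not run the checker, the four complaints listed in the log message can be illustrated with a small Java sketch. Everything in it is hypothetical: the class, the method names and the placeholder URL are invented for illustration and are not taken from the htmlparser sources or from htmlparser_checks.xml.

```java
// Hypothetical illustration only; none of these names come from htmlparser.
public class StyleCheckExamples {

    // Complaint 1: the checker would want 'final int value', but the loop
    // reassigns the argument, so adding 'final' would not compile.
    static int countDigits(int value) {
        int digits = 1;
        while (value >= 10) {
            value /= 10; // modifies the method argument
            digits++;
        }
        return digits;
    }

    // Complaint 2: a comment holding a single long URL cannot be wrapped
    // (placeholder URL, not from the original message):
    // http://www.example.com/a/deliberately/long/path/that/pushes/this/comment/line/past/eighty/characters.html

    // Complaint 3: the literal 4 is reported as a magic number, although
    // a box obviously has four corners.
    static int[][] corners(int x, int y, int width, int height) {
        int[][] result = new int[4][];
        result[0] = new int[] {x, y};
        result[1] = new int[] {x + width, y};
        result[2] = new int[] {x + width, y + height};
        result[3] = new int[] {x, y + height};
        return result;
    }

    // Complaint 4: whitespace after '{' is reported even inside an
    // array initializer.
    static final int[][] NESTED = { {1, 2}, { } };

    public static void main(String[] args) {
        System.out.println(countDigits(12345));
        System.out.println(corners(0, 0, 2, 3).length);
        System.out.println(NESTED[1].length);
    }
}
```

Whether each construct is actually reported depends on how the rules file is configured, so treat this as a reading aid, not as a reproduction of the checker's output.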
Index: HtmlPage.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors/HtmlPage.java,v retrieving revision 1.32 retrieving revision 1.33 diff -C2 -d -r1.32 -r1.33 *** HtmlPage.java 8 Sep 2003 02:26:33 -0000 1.32 --- HtmlPage.java 10 Sep 2003 03:38:25 -0000 1.33 *************** *** 1,27 **** // HTMLParser Library v1_4_20030907 - A java-based parser for HTML // Copyright (C) Dec 31, 2000 Somik Raha ! // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. ! // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. ! // // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA ! // // For any questions or suggestions, you can write to me at : // Email :so...@in... ! // ! // Postal Address : // Somik Raha // Extreme Programmer & Coach // Industrial Logic Corporation ! // 2583 Cedar Street, Berkeley, // CA 94708, USA // Website : http://www.industriallogic.com --- 1,27 ---- // HTMLParser Library v1_4_20030907 - A java-based parser for HTML // Copyright (C) Dec 31, 2000 Somik Raha ! // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. ! 
// // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. ! // // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA ! // // For any questions or suggestions, you can write to me at : // Email :so...@in... ! // ! // Postal Address : // Somik Raha // Extreme Programmer & Coach // Industrial Logic Corporation ! // 2583 Cedar Street, Berkeley, // CA 94708, USA // Website : http://www.industriallogic.com *************** *** 45,49 **** private NodeList tables; private boolean bodyTagBegin; ! public HtmlPage(Parser parser) { super(false); --- 45,49 ---- private NodeList tables; private boolean bodyTagBegin; ! public HtmlPage(Parser parser) { super(false); *************** *** 54,58 **** bodyTagBegin = false; } ! public String getTitle() { return title; --- 54,58 ---- bodyTagBegin = false; } ! public String getTitle() { return title; *************** *** 65,69 **** public void visitTag(Tag tag) { addTagToBodyIfApplicable(tag); ! if (isTable(tag)) { tables.add(tag); --- 65,69 ---- public void visitTag(Tag tag) { addTagToBodyIfApplicable(tag); ! if (isTable(tag)) { tables.add(tag); *************** *** 85,91 **** public void visitEndTag(EndTag endTag) { ! if (isBodyTag(endTag)) bodyTagBegin = false; ! addTagToBodyIfApplicable(endTag); } --- 85,91 ---- public void visitEndTag(EndTag endTag) { ! if (isBodyTag(endTag)) bodyTagBegin = false; ! addTagToBodyIfApplicable(endTag); } *************** *** 97,109 **** addTagToBodyIfApplicable(stringNode); } ! private boolean isBodyTag(Tag tag) { return tag.getTagName().equals("BODY"); } ! public NodeList getBody() { return nodesInBody; } ! 
public TableTag [] getTables() { TableTag [] tableArr = new TableTag[tables.size()]; --- 97,109 ---- addTagToBodyIfApplicable(stringNode); } ! private boolean isBodyTag(Tag tag) { return tag.getTagName().equals("BODY"); } ! public NodeList getBody() { return nodesInBody; } ! public TableTag [] getTables() { TableTag [] tableArr = new TableTag[tables.size()]; Index: LinkFindingVisitor.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors/LinkFindingVisitor.java,v retrieving revision 1.27 retrieving revision 1.28 diff -C2 -d -r1.27 -r1.28 *** LinkFindingVisitor.java 8 Sep 2003 02:26:33 -0000 1.27 --- LinkFindingVisitor.java 10 Sep 2003 03:38:25 -0000 1.28 *************** *** 1,27 **** // HTMLParser Library v1_4_20030907 - A java-based parser for HTML // Copyright (C) Dec 31, 2000 Somik Raha ! // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. ! // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. ! // // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA ! // // For any questions or suggestions, you can write to me at : // Email :so...@in... ! // ! // Postal Address : // Somik Raha // Extreme Programmer & Coach // Industrial Logic Corporation ! 
// 2583 Cedar Street, Berkeley, // CA 94708, USA // Website : http://www.industriallogic.com --- 1,27 ---- // HTMLParser Library v1_4_20030907 - A java-based parser for HTML // Copyright (C) Dec 31, 2000 Somik Raha ! // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. ! // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. ! // // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA ! // // For any questions or suggestions, you can write to me at : // Email :so...@in... ! // ! // Postal Address : // Somik Raha // Extreme Programmer & Coach // Industrial Logic Corporation ! // 2583 Cedar Street, Berkeley, // CA 94708, USA // Website : http://www.industriallogic.com *************** *** 35,39 **** private boolean linkTagFound = false; private int count = 0; ! public LinkFindingVisitor(String linkTextToFind) { this.linkTextToFind = linkTextToFind.toUpperCase(); --- 35,39 ---- private boolean linkTagFound = false; private int count = 0; ! public LinkFindingVisitor(String linkTextToFind) { this.linkTextToFind = linkTextToFind.toUpperCase(); *************** *** 47,55 **** } } ! public boolean linkTextFound() { return linkTagFound; } ! public int getCount() { return count; --- 47,55 ---- } } ! public boolean linkTextFound() { return linkTagFound; } ! 
public int getCount() { return count; Index: NodeVisitor.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors/NodeVisitor.java,v retrieving revision 1.27 retrieving revision 1.28 diff -C2 -d -r1.27 -r1.28 *** NodeVisitor.java 8 Sep 2003 02:26:33 -0000 1.27 --- NodeVisitor.java 10 Sep 2003 03:38:25 -0000 1.28 *************** *** 1,27 **** // HTMLParser Library v1_4_20030907 - A java-based parser for HTML // Copyright (C) Dec 31, 2000 Somik Raha ! // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. ! // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. ! // // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA ! // // For any questions or suggestions, you can write to me at : // Email :so...@in... ! // ! // Postal Address : // Somik Raha // Extreme Programmer & Coach // Industrial Logic Corporation ! // 2583 Cedar Street, Berkeley, // CA 94708, USA // Website : http://www.industriallogic.com --- 1,27 ---- // HTMLParser Library v1_4_20030907 - A java-based parser for HTML // Copyright (C) Dec 31, 2000 Somik Raha ! // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. ! 
// // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. ! // // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA ! // // For any questions or suggestions, you can write to me at : // Email :so...@in... ! // ! // Postal Address : // Somik Raha // Extreme Programmer & Coach // Industrial Logic Corporation ! // 2583 Cedar Street, Berkeley, // CA 94708, USA // Website : http://www.industriallogic.com *************** *** 40,86 **** private boolean recurseChildren; private boolean recurseSelf; ! public NodeVisitor() { ! this(true); } ! public NodeVisitor(boolean recurseChildren) { this.recurseChildren = recurseChildren; ! this.recurseSelf = true; } ! public NodeVisitor(boolean recurseChildren,boolean recurseSelf) { this.recurseChildren = recurseChildren; ! this.recurseSelf = recurseSelf; } public void visitTag(Tag tag) { ! } public void visitStringNode(StringNode stringNode) { } ! public void visitLinkTag(LinkTag linkTag) { } ! public void visitImageTag(ImageTag imageTag) { } ! public void visitEndTag(EndTag endTag) { ! } ! public void visitTitleTag(TitleTag titleTag) { ! } public void visitRemarkNode(RemarkNode remarkNode) { ! } ! public boolean shouldRecurseChildren() { return recurseChildren; } ! public boolean shouldRecurseSelf() { return recurseSelf; --- 40,86 ---- private boolean recurseChildren; private boolean recurseSelf; ! public NodeVisitor() { ! this(true); } ! public NodeVisitor(boolean recurseChildren) { this.recurseChildren = recurseChildren; ! this.recurseSelf = true; } ! public NodeVisitor(boolean recurseChildren,boolean recurseSelf) { this.recurseChildren = recurseChildren; ! 
this.recurseSelf = recurseSelf; } public void visitTag(Tag tag) { ! } public void visitStringNode(StringNode stringNode) { } ! public void visitLinkTag(LinkTag linkTag) { } ! public void visitImageTag(ImageTag imageTag) { } ! public void visitEndTag(EndTag endTag) { ! } ! public void visitTitleTag(TitleTag titleTag) { ! } public void visitRemarkNode(RemarkNode remarkNode) { ! } ! public boolean shouldRecurseChildren() { return recurseChildren; } ! public boolean shouldRecurseSelf() { return recurseSelf; *************** *** 89,93 **** /** * Override this method if you wish to do special ! * processing upon completion of parsing */ public void finishedParsing() { --- 89,93 ---- /** * Override this method if you wish to do special ! * processing upon completion of parsing */ public void finishedParsing() { Index: ObjectFindingVisitor.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors/ObjectFindingVisitor.java,v retrieving revision 1.32 retrieving revision 1.33 diff -C2 -d -r1.32 -r1.33 *** ObjectFindingVisitor.java 8 Sep 2003 02:26:33 -0000 1.32 --- ObjectFindingVisitor.java 10 Sep 2003 03:38:25 -0000 1.33 *************** *** 1,27 **** // HTMLParser Library v1_4_20030907 - A java-based parser for HTML // Copyright (C) Dec 31, 2000 Somik Raha ! // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. ! // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. ! 
// // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA ! // // For any questions or suggestions, you can write to me at : // Email :so...@in... ! // ! // Postal Address : // Somik Raha // Extreme Programmer & Coach // Industrial Logic Corporation ! // 2583 Cedar Street, Berkeley, // CA 94708, USA // Website : http://www.industriallogic.com --- 1,27 ---- // HTMLParser Library v1_4_20030907 - A java-based parser for HTML // Copyright (C) Dec 31, 2000 Somik Raha ! // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. ! // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. ! // // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA ! // // For any questions or suggestions, you can write to me at : // Email :so...@in... ! // ! // Postal Address : // Somik Raha // Extreme Programmer & Coach // Industrial Logic Corporation ! // 2583 Cedar Street, Berkeley, // CA 94708, USA // Website : http://www.industriallogic.com *************** *** 39,47 **** private int count = 0; private NodeList tags; ! public ObjectFindingVisitor(Class classTypeToFind) { this(classTypeToFind,false); } ! public ObjectFindingVisitor(Class classTypeToFind,boolean recurse) { super(recurse); --- 39,47 ---- private int count = 0; private NodeList tags; ! 
public ObjectFindingVisitor(Class classTypeToFind) { this(classTypeToFind,false); } ! public ObjectFindingVisitor(Class classTypeToFind,boolean recurse) { super(recurse); *************** *** 49,53 **** this.tags = new NodeList(); } ! public int getCount() { return count; --- 49,53 ---- this.tags = new NodeList(); } ! public int getCount() { return count; Index: StringFindingVisitor.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors/StringFindingVisitor.java,v retrieving revision 1.32 retrieving revision 1.33 diff -C2 -d -r1.32 -r1.33 *** StringFindingVisitor.java 8 Sep 2003 02:26:33 -0000 1.32 --- StringFindingVisitor.java 10 Sep 2003 03:38:25 -0000 1.33 *************** *** 1,27 **** // HTMLParser Library v1_4_20030907 - A java-based parser for HTML // Copyright (C) Dec 31, 2000 Somik Raha ! // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. ! // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. ! // // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA ! // // For any questions or suggestions, you can write to me at : // Email :so...@in... ! // ! // Postal Address : // Somik Raha // Extreme Programmer & Coach // Industrial Logic Corporation ! 
// 2583 Cedar Street, Berkeley, // CA 94708, USA // Website : http://www.industriallogic.com --- 1,27 ---- // HTMLParser Library v1_4_20030907 - A java-based parser for HTML // Copyright (C) Dec 31, 2000 Somik Raha ! // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. ! // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. ! // // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA ! // // For any questions or suggestions, you can write to me at : // Email :so...@in... ! // ! // Postal Address : // Somik Raha // Extreme Programmer & Coach // Industrial Logic Corporation ! // 2583 Cedar Street, Berkeley, // CA 94708, USA // Website : http://www.industriallogic.com *************** *** 38,42 **** private int foundCount; private boolean multipleSearchesWithinStrings; ! public StringFindingVisitor(String stringToFind) { this.stringToFind = stringToFind.toUpperCase(); --- 38,42 ---- private int foundCount; private boolean multipleSearchesWithinStrings; ! public StringFindingVisitor(String stringToFind) { this.stringToFind = stringToFind.toUpperCase(); *************** *** 44,55 **** multipleSearchesWithinStrings = false; } ! public void doMultipleSearchesWithinStrings() { multipleSearchesWithinStrings = true; } ! public void visitStringNode(StringNode stringNode) { String stringToBeSearched = stringNode.getText().toUpperCase(); ! 
if (!multipleSearchesWithinStrings && stringToBeSearched.indexOf(stringToFind) != -1) { stringFound = true; --- 44,55 ---- multipleSearchesWithinStrings = false; } ! public void doMultipleSearchesWithinStrings() { multipleSearchesWithinStrings = true; } ! public void visitStringNode(StringNode stringNode) { String stringToBeSearched = stringNode.getText().toUpperCase(); ! if (!multipleSearchesWithinStrings && stringToBeSearched.indexOf(stringToFind) != -1) { stringFound = true; *************** *** 60,72 **** index = stringToBeSearched.indexOf(stringToFind, index+1); if (index!=-1) ! foundCount++; } while (index != -1); } } ! public boolean stringWasFound() { return stringFound; ! } ! public int stringFoundCount() { return foundCount; --- 60,72 ---- index = stringToBeSearched.indexOf(stringToFind, index+1); if (index!=-1) ! foundCount++; } while (index != -1); } } ! public boolean stringWasFound() { return stringFound; ! } ! public int stringFoundCount() { return foundCount; Index: TagFindingVisitor.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors/TagFindingVisitor.java,v retrieving revision 1.33 retrieving revision 1.34 diff -C2 -d -r1.33 -r1.34 *** TagFindingVisitor.java 8 Sep 2003 02:26:33 -0000 1.33 --- TagFindingVisitor.java 10 Sep 2003 03:38:25 -0000 1.34 *************** *** 1,27 **** // HTMLParser Library v1_4_20030907 - A java-based parser for HTML // Copyright (C) Dec 31, 2000 Somik Raha ! // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. ! // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU // Lesser General Public License for more details. ! // // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA ! // // For any questions or suggestions, you can write to me at : // Email :so...@in... ! // ! // Postal Address : // Somik Raha // Extreme Programmer & Coach // Industrial Logic Corporation ! // 2583 Cedar Street, Berkeley, // CA 94708, USA // Website : http:// www.industriallogic.com --- 1,27 ---- // HTMLParser Library v1_4_20030907 - A java-based parser for HTML // Copyright (C) Dec 31, 2000 Somik Raha ! // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. ! // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. ! // // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA ! // // For any questions or suggestions, you can write to me at : // Email :so...@in... ! // ! // Postal Address : // Somik Raha // Extreme Programmer & Coach // Industrial Logic Corporation ! // 2583 Cedar Street, Berkeley, // CA 94708, USA // Website : http:// www.industriallogic.com *************** *** 43,47 **** private NodeList [] endTags; private boolean endTagCheck; ! public TagFindingVisitor(String [] tagsToBeFound) { this(tagsToBeFound,false); --- 43,47 ---- private NodeList [] endTags; private boolean endTagCheck; ! 
public TagFindingVisitor(String [] tagsToBeFound) { this(tagsToBeFound,false); *************** *** 61,67 **** } this.count = new int[tagsToBeFound.length]; ! this.endTagCheck = endTagCheck; ! } ! public int getTagCount(int index) { return count[index]; --- 61,67 ---- } this.count = new int[tagsToBeFound.length]; ! this.endTagCheck = endTagCheck; ! } ! public int getTagCount(int index) { return count[index]; *************** *** 88,95 **** } } ! public int getEndTagCount(int index) { return endTagCount[index]; } ! } --- 88,95 ---- } } ! public int getEndTagCount(int index) { return endTagCount[index]; } ! } Index: TextExtractingVisitor.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors/TextExtractingVisitor.java,v retrieving revision 1.31 retrieving revision 1.32 diff -C2 -d -r1.31 -r1.32 *** TextExtractingVisitor.java 8 Sep 2003 02:26:33 -0000 1.31 --- TextExtractingVisitor.java 10 Sep 2003 03:38:25 -0000 1.32 *************** *** 1,27 **** // HTMLParser Library v1_4_20030907 - A java-based parser for HTML // Copyright (C) Dec 31, 2000 Somik Raha ! // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. ! // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. ! // // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA ! // // For any questions or suggestions, you can write to me at : // Email :so...@in... ! // ! 
// Postal Address : // Somik Raha // Extreme Programmer & Coach // Industrial Logic Corporation ! // 2583 Cedar Street, Berkeley, // CA 94708, USA // Website : http://www.industriallogic.com --- 1,27 ---- // HTMLParser Library v1_4_20030907 - A java-based parser for HTML // Copyright (C) Dec 31, 2000 Somik Raha ! // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. ! // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. ! // // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA ! // // For any questions or suggestions, you can write to me at : // Email :so...@in... ! // ! // Postal Address : // Somik Raha // Extreme Programmer & Coach // Industrial Logic Corporation ! // 2583 Cedar Street, Berkeley, // CA 94708, USA // Website : http://www.industriallogic.com *************** *** 49,53 **** private StringBuffer textAccumulator; private boolean preTagBeingProcessed; ! public TextExtractingVisitor() { textAccumulator = new StringBuffer(); --- 49,53 ---- private StringBuffer textAccumulator; private boolean preTagBeingProcessed; ! public TextExtractingVisitor() { textAccumulator = new StringBuffer(); *************** *** 62,66 **** String text = stringNode.getText(); if (!preTagBeingProcessed) { ! text = Translate.decode(text); text = replaceNonBreakingSpaceWithOrdinarySpace(text); } --- 62,66 ---- String text = stringNode.getText(); if (!preTagBeingProcessed) { ! 
text = Translate.decode(text); text = replaceNonBreakingSpaceWithOrdinarySpace(text); } *************** *** 77,86 **** public void visitEndTag(EndTag endTag) { ! if (isPreTag(endTag)) preTagBeingProcessed = false; } public void visitTag(Tag tag) { ! if (isPreTag(tag)) preTagBeingProcessed = true; } --- 77,86 ---- public void visitEndTag(EndTag endTag) { ! if (isPreTag(endTag)) preTagBeingProcessed = false; } public void visitTag(Tag tag) { ! if (isPreTag(tag)) preTagBeingProcessed = true; } Index: UrlModifyingVisitor.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors/UrlModifyingVisitor.java,v retrieving revision 1.30 retrieving revision 1.31 diff -C2 -d -r1.30 -r1.31 *** UrlModifyingVisitor.java 8 Sep 2003 02:26:33 -0000 1.30 --- UrlModifyingVisitor.java 10 Sep 2003 03:38:25 -0000 1.31 *************** *** 1,27 **** // HTMLParser Library v1_4_20030907 - A java-based parser for HTML // Copyright (C) Dec 31, 2000 Somik Raha ! // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. ! // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. ! // // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA ! // // For any questions or suggestions, you can write to me at : // Email :so...@in... ! // ! // Postal Address : // Somik Raha // Extreme Programmer & Coach // Industrial Logic Corporation ! 
// 2583 Cedar Street, Berkeley, // CA 94708, USA // Website : http://www.industriallogic.com --- 1,27 ---- // HTMLParser Library v1_4_20030907 - A java-based parser for HTML // Copyright (C) Dec 31, 2000 Somik Raha ! // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. ! // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. ! // // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA ! // // For any questions or suggestions, you can write to me at : // Email :so...@in... ! // ! // Postal Address : // Somik Raha // Extreme Programmer & Coach // Industrial Logic Corporation ! // 2583 Cedar Street, Berkeley, // CA 94708, USA // Website : http://www.industriallogic.com *************** *** 42,50 **** private StringBuffer modifiedResult; private Parser parser; ! public UrlModifyingVisitor(Parser parser, String linkPrefix) { super(true,false); this.parser = parser; ! LinkScanner linkScanner = new LinkScanner(); parser.addScanner(linkScanner); parser.addScanner( --- 42,50 ---- private StringBuffer modifiedResult; private Parser parser; ! public UrlModifyingVisitor(Parser parser, String linkPrefix) { super(true,false); this.parser = parser; ! LinkScanner linkScanner = new LinkScanner(); parser.addScanner(linkScanner); parser.addScanner( *************** *** 53,60 **** ) ); ! this.linkPrefix =linkPrefix; modifiedResult = new StringBuffer(); } ! 
public void visitLinkTag(LinkTag linkTag) { linkTag.setLink(linkPrefix + linkTag.getLink()); --- 53,60 ---- ) ); ! this.linkPrefix =linkPrefix; modifiedResult = new StringBuffer(); } ! public void visitLinkTag(LinkTag linkTag) { linkTag.setLink(linkPrefix + linkTag.getLink()); *************** *** 65,69 **** modifiedResult.append(imageTag.toHtml()); } ! public void visitEndTag(EndTag endTag) { modifiedResult.append(endTag.toHtml()); --- 65,69 ---- modifiedResult.append(imageTag.toHtml()); } ! public void visitEndTag(EndTag endTag) { modifiedResult.append(endTag.toHtml()); *************** *** 77,83 **** modifiedResult.append(tag.toHtml()); } ! public String getModifiedResult() { ! return modifiedResult.toString(); } } --- 77,83 ---- modifiedResult.append(tag.toHtml()); } ! public String getModifiedResult() { ! return modifiedResult.toString(); } } Index: package.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors/package.html,v retrieving revision 1.13 retrieving revision 1.14 diff -C2 -d -r1.13 -r1.14 *** package.html 8 Sep 2003 02:26:33 -0000 1.13 --- package.html 10 Sep 2003 03:38:25 -0000 1.14 *************** *** 18,22 **** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. ! You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software --- 18,22 ---- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. ! You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software *************** *** 25,37 **** For any questions or suggestions, you can write to me at : Email :so...@in... ! ! Postal Address : Somik Raha Extreme Programmer & Coach Industrial Logic Corporation ! 
2583 Cedar Street, Berkeley, CA 94708, USA Website : http://www.industriallogic.com ! --> </head> --- 25,37 ---- For any questions or suggestions, you can write to me at : Email :so...@in... ! ! Postal Address : Somik Raha Extreme Programmer & Coach Industrial Logic Corporation ! 2583 Cedar Street, Berkeley, CA 94708, USA Website : http://www.industriallogic.com ! --> </head>
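
The changed lines in these diffs look identical before and after, which suggests the revisions differ only in whitespace, consistent with the log message's description of Tabby. The message does not say exactly which manipulations Tabby performs, so the following is only a minimal sketch of one plausible step, stripping trailing spaces and tabs from each line. The class and method names are hypothetical and are not Tabby's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; this is not the Tabby implementation.
public class TrailingWhitespaceSketch {

    /** Returns the line with any trailing spaces and tabs removed. */
    static String stripTrailing(String line) {
        int end = line.length();
        while (end > 0
                && (line.charAt(end - 1) == ' ' || line.charAt(end - 1) == '\t')) {
            end--;
        }
        return line.substring(0, end);
    }

    /** Applies stripTrailing to every line, leaving the input list untouched. */
    static List<String> stripAll(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            result.add(stripTrailing(line));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>();
        source.add("// ");                       // comment line with a trailing space
        source.add("    ");                      // whitespace-only line
        source.add("    private int count = 0;"); // already clean
        for (String line : stripAll(source)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

A change of this kind would produce exactly the sort of diff seen above: a line such as `// ` becomes `//`, which is marked as modified even though the two versions look the same once whitespace is collapsed.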