htmlparser-cvs Mailing List for HTML Parser (Page 11)
Brought to you by:
derrickoswald
From: Derrick O. <der...@us...> - 2004-09-24 23:16:57
|
Update of /cvsroot/htmlparser/htmlparser/docs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25509/docs

Modified Files:
    contributors.html
Log Message:
Update Alberto's contributor info.

Index: contributors.html
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/docs/contributors.html,v
retrieving revision 1.12
retrieving revision 1.13
diff -C2 -d -r1.12 -r1.13
*** contributors.html 13 Jul 2004 01:02:39 -0000 1.12
--- contributors.html 24 Sep 2004 23:16:48 -0000 1.13
***************
*** 233,242 ****
  Corso Sebastopoli 39,<br>
  10134 Torino, Italy<br>
! <a href="http://members.xoom.virgilio.it/nacher/Home.html">Personal Home Page</a><br>
  <a href="http://sourceforge.net/sendmessage.php?touser=892989">email</a><br>
  </td>
  <td width="39%" valign="top">
  <strong>On Alberto Nacher</strong>
! <p>I'm 31 years old, I'm a computer engineer and I have been working as consultant since 1998.</p>
  <p>I've worked with Microsoft VB and VB.NET technologies, with Java
--- 233,242 ----
  Corso Sebastopoli 39,<br>
  10134 Torino, Italy<br>
! <a href="http://xoomer.virgilio.it/giugiod/">Personal Home Page</a><br>
  <a href="http://sourceforge.net/sendmessage.php?touser=892989">email</a><br>
  </td>
  <td width="39%" valign="top">
  <strong>On Alberto Nacher</strong>
! <p>I was born in 1972, I'm a computer engineer and I have been working as consultant since 1998.</p>
  <p>I've worked with Microsoft VB and VB.NET technologies, with Java
|
From: Derrick O. <der...@us...> - 2004-09-06 17:19:23
|
Update of /cvsroot/htmlparser/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv8433

Modified Files:
    build.xml
Log Message:
Provide for building with JDK 1.5 by adding source="1.3" to javac tasks.

Index: build.xml
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/build.xml,v
retrieving revision 1.72
retrieving revision 1.73
diff -C2 -d -r1.72 -r1.73
*** build.xml 2 Sep 2004 02:28:16 -0000 1.72
--- build.xml 6 Sep 2004 17:19:14 -0000 1.73
***************
*** 223,231 ****

    <target name="compile" description="compile all java files">
!     <javac srcdir="${src}" includes="org/htmlparser/**" excludes="org/htmlparser/tests/**" debug="on" classpath="src:${commons-logging.jar}"/>
    </target>

    <target name="compilelexer" description="compile lexer java files">
!     <javac srcdir="${src}" debug="on" classpath="src:${commons-logging.jar}" target="1.1">
        <include name="org/htmlparser/lexer/*.java"/>
        <include name="org/htmlparser/nodes/*.java"/>
--- 223,231 ----

    <target name="compile" description="compile all java files">
!     <javac srcdir="${src}" includes="org/htmlparser/**" excludes="org/htmlparser/tests/**" debug="on" classpath="src:${commons-logging.jar}" source="1.3"/>
    </target>

    <target name="compilelexer" description="compile lexer java files">
!     <javac srcdir="${src}" debug="on" classpath="src:${commons-logging.jar}" target="1.1" source="1.3">
        <include name="org/htmlparser/lexer/*.java"/>
        <include name="org/htmlparser/nodes/*.java"/>
***************
*** 250,254 ****

    <target name="compileparser" depends="compilelexer" description="compile parser java files">
!     <javac srcdir="${src}" debug="on" classpath="src:${commons-logging.jar}:${sax2.jar}">
        <include name="org/htmlparser/**/*.java"/>
        <exclude name="org/htmlparser/tests/**"/>
--- 250,254 ----

    <target name="compileparser" depends="compilelexer" description="compile parser java files">
!     <javac srcdir="${src}" debug="on" classpath="src:${commons-logging.jar}:${sax2.jar}" source="1.3">
        <include name="org/htmlparser/**/*.java"/>
        <exclude name="org/htmlparser/tests/**"/>
***************
*** 326,330 ****
      <!-- Create the lib directory -->
      <mkdir dir="${lib}"/>
!     <javac compiler="javac1.4" srcdir="${src}" debug="on" classpath="src:${lib}/htmllexer.jar">
        <include name="org/htmlparser/lexerapplications/thumbelina/**/*.java"/>
      </javac>
--- 326,330 ----
      <!-- Create the lib directory -->
      <mkdir dir="${lib}"/>
!     <javac compiler="javac1.4" srcdir="${src}" debug="on" classpath="src:${lib}/htmllexer.jar" source="1.3">
        <include name="org/htmlparser/lexerapplications/thumbelina/**/*.java"/>
      </javac>
***************
*** 343,347 ****
    <!-- Run the unit tests -->
    <target name="test" depends="jar" description="run the JUnit tests">
!     <javac srcdir="${src}" includes="org/htmlparser/tests/**" debug="on">
        <classpath>
          <pathelement location="src"/>
--- 343,347 ----
    <!-- Run the unit tests -->
    <target name="test" depends="jar" description="run the JUnit tests">
!     <javac srcdir="${src}" includes="org/htmlparser/tests/**" debug="on" source="1.3">
        <classpath>
          <pathelement location="src"/>
|
From: Derrick O. <der...@us...> - 2004-09-06 17:13:39
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7192/src/org/htmlparser/tags

Modified Files:
    MetaTag.java
Log Message:
Incorporate patch #1004985 Page.java, by making getCharset() and findCharset() static.

Index: MetaTag.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/MetaTag.java,v
retrieving revision 1.37
retrieving revision 1.38
diff -C2 -d -r1.37 -r1.38
*** MetaTag.java 2 Jul 2004 00:49:29 -0000 1.37
--- MetaTag.java 6 Sep 2004 17:12:59 -0000 1.38
***************
*** 28,31 ****
--- 28,32 ----

  import org.htmlparser.Attribute;
+ import org.htmlparser.lexer.Page;
  import org.htmlparser.nodes.TagNode;
  import org.htmlparser.util.ParserException;
***************
*** 115,119 ****
              if ("Content-Type".equalsIgnoreCase (httpEquiv))
              {
!                 charset = getPage ().getCharset (getAttribute ("CONTENT"));
                  getPage ().setEncoding (charset);
              }
--- 116,120 ----
              if ("Content-Type".equalsIgnoreCase (httpEquiv))
              {
!                 charset = Page.getCharset (getAttribute ("CONTENT"));
                  getPage ().setEncoding (charset);
              }
|
From: Derrick O. <der...@us...> - 2004-09-06 17:13:24
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7192/src/org/htmlparser/tests

Modified Files:
    ParserTest.java
Log Message:
Incorporate patch #1004985 Page.java, by making getCharset() and findCharset() static.

Index: ParserTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/ParserTest.java,v
retrieving revision 1.62
retrieving revision 1.63
diff -C2 -d -r1.62 -r1.63
*** ParserTest.java 25 Aug 2004 03:36:01 -0000 1.62
--- ParserTest.java 6 Sep 2004 17:13:08 -0000 1.63
***************
*** 540,617 ****
      }

!     /**
!      * Test a bogus comma delimited charset specification in the HTTP header.
!      * See bug #722941.
!      * A comma delimted charset in the HTTP header does not meet the HTTP/1.1
!      * specification in RFC 2068. In this case that I believe
!      * that some idiot has misconfigured the HTTP server, but since it's
!      * AOL it would be nice to handle this case.
!      */
!     public void testCommaListCharset () throws ParserException
!     {
!         URL url;
!         URLConnection connection;
!         Page page;
!         Parser parser;
!         String idiots = "http://users.aol.com/geinster/rej.htm";
!
!         try
!         {
!             url = new URL (idiots);
!             connection = url.openConnection ();
!             // this little subclass just gets around normal JDK 1.4 processing
!             // that filters out bogus character sets
!             page = new Page ("")
!             {
!                 public String getCharset(String content)
!                 {
!                     final String CHARSET_STRING = "charset";
!                     int index;
!                     String ret;
!
!                     ret = DEFAULT_CHARSET;
!                     if (null != content)
!                     {
!                         index = content.indexOf (CHARSET_STRING);
!
!                         if (index != -1)
!                         {
!                             content = content.substring (index + CHARSET_STRING.length ()).trim ();
!                             if (content.startsWith ("="))
!                             {
!                                 content = content.substring (1).trim ();
!                                 index = content.indexOf (";");
!                                 if (index != -1)
!                                     content = content.substring (0, index);
!
!                                 //remove any double quotes from around charset string
!                                 if (content.startsWith ("\"") && content.endsWith ("\"") && (1 < content.length ()))
!                                     content = content.substring (1, content.length () - 1);
!
!                                 //remove any single quote from around charset string
!                                 if (content.startsWith ("'") && content.endsWith ("'") && (1 < content.length ()))
!                                     content = content.substring (1, content.length () - 1);
!
!                                 ret = content; // short circuit findCharset() processing
!                             }
!                         }
!                     }
!
!                     return (ret);
!                 }
!             };
!             page.setConnection (connection);
!             parser = new Parser (new Lexer (page));
!             // must be the default
!             assertTrue ("Wrong encoding", parser.getEncoding ().equals ("ISO-8859-1"));
!             for (NodeIterator e = parser.elements();e.hasMoreNodes();)
!                 e.nextNode();
!             assertTrue ("Wrong encoding", parser.getEncoding ().equals ("windows-1252"));
!         }
!         catch (Exception e)
!         {
!             fail (e.getMessage ());
!         }
!     }
      public void testNullUrl()
      {
--- 540,576 ----
      }

! // This test is commented out because the URL no longer has a comma delimited character set.
! // Reinstate when a suitable URL is discovered, or the unit tests set up their own HTTP server.
! //    /**
! //     * Test a bogus comma delimited charset specification in the HTTP header.
! //     * See bug #722941.
! //     * A comma delimted charset in the HTTP header does not meet the HTTP/1.1
! //     * specification in RFC 2068. In this case that I believe
! //     * that some idiot has misconfigured the HTTP server, but since it's
! //     * AOL it would be nice to handle this case.
! //     */
! //    public void testCommaListCharset () throws ParserException
! //    {
! //        URL url;
! //        URLConnection connection;
! //        Parser parser;
! //        String bogus = "http://users.aol.com/geinster/rej.htm";
! //
! //        try
! //        {
! //            url = new URL (bogus);
! //            connection = url.openConnection ();
! //            parser = new Parser (new Lexer (new Page (connection)));
! //            // must be the default
! //            assertTrue ("Wrong encoding", parser.getEncoding ().equals ("ISO-8859-1"));
! //            for (NodeIterator e = parser.elements();e.hasMoreNodes();)
! //                e.nextNode();
! //            assertTrue ("Wrong encoding", parser.getEncoding ().equals ("windows-1252"));
! //        }
! //        catch (Exception e)
! //        {
! //            fail (e.getMessage ());
! //        }
! //    }
      public void testNullUrl()
      {
***************
*** 623,627 ****
          catch (ParserException e)
          {
!
          }
      }
--- 582,586 ----
          catch (ParserException e)
          {
!             // expected outcome
          }
      }
|
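The comment added to ParserTest.java in the message above says the test could be reinstated once the unit tests set up their own HTTP server. One way that might look is sketched below, using only the JDK's built-in `com.sun.net.httpserver` package. This is not code from the htmlparser repository: the class name `BogusCharsetServer`, the method `fetchContentType` and the header value are assumptions made for illustration, and the sketch only shows that the malformed Content-Type header reaches the client unchanged.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of htmlparser.
public class BogusCharsetServer
{
    // Assumed example of a non-compliant, comma delimited charset parameter.
    static final String BOGUS_CONTENT_TYPE = "text/html; charset=ISO-8859-1,utf-8";

    // Start a throwaway server on a free loopback port, fetch one page,
    // and return the Content-Type header exactly as the client sees it.
    static String fetchContentType ()
    {
        try
        {
            HttpServer server = HttpServer.create (new InetSocketAddress ("127.0.0.1", 0), 0);
            server.createContext ("/", exchange ->
            {
                byte[] body = "<html><body>test</body></html>".getBytes (StandardCharsets.ISO_8859_1);
                exchange.getResponseHeaders ().set ("Content-Type", BOGUS_CONTENT_TYPE);
                exchange.sendResponseHeaders (200, body.length);
                try (OutputStream out = exchange.getResponseBody ())
                {
                    out.write (body);
                }
            });
            server.start ();
            try
            {
                int port = server.getAddress ().getPort ();
                URLConnection connection = URI.create ("http://127.0.0.1:" + port + "/")
                    .toURL ().openConnection (Proxy.NO_PROXY);
                return connection.getContentType ();
            }
            finally
            {
                server.stop (0); // shut down without waiting for idle exchanges
            }
        }
        catch (IOException ioe)
        {
            throw new UncheckedIOException (ioe);
        }
    }

    public static void main (String[] args)
    {
        System.out.println (fetchContentType ());
    }
}
```

In a reinstated test, the `URLConnection` opened against the local server would presumably be handed to `new Page (connection)` in place of the AOL URL, so the test no longer depends on an external site.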
From: Derrick O. <der...@us...> - 2004-09-06 17:13:24
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7192/src/org/htmlparser/lexer Modified Files: Page.java Log Message: Incorporate patch #1004985 Page.java, by making getCharset() and findCharset() static. Index: Page.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Page.java,v retrieving revision 1.44 retrieving revision 1.45 diff -C2 -d -r1.44 -r1.45 *** Page.java 2 Sep 2004 02:28:14 -0000 1.44 --- Page.java 6 Sep 2004 17:13:00 -0000 1.45 *************** *** 202,205 **** --- 202,334 ---- } + /** + * Get a CharacterSet name corresponding to a charset parameter. + * @param content A text line of the form: + * <pre> + * text/html; charset=Shift_JIS + * </pre> + * which is applicable both to the HTTP header field Content-Type and + * the meta tag http-equiv="Content-Type". + * Note this method also handles non-compliant quoted charset directives such as: + * <pre> + * text/html; charset="UTF-8" + * </pre> + * and + * <pre> + * text/html; charset='UTF-8' + * </pre> + * @return The character set name to use when reading the input stream. + * For JDKs that have the Charset class this is qualified by passing + * the name to findCharset() to render it into canonical form. + * If the charset parameter is not found in the given string, the default + * character set is returned. 
+ * @see #findCharset + * @see #DEFAULT_CHARSET + */ + public static String getCharset (String content) + { + final String CHARSET_STRING = "charset"; + int index; + String ret; + + ret = DEFAULT_CHARSET; + if (null != content) + { + index = content.indexOf (CHARSET_STRING); + + if (index != -1) + { + content = content.substring (index + CHARSET_STRING.length ()).trim (); + if (content.startsWith ("=")) + { + content = content.substring (1).trim (); + index = content.indexOf (";"); + if (index != -1) + content = content.substring (0, index); + + //remove any double quotes from around charset string + if (content.startsWith ("\"") && content.endsWith ("\"") && (1 < content.length ())) + content = content.substring (1, content.length () - 1); + + //remove any single quote from around charset string + if (content.startsWith ("'") && content.endsWith ("'") && (1 < content.length ())) + content = content.substring (1, content.length () - 1); + + ret = findCharset (content, ret); + + // Charset names are not case-sensitive; + // that is, case is always ignored when comparing charset names. + // if (!ret.equalsIgnoreCase (content)) + // { + // System.out.println ( + // "detected charset \"" + // + content + // + "\", using \"" + // + ret + // + "\""); + // } + } + } + } + + return (ret); + } + + /** + * Lookup a character set name. + * <em>Vacuous for JVM's without <code>java.nio.charset</code>.</em> + * This uses reflection so the code will still run under prior JDK's but + * in that case the default is always returned. + * @param name The name to look up. One of the aliases for a character set. + * @param _default The name to return if the lookup fails. 
+ */ + public static String findCharset (String name, String _default) + { + String ret; + + try + { + Class cls; + Method method; + Object object; + + cls = Class.forName ("java.nio.charset.Charset"); + method = cls.getMethod ("forName", new Class[] { String.class }); + object = method.invoke (null, new Object[] { name }); + method = cls.getMethod ("name", new Class[] { }); + object = method.invoke (object, new Object[] { }); + ret = (String)object; + } + catch (ClassNotFoundException cnfe) + { + // for reflection exceptions, assume the name is correct + ret = name; + } + catch (NoSuchMethodException nsme) + { + // for reflection exceptions, assume the name is correct + ret = name; + } + catch (IllegalAccessException ia) + { + // for reflection exceptions, assume the name is correct + ret = name; + } + catch (InvocationTargetException ita) + { + // java.nio.charset.IllegalCharsetNameException + // and java.nio.charset.UnsupportedCharsetException + // return the default + ret = _default; + System.out.println ( + "unable to determine cannonical charset name for " + + name + + " - using " + + _default); + } + + return (ret); + } + // // Serialization support *************** *** 602,734 **** /** - * Get a CharacterSet name corresponding to a charset parameter. - * @param content A text line of the form: - * <pre> - * text/html; charset=Shift_JIS - * </pre> - * which is applicable both to the HTTP header field Content-Type and - * the meta tag http-equiv="Content-Type". - * Note this method also handles non-compliant quoted charset directives such as: - * <pre> - * text/html; charset="UTF-8" - * </pre> - * and - * <pre> - * text/html; charset='UTF-8' - * </pre> - * @return The character set name to use when reading the input stream. - * For JDKs that have the Charset class this is qualified by passing - * the name to findCharset() to render it into canonical form. - * If the charset parameter is not found in the given string, the default - * character set is returned. 
- * @see #findCharset - * @see #DEFAULT_CHARSET - */ - public String getCharset (String content) - { - final String CHARSET_STRING = "charset"; - int index; - String ret; - - ret = DEFAULT_CHARSET; - if (null != content) - { - index = content.indexOf (CHARSET_STRING); - - if (index != -1) - { - content = content.substring (index + CHARSET_STRING.length ()).trim (); - if (content.startsWith ("=")) - { - content = content.substring (1).trim (); - index = content.indexOf (";"); - if (index != -1) - content = content.substring (0, index); - - //remove any double quotes from around charset string - if (content.startsWith ("\"") && content.endsWith ("\"") && (1 < content.length ())) - content = content.substring (1, content.length () - 1); - - //remove any single quote from around charset string - if (content.startsWith ("'") && content.endsWith ("'") && (1 < content.length ())) - content = content.substring (1, content.length () - 1); - - ret = findCharset (content, ret); - - // Charset names are not case-sensitive; - // that is, case is always ignored when comparing charset names. - // if (!ret.equalsIgnoreCase (content)) - // { - // System.out.println ( - // "detected charset \"" - // + content - // + "\", using \"" - // + ret - // + "\""); - // } - } - } - } - - return (ret); - } - - /** - * Lookup a character set name. - * <em>Vacuous for JVM's without <code>java.nio.charset</code>.</em> - * This uses reflection so the code will still run under prior JDK's but - * in that case the default is always returned. - * @param name The name to look up. One of the aliases for a character set. - * @param _default The name to return if the lookup fails. 
- */ - public String findCharset (String name, String _default) - { - String ret; - - try - { - Class cls; - Method method; - Object object; - - cls = Class.forName ("java.nio.charset.Charset"); - method = cls.getMethod ("forName", new Class[] { String.class }); - object = method.invoke (null, new Object[] { name }); - method = cls.getMethod ("name", new Class[] { }); - object = method.invoke (object, new Object[] { }); - ret = (String)object; - } - catch (ClassNotFoundException cnfe) - { - // for reflection exceptions, assume the name is correct - ret = name; - } - catch (NoSuchMethodException nsme) - { - // for reflection exceptions, assume the name is correct - ret = name; - } - catch (IllegalAccessException ia) - { - // for reflection exceptions, assume the name is correct - ret = name; - } - catch (InvocationTargetException ita) - { - // java.nio.charset.IllegalCharsetNameException - // and java.nio.charset.UnsupportedCharsetException - // return the default - ret = _default; - System.out.println ( - "unable to determine cannonical charset name for " - + name - + " - using " - + _default); - } - - return (ret); - } - - /** * Get the current encoding being used. * @return The encoding used to convert characters. --- 731,734 ---- |
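Because getCharset() and findCharset() are now static, they can be called without a Page instance, as MetaTag.java does with `Page.getCharset (...)`. The parsing steps in the diff above can be sketched in standalone form as follows. This is a simplified illustration, not the library's code: it calls `java.nio.charset.Charset` directly instead of going through reflection (so it assumes a JDK that has the class), and the class name `CharsetSketch` and the `ISO-8859-1` default are assumptions taken from the test expectations elsewhere in this archive.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical standalone sketch, not the org.htmlparser.lexer.Page class.
public class CharsetSketch
{
    // Assumed default, matching the "must be the default" assertion in ParserTest.
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    // Render a charset alias into its canonical name, or fall back.
    static String findCharset (String name, String fallback)
    {
        try
        {
            return Charset.forName (name).name ();
        }
        catch (IllegalCharsetNameException | UnsupportedCharsetException e)
        {
            return fallback;
        }
    }

    // Extract the charset parameter from a Content-Type style string.
    static String getCharset (String content)
    {
        final String CHARSET_STRING = "charset";

        if (content == null)
            return DEFAULT_CHARSET;
        int index = content.indexOf (CHARSET_STRING);
        if (index == -1)
            return DEFAULT_CHARSET;
        String rest = content.substring (index + CHARSET_STRING.length ()).trim ();
        if (!rest.startsWith ("="))
            return DEFAULT_CHARSET;
        rest = rest.substring (1).trim ();
        index = rest.indexOf (";");
        if (index != -1)
            rest = rest.substring (0, index);
        // strip non-compliant double or single quotes around the name
        if (rest.length () > 1
            && ((rest.startsWith ("\"") && rest.endsWith ("\""))
                || (rest.startsWith ("'") && rest.endsWith ("'"))))
            rest = rest.substring (1, rest.length () - 1);
        return findCharset (rest, DEFAULT_CHARSET);
    }

    public static void main (String[] args)
    {
        System.out.println (getCharset ("text/html; charset=\"utf-8\""));
        System.out.println (getCharset ("text/html; charset=ISO-8859-1,utf-8"));
    }
}
```

A comma delimited value such as `ISO-8859-1,utf-8` is not a legal charset name, so in this sketch the lookup fails and the default is returned, which is consistent with the first assertion of the commented-out testCommaListCharset.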
From: Derrick O. <der...@us...> - 2004-09-02 02:28:54
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29769/src/org/htmlparser/tests/tagTests Modified Files: ImageTagTest.java ScriptTagTest.java JspTagTest.java TagTest.java LinkTagTest.java Log Message: Implemented: RFE #1017249 HTML Client Doesn't Support Cookies but will follow redirect RFE #1010586 Add support for password protected URL and RFE #1000739 Add support for proxy scenario A new http package is added, the primary class being Connectionmanager which handles proxies, passwords and cookies. Some testing still needed. Also removed some line separator cruft. Index: ImageTagTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/ImageTagTest.java,v retrieving revision 1.45 retrieving revision 1.46 diff -C2 -d -r1.45 -r1.46 *** ImageTagTest.java 31 Jul 2004 16:42:31 -0000 1.45 --- ImageTagTest.java 2 Sep 2004 02:28:14 -0000 1.46 *************** *** 28,32 **** import org.htmlparser.Node; - import org.htmlparser.Parser; import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.tags.ImageTag; --- 28,31 ---- *************** *** 272,276 **** { createParser("<IMG SRC=\"../abc/def/Hello \r\nWorld.jpg\">","http://www.yahoo.com/ghi"); - Parser.setLineSeparator("\r\n"); parseAndAssertNodeCount(1); assertTrue("Node identified should be HTMLImageTag",node[0] instanceof ImageTag); --- 271,274 ---- Index: LinkTagTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/LinkTagTest.java,v retrieving revision 1.51 retrieving revision 1.52 diff -C2 -d -r1.51 -r1.52 *** LinkTagTest.java 31 Jul 2004 16:42:31 -0000 1.51 --- LinkTagTest.java 2 Sep 2004 02:28:14 -0000 1.52 *************** *** 28,32 **** import org.htmlparser.Node; - import org.htmlparser.Parser; import 
org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.Tag; --- 28,31 ---- *************** *** 125,129 **** "href=" + link2 + ">��ï</a> <a\n"+ "href=" + link3 + ">�q�T��</a> ","http://www.cj.com"); - Parser.setLineSeparator("\r\n"); parseAndAssertNodeCount(6); assertTrue("Node should be a LinkTag",node[2] instanceof LinkTag); --- 124,127 ---- *************** *** 245,249 **** "<LI><font color=\"FF0000\" size=-1><b>Tech Samachar:</b></font>" + link2 + " by Rajesh Jain","http://www.cj.com/"); - Parser.setLineSeparator("\r\n"); parser.setNodeFactory (new PrototypicalNodeFactory (new LinkTag ())); parseAndAssertNodeCount(10); --- 243,246 ---- *************** *** 500,504 **** "href=\"http://ads.samachar.com/bin/redirect/tech.txt?http://www.samachar.com/tech\n"+ "nical.html\"> Journalism 3.0</a> by Rajesh Jain"); - Parser.setLineSeparator("\r\n"); parser.setNodeFactory (new PrototypicalNodeFactory (new LinkTag ())); parseAndAssertNodeCount(8); --- 497,500 ---- Index: JspTagTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/JspTagTest.java,v retrieving revision 1.45 retrieving revision 1.46 diff -C2 -d -r1.45 -r1.46 *** JspTagTest.java 2 Jul 2004 00:49:31 -0000 1.45 --- JspTagTest.java 2 Sep 2004 02:28:14 -0000 1.46 *************** *** 27,31 **** package org.htmlparser.tests.tagTests; - import org.htmlparser.Parser; import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.Tag; --- 27,30 ---- *************** *** 83,87 **** "<" + contents2 + ">\n<jsp:forward page=\"transferConfirm.jsp\"/><%\n"+ "%>"); - Parser.setLineSeparator("\r\n"); parser.setNodeFactory (new PrototypicalNodeFactory (new JspTag ())); parseAndAssertNodeCount(8); --- 82,85 ---- *************** *** 137,141 **** + "%><jsp:forward page=\"transferConfirm.jsp\"/><%\n"+ "%>\n"); - Parser.setLineSeparator("\r\n"); parser.setNodeFactory (new PrototypicalNodeFactory (new JspTag ())); 
parseAndAssertNodeCount(8); --- 135,138 ---- Index: ScriptTagTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/ScriptTagTest.java,v retrieving revision 1.45 retrieving revision 1.46 diff -C2 -d -r1.45 -r1.46 *** ScriptTagTest.java 31 Jul 2004 16:42:31 -0000 1.45 --- ScriptTagTest.java 2 Sep 2004 02:28:14 -0000 1.46 *************** *** 27,31 **** package org.htmlparser.tests.tagTests; - import org.htmlparser.Parser; import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.tags.ScriptTag; --- 27,30 ---- *************** *** 96,100 **** createParser(testHTML1); - Parser.setLineSeparator("\r\n"); parser.setNodeFactory (new PrototypicalNodeFactory (new ScriptTag ())); parseAndAssertNodeCount(3); --- 95,98 ---- Index: TagTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/TagTest.java,v retrieving revision 1.61 retrieving revision 1.62 diff -C2 -d -r1.61 -r1.62 *** TagTest.java 31 Jul 2004 16:42:31 -0000 1.61 --- TagTest.java 2 Sep 2004 02:28:14 -0000 1.62 *************** *** 30,34 **** import org.htmlparser.Node; - import org.htmlparser.Parser; import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.Tag; --- 30,33 ---- *************** *** 546,550 **** String testHTML = "<html><body>text\n<>text</body></html>"; createParser(testHTML); - Parser.setLineSeparator ("\r\n"); // actually a static method parseAndAssertNodeCount(1); assertTrue("Only node should be an HTML node",node[0] instanceof Html); --- 545,548 ---- *************** *** 566,570 **** String testHTML = "<html><body>text<\n>text</body></html>"; createParser(testHTML); - Parser.setLineSeparator ("\r\n"); // actually a static method parseAndAssertNodeCount(1); assertTrue("Only node should be an HTML node",node[0] instanceof Html); --- 564,567 ---- *************** *** 586,590 **** String 
testHTML = "<html><body>text<>\ntext</body></html>"; createParser(testHTML); - Parser.setLineSeparator ("\r\n"); // actually a static method parseAndAssertNodeCount(1); assertTrue("Only node should be an HTML node",node[0] instanceof Html); --- 583,586 ---- |
From: Derrick O. <der...@us...> - 2004-09-02 02:28:54
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29769/src/org/htmlparser/lexer

Modified Files:
    Page.java Lexer.java
Log Message:
Implemented:
RFE #1017249 HTML Client Doesn't Support Cookies but will follow redirect
RFE #1010586 Add support for password protected URL
and RFE #1000739 Add support for proxy scenario
A new http package is added, the primary class being Connectionmanager which
handles proxies, passwords and cookies. Some testing still needed.
Also removed some line separator cruft.

Index: Page.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Page.java,v
retrieving revision 1.43
retrieving revision 1.44
diff -C2 -d -r1.43 -r1.44
*** Page.java 25 Aug 2004 03:36:01 -0000 1.43
--- Page.java 2 Sep 2004 02:28:14 -0000 1.44
***************
*** 42,45 ****
--- 42,46 ----
  import java.util.zip.InflaterInputStream;

+ import org.htmlparser.http.ConnectionManager;
  import org.htmlparser.util.ParserException;

***************
*** 95,113 ****

      /**
!      * Messages for page not there (404).
       */
!     static private final String[] mFourOhFour =
!     {
!         "The web site you seek cannot be located, but countless more exist",
!         "You step in the stream, but the water has moved on. This page is not here.",
!         "Yesterday the page existed. Today it does not. The internet is like that.",
!         "That page was so big. It might have been very useful. But now it is gone.",
!         "Three things are certain: death, taxes and broken links. Guess which has occured.",
!         "Chaos reigns within. Reflect, repent and enter the correct URL. Order shall return.",
!         "Stay the patient course. Of little worth is your ire. The page is not found.",
!         "A non-existant URL reduces your expensive computer to a simple stone.",
!         "Many people have visited that page. Today, you are not one of the lucky ones.",
!         "Cutting the wind with a knife. Bookmarking a URL. Both are ephemeral.",
!     };

      /**
--- 96,102 ----

      /**
!      * Connection control (proxy, cookies, authorization).
       */
!     public static ConnectionManager mConnectionManager = new ConnectionManager ();

      /**
***************
*** 192,195 ****
--- 181,206 ----

      //
+     // static methods
+     //
+
+     /**
+      * Get the connection manager all Parsers use.
+      * @return The connection manager.
+      */
+     public static ConnectionManager getConnectionManager ()
+     {
+         return (mConnectionManager);
+     }
+
+     /**
+      * Set the connection manager to use.
+      * @return The connection manager.
+      */
+     public static void setConnectionManager (ConnectionManager manager)
+     {
+         mConnectionManager = manager;
+     }
+
+     //
      // Serialization support
      //
***************
*** 351,372 ****
              try
              {
-                 try
-                 {
-                     getConnection ().setRequestProperty ("Accept-Encoding", "gzip, deflate");
-                 }
-                 catch (IllegalStateException ise) // already connected
-                 {
-                     // assume all request properties have already been set
-                 }
                  getConnection ().connect ();
              }
              catch (UnknownHostException uhe)
              {
!                 int message = (int)(Math.random () * mFourOhFour.length);
!                 throw new ParserException (mFourOhFour[message], uhe);
              }
              catch (IOException ioe)
              {
!                 throw new ParserException (ioe.getMessage (), ioe);
              }
              type = getContentType ();
--- 362,374 ----
              try
              {
                  getConnection ().connect ();
              }
              catch (UnknownHostException uhe)
              {
!                 throw new ParserException ("Connect to " + mConnection.getURL ().toExternalForm () + " failed.", uhe);
              }
              catch (IOException ioe)
              {
!                 throw new ParserException ("Exception connecting to " + mConnection.getURL ().toExternalForm () + " (" + ioe.getMessage () + ").", ioe);
              }
              type = getContentType ();
***************
*** 409,413 ****
              catch (IOException ioe)
              {
!                 throw new ParserException (ioe.getMessage (), ioe);
              }
              mUrl = connection.getURL ().toExternalForm ();
--- 411,415 ----
              catch (IOException ioe)
              {
!                 throw new ParserException ("Exception getting input stream from " + mConnection.getURL ().toExternalForm () + " (" + ioe.getMessage () + ").", ioe);
              }
              mUrl = connection.getURL ().toExternalForm ();

Index: Lexer.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Lexer.java,v
retrieving revision 1.32
retrieving revision 1.33
diff -C2 -d -r1.32 -r1.33
*** Lexer.java 1 Aug 2004 02:16:04 -0000 1.32
--- Lexer.java 2 Sep 2004 02:28:14 -0000 1.33
***************
*** 30,34 ****
  import java.io.Serializable;
  import java.net.MalformedURLException;
- import java.net.URL;
  import java.net.URLConnection;
  import java.util.Vector;
--- 30,33 ----
***************
*** 39,42 ****
--- 38,42 ----
  import org.htmlparser.Text;
  import org.htmlparser.Tag;
+ import org.htmlparser.http.ConnectionManager;
  import org.htmlparser.nodes.RemarkNode;
  import org.htmlparser.nodes.TextNode;
***************
*** 1105,1109 ****
          ParserException
      {
-         URL url;
          Lexer lexer;
          Node node;
--- 1105,1108 ----
***************
*** 1113,1120 ****
          else
          {
-             url = new URL (args[0]);
              try
              {
!                 lexer = new Lexer (url.openConnection ());
                  while (null != (node = lexer.nextNode ()))
                      System.out.println (node.toString ());
--- 1112,1119 ----
          else
          {
              try
              {
!                 ConnectionManager manager = Page.getConnectionManager ();
!                 lexer = new Lexer (manager.openConnection (args[0]));
                  while (null != (node = lexer.nextNode ()))
                      System.out.println (node.toString ());
|
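The log message above says the new connection manager handles proxies, passwords and cookies, and the Page.java diff removes the inline `Accept-Encoding` request property. The ConnectionManager source is not shown in this message, so the following is only a rough sketch of the kind of request header preparation such a class might centralize, written against the plain JDK. The class `RequestHeaderSketch` and all of its method names are hypothetical, and proxy handling is left out.

```java
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the org.htmlparser.http.ConnectionManager class.
public class RequestHeaderSketch
{
    // Build an HTTP Basic "Authorization" header value.
    static String basicAuthorization (String user, String password)
    {
        byte[] raw = (user + ":" + password).getBytes (StandardCharsets.ISO_8859_1);
        return "Basic " + Base64.getEncoder ().encodeToString (raw);
    }

    // Join name/value pairs into a single "Cookie" header value.
    static String cookieHeader (Map<String, String> cookies)
    {
        StringBuilder ret = new StringBuilder ();
        for (Map.Entry<String, String> cookie : cookies.entrySet ())
        {
            if (ret.length () > 0)
                ret.append ("; ");
            ret.append (cookie.getKey ()).append ('=').append (cookie.getValue ());
        }
        return ret.toString ();
    }

    // Collect the headers to send; user and password may be null.
    static Map<String, String> requestHeaders (String user, String password, Map<String, String> cookies)
    {
        Map<String, String> headers = new LinkedHashMap<> ();
        headers.put ("Accept-Encoding", "gzip, deflate");
        if (user != null && password != null)
            headers.put ("Authorization", basicAuthorization (user, password));
        if (!cookies.isEmpty ())
            headers.put ("Cookie", cookieHeader (cookies));
        return headers;
    }

    // Apply the headers; this must happen before the connection is opened.
    static void apply (URLConnection connection, Map<String, String> headers)
    {
        for (Map.Entry<String, String> header : headers.entrySet ())
            connection.setRequestProperty (header.getKey (), header.getValue ());
    }

    public static void main (String[] args)
    {
        Map<String, String> cookies = new LinkedHashMap<> ();
        cookies.put ("session", "abc123");
        System.out.println (requestHeaders ("user", "pass", cookies));
    }
}
```

Keeping this logic in one place means callers such as the Lexer `main` method only need to ask the manager for a connection, which matches the `manager.openConnection (args[0])` call in the Lexer.java diff.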
From: Derrick O. <der...@us...> - 2004-09-02 02:28:53
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29769/src/org/htmlparser Modified Files: Parser.java Log Message: Implemented: RFE #1017249 HTML Client Doesn't Support Cookies but will follow redirect RFE #1010586 Add support for password protected URL and RFE #1000739 Add support for proxy scenario A new http package is added, the primary class being Connectionmanager which handles proxies, passwords and cookies. Some testing still needed. Also removed some line separator cruft. Index: Parser.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Parser.java,v retrieving revision 1.98 retrieving revision 1.99 diff -C2 -d -r1.98 -r1.99 *** Parser.java 29 Jul 2004 02:01:02 -0000 1.98 --- Parser.java 2 Sep 2004 02:28:08 -0000 1.99 *************** *** 27,42 **** package org.htmlparser; - import java.io.File; - import java.io.IOException; import java.io.Serializable; ! import java.net.MalformedURLException; ! import java.net.URL; import java.net.URLConnection; - import java.util.HashMap; - import java.util.Iterator; - import java.util.Map; import org.htmlparser.filters.TagNameFilter; import org.htmlparser.filters.NodeClassFilter; import org.htmlparser.lexer.Lexer; import org.htmlparser.lexer.Page; --- 27,38 ---- package org.htmlparser; import java.io.Serializable; ! import java.net.HttpURLConnection; import java.net.URLConnection; import org.htmlparser.filters.TagNameFilter; import org.htmlparser.filters.NodeClassFilter; + import org.htmlparser.http.ConnectionManager; + import org.htmlparser.http.ConnectionMonitor; import org.htmlparser.lexer.Lexer; import org.htmlparser.lexer.Page; *************** *** 59,67 **** * This is a thread-safe way, and you only get the control back after a * particular element is parsed and returned, which could be the entire body. ! * @see Parser#elements() */ public class Parser implements ! 
Serializable { // Please don't change the formatting of the version variables below. --- 55,64 ---- * This is a thread-safe way, and you only get the control back after a * particular element is parsed and returned, which could be the entire body. ! * @see Parser#elements() */ public class Parser implements ! Serializable, ! ConnectionMonitor { // Please don't change the formatting of the version variables below. *************** *** 99,112 **** /** - * Default Request header fields. - * So far this is just "User-Agent". - */ - protected static Map mDefaultRequestProperties = new HashMap (); - static - { - mDefaultRequestProperties.put ("User-Agent", "HTMLParser/" + VERSION_NUMBER); - } - - /** * Feedback object. */ --- 96,99 ---- *************** *** 119,130 **** /** - * Variable to store lineSeparator. - * This is setup to read <code>line.separator</code> from the System property. - * However it can also be changed using the mutator methods. - * This will be used in the toHTML() methods in all the sub-classes of Node. - */ - protected static String lineSeparator = System.getProperty("line.separator", "\n"); - - /** * A quiet message sink. * Use this for no feedback. --- 106,109 ---- *************** *** 143,154 **** /** - * @param lineSeparatorString New Line separator to be used - */ - public static void setLineSeparator(String lineSeparatorString) - { - lineSeparator = lineSeparatorString; - } - - /** * Return the version string of this parser. * @return A string of the form: --- 122,125 ---- *************** *** 173,237 **** /** ! * Get the current default request header properties. ! * A String-to-String map of header keys and values. ! * These fields are set by the parser when creating a connection. */ ! public static Map getDefaultRequestProperties () { ! return (mDefaultRequestProperties); } /** ! * Set the default request header properties. ! * A String-to-String map of header keys and values. ! * These fields are set by the parser when creating a connection. 
! * Some of these can be set directly on a <code>URLConnection</code>, ! * i.e. If-Modified-Since is set with setIfModifiedSince(long), ! * but since the parser transparently opens the connection on behalf ! * of the developer, these properties are not available before the ! * connection is fetched. Setting these request header fields affects all ! * subsequent connections opened by the parser. For more direct control ! * create a <code>URLConnection</code> and set it on the parser.<p> ! * From <a href="http://www.ietf.org/rfc/rfc2616.txt">RFC 2616 Hypertext Transfer Protocol -- HTTP/1.1</a>: ! * <pre> ! * 5.3 Request Header Fields ! * ! * The request-header fields allow the client to pass additional ! * information about the request, and about the client itself, to the ! * server. These fields act as request modifiers, with semantics ! * equivalent to the parameters on a programming language method ! * invocation. ! * ! * request-header = Accept ; Section 14.1 ! * | Accept-Charset ; Section 14.2 ! * | Accept-Encoding ; Section 14.3 ! * | Accept-Language ; Section 14.4 ! * | Authorization ; Section 14.8 ! * | Expect ; Section 14.20 ! * | From ; Section 14.22 ! * | Host ; Section 14.23 ! * | If-Match ; Section 14.24 ! * | If-Modified-Since ; Section 14.25 ! * | If-None-Match ; Section 14.26 ! * | If-Range ; Section 14.27 ! * | If-Unmodified-Since ; Section 14.28 ! * | Max-Forwards ; Section 14.31 ! * | Proxy-Authorization ; Section 14.34 ! * | Range ; Section 14.35 ! * | Referer ; Section 14.36 ! * | TE ; Section 14.39 ! * | User-Agent ; Section 14.43 ! * ! * Request-header field names can be extended reliably only in ! * combination with a change in the protocol version. However, new or ! * experimental header fields MAY be given the semantics of request- ! * header fields if all parties in the communication recognize them to ! * be request-header fields. Unrecognized header fields are treated as ! * entity-header fields. ! * </pre> */ ! 
public static void setDefaultRequestProperties (Map properties) { ! mDefaultRequestProperties = properties; } --- 144,181 ---- /** ! * Get the connection manager all Parsers use. ! * @return The connection manager. */ ! public static ConnectionManager getConnectionManager () { ! return (Page.getConnectionManager ()); } /** ! * Set the connection manager all Parsers use. ! * @return The connection manager. */ ! public static void setConnectionManager (ConnectionManager manager) { ! Page.setConnectionManager (manager); ! } ! ! /** ! * Creates the parser on an input string. ! * @param html The string containing HTML. ! * @param charset <em>Optional</em>. The character set encoding that will ! * be reported by {@link #getEncoding}. If charset is <code>null</code> ! * the default character set is used. ! * @return A parser with the <code>html</code> string as input. ! */ ! public static Parser createParser (String html, String charset) ! { ! Parser ret; ! ! if (null == html) ! throw new IllegalArgumentException ("html cannot be null"); ! ret = new Parser (new Lexer (new Page (html, charset))); ! ! return (ret); } *************** *** 271,275 **** * is provided. */ ! public Parser(Lexer lexer, ParserFeedback fb) { setFeedback (fb); --- 215,219 ---- * is provided. */ ! public Parser (Lexer lexer, ParserFeedback fb) { setFeedback (fb); *************** *** 303,309 **** * @see #Parser(URLConnection,ParserFeedback) */ ! public Parser(String resourceLocn, ParserFeedback feedback) throws ParserException { ! this (openConnection (resourceLocn, feedback), feedback); } --- 247,253 ---- * @see #Parser(URLConnection,ParserFeedback) */ ! public Parser (String resourceLocn, ParserFeedback feedback) throws ParserException { ! this (getConnectionManager ().openConnection (resourceLocn), feedback); } *************** *** 313,317 **** * @param resourceLocn Either the URL or the filename (autodetects). */ ! 
public Parser(String resourceLocn) throws ParserException { this (resourceLocn, stdout); --- 257,261 ---- * @param resourceLocn Either the URL or the filename (autodetects). */ ! public Parser (String resourceLocn) throws ParserException { this (resourceLocn, stdout); *************** *** 395,399 **** { if ((null != url) && !"".equals (url)) ! setConnection (openConnection (url, getFeedback ())); } --- 339,343 ---- { if ((null != url) && !"".equals (url)) ! setConnection (Page.getConnectionManager ().openConnection (url)); } *************** *** 573,748 **** } - /** - * Opens a connection using the given url. - * @param url The url to open. - * @param feedback The ibject to use for messages or <code>null</code>. - * @exception ParserException if an i/o exception occurs accessing the url. - */ - public static URLConnection openConnection (URL url, ParserFeedback feedback) - throws - ParserException - { - Map properties; - String key; - String value; - URLConnection ret; - - try - { - ret = url.openConnection (); - properties = getDefaultRequestProperties (); - if (null != properties) - for (Iterator iterator = properties.keySet ().iterator (); iterator.hasNext (); ) - { - key = (String)iterator.next (); - value = (String)properties.get (key); - ret.setRequestProperty (key, value); - } - } - catch (IOException ioe) - { - String msg = "HTMLParser.openConnection() : Error in opening a connection to " + url.toExternalForm (); - ParserException ex = new ParserException (msg, ioe); - if (null != feedback) - feedback.error (msg, ex); - throw ex; - } - - return (ret); - } - - /** - * Turn spaces into %20. - * @param url The url containing spaces. - * @return The URL with spaces as %20 sequences. 
- */ - public static String fixSpaces (String url) - { - int index; - int length; - char ch; - StringBuffer returnURL; - - index = url.indexOf (' '); - if (-1 != index) - { - length = url.length (); - returnURL = new StringBuffer (length * 3); - returnURL.append (url.substring (0, index)); - for (int i = index; i < length; i++) - { - ch = url.charAt (i); - if (ch==' ') - returnURL.append ("%20"); - else - returnURL.append (ch); - } - url = returnURL.toString (); - } - - return (url); - } - - /** - * Opens a connection based on a given string. - * The string is either a file, in which case <code>file://localhost</code> - * is prepended to a canonical path derived from the string, or a url that - * begins with one of the known protocol strings, i.e. <code>http://</code>. - * Embedded spaces are silently converted to %20 sequences. - * @param string The name of a file or a url. - * @param feedback The object to use for messages or <code>null</code> for no feedback. - * @exception ParserException if the string is not a valid url or file. 
- */ - public static URLConnection openConnection (String string, ParserFeedback feedback) - throws - ParserException - { - final String prefix = "file://localhost"; - String resource; - URL url; - StringBuffer buffer; - URLConnection ret; - - try - { - url = new URL (fixSpaces (string)); - ret = openConnection (url, feedback); - } - catch (MalformedURLException murle) - { // try it as a file - try - { - File file = new File (string); - resource = file.getCanonicalPath (); - buffer = new StringBuffer (prefix.length () + resource.length ()); - buffer.append (prefix); - if (!resource.startsWith ("/")) - buffer.append ("/"); - buffer.append (resource); - url = new URL (fixSpaces (buffer.toString ())); - ret = openConnection (url, feedback); - if (null != feedback) - feedback.info (url.toExternalForm ()); - } - catch (MalformedURLException murle2) - { - String msg = "HTMLParser.openConnection() : Error in opening a connection to " + string; - ParserException ex = new ParserException (msg, murle2); - if (null != feedback) - feedback.error (msg, ex); - throw ex; - } - catch (IOException ioe) - { - String msg = "HTMLParser.openConnection() : Error in opening a connection to " + string; - ParserException ex = new ParserException (msg, ioe); - if (null != feedback) - feedback.error (msg, ex); - throw ex; - } - } - - return (ret); - } - - /** - * The main program, which can be executed from the command line - */ - public static void main(String [] args) - { - System.out.println("HTMLParser v"+VERSION_STRING); - if (args.length<1 || args[0].equals("-help")) - { - System.out.println(); - System.out.println("Syntax : java -jar htmlparser.jar <resourceLocn/website> [node_type]"); - System.out.println(" <resourceLocn/website> the URL or file to be parsed"); - System.out.println(" node_type an optional node name, for example:"); - System.out.println(" A - Show only the link tags extracted from the document"); - System.out.println(" IMG - Show only the image tags extracted from the 
document"); - System.out.println(" TITLE - Extract the title from the document"); - System.out.println(); - System.out.println("Example : java -jar htmlparser.jar http://www.yahoo.com"); - System.out.println(); - System.out.println("For support, please join the HTMLParser mailing list (user/developer) from the HTML Parser home page..."); - System.out.println("HTML Parser home page : http://htmlparser.sourceforge.net"); - System.out.println(); - System.exit(-1); - } - try - { - Parser parser = new Parser (args[0]); - System.out.println ("Parsing " + parser.getURL ()); - NodeFilter filter; - if (1 < args.length) - filter = new TagNameFilter (args[1]); - else - filter = null; - parser.parse (filter); - } - catch (ParserException e) { - e.printStackTrace(); - } - } - public void visitAllNodesWith(NodeVisitor visitor) throws ParserException { Node node; --- 517,520 ---- *************** *** 798,825 **** } /** ! * Creates the parser on an input string. ! * @param html The string containing HTML. ! * @param charset <em>Optional</em>. The character set encoding that will ! * be reported by {@link #getEncoding}. If charset is <code>null</code> ! * the default character set is used. ! * @return A parser with the <code>html</code> string as input. */ ! public static Parser createParser (String html, String charset) ! { ! Parser ret; ! ! if (null == html) ! throw new IllegalArgumentException ("html cannot be null"); ! ret = new Parser (new Lexer (new Page (html, charset))); ! return (ret); } /** ! * @return String lineSeparator that will be used in toHTML() */ ! public static String getLineSeparator() { ! return lineSeparator; } } --- 570,652 ---- } + // + // ConnectionMonitor interface + // + /** ! * Called just prior to calling connect. ! * The connection has been conditioned with proxy, URL user/password, ! * and cookie information. It is still possible to adjust the ! * connection to alter the request method for example. ! 
* @param connection The connection which is about to be connected. ! * @exception This exception is thrown if the connection monitor ! * wants the ConnectionManager to bail out. */ ! public void preConnect (HttpURLConnection connection) ! throws ! ParserException ! { ! if (null != getFeedback ()) ! getFeedback ().info (ConnectionManager.getRequestHeader (connection)); ! } ! /** Called just after calling connect. ! * The response code and header fields can be examined. ! * @param connection The connection that was just connected. ! * @exception This exception is thrown if the connection monitor ! * wants the ConnectionManager to bail out. ! */ ! public void postConnect (HttpURLConnection connection) ! throws ! ParserException ! { ! if (null != getFeedback ()) ! getFeedback ().info (ConnectionManager.getResponseHeader (connection)); } /** ! * The main program, which can be executed from the command line */ ! public static void main (String [] args) ! { ! Parser parser; ! NodeFilter filter; ! ! if (args.length < 1 || args[0].equals ("-help")) ! { ! System.out.println ("HTML Parser v" + VERSION_STRING + "\n"); ! System.out.println (); ! System.out.println ("Syntax : java -jar htmlparser.jar <resourceLocn/website> [node_type]"); ! System.out.println (" <resourceLocn/website> the URL or file to be parsed"); ! System.out.println (" node_type an optional node name, for example:"); ! System.out.println (" A - Show only the link tags extracted from the document"); ! System.out.println (" IMG - Show only the image tags extracted from the document"); ! System.out.println (" TITLE - Extract the title from the document"); ! System.out.println (); ! System.out.println ("Example : java -jar htmlparser.jar http://www.yahoo.com"); ! System.out.println (); ! System.out.println ("For support, please join the HTMLParser mailing list (user/developer) from the HTML Parser home page..."); ! System.out.println ("HTML Parser home page : http://htmlparser.org"); ! System.out.println (); ! 
} ! else ! try ! { ! parser = new Parser (); ! if (1 < args.length) ! filter = new TagNameFilter (args[1]); ! else ! { // for a simple dump, use more verbose settings ! filter = null; ! parser.setFeedback (Parser.stdout); ! getConnectionManager ().setMonitor (parser); ! } ! parser.setURL (args[0]); ! parser.parse (filter); ! } ! catch (ParserException e) ! { ! e.printStackTrace (); ! } } } |
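For reference, the space-escaping behaviour of the `fixSpaces` helper that this commit removes from Parser.java can be sketched in isolation. This is a minimal standalone sketch based on the removed lines shown in the diff above, not the library's current code. The class name `FixSpacesSketch` is hypothetical, and `StringBuilder` is used here where the removed code used `StringBuffer`. Whether the new ConnectionManager performs the same escaping is not shown in this message.

```java
public class FixSpacesSketch {
    /**
     * Replace each space in a URL string with the sequence %20.
     * Mirrors the logic of the removed Parser.fixSpaces helper;
     * the class name is hypothetical and only for illustration.
     */
    public static String fixSpaces(String url) {
        int index = url.indexOf(' ');
        if (index == -1) {
            return url; // no spaces, nothing to escape
        }
        // worst case every character is a space, which triples the length
        StringBuilder out = new StringBuilder(url.length() * 3);
        out.append(url, 0, index); // copy the prefix before the first space
        for (int i = index; i < url.length(); i++) {
            char ch = url.charAt(i);
            if (ch == ' ') {
                out.append("%20");
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fixSpaces("http://example.com/my file.html"));
    }
}
```

Only literal space characters are handled; other characters that are not legal in a URL are passed through unchanged, as in the removed helper.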
From: Derrick O. <der...@us...> - 2004-09-02 02:28:52
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexerapplications/thumbelina In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29769/src/org/htmlparser/lexerapplications/thumbelina Modified Files: ThumbelinaFrame.java Log Message: Implemented: RFE #1017249 HTML Client Doesn't Support Cookies but will follow redirect RFE #1010586 Add support for password protected URL and RFE #1000739 Add support for proxy scenario A new http package is added, the primary class being Connectionmanager which handles proxies, passwords and cookies. Some testing still needed. Also removed some line separator cruft. Index: ThumbelinaFrame.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexerapplications/thumbelina/ThumbelinaFrame.java,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** ThumbelinaFrame.java 31 Jul 2004 16:42:30 -0000 1.3 --- ThumbelinaFrame.java 2 Sep 2004 02:28:14 -0000 1.4 *************** *** 929,933 **** if (null != query) { ! // replace spzces with + terms = query.replace (' ', '+'); buffer = new StringBuffer (1024); --- 929,933 ---- if (null != query) { ! // replace spaces with + terms = query.replace (' ', '+'); buffer = new StringBuffer (1024); *************** *** 944,948 **** if (USE_MOZILLA_HEADERS) { ! // Theses are the Mozilla header fields: //Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,text/css,*/*;q=0.1 //Accept-Language: en-us, en;q=0.50 --- 944,948 ---- if (USE_MOZILLA_HEADERS) { ! // These are the Mozilla header fields: //Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,text/css,*/*;q=0.1 //Accept-Language: en-us, en;q=0.50 |
From: Derrick O. <der...@us...> - 2004-09-02 02:28:46
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29769/src/org/htmlparser/tests/scannersTests Modified Files: ScriptScannerTest.java JspScannerTest.java Log Message: Implemented: RFE #1017249 HTML Client Doesn't Support Cookies but will follow redirect RFE #1010586 Add support for password protected URL and RFE #1000739 Add support for proxy scenario A new http package is added, the primary class being Connectionmanager which handles proxies, passwords and cookies. Some testing still needed. Also removed some line separator cruft. Index: JspScannerTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/JspScannerTest.java,v retrieving revision 1.37 retrieving revision 1.38 diff -C2 -d -r1.37 -r1.38 *** JspScannerTest.java 14 Jan 2004 02:53:47 -0000 1.37 --- JspScannerTest.java 2 Sep 2004 02:28:07 -0000 1.38 *************** *** 27,31 **** package org.htmlparser.tests.scannersTests; - import org.htmlparser.Parser; import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.tags.JspTag; --- 27,30 ---- *************** *** 85,89 **** "}\n" + "%>"); - Parser.setLineSeparator("\r\n"); parser.setNodeFactory (new PrototypicalNodeFactory (new JspTag ())); parseAndAssertNodeCount(1); --- 84,87 ---- Index: ScriptScannerTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/ScriptScannerTest.java,v retrieving revision 1.56 retrieving revision 1.57 diff -C2 -d -r1.56 -r1.57 *** ScriptScannerTest.java 18 Jul 2004 21:31:19 -0000 1.56 --- ScriptScannerTest.java 2 Sep 2004 02:28:07 -0000 1.57 *************** *** 113,117 **** createParser(testHTML1,"http://www.google.com/test/index.html"); - Parser.setLineSeparator("\r\n"); parseAndAssertNodeCount(1); assertTrue("Node should be a body tag", 
node[0] instanceof BodyTag); --- 113,116 ---- |
From: Derrick O. <der...@us...> - 2004-09-02 02:28:33
|
Update of /cvsroot/htmlparser/htmlparser In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29769 Modified Files: build.xml Log Message: Implemented: RFE #1017249 HTML Client Doesn't Support Cookies but will follow redirect RFE #1010586 Add support for password protected URL and RFE #1000739 Add support for proxy scenario A new http package is added, the primary class being Connectionmanager which handles proxies, passwords and cookies. Some testing still needed. Also removed some line separator cruft. Index: build.xml =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/build.xml,v retrieving revision 1.71 retrieving revision 1.72 diff -C2 -d -r1.71 -r1.72 *** build.xml 29 Jul 2004 03:02:19 -0000 1.71 --- build.xml 2 Sep 2004 02:28:16 -0000 1.72 *************** *** 236,239 **** --- 236,242 ---- <include name="org/htmlparser/Tag.java"/> <include name="org/htmlparser/Text.java"/> + <include name="org/htmlparser/http/ConnectionManager.java"/> + <include name="org/htmlparser/http/ConnectionMonitor.java"/> + <include name="org/htmlparser/http/Cookie.java"/> <include name="org/htmlparser/util/ParserException.java"/> <include name="org/htmlparser/util/ChainedException.java"/> *************** *** 272,275 **** --- 275,281 ---- <include name="org/htmlparser/scanners/Scanner.class"/> <include name="org/htmlparser/scanners/TagScanner.class"/> + <include name="org/htmlparser/http/ConnectionManager.class"/> + <include name="org/htmlparser/http/ConnectionMonitor.class"/> + <include name="org/htmlparser/http/Cookie.class"/> <include name="org/htmlparser/util/ParserException.class"/> <include name="org/htmlparser/util/ChainedException.class"/> *************** *** 420,423 **** --- 426,430 ---- <group title="Beans" packages="org.htmlparser.beans"/> <group title="Patterns" packages="org.htmlparser.visitors,org.htmlparser.nodeDecorators,org.htmlparser.filters"/> + <group title="Http" packages="org.htmlparser.http"/> <group 
title="Sax" packages="org.htmlparser.sax"/> <group title="Utility" packages="org.htmlparser.util,org.htmlparser.util.sort"/> |
From: Derrick O. <der...@us...> - 2004-09-02 02:28:33
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/parserHelperTests In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29769/src/org/htmlparser/tests/parserHelperTests Modified Files: StringParserTest.java RemarkNodeParserTest.java Log Message: Implemented: RFE #1017249 HTML Client Doesn't Support Cookies but will follow redirect RFE #1010586 Add support for password protected URL and RFE #1000739 Add support for proxy scenario A new http package is added, the primary class being Connectionmanager which handles proxies, passwords and cookies. Some testing still needed. Also removed some line separator cruft. Index: RemarkNodeParserTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/parserHelperTests/RemarkNodeParserTest.java,v retrieving revision 1.47 retrieving revision 1.48 diff -C2 -d -r1.47 -r1.48 *** RemarkNodeParserTest.java 18 Jul 2004 21:31:21 -0000 1.47 --- RemarkNodeParserTest.java 2 Sep 2004 02:28:16 -0000 1.48 *************** *** 27,31 **** package org.htmlparser.tests.parserHelperTests; - import org.htmlparser.Parser; import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.Remark; --- 27,30 ---- *************** *** 75,79 **** "</TEST>\n"); parser.setNodeFactory (new PrototypicalNodeFactory (true)); - Parser.setLineSeparator("\r\n"); parseAndAssertNodeCount(15); // The first node should be a Remark --- 74,77 ---- *************** *** 99,103 **** "</TEST>\n"); parser.setNodeFactory (new PrototypicalNodeFactory (true)); - Parser.setLineSeparator("\r\n"); parseAndAssertNodeCount(15); // The first node should be a Remark --- 97,100 ---- *************** *** 124,128 **** "</TEST>\n"); parser.setNodeFactory (new PrototypicalNodeFactory (true)); - Parser.setLineSeparator("\r\n"); parseAndAssertNodeCount(15); // The first node should be a Remark --- 121,124 ---- *************** *** 160,164 **** "-->"); parser.setNodeFactory (new 
PrototypicalNodeFactory (true)); - Parser.setLineSeparator("\r\n"); parseAndAssertNodeCount(1); assertTrue("Node should be a Remark",node[0] instanceof Remark); --- 156,159 ---- *************** *** 195,199 **** "bcd -->"); parser.setNodeFactory (new PrototypicalNodeFactory (true)); - Parser.setLineSeparator("\n"); parseAndAssertNodeCount(1); assertTrue("Node should be a Remark",node[0] instanceof Remark); --- 190,193 ---- *************** *** 217,221 **** "ssd -->"); parser.setNodeFactory (new PrototypicalNodeFactory (true)); - Parser.setLineSeparator("\n"); parseAndAssertNodeCount(1); assertTrue("Node should be a Tag but was "+node[0],node[0] instanceof Tag); --- 211,214 ---- *************** *** 225,229 **** "-\n"+ "ssd --",tag.getText()); - Parser.setLineSeparator("\r\n"); } --- 218,221 ---- Index: StringParserTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/parserHelperTests/StringParserTest.java,v retrieving revision 1.49 retrieving revision 1.50 diff -C2 -d -r1.49 -r1.50 *** StringParserTest.java 18 Jul 2004 21:31:21 -0000 1.49 --- StringParserTest.java 2 Sep 2004 02:28:15 -0000 1.50 *************** *** 27,31 **** package org.htmlparser.tests.parserHelperTests; - import org.htmlparser.Parser; import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.Remark; --- 27,30 ---- *************** *** 81,85 **** createParser("view these documents, you must have <A href='http://www.adobe.com'>Adobe \n"+ "Acrobat Reader</A> installed on your computer."); - Parser.setLineSeparator("\r\n"); parseAndAssertNodeCount(3); // The first node should be a Text- with the text - view these documents, you must have --- 80,83 ---- |
From: Derrick O. <der...@us...> - 2004-09-02 02:28:33
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/http In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29769/src/org/htmlparser/http Added Files: ConnectionMonitor.java Cookie.java package.html ConnectionManager.java Log Message: Implemented: RFE #1017249 HTML Client Doesn't Support Cookies but will follow redirect RFE #1010586 Add support for password protected URL and RFE #1000739 Add support for proxy scenario A new http package is added, the primary class being Connectionmanager which handles proxies, passwords and cookies. Some testing still needed. Also removed some line separator cruft. --- NEW FILE: package.html --- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> <html> <head> <!-- HTMLParser Library $Name: $ - A java-based parser for HTML http://sourceforge.org/projects/htmlparser Copyright (C) 2004 Derrick Oswald Revision Control Information $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/http/package.html,v $ $Author: derrickoswald $ $Date: 2004/09/02 02:28:15 $ $Revision: 1.1 $ This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA --> </head> <body> The http package is responsible for HTTP connections to servers. 
The Lexer and Parser provide many ways to supply text to be parsed, but this package only deals with cases where a URL is supplied as a string, with the expectation that the Lexer or Parser will perform the HTTP connection. <p>The {@link org.htmlparser.http.ConnectionManager} class adds <ul> <li>cookie</li> <li>proxy</li> <li>password protected URL</li> </ul> capabilities when accessing the internet via the <a href="http://www.ietf.org/rfc/rfc2616.txt">HTTP protocol</a>. Each of these capabilities requires conditioning the HTTP connection. <p>The {@link org.htmlparser.http.ConnectionMonitor} interface is a callback mechanism for the ConnectionManager to notify an interested application when an HTTP connection is made. Example uses may include conditioning the connection further, accessing HTTP header information, or providing reporting or statistical functions. Callbacks are not performed for FileURLConnections, which are also handled by the connection manager. <p>The {@link org.htmlparser.http.Cookie} class is a container for cookie data received and sent in HTTP requests and responses. It may be necessary to prime the ConnectionManager with cookies received via a login procedure in order to access protected HTML content. 
<p> A typical use of this package might look something like this:
<pre>
    ConnectionManager manager = Parser.getConnectionManager ();
    // set up proxying
    manager.setProxyHost ("proxyhost.mycompany.com");
    manager.setProxyPort (8888);
    manager.setProxyUser ("FredBarnes");
    manager.setProxyPassword ("secret");
    // set up cookies
    Cookie cookie = new Cookie ("USER", "FreddyBaby");
    manager.setCookie (cookie, "www.freshmeat.net");
    cookie = new Cookie ("PHPSESSID", "e5dbeb6152e70d99427f2458d8969f8b");
    cookie.setDomain (".freshmeat.net");
    manager.setCookie (cookie, null);
    // set up security to access a password protected URL
    manager.setUser ("FredB");
    manager.setPassword ("holy$cow");
    // set up an inner class for callbacks
    ConnectionMonitor monitor = new ConnectionMonitor ()
    {
        public void preConnect (HttpURLConnection connection)
        {
            System.out.println (ConnectionManager.getRequestHeader (connection));
        }
        public void postConnect (HttpURLConnection connection)
        {
            System.out.println (ConnectionManager.getResponseHeader (connection));
        }
    };
    manager.setMonitor (monitor);
    // perform the connection
    Parser parser = new Parser ("http://freshmeat.net");
</pre>
The ConnectionManager used by the Parser class is actually held by the Page class. It is accessible from either the Parser or the Page class via <code>getConnectionManager()</code>. It is a static (singleton) instance so that subsequent connections made by the parser will use the contents of the cookie jar from previous connections. By default, cookie processing is not enabled. It can be enabled by either setting a cookie or using <code>setCookieProcessingEnabled()</code>.
</body> </html> --- NEW FILE: Cookie.java --- // HTMLParser Library $Name: $ - A java-based parser for HTML // http://sourceforge.org/projects/htmlparser // Copyright (C) 2004 Derrick Oswald // // Revision Control Information // // $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/http/Cookie.java,v $ // $Author: derrickoswald $ // $Date: 2004/09/02 02:28:15 $ // $Revision: 1.1 $ // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. // // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA // package org.htmlparser.http; import java.io.Serializable; import java.util.Date; /** * A HTTP cookie. * This class represents a "Cookie", as used for session management with HTTP * and HTTPS protocols. Cookies are used to get user agents (web browsers etc) * to hold small amounts of state associated with a user's web browsing. Common * applications for cookies include storing user preferences, automating low * security user signon facilities, and helping collect data used for "shopping * cart" style applications. * <P> * Cookies are named, and have a single value. They may have optional * attributes, including a comment presented to the user, path and domain * qualifiers for which hosts see the cookie, a maximum age, and a version. 
* Current web browsers often have bugs in how they treat those attributes, so * interoperability can be improved by not relying on them heavily. * <P> * Cookies are assigned by servers, using fields added to HTTP response headers. * Cookies are passed back to those servers using fields added to HTTP request * headers. Several cookies with the same name can be returned; * they have different path attributes, but those attributes * will not be visible when using "old format" cookies. * <P> * Cookies affect the caching of the web pages used to set their values. At this * time, none of the sophisticated HTTP/1.1 cache control models are supported. * Standard HTTP/1.0 caches will not cache pages which contain * cookies created by this class. * <P> * Cookies are being standardized by the IETF. This class supports the original * Cookie specification (from Netscape Communications Corp.) as well as the * updated <a href="http://www.ietf.org/rfc/rfc2109.txt">RFC 2109</a> specification. */ public class Cookie implements Cloneable, Serializable { // // from RFC 2068, token special case characters // private static final String mSpecials = "()<>@,;:\\\"/[]?={} \t"; /** * The name of the cookie. */ protected String mName; /** * The cookie value. */ protected String mValue; // value of NAME /** * Describes the cookie's use. */ protected String mComment; /** * Domain that sees cookie. */ protected String mDomain; /** * Cookie expires after this date. */ protected Date mExpiry; /** * URLs that see the cookie. */ protected String mPath; /** * Use SSL. */ protected boolean mSecure; /** * If Version=1 it means RFC 2109++ style cookies. */ protected int mVersion; /** * Defines a cookie with an initial name/value pair. The name must be an * HTTP/1.1 "token" value; alphanumeric ASCII strings work. Names starting * with a "$" character are reserved by RFC 2109. * The path for the cookie is set to the root ("/") and there is no * expiry time set.
* @param name * name of the cookie * @param value * value of the cookie * @throws IllegalArgumentException * if the cookie name is not an HTTP/1.1 "token", or if it is * one of the tokens reserved for use by the cookie protocol */ public Cookie (String name, String value) { if (!isToken (name) || name.equalsIgnoreCase ("Comment") // rfc2019 || name.equalsIgnoreCase ("Discard") // 2019++ || name.equalsIgnoreCase ("Domain") || name.equalsIgnoreCase ("Expires") // (old cookies) || name.equalsIgnoreCase ("Max-Age") // rfc2019 || name.equalsIgnoreCase ("Path") || name.equalsIgnoreCase ("Secure") || name.equalsIgnoreCase ("Version")) throw new IllegalArgumentException ("invalid cookie name: " + name); mName = name; mValue = value; mComment = null; mDomain = null; mExpiry = null; // not persisted mPath = "/"; mSecure = false; mVersion = 0; } /** * If a user agent (web browser) presents this cookie to a user, the * cookie's purpose will be described using this comment. This is not * supported by version zero cookies. * * @see #getComment */ public void setComment (String purpose) { mComment = purpose; } /** * Returns the comment describing the purpose of this cookie, or null if no * such comment has been defined. * * @see #setComment */ public String getComment () { return (mComment); } /** * This cookie should be presented only to hosts satisfying this domain name * pattern. Read RFC 2109 for specific details of the syntax. Briefly, a * domain name name begins with a dot (".foo.com") and means that hosts in * that DNS zone ("www.foo.com", but not "a.b.foo.com") should see the * cookie. By default, cookies are only returned to the host which saved * them. * * @see #getDomain */ public void setDomain (String pattern) { mDomain = pattern.toLowerCase (); // IE allegedly needs this } /** * Returns the domain of this cookie. * * @see #setDomain */ public String getDomain () { return (mDomain); } /** * Sets the expiry date of the cookie. 
The cookie will expire after the * date specified. A null value indicates the default behaviour: * the cookie is not stored persistently, and will be deleted when the user * agent (web browser) exits. * * @see #getExpiryDate */ public void setExpiryDate (Date expiry) { mExpiry = expiry; } /** * Returns the expiry date of the cookie. If none was specified, * null is returned, indicating the default behaviour described * with <em>setExpiryDate</em>. * * @see #setExpiryDate */ public Date getExpiryDate () { return (mExpiry); } /** * This cookie should be presented only with requests beginning with this * URL. Read RFC 2109 for a specification of the default behaviour. * Basically, URLs in the same "directory" as the one which set the cookie, * and in subdirectories, can all see the cookie unless a different path is * set. * * @see #getPath */ public void setPath (String uri) { mPath = uri; } /** * Returns the prefix of all URLs for which this cookie is targetted. * * @see #setPath */ public String getPath () { return (mPath); } /** * Indicates to the user agent that the cookie should only be sent using a * secure protocol (https). This should only be set when the cookie's * originating server used a secure protocol to set the cookie's value. * * @see #getSecure */ public void setSecure (boolean flag) { mSecure = flag; } /** * Returns the value of the 'secure' flag. * * @see #setSecure */ public boolean getSecure () { return (mSecure); } /** * Returns the name of the cookie. This name may not be changed after the * cookie is created. */ public String getName () { return (mName); } /** * Sets the value of the cookie. BASE64 encoding is suggested for use with * binary values. * * <P> * With version zero cookies, you need to be careful about the kinds of * values you use. Values with various special characters (whitespace, * brackets and parentheses, the equals sign, comma, double quote, slashes, * question marks, the "at" sign, colon, and semicolon) should be avoided. 
* Empty values may not behave the same way on all browsers. * * @see #getValue */ public void setValue (String newValue) { mValue = newValue; } /** * Returns the value of the cookie. * * @see #setValue */ public String getValue () { return (mValue); } /** * Returns the version of the cookie. Version 1 complies with RFC 2109, * version 0 indicates the original version, as specified by Netscape. Newly * constructed cookies use version 0 by default, to maximize * interoperability. Cookies provided by a user agent will identify the * cookie version used by the browser. * * @see #setVersion */ public int getVersion () { return (mVersion); } /** * Sets the version of the cookie protocol used when this cookie saves * itself. Since the IETF standards are still being finalized, consider * version 1 as experimental; do not use it (yet) on production sites. * * @see #getVersion */ public void setVersion (int version) { mVersion = version; } /* * Return true iff the string counts as an HTTP/1.1 "token". */ private boolean isToken (String value) { int length; char c; boolean ret; ret = true; length = value.length (); for (int i = 0; i < length && ret; i++) { c = value.charAt (i); if (c < 0x20 || c >= 0x7f || mSpecials.indexOf (c) != -1) ret = false; } return (ret); } /** * Returns a copy of this object. */ public Object clone () { try { return (super.clone ()); } catch (CloneNotSupportedException e) { throw new RuntimeException (e.getMessage ()); } } /** * Convert this cookie into a user friendly string. * @return A short form string representing this cookie. 
*/ public String toString () { StringBuffer ret; ret = new StringBuffer (50); if (getSecure ()) ret.append ("secure "); if (0 != getVersion ()) { ret.append ("version "); ret.append (getVersion ()); ret.append (" "); } ret.append ("cookie"); if (null != getDomain ()) { ret.append (" for "); ret.append (getDomain ()); if (null != getPath ()) ret.append (getPath ()); } else { if (null != getPath ()) { ret.append (" (path "); ret.append (getPath ()); ret.append (")"); } } ret.append (": "); ret.append (getName ()); ret.append ("="); if (getValue ().length () > 40) { ret.append (getValue ().substring (1, 40)); ret.append ("..."); } else ret.append (getValue ()); if (null != getComment ()) { ret.append (" // "); ret.append (getComment ()); } return (ret.toString ()); } } --- NEW FILE: ConnectionMonitor.java --- // HTMLParser Library $Name: $ - A java-based parser for HTML // http://sourceforge.org/projects/htmlparser // Copyright (C) 2004 Derrick Oswald // // Revision Control Information // // $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/http/ConnectionMonitor.java,v $ // $Author: derrickoswald $ // $Date: 2004/09/02 02:28:15 $ // $Revision: 1.1 $ // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. 
// // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA // package org.htmlparser.http; import java.net.HttpURLConnection; import org.htmlparser.util.ParserException; /** * Interface for HTTP connection notification callbacks. */ public interface ConnectionMonitor { /** * Called just prior to calling connect. * The connection has been conditioned with proxy, URL user/password, * and cookie information. It is still possible to adjust the * connection, to alter the request method for example. * @param connection The connection which is about to be connected. * @exception This exception is thrown if the connection monitor * wants the ConnectionManager to bail out. */ void preConnect (HttpURLConnection connection) throws ParserException; /** Called just after calling connect. * The response code and header fields can be examined. * @param connection The connection that was just connected. * @exception This exception is thrown if the connection monitor * wants the ConnectionManager to bail out. */ void postConnect (HttpURLConnection connection) throws ParserException; } --- NEW FILE: ConnectionManager.java --- // HTMLParser Library $Name: $ - A java-based parser for HTML // http://sourceforge.org/projects/htmlparser // Copyright (C) 2004 Derrick Oswald // // Revision Control Information // // $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/http/ConnectionManager.java,v $ // $Author: derrickoswald $ // $Date: 2004/09/02 02:28:15 $ // $Revision: 1.1 $ // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. 
// // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU [...1110 lines suppressed...] saveCookies (cookies, connection); } } protected void saveCookies (Vector list, URLConnection connection) { Cookie cookie; String domain; for (int i = 0; i < list.size (); i++) { cookie = (Cookie)list.elementAt (i); domain = cookie.getDomain (); if (null == domain) domain = connection.getURL ().getHost (); setCookie (cookie, domain); } } } |
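The Cookie constructor in this commit rejects any name that is not an HTTP/1.1 "token" or that matches one of the attribute names reserved by the cookie protocol. A minimal standalone sketch of that check follows; the logic is taken from the isToken method and constructor shown above, but the class name CookieNameCheck and the helper isValidName are hypothetical and not part of the library.

```java
public class CookieNameCheck
{
    // Token special-case characters, as listed in Cookie.mSpecials.
    private static final String SPECIALS = "()<>@,;:\\\"/[]?={} \t";

    // Attribute names the Cookie constructor refuses as cookie names.
    private static final String[] RESERVED = {
        "Comment", "Discard", "Domain", "Expires",
        "Max-Age", "Path", "Secure", "Version"
    };

    /** True if no character is a control, non-ASCII, or special character. */
    public static boolean isToken (String value)
    {
        for (int i = 0; i < value.length (); i++)
        {
            char c = value.charAt (i);
            if (c < 0x20 || c >= 0x7f || SPECIALS.indexOf (c) != -1)
                return (false);
        }
        return (true);
    }

    /** Hypothetical helper: mirrors the condition in the Cookie constructor. */
    public static boolean isValidName (String name)
    {
        if (!isToken (name))
            return (false);
        for (int i = 0; i < RESERVED.length; i++)
            if (name.equalsIgnoreCase (RESERVED[i]))
                return (false);
        return (true);
    }

    public static void main (String[] args)
    {
        System.out.println (isValidName ("sessionid")); // true
        System.out.println (isValidName ("bad name"));  // false: contains a space
        System.out.println (isValidName ("path"));      // false: reserved, case-insensitive
    }
}
```

In the committed class, a name that fails this check causes the constructor to throw IllegalArgumentException instead of returning false.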
From: Derrick O. <der...@us...> - 2004-09-02 02:27:56
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/http
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29748/src/org/htmlparser/http

Log Message:
Directory /cvsroot/htmlparser/htmlparser/src/org/htmlparser/http added to the repository |
From: Alberto N. <an...@us...> - 2004-08-27 09:57:18
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27405 Modified Files: HTMLParserUtilsTest.java Log Message: New tests added for: bug fixing and trimAllTags method test. Index: HTMLParserUtilsTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests/HTMLParserUtilsTest.java,v retrieving revision 1.18 retrieving revision 1.19 diff -C2 -d -r1.18 -r1.19 *** HTMLParserUtilsTest.java 17 Jul 2004 13:45:03 -0000 1.18 --- HTMLParserUtilsTest.java 27 Aug 2004 09:56:56 -0000 1.19 *************** *** 167,170 **** --- 167,195 ---- ParserUtils.trimSpacesBeginEnd("<DIV> Trim all spaces but not the ones inside the string </DIV>", "<>DIV/") ); + assertStringEquals( + "modified text", + "0", + ParserUtils.trimSpacesBeginEnd("0", "") + ); + assertStringEquals( + "modified text", + "verifying the last char x", + ParserUtils.trimSpacesBeginEnd("verifying the last char x", "") + ); + assertStringEquals( + "modified text", + "verifying the last char x", + ParserUtils.trimSpacesBeginEnd("verifying the last char x ", "") + ); + assertStringEquals( + "modified text", + "x verifying the first char", + ParserUtils.trimSpacesBeginEnd("x verifying the first char", "") + ); + assertStringEquals( + "modified text", + "x verifying the first char", + ParserUtils.trimSpacesBeginEnd(" x verifying the first char", "") + ); } *************** *** 216,219 **** --- 241,280 ---- ParserUtils.trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, false, true) ); + // Test trimAllTags method + assertStringEquals( + "modified text", + " +12.5 ALL OK", + ParserUtils.trimAllTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", false) + ); + assertStringEquals( + "modified text", + " ALL OK", + ParserUtils.trimAllTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", true) + ); + assertStringEquals( + "modified text", + " +12.5 ", + 
ParserUtils.trimAllTags("<DIV><DIV> +12.5 </DIV></DIV>", false) + ); + assertStringEquals( + "modified text", + "", + ParserUtils.trimAllTags("<DIV><DIV> +12.5 </DIV></DIV>", true) + ); + assertStringEquals( + "modified text", + " YYY ", + ParserUtils.trimAllTags("<XXX> YYY <ZZZ>", false) + ); + assertStringEquals( + "modified text", + "YYY", + ParserUtils.trimAllTags("YYY", false) + ); + assertStringEquals( + "modified text", + "> OK <", + ParserUtils.trimAllTags("> OK <", true) + ); } catch (Exception e) *************** *** 274,277 **** --- 335,361 ---- ParserUtils.trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", filter, false, true) ); + NodeFilter filterTableRow = new TagNameFilter("TR"); + NodeFilter filterTableColumn = new TagNameFilter("TD"); + OrFilter filterOr = new OrFilter(filterTableRow, filterTableColumn); + assertStringEquals( + "modified text", + " ALL OK", + ParserUtils.trimTags("<TR><TD> +12.5 </TD></TR> ALL OK", filterOr) + ); + assertStringEquals( + "modified text", + "<TD> +12.5 </TD> ALL OK", + ParserUtils.trimTags("<TR><TD> +12.5 </TD></TR> ALL OK", filterOr, false, false) + ); + assertStringEquals( + "modified text", + " +12.5 ALL OK", + ParserUtils.trimTags("<TR><TD> +12.5 </TD></TR> ALL OK", filterOr, true, false) + ); + assertStringEquals( + "modified text", + " ALL OK", + ParserUtils.trimTags("<TR><TD> +12.5 </TD></TR> ALL OK", filterOr, false, true) + ); } catch (Exception e) *************** *** 332,335 **** --- 416,442 ---- ParserUtils.trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", filter, false, true) ); + NodeFilter filterTableRow = new NodeClassFilter(TableRow.class); + NodeFilter filterTableColumn = new NodeClassFilter(TableColumn.class); + OrFilter filterOr = new OrFilter(filterTableRow, filterTableColumn); + assertStringEquals( + "modified text", + " ALL OK", + ParserUtils.trimTags("<TR><TD> +12.5 </TD></TR> ALL OK", filterOr) + ); + assertStringEquals( + "modified text", + "<TD> +12.5 </TD> ALL OK", + 
ParserUtils.trimTags("<TR><TD> +12.5 </TD></TR> ALL OK", filterOr, false, false) + ); + assertStringEquals( + "modified text", + " +12.5 ALL OK", + ParserUtils.trimTags("<TR><TD> +12.5 </TD></TR> ALL OK", filterOr, true, false) + ); + assertStringEquals( + "modified text", + " ALL OK", + ParserUtils.trimTags("<TR><TD> +12.5 </TD></TR> ALL OK", filterOr, false, true) + ); } catch (Exception e) |
From: Alberto N. <an...@us...> - 2004-08-27 09:54:37
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv26947 Modified Files: ParserUtils.java Log Message: Bug fixing and trimAllTags method added. Index: ParserUtils.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util/ParserUtils.java,v retrieving revision 1.45 retrieving revision 1.46 diff -C2 -d -r1.45 -r1.46 *** ParserUtils.java 31 Jul 2004 16:42:34 -0000 1.45 --- ParserUtils.java 27 Aug 2004 09:54:27 -0000 1.46 *************** *** 93,97 **** /** * Split the input string considering as string separator ! * all the non numerical characters * with the only exception of the characters specified in charsDoNotBeRemoved param. * <BR>For example if you call splitButDigits("<DIV> +12.5, +3.4 </DIV>", "+."), --- 93,97 ---- /** * Split the input string considering as string separator ! * all the not numerical characters * with the only exception of the characters specified in charsDoNotBeRemoved param. * <BR>For example if you call splitButDigits("<DIV> +12.5, +3.4 </DIV>", "+."), *************** *** 154,158 **** /** ! * Remove from the input string all the non numerical characters * with the only exception of the characters specified in charsDoNotBeRemoved param. * <BR>For example if you call trimButDigits("<DIV> +12.5 </DIV>", "+."), --- 154,158 ---- /** ! * Remove from the input string all the not numerical characters * with the only exception of the characters specified in charsDoNotBeRemoved param. * <BR>For example if you call trimButDigits("<DIV> +12.5 </DIV>", "+."), *************** *** 185,189 **** /** ! * Remove from the input string all the non numerical characters * with the only exception of the characters specified in charsDoNotBeRemoved param. * <BR>The removal process removes only chars at the beginning and at the end of the string. --- 185,189 ---- /** ! 
* Remove from the beginning and the end of the input string all the not numerical characters * with the only exception of the characters specified in charsDoNotBeRemoved param. * <BR>The removal process removes only chars at the beginning and at the end of the string. *************** *** 191,195 **** * <BR>you obtain a string "+12.5" as output (1,2 and 5 are digits and +,. are chars that do not be removed). * <BR>For example if you call trimButDigitsBeginEnd("<DIV> +1 2 . 5 </DIV>", "+."), ! * <BR>you obtain a string "+1 2 . 5" as output (the spaces inside the string are not removed). * @param input - The string in input. * @param charsDoNotBeRemoved - The chars that do not be removed. --- 191,195 ---- * <BR>you obtain a string "+12.5" as output (1,2 and 5 are digits and +,. are chars that do not be removed). * <BR>For example if you call trimButDigitsBeginEnd("<DIV> +1 2 . 5 </DIV>", "+."), ! * <BR>you obtain a string "+1 2 . 5" as output (the spacess inside the string are not removed). * @param input - The string in input. * @param charsDoNotBeRemoved - The chars that do not be removed. *************** *** 238,242 **** /** * Split the input string considering as string separator ! * all the space and tabs like chars and * the chars specified in the input variable charsToBeRemoved. * <BR>For example if you call splitSpaces("<DIV> +12.5, +3.4 </DIV>", "<>DIV/,"), --- 238,242 ---- /** * Split the input string considering as string separator ! * all the spaces and tabs like chars and * the chars specified in the input variable charsToBeRemoved. * <BR>For example if you call splitSpaces("<DIV> +12.5, +3.4 </DIV>", "<>DIV/,"), *************** *** 299,308 **** /** ! * Remove from the input string all the space and tabs like chars. ! * <BR>Remove also the chars specified in the input variable charsToBeRemoved. 
* <BR>For example if you call trimSpaces("<DIV> +12.5 </DIV>", "<>DIV/"), * <BR>you obtain a string "+12.5" as output (space chars and <,>,D,I,V,/ are chars that must be removed). * <BR>For example if you call trimSpaces("<DIV> Trim All Spaces Also The Ones Inside The String </DIV>", "<>DIV/"), ! * <BR>you obtain a string "TrimAllSpacesAlsoTheOnesInsideTheString" as output (all the space inside the string are removed). * @param input The string in input. * @param charsToBeRemoved The chars to be removed. --- 299,308 ---- /** ! * Remove from the input string all the spaces and tabs like chars. ! * Remove also the chars specified in the input variable charsToBeRemoved. * <BR>For example if you call trimSpaces("<DIV> +12.5 </DIV>", "<>DIV/"), * <BR>you obtain a string "+12.5" as output (space chars and <,>,D,I,V,/ are chars that must be removed). * <BR>For example if you call trimSpaces("<DIV> Trim All Spaces Also The Ones Inside The String </DIV>", "<>DIV/"), ! * <BR>you obtain a string "TrimAllSpacesAlsoTheOnesInsideTheString" as output (all the spaces inside the string are removed). * @param input The string in input. * @param charsToBeRemoved The chars to be removed. *************** *** 330,340 **** /** ! * Remove from the input string all the space and tabs like chars. ! * <BR>Remove also the chars specified in the input variable charsToBeRemoved. * <BR>The removal process removes only chars at the beginning and at the end of the string. * <BR>For example if you call trimSpacesBeginEnd("<DIV> +12.5 </DIV>", "<>DIV/"), * <BR>you obtain a string "+12.5" as output (space chars and <,>,D,I,V,/ are chars that must be removed). * <BR>For example if you call trimSpacesBeginEnd("<DIV> Trim all spaces but not the ones inside the string </DIV>", "<>DIV/"), ! * <BR>you obtain a string "Trim all spaces but not the ones inside the string" as output (all the space inside the string are preserved). * @param input The string in input. 
* @param charsToBeRemoved The chars to be removed. --- 330,340 ---- /** ! * Remove from the beginning and the end of the input string all the spaces and tabs like chars. ! * Remove also the chars specified in the input variable charsToBeRemoved. * <BR>The removal process removes only chars at the beginning and at the end of the string. * <BR>For example if you call trimSpacesBeginEnd("<DIV> +12.5 </DIV>", "<>DIV/"), * <BR>you obtain a string "+12.5" as output (space chars and <,>,D,I,V,/ are chars that must be removed). * <BR>For example if you call trimSpacesBeginEnd("<DIV> Trim all spaces but not the ones inside the string </DIV>", "<>DIV/"), ! * <BR>you obtain a string "Trim all spaces but not the ones inside the string" as output (all the spaces inside the string are preserved). * @param input The string in input. * @param charsToBeRemoved The chars to be removed. *************** *** 369,373 **** if (charsToBeRemoved.charAt(charsCount)==input.charAt(index)) charFound=true; ! if (!( (Character.isWhitespace(input.charAt(index))) || (Character.isSpaceChar(input.charAt(index-1))) || (charFound) )) { end=index; --- 369,373 ---- if (charsToBeRemoved.charAt(charsCount)==input.charAt(index)) charFound=true; ! if (!( (Character.isWhitespace(input.charAt(index))) || (Character.isSpaceChar(input.charAt(index))) || (charFound) )) { end=index; *************** *** 475,479 **** /** ! * Remove from the input string all the characters * with the only exception of the characters specified in charsDoNotBeRemoved param. * <BR>The removal process removes only chars at the beginning and at the end of the string. --- 475,479 ---- /** ! * Remove from the beginning and the end of the input string all the characters * with the only exception of the characters specified in charsDoNotBeRemoved param. * <BR>The removal process removes only chars at the beginning and at the end of the string. 
*************** *** 592,596 **** * <BR>you obtain a string "+12.5" as output (<,>,D,I,V,/ and space char are chars that must be removed). * <BR>For example if you call trimChars("<DIV> Trim All Chars Also The Ones Inside The String </DIV>", "<>DIV/ "), ! * <BR>you obtain a string "TrimAllCharsAlsoTheOnesInsideTheString" as output (all the space inside the string are removed). * @param input The string in input. * @param charsToBeRemoved The chars to be removed. --- 592,596 ---- * <BR>you obtain a string "+12.5" as output (<,>,D,I,V,/ and space char are chars that must be removed). * <BR>For example if you call trimChars("<DIV> Trim All Chars Also The Ones Inside The String </DIV>", "<>DIV/ "), ! * <BR>you obtain a string "TrimAllCharsAlsoTheOnesInsideTheString" as output (all the spaces inside the string are removed). * @param input The string in input. * @param charsToBeRemoved The chars to be removed. *************** *** 618,627 **** /** ! * Remove from the input string all the chars specified in the input variable charsToBeRemoved. * <BR>The removal process removes only chars at the beginning and at the end of the string. * <BR>For example if you call trimCharsBeginEnd("<DIV> +12.5 </DIV>", "<>DIV/ "), * <BR>you obtain a string "+12.5" as output (' ' is a space char and <,>,D,I,V,/ are chars that must be removed). * <BR>For example if you call trimCharsBeginEnd("<DIV> Trim all spaces but not the ones inside the string </DIV>", "<>DIV/ "), ! * <BR>you obtain a string "Trim all spaces but not the ones inside the string" as output (all the space inside the string are preserved). * @param input The string in input. * @param charsToBeRemoved The chars to be removed. --- 618,627 ---- /** ! * Remove from the beginning and the end of the input string all the chars specified in the input variable charsToBeRemoved. * <BR>The removal process removes only chars at the beginning and at the end of the string. 
* <BR>For example if you call trimCharsBeginEnd("<DIV> +12.5 </DIV>", "<>DIV/ "), * <BR>you obtain a string "+12.5" as output (' ' is a space char and <,>,D,I,V,/ are chars that must be removed). * <BR>For example if you call trimCharsBeginEnd("<DIV> Trim all spaces but not the ones inside the string </DIV>", "<>DIV/ "), ! * <BR>you obtain a string "Trim all spaces but not the ones inside the string" as output (all the spaces inside the string are preserved). * @param input The string in input. * @param charsToBeRemoved The chars to be removed. *************** *** 900,905 **** /** ! * Trim the input string in a string array, ! * considering the tags as delimiter for splitting. * @see ParserUtils#trimTags (String input, String[] tags, boolean recursive, boolean insideTag). */ --- 900,945 ---- /** ! * Trim the input string, removing all the tags in the input string. ! * <BR>The method trims all the substrings included in the input string of the following type: ! * "<XXX>", where XXX could be a string of any type. ! * <BR>If you set to true the inside parameter, the method deletes also the YYY string in the following input string: ! * "<XXX>YYY<ZZZ>", note that ZZZ is not necessary the closing tag of XXX. ! * @param input The string in input. ! * @param inside If true, it forces the method to delete also what is inside the tags. ! * @return The string without tags. ! */ ! public static String trimAllTags (String input, boolean inside) ! { ! ! StringBuffer output = new StringBuffer(); ! ! if (inside) { ! if ((input.indexOf('<')==-1) || (input.lastIndexOf('>')==-1) || (input.lastIndexOf('>')<input.indexOf('<'))) { ! output.append(input); ! } else { ! output.append(input.substring(0, input.indexOf('<'))); ! output.append(input.substring(input.lastIndexOf('>')+1, input.length())); ! } ! } else { ! boolean write = true; ! for (int index=0; index<input.length(); index++) ! { ! if (input.charAt(index)=='<' && write) ! write = false; ! if (write) ! 
output.append(input.charAt(index)); ! if (input.charAt(index)=='>' && (!write)) ! write = true; ! } ! } ! ! return output.toString(); ! } ! ! ! /** ! * Trim all tags in the input string and ! * return a string like the input one ! * without the tags and their content. * @see ParserUtils#trimTags (String input, String[] tags, boolean recursive, boolean insideTag). */ *************** *** 988,993 **** /** ! * Trim the input string in a string array, ! * considering the tags as delimiter for splitting. * <BR>Use Class class as input parameter * instead of tags[] string array. --- 1028,1034 ---- /** ! * Trim all tags in the input string and ! * return a string like the input one ! * without the tags and their content. * <BR>Use Class class as input parameter * instead of tags[] string array. *************** *** 1001,1006 **** /** ! * Trim the input string in a string array, ! * considering the tags as delimiter for splitting. * <BR>Use Class class as input parameter * instead of tags[] string array. --- 1042,1048 ---- /** ! * Trim all tags in the input string and ! * return a string like the input one ! * without the tags and their content (optional). * <BR>Use Class class as input parameter * instead of tags[] string array. *************** *** 1014,1019 **** /** ! * Trim the input string in a string array, ! * considering the tags as delimiter for splitting. * <BR>Use NodeFilter class as input parameter * instead of tags[] string array. --- 1056,1062 ---- /** ! * Trim all tags in the input string and ! * return a string like the input one ! * without the tags and their content. * <BR>Use NodeFilter class as input parameter * instead of tags[] string array. *************** *** 1027,1032 **** /** ! * Trim the input string in a string array, ! * considering the tags as delimiter for splitting. * <BR>Use NodeFilter class as input parameter * instead of tags[] string array. --- 1070,1076 ---- /** ! * Trim all tags in the input string and ! 
* return a string like the input one ! * without the tags and their content (optional). * <BR>Use NodeFilter class as input parameter * instead of tags[] string array. |
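The two modes of trimAllTags added in this commit are easier to compare when run side by side. The sketch below copies the method body from the diff above into a hypothetical wrapper class, TrimAllTagsSketch, so that it runs without the library; only the local variable names first, last and c are new.

```java
public class TrimAllTagsSketch
{
    public static String trimAllTags (String input, boolean inside)
    {
        StringBuffer output = new StringBuffer ();
        int first = input.indexOf ('<');
        int last = input.lastIndexOf ('>');

        if (inside)
        {
            // Drop everything from the first '<' to the last '>',
            // unless there is no such well-ordered pair.
            if ((first == -1) || (last == -1) || (last < first))
                output.append (input);
            else
            {
                output.append (input.substring (0, first));
                output.append (input.substring (last + 1));
            }
        }
        else
        {
            // Copy characters, skipping each "<...>" run.
            boolean write = true;
            for (int index = 0; index < input.length (); index++)
            {
                char c = input.charAt (index);
                if (c == '<' && write)
                    write = false;
                if (write)
                    output.append (c);
                if (c == '>' && !write)
                    write = true;
            }
        }
        return (output.toString ());
    }

    public static void main (String[] args)
    {
        System.out.println ("[" + trimAllTags ("<XXX> YYY <ZZZ>", false) + "]"); // [ YYY ]
        System.out.println ("[" + trimAllTags ("<XXX> YYY <ZZZ>", true) + "]");  // []
        System.out.println ("[" + trimAllTags ("> OK <", true) + "]");           // [> OK <]
    }
}
```

With inside set to true the method does no tag matching at all: it cuts from the first '<' to the last '>', so text lying between two unrelated tags is removed as well.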
From: Derrick O. <der...@us...> - 2004-08-25 03:36:12
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv2978/src/org/htmlparser/tests

Modified Files:
ParserTest.java
Log Message:
Fix bug #1005409 Input file not free by parser.
Files larger than 16K on Windows can now be explicitly closed with Page.close(), or will be closed when the page is finalized.

Index: ParserTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/ParserTest.java,v
retrieving revision 1.61
retrieving revision 1.62
diff -C2 -d -r1.61 -r1.62
*** ParserTest.java 31 Jul 2004 16:42:33 -0000 1.61
--- ParserTest.java 25 Aug 2004 03:36:01 -0000 1.62
***************
*** 46,49 ****
--- 46,50 ----
  import org.htmlparser.filters.NodeClassFilter;
  import org.htmlparser.filters.TagNameFilter;
+ import org.htmlparser.lexer.InputStreamSource;
  import org.htmlparser.lexer.Lexer;
  import org.htmlparser.lexer.Page;
***************
*** 375,378 ****
--- 376,436 ----

      /**
+      * Tests deleting a file held open by the parser.
+      * See bug #1005409 Input file not free by parser
+      */
+     public void testFileDelete ()
+     {
+         String path;
+         File file;
+         PrintWriter out;
+         Parser parser;
+         NodeIterator enumeration;
+
+         path = System.getProperty ("user.dir");
+         if (!path.endsWith (File.separator))
+             path += File.separator;
+         file = new File (path + "delete_me.html");
+         try
+         {
+             out = new PrintWriter (new FileWriter (file));
+             out.println ("<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">");
+             out.println ("<html>");
+             out.println ("<head>");
+             out.println ("<title>test</title>");
+             out.println ("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\">");
+             out.println ("</head>");
+             out.println ("<body>");
+             out.println ("This is a test page ");
+             out.println ("</body>");
+             out.println ("</html>");
+             // fill our 16K buffer on read
+             for (int i = 0; i < InputStreamSource.BUFFER_SIZE; i++)
+                 out.println ();
+             out.close ();
+             parser = new Parser (file.getAbsolutePath (), new DefaultParserFeedback(DefaultParserFeedback.QUIET));
+             parser.setNodeFactory (new PrototypicalNodeFactory (true));
+             enumeration = parser.elements ();
+             enumeration.nextNode ();
+             if (-1 != System.getProperty ("os.name").indexOf("Windows"))
+                 // linux/unix lets you delete a file even when it's open
+                 assertTrue ("file deleted with more available", !file.delete ());
+             // parser.getLexer ().getPage ().close ();
+             parser = null;
+             enumeration = null;
+             System.gc ();
+             System.runFinalization ();
+             assertTrue ("file not deleted after destroy", file.delete ());
+         }
+         catch (Exception e)
+         {
+             fail (e.toString ());
+         }
+         finally
+         {
+             file.delete ();
+         }
+     }
+
+     /**
       * Test with a HTTP header with a valid charset parameter.
       * Here, ibm.co.jp is an example of a HTTP server that correctly sets the |
From: Derrick O. <der...@us...> - 2004-08-25 03:36:11
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv2978/src/org/htmlparser/lexer

Modified Files:
Page.java
Log Message:
Fix bug #1005409 Input file not free by parser.
Files larger than 16K on Windows can now be explicitly closed with Page.close(), or will be closed when the page is finalized.

Index: Page.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Page.java,v
retrieving revision 1.42
retrieving revision 1.43
diff -C2 -d -r1.42 -r1.43
*** Page.java 1 Aug 2004 02:16:04 -0000 1.42
--- Page.java 25 Aug 2004 03:36:01 -0000 1.43
***************
*** 302,305 ****
--- 302,323 ----

      /**
+      * Close the page by destroying the source of characters.
+      */
+     public void close () throws IOException
+     {
+         getSource ().destroy ();
+     }
+
+     /**
+      * Clean up this page, releasing resources.
+      * Calls <code>close()</code>.
+      * @exception Throwable if <code>close()</code> throws an <code>IOException</code>.
+      */
+     protected void finalize () throws Throwable
+     {
+         close ();
+     }
+
+     /**
       * Get the connection, if any.
       * @return The connection object for this page, or null if this page |
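The log message above describes two ways the input file gets released: an explicit close, or finalization. The explicit route can be sketched without the library as follows. PageLike is a hypothetical stand-in for a page that keeps its input open, not the real Page class, and the sketch deliberately leaves out the finalizer, so it only illustrates closing in a finally block before deleting the file.

```java
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

public class ExplicitCloseSketch
{
    /** Hypothetical stand-in for a page that holds its input open. */
    static class PageLike
    {
        private Reader mSource;

        PageLike (File file) throws IOException
        {
            mSource = new FileReader (file);
        }

        int read () throws IOException
        {
            return (mSource.read ());
        }

        /** Release the underlying file handle; safe to call twice. */
        void close () throws IOException
        {
            if (null != mSource)
            {
                mSource.close ();
                mSource = null;
            }
        }
    }

    /** Write a small file, read part of it, close it, then delete it. */
    public static boolean readThenDelete ()
    {
        try
        {
            File file = File.createTempFile ("delete_me", ".html");
            FileWriter out = new FileWriter (file);
            out.write ("<html><body>This is a test page</body></html>");
            out.close ();

            PageLike page = new PageLike (file);
            try
            {
                page.read (); // consume one character, leaving the file open
            }
            finally
            {
                page.close (); // explicit close instead of waiting for GC
            }
            return (file.delete ());
        }
        catch (IOException ioe)
        {
            throw new UncheckedIOException (ioe);
        }
    }

    public static void main (String[] args)
    {
        System.out.println ("deleted: " + readThenDelete ());
    }
}
```

Closing explicitly does not depend on when, or whether, the garbage collector runs, which is presumably why the commit exposes close() in addition to the finalizer.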
From: Derrick O. <der...@us...> - 2004-08-01 02:16:13
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv2072/src/org/htmlparser/lexer Modified Files: Lexer.java PageIndex.java Page.java Log Message: Speed optimizations based on profiling. Index: PageIndex.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/PageIndex.java,v retrieving revision 1.16 retrieving revision 1.17 diff -C2 -d -r1.16 -r1.17 *** PageIndex.java 2 Jan 2004 16:24:53 -0000 1.16 --- PageIndex.java 1 Aug 2004 02:16:04 -0000 1.17 *************** *** 45,51 **** { /** * Increment for allocations. */ ! protected static final int mIncrement = 100; /** --- 45,56 ---- { /** + * Starting increment for allocations. + */ + protected static final int mStartIncrement = 100; + + /** * Increment for allocations. */ ! protected int mIncrement; /** *************** *** 73,76 **** --- 78,82 ---- mIndices = new int[mIncrement]; mCount = 0; + mIncrement = mStartIncrement * 2; } *************** *** 136,148 **** { int position; int ret; - // find where it goes - ret = Sort.bsearch (this, cursor); - - // insert, but not twice position = cursor.getPosition (); ! if (!((ret < size ()) && (position == mIndices[ret]))) insertElementAt (position, ret); return (ret); --- 142,175 ---- { int position; + int last; int ret; position = cursor.getPosition (); ! if (0 == mCount) ! { ! 
ret = 0; insertElementAt (position, ret); + } + else + { + last = mIndices[mCount - 1]; + if (position == last) + ret = mCount - 1; + else + if (position > last) + { + ret = mCount; + insertElementAt (position, ret); + } + else + { + // find where it goes + ret = Sort.bsearch (this, cursor); + + // insert, but not twice + if (!((ret < size ()) && (position == mIndices[ret]))) + insertElementAt (position, ret); + } + } return (ret); *************** *** 304,307 **** --- 331,335 ---- { // allocate more space int new_values[] = new int[Math.max (capacity () + mIncrement, index + 1)]; + mIncrement *= 2; if (index < capacity ()) { Index: Page.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Page.java,v retrieving revision 1.41 retrieving revision 1.42 diff -C2 -d -r1.41 -r1.42 *** Page.java 31 Jul 2004 16:42:31 -0000 1.41 --- Page.java 1 Aug 2004 02:16:04 -0000 1.42 *************** *** 369,372 **** --- 369,373 ---- stream = new Stream (getConnection ().getInputStream ()); } + try { *************** *** 952,968 **** public String getText () { ! String ret; ! ! try ! { ! ret = mSource.getString (0, mSource.offset ()); ! } ! catch (IOException ioe) ! { ! throw new IllegalArgumentException ( ! "can't get all the previous characters - " + ioe.getMessage ()); ! } ! ! return (ret); } --- 953,957 ---- public String getText () { ! return (getText (0, mSource.offset ())); } Index: Lexer.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Lexer.java,v retrieving revision 1.31 retrieving revision 1.32 diff -C2 -d -r1.31 -r1.32 *** Lexer.java 31 Jul 2004 16:42:31 -0000 1.31 --- Lexer.java 1 Aug 2004 02:16:04 -0000 1.32 *************** *** 245,249 **** ParserException { ! Cursor probe; char ch; Node ret; --- 245,249 ---- ParserException { ! 
int start; char ch; Node ret; *************** *** 257,262 **** mDebugLineTrigger = lineno + 1; // trigger on subsequent lines too } ! probe = mCursor.dup (); ! ch = mPage.getCharacter (probe); switch (ch) { --- 257,262 ---- mDebugLineTrigger = lineno + 1; // trigger on subsequent lines too } ! start = mCursor.getPosition (); ! ch = mPage.getCharacter (mCursor); switch (ch) { *************** *** 265,299 **** break; case '<': ! ch = mPage.getCharacter (probe); if (0 == ch) ! ret = makeString (probe); else if ('%' == ch) { ! probe.retreat (); ! ret = parseJsp (probe); } else if ('/' == ch || '%' == ch || Character.isLetter (ch)) { ! probe.retreat (); ! ret = parseTag (probe); } else if ('!' == ch) { ! ch = mPage.getCharacter (probe); if (0 == ch) ! ret = makeString (probe); else { if ('>' == ch) // handle <!> ! ret = makeRemark (probe); else { ! probe.retreat (); // remark and tag need this character if ('-' == ch) ! ret = parseRemark (probe, quotesmart); else { ! probe.retreat (); // tag needs the previous one too ! ret = parseTag (probe); } } --- 265,299 ---- break; case '<': ! ch = mPage.getCharacter (mCursor); if (0 == ch) ! ret = makeString (start, mCursor.getPosition ()); else if ('%' == ch) { ! mCursor.retreat (); ! ret = parseJsp (start); } else if ('/' == ch || '%' == ch || Character.isLetter (ch)) { ! mCursor.retreat (); ! ret = parseTag (start); } else if ('!' == ch) { ! ch = mPage.getCharacter (mCursor); if (0 == ch) ! ret = makeString (start, mCursor.getPosition ()); else { if ('>' == ch) // handle <!> ! ret = makeRemark (start, mCursor.getPosition ()); else { ! mCursor.retreat (); // remark and tag need this character if ('-' == ch) ! ret = parseRemark (start, quotesmart); else { ! mCursor.retreat (); // tag needs the previous one too ! ret = parseTag (start); } } *************** *** 301,309 **** } else ! ret = parseString (probe, quotesmart); break; default: ! probe.retreat (); // string needs to see leading foreslash ! 
ret = parseString (probe, quotesmart); break; } --- 301,309 ---- } else ! ret = parseString (start, quotesmart); break; default: ! mCursor.retreat (); // string needs to see leading foreslash ! ret = parseString (start, quotesmart); break; } *************** *** 364,368 **** * @param quotesmart If <code>true</code>, strings ignore quoted contents. */ ! protected Node parseString (Cursor cursor, boolean quotesmart) throws ParserException --- 364,368 ---- * @param quotesmart If <code>true</code>, strings ignore quoted contents. */ ! protected Node parseString (int start, boolean quotesmart) throws ParserException *************** *** 376,402 **** while (!done) { ! ch = mPage.getCharacter (cursor); if (0 == ch) done = true; else if (0x1b == ch) // escape { ! ch = mPage.getCharacter (cursor); if (0 == ch) done = true; else if ('$' == ch) { ! ch = mPage.getCharacter (cursor); if (0 == ch) done = true; else if ('B' == ch) ! scanJIS (cursor); else { ! cursor.retreat (); ! cursor.retreat (); } } else ! cursor.retreat (); } else if (quotesmart && (0 == quote) && (('\'' == ch) || ('"' == ch))) --- 376,402 ---- while (!done) { ! ch = mPage.getCharacter (mCursor); if (0 == ch) done = true; else if (0x1b == ch) // escape { ! ch = mPage.getCharacter (mCursor); if (0 == ch) done = true; else if ('$' == ch) { ! ch = mPage.getCharacter (mCursor); if (0 == ch) done = true; else if ('B' == ch) ! scanJIS (mCursor); else { ! mCursor.retreat (); ! mCursor.retreat (); } } else ! mCursor.retreat (); } else if (quotesmart && (0 == quote) && (('\'' == ch) || ('"' == ch))) *************** *** 405,413 **** else if (quotesmart && (0 != quote) && ('\\' == ch)) { ! ch = mPage.getCharacter (cursor); //try to consume escaped character if ( (ch != '\\') // escaped backslash && (ch != quote)) // escaped quote character // ( reflects ["] or ['] whichever opened the quotation) ! cursor.retreat(); // unconsume char if character was not an escapable char. 
} else if (quotesmart && (ch == quote)) --- 405,413 ---- else if (quotesmart && (0 != quote) && ('\\' == ch)) { ! ch = mPage.getCharacter (mCursor); //try to consume escaped character if ( (ch != '\\') // escaped backslash && (ch != quote)) // escaped quote character // ( reflects ["] or ['] whichever opened the quotation) ! mCursor.retreat(); // unconsume char if character was not an escapable char. } else if (quotesmart && (ch == quote)) *************** *** 417,421 **** // handle multiline and double slash comments (with a quote) in script like: // I can't handle single quotations. ! ch = mPage.getCharacter (cursor); if (0 == ch) done = true; --- 417,421 ---- // handle multiline and double slash comments (with a quote) in script like: // I can't handle single quotations. ! ch = mPage.getCharacter (mCursor); if (0 == ch) done = true; *************** *** 423,427 **** { do ! ch = mPage.getCharacter (cursor); while ((ch != 0) && (ch != '\n')); } --- 423,427 ---- { do ! ch = mPage.getCharacter (mCursor); while ((ch != 0) && (ch != '\n')); } *************** *** 431,448 **** { do ! ch = mPage.getCharacter (cursor); while ((ch != 0) && (ch != '*')); ! ch = mPage.getCharacter (cursor); if (ch == '*') ! cursor.retreat (); } while ((ch != 0) && (ch != '/')); } else ! cursor.retreat (); } else if ((0 == quote) && ('<' == ch)) { ! ch = mPage.getCharacter (cursor); if (0 == ch) done = true; --- 431,448 ---- { do ! ch = mPage.getCharacter (mCursor); while ((ch != 0) && (ch != '*')); ! ch = mPage.getCharacter (mCursor); if (ch == '*') ! mCursor.retreat (); } while ((ch != 0) && (ch != '/')); } else ! mCursor.retreat (); } else if ((0 == quote) && ('<' == ch)) { ! ch = mPage.getCharacter (mCursor); if (0 == ch) done = true; *************** *** 451,466 **** { done = true; ! cursor.retreat (); ! cursor.retreat (); } else { // it's not a tag, so keep going, but check for quotes ! cursor.retreat (); } } } ! return (makeString (cursor)); } --- 451,466 ---- { done = true; ! 
mCursor.retreat (); ! mCursor.retreat (); } else { // it's not a tag, so keep going, but check for quotes ! mCursor.retreat (); } } } ! return (makeString (start, mCursor.getPosition ())); } *************** *** 468,487 **** * Create a string node based on the current cursor and the one provided. */ ! protected Node makeString (Cursor cursor) throws ParserException { int length; - int begin; - int end; Node ret; ! begin = mCursor.getPosition (); ! end = cursor.getPosition (); ! length = end - begin; if (0 != length) { // got some characters ! mCursor = cursor; ! ret = getNodeFactory ().createStringNode (this.getPage (), begin, end); } else --- 468,482 ---- * Create a string node based on the current cursor and the one provided. */ ! protected Node makeString (int start, int end) throws ParserException { int length; Node ret; ! length = end - start; if (0 != length) { // got some characters ! ret = getNodeFactory ().createStringNode (this.getPage (), start, end); } else *************** *** 583,587 **** * @return The parsed tag. */ ! protected Node parseTag (Cursor cursor) throws ParserException --- 578,582 ---- * @return The parsed tag. */ ! protected Node parseTag (int start) throws ParserException *************** *** 597,605 **** state = 0; bookmarks = new int[8]; ! bookmarks[0] = cursor.getPosition (); while (!done) { ! bookmarks[state + 1] = cursor.getPosition (); ! ch = mPage.getCharacter (cursor); switch (state) { --- 592,600 ---- state = 0; bookmarks = new int[8]; ! bookmarks[0] = mCursor.getPosition (); while (!done) { ! bookmarks[state + 1] = mCursor.getPosition (); ! ch = mPage.getCharacter (mCursor); switch (state) { *************** *** 610,615 **** { // don't consume the opening angle ! cursor.retreat (); ! bookmarks[state + 1] = cursor.getPosition (); } whitespace (attributes, bookmarks); --- 605,610 ---- { // don't consume the opening angle ! mCursor.retreat (); ! 
bookmarks[state + 1] = mCursor.getPosition (); } whitespace (attributes, bookmarks); *************** *** 628,633 **** { // don't consume the opening angle ! cursor.retreat (); ! bookmarks[state + 1] = cursor.getPosition (); } standalone (attributes, bookmarks); --- 623,628 ---- { // don't consume the opening angle ! mCursor.retreat (); ! bookmarks[state + 1] = mCursor.getPosition (); } standalone (attributes, bookmarks); *************** *** 718,722 **** standalone (attributes, bookmarks); bookmarks[0]=bookmarks[6]; ! cursor.retreat(); state=0; } --- 713,717 ---- standalone (attributes, bookmarks); bookmarks[0]=bookmarks[6]; ! mCursor.retreat(); state=0; } *************** *** 740,744 **** standalone (attributes, bookmarks); bookmarks[0]=bookmarks[6]; ! cursor.retreat(); state=0; } --- 735,739 ---- standalone (attributes, bookmarks); bookmarks[0]=bookmarks[6]; ! mCursor.retreat(); state=0; } *************** *** 749,753 **** } ! return (makeTag (cursor, attributes)); } --- 744,748 ---- } ! return (makeTag (start, mCursor.getPosition (), attributes)); } *************** *** 755,777 **** * Create a tag node based on the current cursor and the one provided. */ ! protected Node makeTag (Cursor cursor, Vector attributes) throws ParserException { int length; - int begin; - int end; Node ret; ! begin = mCursor.getPosition (); ! end = cursor.getPosition (); ! length = end - begin; if (0 != length) { // return tag based on second character, '/', '%', Letter (ch), '!' if (2 > length) // this is an error ! return (makeString (cursor)); ! mCursor = cursor; ! ret = getNodeFactory ().createTagNode (this.getPage (), begin, end, attributes); } else --- 750,767 ---- * Create a tag node based on the current cursor and the one provided. */ ! protected Node makeTag (int start, int end, Vector attributes) throws ParserException { int length; Node ret; ! length = end - start; if (0 != length) { // return tag based on second character, '/', '%', Letter (ch), '!' 
if (2 > length) // this is an error ! return (makeString (start, end)); ! ret = getNodeFactory ().createTagNode (this.getPage (), start, end, attributes); } else *************** *** 821,825 **** * @param quotesmart If <code>true</code>, strings ignore quoted contents. */ ! protected Node parseRemark (Cursor cursor, boolean quotesmart) throws ParserException --- 811,815 ---- * @param quotesmart If <code>true</code>, strings ignore quoted contents. */ ! protected Node parseRemark (int start, boolean quotesmart) throws ParserException *************** *** 833,837 **** while (!done) { ! ch = mPage.getCharacter (cursor); if (0 == ch) done = true; --- 823,827 ---- while (!done) { ! ch = mPage.getCharacter (mCursor); if (0 == ch) done = true; *************** *** 845,849 **** state = 1; else ! return (parseString (cursor, quotesmart)); break; case 1: // prior to the second open delimiter --- 835,839 ---- state = 1; else ! return (parseString (start, quotesmart)); break; case 1: // prior to the second open delimiter *************** *** 851,855 **** { // handle <!--> because netscape does ! ch = mPage.getCharacter (cursor); if (0 == ch) done = true; --- 841,845 ---- { // handle <!--> because netscape does ! ch = mPage.getCharacter (mCursor); if (0 == ch) done = true; *************** *** 858,867 **** else { ! cursor.retreat (); state = 2; } } else ! return (parseString (cursor, quotesmart)); break; case 2: // prior to the first closing delimiter --- 848,857 ---- else { ! mCursor.retreat (); state = 2; } } else ! return (parseString (start, quotesmart)); break; case 2: // prior to the first closing delimiter *************** *** 869,873 **** state = 3; else if (0 == ch) ! return (parseString (cursor, quotesmart)); // no terminator break; case 3: // prior to the second closing delimiter --- 859,863 ---- state = 3; else if (0 == ch) ! 
return (parseString (start, quotesmart)); // no terminator break; case 3: // prior to the second closing delimiter *************** *** 892,896 **** } ! return (makeRemark (cursor)); } --- 882,886 ---- } ! return (makeRemark (start, mCursor.getPosition ())); } *************** *** 898,920 **** * Create a remark node based on the current cursor and the one provided. */ ! protected Node makeRemark (Cursor cursor) throws ParserException { int length; - int begin; - int end; Node ret; ! begin = mCursor.getPosition (); ! end = cursor.getPosition (); ! length = end - begin; if (0 != length) { // return tag based on second character, '/', '%', Letter (ch), '!' if (2 > length) // this is an error ! return (makeString (cursor)); ! mCursor = cursor; ! ret = getNodeFactory ().createRemarkNode (this.getPage (), begin, end); } else --- 888,905 ---- * Create a remark node based on the current cursor and the one provided. */ ! protected Node makeRemark (int start, int end) throws ParserException { int length; Node ret; ! length = end - start; if (0 != length) { // return tag based on second character, '/', '%', Letter (ch), '!' if (2 > length) // this is an error ! return (makeString (start, end)); ! ret = getNodeFactory ().createRemarkNode (this.getPage (), start, end); } else *************** *** 930,934 **** * @param cursor The position at which to start scanning. */ ! protected Node parseJsp (Cursor cursor) throws ParserException --- 915,919 ---- * @param cursor The position at which to start scanning. */ ! protected Node parseJsp (int start) throws ParserException *************** *** 952,956 **** while (!done) { ! ch = mPage.getCharacter (cursor); switch (state) { --- 937,941 ---- while (!done) { ! ch = mPage.getCharacter (mCursor); switch (state) { *************** *** 977,987 **** case '=': // <%= case '@': // <%@ ! code = cursor.getPosition (); ! 
attributes.addElement (new PageAttribute (mPage, mCursor.getPosition () + 1, code, -1, -1, (char)0)); state = 2; break; default: // <%x ! code = cursor.getPosition () - 1; ! attributes.addElement (new PageAttribute (mPage, mCursor.getPosition () + 1, code, -1, -1, (char)0)); state = 2; break; --- 962,972 ---- case '=': // <%= case '@': // <%@ ! code = mCursor.getPosition (); ! attributes.addElement (new PageAttribute (mPage, start + 1, code, -1, -1, (char)0)); state = 2; break; default: // <%x ! code = mCursor.getPosition () - 1; ! attributes.addElement (new PageAttribute (mPage, start + 1, code, -1, -1, (char)0)); state = 2; break; *************** *** 1056,1060 **** if (0 != code) { ! state = cursor.getPosition () - 2; // reuse state attributes.addElement (new PageAttribute (mPage, code, state, -1, -1, (char)0)); attributes.addElement (new PageAttribute (mPage, state, state + 1, -1, -1, (char)0)); --- 1041,1045 ---- if (0 != code) { ! state = mCursor.getPosition () - 2; // reuse state attributes.addElement (new PageAttribute (mPage, code, state, -1, -1, (char)0)); attributes.addElement (new PageAttribute (mPage, state, state + 1, -1, -1, (char)0)); *************** *** 1064,1070 **** } else ! return (parseString (cursor, true)); // hmmm, true? ! return (makeTag (cursor, attributes)); } --- 1049,1055 ---- } else ! return (parseString (start, true)); // hmmm, true? ! return (makeTag (start, mCursor.getPosition (), attributes)); } |
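The PageIndex part of this commit makes two changes: positions that arrive in increasing order are appended without a binary search, and the allocation increment doubles each time the array grows. The sketch below shows both ideas in a self-contained class. It is not the PageIndex code: SortedPositions and its members are hypothetical names, and java.util.Arrays.binarySearch is swapped in for the library's Sort.bsearch.

```java
import java.util.Arrays;

/**
 * Illustration only: a sorted list of unique int positions with a fast
 * path for appends and a growth increment that doubles on each resize.
 */
public class SortedPositions
{
    /** Starting increment, mirroring mStartIncrement in the diff. */
    private static final int START_INCREMENT = 100;

    private int[] indices = new int[START_INCREMENT];
    private int increment = START_INCREMENT * 2;
    private int count = 0;

    /**
     * Add a position, keeping the list sorted and free of duplicates.
     * @return The index at which the position is stored.
     */
    public int add (int position)
    {
        int ret;

        if (0 == count)
            ret = 0;
        else
        {
            int last = indices[count - 1];
            if (position == last)
                return (count - 1); // already the final entry
            else if (position > last)
                ret = count; // common case: append, no search needed
            else
            {
                // stand-in for Sort.bsearch: search only the used part
                ret = Arrays.binarySearch (indices, 0, count, position);
                if (0 <= ret)
                    return (ret); // insert, but not twice
                ret = -(ret + 1); // convert to the insertion point
            }
        }
        insertAt (position, ret);

        return (ret);
    }

    private void insertAt (int position, int index)
    {
        if (count == indices.length)
        {
            // allocate more space, then grow the next step geometrically
            indices = Arrays.copyOf (indices, indices.length + increment);
            increment *= 2;
        }
        System.arraycopy (indices, index, indices, index + 1, count - index);
        indices[index] = position;
        count++;
    }

    public int size ()
    {
        return (count);
    }

    public int get (int index)
    {
        if (index < 0 || index >= count)
            throw new IndexOutOfBoundsException ("index " + index);
        return (indices[index]);
    }
}
```

Since a lexer mostly records line starts in the order it meets them, the append branch is likely the one taken most often, which would explain why it showed up in profiling; that motivation is an inference from the log message rather than something the commit states.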
From: Derrick O. <der...@us...> - 2004-07-31 16:43:23
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv18039/src/org/htmlparser/tests Modified Files: ParserTest.java FunctionalTests.java InstanceofPerformanceTest.java ParserTestCase.java PerformanceTest.java Log Message: Remove unused variables and other fixes exposed by turning on compiler warnings. Index: FunctionalTests.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/FunctionalTests.java,v retrieving revision 1.55 retrieving revision 1.56 diff -C2 -d -r1.55 -r1.56 *** FunctionalTests.java 25 Jan 2004 21:33:12 -0000 1.55 --- FunctionalTests.java 31 Jul 2004 16:42:33 -0000 1.56 *************** *** 86,90 **** Node node; for (NodeIterator e= parser.elements();e.hasMoreNodes();) { ! node = (Node)e.nextNode(); if (node instanceof ImageTag) { parserImgTagCount++; --- 86,90 ---- Node node; for (NodeIterator e= parser.elements();e.hasMoreNodes();) { ! node = e.nextNode(); if (node instanceof ImageTag) { parserImgTagCount++; Index: InstanceofPerformanceTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/InstanceofPerformanceTest.java,v retrieving revision 1.21 retrieving revision 1.22 diff -C2 -d -r1.21 -r1.22 *** InstanceofPerformanceTest.java 16 Jun 2004 02:17:26 -0000 1.21 --- InstanceofPerformanceTest.java 31 Jul 2004 16:42:33 -0000 1.22 *************** *** 76,80 **** for (long i=0;i<numTimes;i++) { for (Enumeration e = formChildren.elements();e.hasMoreElements();) { ! Node node = (Node)e.nextElement(); } } --- 76,80 ---- for (long i=0;i<numTimes;i++) { for (Enumeration e = formChildren.elements();e.hasMoreElements();) { ! e.nextElement(); } } *************** *** 88,92 **** for (long i=0;i<numTimes;i++) { for (SimpleNodeIterator e = formTag.children();e.hasMoreNodes();) { ! 
Node node = e.nextNode(); } } --- 88,92 ---- for (long i=0;i<numTimes;i++) { for (SimpleNodeIterator e = formTag.children();e.hasMoreNodes();) { ! e.nextNode(); } } Index: ParserTestCase.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/ParserTestCase.java,v retrieving revision 1.51 retrieving revision 1.52 diff -C2 -d -r1.51 -r1.52 *** ParserTestCase.java 17 Jul 2004 13:45:04 -0000 1.51 --- ParserTestCase.java 31 Jul 2004 16:42:33 -0000 1.52 *************** *** 358,362 **** if (a.isWhitespace ()) continue; - String actualValue = actualTag.getAttribute (a.getName ()); String expectedValue = expectedTag.getAttribute (a.getName ()); if (null == expectedValue) --- 358,361 ---- Index: PerformanceTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/PerformanceTest.java,v retrieving revision 1.47 retrieving revision 1.48 diff -C2 -d -r1.47 -r1.48 *** PerformanceTest.java 2 Jan 2004 16:24:55 -0000 1.47 --- PerformanceTest.java 31 Jul 2004 16:42:33 -0000 1.48 *************** *** 27,31 **** package org.htmlparser.tests; - import org.htmlparser.Node; import org.htmlparser.Parser; import org.htmlparser.util.DefaultParserFeedback; --- 27,30 ---- *************** *** 58,66 **** // Create the parser object parser = new Parser(file,new DefaultParserFeedback()); - Node node; long start=System.currentTimeMillis(); ! for (NodeIterator e = parser.elements();e.hasMoreNodes();) { ! node = e.nextNode(); ! } long elapsedTime=System.currentTimeMillis()-start; if (i!=0) --- 57,63 ---- // Create the parser object parser = new Parser(file,new DefaultParserFeedback()); long start=System.currentTimeMillis(); ! for (NodeIterator e = parser.elements();e.hasMoreNodes();) ! 
e.nextNode(); long elapsedTime=System.currentTimeMillis()-start; if (i!=0) *************** *** 86,94 **** // Create the parser object parser = new Parser(file,new DefaultParserFeedback()); - Node node; long start=System.currentTimeMillis(); ! for (NodeIterator e = parser.elements();e.hasMoreNodes();) { ! node = e.nextNode(); ! } long elapsedTime=System.currentTimeMillis()-start; if (i!=0) --- 83,89 ---- // Create the parser object parser = new Parser(file,new DefaultParserFeedback()); long start=System.currentTimeMillis(); ! for (NodeIterator e = parser.elements();e.hasMoreNodes();) ! e.nextNode(); long elapsedTime=System.currentTimeMillis()-start; if (i!=0) Index: ParserTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/ParserTest.java,v retrieving revision 1.60 retrieving revision 1.61 diff -C2 -d -r1.60 -r1.61 *** ParserTest.java 2 Jul 2004 00:49:29 -0000 1.60 --- ParserTest.java 31 Jul 2004 16:42:33 -0000 1.61 *************** *** 558,565 **** public void testNullUrl() { - Parser parser; try { ! parser = new Parser("http://none.existant.url.org", Parser.noFeedback); assertTrue("Should have thrown an exception!",false); } --- 558,564 ---- public void testNullUrl() { try { ! new Parser("http://none.existant.url.org", Parser.noFeedback); assertTrue("Should have thrown an exception!",false); } *************** *** 834,838 **** nodes = new Node[0]; } - int count = nodes.length; assertTrue ("node count", 3 == nodes.length); } --- 833,836 ---- |
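Several of the changes in this commit (testNullUrl here, and the Page constructor tests in lexerTests) drop a local variable that only existed to receive an object whose construction was expected to throw. A small sketch of that pattern follows, using java.math.BigDecimal in place of the Parser constructor so it runs without the library; ExpectedExceptionSketch and constructorRejects are hypothetical names.

```java
import java.math.BigDecimal;

/**
 * Illustration only: call a constructor purely for the exception it
 * is expected to throw, without keeping the result in a variable.
 */
public class ExpectedExceptionSketch
{
    /**
     * @return <code>true</code> if constructing a BigDecimal from the
     * text throws a NumberFormatException.
     */
    public static boolean constructorRejects (String text)
    {
        try
        {
            new BigDecimal (text); // result deliberately discarded
            return (false); // reaching here means nothing was thrown
        }
        catch (NumberFormatException nfe)
        {
            return (true);
        }
    }

    public static void main (String[] args)
    {
        System.out.println (constructorRejects ("not a number"));
        System.out.println (constructorRejects ("1.5"));
    }
}
```

Discarding the result keeps the intent visible (only the exception matters) and avoids the unused-variable warning the log message refers to.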
From: Derrick O. <der...@us...> - 2004-07-31 16:43:23
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv18039/src/org/htmlparser/tests/scannersTests Modified Files: CompositeTagScannerTest.java Log Message: Remove unused variables and other fixes exposed by turning on compiler warnings. Index: CompositeTagScannerTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/CompositeTagScannerTest.java,v retrieving revision 1.61 retrieving revision 1.62 diff -C2 -d -r1.61 -r1.62 *** CompositeTagScannerTest.java 2 Jul 2004 00:49:30 -0000 1.61 --- CompositeTagScannerTest.java 31 Jul 2004 16:42:32 -0000 1.62 *************** *** 48,71 **** } - private CompositeTagScanner scanner; - private String url; - public CompositeTagScannerTest(String name) { super(name); } - protected void setUp() { - scanner = - new CompositeTagScanner() { - String [] arr = { - "SOMETHING" - }; - public String[] getID() { - return arr; - } - - }; - } - private CustomTag parseCustomTag(int expectedNodeCount) throws ParserException { parser.setNodeFactory (new PrototypicalNodeFactory (new CustomTag ())); --- 48,55 ---- *************** *** 606,617 **** public static class CustomScanner extends CompositeTagScanner { private static final String MATCH_NAME [] = { "CUSTOM" }; - private boolean selfChildrenAllowed; public CustomScanner() { - this(true); - } - - public CustomScanner(boolean selfChildrenAllowed) { - // super("", selfChildrenAllowed ? 
new String[] {} : MATCH_NAME); - this.selfChildrenAllowed = selfChildrenAllowed; } --- 590,594 ---- *************** *** 623,635 **** public static class AnotherScanner extends CompositeTagScanner { private static final String MATCH_NAME [] = { "ANOTHER" }; - private boolean acceptCustomTagsButDontAcceptCustomEndTags; public AnotherScanner() { - // super("", new String[] {"CUSTOM"}); - acceptCustomTagsButDontAcceptCustomEndTags = false; - } - - public AnotherScanner(boolean acceptCustomTagsButDontAcceptCustomEndTags) { - // super("", new String[] {}, new String[] {"CUSTOM"}); - this.acceptCustomTagsButDontAcceptCustomEndTags = acceptCustomTagsButDontAcceptCustomEndTags; } --- 600,604 ---- *************** *** 656,660 **** * The default scanner for custom tags. */ ! protected final static CustomScanner mDefaultScanner = new CustomScanner (); public CustomTag () --- 625,629 ---- * The default scanner for custom tags. */ ! protected final static CustomScanner mCustomScanner = new CustomScanner (); public CustomTag () *************** *** 669,673 **** else mEnders = mIds; ! setThisScanner (mDefaultScanner); } --- 638,642 ---- else mEnders = mIds; ! setThisScanner (mCustomScanner); } *************** *** 713,717 **** * The default scanner for custom tags. */ ! protected final static AnotherScanner mDefaultScanner = new AnotherScanner (); public AnotherTag (boolean acceptCustomTagsButDontAcceptCustomEndTags) --- 682,686 ---- * The default scanner for custom tags. */ ! protected final static AnotherScanner mAnotherScanner = new AnotherScanner (); public AnotherTag (boolean acceptCustomTagsButDontAcceptCustomEndTags) *************** *** 727,731 **** mEndTagEnders = new String[] {"CUSTOM"}; } ! setThisScanner (mDefaultScanner); } --- 696,700 ---- mEndTagEnders = new String[] {"CUSTOM"}; } ! setThisScanner (mAnotherScanner); } |
From: Derrick O. <der...@us...> - 2004-07-31 16:43:22
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv18039/src/org/htmlparser/tests/lexerTests Modified Files: PageTests.java TagTests.java KitTest.java SourceTests.java Log Message: Remove unused variables and other fixes exposed by turning on compiler warnings. Index: PageTests.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests/PageTests.java,v retrieving revision 1.17 retrieving revision 1.18 diff -C2 -d -r1.17 -r1.18 *** PageTests.java 18 Mar 2004 04:04:08 -0000 1.17 --- PageTests.java 31 Jul 2004 16:42:31 -0000 1.18 *************** *** 78,86 **** public void testNull () throws ParserException { - Page page; - try { ! page = new Page ((URLConnection)null); assertTrue ("null value in constructor", false); } --- 78,84 ---- public void testNull () throws ParserException { try { ! new Page ((URLConnection)null); assertTrue ("null value in constructor", false); } *************** *** 92,96 **** try { ! page = new Page ((String)null); assertTrue ("null value in constructor", false); } --- 90,94 ---- try { ! new Page ((String)null); assertTrue ("null value in constructor", false); } *************** *** 108,116 **** String link; URL url; - Page page; link = "http://www.ibm.com/jp/"; url = new URL (link); ! page = new Page (url.openConnection ()); } --- 106,113 ---- String link; URL url; link = "http://www.ibm.com/jp/"; url = new URL (link); ! new Page (url.openConnection ()); } *************** *** 122,126 **** String link; URL url; - Page page; link = "http://www.bigbogosity.org/"; --- 119,122 ---- *************** *** 128,132 **** try { ! page = new Page (url.openConnection ()); } catch (ParserException pe) --- 124,128 ---- try { ! 
  new Page (url.openConnection ());
  }
  catch (ParserException pe)

Index: KitTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests/KitTest.java,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** KitTest.java 24 May 2004 16:18:31 -0000 1.7
--- KitTest.java 31 Jul 2004 16:42:31 -0000 1.8
***************
*** 27,35 ****
  import java.io.IOException;
- import java.io.Reader;
  import java.net.URL;
  import java.util.Vector;
  import javax.swing.text.BadLocationException;
- import javax.swing.text.Element;
  import javax.swing.text.MutableAttributeSet;
  import javax.swing.text.html.HTML;
--- 27,33 ----
***************
*** 91,95 ****
  {
      ch = s.charAt (i);
!     if (!Character.isWhitespace (ch) && !(160 == (int)ch))
          ret.append (ch);
  }
--- 89,93 ----
  {
      ch = s.charAt (i);
!     if (!Character.isWhitespace (ch) && !(160 == ch))
          ret.append (ch);
  }
***************
*** 130,134 ****
  for (int i = 0; i < data.length; i++)
  {
!     if (160 == (int)data[i])
          sb.append (" ");
      else
--- 128,132 ----
  for (int i = 0; i < data.length; i++)
  {
!     if (160 == data[i])
          sb.append (" ");
      else
***************
*** 256,260 ****
  public void handleStartTag (HTML.Tag t, MutableAttributeSet a, int pos)
  {
-     StringBuffer sb;
      String theirs;
      Node node;
--- 254,257 ----
***************
*** 321,325 ****
  public void handleEndTag (HTML.Tag t, int pos)
  {
-     StringBuffer sb;
      String theirs;
      Node node;
--- 318,321 ----
***************
*** 387,391 ****
  public void handleSimpleTag (HTML.Tag t, MutableAttributeSet a, int pos)
  {
-     StringBuffer sb;
      String theirs;
      Node node;
--- 383,386 ----
***************
*** 579,583 ****
  Parser parser;
- Element[] elements;
  if (0 == args.length)
--- 574,577 ----
***************
*** 597,601 ****
  kit = test.getKit ();
  parser = kit.getParser ();
! parser.parse ((Reader)lexer.getPage ().getSource (), (ParserCallback)test, true);
  }
  }
--- 591,595 ----
  kit = test.getKit ();
  parser = kit.getParser ();
! parser.parse (lexer.getPage ().getSource (), test, true);
  }
  }
***************
*** 605,608 ****
--- 599,605 ----
   *
   * $Log$
+  * Revision 1.8 2004/07/31 16:42:31 derrickoswald
+  * Remove unused variables and other fixes exposed by turning on compiler warnings.
+  *
   * Revision 1.7 2004/05/24 16:18:31 derrickoswald
   * Part three of a multiphase refactoring.

Index: SourceTests.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests/SourceTests.java,v
retrieving revision 1.17
retrieving revision 1.18
diff -C2 -d -r1.17 -r1.18
*** SourceTests.java 3 Jul 2004 13:56:08 -0000 1.17
--- SourceTests.java 31 Jul 2004 16:42:31 -0000 1.18
***************
*** 250,254 ****
  Source source;
  char[] buffer;
- int c;
  int length;
--- 250,253 ----
***************
*** 517,521 ****
  Source source;
  char[] buffer;
- int c;
  int length;
--- 516,519 ----

Index: TagTests.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests/TagTests.java,v
retrieving revision 1.11
retrieving revision 1.12
diff -C2 -d -r1.11 -r1.12
*** TagTests.java 2 Jul 2004 00:49:30 -0000 1.11
--- TagTests.java 31 Jul 2004 16:42:31 -0000 1.12
***************
*** 26,31 ****
  package org.htmlparser.tests.lexerTests;
- import java.util.HashMap;
- import java.util.Map;
  import org.htmlparser.Node;
--- 26,29 ----
***************
*** 79,83 ****
  "</head>" +
  "<body bgcolor=\"#FFFFFF\" text=\"#000000\" leftmargin=\"0\" topmargin=\"0\" marginwidth=\"0\" marginheight=\"0\" link=\"#003399\" vlink=\"#003399\" alink=\"#003399\">";
- private Map results;
  private int testProgress;
--- 77,80 ----
***************
*** 289,293 ****
  ParsingThread parsingThread [] = new ParsingThread[100];
- results = new HashMap();
  testProgress = 0;
  for (int i=0;i<parsingThread.length;i++) {
--- 286,289 ----
***************
*** 350,379 ****
  class ParsingThread implements Runnable {
!     Parser parser;
!     int id;
!     LinkTag link1, link2;
!     boolean result;
!     int max;
      ParsingThread(int id, String testHtml, int max) {
!         this.id = id;
!         this.max = max;
!         this.parser = Parser.createParser(testHtml, null);
      }
      public void run() {
          try {
!             result = false;
!             Node linkTag [] = parser.extractAllNodesThatAre(LinkTag.class);
!             link1 = (LinkTag)linkTag[0];
!             link2 = (LinkTag)linkTag[1];
!             if (id<max/2) {
!                 if (link1.getLink().equals("/cgi-bin/view_search?query_text=postdate>20020701&txt_clr=White&bg_clr=Red&url=http://localhost/Testing/Report1.html") &&
!                     link2.getLink().equals("http://normallink.com/sometext.html"))
!                     result = true;
              } else {
!                 if (link1.getLink().equals("http://normallink.com/sometext.html") &&
!                     link2.getLink().equals("http://normallink.com/sometext.html"))
!                     result = true;
              }
          }
--- 346,376 ----
  class ParsingThread implements Runnable {
!     Parser mParser;
!     int mId;
!     LinkTag mLink1;
!     LinkTag mLink2;
!     boolean mResult;
!     int mMax;
      ParsingThread(int id, String testHtml, int max) {
!         mId = id;
!         mMax = max;
!         mParser = Parser.createParser(testHtml, null);
      }
      public void run() {
          try {
!             mResult = false;
!             Node linkTag [] = mParser.extractAllNodesThatAre(LinkTag.class);
!             mLink1 = (LinkTag)linkTag[0];
!             mLink2 = (LinkTag)linkTag[1];
!             if (mId < mMax / 2) {
!                 if (mLink1.getLink().equals("/cgi-bin/view_search?query_text=postdate>20020701&txt_clr=White&bg_clr=Red&url=http://localhost/Testing/Report1.html") &&
!                     mLink2.getLink().equals("http://normallink.com/sometext.html"))
!                     mResult = true;
              } else {
!                 if (mLink1.getLink().equals("http://normallink.com/sometext.html") &&
!                     mLink2.getLink().equals("http://normallink.com/sometext.html"))
!                     mResult = true;
              }
          }
***************
*** 383,400 ****
          }
          finally {
!             testProgress += id;
          }
      }
      public LinkTag getLink1() {
!         return link1;
      }
      public LinkTag getLink2() {
!         return link2;
      }
      public boolean passed() {
!         return result;
      }
  }
--- 380,397 ----
          }
          finally {
!             testProgress += mId;
          }
      }
      public LinkTag getLink1() {
!         return (mLink1);
      }
      public LinkTag getLink2() {
!         return (mLink2);
      }
      public boolean passed() {
!         return (mResult);
      }
  }
|
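A note on the KitTest.java hunks above: dropping the (int) cast in "160 == (int)ch" does not change behaviour, because Java promotes a char operand to int when it is compared with an int literal. The explicit test for 160 (U+00A0, the no-break space) is still needed, since Character.isWhitespace is documented to return false for non-breaking spaces. A minimal self-contained sketch follows; the names NbspFilter and stripWhitespace are made up for illustration and are not part of htmlparser.

```java
// Illustrative sketch only: NbspFilter and stripWhitespace are hypothetical
// names, not htmlparser code.
final class NbspFilter
{
    /** Returns s with whitespace and no-break spaces (U+00A0, decimal 160) removed. */
    static String stripWhitespace (String s)
    {
        StringBuilder ret = new StringBuilder (s.length ());
        for (int i = 0; i < s.length (); i++)
        {
            char ch = s.charAt (i);
            // ch is promoted to int for the comparison, so no (int) cast is needed.
            if (!Character.isWhitespace (ch) && !(160 == ch))
                ret.append (ch);
        }
        return ret.toString ();
    }

    public static void main (String[] args)
    {
        // The input holds a space, a no-break space and a tab between the letters.
        System.out.println (stripWhitespace ("a \u00A0b\tc"));
    }
}
```

With compiler warnings enabled, the cast form is typically reported as an unnecessary cast, which is presumably what prompted the change.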
From: Derrick O. <der...@us...> - 2004-07-31 16:43:22
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv18039/src/org/htmlparser/tests/tagTests

Modified Files:
    ScriptTagTest.java LinkTagTest.java TagTest.java ImageTagTest.java TableTagTest.java ObjectCollectionTest.java FormTagTest.java
Log Message:
Remove unused variables and other fixes exposed by turning on compiler warnings.

Index: FormTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/FormTagTest.java,v
retrieving revision 1.45
retrieving revision 1.46
diff -C2 -d -r1.45 -r1.46
*** FormTagTest.java 2 Jul 2004 00:49:31 -0000 1.45
--- FormTagTest.java 31 Jul 2004 16:42:31 -0000 1.46
***************
*** 244,248 ****
  int i = 0;
  for (NodeIterator e=formTag.children();e.hasMoreNodes();) {
!     Node formNode = (Node)e.nextNode();
      if (formNode instanceof Remark) {
          remarkNode[i++] = (Remark)formNode;
--- 244,248 ----
  int i = 0;
  for (NodeIterator e=formTag.children();e.hasMoreNodes();) {
!     Node formNode = e.nextNode();
      if (formNode instanceof Remark) {
          remarkNode[i++] = (Remark)formNode;

Index: LinkTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/LinkTagTest.java,v
retrieving revision 1.50
retrieving revision 1.51
diff -C2 -d -r1.50 -r1.51
*** LinkTagTest.java 22 Jul 2004 02:22:31 -0000 1.50
--- LinkTagTest.java 31 Jul 2004 16:42:31 -0000 1.51
***************
*** 559,563 ****
  for (SimpleNodeIterator e = linkTag.children();e.hasMoreNodes();)
  {
!     dataNode[i++] = (Node)e.nextNode();
  }
  assertEquals("Number of data nodes",new Integer(2),new Integer(i));
--- 559,563 ----
  for (SimpleNodeIterator e = linkTag.children();e.hasMoreNodes();)
  {
!     dataNode[i++] = e.nextNode();
  }
  assertEquals("Number of data nodes",new Integer(2),new Integer(i));
***************
*** 778,782 ****
  int j =0 ;
  for (SimpleNodeIterator e = linkTag.children();e.hasMoreNodes();) {
!     insideNodes[j++]= (Node)e.nextNode();
  }
  assertEquals("Number of contained internal nodes",1,j);
--- 778,782 ----
  int j =0 ;
  for (SimpleNodeIterator e = linkTag.children();e.hasMoreNodes();) {
!     insideNodes[j++]= e.nextNode();
  }
  assertEquals("Number of contained internal nodes",1,j);

Index: ObjectCollectionTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/ObjectCollectionTest.java,v
retrieving revision 1.21
retrieving revision 1.22
diff -C2 -d -r1.21 -r1.22
*** ObjectCollectionTest.java 2 Jul 2004 00:49:31 -0000 1.21
--- ObjectCollectionTest.java 31 Jul 2004 16:42:31 -0000 1.22
***************
*** 34,38 ****
  import org.htmlparser.tags.TableTag;
  import org.htmlparser.tests.ParserTestCase;
- import org.htmlparser.util.NodeList;
  import org.htmlparser.util.ParserException;
  import org.htmlparser.util.ParserUtils;
--- 34,37 ----
***************
*** 113,117 ****
  parseAndAssertNodeCount(1);
  TableTag tableTag = (TableTag)node[0];
- NodeList nodeList = new NodeList();
  Node[] spans = ParserUtils.findTypeInNode (tableTag, Span.class);
  assertSpanContent(spans);
--- 112,115 ----

Index: ScriptTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/ScriptTagTest.java,v
retrieving revision 1.44
retrieving revision 1.45
diff -C2 -d -r1.44 -r1.45
*** ScriptTagTest.java 18 Jul 2004 21:31:21 -0000 1.44
--- ScriptTagTest.java 31 Jul 2004 16:42:31 -0000 1.45
***************
*** 29,33 ****
  import org.htmlparser.Parser;
  import org.htmlparser.PrototypicalNodeFactory;
- import org.htmlparser.scanners.ScriptScanner;
  import org.htmlparser.tags.ScriptTag;
  import org.htmlparser.tests.ParserTestCase;
--- 29,32 ----
***************
*** 41,46 ****
  }
- private ScriptScanner scriptScanner;
-
  public ScriptTagTest(String name)
  {
--- 40,43 ----
***************
*** 48,57 ****
  }
- protected void setUp() throws Exception
- {
-     super.setUp();
-     scriptScanner = new ScriptScanner();
- }
-
  public void testCreation() throws ParserException
  {
--- 45,48 ----

Index: TableTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/TableTagTest.java,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** TableTagTest.java 7 Dec 2003 23:41:43 -0000 1.1
--- TableTagTest.java 31 Jul 2004 16:42:31 -0000 1.2
***************
*** 180,186 ****
          "http://www.sec.gov/Archives/edgar/data/30554/000089322002000287/w57038e10-k.htm"
      );
-     Node node;
      for (NodeIterator e = parser.elements(); e.hasMoreNodes(); )
!         node = e.nextNode();
      }
  }
--- 180,185 ----
          "http://www.sec.gov/Archives/edgar/data/30554/000089322002000287/w57038e10-k.htm"
      );
      for (NodeIterator e = parser.elements(); e.hasMoreNodes(); )
!         e.nextNode();
      }
  }

Index: ImageTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/ImageTagTest.java,v
retrieving revision 1.44
retrieving revision 1.45
diff -C2 -d -r1.44 -r1.45
*** ImageTagTest.java 18 Jul 2004 21:31:21 -0000 1.44
--- ImageTagTest.java 31 Jul 2004 16:42:31 -0000 1.45
***************
*** 293,297 ****
  Node thisNode;
  for (NodeIterator e = parser.elements();e.hasMoreNodes();) {
!     thisNode = (Node)e.nextNode();
      if (thisNode instanceof ImageTag)
          node[i++] = thisNode;
--- 293,297 ----
  Node thisNode;
  for (NodeIterator e = parser.elements();e.hasMoreNodes();) {
!     thisNode = e.nextNode();
      if (thisNode instanceof ImageTag)
          node[i++] = thisNode;

Index: TagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/TagTest.java,v
retrieving revision 1.60
retrieving revision 1.61
diff -C2 -d -r1.60 -r1.61
*** TagTest.java 17 Jul 2004 13:45:06 -0000 1.60
--- TagTest.java 31 Jul 2004 16:42:31 -0000 1.61
***************
*** 27,31 ****
  package org.htmlparser.tests.tagTests;
- import java.util.Hashtable;
  import org.htmlparser.Attribute;
--- 27,30 ----
***************
*** 291,295 ****
  createParser(lin1);
  NodeIterator en = parser.elements();
- Hashtable h;
  String a,nice;
--- 290,293 ----
|
From: Derrick O. <der...@us...> - 2004-07-31 16:43:22
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv18039/src/org/htmlparser/lexer

Modified Files:
    Page.java Cursor.java Lexer.java
Log Message:
Remove unused variables and other fixes exposed by turning on compiler warnings.

Index: Page.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Page.java,v
retrieving revision 1.40
retrieving revision 1.41
diff -C2 -d -r1.40 -r1.41
*** Page.java 29 Jul 2004 01:19:22 -0000 1.40
--- Page.java 31 Jul 2004 16:42:31 -0000 1.41
***************
*** 169,174 ****
  public Page (String text, String charset)
  {
-     InputStream stream;
-
      if (null == text)
          throw new IllegalArgumentException ("text cannot be null");
--- 169,172 ----

Index: Lexer.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Lexer.java,v
retrieving revision 1.30
retrieving revision 1.31
diff -C2 -d -r1.30 -r1.31
*** Lexer.java 24 May 2004 16:18:16 -0000 1.30
--- Lexer.java 31 Jul 2004 16:42:31 -0000 1.31
***************
*** 371,375 ****
  char ch;
  char quote;
- Node ret;
  done = false;
--- 371,374 ----
***************
*** 940,944 ****
  Vector attributes;
  int code;
- Node ret;
  done = false;
--- 939,942 ----

Index: Cursor.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Cursor.java,v
retrieving revision 1.17
retrieving revision 1.18
diff -C2 -d -r1.17 -r1.18
*** Cursor.java 2 Jan 2004 19:32:04 -0000 1.17
--- Cursor.java 31 Jul 2004 16:42:31 -0000 1.18
***************
*** 125,130 ****
  public String toString ()
  {
-     int row;
-     int column;
      StringBuffer ret;
--- 125,128 ----
|
From: Derrick O. <der...@us...> - 2004-07-31 16:43:22
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv18039/src/org/htmlparser/scanners

Modified Files:
    ScriptScanner.java StyleScanner.java CompositeTagScanner.java
Log Message:
Remove unused variables and other fixes exposed by turning on compiler warnings.

Index: StyleScanner.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners/StyleScanner.java,v
retrieving revision 1.37
retrieving revision 1.38
diff -C2 -d -r1.37 -r1.38
*** StyleScanner.java 17 Jul 2004 13:45:03 -0000 1.37
--- StyleScanner.java 31 Jul 2004 16:42:32 -0000 1.38
***************
*** 116,120 ****
  // build new end tag if required
  if (null == end)
!     end = (Tag)lexer.getNodeFactory ().createTagNode (
          lexer.getPage (), endpos, endpos, new Vector ());
  ret = tag;
--- 116,120 ----
  // build new end tag if required
  if (null == end)
!     end = lexer.getNodeFactory ().createTagNode (
          lexer.getPage (), endpos, endpos, new Vector ());
  ret = tag;

Index: CompositeTagScanner.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners/CompositeTagScanner.java,v
retrieving revision 1.88
retrieving revision 1.89
diff -C2 -d -r1.88 -r1.89
*** CompositeTagScanner.java 17 Jul 2004 13:45:03 -0000 1.88
--- CompositeTagScanner.java 31 Jul 2004 16:42:32 -0000 1.89
***************
*** 189,193 ****
  Vector attributes = new Vector ();
  attributes.addElement (new Attribute (name, null));
! Tag opener = (Tag)lexer.getNodeFactory ().createTagNode (
      lexer.getPage (), next.getStartPosition (), next.getEndPosition (),
      attributes);
--- 189,193 ----
  Vector attributes = new Vector ();
  attributes.addElement (new Attribute (name, null));
! Tag opener = lexer.getNodeFactory ().createTagNode (
      lexer.getPage (), next.getStartPosition (), next.getEndPosition (),
      attributes);
***************
*** 325,329 ****
  attributes = new Vector ();
  attributes.addElement (new Attribute (name, (String)null));
! ret = (Tag)lexer.getNodeFactory ().createTagNode (
      page, position, position, attributes);
--- 325,329 ----
  attributes = new Vector ();
  attributes.addElement (new Attribute (name, (String)null));
! ret = lexer.getNodeFactory ().createTagNode (
      page, position, position, attributes);

Index: ScriptScanner.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners/ScriptScanner.java,v
retrieving revision 1.60
retrieving revision 1.61
diff -C2 -d -r1.60 -r1.61
*** ScriptScanner.java 17 Jul 2004 13:45:03 -0000 1.60
--- ScriptScanner.java 31 Jul 2004 16:42:32 -0000 1.61
***************
*** 92,96 ****
      language.equalsIgnoreCase ("VBScript.Encode")))
  {
-     int start = lexer.getPosition ();
      String code = ScriptDecoder.Decode (lexer.getPage (), lexer.getCursor ());
      ((ScriptTag)tag).setScriptCode (code);
--- 92,95 ----
***************
*** 133,137 ****
  // build new end tag if required
  if (null == end)
!     end = (Tag)lexer.getNodeFactory ().createTagNode (
          lexer.getPage (), endpos, endpos, new Vector ());
  ret = tag;
--- 132,136 ----
  // build new end tag if required
  if (null == end)
!     end = lexer.getNodeFactory ().createTagNode (
          lexer.getPage (), endpos, endpos, new Vector ());
  ret = tag;
|
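The scanner changes above all drop a (Tag) cast from the result of createTagNode. A cast like that is redundant whenever the method's declared return type is already the type being assigned; that the factory method is declared to return Tag is inferred here from the diff compiling without the cast, not checked against the htmlparser source. The sketch below shows the pattern with stand-in types: Tag, TagFactory, SimpleTag and CastDemo are hypothetical and are not the real htmlparser classes.

```java
// Stand-in types for illustration only; these are NOT the htmlparser interfaces.
interface Tag
{
    String getName ();
}

interface TagFactory
{
    // Declared to return Tag, so callers never need to cast the result.
    Tag createTagNode (String name);
}

final class SimpleTag implements Tag
{
    private final String mName;

    SimpleTag (String name)
    {
        mName = name;
    }

    public String getName ()
    {
        return (mName);
    }
}

final class CastDemo
{
    static Tag build (TagFactory factory, String name)
    {
        // Before: Tag end = (Tag)factory.createTagNode (name);
        // The cast compiles but does nothing, so a compiler or IDE with
        // warnings enabled can flag it as unnecessary.
        Tag end = factory.createTagNode (name);
        return (end);
    }

    public static void main (String[] args)
    {
        TagFactory factory = new TagFactory ()
        {
            public Tag createTagNode (String name)
            {
                return (new SimpleTag (name));
            }
        };
        System.out.println (build (factory, "STYLE").getName ());
    }
}
```

The same reasoning would apply to the "(Node)e.nextNode()" casts removed in the tagTests changes, assuming nextNode is declared to return Node.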