htmlparser-cvs Mailing List for HTML Parser (Page 16)
Brought to you by:
derrickoswald
From: Derrick O. <der...@us...> - 2004-06-03 01:18:35
|
Update of /cvsroot/htmlparser/htmlparser/docs In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv5722 Modified Files: contributors.html Log Message: Add Rodney S. Foley's photo. Index: contributors.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/contributors.html,v retrieving revision 1.9 retrieving revision 1.10 diff -C2 -d -r1.9 -r1.10 *** contributors.html 1 Jun 2004 01:44:59 -0000 1.9 --- contributors.html 3 Jun 2004 01:18:27 -0000 1.10 *************** *** 13,17 **** <tr> <td width="25%" height="270" valign="top"> ! <img src="pics/derrick.jpg" width="97" height="101"><strong> <img src="pics/canada.gif" width="53" height="39"></strong><br> Derrick Oswald<br> Software Development Manager<br> --- 13,17 ---- <tr> <td width="25%" height="270" valign="top"> ! <img src="pics/derrick.jpg" width="97" height="101"> <img src="pics/canada.gif" width="53" height="39"><br> Derrick Oswald<br> Software Development Manager<br> *************** *** 58,62 **** <td width="25%" height="270"valign="top"> <img src="pics/somik.jpg" width="95" height="101"> ! <strong><img src="pics/india.gif" width="53" height="39"></strong><br> Somik Raha <br> Extreme Programmer & Coach<br> --- 58,62 ---- <td width="25%" height="270"valign="top"> <img src="pics/somik.jpg" width="95" height="101"> ! <img src="pics/india.gif" width="53" height="39"><br> Somik Raha <br> Extreme Programmer & Coach<br> *************** *** 97,101 **** <tr> <td width="25%" height="270" valign="top"> ! <img src="pics/joshua.jpg" width="122" height="101"><strong><img src="pics/usa.gif" width="53" height="39"></strong><br> Joshua Kerievsky<br> Founder, Extreme Programmer & Coach<br> --- 97,101 ---- <tr> <td width="25%" height="270" valign="top"> ! 
<img src="pics/joshua.jpg" width="122" height="101"><img src="pics/usa.gif" width="53" height="39"><br> Joshua Kerievsky<br> Founder, Extreme Programmer & Coach<br> *************** *** 124,132 **** numerous XP and patterns-based articles, simulations and games, including the forthcoming book, Refactoring to Patterns.</p></td> ! <td width="36%"><p><strong><br> </td> </tr> <tr> ! <td valign="top"><img src="pics/kk.jpg" width="100" height="109"><strong><img src="pics/finland.gif" width="58" height="37"></strong><br> Kaarle Kaila<br> Software Developer - Consult.<br> --- 124,132 ---- numerous XP and patterns-based articles, simulations and games, including the forthcoming book, Refactoring to Patterns.</p></td> ! <td width="36%"><p><br> </td> </tr> <tr> ! <td valign="top"><img src="pics/kk.jpg" width="100" height="109"><img src="pics/finland.gif" width="58" height="37"><br> Kaarle Kaila<br> Software Developer - Consult.<br> *************** *** 228,232 **** <!-- <img src="pics/alberto.jpg" width="181" height="265">--> <img src="pics/alberto.jpg" width="100"> ! <strong><img src="pics/italy.gif" width="53" height="39"></strong><br> Alberto Nacher<br> Software Developer - Consultant<br> --- 228,232 ---- <!-- <img src="pics/alberto.jpg" width="181" height="265">--> <img src="pics/alberto.jpg" width="100"> ! <img src="pics/italy.gif" width="53" height="39"><br> Alberto Nacher<br> Software Developer - Consultant<br> *************** *** 381,385 **** </tr> <tr> ! <td valign="top"><p>Rodney S. Foley<br> <a href="http://sourceforge.net/sendmessage.php?touser=231872">email</a> </p></td> --- 381,387 ---- </tr> <tr> ! <td valign="top"> ! <img src="pics/rsf.gif" width="102" height="150"><img src="pics/usa.gif" width="53" height="39"><br> ! <p>Rodney S. Foley<br> <a href="http://sourceforge.net/sendmessage.php?touser=231872">email</a> </p></td> |
Update of /cvsroot/htmlparser/htmlparser/resources/logofiles
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv4560

Added Files:
    htmlparser2in.gif htmlparser_cmyk.eps htmlparser_greyscale.eps
    htmlparser_pms.eps htmlparser_rgb_2inch.jpg htmlparser_rgb_5inch.jpg
Log Message:
Full set of logo files from Jon Gillette.

--- NEW FILE: htmlparser2in.gif ---
(This appears to be a binary file; contents omitted.)

--- NEW FILE: htmlparser_pms.eps ---
(This appears to be a binary file; contents omitted.)

--- NEW FILE: htmlparser_rgb_2inch.jpg ---
(This appears to be a binary file; contents omitted.)

--- NEW FILE: htmlparser_rgb_5inch.jpg ---
(This appears to be a binary file; contents omitted.)

--- NEW FILE: htmlparser_greyscale.eps ---
(This appears to be a binary file; contents omitted.)

--- NEW FILE: htmlparser_cmyk.eps ---
(This appears to be a binary file; contents omitted.)
|
From: Derrick O. <der...@us...> - 2004-06-03 00:42:54
|
Update of /cvsroot/htmlparser/htmlparser/resources/logofiles
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31966/logofiles

Log Message:
Directory /cvsroot/htmlparser/htmlparser/resources/logofiles added to the repository
|
From: Somik R. <so...@us...> - 2004-06-02 22:47:30
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv12254/src/org/htmlparser/tests

Modified Files:
    ParserTestCase.java
Log Message:
modified to allow usage of assertXmlEquals

Index: ParserTestCase.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/ParserTestCase.java,v
retrieving revision 1.46
retrieving revision 1.47
diff -C2 -d -r1.46 -r1.47
*** ParserTestCase.java 24 May 2004 16:18:30 -0000 1.46
--- ParserTestCase.java 2 Jun 2004 22:47:21 -0000 1.47
***************
*** 159,163 ****
              );
              System.out.println ("string differs, expected \"" + expected + "\", actual \"" + actual + "\"");
!             fail(errorMsg.toString());
          }

--- 159,163 ----
              );
              System.out.println ("string differs, expected \"" + expected + "\", actual \"" + actual + "\"");
!             failWithMessage(errorMsg.toString());
          }

***************
*** 165,169 ****
      }

!     public void parseNodes() throws ParserException{
          nodeCount = 0;
          for (NodeIterator e = parser.elements();e.hasMoreNodes();)
--- 165,173 ----
      }

!     public void failWithMessage(String message) {
!         fail(message);
!     }
!
!     public void parseNodes() throws ParserException{
          nodeCount = 0;
          for (NodeIterator e = parser.elements();e.hasMoreNodes();)
|
From: Somik R. <so...@us...> - 2004-06-02 22:47:21
|
Update of /cvsroot/htmlparser/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv12184

Added Files:
    .cvsignore
Log Message:
added .cvsignore

--- NEW FILE: .cvsignore ---
bin
|
From: Derrick O. <der...@us...> - 2004-06-01 01:45:07
|
Update of /cvsroot/htmlparser/htmlparser/docs In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv24250 Modified Files: contributors.html Log Message: Add htmlparser.org reference in Rodney S. Foley's writeup. Index: contributors.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/contributors.html,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** contributors.html 31 May 2004 22:27:09 -0000 1.8 --- contributors.html 1 Jun 2004 01:44:59 -0000 1.9 *************** *** 22,26 **** K1R 7Y2<br> (613) 755-5065 ! <br> <a href="http://www.autodesk.com">http://www.autodesk.com</a><br> <!--a href="mailto:Der...@Au...">Der...@Au...</a--> <a href="http://sourceforge.net/sendmessage.php?touser=605407">email</a> --- 22,26 ---- K1R 7Y2<br> (613) 755-5065 ! <br><a href="http://www.autodesk.com">http://www.autodesk.com</a><br> <!--a href="mailto:Der...@Au...">Der...@Au...</a--> <a href="http://sourceforge.net/sendmessage.php?touser=605407">email</a> *************** *** 382,390 **** <tr> <td valign="top"><p>Rodney S. Foley<br> ! <a href="mailto:rs...@ha...">rs...@ha...</a> </p></td> <td valign="top"><p>Rodney made an important contribution to this project. HTMLParser initially used to have a complex mechanism of auto-registering scanners. Rodney first suggested that this should be done away with, as it was confusing.</p> ! <p>This single suggestion helped simplify the design of the parser.</p></td> <td valign="top"> </td> </tr> --- 382,394 ---- <tr> <td valign="top"><p>Rodney S. Foley<br> ! <a href="http://sourceforge.net/sendmessage.php?touser=231872">email</a> ! </p></td> <td valign="top"><p>Rodney made an important contribution to this project. HTMLParser initially used to have a complex mechanism of auto-registering scanners. Rodney first suggested that this should be done away with, as it was confusing.</p> ! <p>This single suggestion helped simplify the design of the parser.</p> ! 
<p>Rodney has also kindly offered to register and hold the ! <a href="http://htmlparser.org" target="_parent">htmlparser.org</a> ! domain name and forward traffic to the SourceForge project page.</p></td> <td valign="top"> </td> </tr> *************** *** 392,397 **** <p>Thanks to Gernot Fricke, Nick Burch, Stephen Harrington, Domenico Lordi, Kamen, John Zook, Cheng Jun, Mazlan Mat, Rob Shields, Wolfgang Germund, Raj Sharma, ! Robert Kausch, Gordon Deudney, Serge Kruppa, Roger Kjensrud, Rodney S Foley ! and Manpreet Singh for suggestions, bug reports and feature ideas. <br> <p>Thanks to Jon Gillette for the cool new logo.<br> </body> --- 396,401 ---- <p>Thanks to Gernot Fricke, Nick Burch, Stephen Harrington, Domenico Lordi, Kamen, John Zook, Cheng Jun, Mazlan Mat, Rob Shields, Wolfgang Germund, Raj Sharma, ! Robert Kausch, Gordon Deudney, Serge Kruppa, Roger Kjensrud, and Manpreet Singh ! for suggestions, bug reports and feature ideas. <br> <p>Thanks to Jon Gillette for the cool new logo.<br> </body> |
From: Derrick O. <der...@us...> - 2004-05-31 22:27:18
|
Update of /cvsroot/htmlparser/htmlparser/docs In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv15166 Modified Files: release.txt panel.html htmlparserlogo.jpg contributors.html Added Files: htmlparser.jpg Log Message: New logo from Jon Gillette. Index: release.txt =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/release.txt,v retrieving revision 1.60 retrieving revision 1.61 diff -C2 -d -r1.60 -r1.61 *** release.txt 29 May 2004 20:40:11 -0000 1.60 --- release.txt 31 May 2004 22:27:09 -0000 1.61 *************** *** 81,84 **** --- 81,85 ---- [32] Alberto Nacher [33] Rogers George + [34] Jon Gillette If you find any bugs, please go to Index: panel.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/panel.html,v retrieving revision 1.7 retrieving revision 1.8 diff -C2 -d -r1.7 -r1.8 *** panel.html 4 Jan 2004 03:23:08 -0000 1.7 --- panel.html 31 May 2004 22:27:09 -0000 1.8 *************** *** 11,15 **** </head> <body bgcolor="#FFFFFF" background="background.gif"> ! <img SRC="htmlparserlogo.jpg" BORDER=0 height=40 width=100> <p><strong>About HTMLParser</strong></p> <ul> --- 11,15 ---- </head> <body bgcolor="#FFFFFF" background="background.gif"> ! 
<img SRC="htmlparserlogo.jpg" BORDER=0 height=132 width=157> <p><strong>About HTMLParser</strong></p> <ul> Index: htmlparserlogo.jpg =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/htmlparserlogo.jpg,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 Binary files /tmp/cvsLrIY6I and /tmp/cvswvGLv1 differ Index: contributors.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/contributors.html,v retrieving revision 1.7 retrieving revision 1.8 diff -C2 -d -r1.7 -r1.8 *** contributors.html 20 Apr 2004 10:49:51 -0000 1.7 --- contributors.html 31 May 2004 22:27:09 -0000 1.8 *************** *** 394,398 **** Robert Kausch, Gordon Deudney, Serge Kruppa, Roger Kjensrud, Rodney S Foley and Manpreet Singh for suggestions, bug reports and feature ideas. <br> ! </body> </html> --- 394,398 ---- Robert Kausch, Gordon Deudney, Serge Kruppa, Roger Kjensrud, Rodney S Foley and Manpreet Singh for suggestions, bug reports and feature ideas. <br> ! <p>Thanks to Jon Gillette for the cool new logo.<br> </body> </html> --- NEW FILE: htmlparser.jpg --- (This appears to be a binary file; contents omitted.) |
From: Derrick O. <der...@us...> - 2004-05-30 01:44:35
|
Update of /cvsroot/htmlparser/htmlparser In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21187 Modified Files: build.xml Log Message: Use WikiCapturer to pull Wiki pages locally. Index: build.xml =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/build.xml,v retrieving revision 1.67 retrieving revision 1.68 diff -C2 -d -r1.67 -r1.68 *** build.xml 29 May 2004 20:40:10 -0000 1.67 --- build.xml 30 May 2004 01:43:54 -0000 1.68 *************** *** 6,33 **** Release Procedure - cd htmlparser ! - delete the local Wiki pages with 'rm /home/derrick/htmlparser_cvs/htmlparser/docs/wiki/*' ! and 'rm /home/derrick/htmlparser_cvs/htmlparser/docs/wiki/images/*', ! of course any one else would have to adjust this and ! also the hard-coded path in WikiCapturer ! - 'javac -classpath lib/htmlparser.jar ../WikiCapturer/src/org/htmlparser/wikicapturer/CaptureWiki.java ../WikiCapturer/src/org/htmlparser/wikicapturer/PhpWikiVisitor.java' ! and 'java -classpath lib/htmlparser.jar:../WikiCapturer/src org.htmlparser.wikicapturer.CaptureWiki' ! fetches current Wiki pages - set environment variables CVSROOT and CVS_RSH (see changelog task) - 'ant changelog' generates htmlparser/ChangeLog (this will be changed to use the previous version tag someday) ! - edit the ChangeLog to exclude changes already incorporated and the previous ! release's "update of version headers" drop - the CVS date spec is only accurate ! to the day since it comes from the version coded in the Parser.java file, ! that's why this step can't be automated - incorporate changes from ChangeLog into htmlparser/docs/changes under a heading like "Integration Build 1.5 - 20040522" - 'ant versionSource' updates the version in Parser.java and release.txt ! - perform a CVS update on htmlparser to identify new and changed files ! - commit changed files (i.e. Parser.java, release.txt, docs/changes, docs/wiki ! 
and docs/wiki/images) to the head revision using a reason of the form: ! Update version to 1.5-20040522. - use CVS to tag the current head revisions with a name like v1_5_20040522. ! - use CVS to checkout everything with the tag used above - 'ant test' compiles and runs the unit tests ! - 'ant clean htmlparser' updates the version headers, creates the jar file and doc files and zips everything into a file htmlparser/distribution/htmlparser1_5_20040522.zip - use CVS to checkout everything against the head revision to reset your workspace --- 6,29 ---- Release Procedure - cd htmlparser ! - 'ant wiki' captures the PhpWiki from http://htmlparser.sourceforge.org/docs/wiki to ! the docs/wiki directory (except for indirect image references). - set environment variables CVSROOT and CVS_RSH (see changelog task) - 'ant changelog' generates htmlparser/ChangeLog (this will be changed to use the previous version tag someday) ! - edit the ChangeLog to exclude changes already incorporated - the CVS date spec ! is only accurate to the day since it comes from the version coded in the ! Parser.java file, that's why this step can't be automated - incorporate changes from ChangeLog into htmlparser/docs/changes under a heading like "Integration Build 1.5 - 20040522" - 'ant versionSource' updates the version in Parser.java and release.txt ! - edit docs/release.txt to update changes since the last version, bugs fixed ! and enhancements completed ! - perform a CVS update on htmlparser/ to identify new and changed files ! - commit changed files (i.e. Parser.java, docs/release.txt, docs/changes.txt, ! docs/release.txt and docs/wiki) to the head revision using a reason of the form: ! Update version to 1.5-20040522. - use CVS to tag the current head revisions with a name like v1_5_20040522. ! - use CVS to checkout everything with the tag created above - 'ant test' compiles and runs the unit tests ! 
- 'ant clean htmlparser' creates the jar file and doc files and zips everything into a file htmlparser/distribution/htmlparser1_5_20040522.zip - use CVS to checkout everything against the head revision to reset your workspace *************** *** 35,54 **** Sourceforge File Release Procedure - upload the zip file to the sourceforge site ! $ ftp upload.sourceforge.net ! Name: anonymous ! Password: you...@us... ! ftp> cd incoming ! ftp> bin ! ftp> put htmlparser1_5_20040522.zip ! ftp> bye - add a release to the 'Integation Builds' package ! Admin-File Releases-Add Release, use a name of the form '1_5_20040522' - Step 1, 'Paste The Notes' (using numeric character references and character entity references because this is displayed as HTML) with a format like : ! Integration build. ! Failing Unit Tests: ! Open Bugs: ! Pending Bugs: - use the 'Upload Change Log:' field to specify the ChamgeLog file you edited - Step 2, check the checkbox of the htmlparser1_5_20040522.zip file from the --- 31,50 ---- Sourceforge File Release Procedure - upload the zip file to the sourceforge site ! $ ftp upload.sourceforge.net ! Name: anonymous ! Password: you...@us... ! ftp> cd incoming ! ftp> bin ! ftp> put htmlparser1_5_20040522.zip ! ftp> bye - add a release to the 'Integation Builds' package ! Admin-File Releases-Add Release, use a name of the form '1_5_20040522' - Step 1, 'Paste The Notes' (using numeric character references and character entity references because this is displayed as HTML) with a format like : ! Integration build. ! Failing Unit Tests: ! Open Bugs: ! Pending Bugs: - use the 'Upload Change Log:' field to specify the ChamgeLog file you edited - Step 2, check the checkbox of the htmlparser1_5_20040522.zip file from the *************** *** 65,69 **** Submit News - from the project summary screen, select 'Submit News' and title it like: ! 
HTML Parser Integration Release 1.5-20040522 - type in a summary of the changes made - SUBMIT --- 61,65 ---- Submit News - from the project summary screen, select 'Submit News' and title it like: ! HTML Parser Integration Release 1.5-20040522 - type in a summary of the changes made - SUBMIT *************** *** 71,75 **** --- 67,103 ---- - choose the old news item, change the Status to 'Delete' - SUBMIT + + Update the Web Site + - remove the local docs/wiki directory + - create a tarball of the docs directory + tar -tf docs.tar + - use secure copy to move the tarball onto the shell.sourceforge.net server + scp docs.tar der...@sh...:/home/groups/h/ht/htmlparser/ + - ssh into the shell.sourceforge.net server and cd to /home/groups/h/ht/htmlparser/ + ssh der...@sh... + - move the old htdocs out of the way + mv htdocs oldhtdocs + - create a new htdocs directory + mkdir htdocs + - unpack the tarball into htdocs + cd htdocs + tar -xf docs.tar + - copy or move the following files/directories from the old htdocs to the new one: + mv ../olddocs/benchmarks.zip . + mv ../olddocs/HTMLParser_Coverage.html . + mv ../olddocs/javadoc_1_2 . + mv ../olddocs/javadoc_1_3 . + mv ../olddocs/performance . + mv ../olddocs/test . + mv ../olddocs/wiki . + - edit the panel.html file to change the target of the Wiki link from + wiki/index.html + to + wiki/index.php + - delete the old htmldocs directory + rm -rf ../oldhtdocs + --> + <project name="HTMLParser" default="htmlparser" basedir="."> *************** *** 86,89 **** --- 114,118 ---- <property name="src" value="src"/> <property name="docs" value="docs"/> + <property name="wiki" value="${docs}/wiki"/> <property name="bin" value="bin"/> <property name="lib" value="lib"/> *************** *** 326,329 **** --- 355,384 ---- </target> + <!-- Delete the files gathered from the wiki. --> + <target name="cleanwiki" description="delete local wiki files"> + <!-- Delete the content, leave the CVS files. 
--> + <!-- This is done so deleted wiki pages are not left orphaned in CVS. --> + <delete failonerror="false"> + <fileset dir="${wiki}"> + <filename name="**/*"/> + <not> + <filename name="*CVS*"/> + </not> + </fileset> + </delete> + </target> + + <!-- Capture the wiki --> + <target name="wiki" depends="jar,cleanwiki" description="capture the wiki"> + <java classname="org.htmlparser.parserapplications.WikiCapturer" fork="yes" failonerror="yes"> + <classpath> + <pathelement location="${lib}/htmlparser.jar"/> + </classpath> + <arg value="http://htmlparser.sourceforge.net/wiki/"/> + <arg value="${wiki}"/> + <arg value="true"/> + </java> + </target> + <!-- Create the javadoc for the project --> <target name="javadoc" depends="JDK1.4,JDK_Warning,init" description="create JavaDoc (API) documentation"> |
From: Derrick O. <der...@us...> - 2004-05-30 01:44:34
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/parserapplications
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21187/src/org/htmlparser/parserapplications

Modified Files:
    WikiCapturer.java
Log Message:
Use WikiCapturer to pull Wiki pages locally.

Index: WikiCapturer.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/parserapplications/WikiCapturer.java,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** WikiCapturer.java 10 Jan 2004 00:06:03 -0000 1.1
--- WikiCapturer.java 30 May 2004 01:43:54 -0000 1.2
***************
*** 55,58 ****
--- 55,120 ----

      /**
+      * Returns <code>true</code> if the link is one we are interested in.
+      * @param link The link to be checked.
+      * @return <code>true</code> if the link has the source URL as a prefix
+      * and doesn't contain '?' or '#'; the former because we won't be able to
+      * handle server side queries in the static target directory structure and
+      * the latter because presumably the full page with that reference has
+      * already been captured previously. This performs a case insensitive
+      * comparison, which is cheating really, but it's cheap.
+      */
+     protected boolean isToBeCaptured (String link)
+     {
+         boolean ret;
+
+         ret = super.isToBeCaptured (link);
+
+         // eliminate PhpWiki specific pages
+         if (ret)
+             if (link.endsWith ("PhpWikiAdministration"))
+                 ret = false;
+             else if (link.endsWith ("PhpWikiDocumentation"))
+                 ret = false;
+             else if (link.endsWith ("TextFormattingRules"))
+                 ret = false;
+             else if (link.endsWith ("NewMarkupTestPage"))
+                 ret = false;
+             else if (link.endsWith ("OldMarkupTestPage"))
+                 ret = false;
+             else if (link.endsWith ("OldTextFormattingRules"))
+                 ret = false;
+             else if (link.endsWith ("PgsrcTranslation"))
+                 ret = false;
+             else if (link.endsWith ("HowToUseWiki"))
+                 ret = false;
+             else if (link.endsWith ("MoreAboutMechanics"))
+                 ret = false;
+             else if (link.endsWith ("AddingPages"))
+                 ret = false;
+             else if (link.endsWith ("WikiWikiWeb"))
+                 ret = false;
+             else if (link.endsWith ("UserPreferences"))
+                 ret = false;
+             else if (link.endsWith ("PhpWiki"))
+                 ret = false;
+             else if (link.endsWith ("WabiSabi"))
+                 ret = false;
+             else if (link.endsWith ("EditText"))
+                 ret = false;
+             else if (link.endsWith ("FindPage"))
+                 ret = false;
+             else if (link.endsWith ("RecentChanges"))
+                 ret = false;
+             else if (link.endsWith ("RecentEdits"))
+                 ret = false;
+             else if (link.endsWith ("RecentVisitors"))
+                 ret = false;
+             else if (link.endsWith ("SteveWainstead"))
+                 ret = false;
+
+         return (ret);
+     }
+
+     /**
       * Mainline to capture a web site locally.
       * @param args The command line arguments.
|
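Editor's sketch of the link filter in the WikiCapturer change above. This is not the committed code. It is a minimal, standalone illustration that assumes the superclass check amounts to what the Javadoc describes (a case-insensitive source-URL prefix test plus rejection of links containing '?' or '#'). The class name WikiLinkFilter, its constructor and the shortened page list are hypothetical and exist only so the example compiles without the htmlparser jar.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical standalone filter; not part of the htmlparser library. */
final class WikiLinkFilter {
    // A subset of the PhpWiki housekeeping pages excluded in the diff.
    private static final List<String> EXCLUDED_SUFFIXES = Arrays.asList(
        "PhpWikiAdministration", "PhpWikiDocumentation", "TextFormattingRules",
        "HowToUseWiki", "PhpWiki", "FindPage", "RecentChanges");

    private final String source; // lower-cased source URL prefix

    WikiLinkFilter(String source) {
        this.source = source.toLowerCase(Locale.ROOT);
    }

    /** Returns true if the link should be captured. */
    boolean isToBeCaptured(String link) {
        // Assumed superclass behaviour: case-insensitive prefix match.
        if (!link.toLowerCase(Locale.ROOT).startsWith(source)) {
            return false;
        }
        // Assumed superclass behaviour: skip server-side queries and fragments.
        if (link.indexOf('?') >= 0 || link.indexOf('#') >= 0) {
            return false;
        }
        // The wiki-specific step: drop PhpWiki housekeeping pages by suffix.
        for (String suffix : EXCLUDED_SUFFIXES) {
            if (link.endsWith(suffix)) {
                return false;
            }
        }
        return true;
    }
}
```

Keeping the page names in a list replaces the long else-if chain with a single loop. The suffix test stays case sensitive, as endsWith is in the diff, so only the prefix comparison ignores case.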
Update of /cvsroot/htmlparser/htmlparser/docs/wiki/index.php In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21187/docs/wiki/index.php Added Files: Benchmarks BlockFeedback CollectingParameter CompositePattern CustomTagExtraction CustomTagLinks CustomVisitorLinks EmailExtraction EnableFeedback ExternalIterators FactoryMethod FeedbackMechanism FilterLinks FrequentlyAskedQuestions HomePage ImageExtraction InternalIterators IteratorPattern JavaBeans LexerLinks LinkBeanLinks LinkExtraction ParserDesign PatternStories PostOperation RSSFeeds ReverseHtml SamplePrograms SearchingForData SomikRaha StrategyPattern StringExtraction TemplateMethod TestDrivenDevelopment UsingCookiesWithParser VisitorLinks VisitorPattern WebCrawler WebRipper WritingYourOwnScanners Log Message: Use WikiCapturer to pull Wiki pages locally. --- NEW FILE: CustomTagLinks --- <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <!-- $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <meta name="robots" content="index,follow" /> <meta name="keywords" content="Custom Tag Links, PhpWiki" /> <meta name="language" content="" /> <meta name="document-type" content="Public" /> <meta name="document-rating" content="General" /> <meta name="generator" content="phpWiki" /> <meta name="PHPWIKI_VERSION" content="1.3.4" /> <link rel="shortcut icon" href="/wiki/themes/default/images/favicon.ico" /> <link rel="home" title="HomePage" href="HomePage" /> <link rel="help" title="HowToUseWiki" href="HowToUseWiki" /> <link rel="copyright" title="GNU General Public License" href="http://www.gnu.org/copyleft/gpl.html#SEC1" /> <link rel="author" title="The PhpWiki Programming Team" href="http://phpwiki.sourceforge.net/phpwiki/ThePhpWikiProgrammingTeam" /> <link 
rel="search" title="FindPage" href="FindPage" /> <link rel="alternate" title="View Source: CustomTagLinks" href="CustomTagLinks?action=viewsource&version=3" /> <link rel="alternate" type="application/rss+xml" title="RSS" href="RecentChanges?format=rss" /> <link rel="bookmark" title="SandBox" href="SandBox" /> <link rel="bookmark" title="WikiWikiWeb" href="WikiWikiWeb" /> <link rel="stylesheet" title="MacOSX" type="text/css" charset="iso-8859-1" href="/wiki/themes/MacOSX/MacOSX.css" /><link rel="alternate stylesheet" title="Printer" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-printer.css" media="print, screen" /><link rel="alternate stylesheet" title="Modern" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-modern.css" /><style type="text/css"> <!-- body {background-image: url(/wiki/themes/MacOSX/images/bgpaper8.png);} --> </style> <title>PhpWiki - Custom Tag Links</title> </head> <!-- End head --> <!-- Begin body --> <!-- $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <body> <!-- Begin top --> <!-- $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <!-- End top --> <!-- Begin browse --> <!-- $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <div class="wikitext"><p><b>Using Custom Tags to Extract Links</b></p> <p>The use of custom tags provides for altered behaviour during the parse:</p> <pre> import org.htmlparser.Parser; import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.tags.LinkTag; import org.htmlparser.util.NodeIterator; import org.htmlparser.util.ParserException; class MyLinkTag extends LinkTag { public void doSemanticAction () throws ParserException { System.out.print ("\"" + getLinkText () + "\" => "); System.out.println (getLink ()); } } public class LinkDemo { public static void main (String[] args) throws ParserException { Parser parser = new Parser ("http://urlIWantToParse.com"); PrototypicalNodeFactory factory = new 
PrototypicalNodeFactory (); factory.registerTag (new MyLinkTag ()); parser.setNodeFactory (factory); for (NodeIterator e = parser.elements (); e.hasMoreNodes (); ) e.nextNode (); // just parsing the nodes executes doSemanticAction } }</pre> </div> <!-- End browse --> <!-- Begin bottom --> <!-- $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <!-- Add your Disclaimer here --> <!-- Begin debug --> <!-- $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <table width="%100" border="0" cellpadding="0" cellspacing="0"> <tr><td> </td><td> <span class="debug">Page Execution took 0.332 seconds</span> </td></tr></table> <!-- This keeps the valid XHTML! icons from "hanging off the bottom of the scree" --> <br style="clear: both;" /> <!-- End debug --> <!-- End bottom --> </body> <!-- End body --> <!-- phpwiki source: $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: 
DB.php,v 1.13 2002/07/02 15:19:49 cox Exp $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: PEAR.php,v 1.29 2001/12/15 15:01:35 mj Exp $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: mysql.php,v 1.5 2002/06/19 00:41:06 cox Exp $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: common.php,v 1.8 2002/06/12 15:03:16 fab Exp $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: CustomTagLinks,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> </html> --- NEW FILE: ReverseHtml --- <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <!-- $Id: ReverseHtml,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <meta name="robots" content="index,follow" /> <meta name="keywords" content="Reverse Html, PhpWiki" /> <meta name="description" content="Often, it might be desired to modify the html being reconstructed. In such a case, you must change the tag's attributes prior to calling toHtml(). 
For example, if the tag in question is a link tag, and you wish to modify the href, do this:" /> <meta name="language" content="" /> <meta name="document-type" content="Public" /> <meta name="document-rating" content="General" /> <meta name="generator" content="phpWiki" /> <meta name="PHPWIKI_VERSION" content="1.3.4" /> <link rel="shortcut icon" href="/wiki/themes/default/images/favicon.ico" /> <link rel="home" title="HomePage" href="HomePage" /> <link rel="help" title="HowToUseWiki" href="HowToUseWiki" /> <link rel="copyright" title="GNU General Public License" href="http://www.gnu.org/copyleft/gpl.html#SEC1" /> <link rel="author" title="The PhpWiki Programming Team" href="http://phpwiki.sourceforge.net/phpwiki/ThePhpWikiProgrammingTeam" /> <link rel="search" title="FindPage" href="FindPage" /> <link rel="alternate" title="View Source: ReverseHtml" href="ReverseHtml?action=viewsource&version=7" /> <link rel="alternate" type="application/rss+xml" title="RSS" href="RecentChanges?format=rss" /> <link rel="bookmark" title="SandBox" href="SandBox" /> <link rel="bookmark" title="WikiWikiWeb" href="WikiWikiWeb" /> <link rel="stylesheet" title="MacOSX" type="text/css" charset="iso-8859-1" href="/wiki/themes/MacOSX/MacOSX.css" /><link rel="alternate stylesheet" title="Printer" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-printer.css" media="print, screen" /><link rel="alternate stylesheet" title="Modern" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-modern.css" /><style type="text/css"> <!-- body {background-image: url(/wiki/themes/MacOSX/images/bgpaper8.png);} --> </style> <title>PhpWiki - Reverse Html</title> </head> <!-- End head --> <!-- Begin body --> <!-- $Id: ReverseHtml,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <body> <!-- Begin top --> <!-- $Id: ReverseHtml,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <!-- End top --> <!-- Begin browse --> <!-- $Id: ReverseHtml,v 1.1 2004/05/30 01:43:56 
derrickoswald Exp $ --> <div class="wikitext"><p><b>Reverse Html Rendering</b></p> <p>To get back the HTML representation of a web page, call toHtml() on each top-level node; it recurses into the node's children. Here's one way to do it:</p> <pre> import org.htmlparser.Parser; import org.htmlparser.util.NodeIterator; import org.htmlparser.util.ParserException; public class ToHtmlDemo { public static void main (String[] args) throws ParserException { Parser parser = new Parser ("http://urlIWantToParse.com"); StringBuffer html = new StringBuffer (4096); for (NodeIterator i = parser.elements();i.hasMoreNodes();) html.append (i.nextNode().toHtml ()); System.out.println (html); } }</pre> <p>Often you will want to modify the HTML being reconstructed. In that case, you must change the tag's attributes before calling toHtml(). For example, if the tag in question is a link tag, and you wish to modify the href, do this:</p> <pre> linkTag.setLink ("http://newUrlString"); linkTag.toHtml ();</pre> <p>This is equivalent to:</p> <pre> linkTag.setAttribute ("href", "http://newUrlString"); linkTag.toHtml ();</pre> <p>The latter would work on any tag, but few other tags have an HREF attribute according to the <a href="http://www.w3.org/TR/html4/" class="namedurl"><span style="white-space: nowrap"><img src="../themes/MacOSX/images/http.png" alt="http" class="linkicon" border="0" />HTML</span> specification</a>. The <i>toHtml()</i> method applies to all nodes, not just tags. For tags it is basically a reconstruction of the tag using its attributes (at the atomic level) and its children (at the macro/composite level).</p> <p>You can also change the name of the tag like so:</p> <pre> tag.setTagName (newTagName);</pre> <p>and there are numerous ways to add, remove or change the attributes of a tag. For example, to add or change the ID attribute to "EditArea" use:</p> <pre> tag.setAttribute ("id", "EditArea", '"');</pre> <p>Whole tags can be added to and removed from the list of children held by each tag. 
For example, to add a <P> tag at the same level as another tag:</p> <pre> newTag = new Tag (); newTag.setTagName ("P"); tag.getParent ().getChildren ().add (newTag);</pre> <p>Be careful: getChildren () may return null for an arbitrary tag.</p> </div> <!-- End browse --> <!-- Begin bottom --> <!-- $Id: ReverseHtml,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <!-- Add your Disclaimer here --> <!-- Begin debug --> <!-- $Id: ReverseHtml,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <table width="%100" border="0" cellpadding="0" cellspacing="0"> <tr><td> </td><td> <span class="debug">Page Execution took 0.421 seconds</span> </td></tr></table> <!-- This keeps the valid XHTML! icons from "hanging off the bottom of the scree" --> <br style="clear: both;" /> <!-- End debug --> <!-- End bottom --> </body> <!-- End body --> <!-- phpwiki source: $Id: ReverseHtml,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: DB.php,v 1.13 2002/07/02 15:19:49 cox Exp $Id: 
ReverseHtml,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: PEAR.php,v 1.29 2001/12/15 15:01:35 mj Exp $Id: ReverseHtml,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: mysql.php,v 1.5 2002/06/19 00:41:06 cox Exp $Id: ReverseHtml,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: common.php,v 1.8 2002/06/12 15:03:16 fab Exp $Id: ReverseHtml,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: ReverseHtml,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: ReverseHtml,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: ReverseHtml,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: ReverseHtml,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: ReverseHtml,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: ReverseHtml,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: ReverseHtml,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: ReverseHtml,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> </html> --- NEW FILE: PatternStories --- <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <!-- $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <meta name="robots" content="index,follow" /> <meta name="keywords" content="Pattern Stories, PhpWiki" /> <meta name="language" content="" /> <meta name="document-type" content="Public" /> <meta name="document-rating" content="General" /> <meta name="generator" content="phpWiki" /> <meta name="PHPWIKI_VERSION" content="1.3.4" /> <link rel="shortcut icon" href="/wiki/themes/default/images/favicon.ico" /> <link rel="home" title="HomePage" href="HomePage" /> <link rel="help" title="HowToUseWiki" href="HowToUseWiki" /> <link rel="copyright" title="GNU General Public License" href="http://www.gnu.org/copyleft/gpl.html#SEC1" /> <link 
rel="author" title="The PhpWiki Programming Team" href="http://phpwiki.sourceforge.net/phpwiki/ThePhpWikiProgrammingTeam" /> <link rel="search" title="FindPage" href="FindPage" /> <link rel="alternate" title="View Source: PatternStories" href="PatternStories?action=viewsource&version=3" /> <link rel="alternate" type="application/rss+xml" title="RSS" href="RecentChanges?format=rss" /> <link rel="bookmark" title="SandBox" href="SandBox" /> <link rel="bookmark" title="WikiWikiWeb" href="WikiWikiWeb" /> <link rel="stylesheet" title="MacOSX" type="text/css" charset="iso-8859-1" href="/wiki/themes/MacOSX/MacOSX.css" /><link rel="alternate stylesheet" title="Printer" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-printer.css" media="print, screen" /><link rel="alternate stylesheet" title="Modern" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-modern.css" /><style type="text/css"> <!-- body {background-image: url(/wiki/themes/MacOSX/images/bgpaper8.png);} --> </style> <title>PhpWiki - Pattern Stories</title> </head> <!-- End head --> <!-- Begin body --> <!-- $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <body> <!-- Begin top --> <!-- $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <!-- End top --> <!-- Begin browse --> <!-- $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <div class="wikitext"><p><b>Pattern Stories</b></p> <p>The parser uses the following patterns:</p> <ul> <li><a href="../index.php/FactoryMethod" class="wiki">FactoryMethod</a></li> <li><a href="../index.php/TemplateMethod" class="wiki">TemplateMethod</a></li> <li><a href="../index.php/IteratorPattern" class="wiki">IteratorPattern</a></li> <li><a href="../index.php/VisitorPattern" class="wiki">VisitorPattern</a></li> <li><a href="../index.php/CollectingParameter" class="wiki">CollectingParameter</a></li> <li><a href="../index.php/StrategyPattern" class="wiki">StrategyPattern</a></li> <li><a 
href="../index.php/CompositePattern" class="wiki">CompositePattern</a></li> </ul> <p>--<a href="../index.php/SomikRaha" class="wiki">SomikRaha</a></p> </div> <!-- End browse --> <!-- Begin bottom --> <!-- $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <!-- Add your Disclaimer here --> <!-- Begin debug --> <!-- $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <table width="%100" border="0" cellpadding="0" cellspacing="0"> <tr><td> </td><td> <span class="debug">Page Execution took 0.267 seconds</span> </td></tr></table> <!-- This keeps the valid XHTML! icons from "hanging off the bottom of the scree" --> <br style="clear: both;" /> <!-- End debug --> <!-- End bottom --> </body> <!-- End body --> <!-- phpwiki source: $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: DB.php,v 1.13 2002/07/02 15:19:49 cox Exp $Id: PatternStories,v 1.1 2004/05/30 01:43:56 
derrickoswald Exp $ From Pear CVS: Id: PEAR.php,v 1.29 2001/12/15 15:01:35 mj Exp $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: mysql.php,v 1.5 2002/06/19 00:41:06 cox Exp $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: common.php,v 1.8 2002/06/12 15:03:16 fab Exp $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PatternStories,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> </html> --- NEW FILE: StringExtraction --- <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <!-- $Id: StringExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <meta name="robots" content="index,follow" /> <meta name="keywords" content="String Extraction, PhpWiki" /> <meta name="language" content="" /> <meta name="document-type" content="Public" /> <meta name="document-rating" content="General" /> <meta name="generator" content="phpWiki" /> <meta name="PHPWIKI_VERSION" content="1.3.4" /> <link rel="shortcut icon" href="/wiki/themes/default/images/favicon.ico" /> <link rel="home" title="HomePage" href="HomePage" /> <link rel="help" title="HowToUseWiki" href="HowToUseWiki" /> <link rel="copyright" title="GNU General Public License" href="http://www.gnu.org/copyleft/gpl.html#SEC1" /> <link 
rel="author" title="The PhpWiki Programming Team" href="http://phpwiki.sourceforge.net/phpwiki/ThePhpWikiProgrammingTeam" /> <link rel="search" title="FindPage" href="FindPage" /> <link rel="alternate" title="View Source: StringExtraction" href="StringExtraction?action=viewsource&version=9" /> <link rel="alternate" type="application/rss+xml" title="RSS" href="RecentChanges?format=rss" /> <link rel="bookmark" title="SandBox" href="SandBox" /> <link rel="bookmark" title="WikiWikiWeb" href="WikiWikiWeb" /> <link rel="stylesheet" title="MacOSX" type="text/css" charset="iso-8859-1" href="/wiki/themes/MacOSX/MacOSX.css" /><link rel="alternate stylesheet" title="Printer" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-printer.css" media="print, screen" /><link rel="alternate stylesheet" title="Modern" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-modern.css" /><style type="text/css"> <!-- body {background-image: url(/wiki/themes/MacOSX/images/bgpaper8.png);} --> </style> <title>PhpWiki - String Extraction</title> </head> <!-- End head --> <!-- Begin body --> <!-- $Id: StringExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <body> <!-- Begin top --> <!-- $Id: StringExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <!-- End top --> <!-- Begin browse --> <!-- $Id: StringExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <div class="wikitext"><p><b>String Extraction</b></p> <p>To get all the text content from a web page, use the TextExtractingVisitor, like so:</p> <pre> import org.htmlparser.Parser; import org.htmlparser.util.ParserException; import org.htmlparser.visitors.TextExtractingVisitor; public class StringDemo { public static void main (String[] args) throws ParserException { Parser parser = new Parser ("http://pageIwantToParse.com"); TextExtractingVisitor visitor = new TextExtractingVisitor (); parser.visitAllNodesWith (visitor); System.out.println (visitor.getExtractedText()); } 
}</pre> <p>If you want more browser-like behaviour, use the StringBean like so:</p> <pre> import org.htmlparser.beans.StringBean; public class StringDemo { public static void main (String[] args) { StringBean sb = new StringBean (); sb.setLinks (false); sb.setReplaceNonBreakingSpaces (true); sb.setCollapse (true); sb.setURL ("http://pageIwantToParse.com"); System.out.println (sb.getStrings ()); } }</pre> <p><b>thank you</b></p> </div> <!-- End browse --> <!-- Begin bottom --> <!-- $Id: StringExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <!-- Add your Disclaimer here --> <!-- Begin debug --> <!-- $Id: StringExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <table width="%100" border="0" cellpadding="0" cellspacing="0"> <tr><td> </td><td> <span class="debug">Page Execution took 0.286 seconds</span> </td></tr></table> <!-- This keeps the valid XHTML! icons from "hanging off the bottom of the scree" --> <br style="clear: both;" /> <!-- End debug --> <!-- End bottom --> </body> <!-- End body --> <!-- phpwiki source: $Id: StringExtraction,v 1.1 2004/05/30 01:43:56 
derrickoswald Exp $ From Pear CVS: Id: DB.php,v 1.13 2002/07/02 15:19:49 cox Exp $Id: StringExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: PEAR.php,v 1.29 2001/12/15 15:01:35 mj Exp $Id: StringExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: mysql.php,v 1.5 2002/06/19 00:41:06 cox Exp $Id: StringExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: common.php,v 1.8 2002/06/12 15:03:16 fab Exp $Id: StringExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> </html> --- NEW FILE: LinkExtraction --- <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <!-- $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <meta name="robots" content="index,follow" /> <meta name="keywords" content="Link Extraction, PhpWiki" /> <meta name="description" content="Is there a preferred method? There seem to be too many ways." 
/> <meta name="language" content="" /> <meta name="document-type" content="Public" /> <meta name="document-rating" content="General" /> <meta name="generator" content="phpWiki" /> <meta name="PHPWIKI_VERSION" content="1.3.4" /> <link rel="shortcut icon" href="/wiki/themes/default/images/favicon.ico" /> <link rel="home" title="HomePage" href="HomePage" /> <link rel="help" title="HowToUseWiki" href="HowToUseWiki" /> <link rel="copyright" title="GNU General Public License" href="http://www.gnu.org/copyleft/gpl.html#SEC1" /> <link rel="author" title="The PhpWiki Programming Team" href="http://phpwiki.sourceforge.net/phpwiki/ThePhpWikiProgrammingTeam" /> <link rel="search" title="FindPage" href="FindPage" /> <link rel="alternate" title="View Source: LinkExtraction" href="LinkExtraction?action=viewsource&version=10" /> <link rel="alternate" type="application/rss+xml" title="RSS" href="RecentChanges?format=rss" /> <link rel="bookmark" title="SandBox" href="SandBox" /> <link rel="bookmark" title="WikiWikiWeb" href="WikiWikiWeb" /> <link rel="stylesheet" title="MacOSX" type="text/css" charset="iso-8859-1" href="/wiki/themes/MacOSX/MacOSX.css" /><link rel="alternate stylesheet" title="Printer" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-printer.css" media="print, screen" /><link rel="alternate stylesheet" title="Modern" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-modern.css" /><style type="text/css"> <!-- body {background-image: url(/wiki/themes/MacOSX/images/bgpaper8.png);} --> </style> <title>PhpWiki - Link Extraction</title> </head> <!-- End head --> <!-- Begin body --> <!-- $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <body> <!-- Begin top --> <!-- $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <!-- End top --> <!-- Begin browse --> <!-- $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <div class="wikitext"><p><b>Link Extraction</b></p> <p>There 
are many ways of extracting links.</p> <ul> <li><a href="../index.php/VisitorLinks" class="named-wiki" title="VisitorLinks">Use an ObjectFindingVisitor</a></li> <li><a href="../index.php/CustomVisitorLinks" class="named-wiki" title="CustomVisitorLinks">Use a custom Visitor</a></li> <li><a href="../index.php/LinkBeanLinks" class="named-wiki" title="LinkBeanLinks">Use a LinkBean</a></li> <li><a href="../index.php/CustomTagLinks" class="named-wiki" title="CustomTagLinks">Use a custom Tag</a></li> <li><a href="../index.php/FilterLinks" class="named-wiki" title="FilterLinks">Use a NodeFilter</a></li> <li><a href="../index.php/LexerLinks" class="named-wiki" title="LexerLinks">Use a low-level Lexer</a></li> </ul> <p>Is there a preferred method? There seem to be too many ways.</p> </div> <!-- End browse --> <!-- Begin bottom --> <!-- $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <!-- Add your Disclaimer here --> <!-- Begin debug --> <!-- $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <table width="%100" border="0" cellpadding="0" cellspacing="0"> <tr><td> </td><td> <span class="debug">Page Execution took 0.426 seconds</span> </td></tr></table> <!-- This keeps the valid XHTML! 
icons from "hanging off the bottom of the scree" --> <br style="clear: both;" /> <!-- End debug --> <!-- End bottom --> </body> <!-- End body --> <!-- phpwiki source: $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: DB.php,v 1.13 2002/07/02 15:19:49 cox Exp $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: PEAR.php,v 1.29 2001/12/15 15:01:35 mj Exp $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: mysql.php,v 1.5 2002/06/19 00:41:06 cox Exp $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: common.php,v 1.8 2002/06/12 15:03:16 fab Exp $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp 
$ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: LinkExtraction,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> </html> --- NEW FILE: ExternalIterators --- <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <!-- $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <meta name="robots" content="index,follow" /> <meta name="keywords" content="External Iterators, PhpWiki" /> <meta name="description" content="You should think of this only when you want to conduct a really quick search, and the moment you've found what you've wanted, you want to stop parsing. The iterator here drives the parsing." 
/> <meta name="language" content="" /> <meta name="document-type" content="Public" /> <meta name="document-rating" content="General" /> <meta name="generator" content="phpWiki" /> <meta name="PHPWIKI_VERSION" content="1.3.4" /> <link rel="shortcut icon" href="/wiki/themes/default/images/favicon.ico" /> <link rel="home" title="HomePage" href="HomePage" /> <link rel="help" title="HowToUseWiki" href="HowToUseWiki" /> <link rel="copyright" title="GNU General Public License" href="http://www.gnu.org/copyleft/gpl.html#SEC1" /> <link rel="author" title="The PhpWiki Programming Team" href="http://phpwiki.sourceforge.net/phpwiki/ThePhpWikiProgrammingTeam" /> <link rel="search" title="FindPage" href="FindPage" /> <link rel="alternate" title="View Source: ExternalIterators" href="ExternalIterators?action=viewsource&version=2" /> <link rel="alternate" type="application/rss+xml" title="RSS" href="RecentChanges?format=rss" /> <link rel="bookmark" title="SandBox" href="SandBox" /> <link rel="bookmark" title="WikiWikiWeb" href="WikiWikiWeb" /> <link rel="stylesheet" title="MacOSX" type="text/css" charset="iso-8859-1" href="/wiki/themes/MacOSX/MacOSX.css" /><link rel="alternate stylesheet" title="Printer" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-printer.css" media="print, screen" /><link rel="alternate stylesheet" title="Modern" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-modern.css" /><style type="text/css"> <!-- body {background-image: url(/wiki/themes/MacOSX/images/bgpaper8.png);} --> </style> <title>PhpWiki - External Iterators</title> </head> <!-- End head --> <!-- Begin body --> <!-- $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <body> <!-- Begin top --> <!-- $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <!-- End top --> <!-- Begin browse --> <!-- $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <div class="wikitext"><p><b>External 
Iterators</b></p> <p>You can use external iterators to drive the entire parsing process like so:</p> <pre> for (NodeIterator i = parser.elements();i.hasMoreNodes();) { Node node = i.nextNode(); if (node instanceof LinkTag) { /* handle the link */ } if (node instanceof ImageTag) { /* handle the image */ } }</pre> <p>Consider this only when you want to conduct a really quick search and stop parsing the moment you have found what you wanted. The iterator here drives the parsing.</p> <p>--<a href="../index.php/SomikRaha" class="wiki">SomikRaha</a></p> </div> <!-- End browse --> <!-- Begin bottom --> <!-- $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <!-- Add your Disclaimer here --> <!-- Begin debug --> <!-- $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <table width="%100" border="0" cellpadding="0" cellspacing="0"> <tr><td> </td><td> <span class="debug">Page Execution took 0.225 seconds</span> </td></tr></table> <!-- This keeps the valid XHTML! icons from "hanging off the bottom of the scree" --> <br style="clear: both;" /> <!-- End debug --> <!-- End bottom --> </body> <!-- End body --> <!-- phpwiki source: $Id: 
ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: DB.php,v 1.13 2002/07/02 15:19:49 cox Exp $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: PEAR.php,v 1.29 2001/12/15 15:01:35 mj Exp $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: mysql.php,v 1.5 2002/06/19 00:41:06 cox Exp $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: common.php,v 1.8 2002/06/12 15:03:16 fab Exp $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: ExternalIterators,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> </html> --- NEW FILE: PostOperation --- <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <!-- $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <meta name="robots" content="index,follow" /> <meta name="keywords" content="Post Operation, PhpWiki" /> <meta name="description" content="The standard HTTP request 
submitted by the parser is a GET. This note describes how to use POST, which is the usual request submitted by a form." /> <meta name="language" content="" /> <meta name="document-type" content="Public" /> <meta name="document-rating" content="General" /> <meta name="generator" content="phpWiki" /> <meta name="PHPWIKI_VERSION" content="1.3.4" /> <link rel="shortcut icon" href="/wiki/themes/default/images/favicon.ico" /> <link rel="home" title="HomePage" href="HomePage" /> <link rel="help" title="HowToUseWiki" href="HowToUseWiki" /> <link rel="copyright" title="GNU General Public License" href="http://www.gnu.org/copyleft/gpl.html#SEC1" /> <link rel="author" title="The PhpWiki Programming Team" href="http://phpwiki.sourceforge.net/phpwiki/ThePhpWikiProgrammingTeam" /> <link rel="search" title="FindPage" href="FindPage" /> <link rel="alternate" title="View Source: PostOperation" href="PostOperation?action=viewsource&version=13" /> <link rel="alternate" type="application/rss+xml" title="RSS" href="RecentChanges?format=rss" /> <link rel="bookmark" title="SandBox" href="SandBox" /> <link rel="bookmark" title="WikiWikiWeb" href="WikiWikiWeb" /> <link rel="stylesheet" title="MacOSX" type="text/css" charset="iso-8859-1" href="/wiki/themes/MacOSX/MacOSX.css" /><link rel="alternate stylesheet" title="Printer" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-printer.css" media="print, screen" /><link rel="alternate stylesheet" title="Modern" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-modern.css" /><style type="text/css"> <!-- body {background-image: url(/wiki/themes/MacOSX/images/bgpaper8.png);} --> </style> <title>PhpWiki - Post Operation</title> </head> <!-- End head --> <!-- Begin body --> <!-- $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <body> <!-- Begin top --> <!-- $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <!-- End top --> <!-- Begin browse --> <!-- $Id: 
PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <div class="wikitext"><h4>POST Operation</h4> <p>The standard HTTP request submitted by the parser is a GET. This note describes how to use POST, which is the usual request submitted by a form.</p> <p>As an example, we'll submit a form to the U.S. postal service web site.<br /> <i>Note: This is suboptimal; the postal service provides tools for this type of thing: <a href="http://www.uspswebtools.com" class="namedurl"><span style="white-space: nowrap"><img src="../themes/MacOSX/images/http.png" alt="http" class="linkicon" border="0" />http://www.uspswebtools.com</span></a></i><br /></p> <p>On the USPS web site, the page <a href="http://www.usps.com/zip4/citytown.htm" class="namedurl"><span style="white-space: nowrap"><img src="../themes/MacOSX/images/http.png" alt="http" class="linkicon" border="0" />http://www.usps.com/zip4/citytown.htm</span></a> has the following FORM, which asks for a zip code and returns the cities or towns covered by that zip code (only the form elements are shown; all the formatting markup has been removed):</p> <pre> <form NAME="frmzip" ACTION="zip_response.jsp" METHOD="post" OnSubmit="return validate(frmzip)"> <input type="text" id="zipcode" name="zipcode" size="5" maxlength="5" TABINDEX="10"> <input TYPE="image" NAME="Submit" SRC="/zip4/images/submit.jpg" BORDER="0" WIDTH="50" HEIGHT="17" ALT="Submit" TABINDEX="11"></pre> <p>From this we determine that the <tt>METHOD</tt> is <tt>POST</tt> and the form should be submitted to <tt>zip_response.jsp</tt>. This URL is relative to the page it is found on, so the form should be submitted to <tt>http://www.usps.com/zip4/zip_response.jsp</tt> when the <tt>Submit</tt> input is clicked. The only <tt>input</tt> element other than the <tt>Submit</tt> is a single <tt>text</tt> field that takes 5 or fewer characters. 
Other types of input elements are described in <a href="http://www.w3.org/TR/html4/interact/forms.html" class="namedurl"><span style="white-space: nowrap"><img src="../themes/MacOSX/images/http.png" alt="http" class="linkicon" border="0" />http://www.w3.org/TR/html4/interact/forms.html</span></a>.</p> <p>The basic operation is to pass a fully prepared <tt>HttpURLConnection</tt> connected to the <tt>POST</tt> target URL into the <tt>Parser</tt>, either in the constructor or via the <tt>setConnection()</tt> method. To condition the connection, call <tt>setRequestMethod()</tt> to select the <tt>POST</tt> operation, then <tt>setRequestProperty()</tt> and the other explicit setter methods as needed. Then write the input fields as an ampersand concatenation (<tt>"input1=value1&input2=value2&..."</tt>) into a <tt>PrintWriter</tt> wrapped around the stream returned by a call to <tt>getOutputStream()</tt>.</p> <p>The following sample program illustrates the principles using a <tt>StringBean</tt>, but the same code could be used with a <tt>Parser</tt> by replacing the last three lines in the <tt>try</tt> block with:</p> <pre> parser = new Parser (); parser.setConnection (connection); // ... do parser operations</pre> <p><a href="http://htmlparser.sourceforge.net/images/Zip.java" class="namedurl"><span style="white-space: nowrap"><img src="../themes/MacOSX/images/http.png" alt="http" class="linkicon" border="0" />Source</span> Code.</a> <a href="http://htmlparser.sourceforge.net/images/Zip.html" class="namedurl"><span style="white-space: nowrap"><img src="../themes/MacOSX/images/http.png" alt="http" class="linkicon" border="0" />Pretty</span> Print Source Code</a></p> <pre> /* * Zip.java * POST zip code to look up cities. * * Created on April 20, 2003, 11:09 PM */ import java.io.PrintWriter; import java.net.HttpURLConnection; import java.net.URL; import java.net.URLConnection; import org.htmlparser.beans.StringBean; /** * POST zip code to look up cities. 
* @author Derrick Oswald */ public class Zip { String mText; // text extracted from the response to the POST request /** * Creates a new instance of Zip */ public Zip (String code) { URL url; HttpURLConnection connection; StringBuffer buffer; PrintWriter out; StringBean bean; try { // from the 'action' (relative to the referring page) url = new URL ("http://www.usps.com/zip4/zip_response.jsp"); connection = (HttpURLConnection)url.openConnection (); connection.setRequestMethod ("POST"); connection.setDoOutput (true); connection.setDoInput (true); connection.setUseCaches (false); // more or less of these may be required // see Request Header Definitions: http://www.ietf.org/rfc/rfc2616.txt connection.setRequestProperty ("Accept-Charset", "*"); connection.setRequestProperty ("Referer", "http://www.usps.com/zip4/citytown.htm"); connection.setRequestProperty ("User-Agent", "Zip.java/1.0"); buffer = new StringBuffer (1024); // 'input' fields separated by ampersands (&) buffer.append ("zipcode="); buffer.append (code); // buffer.append ("&"); // etc. out = new PrintWriter (connection.getOutputStream ()); out.print (buffer); out.close (); bean = new StringBean (); bean.setConnection (connection); mText = bean.getStrings (); } catch (Exception e) { mText = e.getMessage (); } } public String getText () { return (mText); } /** * Program mainline. * @param args The zip code to look up. 
*/ public static void main (String[] args) { if (0 >= args.length) System.out.println ("Usage: java Zip <zipcode>"); else System.out.println (new Zip (args[0]).getText ()); } }</pre> </div> <!-- End browse --> <!-- Begin bottom --> <!-- $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <!-- Add your Disclaimer here --> <!-- Begin debug --> <!-- $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <table width="%100" border="0" cellpadding="0" cellspacing="0"> <tr><td> </td><td> <span class="debug">Page Execution took 0.342 seconds</span> </td></tr></table> <!-- This keeps the valid XHTML! icons from "hanging off the bottom of the scree" --> <br style="clear: both;" /> <!-- End debug --> <!-- End bottom --> </body> <!-- End body --> <!-- phpwiki source: $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: DB.php,v 1.13 2002/07/02 15:19:49 cox Exp $Id: PostOperation,v 1.1 2004/05/30 
01:43:56 derrickoswald Exp $ From Pear CVS: Id: PEAR.php,v 1.29 2001/12/15 15:01:35 mj Exp $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: mysql.php,v 1.5 2002/06/19 00:41:06 cox Exp $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: common.php,v 1.8 2002/06/12 15:03:16 fab Exp $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: PostOperation,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> </html> --- NEW FILE: SearchingForData --- <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <!-- $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <meta name="robots" content="index,follow" /> <meta name="keywords" content="Searching For Data, PhpWiki" /> <meta name="description" content="Searching for data is one of the most challenging tasks in a web page due to its seemingly unstructured (or badly structured) form. Complex searches are now possible with the parser in a simple to use API. 
Here's an example :" /> <meta name="language" content="" /> <meta name="document-type" content="Public" /> <meta name="document-rating" content="General" /> <meta name="generator" content="phpWiki" /> <meta name="PHPWIKI_VERSION" content="1.3.4" /> <link rel="shortcut icon" href="/wiki/themes/default/images/favicon.ico" /> <link rel="home" title="HomePage" href="HomePage" /> <link rel="help" title="HowToUseWiki" href="HowToUseWiki" /> <link rel="copyright" title="GNU General Public License" href="http://www.gnu.org/copyleft/gpl.html#SEC1" /> <link rel="author" title="The PhpWiki Programming Team" href="http://phpwiki.sourceforge.net/phpwiki/ThePhpWikiProgrammingTeam" /> <link rel="search" title="FindPage" href="FindPage" /> <link rel="alternate" title="View Source: SearchingForData" href="SearchingForData?action=viewsource&version=4" /> <link rel="alternate" type="application/rss+xml" title="RSS" href="RecentChanges?format=rss" /> <link rel="bookmark" title="SandBox" href="SandBox" /> <link rel="bookmark" title="WikiWikiWeb" href="WikiWikiWeb" /> <link rel="stylesheet" title="MacOSX" type="text/css" charset="iso-8859-1" href="/wiki/themes/MacOSX/MacOSX.css" /><link rel="alternate stylesheet" title="Printer" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-printer.css" media="print, screen" /><link rel="alternate stylesheet" title="Modern" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-modern.css" /><style type="text/css"> <!-- body {background-image: url(/wiki/themes/MacOSX/images/bgpaper8.png);} --> </style> <title>PhpWiki - Searching For Data</title> </head> <!-- End head --> <!-- Begin body --> <!-- $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <body> <!-- Begin top --> <!-- $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <!-- End top --> <!-- Begin browse --> <!-- $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <div 
class="wikitext"><p>Searching for data in a web page is one of the most challenging tasks, because pages are seemingly unstructured (or badly structured). The parser makes complex searches possible through a simple-to-use API. Here's an example:</p> <p>We are looking at a page which has the following html:</p> <pre> <html> ... <body> <table> <tr> <td><font size="-1">Name:<b><i>John Doe</i></b></font></td> .. </tr> <tr> .. </tr> </table> </body> </html></pre> <p>We'd like to extract the information corresponding to the field "Name". This is possible if we make use of the fact that the name appears two tags after "Name", that is, three node positions later.</p> <p>Code to achieve this would look like:</p> <pre> Node nodes [] = parser.extractAllNodesThatAre(TableTag.class); // Get the first table found TableTag table = (TableTag)nodes[0]; // Find the position of Name. StringNode [] stringNodes = table.digupStringNode("Name"); StringNode name = stringNodes[0]; // We assume that the first node that matched is the one we want. We // navigate to its parent, the column tag <td> CompositeTag td = name.getParent(); // From the parent, we shall find out the position of "Name" int posOfName = td.findPositionOf(name); // It's easy now to navigate to John Doe, as we know it is 3 positions away Node expectedName = td.childAt(posOfName + 3); </pre> <hr /><p>You can also move up the parent tree, e.g. when the data is in separate columns:</p> <pre> <html> ... <body> <table> <tr> <td><font size="-1">Name:</font></td> <td><font size="-1">John Doe</font></td> </tr> <tr> .. </tr> </table> </body> </html></pre> <p>We'd like to perform the same search on "Name".</p> <p>Code to achieve this would look like:</p> <pre> Node nodes [] = parser.extractAllNodesThatAre(TableTag.class); // Get the first table found TableTag table = (TableTag)nodes[0]; // Find the position of Name. StringNode [] stringNodes = table.digupStringNode("Name"); // We assume that the first node that matched is the one we want. 
We // navigate to its parent (column <td>) CompositeTag td = stringNodes[0].getParent(); // Navigate to its parent (row <tr>) CompositeTag tr = td.getParent(); // From the parent, we shall find out the position of the column int columnNo = tr.findPositionOf(td); // It's easy now to navigate to John Doe, as we know it is in the next column TableColumn nextColumn = (TableColumn)tr.childAt(columnNo+1); // The name is the second item in the column tag Node expectedName = nextColumn.childAt(1);</pre> </div> <!-- End browse --> <!-- Begin bottom --> <!-- $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <!-- Add your Disclaimer here --> <!-- Begin debug --> <!-- $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <table width="%100" border="0" cellpadding="0" cellspacing="0"> <tr><td> </td><td> <span class="debug">Page Execution took 0.257 seconds</span> </td></tr></table> <!-- This keeps the valid XHTML! icons from "hanging off the bottom of the scree" --> <br style="clear: both;" /> <!-- End debug --> <!-- End bottom --> </body> <!-- End body --> <!-- phpwiki source: $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 
derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: DB.php,v 1.13 2002/07/02 15:19:49 cox Exp $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: PEAR.php,v 1.29 2001/12/15 15:01:35 mj Exp $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: mysql.php,v 1.5 2002/06/19 00:41:06 cox Exp $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ From Pear CVS: Id: common.php,v 1.8 2002/06/12 15:03:16 fab Exp $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ $Id: SearchingForData,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> </html> --- NEW FILE: RSSFeeds --- <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <!-- $Id: RSSFeeds,v 1.1 2004/05/30 01:43:56 derrickoswald Exp $ --> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <meta name="robots" content="index,follow" /> <meta name="keywords" content="RSSFeeds, PhpWiki" /> <meta name="description" content="Project name: HTML Parser Project description: HTML Parser is a library, written in Java, which allows you 
to parse HTML (HTML 4.0 supported). It has been used by people on live projects. Developers appreciate how easy it is to use. The architecture is flexible, allowing you to extend it easily. Developers on project: 16 Project administrators: &#60;a href=&#34;http://sourceforge.net/users/derrickoswald/&#34;&#62;derrickoswald&#60;/a&#62;, &#60;a href=&#34;http://sourceforge.net/users/somik/&#34;&#62;somik&#60;/a&#62; Activity percentile (last week): 98.3413% Most recent daily statistics (24 Jan 2004): Ranking: 251, Activity percentile: 98.34%, Downloadable files: 25615 total downloads to date Mos... [truncated message content] |
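A note on the PostOperation page added in this commit: its Zip.java sample appends the zip code to the request body unescaped. That is fine for five digits, but a value containing spaces or ampersands would corrupt the "input1=value1&input2=value2" body. The sketch below is not part of the committed sources; it shows one possible way to build that body with every name and value URL-encoded. The class name FormBody, the second field "note", and the choice of UTF-8 are assumptions made purely for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds an application/x-www-form-urlencoded body. */
public class FormBody {

    /** Joins the fields as name=value pairs separated by ampersands, URL-encoding each part. */
    public static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&'); // separator between 'input' fields
                }
                // UTF-8 is an assumption; a real form may expect another charset.
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"));
                body.append('=');
                body.append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("zipcode", "90210");
        fields.put("note", "a b&c"); // hypothetical extra field with characters that need escaping
        System.out.println(encode(fields)); // prints zipcode=90210&note=a+b%26c
    }
}
```

In the Zip.java sample, the returned string would take the place of the hand-built StringBuffer and be written to the PrintWriter in the same way.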
Update of /cvsroot/htmlparser/htmlparser/docs/docs In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21187/docs/docs Removed Files: .html Benchmarks.html BlockFeedback.html CollectingParameter.html CompositePattern.html CustomTagExtraction.html CustomTagLinks.html CustomVisitorLinks.html EmailExtraction.html EnableFeedback.html ExternalIterators.html FactoryMethod.html FeedbackMechanism.html FilterLinks.html FirstName.html FrequentlyAskedQuestions.html FullName.html ImageExtraction.html InternalIterators.html IteratorPattern.html JavaBeans.html LastName.html LexerLinks.html LinkBeanLinks.html LinkExtraction.html ParserDesign.html ParsingXml.html PatternStories.html PostOperation.html RSSFeeds.html ReverseHtml.html ReviewerInformation.html SamplePrograms.html SearchingForData.html SomikRaha.html StrategyPattern.html StringExtraction.html TagFindingVisitor.html TagScanner.html TemplateMethod.html TestDrivenDevelopment.html TextExtractingVisitor.html UnitTestingPdf.html UnitTestingXsl.html UsingCookiesWithParser.html VisitorLinks.html VisitorPattern.html WebCrawler.html WebRipper.html WritingYourOwnScanners.html index.html Log Message: Use WikiCapturer to pull Wiki pages locally. 
--- TestDrivenDevelopment.html DELETED --- --- PatternStories.html DELETED --- --- StrategyPattern.html DELETED --- --- TextExtractingVisitor.html DELETED --- --- PostOperation.html DELETED --- --- IteratorPattern.html DELETED --- --- SomikRaha.html DELETED --- --- UnitTestingXsl.html DELETED --- --- VisitorPattern.html DELETED --- --- ParserDesign.html DELETED --- --- UnitTestingPdf.html DELETED --- --- EmailExtraction.html DELETED --- --- FirstName.html DELETED --- --- VisitorLinks.html DELETED --- --- LexerLinks.html DELETED --- --- TemplateMethod.html DELETED --- --- CustomVisitorLinks.html DELETED --- --- ReverseHtml.html DELETED --- --- JavaBeans.html DELETED --- --- FrequentlyAskedQuestions.html DELETED --- --- FactoryMethod.html DELETED --- --- RSSFeeds.html DELETED --- --- SamplePrograms.html DELETED --- --- FullName.html DELETED --- --- LastName.html DELETED --- --- LinkBeanLinks.html DELETED --- --- ParsingXml.html DELETED --- --- WebCrawler.html DELETED --- --- BlockFeedback.html DELETED --- --- FilterLinks.html DELETED --- --- CustomTagExtraction.html DELETED --- --- FeedbackMechanism.html DELETED --- --- StringExtraction.html DELETED --- --- WritingYourOwnScanners.html DELETED --- --- .html DELETED --- --- EnableFeedback.html DELETED --- --- SearchingForData.html DELETED --- --- ExternalIterators.html DELETED --- --- UsingCookiesWithParser.html DELETED --- --- InternalIterators.html DELETED --- --- CustomTagLinks.html DELETED --- --- index.html DELETED --- --- WebRipper.html DELETED --- --- LinkExtraction.html DELETED --- --- TagScanner.html DELETED --- --- ImageExtraction.html DELETED --- --- CompositePattern.html DELETED --- --- TagFindingVisitor.html DELETED --- --- CollectingParameter.html DELETED --- --- ReviewerInformation.html DELETED --- --- Benchmarks.html DELETED --- |
From: Derrick O. <der...@us...> - 2004-05-30 01:44:10
|
Update of /cvsroot/htmlparser/htmlparser/docs/docs/images In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21187/docs/docs/images Removed Files: AddingBean.jpg BeanyBaby.jpg BeanyBabyOptions.jpg ChooseBean.jpg ChoosePalette.jpg InstallBean.jpg Mount.jpg SettingProperties.jpg Zip.html Zip.java Log Message: Use WikiCapturer to pull Wiki pages locally. --- BeanyBabyOptions.jpg DELETED --- --- Zip.java DELETED --- --- Zip.html DELETED --- --- ChooseBean.jpg DELETED --- --- Mount.jpg DELETED --- --- AddingBean.jpg DELETED --- --- BeanyBaby.jpg DELETED --- --- InstallBean.jpg DELETED --- --- ChoosePalette.jpg DELETED --- --- SettingProperties.jpg DELETED --- |
From: Derrick O. <der...@us...> - 2004-05-30 01:44:10
|
Update of /cvsroot/htmlparser/htmlparser/docs/wiki/themes/default/buttons In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21187/docs/wiki/themes/default/buttons Added Files: vcss.gif vxhtml10.gif Log Message: Use WikiCapturer to pull Wiki pages locally. --- NEW FILE: vxhtml10.gif --- (This appears to be a binary file; contents omitted.) --- NEW FILE: vcss.gif --- (This appears to be a binary file; contents omitted.) |
From: Derrick O. <der...@us...> - 2004-05-30 01:44:08
|
Update of /cvsroot/htmlparser/htmlparser/docs/wiki/themes/MacOSX/buttons/en In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21187/docs/wiki/themes/MacOSX/buttons/en Added Files: BackLinks.png DebugInfo.png Diff.png Edit.png FindPage.png LikePages.png PageHistory.png PageInfo.png RecentChanges.png Log Message: Use WikiCapturer to pull Wiki pages locally. --- NEW FILE: LikePages.png --- (This appears to be a binary file; contents omitted.) --- NEW FILE: Diff.png --- (This appears to be a binary file; contents omitted.) --- NEW FILE: Edit.png --- (This appears to be a binary file; contents omitted.) --- NEW FILE: BackLinks.png --- (This appears to be a binary file; contents omitted.) --- NEW FILE: DebugInfo.png --- (This appears to be a binary file; contents omitted.) --- NEW FILE: RecentChanges.png --- (This appears to be a binary file; contents omitted.) --- NEW FILE: FindPage.png --- (This appears to be a binary file; contents omitted.) --- NEW FILE: PageInfo.png --- (This appears to be a binary file; contents omitted.) --- NEW FILE: PageHistory.png --- (This appears to be a binary file; contents omitted.) |
From: Derrick O. <der...@us...> - 2004-05-30 01:44:07
|
Update of /cvsroot/htmlparser/htmlparser/docs/wiki/themes/MacOSX/images In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21187/docs/wiki/themes/MacOSX/images Added Files: http.png logo.png Log Message: Use WikiCapturer to pull Wiki pages locally. --- NEW FILE: logo.png --- (This appears to be a binary file; contents omitted.) --- NEW FILE: http.png --- (This appears to be a binary file; contents omitted.) |
From: Derrick O. <der...@us...> - 2004-05-30 01:44:06
|
Update of /cvsroot/htmlparser/htmlparser/docs/wiki In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21187/docs/wiki Added Files: index.html Log Message: Use WikiCapturer to pull Wiki pages locally. --- NEW FILE: index.html --- <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <!-- $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ --> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <meta name="robots" content="index,follow" /> <meta name="keywords" content="Home Page, PhpWiki" /> <meta name="language" content="" /> <meta name="document-type" content="Public" /> <meta name="document-rating" content="General" /> <meta name="generator" content="phpWiki" /> <meta name="PHPWIKI_VERSION" content="1.3.4" /> <link rel="shortcut icon" href="/wiki/themes/default/images/favicon.ico" /> <link rel="home" title="HomePage" href="HomePage" /> <link rel="help" title="HowToUseWiki" href="HowToUseWiki" /> <link rel="copyright" title="GNU General Public License" href="http://www.gnu.org/copyleft/gpl.html#SEC1" /> <link rel="author" title="The PhpWiki Programming Team" href="http://phpwiki.sourceforge.net/phpwiki/ThePhpWikiProgrammingTeam" /> <link rel="search" title="FindPage" href="FindPage" /> <link rel="alternate" title="View Source: HomePage" href="HomePage?action=viewsource&version=41" /> <link rel="alternate" type="application/rss+xml" title="RSS" href="RecentChanges?format=rss" /> <link rel="bookmark" title="SandBox" href="SandBox" /> <link rel="bookmark" title="WikiWikiWeb" href="WikiWikiWeb" /> <link rel="stylesheet" title="MacOSX" type="text/css" charset="iso-8859-1" href="/wiki/themes/MacOSX/MacOSX.css" /><link rel="alternate stylesheet" title="Printer" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-printer.css" media="print, screen" /><link 
rel="alternate stylesheet" title="Modern" type="text/css" charset="iso-8859-1" href="/wiki/themes/default/phpwiki-modern.css" /><style type="text/css"> <!-- body {background-image: url(/wiki/themes/MacOSX/images/bgpaper8.png);} --> </style> <title>PhpWiki - Home Page</title> </head> <!-- End head --> <!-- Begin body --> <!-- $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ --> <body> <!-- Begin top --> <!-- $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ --> <!-- End top --> <!-- Begin browse --> <!-- $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ --> <div class="wikitext"><ul> <li>HTMLParser documentation*</li> </ul> <p><a href="index.html" class="namedurl"><span style="white-space: nowrap"><img src="themes/MacOSX/images/http.png" alt="http" class="linkicon" border="0" />~This</span> page has moved to http://htmlparser.sourceforge.net/wiki</a></p> <p>Welcome to the HTMLParser documentation page. You may visit</p> <ul> <li><a href="index.php/SamplePrograms" class="wiki">SamplePrograms</a> - A quick tutorial on getting started with the parser</li> <li><a href="index.php/WritingYourOwnScanners" class="wiki">WritingYourOwnScanners</a> - ignore this, this is old</li> <li><a href="index.php/SearchingForData" class="wiki">SearchingForData</a> - Learn how to perform powerful searches in html pages</li> <li><a href="index.php/FeedbackMechanism" class="wiki">FeedbackMechanism</a> - Learn how to suppress the default feedback or complement it.</li> <li><a href="index.php/UsingCookiesWithParser" class="wiki">UsingCookiesWithParser</a></li> <li><a href="index.php/PostOperation" class="named-wiki" title="PostOperation">Using POST Requests</a></li> <li><a href="index.php/ParserDesign" class="wiki">ParserDesign</a></li> <li><a href="index.php/FrequentlyAskedQuestions" class="wiki">FrequentlyAskedQuestions</a></li> <li><a href="index.php/TestDrivenDevelopment" class="wiki">TestDrivenDevelopment</a></li> <li><a href="index.php/Benchmarks" 
class="named-wiki" title="Benchmarks">Benchmarks vs. JTidy</a></li> <li><a href="http://htmlparser.sourceforge.net/javadoc/" class="namedurl"><span style="white-space: nowrap"><img src="themes/MacOSX/images/http.png" alt="http" class="linkicon" border="0" />~Javadocs</span></a></li> <li><a href="http://htmlparser.sourceforge.net/javadoc_1_3/" class="namedurl"><span style="white-space: nowrap"><img src="themes/MacOSX/images/http.png" alt="http" class="linkicon" border="0" />~Javadocs</span> for Version 1.3</a></li> <li><a href="http://htmlparser.sourceforge.net/javadoc_1_2/" class="namedurl"><span style="white-space: nowrap"><img src="themes/MacOSX/images/http.png" alt="http" class="linkicon" border="0" />~Javadocs</span> for Version 1.2</a></li> </ul> </div> <!-- End browse --> <!-- Begin bottom --> <!-- $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ --> <!-- Add your Disclaimer here --> <!-- Begin debug --> <!-- $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ --> <table width="%100" border="0" cellpadding="0" cellspacing="0"> <tr><td> </td><td> <span class="debug">Page Execution took 0.332 seconds</span> </td></tr></table> <!-- This keeps the valid XHTML! 
icons from "hanging off the bottom of the scree" --> <br style="clear: both;" /> <!-- End debug --> <!-- End bottom --> </body> <!-- End body --> <!-- phpwiki source: $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ From Pear CVS: Id: DB.php,v 1.13 2002/07/02 15:19:49 cox Exp $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ From Pear CVS: Id: PEAR.php,v 1.29 2001/12/15 15:01:35 mj Exp $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ From Pear CVS: Id: mysql.php,v 1.5 2002/06/19 00:41:06 cox Exp $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ From Pear CVS: Id: common.php,v 1.8 2002/06/12 15:03:16 fab Exp $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 
01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ $Id: index.html,v 1.1 2004/05/30 01:43:55 derrickoswald Exp $ --> </html> |
From: Derrick O. <der...@us...> - 2004-05-30 01:44:06
|
Update of /cvsroot/htmlparser/htmlparser/docs/wiki/themes/MacOSX/buttons In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21187/docs/wiki/themes/MacOSX/buttons Added Files: uww.png Log Message: Use WikiCapturer to pull Wiki pages locally. --- NEW FILE: uww.png --- (This appears to be a binary file; contents omitted.) |
From: Derrick O. <der...@us...> - 2004-05-30 01:29:25
|
Update of /cvsroot/htmlparser/htmlparser/docs/wiki/index.php In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19468/index.php Log Message: Directory /cvsroot/htmlparser/htmlparser/docs/wiki/index.php added to the repository |
From: Derrick O. <der...@us...> - 2004-05-30 01:29:06
|
Update of /cvsroot/htmlparser/htmlparser/docs/wiki/themes/default/buttons In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19394/buttons Log Message: Directory /cvsroot/htmlparser/htmlparser/docs/wiki/themes/default/buttons added to the repository |
From: Derrick O. <der...@us...> - 2004-05-30 01:28:57
|
Update of /cvsroot/htmlparser/htmlparser/docs/wiki/themes/default In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19383/default Log Message: Directory /cvsroot/htmlparser/htmlparser/docs/wiki/themes/default added to the repository |
From: Derrick O. <der...@us...> - 2004-05-30 01:28:45
|
Update of /cvsroot/htmlparser/htmlparser/docs/wiki/themes/MacOSX/images In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19359/images Log Message: Directory /cvsroot/htmlparser/htmlparser/docs/wiki/themes/MacOSX/images added to the repository |
From: Derrick O. <der...@us...> - 2004-05-30 01:28:30
|
Update of /cvsroot/htmlparser/htmlparser/docs/wiki/themes/MacOSX/buttons/en In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19323/en Log Message: Directory /cvsroot/htmlparser/htmlparser/docs/wiki/themes/MacOSX/buttons/en added to the repository |
From: Derrick O. <der...@us...> - 2004-05-30 01:28:21
|
Update of /cvsroot/htmlparser/htmlparser/docs/wiki/themes/MacOSX/buttons In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19291/buttons Log Message: Directory /cvsroot/htmlparser/htmlparser/docs/wiki/themes/MacOSX/buttons added to the repository |
From: Derrick O. <der...@us...> - 2004-05-30 01:28:14
|
Update of /cvsroot/htmlparser/htmlparser/docs/wiki/themes/MacOSX In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19269/MacOSX Log Message: Directory /cvsroot/htmlparser/htmlparser/docs/wiki/themes/MacOSX added to the repository |
From: Derrick O. <der...@us...> - 2004-05-30 01:28:08
|
Update of /cvsroot/htmlparser/htmlparser/docs/wiki/themes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19210/themes Log Message: Directory /cvsroot/htmlparser/htmlparser/docs/wiki/themes added to the repository |