[Htmlparser-cvs] htmlparser/docs joinus.html,NONE,1.1 samples.html,NONE,1.1 bug.html,1.1,1.2 contrib
Update of /cvsroot/htmlparser/htmlparser/docs
In directory sc8-pr-cvs1:/tmp/cvs-serv11427/htmlparser/docs

Modified Files:
	bug.html contributors.html index.html mailinglists.html main.html panel.html support.html
Added Files:
	joinus.html samples.html
Log Message:
Web site revamp, phase 1.
Main and first level pages are refurbished.
The wiki is still to do.
Fixed bug #865279 Documentation
The samples directory is now orphaned and no longer shipped.

--- NEW FILE: joinus.html ---
<!doctype html public "-//w3c//dtd html 4.0 transitional//en"> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> <meta name="Author" content="Somik Raha"> <meta name="GENERATOR" content="Mozilla/4.61 [en] (WinNT; I) [Netscape]"> <title>Join the HTML Parser Project</title> <link REL ="stylesheet" TYPE="text/css" HREF="javadoc/stylesheet.css" TITLE="Style"> </head> <body> <h2>Join The HTML Parser Project</h2> <p>If you wish to join the htmlparser project as a developer, you have to register as a developer at <a href="http://sourceforge.net/account/register.php">SourceForge</a>. <p>Send your sourceforge login name or id and a brief resume (a write-up about yourself) to <p><a href="http://sourceforge.net/sendmessage.php?touser=605407">Derrick Oswald</a><br> <p>You also need to sign up on the <a href="http://lists.sourceforge.net/lists/listinfo/htmlparser-developer">HTMLParser-Developer</a> mailing list - this is the forum we use for collaborating on this project. You may also want to sign up on the <a href="http://lists.sourceforge.net/lists/listinfo/htmlparser-user">HTMLParser-User</a> mailing list to monitor other user activity. 
<p>What would you gain by joining us : <ul> <li>If you are a student, you'ld learn a lot about architecture, test-driven development and refactoring.</li> <li>If you are a professional, you'd have fun interacting with other professionals and making a super-fast parser even more powerful.</li> </ul> <p>We'll be happy to have you with us!</p> </body> </html> --- NEW FILE: samples.html --- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> <title>Sample Programs</title> <link REL ="stylesheet" TYPE="text/css" HREF="javadoc/stylesheet.css" TITLE="Style"> </head> <body> <h2>Sample Programs</h2> <p>The example programs included with the HTML Parser distribution are listed below, with some details.</p> <p><strong>Note:</strong> On unix systems if you used the Java jar command or some older unzip utility to extract the distribution zip file, the executable flag will not have been preserved on the files in the bin directory. You can fix this by issuing the following command: <pre> <code>chmod u+x bin/*</code> </pre> <p> <table width="94%" border="0"> <tr> <td valign="top"> <strong>Parser</strong><br> </td> <td> <i>Parse a web page and print the tags in a simple loop.</i><br> <a href="../javadoc/org/htmlparser/Parser.html#main(java.lang.String[])" target="_parent">org.htmlparser.Parser.main(String[] args)</a> <pre> <code>bin/parser http://website_url [tag_name]</code> where tag_name is an optional tag name to be used as a filter, i.e. 
A - Show only the link tags extracted from the document IMG - Show only the image tags extracted from the document TITLE - Extract the title from the document NOTE: this is also the default program for the htmlparser.jar, so the above could be: <code>java -jar lib/htmlparser.jar http://website_url [tag_name]</code> </pre> </td> </tr> <tr> <td valign="top"> <strong>Link Extractor</strong><br> </td> <td> <i>Extract links/mail addresses from a web page.</i><br> <a href="../javadoc/org/htmlparser/parserapplications/LinkExtractor.html" target="_parent">org.htmlparser.parserapplications.LinkExtractor</a> <pre> <code>bin/linkextractor http://website_url [-maillinks]</code> the optional -maillinks argument causes mailto: links to be printed </pre> </td> </tr> <tr> <td valign="top"> <strong>String Extractor</strong><br> </td> <td> <i>Extract text from a web page.</i><br> <a href="../javadoc/org/htmlparser/parserapplications/LinkExtractor.html" target="_parent">org.htmlparser.parserapplications.StringExtractor</a> <pre> <code>bin/stringextractor http://website_url [-links]</code> the optional -links argument causes hyperlinks to be shown within the text </pre> </td> </tr> <tr> <td valign="top"> <strong>Site Capturer</strong><br> </td> <td> <i>Save a web site locally.</i><br> <a href="../javadoc/org/htmlparser/parserapplications/SiteCapturer.html" target="_parent">org.htmlparser.parserapplications.SiteCapturer</a> <pre> <code>bin/sitecapturer http://source_website /target_directory/ [true|false]</code> the optional boolean argument determines whether resources such as images, audio and video are to be captured </pre> </td> </tr> <tr> <td valign="top"> <strong>Thumbelina</strong><br> </td> <td> <i>View images behind thumbnails.</i><br> <a href="../javadoc/org/htmlparser/lexerapplications/thumbelina/package-summary.html" target="_parent">org.htmlparser.lexerapplications.thumbelina.Thumbelina</a> <pre> <code>bin/thumbelina [http://starting_website]</code> </pre> </td> </tr> <tr> 
<td valign="top"> <strong>BeanyBaby</strong><br> </td> <td> <i>Parser Java Bean demo.</i><br> <a href="../javadoc/org/htmlparser/beans/BeanyBaby.html" target="_parent">org.htmlparser.beans.BeanyBaby</a> <pre> <code>bin/beanybaby [http://starting_website]</code> </pre> </td> </tr> </table> </body> </html> Index: bug.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/bug.html,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** bug.html 15 Dec 2002 03:45:00 -0000 1.1 --- bug.html 4 Jan 2004 03:23:08 -0000 1.2 *************** *** 2,23 **** <html> <head> ! <title>Bug Reports</title> ! <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> </head> - <body> <h2>Bug Reports </h2> ! <p>You can submit bug reports here. Before you proceed, please note that you <strong>should</strong> ! <a href="http://sourceforge.net/account/login.php">sign in at sourceforge</a>, ! for only then will our responses reach you. Putting in an anonymous bug report ! will provide us with no way of reaching you. If you do not have a sourceforge ! login, then please consider mentioning your email address in the bug report.</p> <p>Checklist <strong>BEFORE</strong> you submit your bug report :</p> <ul> <li>Have you <a href="http://sourceforge.net/tracker/?func=browse&group_id=24399&atid=381399">checked ! the list of older bug reports</a> </li> ! <li>Have you written a testcase to simulate your bug ? Why do we request this ! - check <a href="design/tests.html#communicate">Communicate with Testcases</a>. ! We do take reports without testcases, but pls note that such reports may take longer for us to respond to.</li> </ul> --- 2,25 ---- <html> <head> ! <title>Bug Reports</title> ! <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> ! <link REL ="stylesheet" TYPE="text/css" HREF="javadoc/stylesheet.css" TITLE="Style"> </head> <body> <h2>Bug Reports </h2> ! 
<p>You can submit bug reports here. Before you proceed, please note that you ! <strong>must</strong> ! <a href="http://sourceforge.net/account/login.php">login to sourceforge</a>. ! This is required so that bug status reports can be forwarded to you. ! If you do not have a sourceforge login, you can get one ! <a href="http://sourceforge.net/account/register.php">here</a>.</p> <p>Checklist <strong>BEFORE</strong> you submit your bug report :</p> <ul> + <li>Have you pretty much isolated the problem to the HTML Parser component.</li> <li>Have you <a href="http://sourceforge.net/tracker/?func=browse&group_id=24399&atid=381399">checked ! the list of older bug reports</a></li> ! <li>Have you written a testcase to simulate your bug? Why do we request this? ! - check <a href="wiki/TestDrivenDevelopment.html">Test Driven Development</a>. ! We do take reports without testcases, but please note that such reports may take longer for us to respond to.</li> </ul> Index: contributors.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/contributors.html,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** contributors.html 27 Apr 2003 19:08:21 -0000 1.3 --- contributors.html 4 Jan 2004 03:23:08 -0000 1.4 *************** *** 2,13 **** <html> <head> ! <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> ! <meta name="Author" content="Somik Raha"> ! <meta name="GENERATOR" content="Mozilla/4.61 [en] (WinNT; I) [Netscape]"> ! <title>Contributors to HTML Parser</title> </head> ! <body text="#000000" bgcolor="#FFFFFF" link="#3333FF" vlink="#FF6600" alink="#FFCC00"> ! <b><u>Contributors</u></b><b><u></u></b> ! <p>The following people have contributed to this project : <table width="94%" border="1"> <tr> --- 2,13 ---- <html> <head> ! <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> ! <meta name="Author" content="Somik Raha"> ! 
<meta name="GENERATOR" content="Mozilla/4.61 [en] (WinNT; I) [Netscape]"> ! <title>Contributors to HTML Parser</title> ! <link REL ="stylesheet" TYPE="text/css" HREF="javadoc/stylesheet.css" TITLE="Style"> </head> ! <body> ! <h1>Contributors</h1> <table width="94%" border="1"> <tr> *************** *** 22,27 **** K1R 7Y2<br> (613) 755-5065 ! <br> <a href="http://www.autodesk.com">http://www.autodesk.com</a><br> ! <a href="mailto:Der...@Au...">Der...@Au...</a> <br> Voice : 613.755.5065<br> --- 22,28 ---- K1R 7Y2<br> (613) 755-5065 ! <br> <a href="http://www.autodesk.com">http://www.autodesk.com</a><br> ! <!--a href="mailto:Der...@Au...">Der...@Au...</a--> ! <a href="http://sourceforge.net/sendmessage.php?touser=605407">email</a> <br> Voice : 613.755.5065<br> *************** *** 64,69 **** CA USA 94709<br> <a href="http://www.industriallogic.com">http://www.industriallogic.com</a><br> <a href="http://www.industrialxp.org">http://www.industrialxp.org</a><br> ! <a href="http://www.geocities.com/somik/">Personal Home Page</a><br> ! <a href="mailto:so...@in...">so...@in...</a> <br> Voice : 510.540.8336<br> --- 65,71 ---- CA USA 94709<br> <a href="http://www.industriallogic.com">http://www.industriallogic.com</a><br> <a href="http://www.industrialxp.org">http://www.industrialxp.org</a><br> ! <a href="http://www.geocities.com/somik/">Personal Home Page</a><br> ! <!--a href="mailto:so...@in...">so...@in...</a--> ! <a href="http://sourceforge.net/sendmessage.php?touser=187944">email</a> <br> Voice : 510.540.8336<br> *************** *** 102,107 **** CA USA 94709<br> <a href="http://www.industriallogic.com">http://www.industriallogic.com</a><br> ! <a href="http://www.industrialxp.org">http://www.industrialxp.org</a><br> ! <a href="mailto:jo...@in...">so...@in...</a> <br> Voice : 510.540.8336<br> --- 104,110 ---- CA USA 94709<br> <a href="http://www.industriallogic.com">http://www.industriallogic.com</a><br> ! 
<a href="http://www.industrialxp.org">http://www.industrialxp.org</a><br> ! <!--a href="mailto:jo...@in...">jo...@in...</a--> ! <a href="http://sourceforge.net/sendmessage.php?touser=344339">email</a> <br> Voice : 510.540.8336<br> *************** *** 130,135 **** Kalenteritie 23 B 4<br> 02200 Espoo, Finland<br> ! tel: +358-50-3725844<br> <a href="http://www.kk-software.fi">http://www.kk-software.fi</a><br> ! <a href="mailto:kaa...@ik..."> kaa...@ik...</a><br> </td> <td valign="top"> --- 133,139 ---- Kalenteritie 23 B 4<br> 02200 Espoo, Finland<br> ! tel: +358-50-3725844<br> <a href="http://www.kk-software.fi">http://www.kk-software.fi</a><br> ! <!--a href="mailto:kaa...@ik...">kaa...@ik...</a--> ! <a href="http://sourceforge.net/sendmessage.php?touser=287304">email</a> </td> <td valign="top"> *************** *** 163,168 **** <td valign="top"><img src="pics/claude.jpg" width="100" height="114"> <img src="pics/canada.gif" width="64" height="34"> <br> ! Claude Duguay<br> <a href="http://www.arcessa.com/">Arcessa, Inc.</a><br> ! <a href="mailto:CD...@ar...">CD...@ar...</a><br> </td> <td valign="top"> --- 167,178 ---- <td valign="top"><img src="pics/claude.jpg" width="100" height="114"> <img src="pics/canada.gif" width="64" height="34"> <br> ! Claude Duguay<br> ! Arcessa, Inc.<br> ! 10210 NE Points Drive<br> ! Suite 310<br> ! Kirkland, WA 98033<br> ! <a href="http://www.arcessa.com/">http://www.arcessa.com/</a><br> ! <!--a href="mailto:CD...@ar...">CD...@ar...</a--> ! <a href="http://sourceforge.net/sendmessage.php?touser=350041">email</a> </td> <td valign="top"> *************** *** 188,193 **** +91-22-28290019<br> Extn. 1457 <br> ! <a href="http://www.orbitech.co.in">http://www.orbitech.co.in</a> <br> ! <a href="mailto:dha...@or...">dha...@or...</a><br> </p> </td> --- 198,204 ---- +91-22-28290019<br> Extn. 1457 <br> ! <a href="http://www.orbitech.co.in">http://www.orbitech.co.in</a> <br> ! <!--a href="mailto:dha...@or...">dha...@or...</a--> ! 
<a href="http://sourceforge.net/sendmessage.php?touser=539715">email</a> </p> </td> *************** *** 200,204 **** - thus making the parser usable across Windows and Linux - which have different conventions for end-of-line characters.</p> ! <p>Read Dhaval's article on <a href="../articles/quest.html">The Quest for HTMLParser</a>.</p></td> <td valign="top"><p>I've been passionate about computers from very early on. Started working<br> --- 211,215 ---- - thus making the parser usable across Windows and Linux - which have different conventions for end-of-line characters.</p> ! <p>Read Dhaval's article on <a href="articles/quest.html">The Quest for HTMLParser</a>.</p></td> <td valign="top"><p>I've been passionate about computers from very early on. Started working<br> *************** *** 234,239 **** <tr> <td valign="top"><img src="pics/france.gif" width="51" height="35"><br> ! Cédric Rosa<br> <a href="mailto:ced...@fr...%20">ced...@fr... ! </a><br> </td> <td valign="top">Cédric was one of the most prolific testers of the parser, coming --- 245,251 ---- <tr> <td valign="top"><img src="pics/france.gif" width="51" height="35"><br> ! Cédric Rosa<br> ! <!--a href="mailto:ced...@fr...">ced...@fr...</a--> ! <a href="http://sourceforge.net/sendmessage.php?touser=584072">email</a> </td> <td valign="top">Cédric was one of the most prolific testers of the parser, coming *************** *** 341,345 **** </tr> </table> ! <p>Thanks to Stephen Harrington, Domenico Lordi, Kamen, John Zook, Cedric Rosa, Cheng Jun, Mazlan Mat, Rob Shields, Wolfgang Germund, Raj Sharma, Robert Kausch, Gordon Deudney, Serge Kruppa, Roger Kjensrud, Rodney S Foley and Manpreet Singh --- 353,357 ---- </tr> </table> ! 
<p>Thanks to Stephen Harrington, Domenico Lordi, Kamen, John Zook, Cheng Jun, Mazlan Mat, Rob Shields, Wolfgang Germund, Raj Sharma, Robert Kausch, Gordon Deudney, Serge Kruppa, Roger Kjensrud, Rodney S Foley and Manpreet Singh Index: index.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/index.html,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** index.html 15 Dec 2002 03:45:00 -0000 1.2 --- index.html 4 Jan 2004 03:23:08 -0000 1.3 *************** *** 1,10 **** <html> <head> ! <title>HTMLParser Home Page</title> ! <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> ! <META name="description" content="HTML Parser- A java-based open source html parser"> ! <META name="keywords" content="html, parser, html parser, htmlparser, open source parser, java parser, java html parser"> </head> ! <frameset cols="20%,80%" frameborder="NO" border="0" framespacing="0" rows="*"> <frame name="leftFrame" scrolling="NO" src="panel.html" frameborder="NO" noresize> <frame name="mainFrame" src="main.html" frameborder="NO"> --- 1,11 ---- <html> <head> ! <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> ! <META name="description" content="HTML Parser- A java-based open source html parser"> ! <META name="keywords" content="html, parser, html parser, htmlparser, open source parser, java parser, java html parser"> ! <title>HTMLParser Home Page</title> ! <link REL ="stylesheet" TYPE="text/css" HREF="javadoc/stylesheet.css" TITLE="Style"> </head> ! 
<frameset cols="15%,85%" frameborder="NO" border="0" framespacing="0" rows="*"> <frame name="leftFrame" scrolling="NO" src="panel.html" frameborder="NO" noresize> <frame name="mainFrame" src="main.html" frameborder="NO"> Index: mailinglists.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/mailinglists.html,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** mailinglists.html 16 Apr 2002 06:28:42 -0000 1.1 --- mailinglists.html 4 Jan 2004 03:23:08 -0000 1.2 *************** *** 2,26 **** <html> <head> ! <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> ! <meta name="Author" content="Somik Raha"> ! <meta name="GENERATOR" content="Mozilla/4.61 [en] (WinNT; I) [Netscape]"> ! <meta name="KeyWords" content="mailing lists,htmlparser,java,user,developer, announce"> ! <title>HTML Parser Mailing Lists</title> </head> ! <body text="#000000" bgcolor="#FFFFFF" link="#3333FF" vlink="#FF6600" alink="#FFCC00"> ! <b><u>HTML Parser Mailing Lists</u></b> ! <p><a href="http://lists.sourceforge.net/lists/listinfo/htmlparser-announce">HTMLParser ! Announcement mailing list (very low traffic)</a> ! <br>Join this list if you are interested in new releases of HTML Parser. ! Notifications of releases will be put on this list. ! <p><a href="http://lists.sourceforge.net/lists/listinfo/htmlparser-developer">HTMLParser ! Developer mailing list</a>. ! <br>Join this list ONLY if you plan to have a developer discussion about ! the htmlparser library. This list is intended for developers of HTMLParser ! only. ! <p><a href="http://lists.sourceforge.net/lists/listinfo/htmlparser-user">HTMLParser ! Users mailing list</a>. ! <br>Join this list if you want to use the HTMLParser library and need some ! help to get started. Feel free to post your questions here. </body> </html> --- 2,35 ---- <html> <head> ! <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> ! 
<meta name="Author" content="Somik Raha"> ! <meta name="GENERATOR" content="Mozilla/4.61 [en] (WinNT; I) [Netscape]"> ! <meta name="KeyWords" content="mailing lists,htmlparser,java,user,developer,announce"> ! <title>HTML Parser Mailing Lists</title> ! <link REL ="stylesheet" TYPE="text/css" HREF="javadoc/stylesheet.css" TITLE="Style"> </head> ! <body> ! <h2>HTML Parser Mailing Lists</h2> ! <p><a href="http://lists.sourceforge.net/lists/listinfo/htmlparser-announce"> ! HTMLParser Announcement mailing list (very low traffic)</a><br> ! Join this list if you are interested in new releases of HTML Parser. ! Notifications of releases will be put on this list.<br> ! Monitoring of intermediate releases is also possible by clicking on the ! <img src="http://images.sourceforge.net/images/ic/mail16d.png" alt="envelope icon"> ! in the <em>Notes / Monitor</em> column of the <em>Latest File Releases</em> ! list on the <a href="http://sourceforge.net/projects/htmlparser" target="_parent">project page</a>. ! <p><a href="http://lists.sourceforge.net/lists/listinfo/htmlparser-user"> ! HTMLParser Users mailing list</a><br> ! Join this list if you are using the HTMLParser library and need some ! help to get started or solve a problem. Feel free to post your questions here. ! <p><a href="http://lists.sourceforge.net/lists/listinfo/htmlparser-developer"> ! HTMLParser Developer mailing list</a><br> ! Join this list ONLY if you wish to monitor developer discussion about ! the htmlparser library. This list is intended for developer collaboration. ! <p><a href="http://lists.sourceforge.net/lists/listinfo/htmlparser-cvs"> ! HTMLParser cvs commit mailing list</a><br> ! A syncmail script issues messages to this list whenever a CVS commit is ! performed to the /cvsroot/htmlparser repository. ! Subscribe to this list only if you want to be notified of code drops as they happen. 
</body> </html> Index: main.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/main.html,v retrieving revision 1.7 retrieving revision 1.8 diff -C2 -d -r1.7 -r1.8 *** main.html 25 May 2003 22:19:44 -0000 1.7 --- main.html 4 Jan 2004 03:23:08 -0000 1.8 *************** *** 2,24 **** <html> <head> ! <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> ! <meta name="Author" content="Derrick Oswald"> ! <title>HTMLParser Main</title> </head> <body> ! <h1>HTMLParser v 1.3</span></h1> ! ! <div>Welcome to the homepage of HTMLParser - a super-fast real-time ! parser for real-world HTML. What has attracted most users to HTMLParser has ! been its simplicity in design, speed and ability to handle streaming real-world ! html. ! <p>The production release of HTMLParser 1.3 is available! ! You can download it from the <a href="http://sourceforge.net/project/showfiles.php?group_id=24399&release_id=161563">download page</a>. ! <p>Before you bother downloading the parser, you would probably want to check ! our "simple design" claim. Check <a href="samples/index.html">sample ! programs</a> to see how simple it is to parse HTML using HTMLParser. ! <p> <a href="http://sourceforge.net/projects/htmlparser"> ! <img src="http://sourceforge.net/sflogo.php?group_id=24399&type=1" width="88" height="31" border="0" alt="SourceForge.net Logo"></a> ! </div> </body> </html> --- 2,119 ---- <html> <head> ! <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> ! <meta name="Author" content="Derrick Oswald"> ! <title>HTMLParser Main</title> ! <link REL ="stylesheet" TYPE="text/css" HREF="javadoc/stylesheet.css" TITLE="Style"> </head> <body> ! <h1>HTMLParser</h1> ! Welcome to the homepage of HTMLParser - a super-fast real-time ! parser for real-world HTML. What has attracted most developers to HTMLParser has ! been its simplicity in design, speed and ability to handle streaming real-world ! html. ! 
<p>The two fundamental use-cases that are handled by the parser are ! <a href="#extraction">extraction</a> and <a href="#transformation">transformation</a> ! (the syntheses use-case, where HTML pages are created from scratch, is better ! handled by other tools closer to the source of data). While prior versions ! concentrated on data extraction from web pages, Version 1.4 of the ! HTMLParser has substantial improvements in the area of transforming web ! pages, with simplified tag creation and editing, and verbatim toHtml() method ! output. ! <p>In general, to use the HTMLParser you will need to be able to write code in ! the Java programming language. Although some example programs are provided ! that may be useful as they stand, it's more than likely you will need (or ! want) to create your own programs or modify the ones provided to match your ! intended application. ! <p>To use the library, you will need to add either the htmllexer.jar or ! htmlparser.jar to your classpath when compiling and running. The ! htmllexer.jar provides low level access to generic string, remark and tag nodes on ! the page in a linear, flat, sequential manner. The htmlparser.jar, which ! includes the classes found in htmllexer.jar, provides access to a page as a ! sequence of nested differentiated tags containing string, remark and other ! tag nodes. So where the output from calls to the lexer ! <a href="javadoc/org/htmlparser/lexer/Lexer.html#nextNode()">nextNode()<a> ! method might be: ! <pre> ! <html> ! <head> ! <title> ! "Welcome" ! </title> ! </head> ! <body> ! etc... ! </pre> ! The output from the parser <a ! href="javadoc/org/htmlparser/util/NodeIterator.html">NodeIterator</a> would ! nest the tags as children of the <html>, <head> and other nodes ! (here represented by indentation): ! <pre> ! <html> ! <head> ! <title> ! "Welcome" ! </title> ! </head> ! <body> ! etc... ! </pre> ! The parser attempts to balance opening tags with ending tags to present the ! 
structure of the page, while the lexer simply spits out nodes. If your ! application requires only modest structural knowledge of the page, and is ! primarily concerned with individual, isolated nodes, you should consider ! using the lightweight lexer. But if your application requires knowledge of ! the nested structure of the page, for example processing tables, you will ! probably want to use the full parser. ! <h2><a name=extraction>Extraction</a></h2> ! Extraction encompasses all the information retrieval programs that are not ! meant to preserve the source page. This covers uses like: ! <ul> ! <li>text extraction, for use as input for text search engine databases for example</li> ! <li>link extraction, for crawling through web pages or harvesting email ! addresses</li> ! <li>screen scraping, for programmatic data input from web pages</li> ! <li>resource extraction, collecting images or sound</li> ! <li>a browser front end, the preliminary stage of page display</li> ! <li>link checking, ensuring links are valid</li> ! <li>site monitoring, checking for page differences beyond simplistic diffs</li> ! </ul> ! There are several facilities in the HTMLParser codebase to help with ! extraction, including ! <a href="javadoc/org/htmlparser/filters/package-summary.html">filters</a>, ! <a href="javadoc/org/htmlparser/visitors/package-summary.html">visitors</a> and ! <a href="javadoc/org/htmlparser/beans/package-summary.html">JavaBeans</a>. ! <h2><a name=transformation>Transformation</a></h2> ! Transformation includes all processing where the input <em>and</em> the output ! are HTML pages. Some examples are: ! <ul> ! <li>URL rewriting, modifying some or all links on a page</li> ! <li>site capture, moving content from the web to local disk</li> ! <li>censorship, removing offending words and phrases from pages</li> ! <li>HTML cleanup, correcting erroneous pages</li> ! <li>ad removal, excising URLs referencing advertising</li> ! 
<li>conversion to XML, moving existing web pages to XML</li> ! </ul> ! During or after reading in a page, operations on the nodes can ! accomplish many transformation tasks "in place", which can then be output ! with the <a href="javadoc/org/htmlparser/Node.html#toHtml()">toHtml()</a> method. ! Depending on the purpose of your application, you will probably want to look ! into node decorators, ! <a href="javadoc/org/htmlparser/visitors/package-summary.html">visitors</a>, or ! <a href="javadoc/org/htmlparser/tags/package-summary.html">custom tags</a> ! in conjunction with the ! <a href="javadoc/org/htmlparser/PrototypicalNodeFactory.html">PrototypicalNodeFactory</a>. ! <p>The HTML Parser is an open source library released under ! <a href="http://www.opensource.org/licenses/lgpl-license.html">GNU Lesser General Public ! License</a>, which basically says you are free to use the library "as is" in ! other (even proprietary) products, as long as due credit is given to the authors ! and the source code for the HTMLParser is included or available with the other product. ! For modified or embedded use, please consult the ! <a href="http://www.opensource.org/licenses/lgpl-license.html">LGPL license</a>. ! <div align="right"> ! <a href="http://sourceforge.net/projects/htmlparser" target="_parent"> ! <img src="http://sourceforge.net/sflogo.php?group_id=24399&type=1" width="88" height="31" border="0" alt="SourceForge.net"> ! </a> ! </div> </body> </html> Index: panel.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/panel.html,v retrieving revision 1.6 retrieving revision 1.7 diff -C2 -d -r1.6 -r1.7 *** panel.html 31 Dec 2003 02:50:49 -0000 1.6 --- panel.html 4 Jan 2004 03:23:08 -0000 1.7 *************** *** 2,42 **** <html> <head> ! <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> ! <meta name="Author" content="Somik Raha & Abhishek Srivastava"> ! 
<meta name="KEYWORDS" content="java,jini,calcutta, java users group,design patterns"> ! <meta name="GENERATOR" content="Mozilla/4.61 [en] (WinNT; I) [Netscape]"> ! <title>NAVIGATION PAGE</title> ! <style type=text/css>.abhi { FONT-FAMILY: "Arial Black", arial; FONT-SIZE: 8pt; FONT-WEIGHT: normal; TEXT-DECORATION: none}</style> </head> <body bgcolor="#FFFFFF" background="background.gif"> <img SRC="htmlparserlogo.jpg" BORDER=0 height=40 width=100> ! <li> ! <a href="main.html" target="mainFrame">Home Page</a></li> ! ! <li> <a href="http://sourceforge.net/project/showfiles.php?group_id=24399&release_id=129477" target="mainFrame">Download</a></li> ! <li> <a href="samples/index.html" target="mainFrame">Sample Programs</a></li> ! ! <li> <a href="docs/index.html" target="_parent">Documentation</a></li> <li> <a href="articles/index.html" target="mainFrame">Articles</a></li> ! <li> <a href="mailinglists.html" target="mainFrame">Mailing Lists</a> </li> - <li> <a href="bug.html" target="mainFrame">Report Bugs</a></li> - <li> <a href="support.html" target="mainFrame">Request Support</a></li> ! ! <li> ! <a href="http://sourceforge.net/cvs/?group_id=24399" target="mainFrame">CVS ! Repository</a></li> ! ! <li> ! <a href="http://sourceforge.net/projects/htmlparser" target="mainFrame">Project ! Page</a></li> ! ! <li> ! <a href="contributors.html" target="mainFrame">Contributors</a></li> ! ! <li> <a href="design/joinus.html" target="mainFrame">Join this Project</a></li> ! </body> </html> --- 2,41 ---- <html> <head> ! <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> ! <meta name="Author" content="Somik Raha & Abhishek Srivastava"> ! <meta name="KEYWORDS" content="java,html,design patterns"> ! <title>NAVIGATION PAGE</title> ! <link REL ="stylesheet" TYPE="text/css" HREF="javadoc/stylesheet.css" TITLE="Style"> ! <!--style type=text/css>.abhi { FONT-FAMILY: "Arial Black", arial; FONT-SIZE: 8pt; FONT-WEIGHT: normal; TEXT-DECORATION: ! 
none}</style--> </head> <body bgcolor="#FFFFFF" background="background.gif"> <img SRC="htmlparserlogo.jpg" BORDER=0 height=40 width=100> ! <p><strong>About HTMLParser</strong></p> ! <ul> ! <li> <a href="main.html" target="mainFrame">Welcome</a></li> ! <li> <a href="http://sourceforge.net/projects/htmlparser" target="_parent">Project Page</a></li> ! <li> <a href="contributors.html" target="mainFrame">Contributors</a></li> ! <li> <a href="joinus.html" target="mainFrame">Join this Project</a></li> ! </ul> ! <p><strong>Downloads</strong></p> ! <ul> ! <li> <a href="http://sourceforge.net/project/showfiles.php?group_id=24399&package_id=47712" target="_parent">Version 1.4</a></li> ! <li> <a href="http://sourceforge.net/project/showfiles.php?group_id=24399&package_id=17243" target="_parent">Old Releases</a></li> ! <li> <a href="http://cvs.sourceforge.net/viewcvs.py/htmlparser/htmlparser/" target="_parent">CVS Repository</a></li> ! </ul> ! <p><strong>Documentation</strong></p> ! <ul> ! <li> <a href="javadoc/index.html" target="_parent">JavaDocs</a></li> ! <li> <a href="samples.html" target="mainFrame">Sample Programs</a></li> ! <li> <a href="wiki/index.html" target="_parent">Wiki</a></li> <li> <a href="articles/index.html" target="mainFrame">Articles</a></li> ! </ul> ! <p><strong>Support</strong></p> ! <ul> <li> <a href="mailinglists.html" target="mainFrame">Mailing Lists</a> </li> <li> <a href="bug.html" target="mainFrame">Report Bugs</a></li> <li> <a href="support.html" target="mainFrame">Request Support</a></li> ! </ul> </body> </html> Index: support.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/support.html,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** support.html 15 Dec 2002 03:45:00 -0000 1.1 --- support.html 4 Jan 2004 03:23:08 -0000 1.2 *************** *** 2,30 **** <html> <head> ! <title>Bug Reports</title> ! 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> </head> <body> <h2>Support Request</h2> ! <p>You can submit support requests here. Before you proceed, please note that ! you <strong>must</strong> <a href="http://sourceforge.net/account/login.php">sign ! in at sourceforge</a>, for only then will our responses reach you. Putting in ! an anonymous bug report will provide us with no way of reaching you. If you ! do not have a sourceforge login, then please consider mentioning your email ! address in the bug report. Anonymous support requests just dont make sense, ! so we will not be replying to anonymous reports.<br> ! <br> ! Please note that this is an open source project, and most of us are hard pressed ! for time. We are not obliged to help you but we do so anyway. You can help us ! by first researching your problem, and then requesting for support when you ! are really stuck. You should have gone through the <a href="sample/index.html">sample ! programs</a>, and <a href="design/index.html" target="_parent">documentation</a> ! before you submit your request. It might also be much faster to get help from ! the htmlparser user community, by signing up on the <a href="http://lists.sourceforge.net/lists/listinfo/htmlparser-user">HTMLParser ! User mailing list</a>.<br> ! <br> ! Once you are ready, <a href="http://sourceforge.net/tracker/?func=add&group_id=24399&atid=381400">click ! here to submit your report</a>.</p> </body> </html> --- 2,31 ---- <html> <head> ! <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> ! <title>Support Request</title> ! <link REL ="stylesheet" TYPE="text/css" HREF="javadoc/stylesheet.css" TITLE="Style"> </head> <body> <h2>Support Request</h2> ! <p>You can submit support requests here. Before you proceed, please note that ! you <strong>must</strong> <a href="http://sourceforge.net/account/login.php">sign ! in at sourceforge</a>, for only then will our responses reach you. ! 
If you do not have a sourceforge login, you can get one ! <a href="http://sourceforge.net/account/register.php">here</a>.</p> ! <p>Please note that this is an open source project, and most of us are hard pressed ! for time. We are not obliged to help you but we do so anyway. You can help us ! by first researching your problem, and then requesting for support when you ! are really stuck. You should have consulted the ! <a href="faq.html">frequently asked questions</a>, ! <a href="javadoc/index.html" target="_parent">JavaDocs</a>, and ! <a href="samples/index.html">sample programs</a> ! before you submit your request. It might also be much faster to get help from ! the htmlparser user community, by signing up on the ! <a href="http://lists.sourceforge.net/lists/listinfo/htmlparser-user"> ! HTMLParser User mailing list</a>.<br> ! <p>Once you are ready, ! <a href="http://sourceforge.net/tracker/?func=add&group_id=24399&atid=381400" target="_parent"> ! click here to submit your report</a>.</p> </body> </html>