Thread: [Htmlparser-developer] version 1.3
From: Derrick O. <Der...@ro...> - 2002-12-16 13:24:02
This message is just to open discussion. Here are some enhancements that might best be left till the next version.

POST constructor
The two constructors that HTMLParser has basically either take a string URL or an HTMLReader. This shifts the onus of performing HTTP onto the API user for POST operations. It might be good to have an HttpURLConnection or URLConnection argument constructor, where a primed and loaded connection is passed to the parser.

Tables
The current version flattens tables, pushing the onus on the API user to syntactically walk through the table data to get to a certain table entry. It may be useful to nest table entries, similar to what the FORM tag does now, but have it correctly generate rows and columns.

Logging
The use of a feedback object is adequate, but JDK version 1.4 has a rich API, java.util.logging, that we might want to emulate (presuming we don't want to force JDK 1.4 usage).

charset
Currently the charset directive within the HTML page is ignored. There may be a need to honour this parameter on the Content-Type field.

beans
It might be nice to create one or more Java beans that can be used within GUI IDEs. The predefined behavior might be what the parser applications do now, but exposing some accessors on HTMLParser and providing a zero-arg constructor may also prove useful.

executable jar
There is no default application for the htmlparser.jar, i.e. java -jar htmlparser.jar doesn't do anything at the moment. A little GUI application might be nice. I'm not talking about a browser, but rather a demo of the applications (i.e. a tree view of the links a la robot, a text view a la StringExtractor, a list of mail addresses a la ripper, etc.). This would utilize the beans mentioned above.
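To make the POST constructor idea concrete, here is roughly the usage I have in mind. The connection priming is plain java.net; the connection-argument constructor at the end is the proposal, not the current API, so it is left as a hypothetical comment.

    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;

    public class PostConstructorSketch
    {
        public static void main (String[] args) throws Exception
        {
            // The API user primes and loads the connection for a POST...
            HttpURLConnection connection = (HttpURLConnection)
                new URL ("http://www.example.com/lookup").openConnection ();
            connection.setRequestMethod ("POST");
            connection.setDoOutput (true);
            connection.setRequestProperty ("Content-Type", "application/x-www-form-urlencoded");

            Writer out = new OutputStreamWriter (connection.getOutputStream ());
            out.write ("query=" + URLEncoder.encode ("html parser"));
            out.close ();

            // ...and would simply hand it over; the parser reads the response itself.
            // Proposed API, does not exist yet:
            // HTMLParser parser = new HTMLParser (connection);
        }
    }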
From: Sam J. <sa...@ne...> - 2002-12-17 00:21:44
Hi Derrick,

Some responses of my own.

Derrick Oswald wrote:
> POST constructor
> The two constructors that HTMLParser has basically either take a string URL or an HTMLReader. This shifts the onus of performing HTTP onto the API user for POST operations. It might be good to have an HttpURLConnection or URLConnection argument constructor, where a primed and loaded connection is passed to the parser.

I very much agree (thanks for your previous suggestions on this topic, BTW).

> Tables
> The current version flattens tables, pushing the onus on the API user to syntactically walk through the table data to get to a certain table entry. It may be useful to nest table entries, similar to what the FORM tag does now, but have it correctly generate rows and columns.

Have you looked at HTTPUnit? http://httpunit.sourceforge.net/
They have to deal with a lot of similar problems and there may be synergies.

> Logging
> The use of a feedback object is adequate, but JDK version 1.4 has a rich API, java.util.logging, that we might want to emulate (presuming we don't want to force JDK 1.4 usage).

I would be against forcing JDK 1.4 usage. I would recommend log4j:
http://jakarta.apache.org/log4j/docs/index.html

> charset
> Currently the charset directive within the HTML page is ignored. There may be a need to honour this parameter on the Content-Type field.

Agreed.

CHEERS> SAM
From: Somik R. <so...@ya...> - 2002-12-17 06:40:17
Great to have a discussion going! I'd like to branch off all the issues into separate threads so that we can deal with them separately.

> > Logging
> > The use of a feedback object is adequate, but JDK version 1.4 has a rich API, java.util.logging, that we might want to emulate (presuming we don't want to force JDK 1.4 usage).
>
> I would be against forcing JDK 1.4 usage. I would recommend log4j:
> http://jakarta.apache.org/log4j/docs/index.html

Using either JDK 1.4 or log4j ties you down to a specific logging API. The latter will also add to the weight of the parser. (I was actually considering log4j some time back, but Claude Duguay convinced me otherwise.)

If, however, more logging support is needed, I guess it could be added using a facade (or adapter) over JDK 1.4 (or log4j), externally. This is of course open to discussion.

Regards,
Somik
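To sketch what I mean by a facade: the interface shape and the adapter class name below are made up for illustration (this is not the current feedback API), but it shows how a java.util.logging binding could live outside the parser so the core never depends on JDK 1.4.

    import java.util.logging.Level;
    import java.util.logging.Logger;

    // Hypothetical shape of the parser's feedback callback.
    interface ParserFeedback
    {
        void info (String message);
        void warning (String message);
        void error (String message, Throwable t);
    }

    // External adapter that forwards feedback to java.util.logging; users on
    // older JDKs simply never install it, so the parser itself stays JDK 1.2-safe.
    class LoggingFeedback implements ParserFeedback
    {
        private final Logger logger = Logger.getLogger ("htmlparser");

        public void info (String message)
        {
            logger.log (Level.INFO, message);
        }

        public void warning (String message)
        {
            logger.log (Level.WARNING, message);
        }

        public void error (String message, Throwable t)
        {
            logger.log (Level.SEVERE, message, t);
        }
    }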
From: Craig R. <cr...@qu...> - 2002-12-17 07:14:20
Take a look at the logging wrapper provided by Jakarta. It provides a thin bridge between different logging APIs.

http://jakarta.apache.org/commons/logging.html

Craig

> Great to have a discussion going! I'd like to branch off all the issues into separate threads so that we can deal with them separately.
>
> > > Logging
> > > The use of a feedback object is adequate, but JDK version 1.4 has a rich API, java.util.logging, that we might want to emulate (presuming we don't want to force JDK 1.4 usage).
> >
> > I would be against forcing JDK 1.4 usage. I would recommend log4j:
> > http://jakarta.apache.org/log4j/docs/index.html
>
> Using either JDK 1.4 or log4j ties you down to a specific logging API. The latter will also add to the weight of the parser. (I was actually considering log4j some time back, but Claude Duguay convinced me otherwise.)
>
> If, however, more logging support is needed, I guess it could be added using a facade (or adapter) over JDK 1.4 (or log4j), externally. This is of course open to discussion.
>
> Regards,
> Somik
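A minimal example of what client code looks like against commons-logging - the class here is made up, but the Log/LogFactory calls are the real API. The backend (log4j, JDK 1.4 logging, or the simple built-in logger) is discovered at runtime, so the parser code never ties itself to any one of them.

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    public class ScannerLoggingExample
    {
        // The concrete backend is chosen at deployment time; this code
        // references only the thin commons-logging interface.
        private static final Log log = LogFactory.getLog (ScannerLoggingExample.class);

        public void scan (String tagName)
        {
            if (log.isDebugEnabled ())
                log.debug ("scanning tag: " + tagName);
            // ... tag handling would go here; problems go to log.warn()/log.error() ...
        }
    }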
From: Derrick O. <Der...@ro...> - 2002-12-17 13:07:38
This looks like a good place to start. It will mean touching a ton of files, though.

Somik, you might want to do what's necessary in CVS (tag, branch, label... whatever) to freeze version 1.2 and open up the 1.3 version as the head revision.

Derrick

Craig Raw wrote:
> Take a look at the logging wrapper provided by Jakarta. It provides a thin bridge between different logging APIs.
>
> http://jakarta.apache.org/commons/logging.html
>
> Craig
From: Somik R. <so...@ya...> - 2002-12-19 07:44:55
Hi Derrick,

> You might want to do what's necessary in CVS (tag, branch, label... whatever) to freeze version 1.2 and open up the 1.3 version as the head revision.

I've been swamped for a while, but I will get to this tomorrow. Actually, I was kind of hoping that we could put in some work to wrap up 1.2. If we get even a week without bug reports, we can close 1.2 - but the reports just keep coming and coming. Which leads me to believe that we may not have enough test cases, and we may need to do some merciless refactoring... What do you think about cleaning it up before we release - taking time till Jan 1 to do this? It would be really good to have more eyeballs going over the code and performing refactorings, etc., before we add any new features.

Regards,
Somik
From: Somik R. <so...@ya...> - 2002-12-17 06:45:53
Derrick Oswald wrote:
> Tables
> The current version flattens tables, pushing the onus on the API user to syntactically walk through the table data to get to a certain table entry. It may be useful to nest table entries, similar to what the FORM tag does now, but have it correctly generate rows and columns.

That's a good idea. We should have a table scanner next. This would be a good feature for 1.3.

Sam Joseph wrote:
> Have you looked at HTTPUnit? http://httpunit.sourceforge.net/
>
> They have to deal with a lot of similar problems and there may be synergies.

I am curious to hear more about this - I am going to be using HttpUnit really soon. What sort of problems did you face? It would be great (as always) if you could share your vision.

Regards,
Somik
From: Sam J. <ga...@yh...> - 2002-12-20 09:42:05
Somik Raha wrote:
> Sam Joseph wrote:
> > Have you looked at HTTPUnit? http://httpunit.sourceforge.net/
> >
> > They have to deal with a lot of similar problems and there may be synergies.
>
> I am curious to hear more about this - I am going to be using HttpUnit really soon. What sort of problems did you face?

I didn't face problems as such. What I mean is that HttpUnit has to contain something similar to Htmlparser somewhere in its code.

Htmlparser lets you parse HTML. So does HttpUnit, but HttpUnit lets you interact with the HTML forms that you have parsed out of the HTML. For example:

WebConversation o_wc = new WebConversation();
WebResponse x_jdoc = o_wc.getResponse(new GetMethodWebRequest("some_url"));
WebForm x_form = x_jdoc.getFormWithName("my_form");
assertTrue(x_form.hasParameterNamed("some_param"));
SubmitButton x_submit_button = x_form.getSubmitButton("Submit");
x_submit_button.click();

will open a URL, grab the HTML data coming off the response, and then let you create objects like WebForms, SubmitButtons, etc. You can then manipulate these, setting parameters on the forms, submitting them, etc. This is how HttpUnit allows you to create unit tests for your HTML interfaces.

Anyway, my point is that underneath the API, HttpUnit must be doing something similar to HtmlParser in order to get access to the HTML data, i.e. they are both parsing HTML. The main difference is the level of the API.

Currently I have this weird idea (which I mailed to the NinJava list), which is to use the HTML templates that I build my web management screens from to generate Java outlines for the code that handles the web forms, and also the test code itself. Synchronistically, in order to implement such a thing, I would need to start off by parsing the HTML templates, which are themselves HTML. I guess I could use either HttpUnit or HtmlParser to do this, but I'm not sure if HttpUnit can be used to parse local files...

Anyway, for those of you reading down this far, the idea would be that you could define your web form structure in one place, and then the tedious parts of writing the support class and the test class would be removed, allowing one to produce reliable web management screens much faster. Naturally, look and feel would be farmed out to CSS, and the logical extension would be to define your web form structures in XML - in fact, to generate them directly off a data model like the ones used in Torque and Turbine... Then providing web management screens would just be a question of choosing which forms/objects to allow which users access to. Although I guess we would still want the ability to specify aggregate forms that gave users access to data made up of components of more fundamental data structures.

Apologies for the long post...

CHEERS> SAM
From: Sam J. <ga...@yh...> - 2002-12-20 12:16:07
Some investigation of HttpUnit has turned up that it includes the NekoHTML parser. Also, there does not seem to be any mechanism for applying HttpUnit to a local file. It looks like you grab a page off a web server or nothing...

I have started creating a TestCodeGenerator using HtmlParser. I am using an HTMLFormScanner to strip out all the form details, and from this I hope to generate testing and handling Java classes...

CHEERS> SAM

p.s. I'm still not sure what the filters do in scanners. The filter parameter in HTMLTagScanner and HTMLFormScanner doesn't seem to be used for anything...

Sam Joseph wrote:
> What I mean is that HttpUnit has to contain something similar to Htmlparser somewhere in its code.
From: Somik R. <so...@ya...> - 2002-12-20 18:04:40
Hi Sam,

This is interesting - because I was doing much the same thing last evening. I've added a lot of searching methods into HTMLFormTag, as I was using this to do test-first development of an XSLT stylesheet. I was happy with the results - I was actually able to develop the stylesheet test-first.

> I have started creating a TestCodeGenerator using HtmlParser. I am using an HTMLFormScanner to strip out all the form details, and from this I hope to generate testing and handling Java classes...

I will probably add some more utility methods into HTMLParserTestCase that will make life easier - but even in its current form, you might find it useful. I've documented it here:
http://htmlparser.sourceforge.net/design/tests.html

Regards,
Somik
From: Sam J. <ga...@yh...> - 2002-12-21 06:39:59
Hi Somik,

Somik Raha wrote:
> This is interesting - because I was doing much the same thing last evening. I've added a lot of searching methods into HTMLFormTag, as I was using this to do test-first development of an XSLT stylesheet. I was happy with the results - I was actually able to develop the stylesheet test-first.

Sounds cool.

> > I have started creating a TestCodeGenerator using HtmlParser. I am using an HTMLFormScanner to strip out all the form details, and from this I hope to generate testing and handling Java classes...
>
> I will probably add some more utility methods into HTMLParserTestCase that will make life easier - but even in its current form, you might find it useful.

I think it will be. At first I was thinking I wanted lots of additional methods, but actually I can predicate behaviour based on the TYPE of the input tags. I'm getting output like this:

[java] DEBUG [main] (TestCodeGenerator.java:296) - pTagName: FORM
[java] DEBUG [main] (TestCodeGenerator.java:302) - attr: ACTION, value: $FORM_ACTION
[java] DEBUG [main] (TestCodeGenerator.java:302) - attr: NAME, value: $EDIT_FORM
[java] DEBUG [main] (TestCodeGenerator.java:302) - attr: METHOD, value: POST
[java] DEBUG [main] (TestCodeGenerator.java:306) - x_inputs.size(): 7
[java] DEBUG [main] (TestCodeGenerator.java:321) - pTagName: INPUT
[java] DEBUG [main] (TestCodeGenerator.java:327) - attr: VALUE, value: add_bookmark
[java] DEBUG [main] (TestCodeGenerator.java:327) - attr: NAME, value: $ACTION
[java] DEBUG [main] (TestCodeGenerator.java:327) - attr: TYPE, value: hidden

and I think I can generate the necessary test and code shells from this.

> I've documented it here:
> http://htmlparser.sourceforge.net/design/tests.html

Interesting. I'm becoming more and more committed to a test-first philosophy myself, i.e.

1. write the test
2. run it and watch it fail, to confirm the test will fail when the implementation is not present
3. write the implementation
4. try to get the implementation to pass the test
5. repeat as appropriate

Although actually my post was more about automatic test creation rather than testing itself. I recently found this: http://www.junitdoclet.org/ which automatically creates test shells out of code. And I found myself wanting the equivalent thing for HttpUnit.

Either way round, I think if people are going to do testing they need support to make it easier to create the tests. And if they are working on the tests first, they need support creating the implementation. It seems that really we want something like:

1. Specify the use case, and data-in/data-out
2. Automatically generate test code and shell implementation code
3. Run the test code to fail
4. Fill in the implementation details ...etc

Anyways...

CHEERS> SAM

p.s. I'm impressed by the frequency with which you are releasing htmlParser, and your process of having multiple candidates, etc. I struggle to release often, as the release process itself still seems a little cumbersome (SourceForge has got better)... have you any tips for streamlining it? I guess what I really need is Ant targets like

ant release-bug-fix-version
ant create-new-version-release
ant create-new-candidate-release

which handle all the necessary communication with SourceForge - uploading, packaging and handling of release numbers...
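Coming back to the generator idea above: once the form details are extracted, the shell generation itself is just string templating. A rough self-contained sketch - the form and parameter names are simplified from the debug trace (the leading '$' template markers would be resolved first), and BASE_URL in the generated text is a placeholder for the developer to fill in:

    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.List;

    // Turns the details pulled out of a FORM tag (form name plus its input
    // parameter names) into the text of a skeleton HttpUnit test class.
    public class TestShellGenerator
    {
        public static String generate (String formName, List parameterNames)
        {
            StringBuffer out = new StringBuffer ();
            out.append ("public class " + formName + "Test extends TestCase\n{\n");
            out.append ("    public void testFormParameters () throws Exception\n    {\n");
            out.append ("        WebConversation wc = new WebConversation ();\n");
            out.append ("        WebResponse response = wc.getResponse (BASE_URL);\n");
            out.append ("        WebForm form = response.getFormWithName (\"" + formName + "\");\n");
            for (Iterator i = parameterNames.iterator (); i.hasNext ();)
                out.append ("        assertTrue (form.hasParameterNamed (\"" + i.next () + "\"));\n");
            out.append ("    }\n}\n");
            return (out.toString ());
        }

        public static void main (String[] args)
        {
            // Names simplified from the trace above.
            System.out.println (generate ("EditForm", Arrays.asList (new String[] { "ACTION" })));
        }
    }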
From: Somik R. <so...@ya...> - 2002-12-21 08:47:08
Hi Sam,

> It seems that really we want something like:
>
> 1. Specify the use case, and data-in/data-out
> 2. Automatically generate test code and shell implementation code
> 3. Run the test code to fail
> 4. Fill in the implementation details ...etc

I like your idea. You can check this week's release - you'll find searching support for form tags which allows you to pick up input tags, textarea tags, select tags, ...

By the way... I'd written earlier about this - what is your opinion on using Bayesian networks to have a rule-based learning system that gets better over time? i.e. right now the tag identification mechanism is linear - there is only so far that can go. But with the sort of dirty HTML we get, the system has to be self-learning. I am thinking of an approach where we'd try to eliminate a lot of the hard-coded rules with a learning network. Of course, we'd have our tests to verify that we haven't broken anything, and from there, it should only get better. It would be great to have your insight on this.

> p.s. I'm impressed by the frequency with which you are releasing htmlParser, and your process of having multiple candidates, etc. I struggle to release often, as the release process itself still seems a little cumbersome (SourceForge has got better)... have you any tips for streamlining it? I guess what I really need is Ant targets like
>
> ant release-bug-fix-version
> ant create-new-version-release
> ant create-new-candidate-release
>
> which handle all the necessary communication with SourceForge - uploading, packaging and handling of release numbers...

Ha ha! I am not sure if you'll believe this, but I was inspired to structure the htmlparser project based on the neurogrid project - you had ant scripts long before we did. Of course, ant scripts are important for doing the job automatically - but I like keeping things simple, in the sense that there is no separate bug-fix version, just the next integration release (candidate). I am not yet a fan of branches - they're OK if they don't live more than two weeks (I've been thinking really hard about it for a while). I'm planning to get the production release out this week - so we can all move on to 1.3 (instead of having two versions, we'll live with 1.3 integration releases). I'd hate to make the same bug fixes twice.

Regards,
Somik
From: Sam J. <ga...@yh...> - 2002-12-23 06:07:54
Hi Somik,

Somik Raha wrote:
> By the way... I'd written earlier about this - what is your opinion on using Bayesian networks to have a rule-based learning system that gets better over time? i.e. right now the tag identification mechanism is linear - there is only so far that can go. But with the sort of dirty HTML we get, the system has to be self-learning. I am thinking of an approach where we'd try to eliminate a lot of the hard-coded rules with a learning network. Of course, we'd have our tests to verify that we haven't broken anything, and from there, it should only get better. It would be great to have your insight on this.

I think I don't quite understand enough about htmlparser to see how Bayesian networks would be applicable. I have only recently worked out how your scanners work - or rather, that you have scanners for different types of tags and can then avoid processing those tags that you are not interested in.

You say above that your tag identification mechanism is linear, but linear with respect to what? Can you give me an example of the hard-coded rules you are using now, and a couple of examples of dirty HTML pages that cause them to be sub-optimal?

Using learning in a system to increase efficiency is usually very difficult to do well. Learning systems basically have more flexibility than other systems, but as a consequence you have more free parameters. It is easy to add a learning framework but then spend all your time just trying to adjust the system parameters, and then to discover that exploring the space of possible parameters for your learner is just too expensive.

Nonetheless, I am always fascinated by the problem of adding learning to a system, precisely because it is so difficult to do well. If you can give me some concrete examples, I will do my best to help you select an appropriate learning mechanism.

> > p.s. I'm impressed by the frequency with which you are releasing htmlParser, and your process of having multiple candidates, etc. I struggle to release often, as the release process itself still seems a little cumbersome (SourceForge has got better)... have you any tips for streamlining it? I guess what I really need is Ant targets like
> >
> > ant release-bug-fix-version
> > ant create-new-version-release
> > ant create-new-candidate-release
> >
> > which handle all the necessary communication with SourceForge - uploading, packaging and handling of release numbers...
>
> Ha ha! I am not sure if you'll believe this, but I was inspired to structure the htmlparser project based on the neurogrid project - you had ant scripts long before we did.

Interesting. The ant scripts for neurogrid were originally made by Rick Knowles, and I'm still only just getting a really good feel for ant. I remember trying to set up something in my ng scripts that would add the date to the jar file name, like you sometimes do, and failing. My own fault really; I rarely read the manual and always try to learn by modifying the operation of an existing system (kind of an evolutionary approach...).

> Of course, ant scripts are important for doing the job automatically - but I like keeping things simple, in the sense that there is no separate bug-fix version, just the next integration release (candidate).
>
> I am not yet a fan of branches - they're OK if they don't live more than two weeks (I've been thinking really hard about it for a while). I'm planning to get the production release out this week - so we can all move on to 1.3 (instead of having two versions, we'll live with 1.3 integration releases). I'd hate to make the same bug fixes twice.

OK, but how much do you use the ant tasks to collect together all the things required for a release, or do you access SourceForge yourself each time? Given that the code is ready to go, how long does it take you to do a release? 5 minutes, 30? I'm imagining an ant task that would require one command and then you'd leave it to run...

Really I will have to sit down and spend some time looking at your ant scripts again to work that out, but I thought it might be interesting to hear from you about whether the release process feels cumbersome or not...

CHEERS> SAM
From: Somik R. <so...@ya...> - 2002-12-24 07:28:40
Hi Sam,

> Can you give me an example of the hard-coded rules you are using now, and a couple of examples of dirty HTML pages that cause them to be sub-optimal?

Here are some tags:

[1] From neurogrid.com (debugging last year :)
<a href="mailto:sa...@ne...?subject=Site Comments">Mail Us<a>

[2] From freshmeat.net
<a>revision</a>

[3] From fedpage.com
<a href="registration.asp?EventID=1272"><img border="0" src="\images\register.gif"</a>

[4] From yahoo.com
<a href=s/8741><img src="http://us.i1.yimg.com/us.yimg.com/i/i16/mov_popc.gif" height=16 width=16 border=0></img></td><td nowrap> <a href=s/7509><b>Yahoo! Movies</b></a>

As you can see, dirty HTML hardly looks predictable. Especially when links are not closed correctly, the scanner has to guess when it should close the tag.

And this is only for the link tag. For normal tags:

[1] <sometag key1=value key2="value2 key3 = value3>
[2] <sometag key1="<sometag>" key2="<!-- skdlskld -->">

The above two tags demonstrate a classic dilemma. If we ignore inverted commas, we cannot handle case [2], where the content within inverted commas is valid text and not tags. All these examples are accepted by IE.

All of these problems are currently handled by the parser, and I was looking to simplify the brain of the parser.

> Using learning in a system to increase efficiency is usually very difficult to do well. Learning systems basically have more flexibility than other systems, but as a consequence you have more free parameters. It is easy to add a learning framework but then spend all your time just trying to adjust the system parameters, and then to discover that exploring the space of possible parameters for your learner is just too expensive.
>
> Nonetheless, I am always fascinated by the problem of adding learning to a system, precisely because it is so difficult to do well. If you can give me some concrete examples, I will do my best to help you select an appropriate learning mechanism.

Thanks!

> Interesting. The ant scripts for neurogrid were originally made by Rick Knowles, and I'm still only just getting a really good feel for ant. I remember trying to set up something in my ng scripts that would add the date to the jar file name, like you sometimes do, and failing. My own fault really; I rarely read the manual and always try to learn by modifying the operation of an existing system (kind of an evolutionary approach...).

That's what I did too - but I started with the examples on the ant website (I think they believe in the approach too).

Cheers,
Somik
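To give a flavour of the quote dilemma (this is only an illustration, not the parser's actual code): a naive scan for the next '>' breaks on case [2], so the tag extractor has to track whether it is inside a quoted value - and then pages like case [1] above, where a quote or tag is never closed, force it to fall back on a recovery guess anyway.

    // Illustration only: find the end of a tag, treating '>' inside quotes as data.
    public class TagEndFinder
    {
        static int findTagEnd (String page, int start)
        {
            char quote = 0;                     // 0 means not inside a quoted value
            for (int i = start; i < page.length (); i++)
            {
                char ch = page.charAt (i);
                if (0 != quote)
                {
                    if (ch == quote)            // closing quote
                        quote = 0;
                }
                else if ('"' == ch || '\'' == ch)
                    quote = ch;
                else if ('>' == ch)
                    return (i);                 // real end of tag
            }
            // Unterminated quote or tag: the real parser needs a recovery rule here.
            return (-1);
        }

        public static void main (String[] args)
        {
            String dirty = "<sometag key1=\"<sometag>\" key2=\"<!-- skdlskld -->\"> trailing text";
            System.out.println (dirty.substring (0, findTagEnd (dirty, 0) + 1));
        }
    }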
From: Sam J. <ga...@yh...> - 2002-12-24 17:05:00
Hi Somik,

Somik Raha wrote:
> > Can you give me an example of the hard-coded rules you are using now, and a couple of examples of dirty HTML pages that cause them to be sub-optimal?
>
> Here are some tags:
>
> [1] From neurogrid.com (debugging last year :)
> <a href="mailto:sa...@ne...?subject=Site Comments">Mail Us<a>
>
> [2] From freshmeat.net
> <a>revision</a>
>
> [3] From fedpage.com
> <a href="registration.asp?EventID=1272"><img border="0" src="\images\register.gif"</a>
>
> [4] From yahoo.com
> <a href=s/8741><img src="http://us.i1.yimg.com/us.yimg.com/i/i16/mov_popc.gif" height=16 width=16 border=0></img></td><td nowrap> <a href=s/7509><b>Yahoo! Movies</b></a>
>
> As you can see, dirty HTML hardly looks predictable. Especially when links are not closed correctly, the scanner has to guess when it should close the tag.
>
> And this is only for the link tag. For normal tags:
>
> [1] <sometag key1=value key2="value2 key3 = value3>
> [2] <sometag key1="<sometag>" key2="<!-- skdlskld -->">
>
> The above two tags demonstrate a classic dilemma. If we ignore inverted commas, we cannot handle case [2], where the content within inverted commas is valid text and not tags. All these examples are accepted by IE.
>
> All of these problems are currently handled by the parser, and I was looking to simplify the brain of the parser.

Could you explain how all of these examples are handled by the current parser? Are you using some kind of specific rule to handle each case? Perhaps you can cut and paste a bit of code to the list to illustrate.

The more precisely you can describe the operation of the existing parser when handling these kinds of cases, the more likely I can come up with a learner that will meet your needs.

CHEERS> SAM
From: Somik R. <so...@ya...> - 2002-12-17 06:46:55
Derrick Oswald wrote:
> charset
> Currently the charset directive within the HTML page is ignored. There may be a need to honour this parameter on the Content-Type field.

I think this is the way to go. We're getting a nice to-do list for 1.3 :)

Regards,
Somik
From: Somik R. <so...@ya...> - 2002-12-17 06:50:02
Derrick Oswald wrote:
> POST constructor
> The two constructors that HTMLParser has basically either take a string URL or an HTMLReader. This shifts the onus of performing HTTP onto the API user for POST operations. It might be good to have an HttpURLConnection or URLConnection argument constructor, where a primed and loaded connection is passed to the parser.

Like Sam said - this sounds like HttpUnit. Are you using the parser for making tests?

Regards,
Somik
From: Derrick O. <Der...@ro...> - 2002-12-17 13:13:18
I've used this POST mechanism many times, not for testing a site.

A typical example is fetching a postal code by hitting the (for me) http://www.canadapost.ca/tools/pcl/bin/default-e.asp site and posting a filled-in form. This has to be parsed, and a table element extracted.

Derrick

Somik Raha wrote:
> > Derrick Oswald wrote:
> >
> > > POST constructor
> > > The two constructors that HTMLParser has basically either take a string URL or an HTMLReader. This shifts the onus of performing HTTP onto the API user for POST operations. It might be good to have an HttpURLConnection or URLConnection argument constructor, where a primed and loaded connection is passed to the parser.
>
> Like Sam said - this sounds like HttpUnit. Are you using the parser for making tests?
>
> Regards,
> Somik
From: Somik R. <so...@ya...> - 2002-12-19 07:47:04
> I've used this POST mechanism many times, not for testing a site.
>
> A typical example is fetching a postal code by hitting the (for me) http://www.canadapost.ca/tools/pcl/bin/default-e.asp site and posting a filled-in form. This has to be parsed, and a table element extracted.

I understand now! This sounds like a pretty useful feature to me - and if it is written test-first, I can't think of why it can't go into 1.2. The table element, though, is better off for 1.3... a lot of clean-up is needed in the existing system.

Regards,
Somik
From: Somik R. <so...@ya...> - 2002-12-17 06:52:38
Derrick Oswald wrote:
> beans
> It might be nice to create one or more Java beans that can be used within GUI IDEs. The predefined behavior might be what the parser applications do now, but exposing some accessors on HTMLParser and providing a zero-arg constructor may also prove useful.
>
> executable jar
> There is no default application for the htmlparser.jar, i.e. java -jar htmlparser.jar doesn't do anything at the moment. A little GUI application might be nice. I'm not talking about a browser, but rather a demo of the applications (i.e. a tree view of the links a la robot, a text view a la StringExtractor, a list of mail addresses a la ripper, etc.). This would utilize the beans mentioned above.

Both are good ideas. Let's do this for 1.3.

Regards,
Somik