Thread: [Htmlparser-developer] Re: [Htmlparser-user] Testing/feedback, question
From: Somik R. <so...@ya...> - 2002-06-26 01:38:07
> 1) There is command line handling and connection-oriented code in HTMLParser. This code should be uncoupled. Perhaps an HTMLParserMain class to handle the command line wrapper, keeping the HTMLParser code dedicated to parsing?

Good suggestion. This refactoring should be done.

> 2) Thanks for filling in the toString methods in 1.2. I had noticed most missing in 1.1 and was concerned. While there's room for minor improvement (the use of StringBuffer to build strings and a consistent naming convention), these are minor quibbles. I've found it useful to have a void toString(StringBuffer buffer); method variant in container classes, for building up strings from contained classes more efficiently.

We need to go thru a phase of optimization looking at the strings used. The toString(StringBuffer) method also sounds useful.

> 3) I love the existence of the toHTML(); methods.

This was the suggestion of Sam Joseph (it used to be toRawString() in older integration releases). Thanks Sam!

> 4) I see it's now possible to get something by calling getTag. This was missing in 1.1. Thanks.

Hmm.. This method should actually read getTagName().

> 5) I noticed a lot of code in the HTMLTag class which is 'private static'. This suggests the need for an external class to handle this type of work. At a peripheral glance, I'm presuming you're functioning as a Finite State Machine (thus the 'automata' prefix)?

Ah yes, I have been thinking of doing this refactoring for a while, and also of refactoring the other finite state machines for strings and remarks.

> Thanks for the big investment. I'd be happy to spend a little time helping with some of the grunt work. If you think the use of the Callback mechanism is good, for example, I could replace all the System.out and System.err for you and send you the code.

You are most welcome to join us - as I mentioned, I'd be happy to add you as a developer.

> 6) I noticed that you don't have a custom exception class. I have code kicking around that implements chained exceptions (as in Java 1.4) but is compatible with earlier Java versions. Chained exceptions are incredibly useful for wrapping underlying exceptions into higher-level exceptions while retaining the stack trace. This results in highly usable libraries because it provides suitable high-level explanations of a problem, while retaining lower level context.

Sounds like a great idea. Pls go ahead and add it to the CVS version.

> 7) I also have a very simple but versatile command line handler class that you can use if you like. It lets you retrieve arguments as either flags or parameter-followed options, single or multiple letter commands, order-dependent, etc. While simple, this is one of those classes that nobody should live without ;-).

It would be good to have this in the parser. Great to have you on board!

Cheers,
Somik
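
The toString(StringBuffer) variant discussed in point 2 could look roughly like the sketch below. The class names (ToStringBufferSketch, TextNode, Container) are hypothetical and are not taken from the parser; only the method shape, void toString(StringBuffer buffer), comes from the message. The idea is that a container hands one shared buffer down to its children instead of concatenating the intermediate strings each child would otherwise return.

```java
import java.util.ArrayList;
import java.util.List;

public class ToStringBufferSketch {

    /** Hypothetical leaf node; not a class from the parser. */
    public static class TextNode {
        private final String text;

        public TextNode(String text) {
            this.text = text;
        }

        /** Appends this node's description to a caller-supplied buffer. */
        public void toString(StringBuffer buffer) {
            buffer.append("Text(").append(text).append(')');
        }

        /** The usual toString() delegates to the buffer variant. */
        @Override
        public String toString() {
            StringBuffer buffer = new StringBuffer();
            toString(buffer);
            return buffer.toString();
        }
    }

    /** Hypothetical container that builds its string from its children. */
    public static class Container {
        private final List<TextNode> children = new ArrayList<TextNode>();

        public void add(TextNode node) {
            children.add(node);
        }

        /** Passes the same buffer to every child, so no temporary strings are built. */
        public void toString(StringBuffer buffer) {
            buffer.append("Container[");
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) {
                    buffer.append(", ");
                }
                children.get(i).toString(buffer);
            }
            buffer.append(']');
        }

        @Override
        public String toString() {
            StringBuffer buffer = new StringBuffer();
            toString(buffer);
            return buffer.toString();
        }
    }

    public static void main(String[] args) {
        Container container = new Container();
        container.add(new TextNode("a"));
        container.add(new TextNode("b"));
        System.out.println(container); // prints "Container[Text(a), Text(b)]"
    }
}
```

Whether this is measurably faster depends on how deeply the containers nest; the sketch only shows the calling pattern.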
From: Somik R. <so...@ya...> - 2002-06-26 02:01:17
> WRT exception handling vs. feedback, only fatal exceptions should be thrown, and feedback, where you are currently using System.out or System.err, should go through an interface that users can reroute as they might prefer (to logs, to the console, or ignored). I have written up the classes and packaged them under the com.kizna.html.util package. I can send these to you in any form you like.

I agree. The existing System.err.println() statements - I think they all indicate fatal errors - hence should be converted to an exception throwing system.

The Callback mechanism should also come in so we can start using it in the rest of the library.

Also - another issue I have been thinking of is SAX compliance. I don't think it will be hard to make callbacks from the parse() method... What do you think?

> The files are:
>
> HTMLFeedback
> DefaultHTMLFeedback
> FeedbackManager
> HTMLParserException (a chained exception class).

You put them in CVS. Do you think it'd be better to have a com.kizna.html.exceptions package instead of util, for better naming conventions?

> I am debating whether to keep the ChainedException class as a base class for more general use and use an HTMLParserException subclass. Any thoughts?

Hmm.. I'd need to see the code before I can comment.

Since you are now going to be a developer - here are two important guidelines (which you might be already following):

[1] All the code that is checked in must come with testcases and should not break existing tests. As of now the parser is almost 100% covered by tests.

[2] The bug fixing strategy is - write a testcase to simulate the bug, make the testcase fail, then fix the bug.

Cheers,
Somik
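
The split between reroutable feedback and fatal chained exceptions described above can be sketched as follows. Only the four class names (HTMLFeedback, DefaultHTMLFeedback, FeedbackManager, HTMLParserException) come from the thread. Every method signature, the CollectingFeedback helper, and the readTagName example are assumptions made for illustration, not the code that was actually contributed.

```java
import java.util.ArrayList;
import java.util.List;

public class FeedbackSketch {

    /** Receives non-fatal messages; the method names are assumed. */
    public interface HTMLFeedback {
        void info(String message);
        void warning(String message);
    }

    /** Default behaviour: keep writing to the console, as the parser did before. */
    public static class DefaultHTMLFeedback implements HTMLFeedback {
        public void info(String message) { System.out.println("INFO: " + message); }
        public void warning(String message) { System.err.println("WARNING: " + message); }
    }

    /** Hypothetical alternative that collects messages, e.g. for a log or a test. */
    public static class CollectingFeedback implements HTMLFeedback {
        public final List<String> messages = new ArrayList<String>();
        public void info(String message) { messages.add("INFO: " + message); }
        public void warning(String message) { messages.add("WARNING: " + message); }
    }

    /** Single place where users reroute feedback; assumed to be a static holder. */
    public static final class FeedbackManager {
        private static HTMLFeedback current = new DefaultHTMLFeedback();
        private FeedbackManager() { }
        public static void setFeedback(HTMLFeedback feedback) { current = feedback; }
        public static HTMLFeedback getFeedback() { return current; }
    }

    /** Chained exception that stores the underlying cause itself (no Java 1.4 API needed). */
    public static class HTMLParserException extends Exception {
        private final Throwable rootCause;

        public HTMLParserException(String message, Throwable rootCause) {
            super(message);
            this.rootCause = rootCause;
        }

        /** The wrapped lower-level exception, with its own stack trace intact. */
        public Throwable getRootCause() { return rootCause; }
    }

    /** Toy example: odd whitespace is only reported, a missing '>' is fatal. */
    public static String readTagName(String tag) throws HTMLParserException {
        try {
            // substring throws StringIndexOutOfBoundsException when '>' is absent
            String inner = tag.substring(1, tag.indexOf('>'));
            String trimmed = inner.trim();
            if (!trimmed.equals(inner)) {
                FeedbackManager.getFeedback().warning("extra whitespace in tag: " + tag);
            }
            return trimmed.toUpperCase();
        } catch (StringIndexOutOfBoundsException e) {
            throw new HTMLParserException("Malformed tag: " + tag, e);
        }
    }

    public static void main(String[] args) throws HTMLParserException {
        System.out.println(readTagName("< b >"));
        try {
            readTagName("<b");
        } catch (HTMLParserException e) {
            System.out.println(e.getMessage() + " / cause: " + e.getRootCause());
        }
    }
}
```

A caller that wants silence or logging would call FeedbackManager.setFeedback(...) once with its own HTMLFeedback implementation. Whether ChainedException should exist as a separate base class, as debated above, is left open here.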
From: Somik R. <so...@ya...> - 2002-06-26 02:05:56
> 1) Point me to something that will tell me how to set up CVS to get an update and I'll try to get set up to check things in.

From your other mail, it seems you got CVS to work. You definitely need SSH to check in code - http://cdx.sourceforge.net/win-HOWTO.htm

I was using Tortoise CVS earlier - it's important that you make a checkout once using SSH from your DOS shell. Then you can continue to update and commit using Tortoise CVS.

The better and more elegant option is to use Eclipse - the great free Open Source IDE supported by IBM - it interfaces very cleanly with CVS and SSH (extssh), and you don't need to set up anything.

Let's continue our tech discussions on the developer list.

Cheers,
Somik