Re: [Htmlparser-developer] toPlainTextString() feedback requested
From: Somik R. <so...@ya...> - 2002-12-28 08:19:10
Hi Claude,

> I think it may well be that the group's size is hitting critical mass and that minimal formalities are now required...

Good to hear from you. I quite agree that this project seems to be reaching critical mass. We need to share a plan of action that we can all work on in parallel.

First, I've been noticing that we're not consistent in our coding convention. In the past, if I saw a class that did not follow the convention, I'd go ahead and fix it. However, as so many people are now working in parallel, it's important that we all share the same coding convention. The one I follow is the standard Java coding convention. In addition, I particularly don't like names with single-character prefixes - i.e. I'd prefer "tag" to "pTag". That is the convention followed in most of the parser. The coding convention is of course open for discussion if anyone feels that we should accommodate any particular practice.

Second, I am planning to simplify the design of the scanners. As Dhaval mentioned earlier, and as Sam mentioned, it wasn't easy to write a new scanner. I agree that the javadoc of getParsed() should be better than it is. However, I think the problem lies with the name itself. That is, if we rename "parsed" to "parameterTable" or "attributeTable" - so that getParsed() becomes getAttributeTable() - the javadoc would become redundant. You could say that this is a "documentation" change, or you could call it a "refactoring". Either way, it is friendly to both developers and users. So I think Sam's suggestions and our current direction are one and the same.

Third, there's way too much duplicate code in the tags. Almost every tag runs through its attributes to render itself as html - whereas this can be done from the superclass, HTMLTag. This is just one example. I've not been focussing much on reducing duplication - and we can see serious code smells.
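
A rough sketch of the second and third points might look like the following. It assumes simplified stand-in classes rather than the parser's actual code: the `HTMLTag` shown here is a cut-down illustration, `ExampleLinkTag` and both constructors are hypothetical, and generics are used only for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the tag superclass (not the real class).
class HTMLTag {
    private final String tagName;
    // Formerly "parsed"; the new name says what the table holds.
    private final Map<String, String> attributeTable = new LinkedHashMap<>();

    HTMLTag(String tagName) {
        this.tagName = tagName;
    }

    // Replaces getParsed(); the name makes extra javadoc largely unnecessary.
    public Map<String, String> getAttributeTable() {
        return attributeTable;
    }

    // Attribute rendering lives here once, so subclasses do not repeat the loop.
    public String toHTML() {
        StringBuilder html = new StringBuilder("<").append(tagName);
        for (Map.Entry<String, String> attribute : attributeTable.entrySet()) {
            html.append(' ').append(attribute.getKey())
                .append("=\"").append(attribute.getValue()).append('"');
        }
        return html.append('>').toString();
    }
}

// Hypothetical subclass: it only fills in attributes and inherits the rendering.
class ExampleLinkTag extends HTMLTag {
    ExampleLinkTag(String href) {
        super("A");
        getAttributeTable().put("HREF", href);
    }
}
```

With this shape, a new tag class only has to populate its attribute table; the html rendering comes from the superclass.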
However, I think it is important to be able to quantify what we hope to achieve by undertaking several rounds of refactoring. As we keep refactoring, I will keep tabs on the size of the parser (using Lines of Code). If there's any other metric that people think is important, we could include that as well. It will be interesting to see how much code we can take out.

Fourth, this is one project where change is welcome - all the time. That is simply because it has a massive suite of tests which represents most of the user requirements that we've received to date. The real value in the parser is not really its production code - but its solid test suite. If there are interesting design changes that will make the current system simpler, they should be tried out without fear - for if we break anything, we will know because of the automated testing. It is therefore important that the test suite be regarded as just as important as the production code (if not more so), and that steps be taken to improve its design and make it very easy to add new tests all the time.

Fifth, as the project grows, new features should keep coming in all the time. Derrick Oswald has been working away at so many cool new features - and he's been writing tests for them all. This is a much-appreciated activity and must continue. I'd be happy to not think about new features and completely focus on solidifying the existing system, and let Derrick focus on adding new stuff for v1.3.

Sixth, we have a very serious possibility of using AI for tag recognition. I have to find the time to write about the current recognition mechanism, and pass that on to Sam Joseph. Sam has considerable experience in artificial intelligence, and it will be good for the project to have his expertise in its core correction logic.

Seventh, it might be good to have a Wiki for the parser - as so many open source projects do. That way, the entire burden of documentation is not on any one person.
We should look at options for having our own Wiki so we can add content easily and collaboratively.

Please feel free to add to or modify this list if I've missed anything.

Regards,
Somik