Re: [Htmlparser-developer] toPlainTextString() feedback requested
From: Sam J. <ga...@yh...> - 2002-12-28 09:25:17
Hi Somik

Somik Raha wrote:
> Second, I am planning to simplify the design of the scanners. As Dhaval
> mentioned earlier, and as Sam mentioned, it wasn't easy to write a new
> scanner. I agree that the javadoc of getParsed() should be better than what
> it is. However, I think the problem lies with the name itself. In the sense,
> if we rename "parsed" to "parameterTable" or "attributeTable" - and
> getParsed() becomes getAttributeTable() - the javadoc would become
> redundant. You could say that this is a "documentation" change, or you could
> call this a "refactoring". Either way, it is developer and user-friendly. So
> I think Sam's suggestions and our current direction are one and the same.

I think that just changing the name of the method is not quite enough. Changing the name is certainly a good step, but javadoc comments that actually show an example of what is being returned make a huge usability difference, particularly when the thing being returned is something as complex as a hashtable. If we want to call changing method names and adding documentation "refactoring", then I guess I am a huge fan of refactoring :-)

However, the other part of what I was saying is that I would like to see the documentation additions before we change method names. You can deprecate the old names so as not to break my existing code, but I still think you should release a version with a full javadoc "refactoring" before you release anything with refactored code. What's the harm in focusing on the documentation first? We're on version 1.2 now, right? What about a version 1.2.1 that includes the documentation fixes, before moving on to a 1.3-beta that includes the newly refactored code you're so keen to start work on? What would be the downside of proceeding in this fashion?

> Sixth, we have a very serious possibility of using AI in making tag
> recognitions.
> I have to find the time to write about the current recognition
> mechanism, and pass that on to Sam Joseph. Sam has considerable experience
> in artificial intelligence, and it will be good for the project to have his
> expertise in its core correction logic.

This sounds like an interesting possibility, but I still need to understand how the current parser handles all the existing messy tags. When you started talking through all the messy HTML examples, I thought you were going to show me something that didn't work and that you wanted some machine learning/AI to fix, but you ended up saying that all the examples could be handled by the existing parser. If that is so, why do you want to change it? Are there examples of messy code that can't be handled by the parser? If there aren't, and it's all working fine, what advantage do we gain by replacing the existing core correction logic with some hypothetical AI alternative?

> Seventh, it might be good to have a Wiki for the parser - as so many open
> source projects do. That way, the entire burden of documentation is not on
> any one person. We should be looking at options of having our own Wiki so we
> can add content easily and collaboratively.

You are most welcome to use the wiki I have set up, in the short term:

http://www.neurogrid.net/devwiki/wikipages/HtmlParser

It uses an open source Java wiki (devwiki), which is not as fully functional as it might be, but I've got it set up so that everything gets backed up to CVS and all changes get sent to the neurogrid-cvs mailing list. If nothing else, it might serve as an example wiki environment.

CHEERS>
SAM
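P.S. To make concrete the kind of javadoc example I'm asking for: something like the sketch below, showing what the returned hashtable actually contains for a sample tag. The key/value conventions here are just my guesses - the getAttributeTable() name and the reserved tag-name key are hypothetical, not the parser's actual contract.

```java
import java.util.Hashtable;

// Hypothetical sketch of the table a renamed getAttributeTable() might
// return for the tag <A HREF="http://example.com" TARGET="_blank">.
// The "TAGNAME" key and the upper-cased attribute names are assumptions
// for illustration only, not the parser's documented behaviour.
public class AttributeTableExample {

    public static Hashtable attributeTableForLink() {
        Hashtable table = new Hashtable();
        table.put("TAGNAME", "A");                // assumed: tag name stored under a reserved key
        table.put("HREF", "http://example.com");  // attribute name -> attribute value
        table.put("TARGET", "_blank");
        return table;
    }

    public static void main(String[] args) {
        Hashtable table = attributeTableForLink();
        System.out.println(table.get("HREF"));
        System.out.println(table.get("TARGET"));
    }
}
```

An example like this in the javadoc would tell a scanner author at a glance which keys to expect, which is exactly the information that is hard to discover from the method name alone.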