RE: [Htmlparser-developer] toPlainTextString() feedback requested
From: <dha...@or...> - 2002-12-26 09:32:50
Hi,

I agree with Sam when he says "don't fix it when it's not broken". But at the same time I see the need to make code better, more readable and simpler. However, as suggested below, it seems that the Visitor pattern is going to make things difficult to understand.

Like Sam, I too had a problem with HTMLParser initially. It may be down to the documentation, but this whole tag-scanner thing was extremely confusing in the beginning (though I have since come to like scanners so much that I have even written a few myself without any problem).

Hence I think that even if the Visitor pattern is used, the user must have some simple, easy-to-use methods to get his work done rather than having to understand the Visitor pattern.

Just like Sam, this was my two cents.

Cheers,
Dhaval

-----Original Message-----
From: gaijin [mailto:ga...@yh...]
Sent: Tuesday, December 24, 2002 12:21 PM
To: htmlparser-developer
Cc: gaijin; htmlparser-user
Subject: Re: [Htmlparser-developer] toPlainTextString() feedback requested

Hi Somik and Joshua

Joshua Kerievsky wrote:

>>Could you explain why you want to refactor these methods? Remember the
>>danger of premature refactoring ... you lose flexibility that then has
>>to be re-added later on, making more work in the long run.
>
>There's a good deal of duplicate code in the way the two toHTML methods and the
>toPlainTextString method do their work. The central theme is information
>accumulation/alteration. That involves outputting tag and node results and
>recursing through tags. The refactoring to Visitor allows us to
>
>* remove many lines of duplicate code, spread across many classes
>* remove hard-coded accumulation/alteration logic, thereby making it easier
>for clients to get the data they need
>
>Visitor takes some getting used to. I rarely use the pattern. In this case,
>IMO, it was a good fit.

The Visitor pattern sounds interesting, and I look forward to hearing more about it. However, duplicated code itself is not IMO necessarily an evil.
It all depends on whether one thinks that the duplicated components are going to diverge in functionality in the future. If you are sure they are not, then fine, refactor away.

I guess my surprise at your (or perhaps Somik's) focus on refactoring comes from the fact that while the htmlparser is a great piece of software, the javadocs and other documentation could use some attention. For example, I can't find any explanation in the javadocs or elsewhere of how the filters are supposed to work with the different scanners, or what values they are allowed to take.

I generally work to "if it's not broken don't fix it", but I often add "before you start fixing it, make sure your documentation is up to date". Using the Visitor pattern may make it easier for clients to get the data they need, but given that the htmlparser is "working" (well, it works for me), I would say that the more urgent issue here is making sure all the documentation is up to date.

I have a lot of positive things to say about htmlparser, so don't take it the wrong way when I say that the biggest problem I've had in using it in the last few weeks is inadequate javadocs.

>>Is there some efficiency reason why you want to refactor these methods
>>or is it just for neatness?
>
>Duplication removal is reason #1.

As I mention above, one should be careful of removing duplication for its own sake.

>Removal of hard-coded logic is reason #2.

This is a good reason. However, I get the feeling that the introduction of these Visitor classes will make the system conceptually more difficult to use rather than easier. I would feel better if the current setup were more fully documented before more complexity was added.

And even if the Visitor pattern is used, I would recommend leaving methods like toPlainTextString() etc. in place, but just making them shortcut implementations of certain kinds of visitor-using methods. This will give people who have yet to grasp the Visitor pattern something to work with.
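The idea of keeping toPlainTextString() as a shortcut over a visitor could be sketched roughly as follows. This is only an illustration under stated assumptions: apart from the method name toPlainTextString(), every class, interface and method here (Node, TagNode, TextNode, NodeVisitor, PlainTextVisitor, HtmlVisitor, accept) is hypothetical and is not taken from the actual htmlparser code.

```java
import java.util.ArrayList;
import java.util.List;

// All names below are hypothetical; this is NOT the real htmlparser API.
interface NodeVisitor {
    void visitText(TextNode text);
    void visitTag(TagNode tag);   // called before the tag's children
    void leaveTag(TagNode tag);   // called after the tag's children
}

abstract class Node {
    // The single "raw" entry point that every visitor uses.
    abstract void accept(NodeVisitor visitor);

    // Convenience shortcut: callers never need to know a visitor exists.
    String toPlainTextString() {
        PlainTextVisitor visitor = new PlainTextVisitor();
        accept(visitor);
        return visitor.getText();
    }
}

final class TextNode extends Node {
    final String text;
    TextNode(String text) { this.text = text; }
    @Override void accept(NodeVisitor visitor) { visitor.visitText(this); }
}

final class TagNode extends Node {
    final String name;
    final List<Node> children = new ArrayList<>();
    TagNode(String name, Node... kids) {
        this.name = name;
        for (Node kid : kids) children.add(kid);
    }
    // The recursion through child nodes lives in exactly one place.
    @Override void accept(NodeVisitor visitor) {
        visitor.visitTag(this);
        for (Node child : children) child.accept(visitor);
        visitor.leaveTag(this);
    }
}

// Accumulates only the text content and ignores the tags.
final class PlainTextVisitor implements NodeVisitor {
    private final StringBuilder out = new StringBuilder();
    public void visitText(TextNode text) { out.append(text.text); }
    public void visitTag(TagNode tag) { }
    public void leaveTag(TagNode tag) { }
    String getText() { return out.toString(); }
}

// A second accumulation strategy that reuses the same traversal.
final class HtmlVisitor implements NodeVisitor {
    private final StringBuilder out = new StringBuilder();
    public void visitText(TextNode text) { out.append(text.text); }
    public void visitTag(TagNode tag) { out.append('<').append(tag.name).append('>'); }
    public void leaveTag(TagNode tag) { out.append("</").append(tag.name).append('>'); }
    String getHtml() { return out.toString(); }
}

class VisitorSketch {
    public static void main(String[] args) {
        Node tree = new TagNode("p", new TextNode("Hello "),
                new TagNode("b", new TextNode("world")));
        System.out.println(tree.toPlainTextString()); // prints "Hello world"
        HtmlVisitor html = new HtmlVisitor();
        tree.accept(html);
        System.out.println(html.getHtml()); // prints "<p>Hello <b>world</b></p>"
    }
}
```

In a sketch like this, a new user only ever calls toPlainTextString(), while someone who needs a different kind of output writes another visitor and calls accept() directly.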
If you are keen to see lots of people using htmlparser, I think that you don't want people to have to come to terms with too many new concepts at once. You say yourself that the Visitor pattern takes some getting used to. I think the whole scanner concept takes some getting used to ....

>Simplicity is reason #3: there is little reason to fatten the interfaces of
>tag and node classes with various data accumulation/alteration methods when
>one method and a variety of concrete Visitors can do the job with much less
>code.

Well, I would agree if you could guarantee that there will be no divergence whatsoever in how the different methods will be used. If you can create a flexible enough implementation of the Visitor pattern, then I guess that will support any possible divergence in the separate methods.

However, I think there is a reason to have a fatter interface, in that convenience methods lower the barrier to entry for new users. Perhaps ideally one has a well implemented Visitor pattern that supports raw method access, plus a number of convenience methods?

A well implemented Visitor pattern will, I assume, support all sorts of different operations, but I would feel much happier if the htmlparser had a complete javadoc and documentation review before any refactoring took place. People are trying to use the existing system and having trouble not because of a lack of refactoring, but because of a lack of well described methods. Well, I say people, I mean me; I don't know if anyone else feels the same. Maybe it's just me :-)

Just my two cents.

CHEERS> SAM

_______________________________________________
Htmlparser-developer mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-developer