Re: [Htmlparser-developer] future directions


Somik,

I guess I'm flattered that you think highly of my contributions, but I 
stand on the shoulders of giants.  You're probably feeling burnt out 
after two years and want to get your life back.  I suggest this.  If no 
one else comes forward, grant me the Project Manager role and forget 
about it for the summer (if you can).  I'll try to be the "Somik" by 
doing the builds, answering email, performing triage and fixing bugs, 
contributing to discussions, and so on, but it's unlikely that the 
quality will be as good. I'm not naive enough to think I'm the one with 
the vision to move it forward.  If it's a rest you want, I can only 
promise a short respite.  We'll both consider it again in September, and 
if you still feel burnt out, you can drop your Project Manager role 
altogether, in favour of me or somebody else.  Or, if I haven't 
completely misread you, you can pick it up again after your sabbatical.

p.s.
I didn't mean to imply that the existing testcases should be discarded. 
No, they are excellent.
But do they have good code coverage? Do they represent a significant 
portion of valid HTML constructs? Can they be said to probe the invalid 
HTML space in a meaningful way? The only way to get that kind of breadth 
is via automation. Perhaps the Fit framework can be used as the front end 
for it, but nobody is going to hand-craft thousands of tests.
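
To make the automation point concrete, here is a minimal sketch of the
kind of generator I have in mind. Everything in it is hypothetical: the
HtmlVariantGenerator class and its variants() method do not exist in
htmlparser, and it only produces input strings; feeding them to the
parser and checking the result is left out. It simply enumerates tag
case, attribute quoting style and presence of the end tag for one seed
element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of htmlparser): enumerates valid and
// invalid spellings of a single element with one attribute.
final class HtmlVariantGenerator {
    // Opening/closing quote pairs: double, single, none, and an
    // unterminated double quote as found in real-world pages.
    private static final String[][] QUOTES = {
        {"\"", "\""}, {"'", "'"}, {"", ""}, {"\"", ""}
    };

    static List<String> variants(String tag, String attr, String value, String body) {
        List<String> out = new ArrayList<>();
        String[] names = {
            tag.toLowerCase(Locale.ROOT), tag.toUpperCase(Locale.ROOT)
        };
        for (String name : names) {
            for (String[] q : QUOTES) {
                for (boolean closed : new boolean[] {true, false}) {
                    StringBuilder sb = new StringBuilder();
                    sb.append('<').append(name).append(' ')
                      .append(attr).append('=')
                      .append(q[0]).append(value).append(q[1])
                      .append('>').append(body);
                    if (closed) {
                        // Omitting this end tag gives the "invalid" variants.
                        sb.append("</").append(name).append('>');
                    }
                    out.add(sb.toString());
                }
            }
        }
        return out;
    }
}
```

Calling HtmlVariantGenerator.variants("a", "href", "x.html", "link")
yields 16 strings from one seed. Adding more dimensions (whitespace,
nesting, stray characters) multiplies that quickly, which is the whole
argument for generating the cases rather than writing them by hand.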

Derrick

Somik Raha wrote:

><snip>
>
>
>The testcases in the parser are anything but ad-hoc. I have specifically
>avoided going down the path of the DTD. Lots of parsers exist which follow
>the DTD to the letter and fail miserably on real-world html. It is not
>difficult to construct a "correct" htmlparser using a grammar and javacc.
>What differentiates us is that we have a massive testcase database from
>real-world html which took almost 3 years to accumulate. It is this alone
>that gives us the confidence to totally change the design at any given time
>and still know that we're doing it right. This is what allowed me to
>redesign the CompositeTagScanner without breaking anything. We are in a
>position where we can go ahead and redesign whatever we wish.
>
>With this release, I have tried to incorporate the Fit framework
>(http://fit.c2.com) which makes writing tests a breeze. Now, to write new
>tests, you only need an HTML editor like Netscape Composer or M$ Word and
>you can run them! I have made some tests for the AttributeParser (and I
>already found a bug).
>
>I think we can plan on making release 1.3 very soon - and probably decide
>to seal it from now. My recommended to-do list for 1.3 is :
>[1] Redesign StringParser
>[2] Redesign AttributeParser
>[3] Redesign NodeReader
>
>These three look particularly scary right now.
>
>On to project management issues, I am planning to step down from the role of
>the project lead. Your contribution has been substantial and has added so
>much value to the parser. You also have a solid vision to take this project
>to v1.4 - I really like your ideas about semantic capabilities and have been
>thinking on the same lines. I hope you will volunteer for this
>responsibility. There is not much extra work involved - the build scripts
>have been there for a while - all you have to do is make releases whenever
>you feel (one week is usually good). I will of course be there to help in
>whatever way I can.
>
>Regards,
>Somik