[Htmlparser-developer] lexer integration

Fixed up the serializability.

TODO
=====

TagData
-------
This has been reworked to allow it to limp along under the new system,
but it should really be removed. I think the reason for it (reducing the
number of arguments to tag constructors) no longer applies, and a lot of
the code would be easier to read if Tag were more bean-like, with a
zero-argument constructor and appropriate accessors.
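
To make the bean-like idea concrete, here is a rough sketch of what such a tag might look like. BeanTag and its properties (name, start and end positions, attributes) are hypothetical names chosen for illustration, not the current Tag API:

```java
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bean-style tag: a zero-argument constructor plus accessors,
// instead of a constructor that takes a TagData bundle. Illustrative only.
class BeanTag implements Serializable {
    private static final long serialVersionUID = 1L;

    private String name = "";
    private int startPosition;
    private int endPosition;
    // LinkedHashMap keeps attributes in the order they were set.
    private final Map<String, String> attributes = new LinkedHashMap<String, String>();

    public BeanTag() {
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getStartPosition() { return startPosition; }
    public void setStartPosition(int startPosition) { this.startPosition = startPosition; }

    public int getEndPosition() { return endPosition; }
    public void setEndPosition(int endPosition) { this.endPosition = endPosition; }

    // Returns null when the attribute has not been set.
    public String getAttribute(String key) { return attributes.get(key); }
    public void setAttribute(String key, String value) { attributes.put(key, value); }
}

class BeanTagDemo {
    public static void main(String[] args) {
        BeanTag tag = new BeanTag();
        tag.setName("A");
        tag.setStartPosition(10);
        tag.setEndPosition(32);
        tag.setAttribute("HREF", "index.html");
        System.out.println(tag.getName() + " " + tag.getAttribute("HREF"));
    }
}
```

With this shape a scanner would create the tag first and fill in only the properties it knows about, so adding a property would not mean touching every constructor call.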

Helpers
-------
I desperately want to get rid of these 'helper' classes. They just
obfuscate the code.

Node Factory
------------
The factory concept needs to be extended with a TagFactory (extending
NodeFactory) that declares a creation method for every type of tag, and
all the scanners then need to use it to create their specific tags.
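
As a minimal sketch of that arrangement: the interfaces below are simplified stand-ins, and every method name and signature in them (including the NodeFactory methods) is an assumption made for illustration rather than the project's real API.

```java
// Minimal node type, for illustration only.
class Node {
    final String kind;
    final String value;

    Node(String kind, String value) {
        this.kind = kind;
        this.value = value;
    }

    public String toString() {
        return kind + "(" + value + ")";
    }
}

// Simplified stand-in for the project's NodeFactory; real signatures may differ.
interface NodeFactory {
    Node createStringNode(String text);
    Node createTagNode(String name);
}

// Hypothetical extension: one creation method per specific tag type.
interface TagFactory extends NodeFactory {
    Node createLinkTag(String href);
    Node createImageTag(String src);
}

// Default implementation; a user could substitute one that returns custom tags.
class DefaultTagFactory implements TagFactory {
    public Node createStringNode(String text) { return new Node("string", text); }
    public Node createTagNode(String name) { return new Node("tag", name); }
    public Node createLinkTag(String href) { return new Node("link", href); }
    public Node createImageTag(String src) { return new Node("image", src); }
}

// A scanner asks the factory for its tag instead of calling a constructor.
class LinkScanner {
    private final TagFactory factory;

    LinkScanner(TagFactory factory) {
        this.factory = factory;
    }

    Node scan(String href) {
        return factory.createLinkTag(href);
    }
}

class TagFactoryDemo {
    public static void main(String[] args) {
        LinkScanner scanner = new LinkScanner(new DefaultTagFactory());
        System.out.println(scanner.scan("index.html"));
    }
}
```

Because the scanners only see the TagFactory interface, swapping in a different factory changes which tag classes get built without changing any scanner.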

Scanners
--------
The scanners may not be working; it's hard to tell without the unit
tests running. I'm not sure that CompositeTagScanner is completely right
yet. It probably needs to be reworked based on the lexer.

Unit Tests
----------
As mentioned, many of the unit tests expect toHtml() to produce
capitalized and rearranged output, and the node counts passed to
parseAndAssertNodeCount() do not allow for so many whitespace nodes.
These need to be addressed.

Documentation
-------------
As of now, it's more likely that the javadocs are lying to you than 
providing any helpful advice. This needs to be reworked completely.

As you can see, there's lots of work to do, so anyone with a death wish
can jump in.  I'll be working my way from top to bottom of the TODO list,
committing and notifying the developer list after each item.  So
go ahead and do a checkout from CVS and jump in the middle with anything
that appeals. Keep the list posted and update your CVS tree often (or
subscribe to the htmlparser-cvs mailing list for interrupt-driven
notification rather than polled notification).