Thread: RE: [Htmlparser-developer] lexer integration - added back visitEndTag


Regarding your note about having TagFactory have signatures for all
possible tags... how will TagFactory be extended to account for new,
user-defined tags?  Is it intended to be user-extensible?

Thanks for the great work!

Sincerely,
James.

-----Original Message-----
From: Derrick Oswald [mailto:Der...@Ro...]
Sent: Sunday, September 28, 2003 12:33 PM
To: htm...@li...
Subject: [Htmlparser-developer] lexer integration - added back
visitEndTag

Fixed up the broken visitor logic.
Added some docos on NodeVisitor.

TODO
=====

Serializable
--------------
The Parser needs to be made serializable again. This involves a
transient field down on the Source, I think, rather than having the
whole Lexer transient in the Parser.
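
A minimal sketch of that idea, assuming the only non-serializable state
is a stream handle held by the Source. SketchSource, SketchLexer and
SketchParser are hypothetical stand-ins, not the real htmlparser
classes; the transient reader is rebuilt lazily from the saved position
after deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Reader;
import java.io.Serializable;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Source: only the stream handle is transient.
class SketchSource implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String text;        // serialized
    private int position;             // serialized: characters consumed so far
    private transient Reader reader;  // skipped by serialization, rebuilt lazily

    SketchSource(String text) {
        this.text = text;
    }

    int read() {
        try {
            if (reader == null) {     // first use, or just deserialized
                reader = new StringReader(text);
                reader.skip(position);
            }
            int c = reader.read();
            if (c != -1) {
                position++;
            }
            return c;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hypothetical Lexer: fully serializable because its Source is.
class SketchLexer implements Serializable {
    private static final long serialVersionUID = 1L;
    private final SketchSource source;

    SketchLexer(SketchSource source) {
        this.source = source;
    }

    int nextChar() {
        return source.read();
    }
}

// Hypothetical Parser: the lexer field no longer needs to be transient.
class SketchParser implements Serializable {
    private static final long serialVersionUID = 1L;
    private final SketchLexer lexer;

    SketchParser(String html) {
        this.lexer = new SketchLexer(new SketchSource(html));
    }

    int nextChar() {
        return lexer.nextChar();
    }

    // Serializes the parser to memory and reads it back.
    static SketchParser roundTrip(SketchParser parser) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(parser);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SketchParser) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this shape a deserialized parser resumes where the original left
off, since the position travels with the Source and only the reader is
recreated.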

TagData
-------
This has been reworked to allow it to limp along under the new system,
but it should really be removed. I think the reason for it (reduce the
number of arguments to tag constructors) no longer applies, and a lot of
the code could be easier to read if the Tag was more bean-like and had a
zero args constructor with appropriate accessors.
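
As a rough illustration of what "bean-like" could mean here, the sketch
below shows a tag with a zero-argument constructor and plain accessors.
SketchTag and its property names are assumptions made up for this
example; they are not the actual Tag fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bean-style tag: no TagData argument bundle, just properties.
class SketchTag {
    private String name = "";
    private int startPosition;
    private int endPosition;
    // Insertion order is kept so attributes come back in source order.
    private final Map<String, String> attributes = new LinkedHashMap<>();

    SketchTag() {
        // zero-argument constructor: callers configure through setters
    }

    String getName() {
        return name;
    }

    void setName(String name) {
        this.name = name;
    }

    int getStartPosition() {
        return startPosition;
    }

    void setStartPosition(int startPosition) {
        this.startPosition = startPosition;
    }

    int getEndPosition() {
        return endPosition;
    }

    void setEndPosition(int endPosition) {
        this.endPosition = endPosition;
    }

    String getAttribute(String key) {
        return attributes.get(key);
    }

    void setAttribute(String key, String value) {
        attributes.put(key, value);
    }
}
```

A scanner would then create the tag first and fill in only the
properties it knows about, instead of passing a long argument list to a
constructor.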

Helpers
-------
I desperately want to get rid of these 'helper' classes. They are just
obfuscating the code.

Node Factory
------------
The factory concept needs to be extended with a TagFactory (extending
NodeFactory) that has the signatures for creating all the possible types
of tags there are, and then this needs to be used by all the scanners to
create their specific tags.
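
One possible shape for this, as a sketch only: every name and method
signature below is hypothetical and chosen for illustration, not taken
from the real NodeFactory. The point is that a scanner asks the factory
for its tag rather than constructing it directly, so a subclass of the
factory can substitute its own tag types.

```java
// Hypothetical node type, reduced to the one method the example needs.
interface SketchNode {
    String toHtml();
}

// Hypothetical base factory for the generic node kinds.
interface SketchNodeFactory {
    SketchNode createStringNode(String text);
    SketchNode createRemarkNode(String text);
    SketchNode createTagNode(String name);
}

// Hypothetical TagFactory: one creation method per specific tag type.
interface SketchTagFactory extends SketchNodeFactory {
    SketchNode createLinkTag(String href);
    SketchNode createImageTag(String src);
}

// Default implementation; users could subclass it to override single methods.
class DefaultSketchTagFactory implements SketchTagFactory {
    public SketchNode createStringNode(String text) {
        return () -> text;
    }

    public SketchNode createRemarkNode(String text) {
        return () -> "<!--" + text + "-->";
    }

    public SketchNode createTagNode(String name) {
        return () -> "<" + name + ">";
    }

    public SketchNode createLinkTag(String href) {
        return () -> "<a href=\"" + href + "\">";
    }

    public SketchNode createImageTag(String src) {
        return () -> "<img src=\"" + src + "\">";
    }
}

// Hypothetical scanner: it never calls a tag constructor itself.
class SketchLinkScanner {
    private final SketchTagFactory factory;

    SketchLinkScanner(SketchTagFactory factory) {
        this.factory = factory;
    }

    SketchNode scan(String href) {
        return factory.createLinkTag(href);
    }
}
```

A fixed interface like this covers only the tag types known when it was
written; tags defined later would need either a subclass that adds
methods or some more generic creation method.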

Scanners
--------
The scanners may not be working; it's hard to tell without the unit
tests running. I'm not sure that CompositeTagScanner is completely all
right yet; it probably needs to be reworked based on the lexer.

Unit Tests
----------
As mentioned, many of the unit tests expect toHtml() to produce
capitalized and rearranged output. And parseAndAssertNodeCount() is
expected not to include so many whitespace nodes. These need to be
addressed.

Documentation
-------------
As of now, it's more likely that the javadocs are lying to you than
providing any helpful advice. This needs to be reworked completely.

As you can see there's lots of work to do, so anyone with a death wish
can jump in.  I'll be working my way from top to bottom of the TODO list
and committing and notifying the developer list after each of them.  So
go ahead and do a take from CVS and jump in the middle with anything
that appeals. Keep the list posted and update your CVS tree often (or
subscribe to the htmlparser-cvs mailing list for interrupt-driven
notification rather than polled notification).

_______________________________________________
Htmlparser-developer mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-developer