[Htmlparser-developer] lexer integration
From: Derrick O. <Der...@Ro...> - 2003-09-29 17:38:09
Fixed up the serializability.

TODO
=====

TagData
-------
This has been reworked to allow it to limp along under the new system,
but it should really be removed. I think the reason for it (reducing the
number of arguments to tag constructors) no longer applies, and a lot of
the code would be easier to read if the Tag were more bean-like and had a
zero-args constructor with appropriate accessors.

Helpers
-------
I desperately want to get rid of these 'helper' classes. They are just
obfuscating the code.

Node Factory
------------
The factory concept needs to be extended with a TagFactory (extending
NodeFactory) that has the signatures for creating all the possible types
of tags, and this then needs to be used by all the scanners to create
their specific tags.

Scanners
--------
The scanners may not be working; it's hard to tell without the unit tests
running. I'm not sure that CompositeTagScanner is completely right yet.
It probably needs to be reworked based on the lexer.

Unit Tests
----------
As mentioned, many of the unit tests expect toHtml() to produce
capitalized and rearranged output, and parseAndAssertNodeCount() is
expected not to include so many whitespace nodes. These need to be
addressed.

Documentation
-------------
As of now, it's more likely that the javadocs are lying to you than
providing any helpful advice. This needs to be reworked completely.

As you can see, there's lots of work to do, so anyone with a death wish
can jump in. I'll be working my way from top to bottom of the TODO list,
committing and notifying the developer list after each item. So go ahead
and do a take from CVS and jump in the middle with anything that appeals.
Keep the list posted and update your CVS tree often (or subscribe to the
htmlparser-cvs mailing list for interrupt-driven notification rather than
polled notification).
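
To make the TagData and Node Factory items above more concrete, here is a
minimal sketch of what a bean-like tag and a TagFactory might look like.
It is only an illustration under assumptions: Node, Tag, LinkTag and
ImageTag are simplified stand-ins written for this example, and every
method signature, as well as DefaultTagFactory and LinkScanner, is
hypothetical rather than the actual htmlparser API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for a parsed node (assumption, not the real class).
class Node {
    private int startPosition;
    private int endPosition;

    int getStartPosition() { return startPosition; }
    void setStartPosition(int position) { startPosition = position; }
    int getEndPosition() { return endPosition; }
    void setEndPosition(int position) { endPosition = position; }
}

// Bean-like tag: zero-args constructor plus accessors, no TagData bundle.
class Tag extends Node {
    private String name = "";
    private final Map<String, String> attributes = new LinkedHashMap<>();

    Tag() { }

    String getName() { return name; }
    void setName(String name) { this.name = name; }
    String getAttribute(String key) { return attributes.get(key); }
    void setAttribute(String key, String value) { attributes.put(key, value); }
}

class LinkTag extends Tag {
    LinkTag() { setName("A"); }
    String getLink() { return getAttribute("HREF"); }
}

class ImageTag extends Tag {
    ImageTag() { setName("IMG"); }
}

// Hypothetical, cut-down factory for generic nodes.
interface NodeFactory {
    Tag createTagNode(int start, int end);
}

// The proposed extension: one creation signature per specific tag type.
interface TagFactory extends NodeFactory {
    LinkTag createLinkTag(int start, int end);
    ImageTag createImageTag(int start, int end);
}

class DefaultTagFactory implements TagFactory {
    // Fills in the position properties shared by every node type.
    private static <T extends Node> T place(T node, int start, int end) {
        node.setStartPosition(start);
        node.setEndPosition(end);
        return node;
    }

    public Tag createTagNode(int start, int end) { return place(new Tag(), start, end); }
    public LinkTag createLinkTag(int start, int end) { return place(new LinkTag(), start, end); }
    public ImageTag createImageTag(int start, int end) { return place(new ImageTag(), start, end); }
}

// A scanner asks the factory for its tag instead of constructing it directly.
class LinkScanner {
    private final TagFactory factory;

    LinkScanner(TagFactory factory) { this.factory = factory; }

    LinkTag scan(int start, int end, String href) {
        LinkTag tag = factory.createLinkTag(start, end);
        tag.setAttribute("HREF", href);
        return tag;
    }
}

class FactoryDemo {
    public static void main(String[] args) {
        LinkTag link = new LinkScanner(new DefaultTagFactory())
                .scan(0, 28, "http://example.org/");
        System.out.println(link.getName() + " -> " + link.getLink());
    }
}
```

With this shape, a scanner never calls a tag constructor itself, so
swapping in a different TagFactory would change which tag classes get
created without touching the scanners.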