Thread: RE: [Htmlparser-developer] lexer integration - added back visitEndTag
From: Couball, J. <jam...@co...> - 2003-10-07 15:49:50
Regarding your note about having TagFactory have signatures for all
possible tags... how will TagFactory be extended to account for new,
user-defined tags? Is it intended to be user extendable?

Thanks for the great work!

Sincerely,
James.

-----Original Message-----
From: Derrick Oswald [mailto:Der...@Ro...]
Sent: Sunday, September 28, 2003 12:33 PM
To: htm...@li...
Subject: [Htmlparser-developer] lexer integration - added back visitEndTag

Fixed up the broken visitor logic.
Added some docos on NodeVisitor.

TODO
=====

Serializable
--------------
The Parser needs to be made serializable again. This involves a
transient field down on the Source, I think, rather than having the
whole Lexer transient in the Parser.

TagData
-------
This has been reworked to allow it to limp along under the new system,
but it should really be removed. I think the reason for it (reduce the
number of arguments to tag constructors) no longer applies, and a lot of
the code could be easier to read if the Tag was more bean-like and had a
zero-args constructor with appropriate accessors.

Helpers
-------
I desperately want to get rid of these 'helper' classes. They are just
obfuscating the code.

Node Factory
------------
The factory concept needs to be extended with a TagFactory (extending
NodeFactory) that has the signatures for creating all the possible types
of tags there are, and then this needs to be used by all the scanners to
create their specific tags.

Scanners
--------
The scanners may not be working; it's hard to tell without the unit tests
running. I'm not sure that CompositeTagScanner is completely all right
yet. It probably needs to be reworked based on the lexer.

Unit Tests
----------
As mentioned, many of the unit tests expect toHtml() to produce
capitalized and rearranged output. And parseAndAssertNodeCount() is
expected not to include so many whitespace nodes. These need to be
addressed.

Documentation
-------------
As of now, it's more likely that the javadocs are lying to you than
providing any helpful advice. This needs to be reworked completely.

As you can see, there's lots of work to do, so anyone with a death wish
can jump in. I'll be working my way from top to bottom of the TODO list
and committing and notifying the developer list after each of them. So
go ahead and do a take from CVS and jump in the middle with anything
that appeals. Keep the list posted and update your CVS tree often (or
subscribe to the htmlparser-cvs mailing list for interrupt-driven
notification rather than polled notification).

_______________________________________________
Htmlparser-developer mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-developer
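
The Serializable item in the TODO list above proposes marking one field on the Source as transient instead of making the whole Lexer transient in the Parser. A minimal sketch of that general pattern follows. All class names here (SketchSource, SketchLexer, SerializationSketch) are hypothetical stand-ins and not the htmlparser classes, and the assumption that the non-serializable part is a Reader that can be rebuilt from saved text is made only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Reader;
import java.io.Serializable;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a character source; NOT the htmlparser Source class.
class SketchSource implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String text;        // serialized: the underlying characters
    private int position;             // serialized: how many characters were read
    private transient Reader reader;  // skipped by serialization: live stream handle

    SketchSource(String text) {
        this.text = text;
        this.reader = new StringReader(text);
    }

    // Returns the next character, or -1 at end of input.
    int read() {
        try {
            int c = reader.read();
            if (c != -1) {
                position++;
            }
            return c;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int getPosition() {
        return position;
    }

    // Serialization hook: restore the ordinary fields, then rebuild the
    // transient reader and advance it to the saved position.
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        reader = new StringReader(text);
        reader.skip(position);
    }
}

// Hypothetical lexer: serializable as a whole, because the only
// non-serializable state lives in the source's transient field.
class SketchLexer implements Serializable {
    private static final long serialVersionUID = 1L;

    private final SketchSource source;

    SketchLexer(SketchSource source) {
        this.source = source;
    }

    SketchSource getSource() {
        return source;
    }
}

class SerializationSketch {
    // Serializes an object to memory and reads it back.
    static <T extends Serializable> T roundTrip(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this arrangement, calling SerializationSketch.roundTrip(new SketchLexer(source)) yields a copy whose source resumes reading where the original left off, so nothing above the source needs to be declared transient.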
From: Derrick O. <Der...@Ro...> - 2003-10-08 01:13:05
If you've been following the developer threads, Joshua and I are still
thrashing out the details on how that would work ;-)
It will be extendable.

Couball, James wrote:

>Regarding your note about having TagFactory have signatures for all
>possible tags... how will TagFactory be extended to account for new,
>user defined tags? Is it intended to be user extendable?
>
>Thanks for the great work!
>
>Sincerely,
>James.
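
The thread leaves open how a TagFactory would accept user-defined tags. One common way to make a factory extensible is a registry that maps tag names to creator functions. The sketch below shows that approach only as an illustration: it is not the design under discussion on the list, every name in it (SketchTag, SketchLinkTag, SketchWidgetTag, SketchTagFactory, register, createTag) is hypothetical, and it uses Java 8 features that postdate this thread. The tags are bean-like, with a zero-argument constructor and accessors, as the TODO list suggests.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical bean-like tag: zero-argument constructor plus accessors.
class SketchTag {
    private String name = "";

    public SketchTag() {
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

// Hypothetical built-in tag type.
class SketchLinkTag extends SketchTag {
}

// Hypothetical user-defined tag type, unknown to the factory's author.
class SketchWidgetTag extends SketchTag {
}

// Hypothetical registry-based factory; NOT the htmlparser TagFactory.
class SketchTagFactory {
    private final Map<String, Supplier<? extends SketchTag>> creators =
            new HashMap<>();

    SketchTagFactory() {
        // Built-in tags are registered the same way user tags are.
        register("a", SketchLinkTag::new);
    }

    // Lets callers add or override the creator for a tag name.
    void register(String name, Supplier<? extends SketchTag> creator) {
        creators.put(name.toLowerCase(Locale.ROOT), creator);
    }

    // Creates the registered tag type, or a generic tag for unknown names.
    SketchTag createTag(String name) {
        Supplier<? extends SketchTag> creator =
                creators.get(name.toLowerCase(Locale.ROOT));
        SketchTag tag = (creator != null) ? creator.get() : new SketchTag();
        tag.setName(name);
        return tag;
    }
}
```

A scanner would then call factory.createTag("widget") after the user has called factory.register("widget", SketchWidgetTag::new). The trade-off against a factory with one method signature per tag type is that the registry gives up compile-time checking of tag types in exchange for accepting new tags without changing the factory class.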