[Htmlparser-developer] Re: Final Statistics from Trek Run
From: Somik R. <so...@ya...> - 2002-07-11 22:42:37
> The SWT is not a contender for replacing Swing. It may be an alternative, applicable in many circumstances, but a quick look at Sun's Swing Connection should dissuade you from assuming that few people are using Swing.

LOL! I was asking for trouble with that comment :). I guess it's just me that finds Swing unbearably slow.

> I would not endorse trying to make HTMLParser Swing-compatible. These are different animals and should stay that way. The notion of providing a SAX-like interface is interesting but you should look instead toward XML pull-parsers, which are the high-performance alternatives now surfacing more widely. There is a JSR (http://www.jcp.org/jsr/detail/173.jsp) that is trying to unify a good interface for pull-parsing (they're calling it a Streaming API). You'll find this link especially interesting (http://www.xmlpull.org/).

I will look into this advice seriously (will start by educating myself on XML pull-parsers).

> HTMLParser has two fundamental strengths. 1) It's easy to use and extend. 2) It's lightning fast.
>
> Don't lose sight of these distinctions. The whole XML community is struggling to achieve these goals and hasn't quite gotten there yet. There's much to learn from XML, but they are largely moving in this direction.

It's interesting that this should come up - the other day someone was asking me whether the HTMLParser could be used for parsing XML.

> BTW: JTidy is a serious performance bottleneck in a high-performance application.

Good to know that :), haven't checked it out myself yet.

It's great to have a knowledgeable person like you join this parser community. It will be of great value in taking the final steps towards stabilizing the API of the parser. The next integration releases will focus on incorporating your suggestions regarding exception handling. Maybe the first week of Sep might be a realistic date for the release of 1.2 (unless I get loads of time or help).
Regards,
Somik

----- Original Message -----
From: Claude Duguay
To: htm...@li...
Sent: Friday, July 12, 2002 1:29 AM
Subject: RE: [Htmlparser-user] Final Statistics from Trek Run

The SWT is not a contender for replacing Swing. It may be an alternative, applicable in many circumstances, but a quick look at Sun's Swing Connection should dissuade you from assuming that few people are using Swing.

HTMLParser has two fundamental strengths. 1) It's easy to use and extend. 2) It's lightning fast.

Don't lose sight of these distinctions. The whole XML community is struggling to achieve these goals and hasn't quite gotten there yet. There's much to learn from XML, but they are largely moving in this direction.

BTW: JTidy is a serious performance bottleneck in a high-performance application.

-----Original Message-----
From: Somik Raha [mailto:so...@ya...]
Sent: Thursday, July 11, 2002 2:25 AM
To: htm...@li...
Subject: Re: [Htmlparser-user] Final Statistics from Trek Run

Hi Craig,

> For example, the renderer built into Swing's JEditorPane expects callbacks resulting from well-formed HTML with certain (sometimes arbitrary) characteristics. (For example, a <head><title>X</title></head> section must exist, and X cannot be null). It is possible that the formatting of the input HTML into a structure with these characteristics reduces the parser's performance in order to produce a better render.

Indeed - perhaps a good idea would be to rewrite JEditorPane :) - make an open source version, which is better designed. Swing compatibility is a real pain - we gave up on that not so far back :). On the other hand, I was thinking that SAX compliance would be feasible and worth it - I doubt that many people are considering Swing for graphics these days, especially with the SWT being out there. But the SAX mechanism is quite popular and it's worth being able to just switch parsers.
> Of course, whether you need to take these considerations into account depends entirely on your application. The htmlparser seems to lean more toward the extraction of information rather than its representation, and the latter is so fraught with ambiguities as to make it a task of a different order altogether.

So true. Like you had mailed sometime back, JTidy does a good job of that.

Regards,
Somik

----- Original Message -----
From: Craig Raw
To: htm...@li...
Sent: Thursday, July 11, 2002 5:35 PM
Subject: [Htmlparser-user] RE: [Htmlparser-developer] Final Statistics from Trek Run

Just a point to notice on these tests. The htmlparser, for all its merits, is not a direct functional replacement for the Swing parser.

For example, the renderer built into Swing's JEditorPane expects callbacks resulting from well-formed HTML with certain (sometimes arbitrary) characteristics. (For example, a <head><title>X</title></head> section must exist, and X cannot be null). It is possible that the formatting of the input HTML into a structure with these characteristics reduces the parser's performance in order to produce a better render.

Of course, whether you need to take these considerations into account depends entirely on your application. The htmlparser seems to lean more toward the extraction of information rather than its representation, and the latter is so fraught with ambiguities as to make it a task of a different order altogether.

-craig

-----Original Message-----
From: htm...@li... [mailto:htm...@li...] On Behalf Of Somik Raha
Sent: 11 July 2002 02:19 AM
To: htm...@li...; htm...@li...
Subject: Re: [Htmlparser-user] RE: [Htmlparser-developer] Final Statistics from Trek Run

Hi Claude,

Thanks a ton for all these tests. Do you think you could write an article on this that we could put up?
Regards,
Somik
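
Editor's note on the pull-parsing style mentioned in the thread: the JSR 173 effort Claude links to eventually produced the javax.xml.stream API (StAX) that ships with the JDK. The sketch below is only an illustration of what "pulling" events looks like with that API; it is not HTMLParser code, and the class name PullParsingSketch and the method elementNames are hypothetical names chosen for this example. The point it tries to show is that the caller drives the loop and asks for the next event, instead of registering callbacks as with SAX.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; not part of HTMLParser or of the thread.
public class PullParsingSketch {

    // Returns the local names of all start tags, in document order.
    // The caller's loop "pulls" each event from the reader on demand.
    public static List<String> elementNames(String xml) throws XMLStreamException {
        List<String> names = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                int event = reader.next(); // advance to the next parse event
                if (event == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Well-formed input only; StAX is an XML parser and will reject tag soup.
        System.out.println(elementNames("<html><head><title>X</title></head><body/></html>"));
    }
}
```

Because the loop belongs to the caller, it can stop early or skip ahead without the parser's involvement, which is one reason this style is often described as lighter-weight than callback-based parsing. Note that this applies to well-formed XML; lenient handling of real-world HTML, which is what HTMLParser targets, is a separate problem.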