[Htmlparser-developer] RE: [Htmlparser-user] version 1.5
From: John M. <jo...@rt...> - 2004-02-17 18:25:38
Custom tags with namespaces would also be a nice feature, a la
<rte:body></rte:body>. We use those for marking the text that our Lucene
search engine should index. At the moment I am using a simple substring
method to parse out the text between these tags, but having htmlparser
support them out of the box would make things a lot more efficient for
more complex pages with multiple tags.

John

On Tue, 2004-02-17 at 18:11, Marc Novakowski wrote:
> I'm a big fan of server-side transforms. That is, scanning an HTML
> document and transforming parts of it into custom markup and/or DHTML.
> I do this using a servlet filter in Tomcat.
>
> I'm currently using an older version of the library (from 08/24/2003) --
> before the major code changes were made, mostly because I've been too
> busy working on other things to port my code to the new APIs. I hope to
> get to it eventually! :)
>
> However, if you're looking for feedback, then here's what I would find
> useful in the library. It may or may not already do the following to
> certain degrees.
> But if anything in this list can be made easy(ier) then I'm all for it:
>
> - scan an HTML page for "custom" XML/HTML tags embedded within the HTML
> - maintain both the original HTML and the location of the XML "islands"
>   within it
> - provide mechanisms to parse different kinds of custom tags, including
>   the following:
>     - very simple tags (like <br>)
>     - value-only tags (like <a>value</a>)
>     - composite tags (like <ul>)
>     - tags that contain "anything", which the parser simply skips over
>       (similar to <script>, but even dumber so that all it looks for is
>       the closing tag)
> - APIs that allow the definition of the custom tags (above) without
>   having to create a custom scanner and tag class for each one
>
> For illustrative purposes, here's an example of what some of my custom
> tags look like:
>
> <html>
> <body>
> <h2>Here is the chart</h2>
> <Component name="myChart" incorporates="Chart">
>   <String name="backgroundColor" value="white"/>
>   <String name="foregroundColor" value="black"/>
>   <Number name="width" value="200"/>
>   <Number name="height" value="400"/>
>   <Reference name="data" value="dataModel"/>
>   <Method name="changeSize">
>     <Param name="width"/>
>     <Param name="height"/>
>     <Impl>
>       // This is javascript code
>       this.width.set(width);
>       this.height.set(height);
>       this.render();
>     </Impl>
>   </Method>
> </Component>
> <hr>
> blah blah .... (more HTML) ....
>
> </body>
> </html>
>
> Hope this helps!
> Marc
>
> -----Original Message-----
> From: Derrick Oswald [mailto:Der...@Ro...]
> Sent: Tuesday, February 17, 2004 4:40 AM
> To: htm...@li...;
> htm...@li...
> Subject: [Htmlparser-user] version 1.5
>
> Now that version 1.4 is nearly put to bed, it's time to look forward
> into the future to visualize or 'blue sky' the features that could be
> incorporated in the next version of the parser.
> There are a small number
> of feature requests that have accumulated over the last few months that
> can serve as a starting point:
> http://sourceforge.net/tracker/?group_id=24399&atid=381402
>
> But what is really required are some real use-cases that aren't
> addressed by the current parser, which will lead to real requirements,
> which lead to real features that can be added to the parser for the next
> version. What does everyone do with the htmlparser that could be built
> into it? Or more to the point, what capabilities are lacking that cause
> a developer to *not* use htmlparser and do it themselves some other way?
> Does anybody have any ideas? Does anybody have some applications they
> would like to add to the htmlparser codebase so that 'out-of-the-box' it
> does what they want? In general, what directions should development
> take, i.e. HTML correction or editing, XML, robots, server side
> transforms etc.? Has anybody got some pet peeves they want cleared up?
> Come on, give it up. Now's the time.
>
> Derrick
>
> -------------------------------------------------------
> SF.Net is sponsored by: Speed Start Your Linux Apps Now.
> Build and deploy apps & Web services for Linux with
> a free DVD software kit from IBM. Click Now!
> http://ads.osdn.com/?ad_id=1356&alloc_id=3438&op=click
> _______________________________________________
> Htmlparser-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-user

--
John Moylan
----------------------
ePublishing
Radio Telefis Eireann,
Montrose House,
Donnybrook,
Dublin 4,
Eire
t:+353 1 2083564
e:joh...@rt...

******************************************************************************
The information in this e-mail is confidential and may be legally
privileged. It is intended solely for the addressee. Access to this
e-mail by anyone else is unauthorised. If you are not the intended
recipient, any disclosure, copying, distribution, or any action taken or
omitted to be taken in reliance on it, is prohibited and may be unlawful.
Please note that emails to, from and within RTÉ may be subject to the
Freedom of Information Act 1997 and may be liable to disclosure.
******************************************************************************
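
For readers following the thread: the "simple substring method" John mentions for pulling text out of <rte:body> elements could look roughly like the sketch below. This is only an illustration under assumptions, not John's actual code and not anything from the htmlparser API. The class name TagTextExtractor, the method extract and the sample page are all made up for the example. It also assumes the opening tag carries no attributes, that tag names match case-sensitively, and that the tags are not nested.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of htmlparser): collects the text between
 * each <tagName> ... </tagName> pair using plain substring searches.
 */
class TagTextExtractor {

    /** Returns the raw text enclosed by every open/close pair, in document order. */
    static List<String> extract(String html, String tagName) {
        String open = "<" + tagName + ">";    // assumes no attributes on the tag
        String close = "</" + tagName + ">";
        List<String> chunks = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = html.indexOf(open, from);
            if (start < 0) {
                break;                        // no further opening tag
            }
            int contentStart = start + open.length();
            int end = html.indexOf(close, contentStart);
            if (end < 0) {
                break;                        // unclosed tag: ignore the remainder
            }
            chunks.add(html.substring(contentStart, end));
            from = end + close.length();      // continue after the closing tag
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Made-up sample page with two indexable regions.
        String page = "<html><body><h1>News</h1>"
                + "<rte:body>First story.</rte:body>"
                + "<p>navigation</p>"
                + "<rte:body>Second story.</rte:body>"
                + "</body></html>";
        for (String chunk : extract(page, "rte:body")) {
            System.out.println(chunk);
        }
    }
}
```

A scan like this is easy to write but knows nothing about HTML: it would also match the tag text inside comments or scripts, and it breaks as soon as the tags gain attributes or nest. That is presumably why parser-level support for such tags is being asked for.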