[Htmlparser-user] Re: Htmlparser-user digest, Vol 1 #195 - 3 msgs
From: <wf...@ma...> - 2003-02-18 04:27:30
>From: "Somik Raha" <so...@ya...>
>To: <htm...@li...>
>Subject: Re: [Htmlparser-user] Anyone around using htmlparser together with Lotus Domino?
>Date: Sat, 15 Feb 2003 20:23:09 -0800

>That's interesting - can you tell us how you are using the parser with Lotus
>Domino, and what your doubt is?

Thank you for your reply, Somik.

Things have changed a little since Domino R6, but it will take some time until that release is widely adopted. What I'm investigating therefore relates to R5, which supports Java 1.1.8 natively. There are several things I'm looking into:

1) Referrer spamming

This is becoming increasingly popular because referrers can be forged so easily. Blogs often present a list of recent referrers without any validation, which can trick webmasters and visitors into clicking spammed ones. I'm looking for a way to filter for valid references only. Domino itself can retrieve an HTML page together with a list of its hyperlinks; however, a) performance is not impressive and b) this requires that a web interface database (perweb.nsf) be set up on the server. I'd prefer to use the HTMLParser class instead. This one looks simple.

2) HTML translation/validation/repair

Domino's proprietary rich text format dates back to the 80s, when HTML wasn't yet a standard. Domino's rich-text capabilities are impressive, including nested interactive sections and features like hotspots, script-enabled buttons, tabbed forms and the like. For compatibility reasons, Domino was web-enabled mainly not by reducing this format to HTML's native capabilities but by adding a richtext-to-HTML task and a special URL syntax. Although browsers display it properly, the generated HTML is not clean: list tags are not closed, for example.
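To make the unclosed-list-tag problem concrete, here is a minimal sketch of a stack-based repair. It is an illustration only and does not use HTMLParser's API: the class name ListTagRepair and the method repair are hypothetical, it balances only ul, ol and li, and it relies on java.util.regex and collection classes from newer Java releases, so it would not run on the Java 1.1.8 runtime of R5 as written.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HTMLParser or Domino.
public class ListTagRepair {
    // Only these tags are balanced; every other tag is copied through unchanged.
    private static final Set<String> TRACKED =
            new HashSet<>(Arrays.asList("ul", "ol", "li"));

    // Matches an opening or closing tag; group 1 is "/" for a closing tag,
    // group 2 is the tag name.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>");

    public static String repair(String html) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // currently open tracked tags
        Matcher m = TAG.matcher(html);
        int pos = 0;
        while (m.find()) {
            out.append(html, pos, m.start()); // text between tags, verbatim
            pos = m.end();
            String name = m.group(2).toLowerCase(Locale.ROOT);
            boolean closing = !m.group(1).isEmpty();
            if (!TRACKED.contains(name)) {
                out.append(m.group());
            } else if (!closing) {
                // A new <li> directly inside an open <li> implies the old one ended.
                if (name.equals("li") && "li".equals(open.peek())) {
                    open.pop();
                    out.append("</li>");
                }
                open.push(name);
                out.append(m.group());
            } else if (open.contains(name)) {
                // Close everything opened after the matching tag, then the tag itself.
                String top;
                do {
                    top = open.pop();
                    out.append("</").append(top).append('>');
                } while (!top.equals(name));
            }
            // A closing tag with no matching open tag is dropped.
        }
        out.append(html, pos, html.length());
        // Close whatever is still open at the end of the input.
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }
}
```

For example, ListTagRepair.repair("<ul><li>one<li>two") yields "<ul><li>one</li><li>two</li></ul>". A real repair step would also have to deal with comments, scripts and attribute values containing ">", which this sketch ignores.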
I'm investigating whether HTMLParser could be used to do some automatic repair: content would be edited in Domino's rich text format for convenience, and the resulting HTML would be parsed, corrected and stored separately for web delivery. I assume that, to parse HTML forgivingly, the parser needs to perform some stack correction, and I hope this can easily be used for HTML repair as well?

-- 
Mit freundlichen Grüßen / Kind regards

Wolfgang Flamme
wf...@ma...
Am Jungstück 32
55130 Mainz-Laubenheim
Tel.: +49 (6131) 8 74 02
Mobil: +49 163 25 43 166