From: SourceForge.net <no...@so...> - 2004-02-29 09:48:02
Feature Requests item #906825, was opened at 2004-02-29 09:44
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=361118&aid=906825&group_id=11118

Category: Configuration
Group: future
Status: Open
Resolution: None
Priority: 5
Submitted By: Amuro Namie (onanie)
Assigned to: Nobody/Anonymous (nobody)
Summary: Is regular expression for html parsing a good idea?

Initial Comment:
I have been trying to customise the "Banner-by-links" filter to improve its effectiveness. It has been a very difficult process, and so far I've achieved only limited results.

I've managed to teach it to recognise the <a_href...></a> boundary, eliminating the entire element if the content matches the keyword list. The problem is that it then eliminates text links as well when their url matches the keyword list. A bit aggressive I think (but it's fun seeing a pr0n page with no links).

I have not been able to get Privoxy to look for <img_src...> within the <a_href...></a> boundary before deciding to eliminate the whole element. If it were able to, then it would leave text links alone.

I realise that what Privoxy does with its "experimental" banner-by-links filter is to look for <a_href...> references that are followed immediately by an <img_src...> (only allowed to be separated by whitespace), but unfortunately this is not always the case. Sometimes a <font...> reference is plonked in between, etc. etc., and this is how Privoxy misses some of the ads.

I cannot think of a suitable regular expression that would determine whether there is an <img_src=...> within an <a_href...></a> boundary before it decides to zap the whole link. Maybe I'm missing something.

Why is this in a feature request forum? I've been told that regexp is a poor tool for html parsing, and that using "modules" would be far superior.
I have no idea how that will be applied to Privoxy's future, but I wish there were an easier way to do things. Privoxy is a great application by the way, being almost fully programmable. I'd just like to say, keep up the great work despite my rantings! Thanks!
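For what it's worth, the "is there an <img ...> somewhere inside the <a href...></a> boundary" test can be sketched with a negative lookahead, which Perl-style regex engines support. The snippet below is only an illustration in Python's re module, not an actual Privoxy filter line: the KEYWORDS list, the LINKED_IMAGE pattern and the strip_banner_links helper are all made up for this example, and I have not tried the pattern inside Privoxy itself.

```python
import re

# Hypothetical keyword list, for illustration only; the real filter has its own.
KEYWORDS = ("ads", "banner", "click")

# An <a ... href=...> element whose body contains an <img ...> tag before the
# first closing </a>. The (?:(?!</a>).)*? idiom consumes one character at a
# time, but only while that character does not begin a closing </a> tag, so
# the match can never run past the end of the current link.
LINKED_IMAGE = re.compile(
    r"<a\s[^>]*?href\s*=\s*(['\"]?)(?P<url>[^'\" >]+)\1[^>]*>"
    r"(?:(?!</a>).)*?<img\s[^>]*>"
    r"(?:(?!</a>).)*?</a>",
    re.IGNORECASE | re.DOTALL,
)


def strip_banner_links(html: str) -> str:
    """Remove linked images whose URL contains a keyword; keep text links."""

    def replace(match: re.Match) -> str:
        url = match.group("url").lower()
        if any(keyword in url for keyword in KEYWORDS):
            return ""          # zap the whole <a>...</a> element
        return match.group(0)  # leave everything else untouched

    return LINKED_IMAGE.sub(replace, html)


if __name__ == "__main__":
    page = (
        '<a href="http://ads.example.com/x"><font size="2"> '
        '<img src="b.gif"></font></a>'
        '<a href="http://ads.example.com/info">text link</a>'
    )
    print(strip_banner_links(page))
```

With this pattern a <font...> between the link and the image no longer hides the ad, and a text link to the same host is left alone. It is still pattern matching rather than parsing, so sufficiently malformed markup can defeat it, which is presumably the point of the "use modules instead" advice.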