[Htmlparser-developer] SearchFilter
From: Derrick O. <Der...@Ro...> - 2006-05-09 03:14:10
Ian,

Case conversion requires a locale, either assumed or explicit. See for example the additional Locale property on StringFilter. The regex library likewise requires or assumes a strategy: MATCH, LOOKINGAT or FIND. See for example the additional int property on RegexFilter.

I'm not sure how much could be gained by subclassing the existing HasAttributeFilter. Another strategy would be to add boolean properties for 'InText' (on by default), 'InAttributeName', and 'InAttributeValue' to the StringFilter and RegexFilter. Then of course you would need to add an AttributeName property. Allowing the attribute name to be null is a good idea, and null would be the default if the name is simply not set, so there is no need for an extra boolean 'nameIsNull' property. By the way, searching the tag name would come for free if the attribute-checking loop started at index zero.

That would mean adding three boolean properties and a string property to the two classes. I think these differences are enough to warrant new classes. In fact, maybe this should be one really prickly class called a SearchFilter that combines what StringFilter and RegexFilter do, plus the above. I don't think something can be both case-insensitive and a regex filter though, so these aren't completely orthogonal. So maybe a 'type' property:

  straight string match
  case insensitive match - needs or assumes a Locale
  regex match - needs or assumes a strategy

I leave it up to you though. Sounds like a fair piece of work.

The extra constructors on the AndFilter and OrFilter are also good ideas. The XorFilter seems like a good thing to round out the logical operations. Would it also take an array of filters and only return true if just one is matched?

The FilterBuilder would need to be altered to handle these changes of course, assuming this was a goal.
This would be easier if there were just new SearchFilter and XorFilter classes rather than changes to the existing HasAttributeFilter, StringFilter, and RegexFilter (because new classes could be ignored, as the CssSelectorFilter currently is).

Derrick

Ian Macfarlane wrote:

> I would also like to be able to set the attribute as null but the
> attribute value as not-null. In this case, it should attempt to match
> all attributes against the attribute value.
>
> Please email me if you have any objections to this (or anything else).
>
> Thanks
>
> Ian Macfarlane
>
> On 5/8/06, Ian Macfarlane <ian...@gm...> wrote:
>
>> I would like to add the following functionality to HasAttributeFilter:
>>
>> 1) A boolean flag to set if the matching should be case-insensitive. I
>> think this could be done with a boolean, one new constructor (String
>> attribute, String value, boolean attribValue) and get/set method pair.
>>
>> 2) A flag to mark that the attribValue should be parsed as a regular
>> expression (I don't really see the benefit of doing this with the tag
>> name). This should also obey the case-sensitivity rule in (1). For
>> this, I imagine a further constructor and get/set method pair. (a
>> sample use case of this is "post\d+" to match post1, post22,
>> post343545, etc).
>>
>>
>> I'm willing to go ahead and code these, but I thought I should run
>> this past you other developers too in case you dislike either idea.
>> I'm also open to either:
>>
>> a) putting the regexp stuff in a subclass of HasAttributeFilter (but
>> it seems a small enough change to be suitable as part of the class
>> size-wise).
>>
>> b) changing the one/two boolean constructors to be one constructor
>> that takes an INT flag, and add flags for the different combinations
>> (e.g. CASE_SENSITIVE = 1, USE_REGEX = 2, so both together would be 3).
>> This seems unnecessarily complex, and doing it the way I suggested
>> above still allows for this in the future if desired.
>>
>> Thanks for your feedback,
>>
>> Ian Macfarlane
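
The 'type' property floated in the reply (plain string match, case-insensitive match with a Locale, regex match with a strategy) could be sketched roughly as follows. This is only an illustration of the idea, not code from HTMLParser: the class name SearchSketch, both enums, and the accept(String) signature are hypothetical, and substring containment is assumed for the two string modes. The three strategies are mapped onto the java.util.regex.Matcher methods matches(), lookingAt() and find().

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch only; this class is not part of HTMLParser. */
final class SearchSketch {

    /** The three mutually exclusive match types from the discussion. */
    enum MatchType { STRING, STRING_IGNORE_CASE, REGEX }

    /** Regex strategies, named after MATCH, LOOKINGAT and FIND. */
    enum Strategy { MATCH, LOOKINGAT, FIND }

    private final MatchType type;
    private final String pattern;
    private final Locale locale;     // used only by STRING_IGNORE_CASE
    private final Strategy strategy; // used only by REGEX

    SearchSketch(MatchType type, String pattern, Locale locale, Strategy strategy) {
        this.type = type;
        this.pattern = pattern;
        // "Needs or assumes": fall back to a default when none is given.
        this.locale = (locale != null) ? locale : Locale.getDefault();
        this.strategy = (strategy != null) ? strategy : Strategy.FIND;
    }

    /** Tests one piece of text (node text, attribute name or value). */
    boolean accept(String candidate) {
        switch (type) {
            case STRING:
                // Assumption: plain mode means substring containment.
                return candidate.contains(pattern);
            case STRING_IGNORE_CASE:
                // Case folding depends on the locale, hence the property.
                return candidate.toUpperCase(locale)
                        .contains(pattern.toUpperCase(locale));
            default: { // REGEX
                Matcher m = Pattern.compile(pattern).matcher(candidate);
                switch (strategy) {
                    case MATCH:     return m.matches();   // whole string
                    case LOOKINGAT: return m.lookingAt(); // prefix only
                    default:        return m.find();      // anywhere
                }
            }
        }
    }
}
```

With the "post\d+" example from the original proposal, the MATCH strategy accepts "post22" but rejects "mypost22", while FIND accepts both. Keeping the type as a single enum rather than independent booleans reflects the remark that the options are not completely orthogonal.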