Re: [Htmlparser-user] Parsing for links
From: Derrick O. <der...@ro...> - 2007-08-07 22:16:23
Hi,

The HasAttributeFilter should have worked... at least enough to extract all links with the Id attribute:

    new AndFilter (new TagNameFilter ("A"), new HasAttributeFilter ("Id"))

That said, there isn't a "HasAttributeRegexFilter" that would match an attribute value pattern, although it has been discussed on the dev forum - or was that the LinkRegexFilter?

What you need is a combination of the HasAttributeFilter and the RegexFilter, where the exact equality test in the accept() method of HasAttributeFilter is replaced by the pattern matching code from the RegexFilter. Something like this:

    /**
     * Accept tags with a certain attribute.
     * @param node The node to check.
     * @return <code>true</code> if the node has the attribute
     * (and value if that is being checked too), <code>false</code> otherwise.
     */
    public boolean accept (Node node)
    {
        Tag tag;
        Attribute attribute;
        String string;
        Matcher matcher;
        boolean ret;

        ret = false;
        if (node instanceof Tag)
        {
            tag = (Tag)node;
            attribute = tag.getAttributeEx (mAttribute);
            // the attribute must be present at all
            ret = null != attribute;
            if (ret && (null != mValue))
            {
                // pattern match on the value instead of an equality test
                string = attribute.getValue ();
                matcher = mPattern.matcher (string);
                switch (mStrategy)
                {
                    case MATCH:
                        ret = matcher.matches ();
                        break;
                    case LOOKINGAT:
                        ret = matcher.lookingAt ();
                        break;
                    case FIND:
                    default:
                        ret = matcher.find ();
                        break;
                }
            }
        }

        return (ret);
    }

Derrick

----- Original Message ----
From: Mark Goking <Mar...@as...>
To: htm...@li...urceforge.net
Sent: Tuesday, August 7, 2007 5:19:29 AM
Subject: [Htmlparser-user] Parsing for links

Hi all

I used the FilterBean class to extract only tags with links <a href>

However, I wish to retrieve only links that have an id attribute with a value that starts with the string test_

I don't see any method in the API that lets you search the id's value the way a String's indexOf() method would.

What filters would be needed for this operation? Even though I've added attributes to the LinkTag to search for an id=value attribute, it still won't work.

Thanks
Chester
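
The three matching strategies in the reply's switch statement behave differently for the "starts with test_" case in the question, so a small standalone sketch may help. It uses only java.util.regex and does not touch the HTML Parser classes; the class name AttributeValueMatcher and its Strategy enum are hypothetical names chosen for illustration, not part of the library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not an HTML Parser class): tests an attribute value against a regex. */
final class AttributeValueMatcher {

    /** Mirrors the MATCH / LOOKINGAT / FIND cases of the switch in the reply. */
    enum Strategy { MATCH, LOOKINGAT, FIND }

    private final Pattern pattern;
    private final Strategy strategy;

    AttributeValueMatcher(String regex, Strategy strategy) {
        this.pattern = Pattern.compile(regex);
        this.strategy = strategy;
    }

    /** Returns true if the value satisfies the pattern; a null value never matches. */
    boolean accepts(String value) {
        if (value == null) {
            return false;
        }
        Matcher matcher = pattern.matcher(value);
        switch (strategy) {
            case MATCH:
                return matcher.matches();    // whole value must match
            case LOOKINGAT:
                return matcher.lookingAt();  // match must start at the beginning
            case FIND:
            default:
                return matcher.find();       // match may occur anywhere
        }
    }

    public static void main(String[] args) {
        AttributeValueMatcher prefix = new AttributeValueMatcher("test_", Strategy.LOOKINGAT);
        AttributeValueMatcher anywhere = new AttributeValueMatcher("test_", Strategy.FIND);
        AttributeValueMatcher whole = new AttributeValueMatcher("test_", Strategy.MATCH);

        System.out.println(prefix.accepts("test_42"));      // true
        System.out.println(prefix.accepts("my_test_42"));   // false
        System.out.println(anywhere.accepts("my_test_42")); // true
        System.out.println(whole.accepts("test_42"));       // false
    }
}
```

For an id value that must start with test_, LOOKINGAT with the pattern test_ (or MATCH with test_.*) is the closer fit; FIND behaves more like indexOf() >= 0.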