Re: [Htmlparser-user] extract flickr tags
From: Derrick O. <der...@ro...> - 2007-10-19 02:15:05
Once you have the paragraph tag (assuming it is defined to be a composite tag) you should be able to pass the NodeList from getChildren() through a filter via extractAllNodesThatMatch(new TagNameFilter("A")) to extract all the link nodes. Then process the links from the resulting list and extract their text using getLinkText().

----- Original Message ----
From: M Mencke <me...@gm...>
To: htm...@li...
Sent: Thursday, October 18, 2007 9:40:48 AM
Subject: [Htmlparser-user] extract flickr tags

Hello, I have installed and run the html parser and I can get it to extract all of the text from the following web page:

http://www.flickr.com/photos/mariposa-de-amor/tags/

(mariposa-de-amor is just an example, this could be any user name).

But I want to extract ONLY the 150 most popular tags. In other words, I want to output a plain text file with the user's 150 most popular tags, like this:

35faves a aberdeen abigfave anawesomeshot aplusphoto athousandwords avianexcellence baby band beach beautiful bedouin belis birds blogthis blueribbonwinner bravo bridge bw castle child children church cindrel city clouds clova cluj colourartaward concert copii criket .............................................

The page source is like this:

    <p id="TagCloud">
        <a href="/photos/mariposa-de-amor/tags/35faves/" style="font-size: 14px;">35faves</a>
        &nbsp;<a href="/photos/mariposa-de-amor/tags/a/" style="font-size: 14px;">a</a>
        <a href="/photos/mariposa-de-amor/tags/aberdeen/" style="font-size: 15px;">aberdeen</a>
    </p>

So I want only the words 35faves, aberdeen, .........

I could try to get the text inside <p id="TagCloud"> and </p>, but that is not effective because I extract " <a href............ " and I don't want this additional text in my output.

I have also tried this using Trimtags, and in any case it doesn't work for me. Does anybody know of an existing method with which I could do this, or can you offer any advice? All I want is to extract a list of one user's tags from a flickr (or any other) webpage.

Thanks a lot,
Myriam
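
The approach suggested in the reply at the top of this message could look roughly like the sketch below. It is an illustration, not tested against the live Flickr page. The calls getChildren(), extractAllNodesThatMatch(), TagNameFilter and getLinkText() come from the reply. The class name TagCloudSketch, the method extractTags(), and the use of Parser.createParser(), AndFilter and HasAttributeFilter to locate the paragraph are assumptions based on the HTMLParser 1.x API and should be checked against the installed version.

```java
import java.util.ArrayList;
import java.util.List;

import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.tags.LinkTag;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Hypothetical class name; requires the HTMLParser jar on the classpath.
public class TagCloudSketch {

    // Returns the link text of every <a> that is a direct child of <p id="TagCloud">.
    static List<String> extractTags(String html) throws ParserException {
        // Assumption: parse from an in-memory string; the charset is only a hint here.
        Parser parser = Parser.createParser(html, "UTF-8");

        // Assumption: find the paragraph by tag name plus id attribute.
        NodeList paragraphs = parser.extractAllNodesThatMatch(
                new AndFilter(new TagNameFilter("P"),
                              new HasAttributeFilter("id", "TagCloud")));

        List<String> tags = new ArrayList<String>();
        for (int i = 0; i < paragraphs.size(); i++) {
            // getChildren() can return null when the tag has no children.
            NodeList children = paragraphs.elementAt(i).getChildren();
            if (children == null) {
                continue;
            }
            // As in the reply: keep only the link nodes among the children.
            NodeList links = children.extractAllNodesThatMatch(new TagNameFilter("A"));
            for (int j = 0; j < links.size(); j++) {
                Node node = links.elementAt(j);
                if (node instanceof LinkTag) {
                    tags.add(((LinkTag) node).getLinkText().trim());
                }
            }
        }
        return tags;
    }

    public static void main(String[] args) throws ParserException {
        // Sample markup modelled on the page source quoted in the original message.
        String html = "<p id=\"TagCloud\">"
                + "<a href=\"/photos/mariposa-de-amor/tags/35faves/\" style=\"font-size: 14px;\">35faves</a>"
                + "&nbsp;<a href=\"/photos/mariposa-de-amor/tags/a/\" style=\"font-size: 14px;\">a</a>"
                + "<a href=\"/photos/mariposa-de-amor/tags/aberdeen/\" style=\"font-size: 15px;\">aberdeen</a>"
                + "</p>";
        for (String tag : extractTags(html)) {
            System.out.println(tag);
        }
    }
}
```

To read the live page instead of a string, the parser would presumably be built with new Parser(url), and the resulting list could be written to a text file one tag per line or joined with spaces.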