Re: [Htmlparser-user] extract flickr tags
From: Derrick O. <der...@ro...> - 2007-10-19 02:15:05
Once you have the paragraph tag (assuming it is defined to be a composite tag), you should be able to pass the NodeList from getChildren() through a filter via extractAllNodesThatMatch(new TagNameFilter("A")) to extract all the link nodes. Then process the links in the resulting list and extract their text using getLinkText().

----- Original Message ----
From: M Mencke <me...@gm...>
To: htm...@li...
Sent: Thursday, October 18, 2007 9:40:48 AM
Subject: [Htmlparser-user] extract flickr tags

Hello, I have installed and run the html parser and I can get it to extract all of the text from the following web page:

http://www.flickr.com/photos/mariposa-de-amor/tags/

(mariposa-de-amor is just an example, this could be any user name).

But I want to extract ONLY the 150 most popular tags. In other words, I want to output a plain text file with the user's 150 most popular tags, like this:

35faves a aberdeen abigfave anawesomeshot aplusphoto athousandwords avianexcellence baby band beach beautiful bedouin belis birds blogthis blueribbonwinner bravo bridge bw castle child children church cindrel city clouds clova cluj colourartaward concert copii criket ......

The page source is like this:

    <p id="TagCloud">
        <a href="/photos/mariposa-de-amor/tags/35faves/" style="font-size: 14px;">35faves</a>
        &nbsp;<a href="/photos/mariposa-de-amor/tags/a/" style="font-size: 14px;">a</a>
        <a href="/photos/mariposa-de-amor/tags/aberdeen/" style="font-size: 15px;">aberdeen</a>
    </p>

So I want only the words 35faves, aberdeen, ......

I could try to get the text inside <p id="TagCloud"> and </p>, but that is not effective because I extract " <a href............ " and I don't want this additional text in my output.

I have also tried this using Trimtags, and in any case it doesn't work for me. Does anybody know of an existing method with which I could do this, or can you offer any advice? All I want is to extract a list of one user's tags from a flickr (or any other) webpage.

Thanks a lot,
Myriam
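
For readers following the thread, the approach suggested in the reply at the top (take the children of the tag-cloud paragraph, filter them for A tags, and read each link's text) might look roughly like the sketch below. It is only an illustration under assumptions: the HTML Parser jar (org.htmlparser) is on the classpath, the parser treats <p> as a composite tag as the reply stipulates, and the input is UTF-8. The class name TagCloudExtractor and the method extractTags are hypothetical names chosen for this example, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

import org.htmlparser.Parser;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.tags.LinkTag;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Hypothetical helper class; the name is an assumption made for this sketch.
class TagCloudExtractor {

    // Returns the link text of every <a> that is a direct child of <p id="TagCloud">.
    static List<String> extractTags(String html) throws ParserException {
        // The charset is an assumption; adjust it to the page being parsed.
        Parser parser = Parser.createParser(html, "UTF-8");

        // Step 1: locate the paragraph tag that holds the tag cloud.
        NodeList paragraphs = parser.extractAllNodesThatMatch(
                new AndFilter(new TagNameFilter("P"),
                              new HasAttributeFilter("id", "TagCloud")));

        List<String> tags = new ArrayList<String>();
        if (paragraphs.size() == 0) {
            return tags; // no tag cloud on this page
        }

        // Step 2: pass the paragraph's children through a filter for A tags.
        NodeList children = paragraphs.elementAt(0).getChildren();
        if (children == null) {
            return tags; // paragraph was empty or not parsed as a composite tag
        }
        NodeList links = children.extractAllNodesThatMatch(new TagNameFilter("A"));

        // Step 3: collect the text of each link.
        for (int i = 0; i < links.size(); i++) {
            if (links.elementAt(i) instanceof LinkTag) {
                tags.add(((LinkTag) links.elementAt(i)).getLinkText().trim());
            }
        }
        return tags;
    }

    public static void main(String[] args) throws ParserException {
        String html = "<p id=\"TagCloud\">"
                + "<a href=\"/photos/mariposa-de-amor/tags/35faves/\">35faves</a> "
                + "&nbsp;<a href=\"/photos/mariposa-de-amor/tags/a/\">a</a> "
                + "<a href=\"/photos/mariposa-de-amor/tags/aberdeen/\">aberdeen</a>"
                + "</p>";
        // Print the tags separated by spaces, as in the desired plain-text output.
        System.out.println(String.join(" ", extractTags(html)));
    }
}
```

Because only the A tags are kept and only their link text is read, the surrounding "<a href..." markup and the &nbsp; separators should not appear in the output. If the links were nested more deeply inside the paragraph, the two-argument form extractAllNodesThatMatch(filter, true) would search recursively instead of looking at direct children only.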