If by headlines you mean headings (H1, H2, etc.), then yes: you should be able to create a NodeClassFilter that looks for HeadingTag objects.
If I remember correctly how it is used...
NodeList list = parser.parse(new NodeClassFilter(HeadingTag.class));
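A slightly fuller sketch of the same idea, assuming HTML Parser 1.x (the org.htmlparser packages) is on the classpath. The class name ExtractHeadings, the helper extractHeadings and the sample HTML are made up for illustration. It parses a string via Parser.createParser; for a live page you would pass the URL to the Parser constructor instead. Each match is a HeadingTag, so getTagName() tells you the level and toPlainTextString() gives the text without nested markup.

```java
import java.util.ArrayList;
import java.util.List;

import org.htmlparser.Parser;
import org.htmlparser.filters.NodeClassFilter;
import org.htmlparser.tags.HeadingTag;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Hypothetical example class: collects the text of every <h1>..<h6> element.
public class ExtractHeadings {

    // Returns one "LEVEL: text" string per heading, in document order.
    static List<String> extractHeadings(String html) throws ParserException {
        // createParser wraps an HTML string; new Parser(url) would read a page instead.
        Parser parser = Parser.createParser(html, "UTF-8");
        // Keep only nodes the default node factory built as HeadingTag objects
        // (the filter is applied recursively, so headings inside <body> are found).
        NodeList list = parser.parse(new NodeClassFilter(HeadingTag.class));
        List<String> headings = new ArrayList<String>();
        for (int i = 0; i < list.size(); i++) {
            HeadingTag heading = (HeadingTag) list.elementAt(i);
            // getTagName() gives the level (e.g. "H1"); toPlainTextString() drops inner tags.
            headings.add(heading.getTagName() + ": " + heading.toPlainTextString().trim());
        }
        return headings;
    }

    public static void main(String[] args) throws ParserException {
        // Made-up sample page: two headings mixed with a paragraph and an anchor.
        String html = "<html><body>"
                + "<h1>Top story</h1><p>Body text</p>"
                + "<a href=\"/more\">More</a>"
                + "<h2>Second <b>story</b></h2>"
                + "</body></html>";
        for (String h : extractHeadings(html)) {
            System.out.println(h);
        }
    }
}
```

Running main should print one line per heading with its level and text; the paragraph and the anchor are skipped by the filter.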
----- Original Message ----
From: answers solutions <fas...@gm...>
To: Htm...@li...
Sent: Wednesday, June 25, 2008 5:58:20 AM
Subject: [Htmlparser-user] how to extract headlines using htmlparser
Hi,
I am presently using htmlparser to extract all the anchor tags in a webpage.
But I want to extract only the headlines in a webpage. Is there any way I can identify the headlines in a webpage and extract them with the help of the parser?
Thanks in advance.