Re: [Htmlparser-user] Doubts about HTML Parsers.
From: Derrick O. <der...@ro...> - 2007-04-20 11:07:20
The handling of EncodingChangeException is outlined in the FAQ.

You can delete the node from the parent's children and then convert the page back to HTML.
The pseudo code is:

// get the entire page
NodeList list = parser.parse (null);

// find the nodes to be deleted (recursive, so nested nodes are found too)
NodeList advertisements = list.extractAllNodesThatMatch (some_filter, true);

// remove these nodes from their parent
for (int i = 0; i < advertisements.size (); i++)
{
    Node node = advertisements.elementAt (i);
    Node parent = node.getParent ();
    if (null != parent)
        parent.getChildren ().remove (node);
    else
        list.remove (node); // top-level nodes have no parent
}

// reprint the HTML
System.out.println (list.toHtml ());
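
The collect-then-remove idea can also be tried without the library. The following is only a minimal sketch on a toy tree, not the htmlparser classes: PruneDemo, Node, collect, prune, sample and the "footer" id are all hypothetical names invented for illustration. It gathers the matching nodes first and detaches them afterwards, so the tree is never modified while it is being walked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy stand-in for a parsed page; these are NOT the org.htmlparser classes.
class PruneDemo {

    static final class Node {
        final String tag;   // element name; null marks a text node
        final String value; // id attribute for elements, text for text nodes
        Node parent;        // null for the root
        final List<Node> children = new ArrayList<>();

        Node(String tag, String value) {
            this.tag = tag;
            this.value = value;
        }

        // Attach a child and return this node so calls can be chained.
        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }

        // Serialize the subtree back to markup.
        String toHtml() {
            if (tag == null) {
                return value;
            }
            StringBuilder sb = new StringBuilder("<").append(tag);
            if (value != null) {
                sb.append(" id=\"").append(value).append("\"");
            }
            sb.append(">");
            for (Node c : children) {
                sb.append(c.toHtml());
            }
            return sb.append("</").append(tag).append(">").toString();
        }
    }

    // Depth-first search that records every node accepted by the filter.
    static void collect(Node node, Predicate<Node> filter, List<Node> out) {
        if (filter.test(node)) {
            out.add(node);
        }
        for (Node c : node.children) {
            collect(c, filter, out);
        }
    }

    // Collect first, then detach each match from its parent's children.
    static int prune(Node root, Predicate<Node> filter) {
        List<Node> matches = new ArrayList<>();
        collect(root, filter, matches);
        int removed = 0;
        for (Node n : matches) {
            // The root has no parent and is left in place.
            if (n.parent != null && n.parent.children.remove(n)) {
                n.parent = null;
                removed++;
            }
        }
        return removed;
    }

    // Hypothetical sample page: a story paragraph plus a footer block.
    static Node sample() {
        return new Node("body", null)
            .add(new Node("p", null).add(new Node(null, "Story")))
            .add(new Node("div", "footer")
                .add(new Node(null, "Disclaimer | Feedback")));
    }

    public static void main(String[] args) {
        Node root = sample();
        prune(root, n -> n.tag != null && "footer".equals(n.value));
        System.out.println(root.toHtml()); // prints <body><p>Story</p></body>
    }
}
```

With the real library, the filter would be whatever NodeFilter identifies the unwanted nodes, and the removal loop would be the one shown in the pseudo code.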

----- Original Message ----
From: Gaurav Pranay <gaurav.pranay1@gmail.com>
To: htm...@li...
Sent: Thursday, April 19, 2007 12:53:54 AM
Subject: [Htmlparser-user] Doubts about HTML Parsers.

Hello Sir,

Thanks for your previous replies; they were of immense help to me. I have a few more doubts regarding the use of HTML parsers, and for those I need your help.

1) I have a doubt regarding org.htmlparser.util.EncodingChangeException. This exception is thrown by the program whenever a site uses a different character set, probably charset=UTF-8.
Can I use some tool to get rid of these exceptions occurring in the program, and can I get details about the exceptions and where they can occur depending on the use?

2) I want to clear the advertisements with the HTML parser, including the advertisement in plain text at the base of the page, like:
© 2007 Rediff.com India Limited. All Rights Reserved.
Disclaimer | Feedback
Can I implement the parser in such a fashion as to get rid of these tags, or should I use some sort of Htmlcleaner in parallel with the HTML parser in this case?

Awaiting your reply.
Thanks in advance.

Gaurav Pranay.