Re: [Htmlparser-user] Doubts about HTML Parsers.
From: Derrick O. <der...@ro...> - 2007-04-20 11:07:20
The handling of EncodingChangeException is outlined in the FAQ.

You can delete the node from the parent's children and then convert the page back to HTML.
The pseudo code is:

// get the entire page
NodeList list = parser.parse (null);

// find the nodes to be deleted (true = search nested nodes as well)
NodeList advertisement = list.extractAllNodesThatMatch (some_filter, true);

// remove these nodes from their parent
for (int i = 0; i < advertisement.size (); i++)
{
    Node node = advertisement.elementAt (i);
    Node parent = node.getParent ();
    if (null != parent)
        parent.getChildren ().remove (node);
    else
        list.remove (node); // top-level nodes have no parent
}

// reprint the HTML
System.out.println (list.toHtml ());

----- Original Message ----
From: Gaurav Pranay <gaurav.pranay1@gmail.com>
To: htm...@li...
Sent: Thursday, April 19, 2007 12:53:54 AM
Subject: [Htmlparser-user] Doubts about HTML Parsers.

Hello Sir,

Thanks for your previous replies; they were of immense help to me. I have a few more doubts regarding the use of HTML Parser, and for that I need your help.

1) I have a doubt regarding org.htmlparser.util.EncodingChangeException. This exception is thrown by the program whenever a site uses a different character set, probably charset=UTF-8.
Can I use some tool to get rid of these exceptions occurring in the program, and can I get details about the exceptions and where they can occur depending on the use?

2) I want to remove the advertisements with HTML Parser, including the plain-text notice at the bottom of the page, like:

© 2007 Rediff.com India Limited. All Rights Reserved.
Disclaimer | Feedback

Can I implement the parser in such a fashion as to get rid of these tags, or should I use some sort of HtmlCleaner in parallel with HTML Parser in this case?

Awaiting your reply.
Thanks in advance.

Gaurav Pranay.
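The remove-from-parent steps in the reply can be tried out without the library on the classpath. What follows is only a minimal sketch in plain Java, assuming a hypothetical SimpleNode class that stands in for a parsed node. SimpleNode, RemoveNodesSketch and the sample markup are made up for illustration and are not part of org.htmlparser. The sketch collects the matches first and detaches them afterwards, and it removes a node from the page list itself when the node has no parent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a parsed node; NOT the org.htmlparser Node interface.
class SimpleNode {
    final String html;      // opening tag, or the text of a text node
    final String endTag;    // closing tag, or "" for a text node
    SimpleNode parent;      // null for top-level nodes
    final List<SimpleNode> children = new ArrayList<>();

    SimpleNode(String html, String endTag) {
        this.html = html;
        this.endTag = endTag;
    }

    // Attach a child and return this node so calls can be chained.
    SimpleNode add(SimpleNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }

    // Serialise this node and its subtree back to markup.
    String toHtml() {
        StringBuilder sb = new StringBuilder(html);
        for (SimpleNode child : children) {
            sb.append(child.toHtml());
        }
        return sb.append(endTag).toString();
    }
}

class RemoveNodesSketch {
    // Gather matching nodes recursively; a matched node's subtree is not searched further.
    static void collect(List<SimpleNode> nodes, Predicate<SimpleNode> filter, List<SimpleNode> out) {
        for (SimpleNode node : nodes) {
            if (filter.test(node)) {
                out.add(node);
            } else {
                collect(node.children, filter, out);
            }
        }
    }

    // Detach every match from its parent, or from the page list if it is top-level.
    static void removeAll(List<SimpleNode> page, Predicate<SimpleNode> filter) {
        List<SimpleNode> matches = new ArrayList<>();
        collect(page, filter, matches);   // collect first so removal never disturbs iteration
        for (SimpleNode node : matches) {
            if (node.parent != null) {
                node.parent.children.remove(node);
            } else {
                page.remove(node);
            }
        }
    }

    // Serialise the whole page.
    static String toHtml(List<SimpleNode> page) {
        StringBuilder sb = new StringBuilder();
        for (SimpleNode node : page) {
            sb.append(node.toHtml());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<SimpleNode> page = new ArrayList<>();
        page.add(new SimpleNode("<div>", "</div>")
                .add(new SimpleNode("<p>", "</p>").add(new SimpleNode("Story", "")))
                .add(new SimpleNode("<div class=\"ad\">", "</div>").add(new SimpleNode("Buy now", ""))));
        page.add(new SimpleNode("Copyright 2007 Example Ltd.", ""));  // top-level text, no parent

        // Illustrative filter: an "ad" block or a copyright notice.
        removeAll(page, n -> n.html.contains("class=\"ad\"") || n.html.startsWith("Copyright"));
        System.out.println(toHtml(page));
    }
}
```

With the real library, the filter would be whatever NodeFilter identifies the unwanted blocks; the string checks above are only placeholders for that decision.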