Re: [Htmlparser-user] Doubts about HTML Parsers.
From: Derrick O. <der...@ro...> - 2007-04-27 01:58:44
See http://htmlparser.sourceforge.net/faq.html#composite to find out how to define your own tags.

----- Original Message ----
From: Gaurav Pranay <gau...@gm...>
To: htmlparser user list <htmlparser-user@lists.sourceforge.net>
Sent: Thursday, April 26, 2007 6:17:17 AM
Subject: Re: [Htmlparser-user] Doubts about HTML Parsers.

Hi Derrick,

I would like to know whether I can define my own tag types, like ScriptTag, LinkTag, etc., for the tags I specifically want to replace in the HTML content. Tags such as <iframe> still appear in the filtered HTML content. How could I replace all of them by defining a filter of that sort? Please give some relevant code for this problem.

Thanks


On 4/20/07, Derrick Oswald <der...@ro...> wrote:
The handling of EncodingChangeException is outlined in the FAQ.

You can delete the node from the parent's children and then convert the page back to HTML.

The pseudo code is:

// get the entire page
NodeList list = parser.parse (null);

// find the nodes to be deleted (true = also search nested children)
NodeList advertisement = list.extractAllNodesThatMatch (some_filter, true);

// remove these nodes from their parent
for (int i = 0; i < advertisement.size (); i++)
{
  Node node = advertisement.elementAt (i);
  Node parent = node.getParent ();
  if (null != parent)
    parent.getChildren ().remove (node);
  else
    list.remove (node); // top-level node, it has no parent
}

// reprint the HTML
System.out.println (list.toHtml ());


----- Original Message ----
From: Gaurav Pranay <gau...@gm...>
To: htmlpar...@li...
Sent: Thursday, April 19, 2007 12:53:54 AM
Subject: [Htmlparser-user] Doubts about HTML Parsers.

Hello Sir,

Thanks for your previous replies; they were of immense help to me. I have a few more doubts regarding the use of HTML Parser, and I need your help with them.

1) I have a doubt regarding org.htmlparser.util.EncodingChangeException. The program throws this exception whenever a site uses a different character set, probably charset=UTF-8. Can I use some tool to get rid of these exceptions, and can I get details about the exceptions and where they can occur depending on the use?

2) I want to use the HTML parser to clear advertisements, including the plain-text notice at the bottom of the page, like:

   © 2007 Rediff.com India Limited. All Rights Reserved. Disclaimer | Feedback

Can I implement the parser in such a way as to get rid of these tags, or should I use some sort of HTML cleaner in parallel with the HTML parser?

Awaiting your reply.
Thanks in advance.

Gaurav Pranay.

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
Htmlparser-user mailing list
Htmlparser-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/htmlparser-user