Re: [Htmlparser-user] Doubts about HTML Parsers.
From: Gaurav P. <gau...@gm...> - 2007-04-26 10:17:30
Hi Derrick,

I would like to know whether I can define my own tag types, along the lines of ScriptTag and LinkTag, for tags that I specifically want to replace in the HTML content. Tags such as <iframe> still show up in the filtered HTML content. How can I replace all of them by defining a filter of that sort? Please give some relevant code for this problem.

Thanks

On 4/20/07, Derrick Oswald <der...@ro...> wrote:
>
> The handling of EncodingChangeException is outlined in the FAQ
> <http://htmlparser.sourceforge.net/faq.html>.
>
> You can delete the node from the parent's children and then convert the
> page back to HTML.
> The pseudo code is:
>
> // get the entire page
> NodeList list = parser.parse (null);
>
> // find the nodes to be deleted, searching nested nodes as well
> NodeList advertisement = list.extractAllNodesThatMatch (some_filter, true);
>
> // remove these nodes from their parent
> for (int i = 0; i < advertisement.size (); i++)
> {
>     Node node = advertisement.elementAt (i);
>     Node parent = node.getParent ();
>     if (parent != null)
>         parent.getChildren ().remove (node);
>     else
>         list.remove (node); // top-level node, it has no parent
> }
>
> // reprint the HTML
> System.out.println (list.toHtml ());
>
> ----- Original Message ----
> From: Gaurav Pranay <gau...@gm...>
> To: htm...@li...
> Sent: Thursday, April 19, 2007 12:53:54 AM
> Subject: [Htmlparser-user] Doubts about HTML Parsers.
>
> Hello Sir,
>
> Thanks for your previous replies, as they were of immense help to me. I have
> a few more doubts regarding the use of HTML parsers, and for that I need your
> help.
>
> 1) I have a doubt regarding
> org.htmlparser.util.EncodingChangeException. This exception is
> thrown by the program whenever a site uses a different
> character set, probably charset=UTF-8.
> Can I use some tool to get rid of these exceptions occurring in the program, and
> can I get details about the exceptions and where they can occur depending
> on the use?
>
> 2) If I want to clear the advertisements with the HTML parser, and also the
> advertisement in plain text at the base of the page, like:
> (c) 2007 Rediff.com India Limited. All Rights Reserved.
> *Disclaimer* <http://www.rediff.com/disclaim.htm> | *Feedback* <http://support.rediff.com/>
> Can I implement the parser in such a fashion as to get rid of these tags, or
> should I use some sort of HTML cleaner in parallel with the
> HTML parser?
>
> Awaiting your reply.
> Thanks in advance.
>
> Gaurav Pranay.
> -------------------------------------------------------------------------
> This SF.net email is sponsored by DB2 Express
> Download DB2 Express C - the FREE version of DB2 express and take
> control of your XML. No limits. Just data. Click to get it now.
> http://sourceforge.net/powerbar/db2/
> _______________________________________________
> Htmlparser-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-user
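
For anyone who wants to try the "delete the node from the parent's children, then reprint" idea from the quoted reply without the library at hand, here is a minimal sketch in plain Java. It deliberately does not use the org.htmlparser API: the class NodeRemovalSketch, its Node type, and the methods element, textNode, findAll and removeAll are all hypothetical stand-ins made up for illustration, and attributes are ignored. The predicate plays the part that a node filter would play in the real library; matching on the tag name "iframe" is just one possible choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Stand-in tree model for illustration only; NOT the org.htmlparser API.
class NodeRemovalSketch {

    /** Hypothetical minimal node: either an element with children or a text leaf. */
    static final class Node {
        final String tag;   // null for text nodes
        final String text;  // null for element nodes
        final List<Node> children = new ArrayList<>();
        Node parent;        // null for the root or for a detached node

        private Node(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        static Node element(String name, Node... kids) {
            Node n = new Node(name, null);
            for (Node k : kids) {
                k.parent = n;
                n.children.add(k);
            }
            return n;
        }

        static Node textNode(String s) {
            return new Node(null, s);
        }

        /** Serialize this subtree back to markup (attributes are not modelled). */
        String toHtml() {
            if (tag == null) {
                return text;
            }
            StringBuilder sb = new StringBuilder("<").append(tag).append(">");
            for (Node c : children) {
                sb.append(c.toHtml());
            }
            return sb.append("</").append(tag).append(">").toString();
        }
    }

    /** Collect every node in the subtree, root included, that matches the filter. */
    static List<Node> findAll(Node root, Predicate<Node> filter) {
        List<Node> out = new ArrayList<>();
        collect(root, filter, out);
        return out;
    }

    private static void collect(Node n, Predicate<Node> filter, List<Node> out) {
        if (filter.test(n)) {
            out.add(n);
        }
        for (Node c : n.children) {
            collect(c, filter, out);
        }
    }

    /** Detach each matching node from its parent; returns how many were removed. */
    static int removeAll(Node root, Predicate<Node> filter) {
        int removed = 0;
        // Matches are collected first, so no child list is modified while it is iterated.
        for (Node n : findAll(root, filter)) {
            if (n.parent != null && n.parent.children.remove(n)) {
                n.parent = null;
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Node page = Node.element("body",
                Node.element("p", Node.textNode("Story")),
                Node.element("iframe", Node.textNode("ad")),
                Node.element("div", Node.element("iframe")));

        removeAll(page, n -> "iframe".equals(n.tag));

        // prints <body><p>Story</p><div></div></body>
        System.out.println(page.toHtml());
    }
}
```

Collecting the matches before removing anything keeps the traversal simple, and a matching root is left alone because it has no parent to be removed from. With HTMLParser itself, a filter on the tag name would presumably take the place of the predicate, but that mapping is an assumption on my part and is not something confirmed in this thread.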