Re: [Htmlparser-user] Parsing Partial HTML text
From: Derrick O. <der...@ro...> - 2007-09-27 12:03:36
This has been on the Request For Enhancement list for a couple of years now.
RFE #888158 block & inline tag differentiation
RFE #1395202 support for well known tags like b,i,u,iframe

Adding the equivalent of composite tags isn't too difficult, but each tag has semantics that are more subtle.
For example, the MAP and AREA tags would need accessors like MapTag.getName(), AreaTag.getShape() and AreaTag.getCoords().

... it's just a little code

----- Original Message ----
From: Madhur Kumar Tanwani <mad...@gm...>
To: htmlparser user list <htmlparser-u...@li...>
Sent: Thursday, September 27, 2007 12:22:16 AM
Subject: Re: [Htmlparser-user] Parsing Partial HTML text

Hi Derrick,
I understand that the parser wraps unknown/unregistered tags in generic
tag classes. But do you think, with so many normal tags not registered
with HTMLParser, it would be time to get them incorporated in the set of
tags the node factory registers? I mean, HTML tags such as MAP, AREA,
TBODY, THEAD etc. are very common in HTML content and I think it
would be good to have them registered, so that people can work with them.

Would like to hear your comments on the same.
Thanks,

Derrick Oswald wrote:
>
> The tbody tag you are getting is a generic tag - because the parser
> doesn't know about tbody.
> Hence it has no children because it is not a composite node.
> You can make your own tbody composite node as described here:
> http://htmlparser.sourceforge.net/faq.html#composite
>
>
> ----- Original Message ----
> From: "michaeld.jones@novartis.com" <mic...@no...>
> To: htm...@li...urceforge.net
> Sent: Wednesday, September 26, 2007 10:30:45 AM
> Subject: [Htmlparser-user] Parsing Partial HTML text
>
>
> I am having trouble parsing html tagged text.
> It seems that I can retrieve a node but that element does not have the
> child nodes as expected.
>
>         String table =
>                 "<tbody>\n" +
>                 "<tr>\n" +
>                 "<td><span>brain_normal_GSM80627</span></td>\n" +
>                 "<td><span>normal</span></td>\n" +
>                 "<td><span>cerebral cortex</span></td>\n" +
>                 "<td><span>brain</span></td>\n" +
>                 "</tr>\n" +
>                 "</tbody>\n";
>
>         Parser parser = new Parser(new Lexer(table));
>         try {
>             Node tBodyNode = parser.extractAllNodesThatMatch(new TagNameFilter("tbody")).elementAt(0);
>             System.out.println(tBodyNode.getChildren()); // Prints null <---------------
>         } catch (ParserException e) {
>             e.printStackTrace();
>         }
>
> Does HTML Parser not handle text input or partial html files well?
--
Madhur Kumar Tanwani <http://madhurtanwani.googlepages.com>

_______________________________________________
Htmlparser-user mailing list
https://lists.sourceforge.net/lists/listinfo/htmlparser-user
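
The generic-versus-composite distinction discussed in this thread can be illustrated with a small self-contained Java sketch. It is a toy model under stated assumptions, not the HTMLParser API: ToyNode, ToyParser, registerComposite and findFirst are names invented for this illustration, and the tokenizer is a deliberately naive regular expression. It only shows why a tag the parser has not been told about ends up with no children, while the same tag owns its nested rows once it is registered. For the library's real mechanism, see the FAQ entry linked in the thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy node: a tag (upper-cased name) or a text run. Hypothetical, not an HTMLParser class. */
class ToyNode {
    final String name;                                 // e.g. "TBODY", or "#text"
    final String text;                                 // text content, null for tags
    final List<ToyNode> children = new ArrayList<>();  // stays empty for generic tags

    ToyNode(String name, String text) {
        this.name = name;
        this.text = text;
    }
}

/** Toy parser: only tag names registered as composite are allowed to own children. */
class ToyParser {
    private static final Pattern TOKEN =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>|([^<]+)");
    private final Set<String> composite = new HashSet<>();

    void registerComposite(String tagName) {
        composite.add(tagName.toUpperCase(Locale.ROOT));
    }

    List<ToyNode> parse(String html) {
        List<ToyNode> top = new ArrayList<>();
        Deque<ToyNode> open = new ArrayDeque<>();      // currently open composite tags
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            if (m.group(2) == null) {                  // text run; whitespace-only runs are skipped
                if (!m.group(3).trim().isEmpty()) {
                    attach(top, open, new ToyNode("#text", m.group(3)));
                }
                continue;
            }
            String name = m.group(2).toUpperCase(Locale.ROOT);
            boolean closing = !m.group(1).isEmpty();
            if (!closing) {
                ToyNode node = new ToyNode(name, null);
                attach(top, open, node);
                if (composite.contains(name)) {
                    open.push(node);                   // following nodes become its children
                }
            } else if (isOpen(open, name)) {
                while (!open.pop().name.equals(name)) {
                    // implicitly close tags nested inside the one being closed
                }
            }
            // End tags of generic (unregistered) tags are simply dropped in this toy model.
        }
        return top;
    }

    /** Depth-first search for the first node with the given upper-cased name. */
    static ToyNode findFirst(List<ToyNode> nodes, String name) {
        for (ToyNode n : nodes) {
            if (n.name.equals(name)) return n;
            ToyNode hit = findFirst(n.children, name);
            if (hit != null) return hit;
        }
        return null;
    }

    private static void attach(List<ToyNode> top, Deque<ToyNode> open, ToyNode node) {
        (open.isEmpty() ? top : open.peek().children).add(node);
    }

    private static boolean isOpen(Deque<ToyNode> open, String name) {
        for (ToyNode n : open) if (n.name.equals(name)) return true;
        return false;
    }
}

class ToyDemo {
    // Same markup as the snippet posted in the thread.
    static final String TABLE = "<tbody>\n<tr>\n<td><span>brain_normal_GSM80627</span></td>\n"
            + "<td><span>normal</span></td>\n<td><span>cerebral cortex</span></td>\n"
            + "<td><span>brain</span></td>\n</tr>\n</tbody>\n";

    public static void main(String[] args) {
        ToyParser parser = new ToyParser();
        for (String tag : new String[] {"tr", "td", "span"}) parser.registerComposite(tag);
        ToyNode tbody = ToyParser.findFirst(parser.parse(TABLE), "TBODY");
        System.out.println("generic tbody children: " + tbody.children.size());

        parser.registerComposite("tbody");             // now tbody may own children
        tbody = ToyParser.findFirst(parser.parse(TABLE), "TBODY");
        System.out.println("composite tbody children: " + tbody.children.size());
    }
}
```

In this model, with only tr, td and span registered, tbody is found but owns nothing and the tr becomes its sibling, which mirrors the behaviour reported in the original question. After registerComposite("tbody") the same input yields a tbody whose single child is the tr. Tag-specific accessors of the kind Derrick mentions (name, shape, coords) would sit on top of this and are not modeled here.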