Re: [Htmlparser-user] Parsing Partial HTML text
From: Madhur K. T. <mad...@gm...> - 2007-09-27 04:20:12
Hi Derrick,

I understand that the parser wraps unknown/unregistered tags in generic
tag classes. But with so many ordinary tags not registered with
HTMLParser, do you think it is time to incorporate them into the set of
tags the node factory registers?

I mean, HTML tags such as MAP, AREA, TBODY and THEAD are very common in
HTML content, and I think it would be good to have them registered so
that people can work with them.

I would like to hear your comments on this.

Thanks,

Derrick Oswald wrote:
>
> The tbody tag you are getting is a generic tag - because the parser
> doesn't know about tbody.
> Hence it has no children because it is not a composite node.
> You can make your own tbody composite node as described here:
> http://htmlparser.sourceforge.net/faq.html#composite
>
>
> ----- Original Message ----
> From: "mic...@no..." <mic...@no...>
> To: htm...@li...
> Sent: Wednesday, September 26, 2007 10:30:45 AM
> Subject: [Htmlparser-user] Parsing Partial HTML text
>
>
> I am having trouble parsing html tagged text. It seems that I can
> retrieve a node but that element does not have the child nodes as
> expected.
>
> String table =
>     "<tbody>\n" +
>     "<tr>\n" +
>     "<td><span>brain_normal_GSM80627</span></td>\n" +
>     "<td><span>normal</span></td>\n" +
>     "<td><span>cerebral cortex</span></td>\n" +
>     "<td><span>brain</span></td>\n" +
>     "</tr>\n" +
>     "</tbody>\n";
>
> Parser parser = new Parser(new Lexer(table));
> try {
>     Node tBodyNode = parser.extractAllNodesThatMatch(new TagNameFilter("tbody")).elementAt(0);
>     System.out.println(tBodyNode.getChildren()); // Prints null <---------------
> } catch (ParserException e) {
>     e.printStackTrace(); //To change body of catch statement use File | Settings | File Templates.
> }
>
> Does HTML Parser not handle text input or partial html files well?
--
Madhur Kumar Tanwani <http://madhurtanwani.googlepages.com>
Gebo <http://feeds.feedburner.com/%7Er/Gebo/%7E6/1>

****************************************************************
* Imagine if every Thursday your shoes exploded if you tied them the
*usual way*. This happens to us all the time with *computers*, and
nobody thinks of complaining.
Johnson
****************************************************************
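
The behaviour discussed in this thread (a tag the node factory does not
know about becomes a generic tag with no child list, while a registered
composite tag collects its children) can be illustrated with a small toy
model. This is only a sketch under stated assumptions, not HTMLParser's
code or API: ToyTagDemo, ToyNode, parse and compositeNames are made-up
names, text nodes and attributes are ignored, and stray end tags are
simply dropped. In the real library the fix is the custom composite node
described in the FAQ entry Derrick links.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: every name here is hypothetical, none of it is HTMLParser API.
final class ToyTagDemo {

    /** A tag node; children stays null unless the tag name is registered as composite. */
    static final class ToyNode {
        final String name;
        final List<ToyNode> children; // null for generic (non-composite) tags

        ToyNode(String name, boolean composite) {
            this.name = name;
            this.children = composite ? new ArrayList<>() : null;
        }
    }

    /** Scans start and end tags only and returns the top-level nodes. */
    static List<ToyNode> parse(String html, Set<String> compositeNames) {
        List<ToyNode> top = new ArrayList<>();
        Deque<ToyNode> open = new ArrayDeque<>(); // currently open composite tags
        Matcher m = Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>").matcher(html);
        while (m.find()) {
            String name = m.group(2).toUpperCase(Locale.ROOT);
            boolean isEndTag = !m.group(1).isEmpty();
            if (isEndTag) {
                // Close the innermost open composite with this name; ignore stray end tags.
                if (open.stream().anyMatch(n -> n.name.equals(name))) {
                    while (!open.pop().name.equals(name)) {
                        // keep popping until the matching tag has been removed
                    }
                }
            } else {
                ToyNode node = new ToyNode(name, compositeNames.contains(name));
                if (open.isEmpty()) {
                    top.add(node);
                } else {
                    open.peek().children.add(node);
                }
                if (node.children != null) {
                    open.push(node); // only composite tags can receive children
                }
            }
        }
        return top;
    }

    public static void main(String[] args) {
        String table = "<tbody>\n<tr>\n<td><span>normal</span></td>\n</tr>\n</tbody>\n";

        // TBODY not registered: it is a generic leaf and TR ends up as its sibling.
        List<ToyNode> generic = parse(table, Set.of("TR", "TD", "SPAN"));
        System.out.println(generic.get(0).children); // prints null

        // TBODY registered as composite: it now owns the TR.
        List<ToyNode> registered = parse(table, Set.of("TBODY", "TR", "TD", "SPAN"));
        System.out.println(registered.get(0).children.size()); // prints 1
    }
}
```

In the first call the row is not lost; it is just attached next to the
tbody node rather than under it, which is why looking for children on
the tbody node finds nothing.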