Re: [Htmlparser-user] Parsing Partial HTML text
From: Madhur K. T. <mad...@gm...> - 2007-09-27 12:06:34
:) So what does it take for an outsider like me - I mean, someone not on the developer list - to work on that? Is that possible? Or allowed?

Derrick Oswald wrote:
> This has been on the Request For Enhancement list for a couple of
> years now.
>
> RFE #888158 block & inline tag differentiation
> <http://sourceforge.net/tracker/index.php?func=detail&aid=888158&group_id=24399&atid=381402>
> RFE #1395202 support for well known tags like b,i,u,iframe
> <http://sourceforge.net/tracker/index.php?func=detail&aid=1395202&group_id=24399&atid=381402>
>
> Adding the equivalent of composite tags isn't too difficult, but each
> tag has semantics that are more subtle.
> For example, MAP and AREA would need accessors like
> MapTag.getName(), AreaTag.getShape() and AreaTag.getCoords().
>
> ... it's just a little code
>
> ----- Original Message ----
> From: Madhur Kumar Tanwani <mad...@gm...>
> To: htmlparser user list <htm...@li...>
> Sent: Thursday, September 27, 2007 12:22:16 AM
> Subject: Re: [Htmlparser-user] Parsing Partial HTML text
>
> Hi Derrick,
> I understand that the parser wraps unknown/unregistered tags in generic
> tag classes. But with so many ordinary tags not registered with
> HtmlParser, do you think it is time to incorporate them into the set of
> tags the node factory registers? HTML tags such as MAP, AREA, TBODY and
> THEAD are very common in HTML content, and I think it would be good to
> have them registered so that people can work with them.
>
> I would like to hear your comments on this.
> Thanks,
>
> Derrick Oswald wrote:
> > The tbody tag you are getting is a generic tag, because the parser
> > doesn't know about tbody.
> > Hence it has no children, because it is not a composite node.
> > You can make your own tbody composite node as described here:
> > http://htmlparser.sourceforge.net/faq.html#composite
> >
> > ----- Original Message ----
> > From: "mic...@no..." <mic...@no...>
> > To: htm...@li...
> > Sent: Wednesday, September 26, 2007 10:30:45 AM
> > Subject: [Htmlparser-user] Parsing Partial HTML text
> >
> > I am having trouble parsing HTML-tagged text. I can retrieve a node,
> > but that element does not have the child nodes I expect.
> >
> > String table =
> >     "<tbody>\n" +
> >     "<tr>\n" +
> >     "<td><span>brain_normal_GSM80627</span></td>\n" +
> >     "<td><span>normal</span></td>\n" +
> >     "<td><span>cerebral cortex</span></td>\n" +
> >     "<td><span>brain</span></td>\n" +
> >     "</tr>\n" +
> >     "</tbody>\n";
> >
> > Parser parser = new Parser(new Lexer(table));
> > try {
> >     Node tBodyNode = parser.extractAllNodesThatMatch(
> >         new TagNameFilter("tbody")).elementAt(0);
> >     System.out.println(tBodyNode.getChildren()); // Prints null
> > } catch (ParserException e) {
> >     e.printStackTrace();
> > }
> >
> > Does HTML Parser not handle text input or partial HTML files well?
> >
> > _______________________________________________
> > Htmlparser-user mailing list
> > Htm...@li...
> > https://lists.sourceforge.net/lists/listinfo/htmlparser-user

--
Madhur Kumar Tanwani <http://madhurtanwani.googlepages.com>
Gebo <http://feeds.feedburner.com/%7Er/Gebo/%7E6/1>

****************************************************************
* Imagine if every Thursday your shoes exploded if you tied them the
*usual way*. This happens to us all the time with *computers*, and
nobody thinks of complaining. Johnson
****************************************************************
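
Derrick's explanation in the thread (an unregistered tag becomes a generic node with no children, while a registered composite tag collects its children) can be illustrated with a small stand-alone toy model. This is only a sketch of the idea and not HTMLParser code: the names CompositeTagSketch, Node and parse are hypothetical, and the tag handling is deliberately simplified (text and attributes are ignored).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: none of these names are HTMLParser classes.
final class CompositeTagSketch {

    static final class Node {
        final String name;
        final List<Node> children; // null for a generic (non-composite) tag

        Node(String name, boolean composite) {
            this.name = name;
            this.children = composite ? new ArrayList<>() : null;
        }
    }

    // Matches an opening or closing tag and captures its name.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>");

    // Builds a tree in which only tags named in compositeNames hold children.
    static List<Node> parse(String html, Set<String> compositeNames) {
        List<Node> roots = new ArrayList<>();
        Deque<Node> open = new ArrayDeque<>(); // currently open composite tags
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            String name = m.group(2).toUpperCase(Locale.ROOT);
            boolean composite = compositeNames.contains(name);
            if (closing) {
                // Only a registered tag that is actually open can be closed.
                if (composite && open.stream().anyMatch(n -> n.name.equals(name))) {
                    while (!open.pop().name.equals(name)) {
                        // discard inner tags that were never closed
                    }
                }
                continue;
            }
            Node node = new Node(name, composite);
            if (open.isEmpty()) {
                roots.add(node);
            } else {
                open.peek().children.add(node);
            }
            if (composite) {
                open.push(node); // later tags nest inside this one
            }
        }
        return roots;
    }
}
```

With only TR, TD and SPAN in the registered set, the TBODY node in this model has null children and the TR ends up as its sibling, which mirrors the "Prints null" result in the original question. Adding TBODY to the set makes the TR its child. In HTMLParser itself, the corresponding step is the custom composite tag described in the FAQ entry linked in the thread.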