Re: [Htmlparser-user] Parsing Partial HTML text
Brought to you by:
derrickoswald
From: Madhur K. T. <mad...@gm...> - 2007-09-27 12:57:39
That is cool - I've been using HtmlParser for about 2 years now. I've
done pretty basic things, though. The deepest I've been in HtmlParser's
code was finding out how CompositeTagScanner worked - that was when I
wanted to remove all the virtual end tags that the parser added to the
content.

I think the developer community option is the more exciting one. I'll
check the site for more details on joining the dev side and will get
back to you.

So, does the project have a set of induction docs for new developers?

Thanks,

Derrick Oswald wrote:
> Yes it's possible and allowed... it's open source.
> Two obvious choices...
> You can join the developer community and have access to subversion,
> or you can submit a patch (an overlay of files and changes) to be
> incorporated.
> Your choice.
>
> ----- Original Message ----
> From: Madhur Kumar Tanwani <mad...@gm...>
> To: htmlparser user list <htm...@li...>
> Sent: Thursday, September 27, 2007 8:08:37 AM
> Subject: Re: [Htmlparser-user] Parsing Partial HTML text
>
> :)
> So what does it take for an outsider like me - I mean off the developer
> list - to work on that? Is that possible? Or allowed?
>
> Derrick Oswald wrote:
> >
> > This has been on the Request For Enhancement list for a couple of
> > years now.
> > RFE #888158 block & inline tag differentiation
> > <http://sourceforge.net/tracker/index.php?func=detail&aid=888158&group_id=24399&atid=381402>
> > RFE #1395202 support for well known tags like b,i,u,iframe
> > <http://sourceforge.net/tracker/index.php?func=detail&aid=1395202&group_id=24399&atid=381402>
> >
> > Adding the equivalent of composite tags isn't too difficult, but each
> > tag has semantics that are more subtle.
> > For example, MAP and AREA would need accessors like
> > MapTag.getName() and AreaTag.getShape() and AreaTag.getCoords().
> >
> > ... it's just a little code
> >
> > ----- Original Message ----
> > From: Madhur Kumar Tanwani <mad...@gm...>
> > To: htmlparser user list <htm...@li...>
> > Sent: Thursday, September 27, 2007 12:22:16 AM
> > Subject: Re: [Htmlparser-user] Parsing Partial HTML text
> >
> > Hi Derrick,
> > I understand that the parser wraps unknown/unregistered tags in generic
> > tag classes. But do you think, with so many normal tags not registered
> > with HtmlParser, it would be time to get them incorporated in the set of
> > tags the node factory registers? I mean, the HTML tags - MAP, AREA,
> > TBODY, THEAD etc. - are very common in HTML content and I think it
> > would be good to have them registered, so that people can work with them.
> >
> > Would like to hear your comments on the same.
> > Thanks,
> >
> > Derrick Oswald wrote:
> > >
> > > The tbody tag you are getting is a generic tag - because the parser
> > > doesn't know about tbody.
> > > Hence it has no children because it is not a composite node.
> > > You can make your own tbody composite node as described here:
> > > http://htmlparser.sourceforge.net/faq.html#composite
> > >
> > > ----- Original Message ----
> > > From: "mic...@no..." <mic...@no...>
> > > To: htm...@li...
> > > Sent: Wednesday, September 26, 2007 10:30:45 AM
> > > Subject: [Htmlparser-user] Parsing Partial HTML text
> > >
> > > I am having trouble parsing html tagged text. It seems that I can
> > > retrieve a node but that element does not have the child nodes as
> > > expected.
> > >
> > > String table =
> > >     "<tbody>\n" +
> > >     "<tr>\n" +
> > >     "<td><span>brain_normal_GSM80627</span></td>\n" +
> > >     "<td><span>normal</span></td>\n" +
> > >     "<td><span>cerebral cortex</span></td>\n" +
> > >     "<td><span>brain</span></td>\n" +
> > >     "</tr>\n" +
> > >     "</tbody>\n";
> > >
> > > Parser parser = new Parser(new Lexer(table));
> > > try {
> > >     Node tBodyNode = parser.extractAllNodesThatMatch(
> > >         new TagNameFilter("tbody")).elementAt(0);
> > >     System.out.println(tBodyNode.getChildren()); // Prints null <---------------
> > > } catch (ParserException e) {
> > >     e.printStackTrace();
> > > }
> > >
> > > Does HTML Parser not handle text input or partial html files well?
> > >
> > > _______________________________________________
> > > Htmlparser-user mailing list
> > > Htm...@li...
> > > https://lists.sourceforge.net/lists/listinfo/htmlparser-user
> >
> > --
> > Madhur Kumar Tanwani <http://madhurtanwani.googlepages.com>
> > Gebo <http://feeds.feedburner.com/%7Er/Gebo/%7E6/1>
> >
> > ****************************************************************
> > * Imagine if every Thursday your shoes exploded if you tied them the
> > *usual way*.
> > This happens to us all the time with *computers*, and nobody thinks of
> > complaining. Johnson
> > ****************************************************************
>
--
Madhur Kumar Tanwani <http://madhurtanwani.googlepages.com>
Gebo <http://feeds.feedburner.com/%7Er/Gebo/%7E6/1>

****************************************************************
* Imagine if every Thursday your shoes exploded if you tied them the
*usual way*.
This happens to us all the time with *computers*, and nobody thinks of
complaining. Johnson
****************************************************************
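
A note on the core point of the thread: Derrick's explanation (the parser does not know tbody, so it gets a generic tag, and a generic tag never collects children) can be illustrated with a small toy model. The sketch below is not HTMLParser code and does not use its API; CompositeDemo, TagNode and parse are hypothetical names made up for illustration, and the regex-based tag matching is a simplifying assumption that ignores text and attributes. It only shows why the same input yields null children for tbody until tbody is registered as a composite tag. For the real library, the FAQ entry linked in the thread describes how to define the composite node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model of generic vs. composite tags. NOT HTMLParser's implementation. */
public class CompositeDemo {

    /** A parsed tag; children stays null unless the tag name is registered as composite. */
    public static final class TagNode {
        public final String name;
        public List<TagNode> children; // null means "generic tag", like tbody in the thread

        TagNode(String name) {
            this.name = name;
        }
    }

    // Assumption: only bare tags such as <tr> or </tr>; text and attributes are ignored.
    private static final Pattern TAG = Pattern.compile("<(/?)([a-zA-Z]+)>");

    /** Parses tags only, nesting children solely under names listed in compositeNames. */
    public static List<TagNode> parse(String html, Set<String> compositeNames) {
        return collect(TAG.matcher(html), null, compositeNames);
    }

    private static List<TagNode> collect(Matcher m, String untilEndTag, Set<String> compositeNames) {
        List<TagNode> nodes = new ArrayList<>();
        while (m.find()) {
            boolean isEndTag = !m.group(1).isEmpty();
            String name = m.group(2).toLowerCase();
            if (isEndTag) {
                if (name.equals(untilEndTag)) {
                    return nodes; // end tag closes the composite currently being filled
                }
                continue; // end tag of a generic tag: there is nothing to close
            }
            TagNode node = new TagNode(name);
            if (compositeNames.contains(name)) {
                // Registered composite: following nodes become its children.
                node.children = collect(m, name, compositeNames);
            }
            nodes.add(node);
        }
        return nodes;
    }

    public static void main(String[] args) {
        String table = "<tbody><tr><td><span>brain</span></td></tr></tbody>";

        // tbody is not registered: it is a leaf and its content ends up as siblings.
        TagNode generic = parse(table, Set.of("tr", "td", "span")).get(0);
        System.out.println(generic.name + " children: " + generic.children);

        // tbody is registered: it now owns the tr element.
        TagNode composite = parse(table, Set.of("tbody", "tr", "td", "span")).get(0);
        System.out.println(composite.name + " child count: " + composite.children.size());
    }
}
```

In the first call the tr element ends up as a sibling of tbody rather than a child, which mirrors the null getChildren() result reported at the start of the thread.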