Re: [Htmlparser-user] Parsing Partial HTML text
From: Derrick O. <der...@ro...> - 2007-09-27 12:27:24
Yes, it's possible and allowed... it's open source.
Two obvious choices...
You can join the developer community and have access to Subversion,
or you can submit a patch (an overlay of files and changes) to be incorporated.
Your choice.

----- Original Message ----
From: Madhur Kumar Tanwani <mad...@gm...>
To: htmlparser user list <htmlparser-user@lists.sourceforge.net>
Sent: Thursday, September 27, 2007 8:08:37 AM
Subject: Re: [Htmlparser-user] Parsing Partial HTML text

:)
So what does it take for an outsider like me - I mean off the developer
list - to work on that? Is that possible? Or allowed?

Derrick Oswald wrote:
>
> This has been on the Request For Enhancement list for a couple of
> years now.
> RFE #888158 block & inline tag differentiation
> <http://sourceforge.net/tracker/index.php?func=detail&aid=888158&group_id=24399&atid=381402>
> RFE #1395202 support for well known tags like b,i,u,iframe
> <http://sourceforge.net/tracker/index.php?func=detail&aid=1395202&group_id=24399&atid=381402>
>
> Adding the equivalent of composite tags isn't too difficult, but each
> tag has semantics that are more subtle.
> For example, MAP and AREA would need accessors like
> MapTag.getName(), AreaTag.getShape() and AreaTag.getCoords().
>
> ... it's just a little code
>
> ----- Original Message ----
> From: Madhur Kumar Tanwani <mad...@gm...>
> To: htmlparser user list <htm...@li...>
> Sent: Thursday, September 27, 2007 12:22:16 AM
> Subject: Re: [Htmlparser-user] Parsing Partial HTML text
>
> Hi Derrick,
> I understand that the parser wraps unknown/unregistered tags in generic
> tag classes. But do you think, with so many normal tags not registered
> with HTMLParser, it would be time to get them incorporated in the set of
> tags the node factory registers?
> I mean, the HTML tags - MAP, AREA,
> TBODY, THEAD, etc. - are very common in HTML content, and I think it
> would be good to have them registered, so that people can work with them.
>
> Would like to hear your comments on the same.
> Thanks,
>
> Derrick Oswald wrote:
> >
> > The tbody tag you are getting is a generic tag, because the parser
> > doesn't know about tbody.
> > Hence it has no children: it is not a composite node.
> > You can make your own tbody composite node as described here:
> > http://htmlparser.sourceforge.net/faq.html#composite
> >
> >
> > ----- Original Message ----
> > From: "mic...@no..." <michael...@no...>
> > To: htm...@li...
> > Sent: Wednesday, September 26, 2007 10:30:45 AM
> > Subject: [Htmlparser-user] Parsing Partial HTML text
> >
> >
> > I am having trouble parsing HTML-tagged text. I can retrieve a node,
> > but that element does not have the child nodes I expected.
> >
> >     import org.htmlparser.Node;
> >     import org.htmlparser.Parser;
> >     import org.htmlparser.filters.TagNameFilter;
> >     import org.htmlparser.lexer.Lexer;
> >     import org.htmlparser.util.ParserException;
> >
> >     String table =
> >         "<tbody>\n" +
> >         "<tr>\n" +
> >         "<td><span>brain_normal_GSM80627</span></td>\n" +
> >         "<td><span>normal</span></td>\n" +
> >         "<td><span>cerebral cortex</span></td>\n" +
> >         "<td><span>brain</span></td>\n" +
> >         "</tr>\n" +
> >         "</tbody>\n";
> >
> >     Parser parser = new Parser(new Lexer(table));
> >     try {
> >         Node tBodyNode = parser.extractAllNodesThatMatch(
> >             new TagNameFilter("tbody")).elementAt(0);
> >         System.out.println(tBodyNode.getChildren()); // Prints null <----
> >     } catch (ParserException e) {
> >         e.printStackTrace();
> >     }
> >
> > Does HTML Parser not handle text input or partial HTML files well?
> >
> > _______________________________________________
> > Htmlparser-user mailing list
> > Htm...@li...
> > https://lists.sourceforge.net/lists/listinfo/htmlparser-user
>
>
> -- 
> Madhur Kumar Tanwani <http://madhurtanwani.googlepages.com>
> Gebo <http://feeds.feedburner.com/%7Er/Gebo/%7E6/1>
>
> ****************************************************************
> * Imagine if every Thursday your shoes exploded if you tied them the
> *usual way*.
> This happens to us all the time with *computers*, and nobody thinks of
> complaining. Johnson
> ****************************************************************
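Derrick's explanation (a tag the node factory does not know about becomes a generic, childless tag, while a composite tag gathers the nodes up to its matching end tag) can be illustrated with a small self-contained toy. This is a hypothetical model written for illustration only, not HTML Parser's implementation or API: `CompositeTagDemo`, `ToyNode`, `parse` and the regex tokenizer are all assumptions. With the real library, the remedy Derrick points to is a custom composite tag registered with the parser's node factory, as the FAQ linked in the thread describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy model, NOT the HTML Parser API: only tag names listed as
// "composite" collect children; every other start or end tag is a flat sibling.
public class CompositeTagDemo {

    // A minimal node: a tag name ("#text" for text, "/name" for a stray end tag).
    static final class ToyNode {
        final String name;
        final List<ToyNode> children = new ArrayList<>();
        ToyNode(String name) { this.name = name; }
    }

    // Group 1: optional "/", group 2: tag name, group 3: text between tags.
    private static final Pattern TOKEN =
        Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)");

    static List<ToyNode> parse(String html, Set<String> composites) {
        List<ToyNode> roots = new ArrayList<>();
        List<ToyNode> stack = new ArrayList<>(); // currently open composite tags
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            String text = m.group(3);
            if (text != null) {
                if (!text.trim().isEmpty()) add(new ToyNode("#text"), roots, stack);
                continue; // whitespace-only text is ignored in this toy
            }
            String name = m.group(2).toLowerCase();
            if ("/".equals(m.group(1))) {
                // End tag: close the innermost open composite with the same name.
                int open = -1;
                for (int i = stack.size() - 1; i >= 0; i--) {
                    if (stack.get(i).name.equals(name)) { open = i; break; }
                }
                if (open >= 0) {
                    while (stack.size() > open) stack.remove(stack.size() - 1);
                } else {
                    // End tag of an unregistered tag: just another sibling node.
                    add(new ToyNode("/" + name), roots, stack);
                }
            } else {
                ToyNode node = new ToyNode(name);
                add(node, roots, stack);
                if (composites.contains(name)) stack.add(node); // may get children
            }
        }
        return roots;
    }

    // Attach to the innermost open composite, or to the top level if none is open.
    private static void add(ToyNode node, List<ToyNode> roots, List<ToyNode> stack) {
        if (stack.isEmpty()) roots.add(node);
        else stack.get(stack.size() - 1).children.add(node);
    }

    // Prints an indented outline of the node forest.
    static void print(List<ToyNode> nodes, String indent) {
        for (ToyNode n : nodes) {
            System.out.println(indent + n.name + " (" + n.children.size() + " children)");
            print(n.children, indent + "  ");
        }
    }

    public static void main(String[] args) {
        String table = "<tbody>\n<tr>\n<td><span>brain</span></td>\n</tr>\n</tbody>\n";
        System.out.println("tbody NOT registered:");
        print(parse(table, Set.of("tr", "td")), "  ");
        System.out.println("tbody registered as composite:");
        print(parse(table, Set.of("tbody", "tr", "td")), "  ");
    }
}
```

In the first outline, `tbody` and `/tbody` come out as childless siblings of `tr`, which is the toy's counterpart of `getChildren()` returning null in the original question; once `tbody` is treated as composite, the `tr` row nests inside it.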