Thread: Re: [Htmlparser-user] Trouble overriding textnodes using setTextPrototype
Brought to you by:
derrickoswald
From: Derrick O. <der...@ro...> - 2008-04-03 01:38:16
Just add an empty string in the constructor call:

    factory.setTextPrototype (new TextNode ("") {

----- Original Message ----
From: Jay Prall <jp...@se...>
To: htm...@li...
Sent: Wednesday, April 2, 2008 3:46:14 PM
Subject: [Htmlparser-user] Trouble overriding textnodes using setTextPrototype

    // This code doesn't compile. It complains "The constructor TextNode() is undefined". I got this in the documentation and thought it was a way to override textnodes?
    // My goal is to override TextNode so that I can process text and turn http://link.com into a real link <a href="link.com">link.com</a>
    // Any ideas?

    import org.htmlparser.Node;
    import org.htmlparser.Parser;
    import org.htmlparser.Text;
    import org.htmlparser.lexer.Page;
    import org.htmlparser.tags.LinkTag;
    import org.htmlparser.util.NodeList;
    import org.htmlparser.util.ParserException;
    import org.htmlparser.PrototypicalNodeFactory;
    import org.htmlparser.tags.*;
    import org.htmlparser.nodes.TextNode;

    public static void main (String[] args) throws ParserException
    {
        String html =
            "<html><body>\n<script>alert('hi');</script><select id=\"da\"></select>" +
            "http://googlelink.com\">123456 " +
            "<h1>hello</h1><a href=cnn.com></a>\n" +
            "http://google.com</br>" +
            "<b>https://cnn.com/?test=3&2=d</b></p>\n" +
            "<table><tr><td>http://table.com</td></tr></table>" +
            "<a href=\"a.html\">123</a>\n<a href=\"http://www.alreadylinkified.com/\">http://www.alreadylinkified.com</a>\n</body></html>";

        PrototypicalNodeFactory factory = new PrototypicalNodeFactory();
        factory.setTextPrototype (new TextNode () {
            public String toPlainTextString()
            {
                return (org.htmlparser.util.Translate.decode (super.toPlainTextString ()));
            }
        });

        Parser parser = new Parser(html);
        parser.setNodeFactory(factory);
        NodeList all = parser.parse(null);
        System.out.println( all.toHtml());
    }

    }

_______________________________________________
Htmlparser-user mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-user
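Jay's stated goal is to turn bare URLs such as http://link.com inside text nodes into real links. The string-rewriting half of that can be sketched without htmlparser at all. This is a minimal sketch under stated assumptions: the class name `Linkify`, the method `linkify` and the URL regular expression are hypothetical, the input is assumed to be the raw HTML text of a single text node (so nothing is escaped), and the pattern is deliberately naive (a trailing period, for example, ends up inside the URL). It puts the full URL in `href`, since `href="link.com"` would be resolved by a browser as a relative path.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of htmlparser: wraps bare http/https URLs
// found in a run of raw HTML text in <a href="..."> tags.
class Linkify {
    // Deliberately simple: "http://" or "https://" followed by any run of
    // characters that are not whitespace, '<', '>' or '"'. Trailing
    // punctuation such as a final '.' is swallowed into the URL.
    private static final Pattern URL = Pattern.compile("https?://[^\\s<>\"]+");

    static String linkify(String text) {
        Matcher m = URL.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one unchanged.
            out.append(text, last, m.start());
            String url = m.group();
            // Full URL as href, so the browser does not treat it as a relative path.
            out.append("<a href=\"").append(url).append("\">").append(url).append("</a>");
            last = m.end();
        }
        // Copy whatever follows the last match.
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkify("see http://google.com and https://cnn.com/?test=3&2=d"));
    }
}
```

Inside htmlparser one would presumably call this from the TextNode subclass, for example from an overridden `toHtml` method, and skip text nodes whose `getParent()` is a `LinkTag` so that the already-linked http://www.alreadylinkified.com in the test page is not wrapped twice. Which `toHtml` overload `NodeList.toHtml()` ends up calling depends on the htmlparser version, so treat that wiring as an untested assumption.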
From: Derrick O. <der...@ro...> - 2008-04-15 01:06:41
Try this.

    import org.htmlparser.Node;
    import org.htmlparser.Parser;
    import org.htmlparser.Text;
    import org.htmlparser.lexer.Page;
    import org.htmlparser.tags.LinkTag;
    import org.htmlparser.util.NodeList;
    import org.htmlparser.util.ParserException;
    import org.htmlparser.PrototypicalNodeFactory;
    import org.htmlparser.tags.*;
    import org.htmlparser.nodes.TextNode;

    class MyText extends TextNode
    {
        public MyText ()
        {
            super (null, 0, 0);
            mText = null;
        }

        public String toPlainTextString()
        {
            return (org.htmlparser.util.Translate.decode (super.toPlainTextString ()));
        }
    }

    public class Test
    {
        public static void main (String[] args) throws ParserException
        {
            String html =
                "<html><body>\n<script>alert('hi');</script><select id=\"da\"></select>" +
                "http://googlelink.com\">123456 " +
                "<h1>hello</h1><a href=cnn.com></a>\n" +
                "http://google.com " +
                "https://cnn.com/?test=3&2=d \n" +
                "http://table.com" +
                "123\nhttp://www.alreadylinkified.com/\">http://www.alreadylinkified.com\n";

            PrototypicalNodeFactory factory = new PrototypicalNodeFactory();
            factory.setTextPrototype (new MyText ());

            Parser parser = new Parser(html);
            parser.setNodeFactory(factory);
            NodeList all = parser.parse(null);
            System.out.println( all.toHtml());
        }
    }

----- Original Message ----
From: Jay Prall <jp...@se...>
To: htmlparser user list <htm...@li...>
Sent: Wednesday, April 9, 2008 11:33:43 AM
Subject: Re: [Htmlparser-user] Trouble overriding textnodes using setTextPrototype

Derrick,

When I do this all text in the text nodes is removed. How can I subclass TextNode?

Thanks
Jay
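Why does `new TextNode ("")` compile but print empty text, while `MyText` works? My understanding (not verified against the htmlparser source here) is that `PrototypicalNodeFactory` builds each text node by cloning the prototype and then setting only the page and the start/end positions, so an explicit text such as `""` passed to the constructor is copied into every clone and returned in place of the page contents; `MyText` sets `mText` to null, so each clone reads its text from the page. The following is a simplified, self-contained model of that mechanism, not htmlparser code: `PrototypeDemo`, `MiniTextNode` and `MiniFactory` are hypothetical stand-ins, and "null `mText` means read from the page" is an assumption about TextNode's internals.

```java
// Simplified, hypothetical model of a clone-based text prototype.
// PrototypeDemo, MiniTextNode and MiniFactory are stand-ins, NOT htmlparser classes.
class PrototypeDemo {
    public static void main(String[] args) {
        String page = "<b>hello</b>"; // "hello" occupies positions [3, 8)
        // Analogue of new TextNode ("") { ... }: every clone inherits mText = "".
        MiniTextNode emptyProto = new MiniTextNode("") { };
        // Analogue of Derrick's MyText: mText is null, so clones read the page.
        MiniTextNode nullProto = new MiniTextNode(null);

        System.out.println("[" + new MiniFactory(emptyProto).createTextNode(page, 3, 8).getText() + "]"); // prints []
        System.out.println("[" + new MiniFactory(nullProto).createTextNode(page, 3, 8).getText() + "]"); // prints [hello]
    }
}

class MiniTextNode implements Cloneable {
    String page;     // the whole document being parsed
    int start, end;  // this node's span within the page
    String mText;    // explicit text; null is assumed to mean "read it from the page"

    MiniTextNode(String text) {
        mText = text;
    }

    String getText() {
        // A non-null mText wins over the page contents.
        return (mText != null) ? mText : page.substring(start, end);
    }

    @Override
    public MiniTextNode clone() {
        try {
            // Shallow copy: mText is carried over into every clone.
            return (MiniTextNode) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen, the class is Cloneable
        }
    }
}

class MiniFactory {
    private final MiniTextNode prototype;

    MiniFactory(MiniTextNode prototype) {
        this.prototype = prototype;
    }

    // Clone the prototype and set only the page and the positions.
    MiniTextNode createTextNode(String page, int start, int end) {
        MiniTextNode node = prototype.clone();
        node.page = page;
        node.start = start;
        node.end = end;
        return node;
    }
}
```

The design point the model illustrates is that a prototype must carry behaviour (overridden methods) but no per-node state, because whatever state it holds is duplicated into every node the factory produces.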
From: Jay P. <jp...@se...> - 2008-04-09 15:35:35
Derrick,

When I do this all text in the text nodes is removed. How can I subclass TextNode?

Thanks
Jay

On 4/2/08 9:38 PM, "Derrick Oswald" <der...@ro...> wrote:

> Just add an empty string in the constructor call:
> factory.setTextPrototype (new TextNode ("") {