htmlparser-user Mailing List for HTML Parser (Page 29)
Brought to you by:
derrickoswald
From: Al K. <alk...@gm...> - 2007-05-02 06:14:34
|
Hi, The links to old topics in this mailing list at http://sourceforge.net/mailarchive/forum.php?forum_name=htmlparser-user seem to be broken (500 - Internal Server Error). Does anybody know when they can be expected to work again? |
From: Arpad P. <Arp...@7N...> - 2007-04-30 17:14:44
|
One of the women gently wiped drool from the corner of his mouth. |
From: Derrick O. <der...@ro...> - 2007-04-27 01:58:44
|
See http://htmlparser.sourceforge.net/faq.html#composite to find out how to define your own tags.

----- Original Message ----
From: Gaurav Pranay <gau...@gm...>
To: htmlparser user list <htmlparser-user@lists.sourceforge.net>
Sent: Thursday, April 26, 2007 6:17:17 AM
Subject: Re: [Htmlparser-user] Doubts about HTML Parsers.

Hi Derrick,

I would like to know whether I can define tags of my own type, which I specifically want to replace in the HTML content, i.e. like ScriptTag, LinkTag etc.
Because the <iframe> sort of tags come in the filtered HTML content. How could I replace all of them by defining a filter of that sort? Please give some relevant code for this problem.

Thanks

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
Htmlparser-user mailing list
H...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-user |
From: Gaurav P. <gau...@gm...> - 2007-04-26 10:17:30
|
Hi Derrick, I would like to know whether I can define tags of my own type, which I specifically want to replace in the HTML content, i.e. like ScriptTag, LinkTag etc. Because the <iframe> sort of tags come in the filtered HTML content. How could I replace all of them by defining a filter of that sort? Please give some relevant code for this problem. Thanks

On 4/20/07, Derrick Oswald <der...@ro...> wrote:
> The handling of EncodingChangeException is outlined in the FAQ <http://htmlparser.sourceforge.net/faq.html>.
>
> You can delete the node from the parent's children and then convert the page back to HTML.
> The pseudo code is:
>
> // get the entire page
> NodeList list = parser.parse (null);
>
> // find the node to be deleted
> NodeList advertisment = list.extractAllNodesThatMatch (some_filter);
>
> // remove these nodes from their parent
> foreach (Node node in advertisment)
> {
>     Node parent = node.getParent ();
>     parent.getChildren ().remove (node);
> }
>
> // reprint the HTML
> System.out.println (list.toHtml ()); |
From: Derrick O. <der...@ro...> - 2007-04-25 22:27:15
|
Dipesh,

The NodeIterator will only return the top level nodes if you use the unadulterated Parser. For example: <html> <head> </head> <body> </body> </html> yields only one top level node... the <HTML> node. All other nodes are children of the top level node.

The equals() method is unlikely to work in any case. I don't believe it's implemented anywhere in the node class hierarchy. Maybe it should be.

If your intent is just to check element for element equality, I would instead suggest the Lexer class. (Which is what you get if you set the NodeFactory property on the parser to new PrototypicalNodeFactory (true);) The Lexer has a nextNode() method that will retrieve the nodes in a flat sequence. Then I would use: if (e1.toHtml ().equals (e2.toHtml ())) to compare the original HTML strings. But then, you are responsible for syncing up in case of injected nodes, which may be what you were trying to avoid.

Otherwise you could just do a string comparison of the two entire HTML pages. In the case of a mismatch, you could submit a suitable portion of the page to the parser and see if it can figure out the nesting for you, but that sounds inefficient. Depends on how many pages you need to process.

Derrick

----- Original Message ----
From: Dipesh Sharma <dip...@re...>
To: der...@ro...
Sent: Tuesday, April 24, 2007 10:53:31 PM
Subject: Help needed

Hi Derrick, A few days ago I mailed for help, but none of the replies really helped me. Please tell me if this can be achieved at all, and if so how? I'll be grateful to you. I'm trying to compare the html tag nodes of 2 different web pages by taking one node at a time. Hence, I need to compare the 1st node of the 2 web pages, then go to 2nd nodes and compare and so on. Could you please help me how I can achieve this. I've tried to use Node iterator but haven't been successful. Attached is my code.

import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;
import org.htmlparser.beans.StringBean;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.*;
import org.htmlparser.*;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasParentFilter;

class Test
{
    public static void main (String[] args)
    {
        try
        {
            Parser parser1 = new Parser ("http://www.deals2buy.com");
            Parser parser2 = new Parser ("http://www.deals.com");

            NodeIterator e1 = parser1.elements ();
            NodeIterator e2 = parser2.elements ();

            while(e1.hasMoreNodes() && e2.hasMoreNodes())
            {
                if (e1.equals(e2))
                    System.out.println ("Yes");
                else
                    System.out.println ("No");
            }
        }
        catch (ParserException pe)
        {
            pe.printStackTrace ();
        }
    }
} |
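The element-for-element comparison of HTML strings suggested in this reply can be sketched in plain Java. This is only an illustration under stated assumptions: the two lists stand in for the toHtml() text of nodes read in flat order (as a lexer would deliver them), and the class FlatCompare and its firstMismatch method are hypothetical names, not part of HTML Parser.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: each list stands in for the toHtml() text of the
// nodes of one page, read in flat order. No HTML Parser classes are used.
class FlatCompare
{
    // Returns the index of the first position where the two sequences
    // differ, or -1 when they match in both content and length.
    static int firstMismatch (List<String> first, List<String> second)
    {
        int shared = Math.min (first.size (), second.size ());
        for (int i = 0; i < shared; i++)
            if (!first.get (i).equals (second.get (i)))
                return i;
        // all shared positions match; a longer sequence differs at 'shared'
        return first.size () == second.size () ? -1 : shared;
    }

    public static void main (String[] args)
    {
        List<String> page1 = Arrays.asList ("<html>", "<body>", "<b>", "hello", "</b>", "</body>", "</html>");
        List<String> page2 = Arrays.asList ("<html>", "<body>", "<i>", "hello", "</i>", "</body>", "</html>");
        System.out.println (firstMismatch (page1, page2)); // prints 2
    }
}
```

Returning the index rather than a yes/no answer leaves room for the re-syncing step mentioned above, since the caller knows where the two pages diverge.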
From: cary h. <ca...@ho...> - 2007-04-24 21:43:17
|
I have a question about the sax parser. It relates to the "Why aren't <P>, <B>, <I> etc. tags fully nested?" question in the FAQ. The sax parser also does not recognize end tags for these things, like the </b>. It recognizes it as a start tag. Is there a way to get it to recognize this as an end tag, as well as the many other tags it does the same with? TIA |
From: sebb <se...@gm...> - 2007-04-20 11:23:30
|
On 20 Apr 2007 02:30:14 -0000, Dipesh Sharma <dip...@re...> wrote:
> Hi I'm trying to compare the html tag nodes of 2 different web pages by taking one node at a time. Hence, I need to compare the 1st node of the 2 web pages, then go to 2nd nodes and compare and so on. Could you plz help me how i can achieve this. I've tried to use Node iterator but haven't been successful. Attached is my code.

The node comparison: if (e1.nextNode()==e2.nextNode()) will always be false, as that will only be true if they are identical Objects. You need to compare the relevant attributes of the node objects instead. I've not looked at the API, but perhaps that defines an equals() method that will work for you. If not, you will need to write your own comparison method. |
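The difference between == and a hand-written comparison method can be shown with a small stand-in class. This is a sketch only: SimpleTag is a hypothetical class invented for the example and is not an HTML Parser node type; it assumes that a tag's name and its HTML text are the attributes worth comparing.

```java
import java.util.Objects;

// Hypothetical stand-in for a parsed tag; it is not an HTML Parser class.
final class SimpleTag
{
    private final String name;
    private final String html;

    SimpleTag (String name, String html)
    {
        this.name = name;
        this.html = html;
    }

    // Value comparison: two tags are equal when name and HTML text match.
    @Override
    public boolean equals (Object other)
    {
        if (this == other)
            return true;
        if (!(other instanceof SimpleTag))
            return false;
        SimpleTag that = (SimpleTag) other;
        return name.equals (that.name) && html.equals (that.html);
    }

    // Kept consistent with equals(), as the Object contract requires.
    @Override
    public int hashCode ()
    {
        return Objects.hash (name, html);
    }

    public static void main (String[] args)
    {
        SimpleTag a = new SimpleTag ("B", "<b>");
        SimpleTag b = new SimpleTag ("B", "<b>");
        System.out.println (a == b);       // false: two distinct objects
        System.out.println (a.equals (b)); // true: same name and HTML text
    }
}
```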
From: Derrick O. <der...@ro...> - 2007-04-20 11:07:20
|
The handling of EncodingChangeException is outlined in the FAQ.

You can delete the node from the parent's children and then convert the page back to HTML.
The pseudo code is:

// get the entire page
NodeList list = parser.parse (null);

// find the node to be deleted
NodeList advertisment = list.extractAllNodesThatMatch (some_filter);

// remove these nodes from their parent
foreach (Node node in advertisment)
{
    Node parent = node.getParent ();
    parent.getChildren ().remove (node);
}

// reprint the HTML
System.out.println (list.toHtml ());

----- Original Message ----
From: Gaurav Pranay <gau...@gm...>
To: htm...@li...
Sent: Thursday, April 19, 2007 12:53:54 AM
Subject: [Htmlparser-user] Doubts about HTML Parsers.

Hello Sir,

Thanks for your previous replies as they were of immense help to me. I have a few more doubts regarding the use of Html Parsers & for that I need your help.

1) I have a doubt regarding the org.htmlparser.util.EncodingChangeException. Actually this exception is getting thrown by the program whenever some sites carry a different character set, probably charset=UTF-8.
Can I use some tool to get rid of these exceptions occurring in the program & can I get the details about the exceptions & where they can occur depending on the use.

2) If I want to clear the advertisement by the Html parser & the advertisement in plain text at the base of the page like:-
© 2007 Rediff.com India Limited. All Rights Reserved. Disclaimer | Feedback
Can I implement the Parser in such a fashion to get rid of these tags OR should I use some sort of Htmlcleaner in this case in parallel with the HtmlParsers?

Awaiting your reply.
Thanks in advance.

Gaurav Pranay. |
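The find, detach and reprint steps of the pseudo code can be tried out on a toy tree. This is a minimal sketch under stated assumptions: SimpleNode, collect, removeByTag and toHtml here are hypothetical stand-ins written for the example, not the parser's own Node or NodeList classes, and matching is done by tag name instead of a filter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal tree node; a stand-in for the parser's node classes.
class SimpleNode
{
    final String tag;   // element name, or null for a text node
    final String text;  // text content, or null for an element
    SimpleNode parent;
    final List<SimpleNode> children = new ArrayList<SimpleNode> ();

    SimpleNode (String tag, String text)
    {
        this.tag = tag;
        this.text = text;
    }

    // Appends a child, records its parent, and returns this node for chaining.
    SimpleNode add (SimpleNode child)
    {
        child.parent = this;
        children.add (child);
        return this;
    }

    // Collects this node and every descendant with the given tag name.
    void collect (String wanted, List<SimpleNode> found)
    {
        if (wanted.equals (tag))
            found.add (this);
        for (SimpleNode child : children)
            child.collect (wanted, found);
    }

    // Finds the matching nodes first, then detaches each from its parent,
    // so the tree is never modified while it is being traversed.
    static void removeByTag (SimpleNode root, String wanted)
    {
        List<SimpleNode> unwanted = new ArrayList<SimpleNode> ();
        root.collect (wanted, unwanted);
        for (SimpleNode node : unwanted)
            if (node.parent != null) // the root itself has no parent to edit
                node.parent.children.remove (node);
    }

    // Serializes the subtree back to markup.
    String toHtml ()
    {
        if (tag == null)
            return text;
        StringBuilder html = new StringBuilder ("<" + tag + ">");
        for (SimpleNode child : children)
            html.append (child.toHtml ());
        return html.append ("</" + tag + ">").toString ();
    }

    public static void main (String[] args)
    {
        SimpleNode body = new SimpleNode ("body", null)
            .add (new SimpleNode ("p", null).add (new SimpleNode (null, "news")))
            .add (new SimpleNode ("iframe", null).add (new SimpleNode (null, "ad")));
        removeByTag (body, "iframe");
        System.out.println (body.toHtml ()); // prints <body><p>news</p></body>
    }
}
```

Collecting the matches before removing them mirrors the order of the pseudo code and avoids changing a child list while iterating over it.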
From: Gaurav P. <gau...@gm...> - 2007-04-20 04:17:48
|
Hi, Try using the code below. It is basically meant for moving through the web page node by node. So if you want to compare the nodes, then try using this for both the URLs.

import org.htmlparser.Parser;
import org.htmlparser.Tag;
import org.htmlparser.Text;
import org.htmlparser.util.ParserException;
import org.htmlparser.visitors.NodeVisitor;

public class MyVisitor extends NodeVisitor
{
    public MyVisitor ()
    {
    }

    // called for each tag: print its name and start position
    public void visitTag (Tag tag)
    {
        System.out.println ("\n" + tag.getTagName () + tag.getStartPosition ());
    }

    // called for each text node: print the text
    public void visitStringNode (Text string)
    {
        System.out.println (string);
    }

    public static void main (String[] args) throws ParserException
    {
        Parser parser = new Parser ("http://cbc.ca");
        NodeVisitor visitor = new MyVisitor ();
        parser.visitAllNodesWith (visitor);
    }
}

On 20 Apr 2007 02:30:14 -0000, Dipesh Sharma <dip...@re...> wrote:
> Hi I'm trying to compare the html tag nodes of 2 different web pages by taking one node at a time. Hence, I need to compare the 1st node of the 2 web pages, then go to 2nd nodes and compare and so on. Could you plz help me how i can achieve this. I've tried to use Node iterator but haven't been successful. Attached is my code. |
From: Dipesh S. <dip...@re...> - 2007-04-20 02:30:29
|
import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;
import org.htmlparser.beans.StringBean;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.*;
import org.htmlparser.*;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasParentFilter;

class Test
{
    public static void main (String[] args)
    {
        try
        {
            Parser parser1 = new Parser ("http://www.deals2buy.com");
            Parser parser2 = new Parser ("http://www.deals.com");

            NodeIterator e1 = parser1.elements ();
            NodeIterator e2 = parser2.elements ();

            while(e1.hasMoreNodes() && e2.hasMoreNodes())
            {
                if (e1.nextNode()==e2.nextNode())
                    System.out.println ("Yes");
                else
                    System.out.println ("No");
            }
        }
        catch (ParserException pe)
        {
            pe.printStackTrace ();
        }
    }
} |
From: Gaurav P. <gau...@gm...> - 2007-04-19 04:53:58
|
Hello Sir, Thanks for your previous replies as they were of immense help to me. I have a few more doubts regarding the use of Html Parsers & for that I need your help. 1) I have a doubt regarding the org.htmlparser.util.EncodingChangeException. Actually this exception is getting thrown by the program whenever some sites carry a different character set, probably charset=UTF-8. Can I use some tool to get rid of these exceptions occurring in the program & can I get the details about the exceptions & where they can occur depending on the use. 2) If I want to clear the advertisement by the Html parser & the advertisement in plain text at the base of the page like:- (c) 2007 Rediff.com India Limited. All Rights Reserved. *Disclaimer* <http://www.rediff.com/disclaim.htm> | *Feedback* <http://support.rediff.com/> Can I implement the Parser in such a fashion to get rid of these tags OR should I use some sort of Htmlcleaner in this case in parallel with the HtmlParsers? Awaiting your reply. Thanks in advance. Gaurav Pranay. |
From: Derrick O. <der...@ro...> - 2007-04-07 14:21:32
|
An IMG tag does not contain the text you refer to (above or below the image), which sounds like a caption placed by the HTML authoring software. There is no hard and fast rule that will get that text. Sorry. If you have a number of similar pages, you can use heuristics to create code that will find the text - for that particular class of pages. For example, When you have an IMG tag, either from filtering or examining every node, you can check for text or other tags that may be related to it by looking for the enclosing tag using the getParent() method and examining the children of the parent using getChildren(), to find out the siblings of the IMG tag. Some of these siblings may be the text you want, or perhaps a tag containing the text. You might want to use the FilterBuilder tool to see if you can build a heuristic easily. I don't understand your second question at all. ----- Original Message ---- From: Gaurav Pranay <gau...@gm...> To: htm...@li... Sent: Saturday, April 7, 2007 1:43:38 AM Subject: [Htmlparser-user] Doubt Questions. Hello Sir, Thanks for Quick reply & help. But i have some more doubts related to the Html -Parser. Q:-1)How i can use this parser to get the text associated with the an image ie.the <img tag like bold text above or below the image in a html dump & keep track of the texts around the image in a web-page ?. Q:-2) Do I need to clear the Html page so that i dont get the images of the add in any html page with the help of Html-Cleaner & if yes then how to implement the html-cleaner in the java program?. It will be of immense help to me if i could get some relevent codes related to the above doubts & some information about the relevent classes of Html-Parser through which i can attain the goals of my program. Your good-self is therefore requested to please provide me with some guidelines. Regards Gaurav Pranay ------------------------------------------------------------------------- Take Surveys. Earn Cash. 
Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ Htmlparser-user mailing list Htm...@li... https://lists.sourceforge.net/lists/listinfo/htmlparser-user |
From: Gaurav P. <gau...@gm...> - 2007-04-07 05:43:41
|
Hello Sir, Thanks for the quick reply & help. But I have some more doubts related to the Html-Parser. Q:-1) How can I use this parser to get the text associated with an image, i.e. the <img tag, like bold text above or below the image in an html dump, & keep track of the text around the image in a web page? Q:-2) Do I need to clean the Html page with the help of Html-Cleaner so that I don't get the images of the ads in any html page, & if yes then how do I implement the html-cleaner in the java program? It will be of immense help to me if I could get some relevant code related to the above doubts & some information about the relevant classes of Html-Parser through which I can attain the goals of my program. Your good-self is therefore requested to please provide me with some guidelines. Regards Gaurav Pranay |
From: Derrick O. <der...@ro...> - 2007-04-05 12:13:29
|
1) Use a TagNameFilter with "IMG" as the value in a call to parser.parse(filter); See the javadocs for the Parser class. 2) Current versions of the parser take the HTML as a parameter to the Parser constructor as in: new Parser ("<html>...") If it doesn't start with http or https it assumes it's text to be parsed. ----- Original Message ---- From: Gaurav Pranay <gau...@gm...> To: htm...@li... Sent: Thursday, April 5, 2007 1:27:25 AM Subject: [Htmlparser-user] Doubt Questions. Hello Sir/Madam, I have doubts regarding the use of HtmlParsers. Q:1) How to use the HtmlParser with my java program to extract the <img tag & the associated String data with it in any html content derived from any web-page?. Q:2) How to use the Parser for getting the valid String content from the html content in string form rather than giving the URL to the Parser?. Your Help in this context will be highly appreciated. Thanks & Regards. Gaurav |
From: Gaurav P. <gau...@gm...> - 2007-04-05 05:27:28
|
Hello Sir/Madam, I have doubts regarding the use of HtmlParsers. Q:1) How to use the HtmlParser with my java program to extract the <img tag & the associated String data with it in any html content derived from any web-page?. Q:2) How to use the Parser for getting the valid String content from the html content in string form rather than giving the URL to the Parser?. Your Help in this context will be highly appreciated. Thanks & Regards. Gaurav |
From: Derrick O. <der...@ro...> - 2007-04-01 21:34:25
|
From the HTML specification: For menus, the control name is provided by a SELECT element and values are provided by OPTION elements. Only selected options may be successful. When no options are selected, the control is not successful and neither the name nor any values are submitted to the server when the form is submitted. A control's "control name" is given by its name attribute. The scope of the name attribute for a control within a FORM element is the FORM element. Each control has both an initial value and a current value, both of which are character strings. Please consult the definition of each control for information about initial values and possible constraints on values imposed by the control. In general, a control's "initial value" may be specified with the control element's value attribute. However, the initial value of a TEXTAREA element is given by its contents, and the initial value of an OBJECT element in a form is determined by the object implementation (i.e., it lies outside the scope of this specification). The control's "current value" is first set to the initial value. Thereafter, the control's current value may be modified through user interaction and scripts. A control's initial value does not change. Thus, when a form is reset, each control's current value is reset to its initial value. If a control does not have an initial value, the effect of a form reset on that control is undefined. When a form is submitted for processing, some controls have their name paired with their current value and these pairs are submitted with the form. Those controls for which name/value pairs are submitted are called successful controls. For you, this should be "world=1", a pairing of the name attribute of the SELECT tag with the value attribute of the selected OPTION. Note that if there are many inputs the name/value pairs need to be separated by ampersands: buffer.append ("queryinput=yadda"); buffer.append ("&"); buffer.append ("world=1"); // etc. 
----- Original Message ---- From: Cliff Holbrook <cli...@gm...> To: htmlparser user list <htm...@li...> Sent: Sunday, April 1, 2007 5:06:02 PM Subject: Re: [Htmlparser-user] Button Links It is going to the page where it says that the select input was unsupplied. As in. the text on the page says "No World Selected". I believe that everything else is working fine, it is just that one part. On 4/1/07, Derrick Oswald <der...@ro...> wrote: By 'doesn't work' you mean you aren't getting the page you expect? There may be any number of reasons for that. There may be other inputs from the form you need to supply. Also too, there may be cookies on redirections that may need to be supplied. Try using: org.htmlparser.http.ConnectionManager.setRedirectionProcessingEnabled (true); org.htmlparser.http.ConnectionManager.setCookieProcessingEnabled (true); Also some sites require a specific agent. See the documentation on org.htmlparser.http.ConnectionManager.setDefaultRequestProperties () If all else fails, try a simple case - on another server say - to get the hang of the POST before tackling the page you desire. ----- Original Message ---- From: Cliff Holbrook <cli...@gm...> To: htmlparser user list <htm...@li...> Sent: Sunday, April 1, 2007 1:21:35 PM Subject: Re: [Htmlparser-user] Button Links using "world=1" "world=" by itself, "world=" with "1" and "selected", and trying to set value = 1 and selected = to "selected" doesn't work. I can't think of any other combination to use. On 4/1/07, Derrick Oswald < der...@ro...> wrote: I believe you would just add it as a parameter. Using the FAQ example it would be done like so: buffer = new StringBuffer (1024); // 'input' fields separated by ampersands (&) buffer.append ("world=1"); // name=value or buffer.append ("world="); // if the nothing option is selected // etc. 
----- Original Message ---- From: Cliff Holbrook <cli...@gm...> To: htmlparser user list < htm...@li...> Sent: Sunday, April 1, 2007 12:42:06 PM Subject: Re: [Htmlparser-user] Button Links After looking closer at the the code, it seems as though there is a "select" menu in the POST that I cannot figure out how to set. The HTML for it is <select name= "world"> <option></option> < option value="1" selected="selected" >World 1</ option></select> I can't find any info in the how to use POST on how to submit the information. It also seems to me as though the default is to have "World 1" selected, and so I don't understand why it has to have it sent in again. Thank you for your help |
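The name/value pairing discussed in this thread can be sketched in plain Java. This is a minimal illustration, assuming the field names queryinput and world from the messages above; the class FormBody and its encode method are hypothetical helpers, not part of HTML Parser. It joins the pairs with ampersands and URL-encodes each name and value, which the hand-built strings in the thread skip because their values contain no special characters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of HTML Parser): builds an
// application/x-www-form-urlencoded body from name/value pairs.
class FormBody
{
    static String encode (Map<String, String> fields)
    {
        StringBuilder buffer = new StringBuilder ();
        try
        {
            for (Map.Entry<String, String> field : fields.entrySet ())
            {
                if (buffer.length () > 0)
                    buffer.append ('&'); // pairs are separated by ampersands
                buffer.append (URLEncoder.encode (field.getKey (), "UTF-8"));
                buffer.append ('=');
                buffer.append (URLEncoder.encode (field.getValue (), "UTF-8"));
            }
        }
        catch (UnsupportedEncodingException e)
        {
            // UTF-8 is a required charset on every Java platform
            throw new IllegalStateException (e);
        }
        return buffer.toString ();
    }

    public static void main (String[] args)
    {
        // LinkedHashMap keeps the fields in insertion order
        Map<String, String> fields = new LinkedHashMap<String, String> ();
        fields.put ("queryinput", "yadda");
        fields.put ("world", "1"); // SELECT name paired with the selected OPTION's value
        System.out.println (encode (fields)); // prints queryinput=yadda&world=1
    }
}
```

An unselected menu would simply be left out of the map, matching the rule that a control with no selected option is not successful and submits nothing.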
From: Cliff H. <cli...@gm...> - 2007-04-01 21:06:08
|
It is going to the page that says the select input was not supplied; that is, the text on the page says "No World Selected". I believe that everything else is working fine; it is just that one part.

--
"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning." |
From: Derrick O. <der...@ro...> - 2007-04-01 20:13:29
|
By 'doesn't work' you mean you aren't getting the page you expect? There may be any number of reasons for that. There may be other inputs from the form that you need to supply. There may also be cookies set on redirections that need to be supplied. Try using:

    org.htmlparser.http.ConnectionManager.setRedirectionProcessingEnabled (true);
    org.htmlparser.http.ConnectionManager.setCookieProcessingEnabled (true);

Also, some sites require a specific agent. See the documentation on org.htmlparser.http.ConnectionManager.setDefaultRequestProperties ().

If all else fails, try a simple case - on another server, say - to get the hang of the POST before tackling the page you desire.

_______________________________________________
Htmlparser-user mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-user |
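The three suggestions in this reply (follow redirects, keep cookies, send a specific agent) can also be tried with plain java.net classes, which may help when checking whether the server or the client setup is at fault. This is only a minimal sketch and not the HTML Parser API: the class name ConnectionSetup, the example address and the agent string are all made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper using only the JDK; not part of HTML Parser.
class ConnectionSetup {
    static HttpURLConnection open(String address, String agent) {
        // Install a JVM-wide cookie store so that cookies received on one
        // response can be sent back on later requests.
        if (CookieHandler.getDefault() == null) {
            CookieHandler.setDefault(new CookieManager());
        }
        try {
            // openConnection() only creates the object; nothing is sent yet.
            HttpURLConnection connection =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
            connection.setInstanceFollowRedirects(true);        // follow 3xx responses
            connection.setRequestProperty("User-Agent", agent); // the "specific agent"
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Address and agent string are placeholders.
        HttpURLConnection connection =
            open("http://example.invalid/login", "Mozilla/5.0 (compatible; ExampleAgent)");
        System.out.println(connection.getRequestProperty("User-Agent"));
    }
}
```

Note that HttpURLConnection does not follow a redirect that switches protocol (for example from http to https), so a login page that does this would still need manual handling.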
From: Cliff H. <cli...@gm...> - 2007-04-01 17:21:38
|
Using "world=1", "world=" by itself, "world=" with "1" and "selected", and trying to set value=1 and selected="selected" all fail. I can't think of any other combination to use. |
From: Derrick O. <der...@ro...> - 2007-04-01 17:00:16
|
I believe you would just add it as a parameter. Using the FAQ example, it would be done like so:

    buffer = new StringBuffer (1024);
    // 'input' fields separated by ampersands (&)
    buffer.append ("world=1"); // name=value

or

    buffer.append ("world="); // if the nothing option is selected
    // etc.
|
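When a form has several fields, or values containing spaces or punctuation, building the buffer by hand gets error-prone. The sketch below shows one way the name=value pairs could be joined and URL-encoded with the JDK alone. FormBody is a hypothetical helper, not something from the FAQ or from HTML Parser, and the "login" field is invented for the example; only "world" comes from this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds an application/x-www-form-urlencoded body.
class FormBody {
    static String encode(Map<String, String> fields) {
        StringBuilder buffer = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (buffer.length() > 0) {
                    buffer.append('&'); // fields are separated by ampersands
                }
                buffer.append(URLEncoder.encode(field.getKey(), "UTF-8"));
                buffer.append('=');
                buffer.append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("world", "1");       // value of the selected <option>
        fields.put("login", "my name"); // invented field, for illustration only
        System.out.println(encode(fields)); // prints world=1&login=my+name
    }
}
```

The resulting string is what would be written to the connection's output stream in place of the hand-built buffer.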
From: Cliff H. <cli...@gm...> - 2007-04-01 16:42:09
|
After looking closer at the code, it seems as though there is a "select" menu in the form that I cannot figure out how to set in the POST. The HTML for it is

    <select name="world">
    <option></option>
    <option value="1" selected="selected">World 1</option></select>

I can't find any info in the FAQ entry on using POST about how to submit this information. It also seems to me as though the default is to have "World 1" selected, and so I don't understand why it has to be sent in again.

Thank you for your help |
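On the question of why a default has to be sent at all: the selected="selected" attribute only tells a browser which option to preselect. The server sees nothing but the request body, so a browser reads the markup and includes the chosen option as name=value when it submits, and a program doing the POST by hand has to do the same. The sketch below models that rule for a single-choice select. SelectSketch and Option are invented names, and the values are not URL-encoded here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of what a browser submits for a single-choice <select>.
class SelectSketch {
    static final class Option {
        final String value;
        final boolean selected;

        Option(String value, boolean selected) {
            this.value = value;
            this.selected = selected;
        }
    }

    // Returns the name=value pair for the select, or "" if it has no options.
    static String submittedPair(String name, List<Option> options) {
        if (options.isEmpty()) {
            return "";
        }
        // With no option marked as selected, the first option is the one shown.
        Option chosen = options.get(0);
        for (Option option : options) {
            if (option.selected) {
                chosen = option; // if several are marked, the last one wins
            }
        }
        return name + "=" + chosen.value;
    }

    public static void main(String[] args) {
        // The two options from the markup quoted in this message.
        List<Option> world = Arrays.asList(new Option("", false), new Option("1", true));
        System.out.println(submittedPair("world", world)); // prints world=1
    }
}
```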
From: Derrick O. <der...@ro...> - 2007-04-01 15:56:20
|
I'm not sure what your question is. As it says in the FAQ example, a StringBean is used:

    bean = new StringBean ();
    bean.setConnection (connection);
    mText = bean.getStrings ();

But a parser could be used by just replacing the last three lines in the try block with:

    parser = new Parser ();
    parser.setConnection (connection);
    // ... do parser operation

So, you are actually passing the fully functional URL connection object to the Parser instead of making it do a GET behind the scenes on a (string) URL.

If what you are asking is how to use the parser, check out the Parser javadoc <http://htmlparser.sourceforge.net/javadoc/org/htmlparser/Parser.html>. Basically, get a list of nodes and do something with it:

    NodeList list = parser.parse (null);
    // do something with your list of nodes.
|
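For readers without the FAQ at hand, the "fully functional URL connection object" mentioned in this reply could be prepared roughly as follows. This is a sketch with plain java.net classes under stated assumptions, not the FAQ's own code: PostSketch, prepare and send are invented names and the address is a placeholder. Only prepare is exercised in main, because send needs a real server.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: sets up a form POST with JDK classes only.
class PostSketch {
    // Configures the connection; no network traffic happens here.
    static HttpURLConnection prepare(String address) {
        try {
            HttpURLConnection connection =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
            connection.setRequestMethod("POST");
            connection.setDoOutput(true); // a request body will be written
            connection.setRequestProperty(
                "Content-Type", "application/x-www-form-urlencoded");
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the encoded form body and returns the HTTP status code.
    // This is the step that actually contacts the server.
    static int send(HttpURLConnection connection, String body) throws IOException {
        try (OutputStream out = connection.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return connection.getResponseCode();
    }

    public static void main(String[] args) {
        HttpURLConnection connection = prepare("http://example.invalid/login");
        System.out.println(connection.getRequestMethod()); // prints POST
    }
}
```

Once the body has been written, the same connection object is what would be handed to parser.setConnection (connection) as described in this reply. As for finding the page the program was routed to: when HttpURLConnection follows redirects itself, connection.getURL () called after the response has arrived should report the final address rather than the one originally requested.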
From: Cliff H. <cli...@gm...> - 2007-04-01 05:21:30
|
So after the POST has completed successfully, how does one then perform parsing operations on the result? In other words, how would the URL of the page the program gets routed to be found? |
From: Derrick O. <der...@ro...> - 2007-04-01 03:15:50
|
The button is probably in a form that requires a POST submission. You can see how to handle POST for a form in the FAQ <http://htmlparser.sourceforge.net/faq.html> under "How can I use POST to fetch a page?" <http://htmlparser.sourceforge.net/faq.html#post>. |
From: Cliff H. <cli...@gm...> - 2007-04-01 03:06:17
|
I am trying to create a Java program that will follow a series of links. However, the first link is not a link per se, but a button, and the link parser doesn't recognize it as a link. The source of the button is

    <input type="submit" value="Sign in" />

I am fairly new at using this, so I would appreciate some pointers on where to look to find out how to accomplish this. |