htmlparser-user Mailing List for HTML Parser (Page 14)
Brought to you by:
derrickoswald
| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2001 | - | - | - | - | - | - | - | - | - | - | 1 | - |
| 2002 | 7 | - | 9 | 50 | 20 | 47 | 37 | 32 | 30 | 11 | 37 | 47 |
| 2003 | 31 | 70 | 67 | 34 | 66 | 25 | 48 | 43 | 58 | 25 | 10 | 25 |
| 2004 | 38 | 17 | 24 | 25 | 11 | 6 | 24 | 42 | 13 | 17 | 13 | 44 |
| 2005 | 10 | 16 | 16 | 23 | 6 | 19 | 39 | 15 | 40 | 49 | 29 | 41 |
| 2006 | 28 | 24 | 52 | 41 | 31 | 34 | 22 | 12 | 11 | 11 | 11 | 4 |
| 2007 | 39 | 13 | 16 | 24 | 13 | 12 | 21 | 61 | 31 | 13 | 32 | 15 |
| 2008 | 7 | 8 | 14 | 12 | 23 | 20 | 9 | 6 | 2 | 7 | 3 | 2 |
| 2009 | 5 | 8 | 10 | 22 | 85 | 82 | 45 | 28 | 26 | 50 | 8 | 16 |
| 2010 | 3 | 11 | 39 | 56 | 80 | 64 | 49 | 48 | 16 | 3 | 5 | 5 |
| 2011 | 13 | - | 1 | 7 | 7 | 7 | 7 | 8 | - | 6 | 2 | - |
| 2012 | 5 | - | 3 | 3 | 4 | 8 | 1 | 5 | 10 | 3 | 2 | 4 |
| 2013 | 4 | 2 | 7 | 7 | 6 | 7 | 3 | - | 1 | - | - | - |
| 2014 | - | 2 | 1 | - | 3 | 1 | - | - | 1 | 4 | 2 | 4 |
| 2015 | 4 | 2 | 8 | 7 | 6 | 7 | 3 | 1 | 1 | 4 | 3 | 4 |
| 2016 | 4 | 6 | 9 | 9 | 6 | 1 | 1 | - | - | 1 | 1 | 1 |
| 2017 | - | 1 | 3 | 1 | - | 1 | 2 | 3 | 6 | 3 | 2 | 5 |
| 2018 | 3 | 13 | 28 | 5 | 4 | 2 | 2 | 8 | 2 | 1 | 5 | 1 |
| 2019 | 8 | 1 | - | 1 | 4 | - | 1 | - | - | - | 2 | 2 |
| 2020 | - | - | 1 | 1 | 1 | 2 | 1 | 1 | 1 | - | 1 | 1 |
| 2021 | 3 | 2 | 1 | 1 | 2 | 1 | 2 | 1 | - | - | - | - |
| 2022 | - | - | - | 1 | 1 | 1 | - | 1 | - | - | - | - |
| 2023 | 2 | - | - | - | - | - | - | 1 | - | - | - | - |
| 2024 | 2 | - | - | - | - | - | - | - | - | - | - | - |
| 2025 | - | - | - | - | - | 1 | - | - | - | - | - | - |
From: Ben R. <be...@in...> - 2010-02-10 01:12:25
Greetings,

I would like to use bold tags ("b") with org.htmlparser, but have been unable to get them to work as expected. I added and registered a BoldTag class that is an exact copy of org.htmlparser.tags.ParagraphTag (with "B" added to mEnders), but it still does not behave as expected. When I look at the children of the body node parsed from the following HTML string, there are three children (<b>, <a>, </b>) instead of the single <b> I would expect:

"<html><body><b><a href='test.com'>Test</a></b></body></html>"

Am I missing something here?

-Ben Rose
From: Asmita <ks...@gm...> - 2010-01-22 11:32:56
Hi, can anyone help me remove the CSS tags from my DOM tree? I don't know whether CssSelectorNodeFilter would help. Can someone explain its usage? Thanks in advance.

With regards,
Asmi
From: krishna <kri...@ya...> - 2010-01-20 10:46:27
Dear friends, whenever I run my program with a local HTML file, I have no problem. When I run it through the internet connection, I get this exception: org.htmlparser.util.ParserException: with some quotes. Can anybody say why, and how I can recover?
From: <Mug...@tc...> - 2010-01-04 11:18:03
Hi,

I would like to parse a URL and get the following things:

1) Title of the webpage
2) Displayable text of the page
3) Images in the webpage
4) Anchor links of the webpage

Please guide me on how to proceed.

Regards,
Mugilan T.S Ragupathy
Tata Consultancy Services
From: Derrick O. <der...@gm...> - 2009-12-27 07:32:44
You should be able to create a filter to extract the alt attribute from IMG tags using the FilterBuilder program.
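Derrick's reply stops at getting the alt value. The remaining steps in the question are pulling out the number and writing it to a file, and those need only the JDK. The sketch below assumes the alt string has already been obtained from the IMG tag. The class and method names are hypothetical. The file format (one "SYMBOL,rating" line per lookup) is an arbitrary choice for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StockScouterRating {
    // Matches alt text such as "StockScouter Rating: 5". The label is taken from
    // the snippet quoted in the question; the live page may use different wording.
    // \s+ also tolerates a line break between "StockScouter" and "Rating".
    private static final Pattern RATING =
            Pattern.compile("StockScouter\\s+Rating:\\s*(\\d+)");

    /** Returns the rating contained in an IMG alt value, or empty if there is none. */
    public static OptionalInt parseRating(String altText) {
        if (altText == null) {
            return OptionalInt.empty();
        }
        Matcher m = RATING.matcher(altText);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    /** Appends one "SYMBOL,rating" line to the file, creating the file if needed. */
    public static void appendRating(Path file, String symbol, int rating) throws IOException {
        String line = symbol + "," + rating + System.lineSeparator();
        Files.write(file, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

With htmlparser, the input could plausibly come from `imageTag.getAttribute ("alt")` on an IMG tag matched by the filter. That wiring is an assumption and is not shown here.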
From: <jmu...@ai...> - 2009-12-26 20:30:41
I need help getting the StockScouter number from MSN Money and then writing it to a file:

http://moneycentral.msn.com/investor/StockRating/srsmain.asp?Symbol=AAPL

The StockScouter number is located in the HTML code below:

<img class="img1" src="images/SRS5.gif" width="97" height="82" alt="StockScouter Rating: 5" /><p>

Can anyone point me in the right direction? Thanks!
From: Andy W. <an...@aw...> - 2009-12-17 10:44:14
For anyone having this problem in future, here is a workaround (exception handling omitted):

    // needs: java.io.BufferedReader, java.io.InputStreamReader, java.net.URL,
    //        org.htmlparser.Parser, org.htmlparser.beans.StringBean

    // get the HTML 'the Google way'
    URL url = new URL(TEST_URL);
    BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));
    String line = null;
    StringBuilder buf = new StringBuilder();
    while ((line = reader.readLine()) != null) {
        buf.append(line);
        buf.append(System.getProperty("line.separator"));
    }
    reader.close();
    // use the createParser method that expects an HTML string
    Parser parser = Parser.createParser(buf.toString(), null);
    StringBean sb = new StringBean();
    parser.visitAllNodesWith(sb);
    return (sb.getStrings());
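The loop above decodes the stream with the platform default charset and rebuilds line endings from `line.separator`. Here is a minimal JDK-only alternative. It assumes the page's charset is known (UTF-8 in the usage line is only an example) and reads the stream in chunks, so the original characters and line endings are kept. `PageSource` and `readAll` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class PageSource {
    /**
     * Reads the whole stream into a String using the given charset.
     * Unlike a readLine() loop, this keeps the original line endings intact.
     * The stream is closed when reading finishes.
     */
    public static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder buf = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] chunk = new char[8192];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                buf.append(chunk, 0, n);
            }
        }
        return buf.toString();
    }
}
```

The result can then be handed to `Parser.createParser (html, null)` as in the workaround above, for example `PageSource.readAll (url.openStream (), StandardCharsets.UTF_8)`.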
From: Derrick O. <der...@gm...> - 2009-12-16 19:52:22
There's nothing we can do about the exception thrown in a third party package (com.google.apphosting.utils.security.urlfetch). You might check what you added to parseCookies() to get it to behave this way... or maybe just catch the exception.
From: Andy W. <an...@aw...> - 2009-12-16 15:05:19
Thanks again. It doesn't seem to work for me, though. The code now is:

    Parser.getConnectionManager().setCookieProcessingEnabled(false);
    Parser parser = new Parser(TEST_URL);
    StringBean sb = new StringBean();
    parser.visitAllNodesWith(sb);
    String result = sb.getStrings();

and the stack trace is:

    at com.google.apphosting.utils.security.urlfetch.URLFetchServiceStreamHandler$Connection.getHeaderFields(URLFetchServiceStreamHandler.java:211)
    at com.google.apphosting.utils.security.urlfetch.URLFetchServiceStreamHandler$Connection.getHeaderField(URLFetchServiceStreamHandler.java:196)
    at org.htmlparser.http.ConnectionManager.parseCookies(ConnectionManager.java:1097)
    at org.htmlparser.http.ConnectionManager.openConnection(ConnectionManager.java:669)
    at org.htmlparser.http.ConnectionManager.openConnection(ConnectionManager.java:848)
    at org.htmlparser.Parser.<init>(Parser.java:301)
    at org.htmlparser.Parser.<init>(Parser.java:313)

This may be a red herring, though, as I get the same result testing locally if I switch off the HTTP server that serves TEST_URL.
From: Derrick O. <der...@gm...> - 2009-12-15 16:39:22
As it says in the StringBean header, you can use the StringBean as a visitor:

    StringBean sb = new StringBean ();
    Parser parser = new Parser ("http://cbc.ca");
    parser.visitAllNodesWith (sb);
    String s = sb.getStrings ();
    sb.setLinks (true);
    parser.reset ();
    parser.visitAllNodesWith (sb);
    String sl = sb.getStrings ();
From: Andy W. <an...@aw...> - 2009-12-15 12:09:12
Thanks for the reply. You wrote that I can turn off cookie processing with:

    parser.getConnectionManager ().setCookieProcessingEnabled(false)

However, I am using the StringBean class, which gives me no access to the Parser it uses. My code is like:

    StringBean sb = new StringBean ();
    sb.setURL ("http://blah-blah");
    String result = sb.getStrings ();

Any idea how I can get to the parser?

Thanks
From: Derrick O. <der...@gm...> - 2009-12-12 09:03:09
This has been replaced by the main program in org.htmlparser.beans.StringBean. Sorry for the misdirection.
From: David P. C. <dav...@gm...> - 2009-12-09 22:18:54
Hello,

On the website http://htmlparser.sourceforge.net/samples.html there is information about the "StringExtractor" example:

    String Extractor
    Extract text from a web page.
    org.htmlparser.parserapplications.StringExtractor
    bin/stringextractor http://website_url

However, I did not find this example in either of these two downloads:

    HTMLParser-2.0-SNAPSHOT-src.zip
    HTMLParser-2.0-SNAPSHOT-bin.zip

Can you please tell me where to find the StringExtractor example?

Best regards,
David Portabella
From: Derrick O. <der...@gm...> - 2009-12-08 06:24:59
On connection open, if cookie processing is enabled, the ConnectionManager will attempt to parse whatever cookies came back with the HTTP response. This looks like it hit a bad one. You can turn this processing off with:

    parser.getConnectionManager ().setCookieProcessingEnabled(false)
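Disabling cookie processing, as suggested above, is the library-level fix. The sketch below illustrates the other option raised in this thread, which is to tolerate a bad cookie instead of failing. It uses the JDK's `java.net.HttpCookie`, not htmlparser's own cookie code, and does not modify htmlparser. `LenientCookies` is a hypothetical name, and how the raw `Set-Cookie` values are collected is left to the caller.

```java
import java.net.HttpCookie;
import java.util.ArrayList;
import java.util.List;

public class LenientCookies {
    /**
     * Parses each Set-Cookie header value, skipping any value the JDK rejects
     * (for example one with no name=value pair) instead of letting a single
     * bad cookie abort the whole request.
     */
    public static List<HttpCookie> parseAll(List<String> setCookieValues) {
        List<HttpCookie> cookies = new ArrayList<>();
        for (String value : setCookieValues) {
            if (value == null) {
                continue; // HttpCookie.parse(null) would throw NullPointerException
            }
            try {
                cookies.addAll(HttpCookie.parse(value));
            } catch (IllegalArgumentException malformed) {
                // Malformed cookie: ignore it and keep the others.
            }
        }
        return cookies;
    }
}
```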
From: Andy W. <an...@aw...> - 2009-12-07 23:31:30
Has anyone used HtmlParser with GAE? I have a URL that parses fine in 'dev mode', but when I publish the code to the live environment I get the following exception when parsing the same page:

    java.lang.IllegalStateException: no cookie value
        at org.htmlparser.http.ConnectionManager.parseCookies(ConnectionManager.java:1131)
        at org.htmlparser.http.ConnectionManager.openConnection(ConnectionManager.java:669)
        at org.htmlparser.http.ConnectionManager.openConnection(ConnectionManager.java:848)
        at org.htmlparser.Parser.setURL(Parser.java:392)

Does anyone know why I find myself parsing cookies anyway?

cheers

Andy
From: Micheal M. <mic...@gm...> - 2009-11-07 21:50:37
How to replace URLs of links using Java HTMLParser <http://stackoverflow.com/questions/1089517/how-to-replace-urls-of-links-using-java-htmlparser-org-htmlparser>

I am using htmlparser (htmlparser.org) to rewrite all the links in an input String. All I need to do is iterate over all the link tags (<a href=...) that appear in the input String, grab their values, apply some regex to determine how they should be manipulated, and then update each link's href, target and onclick values accordingly.

I am not sure exactly how I can update only the selected link elements in the input String while leaving all other data in the input String untouched. It seems that the htmlparser library can extract certain elements for manipulation, but it can't manipulate elements in their original context and then return their updated values while maintaining the integrity of the original context.

Example:

<a class="user" href="">ERIC POWER</a> commented on <a class="user" href="">Test Test</a>'s blog entry, <a href=" http://liferay-sandbox-myview/c/blogs/find_entry?entryId=20801">Dave's Entry To Test DE742</a>, in <a class="group" href=" http://liferay-sandbox-myview/c/my_places/view?groupId=11113&privateLayout=0">Global Platform for Sales</a>.

Any help would be greatly appreciated.

Thanks
Micheal
From: Daniel A. <da...@ce...> - 2009-11-03 00:22:26
I am trying to get the text of an HTML page (similar to StringBean.getStrings()) but without the anchor text. I tried a few different methods, including using a filter bean, but I couldn't get it right. Does anyone know of a way to do this?

Regards,
Danny Abraham
cephX, Inc.
From: Java G. <jav...@ya...> - 2009-11-01 19:42:45
Hi, I'm currently reviewing HTMLParser and like what I see so far. One thing that would be great as an RFE: there is no easy way to set the User-Agent when using the default Parser. The default Parser uses a static initializer to set its User-Agent instead of using the default ConnectionManager. It would be nice if this could be set in just one place, with the Parser class using those defaults instead of the static initializer. I'm using the latest 2.0 snapshot.

Cheers
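Until such an RFE lands, one possible workaround is to set the header on a `URLConnection` yourself and hand that connection to the parser. This sketch assumes that htmlparser's `Parser (URLConnection)` constructor uses the connection's request properties as given. That behaviour has not been verified here. `AgentConnection` is a hypothetical helper name.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;

public class AgentConnection {
    /**
     * Opens (but does not yet connect) a URLConnection that carries the given
     * User-Agent header. No network traffic happens until the connection is used.
     */
    public static URLConnection open(String url, String userAgent) throws IOException {
        URLConnection connection = URI.create(url).toURL().openConnection();
        connection.setRequestProperty("User-Agent", userAgent);
        return connection;
    }
}
```

For example: `Parser parser = new Parser (AgentConnection.open ("http://example.com/", "MyCrawler/1.0"));`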
From: Derrick O. <der...@gm...> - 2009-10-28 14:41:06
Definitely sounds like a bug.
From: Magnus O. H. <ma...@ma...> - 2009-10-28 11:05:57
I've discovered that my source had a double space before the href attribute, like this:

<a  href="...">

It seems the extra space is included in the attribute name, since setAttribute(" " + "href", ...) actually changes the value in my source. Should this be classified as a bug? Whitespace in HTML is probably very often inconsistent...

Thanks,
Magnus
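Magnus's workaround generalises to this: look up the attribute name exactly as it was stored, then pass that form back to `setAttribute`. The helper below is purely hypothetical. It assumes the caller has already collected the raw attribute names from the tag (how to do that is not shown). It matches names after trimming whitespace and ignoring case, since HTML attribute names are case-insensitive.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class AttributeNames {
    /**
     * Hypothetical helper: returns the attribute name exactly as stored
     * (e.g. " href" with a leading space) if it matches the wanted name
     * once surrounding whitespace is trimmed and case is ignored.
     */
    public static Optional<String> findStoredName(List<String> storedNames, String wanted) {
        String target = wanted.trim().toLowerCase(Locale.ROOT);
        for (String name : storedNames) {
            if (name != null && name.trim().toLowerCase(Locale.ROOT).equals(target)) {
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }
}
```

A lookup of `"href"` then finds a stored name of `" href"`. Per Magnus's report, calling `setAttribute` with that stored form is what actually changes the value in the source.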
From: Magnus O. H. <ma...@ma...> - 2009-10-28 10:26:36
Hi,

I have trouble changing the content of the HREF attribute of an A tag. I've tried both TagNode.setAttribute("href", ...) and TagNode.removeAttribute("href") followed by TagNode.setAttribute("href", ...), but the href value seems to remain unchanged. Any ideas?

Is the new value validated by HTML Parser somehow?

Thanks,
Magnus
From: Henry T. <hen...@ho...> - 2009-10-20 14:12:25
> From: htm...@li...
> Subject: Htmlparser-user Digest, Vol 37, Issue 6
> To: htm...@li...
> Date: Sat, 17 Oct 2009 03:39:23 +0000
>
> Today's Topics:
>
> 4. Get links (hel...@ya...)
> 5. Re: Get links (Derrick Oswald)
From: Derrick O. <der...@gm...> - 2009-10-17 03:39:22
You probably want HasAttributeFilter("href") or, probably better, NodeClassFilter(LinkTag.class). Then the nodes in the list are LinkTags, which have the method getLink().
From: <hel...@ya...> - 2009-10-16 15:55:58
Hello,

I use HTML Parser to retrieve links from web pages. I have done this:

    Parser parser = new Parser(url);
    NodeList body = parser.parse(new HasAttributeFilter("a"));

But it retrieves the anchor text and not the link. How can I get the links?
From: Derrick O. <der...@gm...> - 2009-10-09 04:48:06
The Page class maintains this type of information. When the source is exhausted it could record the cursor position; I don't think it does now. The PageIndex has the position of each end-of-line except possibly the last. I think that adding one for the end-of-file wouldn't hurt; then the size (in characters) would be that last cursor position. The size in bytes depends on the encoding, as mentioned here: http://htmlparser.sourceforge.net/faq.html#byte

On Fri, Oct 9, 2009 at 1:47 AM, Graham Bentley <gra...@wh...> wrote:
> Incidentally, I was passing the HTML to the parser as a string because I
> wanted to know the size of the page in bytes: I couldn't see how to get this
> from htmlparser when passing in an InputStream, and I can't rely on the
> Content-Type header as it is sometimes missing.
> How do I get the compressed and/or uncompressed size of the stream? Is
> this possible, or is it a feature that could be added?
> regards,
> Graham
>
> htm...@li... wrote: > > Send Htmlparser-user mailing list submissions to > > htm...@li... > > > > To subscribe or unsubscribe via the World Wide Web, visit > > https://lists.sourceforge.net/lists/listinfo/htmlparser-user > > or, via email, send a message with subject or body 'help' to > > htm...@li... > > > > You can reach the person managing the list at > > htm...@li... > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of Htmlparser-user digest..." > > > > > > Today's Topics: > > > > 1. [SPAM] Related game (Edmund Kuyzyoj) > > 2. No line numbers if using a string source for parser > > (Graham Bentley) > > 3. Re: No line numbers if using a string source for parser > > (Derrick Oswald) > > 4. [SPAM] No ads, only love (Adelina Qbunu) > > 5. [SPAM] Favorite Brand promotion!! (Evon Pribyl) > > 6.
>
> htm...@li... wrote:
> >
> > Message: 2
> > Date: Sun, 04 Oct 2009 04:04:26 +0100
> > From: Graham Bentley <gra...@wh...>
> > Subject: [Htmlparser-user] No line numbers if using a string source
> > for parser
> >
> > Hi there,
> >
> > I'm using the parser to extract links and am passing the HTML in as a
> > string source, i.e.:
> >
> >     Parser oParser = new Parser(new Lexer(sHTML));
> >
> > I've noticed that I can get the starting position of the extracted
> > nodes (LinkTag.getStartPosition), but I cannot get the starting line
> > number: LinkTag.getStartingLineNumber is always 0.
> >
> > It works fine and gives the line number if I pass in an input stream
> > from HttpURLConnection, or the URLConnection itself.
> > So I'm a bit confused: is this a bug, or is it not possible to get the
> > line numbers when using a string source?
> > I could really do with the line numbers if there is a correct way of
> > doing this, thanks.
> >
> > All working really well apart from that, thanks for the library.
> >
> > regards,
> > Graham
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Sun, 4 Oct 2009 07:12:21 +0200
> > From: Derrick Oswald <der...@gm...>
> > Subject: Re: [Htmlparser-user] No line numbers if using a string
> > source for parser
> >
> > Does your input text string contain the newline (0x0A) or carriage
> > return-newline (0x0D, 0x0A) end-of-line characters?
> >
> > ------------------------------
> >
> > End of Htmlparser-user Digest, Vol 37, Issue 3
>
> _______________________________________________
> Htmlparser-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-user
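Derrick's question in Message 3 points at the likely cause: line numbers are derived from the end-of-line characters in the input, so HTML held in a string with no 0x0A characters can only ever report line 0. One common way to lose them is building the string from `BufferedReader.readLine()` calls, which strip the line terminators. The sketch below shows the idea Derrick describes for PageIndex (record the offset of each end-of-line, then map a character offset to a line). It is a hypothetical `LineIndex` class, not htmlparser's PageIndex, and it assumes zero-based line numbers, as the always-0 result suggests.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/**
 * Hypothetical helper (not htmlparser's PageIndex): maps character offsets
 * to zero-based line numbers using the offsets of the '\n' characters.
 */
public class LineIndex {
    private final int[] eols; // ascending offsets of every '\n' in the text

    public LineIndex(CharSequence text) {
        // A "\r\n" pair is recorded once, at its '\n'; a lone '\r' is not a line end here.
        eols = IntStream.range(0, text.length())
                .filter(i -> text.charAt(i) == '\n')
                .toArray();
    }

    /** Returns the zero-based line that contains the character at offset. */
    public int lineOf(int offset) {
        int pos = Arrays.binarySearch(eols, offset);
        // pos >= 0: offset is itself a '\n', which belongs to the line it ends.
        // pos < 0: pos == -(insertion point) - 1, and the insertion point is the
        // number of line ends before offset, i.e. the line number.
        return pos >= 0 ? pos : -pos - 1;
    }

    public static void main(String[] args) {
        String oneLine = "<html><body><a href=\"x.html\">x</a></body></html>";
        String multiLine = "<html>\r\n<body>\n<a href=\"x.html\">x</a>\n</body></html>";
        // With no end-of-line characters, every offset maps to line 0.
        System.out.println(new LineIndex(oneLine).lineOf(oneLine.indexOf("<a ")));     // prints 0
        System.out.println(new LineIndex(multiLine).lineOf(multiLine.indexOf("<a "))); // prints 2
    }
}
```

If the string does keep its line terminators, the parser's own getStartingLineNumber should presumably work, which is what Derrick's question is checking. Pairing this index with LinkTag.getStartPosition assumes that position is a character offset into the same string.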