htmlparser-user Mailing List for HTML Parser (Page 18)
Brought to you by:
derrickoswald
From: abdulkadir v. <abd...@ya...> - 2008-12-18 10:05:33
|
Hello, is there a way to set the User-Agent property? |
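A minimal sketch of one way to approach this, using only java.net. The stack traces further down this page show the library reading pages through HttpURLConnection, so the header can be set on the connection before it is read. The class name UserAgentConnection and the agent string are made up for illustration. Handing the prepared connection to the parser (for example through a Parser constructor that accepts a URLConnection) is an assumption to check against the javadoc of the version in use.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper, not part of HTML Parser.
final class UserAgentConnection {

    // Creates a connection object with a custom User-Agent header.
    // openConnection() only builds the object; nothing is sent until
    // the connection is actually read.
    static URLConnection open(String url, String userAgent) {
        try {
            URLConnection connection = new URL(url).openConnection();
            connection.setRequestProperty("User-Agent", userAgent);
            return connection;
        } catch (IOException e) {
            // Malformed URL or unsupported protocol.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URLConnection c = open("http://example.com/", "MyCrawler/1.0");
        System.out.println(c.getRequestProperty("User-Agent"));
    }
}
```

The header must be set before the first read, because request properties cannot be changed once the connection is established.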
From: Guohui L. <lig...@no...> - 2008-11-14 10:08:47
|
Hi, I got HTML content from a database. It looks like the following:
<html> <body> http://<servername>:<port>/abc.html<br>\r\n </body> <html>
The content needs to be displayed in a web page, but <servername> and <port> are not displayed because "<" starts an HTML tag. So I have to change it to "&lt;". At the same time, I do not want to change the "<" in real HTML tags ("<html>", "<br>", etc.). In other words, I want to change the content to the following:
<html> <body> http://&lt;servername&gt;:&lt;port&gt;/abc.html<br><br> </body> <html>
Can htmlparser do this? If yes, please tell me how. Note: <servername> and <port> may be replaced by other names, such as <hostname>; they are not fixed. Thanks! Best Regards, Nicholas Li |
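The transformation described above can be sketched without the parser, assuming the set of real tag names is known in advance. This is a plain regular-expression approach, not an HTML Parser feature. The class name PlaceholderEscaper and the short KNOWN_TAGS list are illustrative assumptions, and a real list would need every tag the content may contain.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: escapes <name> placeholders that are not known HTML tags.
final class PlaceholderEscaper {

    // Illustrative subset of HTML tag names; extend as needed.
    private static final Set<String> KNOWN_TAGS = new HashSet<>(Arrays.asList(
            "html", "head", "body", "br", "p", "a", "b", "i",
            "div", "span", "table", "tr", "td"));

    // Group 1: optional slash, group 2: tag name, group 3: rest of the tag.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>");

    static String escapeUnknownTags(String html) {
        Matcher m = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            String replacement = KNOWN_TAGS.contains(name)
                    ? m.group()   // real tag: keep as-is
                    : "&lt;" + m.group(1) + m.group(2) + m.group(3) + "&gt;";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeUnknownTags(
                "<html><body>http://<servername>:<port>/abc.html<br></body></html>"));
    }
}
```

A whitelist is used because the placeholder names are not fixed, so anything that is not a recognized tag is treated as text.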
From: Anurag S. R. <anu...@ya...> - 2008-10-23 09:19:03
|
Hi, I am using HTML Parser to create a DOM of HTML documents. I used the following code: Parser parser = new Parser("http://www.yahoo.com/"); NodeList rootNodes = parser.parse(null); This code works fine and generates a DOM, returning all the root nodes in that NodeList. But while traversing the tree, I found that a lot of nodes have a flat structure. E.g., for an 'h1' node, it creates TEXT and /h1 as children of h1. But for a 'b' node, it creates TEXT and /b as siblings of the 'b' node instead of as children, as with 'h1'. I figured out that CompositeTag nodes are parsed using the CompositeTagScanner and are made into a tree-like hierarchy, while others are parsed using TagScanner. According to the definition of a composite tag, it is any tag with an ending tag. But this isn't working as expected. Can someone tell me how to make HTML Parser treat every node as a composite tag? I figured out that only nodes which extend CompositeTag, like HeadingTag, TableTag, etc., use CompositeTagScanner. Is there a way to force all nodes to be treated as composite? Or is there any other workaround that would make my tree structure consistent (not affected by whether it's an h1 or a b)? Regards, Anurag. |
From: Alexander H. <at...@gm...> - 2008-10-18 18:51:18
|
Hi, I need to find a way to search for a list of words in an HTML file. There are several ways to do it without the parser, but I also need the parser for other things. I didn't find a comfortable way to do this with the parser. Maybe there is one, but I didn't find it. Hopefully you can help me. Alex |
From: qike258258 <qik...@16...> - 2008-10-13 03:19:52
|
Hi all, I want to change an attribute value in the HTML. For example, given <image src=www.google.com />, I want to change src=www.google.com to src=www.test.com. How can I do that? |
From: Derrick O. <der...@ro...> - 2008-09-26 11:57:38
|
The translation adheres to the HTML specification http://www.w3.org/TR/REC-html40/sgml/entities.html If you want to avoid translating some characters, you will need to remove them from the Translate.mCharacterReferences list that was automatically generated and recompile. Derrick ----- Original Message ---- From: Tony Aldrich <ton...@gm...> To: htm...@li... Sent: Friday, September 26, 2008 5:58:49 AM Subject: [Htmlparser-user] How to encode only special symbols Good day, When I use Translate.encode() it encodes all non-latin symbols into &#..... codes. But how can I encode only special (about 250) symbols and other UTF-8 symbols left untouched? Thanks in advance. |
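An alternative to editing and recompiling the library is to encode with a small table of your own. The sketch below is not the Translate class. The name SelectiveEncoder and the handful of entries in ENTITIES are illustrative assumptions, and the full entity list linked above has roughly 250 entries. Characters missing from the table, including other non-Latin text, pass through unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: encodes only the characters listed in ENTITIES.
final class SelectiveEncoder {

    // Illustrative subset of the HTML 4 character entity references.
    private static final Map<Character, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put('&', "&amp;");
        ENTITIES.put('<', "&lt;");
        ENTITIES.put('>', "&gt;");
        ENTITIES.put('"', "&quot;");
        ENTITIES.put('\u00A0', "&nbsp;");
        ENTITIES.put('\u00A9', "&copy;");
    }

    static String encode(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String entity = ENTITIES.get(c);
            if (entity != null) {
                out.append(entity);   // listed character: use its entity
            } else {
                out.append(c);        // everything else stays as-is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("\u00A9 <b>\u041F\u0440\u0438\u0432\u0435\u0442</b> & co"));
    }
}
```

Because nothing outside the table is touched, the output is only safe to serve if the page is delivered in an encoding such as UTF-8 that can represent the remaining characters.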
From: Tony A. <ton...@gm...> - 2008-09-26 09:59:05
|
Good day, When I use Translate.encode() it encodes all non-Latin symbols into &#..... codes. How can I encode only the special symbols (about 250) and leave the other UTF-8 symbols untouched? Thanks in advance. |
From: Derrick O. <der...@ro...> - 2008-08-30 20:58:40
|
Try looking at the code for the FilterBuilder application: trunk\filterbuilder\src\main\java\org\htmlparser\parserapplications\filterbuilder See the tree view it creates in the upper right hand corner of the screen shot: http://sourceforge.net/project/screenshots.php?group_id=24399 ----- Original Message ---- From: asha raja <ash...@ya...> To: htm...@li... Sent: Saturday, August 30, 2008 10:38:58 AM Subject: [Htmlparser-user] PLS PLS help HI i need to separate the tags and content from the html document and generate a tree in java.. hw can i do ths?? pls help!! i will be really pleased if u can tell me tat! thanks in advance! regards asha |
From: asha r. <ash...@ya...> - 2008-08-30 14:39:10
|
Hi, I need to separate the tags and content of an HTML document and generate a tree in Java. How can I do this? Please help! I will be really pleased if you can tell me that. Thanks in advance! Regards, asha |
From: tanvir a. <tan...@gm...> - 2008-08-25 18:29:38
|
Hello Asha, please study the HTML tutorial and you will learn. Then I will send you some code segments. Tanvir, Bangladesh University of Engineering and Technology On Tue, Aug 26, 2008 at 12:05 AM, asha raja <ash...@ya...> wrote: > hello friends, > i am new to ths html parser. i really dunno where to start and how to use > it. if u can help me i will be really pleased! > > Thanks in advance > > regards > asha -- Tanvir Ahmed |
From: asha r. <ash...@ya...> - 2008-08-25 18:06:06
|
Hello friends, I am new to this HTML parser. I really don't know where to start or how to use it. If you can help me I will be really pleased! Thanks in advance. Regards, asha |
From: Ian M. <ia...@ia...> - 2008-08-13 13:56:04
|
Try using the Apache http client - http://hc.apache.org/httpclient-3.x/ On Fri, Jul 25, 2008 at 3:52 AM, Derrick Oswald <der...@ro...> wrote: > The exception says it all : Server redirected too many times (20) > There are too many redirections. > This is set by the JVM I think, |
From: radhouane b. <bra...@ya...> - 2008-07-30 05:58:27
|
To specify a local URL, you can use: file://C:/myfolder/myfile.html Best Regards, Radhouane. ----- Original Message ---- From: Gabriel Murray <gab...@gm...> To: htm...@li... Sent: Wednesday, 30 July 2008, 1:22:23 Subject: [Htmlparser-user] using stringextractor with saved webpage I am trying to extract text from webpages using stringextractor. It works fine if I supply a URL as an argument, e.g. bin/stringextractor http://www.google.com, but if I supply the filename of a locally saved webpage, I get the following error: org.htmlparser.util.ParserException: Connection refused Will this work with saved webpages? Cheers, Gabe |
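If the exact spelling of the file URL is in doubt, it can be generated rather than typed. The sketch below uses only java.nio. The class name LocalFileUrl and the file name saved-page.html are made up for illustration. Java produces the three-slash form (file:///...), which differs slightly from the spelling given above, and whether stringextractor accepts both forms is not confirmed here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: turns a local path into a file: URL string.
final class LocalFileUrl {

    static String toFileUrl(String path) {
        // Resolve relative paths against the working directory first,
        // so the resulting URL is always absolute.
        Path absolute = Paths.get(path).toAbsolutePath().normalize();
        return absolute.toUri().toString();
    }

    public static void main(String[] args) {
        // Pass the printed value to the tool in place of an http:// URL.
        System.out.println(toFileUrl("saved-page.html"));
    }
}
```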
From: tanvir a. <tan...@gm...> - 2008-07-30 00:55:16
|
Can you send me your code segment? Tanvir, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh On Wed, Jul 30, 2008 at 6:02 AM, Thomas Haines <tho...@re...> wrote: > Could you post the local filename you are trying to extract? > > On 30/07/2008, at 7:22 AM, Gabriel Murray wrote: > > > I am trying to extract text from webpages using stringextractor. It > > works fine if I supply a URL as an argument, e.g. bin/stringextractor > > http://www.google.com, but if I supply the filename of a locally saved > > webpage, I get the following error: > > > > org.htmlparser.util.ParserException: Connection refused > > > > Will this work with saved webpages? > > > > Cheers, > > Gabe |
From: Thomas H. <tho...@re...> - 2008-07-30 00:02:29
|
Could you post the local filename you are trying to extract? On 30/07/2008, at 7:22 AM, Gabriel Murray wrote: > I am trying to extract text from webpages using stringextractor. It > works fine if I supply a URL as an argument, e.g. bin/stringextractor > http://www.google.com, but if I supply the filename of a locally saved > webpage, I get the following error: > > org.htmlparser.util.ParserException: Connection refused > > Will this work with saved webpages? > > Cheers, > Gabe |
From: Gabriel M. <gab...@gm...> - 2008-07-29 23:22:26
|
I am trying to extract text from webpages using stringextractor. It works fine if I supply a URL as an argument, e.g. bin/stringextractor http://www.google.com, but if I supply the filename of a locally saved webpage, I get the following error: org.htmlparser.util.ParserException: Connection refused Will this work with saved webpages? Cheers, Gabe |
From: calvin He <cra...@ya...> - 2008-07-28 05:51:32
|
Can Htmlparser do XPath lookups? Thanks a lot! |
From: Derrick O. <der...@ro...> - 2008-07-25 02:53:05
|
The exception says it all : Server redirected too many times (20) There are too many redirections. This is set by the JVM I think, |
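The limit of 20 matches the default of the JVM networking property http.maxRedirects. A page that loads in a browser but loops in plain Java is often redirecting until a session cookie is accepted, so installing a cookie handler is worth trying before raising the limit. That diagnosis is an assumption about this particular server, not something the thread confirms. The sketch below uses only the JDK, and the class name RedirectSetup is made up for illustration.

```java
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;

// Hypothetical helper: call once at startup, before any HTTP connection is made.
final class RedirectSetup {

    static void configure(int maxRedirects) {
        // JVM-wide limit on how many redirects HttpURLConnection follows.
        System.setProperty("http.maxRedirects", Integer.toString(maxRedirects));

        // Keep cookies between requests so cookie-based redirects can settle.
        if (CookieHandler.getDefault() == null) {
            CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
        }
    }

    public static void main(String[] args) {
        configure(20);
        System.out.println(System.getProperty("http.maxRedirects"));
    }
}
```

Raising the number alone will not help against a genuine redirect loop, so the cookie handler is the part more likely to matter.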
From: Sisilla S. <sis...@gm...> - 2008-07-24 12:15:56
|
Hello All,

I am using the following program to generate HTML from a JSP URL:

    package Common;

    import org.htmlparser.Parser;
    import org.htmlparser.util.NodeList;
    import org.htmlparser.util.ParserException;

    public class GetHTML {
        public String JSPToHTML(String url) {
            String html = "";

            System.out.println("Generating HTML String from " + url + "...");
            try {
                Parser parser = new Parser (url);
                NodeList list = parser.parse (null);
                html = list.toHtml ();
                System.out.println("HTML successfully generated!");
            } catch (ParserException pe) {
                pe.printStackTrace ();
            }

            return html;
        }
    }

This works fine for every URL I've passed to it save one. Here is a snippet of my server output:

org.htmlparser.util.ParserException: Exception getting input stream from http://ptt0013:8084/ETMApp/Sales/JSVRHTMLContent.jsp?pttjsvrid=62&class=FormTable&pttqrid=9&changestring=updated&oldaddress1=Insert%20Address%201%20Here&oldaddress2=&oldcity=Insert%20City%20Here&oldcountry=Insert%20Country%20Here&olddate=2008-07-25&oldtime=&olddetails=&oldvpttemployeeid=2&email=email (Server redirected too many times (20)).;
java.net.ProtocolException: Server redirected too many times (20)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1298)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1292)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:948)
    at org.htmlparser.lexer.Page.setConnection(Page.java:571)
    at org.htmlparser.lexer.Page.<init>(Page.java:134)
    at org.htmlparser.lexer.Lexer.<init>(Lexer.java:186)
    at org.htmlparser.Parser.setResource(Parser.java:398)
    at org.htmlparser.Parser.<init>(Parser.java:317)
    at org.htmlparser.Parser.<init>(Parser.java:331)
    at Common.GetHTML.JSPToHTML(GetHTML.java:15)

Everything works fine if I copy and paste the URL into my browser. What might I be missing here? I appreciate any effort to help me. Thank you for your time and consideration.

Sincerely,
Sisilla Sookdeo |
From: Fatih M. U. <fm...@gm...> - 2008-07-02 12:33:22
|
You should use the Parser class from the org.htmlparser package: org.htmlparser.Parser See the javadocs: http://htmlparser.sourceforge.net/javadoc/index.html fmu _____ From: htm...@li... [mailto:htm...@li...] On Behalf Of tanvir ahmed Sent: Wednesday, July 02, 2008 5:24 AM To: htm...@li... Subject: [Htmlparser-user] query Parser p = new Parser("http://finance.yahoo.com/marketupdate?u"); when i write this code , it generates the error: "The constructor Parser(String) is undefined" can anyone solve the problem? i included the following line: import com.sun.org.apache.xalan.internal.xsltc.compiler.Parser; |
From: tanvir a. <tan...@gm...> - 2008-07-02 12:23:42
|
Parser p = new Parser("http://finance.yahoo.com/marketupdate?u"); When I write this code, it generates the error: "The constructor Parser(String) is undefined". Can anyone solve the problem? I included the following line: import com.sun.org.apache.xalan.internal.xsltc.compiler.Parser; |
From: answers s. <fas...@gm...> - 2008-06-25 11:51:31
|
Hi, I want to extract headlines from a web page, something like RSS. Is that possible with htmlparser? Thanks in advance On 6/25/08, Derrick Oswald <der...@ro...> wrote: > > If by headlines you mean headings (H1, H2 etc.) then yes, you should be > able to create a NodeClassFilter looking for HeadingTag objects. > If I remember correctly how it is used... > NodeList list = parser.parse (new NodeClassFilter (HeadingTag.class)); > > ----- Original Message ---- > From: answers solutions <fas...@gm...> > To: Htm...@li... > Sent: Wednesday, June 25, 2008 5:58:20 AM > Subject: [Htmlparser-user] how to extract headlines using htmlparser > > hi > > I am presently using htmlparser to extract all the anchor tags in webpage > . > > > but i want to extract only the headlines in webpage . is there any way i > can identify the headlines in a webapge and extract them with the help of > parser. > > > thanks in advance |
From: Derrick O. <der...@ro...> - 2008-06-25 11:05:10
|
If by headlines you mean headings (H1, H2 etc.) then yes, you should be able to create a NodeClassFilter looking for HeadingTag objects. If I remember correctly how it is used... NodeList list = parser.parse (new NodeClassFilter (HeadingTag.class)); ----- Original Message ---- From: answers solutions <fas...@gm...> To: Htm...@li... Sent: Wednesday, June 25, 2008 5:58:20 AM Subject: [Htmlparser-user] how to extract headlines using htmlparser hi I am presently using htmlparser to extract all the anchor tags in webpage . but i want to extract only the headlines in webpage . is there any way i can identify the headlines in a webapge and extract them with the help of parser. thanks in advance |
From: answers s. <fas...@gm...> - 2008-06-25 09:58:23
|
Hi, I am presently using htmlparser to extract all the anchor tags in a web page, but I want to extract only the headlines. Is there any way I can identify the headlines in a web page and extract them with the help of the parser? Thanks in advance |
From: Henry T. <htr...@ya...> - 2008-06-16 11:07:47
|
Hi All,

I am having difficulty parsing the following table using htmlparser table data filter statements:

<table border="0" cellpadding="0" cellspacing="0" width="782" id="main-content">
 <tr>
  <td valign="top" class="top">
   <table border="0" cellpadding="0" cellspacing="0">
    <tr>
     <td valign="top" class="top">
      <!-- un-delay results 14/10/2004 .................................. --->
      <div class="greyBorder">
       <table border="0" cellspacing="0" cellpadding="2" width="100%">
        <tr>
         <td class="propType"> </td>
         <td class="propType"><b>Patient</b></td>
         <td class="propType"><b>Firstname</b></td>
         <td class="propType"><b>Surname</b></td>
         <td class="propType" align="right"><b>Date of birth</b></td>
         <td class="propType">Sex</td>
        </tr>
        <tr class="smallnarrow">
         <td class="even" width="10" align="left"></td>
         <td class="even" style="vertical-align: middle;">Clinic</td>
         <td class="even" style="vertical-align: middle;">John</td>
         <td class="even" style="vertical-align: middle;">Smith</td>
         <td class="even" align="right" style="vertical-align: middle;">10/02/1940</td>
         <td class="even" width="10" style="vertical-align: middle;">M</td>
        </tr>
       </table>
      </div>
      <div style="margin-top:10px;">
       <br> <br> <br>
      </div>
      <div align="center" style="margin-bottom: 20px;">
      .........
</td></tr></table></td></tr></table>

The table data filter statements below pick up every line shown above, which is more than I wanted:

(1) new AndFilter ( new TagNameFilter ("table"),
(2) new AndFilter ( new HasAttributeFilter ("border","0"),
(3) new AndFilter ( new HasAttributeFilter ("cellspacing","0"),
(4) new AndFilter ( new HasAttributeFilter ("cellpadding"),
(5) new AndFilter ( new HasAttributeFilter ("width","782"),
(6) new AndFilter ( new HasAttributeFilter ("id","main-content"),
(7) new HasChildFilter ( new AndFilter ( new TagNameFilter ("tr"),
(8) new HasChildFilter ( new AndFilter ( new TagNameFilter ("td"),
(9) new HasChildFilter ( new AndFilter ( new TagNameFilter ("table"),
(10) new HasChildFilter ( new AndFilter ( new TagNameFilter ("tr"),
(11) new HasChildFilter ( new TagNameFilter ("td"),true)),true)),true)),true)),true)))))));

However, I would like to narrow down the parsing by extracting only the Patient table data shown in bold above. The additional parsing statements below have not been successful:

(1) new AndFilter ( new TagNameFilter ("table"),
(2) new AndFilter ( new HasAttributeFilter ("border","0"),
(3) new AndFilter ( new HasAttributeFilter ("cellspacing","0"),
(4) new AndFilter ( new HasAttributeFilter ("cellpadding"),
(5) new AndFilter ( new HasAttributeFilter ("width","782"),
(6) new AndFilter ( new HasAttributeFilter ("id","main-content"),
(7) new HasChildFilter ( new AndFilter ( new TagNameFilter ("tr"),
(8) new HasChildFilter ( new AndFilter ( new TagNameFilter ("td"),
(9) new HasChildFilter ( new AndFilter ( new TagNameFilter ("table"),
(10) new HasChildFilter ( new AndFilter ( new TagNameFilter ("tr"),
(11) new HasChildFilter ( new AndFilter ( new TagNameFilter ("td"),
(12) new HasChildFilter ( new AndFilter ( new TagNameFilter ("div"),
(13) new HasAttributeFilter ("class","greyBorder")),true)),true)),true)),true)),true)),true)))))));

Lines 12-13 search for the <div> with attribute class=greyBorder, but the filter did not pick up the Patient table at all. Any idea where the last parsing statement went wrong? It appears that htmlparser does not treat <div> as a nested tag around the Patient table.

Many thanks,
Henry |