htmlparser-developer Mailing List for HTML Parser (Page 2)
Brought to you by: derrickoswald
Messages per month:

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2001 | | | | | | | | | | 4 | 1 | 4 |
| 2002 | 12 | | 7 | 27 | 14 | 16 | 27 | 74 | 1 | 23 | 12 | 119 |
| 2003 | 31 | 23 | 28 | 59 | 119 | 10 | 3 | 17 | 8 | 38 | 6 | 1 |
| 2004 | 4 | 4 | 1 | 2 | | 7 | 6 | 1 | | | | |
| 2005 | | 1 | | 8 | | | | 2 | 10 | 4 | 15 | |
| 2006 | | 1 | | 4 | 11 | | | | 2 | | | |
| 2007 | 3 | 2 | | 2 | | | 1 | | | | | |
| 2008 | | 1 | | | | | | | 5 | 1 | | |
| 2009 | | 1 | | 2 | | 4 | | 1 | | | | 2 |
| 2010 | 1 | | | 8 | | | | | 6 | | 1 | |
| 2011 | | | | | 3 | | | | | | | |
| 2012 | | | | | 1 | | | | | | | |
| 2014 | | | | | 1 | | | | | | | |
| 2015 | | | | 1 | | 1 | | | | | 2 | 1 |
| 2016 | | | | | | | 2 | | | | 2 | 2 |

From: Derrick O. <der...@gm...> - 2009-12-16 05:33:46

Nav bars are usually identified by a DIV tag with a special id, which you could filter on with a TagNameFilter *and* a HasAttributeFilter. A simplistic approach would just delete the node from its parent, which may work in your case.

On Tue, Dec 15, 2009 at 10:56 PM, Ted Yu <yuz...@gm...> wrote:
> Hi,
> Is it possible to remove the navigation bar of a web page (common on web portals)
> from the parse output?
>
> The motivation is to provide more focused content for page categorization.
>
> Thanks

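Derrick's suggestion comes down to two steps. First, select the nodes that match a tag name *and* an attribute. Second, detach each match from its parent. The sketch below shows that logic using only the JDK's DOM API on well-formed XHTML. It is not HTML Parser's API: there, the matching would be done by TagNameFilter and HasAttributeFilter, and the filter package also has an AndFilter for combining two filters. Real-world pages that are not well-formed XML would need an HTML-aware parser instead. The id value `nav` is a hypothetical example, since each portal uses its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NavBarStripper {
    /**
     * Removes every DIV whose id attribute equals navId and returns the remaining text.
     * Sketch only: it uses the JDK DOM parser, so the input must be well-formed XHTML.
     */
    static String textWithoutNav(String xhtml, String navId) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        // Condition 1: the tag name is DIV (the role a TagNameFilter plays).
        NodeList divs = doc.getElementsByTagName("div");
        // The list is live, so walk it backwards: removing item i leaves items 0..i-1 in place.
        for (int i = divs.getLength() - 1; i >= 0; i--) {
            Element div = (Element) divs.item(i);
            // Condition 2: the id attribute matches (the role a HasAttributeFilter plays).
            if (navId.equals(div.getAttribute("id"))) {
                // The "simplistic approach": detach the node from its parent.
                div.getParentNode().removeChild(div);
            }
        }
        return doc.getDocumentElement().getTextContent().trim();
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<div id=\"nav\">Home | Products | About</div>"
                + "<p>Article text for categorization.</p>"
                + "</body></html>";
        System.out.println(textWithoutNav(page, "nav")); // Article text for categorization.
    }
}
```

Matching on the id alone is a deliberate simplification. Portals that mark their navigation with a class attribute instead would need a different attribute test.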
From: Ted Yu <yuz...@gm...> - 2009-12-15 21:56:26

Hi,

Is it possible to remove the navigation bar of a web page (common on web portals) from the parse output?

The motivation is to provide more focused content for page categorization.

Thanks

From: Rod H. <hil...@re...> - 2009-08-28 20:32:45

Hello,

I'll make this short and sweet.

1) I'm a graduate student.
2) I'm doing research on development practices in the open source community.
3) I would like to use HTML Parser as part of my research.

I'd be eternally grateful if everyone who has ever contributed code to this project could fill out my survey. It's three questions long, and you can do the whole thing with just the mouse. Please take a few seconds and help me complete my research; I'd really appreciate it. The survey is here:
http://spreadsheets.google.com/viewform?formkey=cHU2aHo5bS14cE04c2gzWGlhaUpQSHc6MA

Thank you,
Rod Hilton

P.S. If you are involved with multiple open source projects and are seeing this message on other mailing lists, first let me apologize for being so annoying, but second let me ask you to please fill out the survey again, once for each project. Thanks!

From: Derrick O. <der...@gm...> - 2009-06-30 17:14:20

An explanation of how to use POST is available on the FAQ page:
http://htmlparser.sourceforge.net/faq.html#post

2009/6/30 Marco Yeung <yeu...@ho...>:
> If there is a login form (POST), how can we use htmlParser to log in and get
> the cookie before passing it to the parser?

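The FAQ entry is the reference answer for HTML Parser itself. As a library-independent alternative, the sketch below logs in through a form using only the JDK (Java 10 or later):

1. It installs a `java.net.CookieManager` as the default cookie handler.
2. It POSTs the form fields.
3. It fetches the protected page, whose HTML could then be handed to the parser (for example with `parser.setInputHTML(...)`).

The URLs and the form field names are hypothetical placeholders. Read the real ones from the login form's `action` attribute and its `<input name=...>` elements.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class FormLogin {
    /** Encodes fields as application/x-www-form-urlencoded, the default encoding of an HTML form POST. */
    static String formEncode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    /** POSTs the login form, then GETs a page that needs the session cookie, and returns its HTML. */
    static String loginAndFetch(String loginUrl, Map<String, String> fields, String pageUrl)
            throws IOException {
        // Process-wide cookie store: HttpURLConnection records Set-Cookie headers here
        // and sends matching cookies with later requests.
        if (CookieHandler.getDefault() == null) {
            CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
        }
        HttpURLConnection post = (HttpURLConnection) URI.create(loginUrl).toURL().openConnection();
        post.setRequestMethod("POST");
        post.setDoOutput(true);
        post.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = post.getOutputStream()) {
            out.write(formEncode(fields).getBytes(StandardCharsets.UTF_8));
        }
        int status = post.getResponseCode(); // sends the request; cookies are captured here
        post.disconnect();
        if (status >= 400) {
            throw new IOException("login failed with HTTP " + status);
        }
        HttpURLConnection get = (HttpURLConnection) URI.create(pageUrl).toURL().openConnection();
        try (InputStream in = get.getInputStream()) {
            ByteArrayOutputStream html = new ByteArrayOutputStream();
            in.transferTo(html);
            // Assumes UTF-8; a robust client would use the charset from the Content-Type header.
            return html.toString(StandardCharsets.UTF_8);
        } finally {
            get.disconnect();
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "marco");      // hypothetical field names
        fields.put("password", "p&ss word");
        System.out.println(formEncode(fields)); // username=marco&password=p%26ss+word
        // loginAndFetch("https://example.com/login", fields, "https://example.com/members")
        // would perform the real requests; it is not called here.
    }
}
```

The sketch installs the cookie handler only when none is set yet. This avoids replacing a cookie store the application may already rely on.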
From: Marco Y. <yeu...@ho...> - 2009-06-30 16:54:56

If there is a login form (POST), how can we use htmlParser to log in and get the cookie before passing it to the parser?

> Date: Sun, 28 Jun 2009 19:24:58 +0200
> From: der...@gm...
> To: htm...@li...
> Subject: Re: [Htmlparser-developer] how to login website with cookie
>
> Try looking at the thread "Unable to enable cookies" in the Help forum:
> https://sourceforge.net/forum/message.php?msg_id=5852557
>
> Basically:
>     parser.getConnectionManager().setRedirectionProcessingEnabled(true);
>     parser.getConnectionManager().setCookieProcessingEnabled(true);
>
> Then you need to set the cookie before fetching the page:
>     parser.getConnectionManager().setCookie (Cookie cookie, String domain)
>
> where you have already made a cookie (see org.htmlparser.http.Cookie)
> and domain is something like "google.com".

From: Derrick O. <der...@gm...> - 2009-06-28 17:25:02

Try looking at the thread "Unable to enable cookies" in the Help forum:
https://sourceforge.net/forum/message.php?msg_id=5852557

Basically:

    parser.getConnectionManager().setRedirectionProcessingEnabled(true);
    parser.getConnectionManager().setCookieProcessingEnabled(true);

Then you need to set the cookie before fetching the page:

    parser.getConnectionManager().setCookie (Cookie cookie, String domain)

where you have already made a cookie (see org.htmlparser.http.Cookie) and domain is something like "google.com".

2009/6/28 Marco Yeung <yeu...@ho...>:
> Hi,
>
> I want to use HTMLParser to parse the content of a webpage that requires login
> and maintains cookies.
> How can I use htmlparser to do that?
>
> Rdgs
> Marco

From: Marco Y. <yeu...@ho...> - 2009-06-28 10:48:12

Hi,

I want to use HTMLParser to parse the content of a webpage that requires login and maintains cookies. How can I use htmlparser to do that?

Rdgs
Marco

From: Matt K. <mat...@gm...> - 2009-04-08 14:30:56

I just ask because when I opened it up to 10 or so threads I was getting intermittent exceptions... It could very well be my code, though; I haven't had time to debug.

On 4/8/09, Ian Macfarlane <ia...@ia...> wrote:
> I can't give a 100% confident answer, but I've used it on a
> many-threaded application in the past with no problems. Can't be sure
> that the parts you'll exercise will be thread safe though.
>
> Ian

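Nobody in this thread could confirm that the library is thread safe. A conservative pattern, sketched below under that assumption, is never to share a parser instance between threads, so that each task creates its own. `ToyParser` is a hypothetical stand-in for a parser with mutable per-instance state; it is not HTML Parser's `Parser` class. Note that confining instances to threads does not protect any static state a library may share between instances.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerTaskParsing {
    /** Hypothetical stand-in for a parser whose instance state would race if shared. */
    static final class ToyParser {
        private int count; // mutable field: unsafe if two threads used the same instance

        int countTagOpeners(String html) {
            count = 0;
            for (int i = 0; i < html.length(); i++) {
                if (html.charAt(i) == '<') {
                    count++;
                }
            }
            return count;
        }
    }

    /** Processes pages on a thread pool, giving every task its own ToyParser. */
    static List<Integer> countAll(List<String> pages, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String page : pages) {
                // A fresh instance per task: it never escapes the worker thread running the task.
                Callable<Integer> task = () -> new ToyParser().countTagOpeners(page);
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get()); // results come back in input order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList("<p>a</p>", "<br>", "plain text");
        System.out.println(countAll(pages, 10)); // [2, 1, 0]
    }
}
```

Creating one parser per task costs a little allocation, but it removes the need to reason about which internal fields a shared instance might touch concurrently.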
From: Ian M. <ia...@ia...> - 2009-04-08 07:54:01

I can't give a 100% confident answer, but I've used it on a many-threaded application in the past with no problems. Can't be sure that the parts you'll exercise will be thread safe though.

Ian

2009/2/25 Matt Kirkley <mat...@gm...>:
> Hi, I was wondering if the html-parser libraries are thread safe?

From: Matt K. <mat...@gm...> - 2009-02-25 14:47:27

Hi, I was wondering if the html-parser libraries are thread safe?

From: Leos L. <lit...@ce...> - 2008-10-07 19:30:23

Hi,

I use htmlparser 2 and it is a great tool. I use it to protect the text entered by users on my website www.abclinuxu.cz (allowed tags, attributes, basic XSS) and for simple transformations. I plan to unify these use cases in the HtmlPurifier library. It is a pity that it is not actively developed, because many users would be wary of starting to use it; when I am looking for a new library, this is my standard policy.

Well, my question is: why is the TABLE tag not in the Hashtable breakTags in TagNode.java? It is a block element. I also miss the TBODY, THEAD and TFOOT tags there.

Leos

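The request amounts to adding entries to a lookup table of tag names that should break the text flow. The sketch below is a hypothetical illustration of such a table as a case-insensitive `Set`. The tag list is an assumption chosen for illustration; it does not reproduce the actual `breakTags` Hashtable in TagNode.java.

```java
import java.util.Locale;
import java.util.Set;

class BreakTags {
    // Illustrative block-level tag names (an assumption, not TagNode.java's actual list),
    // including the table-related tags requested in the message.
    private static final Set<String> BREAK_TAGS = Set.of(
            "P", "DIV", "PRE", "BLOCKQUOTE", "UL", "OL", "LI", "HR",
            "H1", "H2", "H3", "H4", "H5", "H6",
            "TABLE", "TBODY", "THEAD", "TFOOT");

    /** HTML tag names are case-insensitive, so normalize before the lookup. */
    static boolean isBreakTag(String tagName) {
        return BREAK_TAGS.contains(tagName.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isBreakTag("table")); // true
        System.out.println(isBreakTag("span"));  // false
    }
}
```

A `Set` expresses pure membership more directly than a Hashtable keyed on tag names. Upper-casing with `Locale.ROOT` keeps the lookup independent of the default locale.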
From: Ian M. <ia...@ia...> - 2008-09-09 08:28:20

Ah yes!

By the way, we need to update the link on the site homepage (http://htmlparser.sourceforge.net/), as it currently links to http://svn.sourceforge.net/viewvc/viewcvs.cgi/htmlparser/, which doesn't work.

Regards

Ian

2008/9/9 Derrick Oswald <der...@ro...>:
> How about the SVN Browse...
> http://htmlparser.svn.sourceforge.net/viewvc/htmlparser/
>
> The archive you mention was a mailing list that issued an email whenever a
> file changed. Is that what you want?
> I don't know how to do that... yet.

From: Derrick O. <der...@ro...> - 2008-09-08 23:57:28

How about the SVN Browse...
http://htmlparser.svn.sourceforge.net/viewvc/htmlparser/

The archive you mention was a mailing list that issued an email whenever a file changed. Is that what you want? I don't know how to do that... yet.

> ----- Original Message ----
> From: Ian Macfarlane <ia...@ia...>
> To: The developer mailing list of the htmlparser project <htm...@li...>
> Sent: Monday, September 8, 2008 11:57:51 AM
> Subject: Re: [Htmlparser-developer] Working with HTMLParser code in Eclipse
>
> Hi Derrick,
>
> Thanks - this is the approach I eventually worked out I needed to take, and
> I've successfully managed to set it up. I've committed a new tag, a bug fix,
> and also fixed two tests which were failing but shouldn't have been.
>
> Just a thought: when we were using CVS we had a web interface at
> http://sourceforge.net/mailarchive/forum.php?forum_name=htmlparser-cvs.
> Is it possible to set up a similar one for SVN?
>
> Ian

From: Ian M. <ia...@ia...> - 2008-09-08 15:57:59

Hi Derrick,

Thanks - this is the approach I eventually worked out I needed to take, and I've successfully managed to set it up. I've committed a new tag, a bug fix, and also fixed two tests which were failing but shouldn't have been.

Just a thought: when we were using CVS we had a web interface at http://sourceforge.net/mailarchive/forum.php?forum_name=htmlparser-cvs. Is it possible to set up a similar one for SVN?

Ian

2008/9/5 Derrick Oswald <der...@ro...>:
> I'm not sure about the exact steps in Eclipse, but the two projects under
> trunk\lexer and trunk\parser should be all that is required.
> These build the two jar files that the other applications use.
> You should be able to set up two projects: htmlparser.jar depends on
> htmllexer.jar.

From: Derrick O. <der...@ro...> - 2008-09-05 10:42:23

I'm not sure about the exact steps in Eclipse, but the two projects under trunk\lexer and trunk\parser should be all that is required. These build the two jar files that the other applications use. You should be able to set up two projects: htmlparser.jar depends on htmllexer.jar.

> ----- Original Message ----
> From: Ian Macfarlane <ia...@ia...>
> To: htm...@li...
> Sent: Thursday, September 4, 2008 10:26:54 AM
> Subject: [Htmlparser-developer] Working with HTMLParser code in Eclipse
>
> Can anyone tell me how to set up the HTMLParser code base in Eclipse?
> When it was in CVS, it was all within a single directory, but since it
> moved to SVN it's all in different directories, so I'm not quite sure
> how to set up HTMLParser for this.
>
> Just to clarify: this is the HTMLParser code itself, not code using
> HTMLParser.
>
> Thanks
>
> Ian Macfarlane

From: Ian M. <ia...@ia...> - 2008-09-04 14:26:59

Can anyone tell me how to set up the HTMLParser code base in Eclipse? When it was in CVS, it was all within a single directory, but since it moved to SVN it's all in different directories, so I'm not quite sure how to set up HTMLParser for this.

Just to clarify: this is the HTMLParser code itself, not code using HTMLParser.

Thanks

Ian Macfarlane

From: eugene k. <ku...@ya...> - 2008-02-01 19:44:26

I have noticed that a <div> tag inside a LinkTag is treated as an end tag (ender). From what I read in the specs, <div> is allowed inside, as in <a> <div> </div></a>.

For example, the page from www.google.com places a <div> as above.

Greetings
Eugene

From: Nie H. <ked...@16...> - 2007-07-18 03:37:17

----- Original Message -----
From: "Ian Macfarlane" <ia...@ia...>
To: <htm...@li...>
Sent: Saturday, April 21, 2007 12:48 AM
Subject: [Htmlparser-developer] Incorrect encoding of &#8230; on Linux systems

From: Arjohn K. <arj...@ad...> - 2007-04-21 05:16:22

Ian Macfarlane wrote:
> I have encountered an interesting issue with the encoding of the
> character &#8230;
>
> On Windows, the character is correctly encoded to the three dot
> character. However, on a Linux system it gets encoded to (this might
> not come out right in the email) this: â?¦
>
> According to the W3C doc referenced in the comments -
> http://www.w3.org/TR/REC-html40/sgml/entities.html - which says:
>
> <!ENTITY hellip CDATA "&#8230;" -- horizontal ellipsis = three dot leader,
>                                    U+2026 ISOpub -->
>
> both &hellip; and &#8230; should be encoded to this ellipsis
> character. However, this is not the case.
>
> You can get a minimal testcase of this error by doing just a
> StringBean on the entity solely:
>
>     Parser parser = new Parser();
>     parser.setInputHTML("&#8230;");
>     StringBean sb = new StringBean();
>     parser.visitAllNodesWith(sb);
>     System.out.println(sb.getStrings());
>
> I've worked out that it's located in Translate.decode(..), casting int
> to char. This testcase shows that it doesn't work:
>
>     int num = 8230;
>     char c = (char) num;
>     System.out.println(c);
>
> It would seem that it's something to do with the casting of the int to
> the char that must be platform dependent, and in this case it would
> seem incorrect when run on my Linux box.

It's highly unlikely that a simple type cast is platform dependent. A more likely cause is a character encoding issue: either the font set that you use on your Linux installation can't render the character correctly, or the character is encoded using the wrong character set. Note that System.out.println uses the platform's default character encoding, which may not support this character.

You could instead try to write the character to a file using an OutputStreamWriter with UTF-8 as the encoding. Then try to open this file in a browser and make sure that it renders the file as UTF-8.

-- 
Arjohn Kampman, Senior Software Engineer
Aduna - Guided Exploration
www.aduna-software.com

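Arjohn's diagnosis can be checked with the JDK alone. The sketch below demonstrates three things:

- The cast yields U+2026 (HORIZONTAL ELLIPSIS) on any JVM.
- An explicit UTF-8 OutputStreamWriter produces the bytes E2 80 A6.
- A charset that lacks the character (US-ASCII here, as an example) substitutes `?`.

Reading the UTF-8 bytes back as ISO-8859-1 gives three characters: `â`, an invisible control character (U+0080), and `¦`. That is consistent with the "â?¦" Ian saw, although it is an assumption that this is exactly what happened on his Linux box.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class EllipsisEncoding {
    /** The cast is plain arithmetic on UTF-16 code units; it behaves the same on every JVM. */
    static char fromCodePoint(int num) {
        return (char) num; // 8230 == 0x2026, HORIZONTAL ELLIPSIS
    }

    /** Writes text through an OutputStreamWriter with an explicit UTF-8 encoding. */
    static byte[] toUtf8(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            out.write(text); // closing the writer flushes the encoded bytes
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    /** Formats bytes as space-separated upper-case hex, for example "E2 80 A6". */
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char c = fromCodePoint(8230);
        System.out.println(c == '\u2026'); // true
        byte[] utf8 = toUtf8(String.valueOf(c));
        System.out.println(hex(utf8)); // E2 80 A6
        // A charset that lacks the character replaces it with '?' (0x3F).
        System.out.println(hex(String.valueOf(c).getBytes(StandardCharsets.US_ASCII))); // 3F
        // UTF-8 bytes misread as ISO-8859-1 become U+00E2, U+0080 and U+00A6.
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(misread.length()); // 3
    }
}
```

Swapping the ByteArrayOutputStream for a FileOutputStream gives the file-based check Arjohn describes.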
From: Ian M. <ia...@ia...> - 2007-04-20 16:48:43

I have encountered an interesting issue with the encoding of the character &#8230;

On Windows, the character is correctly encoded to the three dot character. However, on a Linux system it gets encoded to (this might not come out right in the email) this: â?¦

According to the W3C doc referenced in the comments, http://www.w3.org/TR/REC-html40/sgml/entities.html, which says:

    <!ENTITY hellip CDATA "&#8230;" -- horizontal ellipsis = three dot leader, U+2026 ISOpub -->

both &hellip; and &#8230; should be encoded to this ellipsis character. However, this is not the case.

You can get a minimal testcase of this error by running just a StringBean on the entity alone:

    Parser parser = new Parser();
    parser.setInputHTML("&#8230;");
    StringBean sb = new StringBean();
    parser.visitAllNodesWith(sb);
    System.out.println(sb.getStrings());

I've worked out that it's located in Translate.decode(..), which casts int to char. This testcase shows that it doesn't work:

    int num = 8230;
    char c = (char) num;
    System.out.println(c);

It would seem that it's something to do with the casting of the int to the char, which must be platform dependent, and in this case it seems incorrect when run on my Linux box.

I'd welcome any suggestions people have as to how to fix this.

Thanks

Ian Macfarlane

From: Derrick O. <der...@ro...> - 2007-02-27 00:09:45
|
Good idea. Perhaps visibility isn't so difficult. Adding a bunch of (static) properties on the Parser that just wiggle the underlying static values might work. Derrick ----- Original Message ----- From: Ian Macfarlane <ian...@gm...> To: htmlparser-developer@lists.sourceforge.net Sent: Monday, February 26, 2007 10:37:07 AM Subject: [Htmlparser-developer] Lexer STRICT_REMARKS I'm wondering if STRICT_REMARKS in Lexer should default to false rather than true - that way it would parse more closely to how browsers parse it, which are generally forgiving about these things. This would stop people wondering why it works in their browser but not in the HTML Parser. Those who want true strict parsing could choose to enable this. The real issue is probably more the visibility of this setting - it would perhaps be better if we had a centralised way for switching between standards compliance and being forgiving - equivalent to 'quirks mode' and 'standards compliance mode' of browsers, and set this across the entire parser. I imagine this would be quite a task however. Ian |
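One way the centralised switch could look is sketched below, assuming the strictness options live in static fields the way STRICT_REMARKS does in the Lexer. All names here (ParseModeSketch, Mode, setMode, setStrictRemarks, and the second flag strictEndTags) are hypothetical placeholders, not htmlparser API.

```java
public class ParseModeSketch {

    // Hypothetical stand-ins for scattered static switches such as the
    // Lexer's STRICT_REMARKS. strictEndTags is an invented second flag.
    private static final class LexerFlags {
        static boolean strictRemarks = true;
        static boolean strictEndTags = true;
    }

    public enum Mode { STANDARDS, QUIRKS }

    // Central switch: one call sets every underlying flag consistently.
    public static void setMode(Mode mode) {
        boolean strict = (mode == Mode.STANDARDS);
        LexerFlags.strictRemarks = strict;
        LexerFlags.strictEndTags = strict;
    }

    // Per-flag static property, as in the reply, for callers who want to
    // override a single setting without changing the whole mode.
    public static void setStrictRemarks(boolean value) {
        LexerFlags.strictRemarks = value;
    }

    public static boolean getStrictRemarks() {
        return LexerFlags.strictRemarks;
    }

    public static boolean getStrictEndTags() {
        return LexerFlags.strictEndTags;
    }

    public static void main(String[] args) {
        setMode(Mode.QUIRKS);
        System.out.println(getStrictRemarks()); // false
        setStrictRemarks(true);
        System.out.println(getStrictRemarks()); // true
        System.out.println(getStrictEndTags()); // false, still in quirks mode
    }
}
```

Because the flags are static, the mode is JVM-wide and unsynchronised, so two parsers running in different threads cannot use different modes. A per-Parser settings object handed down to the Lexer would avoid that, at the cost of more plumbing.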
From: Ian M. <ian...@gm...> - 2007-02-26 15:37:14
|
I'm wondering if STRICT_REMARKS in Lexer should default to false rather than true - that way it would parse more closely to how browsers parse it, which are generally forgiving about these things. This would stop people wondering why it works in their browser but not in the HTML Parser. Those who want true strict parsing could choose to enable this. The real issue is probably more the visibility of this setting - it would perhaps be better if we had a centralised way for switching between standards compliance and being forgiving - equivalent to 'quirks mode' and 'standards compliance mode' of browsers, and set this across the entire parser. I imagine this would be quite a task however. Ian |
From: Axel <ax...@gm...> - 2007-01-26 17:13:45
|
On 1/26/07, Derrick Oswald <Der...@ro...> wrote: > Hi Axel, > > Please log this as a feature request. > http://sourceforge.net/tracker/?group_id=24399&atid=381402 Done. See #1645471 -- Axel Kramer WikiBlog: http://www.groovy-news.org/e/page/axelclk |
From: Derrick O. <Der...@Ro...> - 2007-01-26 12:36:54
|
Hi Axel, Please log this as a feature request. http://sourceforge.net/tracker/?group_id=24399&atid=381402 Derrick Axel wrote: >Hi > >I've asked this already some time ago. >I would like to use some extra information in the tag nodes. Therefore >I inserted the following methods in the Tag and TagNode classes. >Does it make sense to have this simple methods in the standard >htmlparser libraries? > >Or is it better to have something like a >Tag#setExtraAttribute (String key, Object value) and >Object Tag#getExtraAttribute (String key) >methods which could store additional information per Tag as an special >attribute? > >****** Insert in file Tag.java ****** > /** > * Get the customer information for this tag > * @return > */ > public Object getCustomerInfo(); > /** > * Set the customer information for this tag if necessary > */ > public void setCustomerInfo(Object customerInfo); > >****** Insert in file TagNode.java ****** > private Object mCustomerInfo=null; > public Object getCustomerInfo(){ > return mCustomerInfo; > } > > public void setCustomerInfo(Object customerInfo){ > mCustomerInfo= customerInfo; > } > > > |
From: Axel <ax...@gm...> - 2007-01-20 16:15:12
|
Hi I asked this some time ago already. I would like to use some extra information in the tag nodes, so I inserted the following methods into the Tag and TagNode classes. Does it make sense to have these simple methods in the standard htmlparser libraries? Or would it be better to have methods such as Tag#setExtraAttribute (String key, Object value) and Object Tag#getExtraAttribute (String key), which could store additional information per Tag as a special attribute? ****** Insert in file Tag.java ****** /** * Get the customer information for this tag * @return the customer information, or null if none has been set */ public Object getCustomerInfo(); /** * Set the customer information for this tag if necessary */ public void setCustomerInfo(Object customerInfo); ****** Insert in file TagNode.java ****** private Object mCustomerInfo=null; public Object getCustomerInfo(){ return mCustomerInfo; } public void setCustomerInfo(Object customerInfo){ mCustomerInfo= customerInfo; } -- Axel Kramer WikiBlog: http://www.groovy-news.org/e/page/axelclk |
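The keyed alternative mentioned in the message (Tag#setExtraAttribute / getExtraAttribute) could be backed by a lazily created map, as in this minimal sketch. UserDataTag is a hypothetical stand-in, not htmlparser's TagNode, and the key used in main is made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a tag node; not htmlparser's TagNode.
public class UserDataTag {

    // Created on first use so tags without extra data carry no map.
    private Map<String, Object> extra;

    public void setExtraAttribute(String key, Object value) {
        if (extra == null) {
            extra = new HashMap<>();
        }
        extra.put(key, value);
    }

    // Returns null when nothing was stored under the key.
    public Object getExtraAttribute(String key) {
        return (extra == null) ? null : extra.get(key);
    }

    public static void main(String[] args) {
        UserDataTag tag = new UserDataTag();
        tag.setExtraAttribute("wikiLink", "Main_Page");
        System.out.println(tag.getExtraAttribute("wikiLink")); // Main_Page
        System.out.println(tag.getExtraAttribute("missing"));  // null
    }
}
```

Compared with a single getCustomerInfo slot, keyed storage lets independent pieces of code attach data to the same tag without overwriting each other. Keeping it separate from the tag's real HTML attributes would also stop it leaking into the markup if the tag is written back out.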