htmlparser-developer Mailing List for HTML Parser (Page 2)

Brought to you by: derrickoswald

htmlparser-developer — The developer mailing list of the htmlparser project

Re: [Htmlparser-developer] removing navigation bar from parse output

From: Derrick O. <der...@gm...> - 2009-12-16 05:33:46

Nav bars are usually identified by a DIV tag with a special id, which you
could filter on with a TagNameFilter *and* a HasAttribute filter.
A simplistic approach would be to just delete the node from its parent, which
may work in your case.
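
The "delete the node from its parent" idea can be sketched without htmlparser at all. What follows is only an illustration under stated assumptions: it swaps in the JDK's built-in DOM parser (so it needs well-formed XHTML, which htmlparser does not), and the class name NavBarStripper, the method stripNav, and the id value "nav" are hypothetical names, not part of htmlparser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: uses the JDK DOM parser, not htmlparser.
class NavBarStripper {

    /** Removes every <div> whose id equals navId and returns the remaining text. */
    static String stripNav(String xhtml, String navId) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            // XML tag names are case-sensitive, so this only matches lowercase "div".
            NodeList divs = doc.getElementsByTagName("div");
            // Walk backwards: the NodeList is live, so removals shift later indices.
            for (int i = divs.getLength() - 1; i >= 0; i--) {
                Element div = (Element) divs.item(i);
                if (navId.equals(div.getAttribute("id"))) {
                    // The "simplistic approach": detach the node from its parent.
                    div.getParentNode().removeChild(div);
                }
            }
            return doc.getDocumentElement().getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><div id=\"nav\">Home | News</div>"
                + "<div id=\"content\">Article text</div></body></html>";
        System.out.println(stripNav(page, "nav"));
    }
}
```

Matching on both the tag name and the id attribute mirrors the two-filter combination suggested above; matching on the tag name alone would also remove the content DIVs.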

On Tue, Dec 15, 2009 at 10:56 PM, Ted Yu <yuz...@gm...> wrote:

> Hi,
> Is it possible to remove navigation bar of web page (common on web portals)
> from parse output ?
>
> The motivation is to provide more focused contents for page categorization.
>
> Thanks
>
> _______________________________________________
> Htmlparser-developer mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-developer
>
>

[Htmlparser-developer] removing navigation bar from parse output

From: Ted Yu <yuz...@gm...> - 2009-12-15 21:56:26

Hi,
Is it possible to remove navigation bar of web page (common on web portals)
from parse output ?

The motivation is to provide more focused contents for page categorization.

Thanks

[Htmlparser-developer] Student asking for your help in open source survey

From: Rod H. <hil...@re...> - 2009-08-28 20:32:45

Hello,

I'll make this short and sweet.

1) I'm a graduate student.
2) I'm doing research on development practices in the open source  
community.
3) I would like to use HTML Parser as part of my research.

I'd be eternally grateful if everyone who has ever contributed code to  
this project could fill out my survey.  It's three questions long, and  
you can do the whole thing with just the mouse.

Please take a few seconds and help me complete my research, I'd really  
appreciate it.

The survey is here:
http://spreadsheets.google.com/viewform?formkey=cHU2aHo5bS14cE04c2gzWGlhaUpQSHc6MA 
..

Thank you,
Rod Hilton

P.S. If you are involved with multiple open source projects and seeing  
this message on other mailing lists, first let me apologize for being  
so annoying, but second let me ask you to please fill out the survey  
again, once for each project.  Thanks!

Re: [Htmlparser-developer] how to login website with cookie

From: Derrick O. <der...@gm...> - 2009-06-30 17:14:20

An explanation of how to use POST is available on the FAQ page:
http://htmlparser.sourceforge.net/faq.html#post
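
Independent of what the FAQ describes, a form login comes down to two plain-JDK steps: encoding the form fields as a POST body, and reading the session cookie out of the Set-Cookie response header. The sketch below covers only those two steps and makes no network call. The class name LoginCookieSketch, its method names, and the form field names are hypothetical. The resulting name and value could then be used to build the org.htmlparser.http.Cookie mentioned in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.HttpCookie;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper built only on JDK classes.
class LoginCookieSketch {

    /** Encodes form fields as an application/x-www-form-urlencoded POST body. */
    static String formBody(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
        return body.toString();
    }

    /**
     * Parses one Set-Cookie header value, such as the string returned by
     * HttpURLConnection.getHeaderField("Set-Cookie") after the POST.
     */
    static HttpCookie firstCookie(String setCookieHeader) {
        List<HttpCookie> cookies = HttpCookie.parse(setCookieHeader);
        return cookies.get(0);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("user", "marco");         // assumed form field names
        fields.put("password", "p@ss word");
        System.out.println(formBody(fields));

        HttpCookie cookie = firstCookie("JSESSIONID=abc123; Path=/; HttpOnly");
        System.out.println(cookie.getName() + "=" + cookie.getValue());
    }
}
```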

2009/6/30 Marco Yeung <yeu...@ho...>:
> If there is a login form (POST).  How can we use htmlParser to login and get
> the cookie before passing it to parser?
>
>
>> Date: Sun, 28 Jun 2009 19:24:58 +0200
>> From: der...@gm...
>> To: htm...@li...
>> Subject: Re: [Htmlparser-developer] how to login website with cookie
>>
>> Try looking at the thread "Unable to enable cookies" in the Help forum:
>> https://sourceforge.net/forum/message.php?msg_id=5852557
>>
>> basically:
>> parser.getConnectionManager().setRedirectionProcessingEnabled(true);
>> parser.getConnectionManager().setCookieProcessingEnabled(true);
>>
>> then you need to set the cookie before fetching the page:
>>
>> parser.getConnectionManager().setCookie (Cookie cookie, String domain)
>>
>> where you have already made a cookie, see org.htmlparser.http.Cookie
>> and domain is like "google.com"
>>
>>
>> 2009/6/28 Marco Yeung <yeu...@ho...>:
>> > Hi,
>> >
>> > I want to HTMLParser to parse the content of a webpage that requires
>> > login
>> > and maintains cookies.
>> > How can I use htmlparser to do that ?
>> >
>> >
>> > Rdgs
>> > Marco
>> >

Re: [Htmlparser-developer] how to login website with cookie

From: Marco Y. <yeu...@ho...> - 2009-06-30 16:54:56

If there is a login form (POST), how can we use htmlparser to log in and get the cookie before passing it to the parser?

> Date: Sun, 28 Jun 2009 19:24:58 +0200
> From: der...@gm...
> To: htm...@li...
> Subject: Re: [Htmlparser-developer] how to login website with cookie
> 
> Try looking at the thread "Unable to enable cookies" in the Help forum:
> https://sourceforge.net/forum/message.php?msg_id=5852557
> 
> basically:
> parser.getConnectionManager().setRedirectionProcessingEnabled(true);
> parser.getConnectionManager().setCookieProcessingEnabled(true);
> 
> then you need to set the cookie before fetching the page:
> 
> parser.getConnectionManager().setCookie (Cookie cookie, String domain)
> 
> where you have already made a cookie, see org.htmlparser.http.Cookie
> and domain is like "google.com"
> 
> 
> 2009/6/28 Marco Yeung <yeu...@ho...>:
> > Hi,
> >
> > I want to HTMLParser to parse the content of a webpage that requires login
> > and maintains cookies.
> > How can I use htmlparser to do that ?
> >
> >
> > Rdgs
> > Marco
> >

Re: [Htmlparser-developer] how to login website with cookie

From: Derrick O. <der...@gm...> - 2009-06-28 17:25:02

Try looking at the thread "Unable to enable cookies" in the Help forum:
https://sourceforge.net/forum/message.php?msg_id=5852557

basically:
parser.getConnectionManager().setRedirectionProcessingEnabled(true);
parser.getConnectionManager().setCookieProcessingEnabled(true);

then you need to set the cookie before fetching the page:

parser.getConnectionManager().setCookie (Cookie cookie, String domain)

where you have already made a cookie, see org.htmlparser.http.Cookie
and domain is like "google.com"


2009/6/28 Marco Yeung <yeu...@ho...>:
> Hi,
>
> I want to HTMLParser to parse the content of a webpage that requires login
> and maintains cookies.
> How can I use htmlparser to do that ?
>
>
> Rdgs
> Marco
>

[Htmlparser-developer] how to login website with cookie

From: Marco Y. <yeu...@ho...> - 2009-06-28 10:48:12

Hi,

I want to use HTMLParser to parse the content of a webpage that requires login and maintains cookies.

How can I use htmlparser to do that?

Rdgs
Marco


Re: [Htmlparser-developer] thread safety

From: Matt K. <mat...@gm...> - 2009-04-08 14:30:56

I only ask because when I opened it up to 10 or so threads I was
getting intermittent exceptions. It could very well be my code,
though; I haven't had time to debug.

On 4/8/09, Ian Macfarlane <ia...@ia...> wrote:
> I can't give a 100% confident answer, but I've used it on a
> many-threaded application in the past with no problems. Can't be sure
> that the parts you'll exercise will be thread safe though.
>
> Ian
>
> 2009/2/25 Matt Kirkley <mat...@gm...>:
>> Hi, I was wondering if the html-parser libraries are thread safe?

Re: [Htmlparser-developer] thread safety

From: Ian M. <ia...@ia...> - 2009-04-08 07:54:01

I can't give a 100% confident answer, but I've used it on a
many-threaded application in the past with no problems. Can't be sure
that the parts you'll exercise will be thread safe though.

Ian

2009/2/25 Matt Kirkley <mat...@gm...>:
> Hi, I was wondering if the html-parser libraries are thread safe?

[Htmlparser-developer] thread safety

From: Matt K. <mat...@gm...> - 2009-02-25 14:47:27

Hi, I was wondering if the html-parser libraries are thread safe?

[Htmlparser-developer] table tag does not break flow?

From: Leos L. <lit...@ce...> - 2008-10-07 19:30:23

Hi,

I use htmlparser 2 and it is a great tool. I use it to protect the text
entered by users on my website www.abclinuxu.cz (allowed tags,
attributes, basic XSS) and for simple transformations. I plan to unify
these use cases in an HtmlPurifier library.

It is a pity that it is not actively developed, because many users
would hesitate to start using it. That is my own standard policy when I
am looking for a new library.

Well, my question is: why is the TABLE tag not in the Hashtable breakTags in
TagNode.java? It is a block element. I also miss the TBODY, THEAD and
TFOOT tags there.

Leos

Re: [Htmlparser-developer] Working with HTMLParser code in Eclipse

From: Ian M. <ia...@ia...> - 2008-09-09 08:28:20

Ah yes!

By the way, we need to update the link on the site homepage
(http://htmlparser.sourceforge.net/) as it currently links to
http://svn.sourceforge.net/viewvc/viewcvs.cgi/htmlparser/ which
doesn't work.

Regards

Ian

2008/9/9 Derrick Oswald <der...@ro...>:
> How about the SVN Browse...
> http://htmlparser.svn.sourceforge.net/viewvc/htmlparser/
>
> The archive you mention was a mailing list that issued an email whenever a
> file changed. Is that what you want?
> I don't know how to do that... yet.
>
> ----- Original Message ----
> From: Ian Macfarlane <ia...@ia...>
> To: The developer mailing list of the htmlparser project
> <htm...@li...>
> Sent: Monday, September 8, 2008 11:57:51 AM
> Subject: Re: [Htmlparser-developer] Working with HTMLParser code in Eclipse
>
> Hi Derrick,
>
> Thanks - this is the approach I eventually worked out I needed to do,
> and I've successfully managed to set it up. I've committed a new tag,
> bug fix, and also fixed two tests which were failing but shouldn't be.
>
> Just a thought - when we were using CVS we had a web interface at
> http://sourceforge.net/mailarchive/forum.php?forum_name=htmlparser-cvs
> - is it possible to set up a similar one for SVN?
>
> Ian
>
> 2008/9/5 Derrick Oswald <der...@ro...>:
>> I'm not sure about the exact steps in Eclipse, but the two projects under
>> trunk\lexer and trunk\parser should be all that is required.
>> These build the two jar files that the other applications use.
>> You should be able to set up two projects - htmlparser.jar depends on
>> htmllexer.jar.
>>
>> ----- Original Message ----
>> From: Ian Macfarlane <ia...@ia...>
>> To: htm...@li...
>> Sent: Thursday, September 4, 2008 10:26:54 AM
>> Subject: [Htmlparser-developer] Working with HTMLParser code in Eclipse
>>
>> Can anyone tell me how to set up the HTMLParser code base in Eclipse?
>> When it was in CVS, it was all within a single directory, but since it
>> moved to SVN it's all in different directories, so I'm not quite sure
>> how to set up HTMLParser for this.
>>
>> Just to clarify - this is the HTMLParser code itself, not code using
>> HTMLParser.
>>
>> Thanks
>>
>> Ian Macfarlane

Re: [Htmlparser-developer] Working with HTMLParser code in Eclipse

From: Derrick O. <der...@ro...> - 2008-09-08 23:57:28

How about the SVN Browse...
http://htmlparser.svn.sourceforge.net/viewvc/htmlparser/

The archive you mention was a mailing list that issued an email whenever a file changed. Is that what you want?
I don't know how to do that... yet.


----- Original Message ----
From: Ian Macfarlane <ia...@ia...>
To: The developer mailing list of the htmlparser project <htm...@li...>
Sent: Monday, September 8, 2008 11:57:51 AM
Subject: Re: [Htmlparser-developer] Working with HTMLParser code in Eclipse

Hi Derrick,

Thanks - this is the approach I eventually worked out I needed to do,
and I've successfully managed to set it up. I've committed a new tag,
bug fix, and also fixed two tests which were failing but shouldn't be.

Just a thought - when we were using CVS we had a web interface at
http://sourceforge.net/mailarchive/forum.php?forum_name=htmlparser-cvs
- is it possible to set up a similar one for SVN?

Ian

2008/9/5 Derrick Oswald <der...@ro...>:
> I'm not sure about the exact steps in Eclipse, but the two projects under
> trunk\lexer and trunk\parser should be all that is required.
> These build the two jar files that the other applications use.
> You should be able to set up two projects - htmlparser.jar depends on
> htmllexer.jar.
>
> ----- Original Message ----
> From: Ian Macfarlane <ia...@ia...>
> To: htm...@li...
> Sent: Thursday, September 4, 2008 10:26:54 AM
> Subject: [Htmlparser-developer] Working with HTMLParser code in Eclipse
>
> Can anyone tell me how to set up the HTMLParser code base in Eclipse?
> When it was in CVS, it was all within a single directory, but since it
> moved to SVN it's all in different directories, so I'm not quite sure
> how to set up HTMLParser for this.
>
> Just to clarify - this is the HTMLParser code itself, not code using
> HTMLParser.
>
> Thanks
>
> Ian Macfarlane

Re: [Htmlparser-developer] Working with HTMLParser code in Eclipse

From: Ian M. <ia...@ia...> - 2008-09-08 15:57:59

Hi Derrick,

Thanks - this is the approach I eventually worked out I needed to do,
and I've successfully managed to set it up. I've committed a new tag,
bug fix, and also fixed two tests which were failing but shouldn't be.

Just a thought - when we were using CVS we had a web interface at
http://sourceforge.net/mailarchive/forum.php?forum_name=htmlparser-cvs
- is it possible to set up a similar one for SVN?

Ian

2008/9/5 Derrick Oswald <der...@ro...>:
> I'm not sure about the exact steps in Eclipse, but the two projects under
> trunk\lexer and trunk\parser should be all that is required.
> These build the two jar files that the other applications use.
> You should be able to set up two projects - htmlparser.jar depends on
> htmllexer.jar.
>
> ----- Original Message ----
> From: Ian Macfarlane <ia...@ia...>
> To: htm...@li...
> Sent: Thursday, September 4, 2008 10:26:54 AM
> Subject: [Htmlparser-developer] Working with HTMLParser code in Eclipse
>
> Can anyone tell me how to set up the HTMLParser code base in Eclipse?
> When it was in CVS, it was all within a single directory, but since it
> moved to SVN it's all in different directories, so I'm not quite sure
> how to set up HTMLParser for this.
>
> Just to clarify - this is the HTMLParser code itself, not code using
> HTMLParser.
>
> Thanks
>
> Ian Macfarlane

Re: [Htmlparser-developer] Working with HTMLParser code in Eclipse

From: Derrick O. <der...@ro...> - 2008-09-05 10:42:23

I'm not sure about the exact steps in Eclipse, but the two projects under trunk\lexer and trunk\parser should be all that is required.
These build the two jar files that the other applications use.
You should be able to set up two projects - htmlparser.jar depends on htmllexer.jar.

----- Original Message ----
From: Ian Macfarlane <ia...@ia...>
To: htm...@li...
Sent: Thursday, September 4, 2008 10:26:54 AM
Subject: [Htmlparser-developer] Working with HTMLParser code in Eclipse

Can anyone tell me how to set up the HTMLParser code base in Eclipse?
When it was in CVS, it was all within a single directory, but since it
moved to SVN it's all in different directories, so I'm not quite sure
how to set up HTMLParser for this.

Just to clarify - this is the HTMLParser code itself, not code using HTMLParser.

Thanks

Ian Macfarlane


[Htmlparser-developer] Working with HTMLParser code in Eclipse

From: Ian M. <ia...@ia...> - 2008-09-04 14:26:59

Can anyone tell me how to set up the HTMLParser code base in Eclipse?
When it was in CVS, it was all within a single directory, but since it
moved to SVN it's all in different directories, so I'm not quite sure
how to set up HTMLParser for this.

Just to clarify - this is the HTMLParser code itself, not code using HTMLParser.

Thanks

Ian Macfarlane

[Htmlparser-developer] <DIV> and LinkTag

From: eugene k. <ku...@ya...> - 2008-02-01 19:44:26

I have noticed that a <div> tag inside a LinkTag is treated as an end tag (ender). From what I read in the specs, <div> is allowed inside, such as <a> <div> </div></a>.

For example, the page from www.google.com places <div> as above.

Greetings
Eugene

Re: [Htmlparser-developer] Incorrect encoding of … on Linux systems

From: Nie H. <ked...@16...> - 2007-07-18 03:37:17

----- Original Message -----
From: "Ian Macfarlane" <ia...@ia...>
To: <htm...@li...>
Sent: Saturday, April 21, 2007 12:48 AM
Subject: [Htmlparser-developer] Incorrect encoding of &#8230;on Linux systems


I have encountered an interesting issue with the encoding of the
character &#8230;

On windows, the character is correctly encoded to the three dot
character. However, on a Linux system it gets encoded to (this might
not come out right in the email) this: â?¦

According to the W3C doc referenced in the comments -
http://www.w3.org/TR/REC-html40/sgml/entities.html - which says:

<!ENTITY hellip   CDATA "&#8230;" -- horizontal ellipsis = three dot leader,
                                     U+2026 ISOpub  -->

both &hellip; and &#8230; should be encoded to this ellipsis
character. However, this is not the case.

You can get a minimal testcase of this error by doing just a
StringBean on the entity solely:

Parser parser = new Parser();
parser.setInputHTML("&#8230;");
StringBean sb = new StringBean();
parser.visitAllNodesWith(sb);
System.out.println(sb.getStrings());

I've worked out that it's located in Translate.decode(..), casting int
to char. This testcase shows that it doesn't work:

int num = 8230;
char c = (char)num;
System.out.println(c);

It would seem that it's something to do with the casting of the int to
the char that must be platform dependent, and in this case it would
seem incorrect when run on my Linux box.

I'd welcome any suggestions people have as to how to fix this.

Thanks

Ian Macfarlane

Re: [Htmlparser-developer] Incorrect encoding of … on Linux systems

From: Arjohn K. <arj...@ad...> - 2007-04-21 05:16:22

Ian Macfarlane wrote:
> I have encountered an interesting issue with the encoding of the
> character &#8230;
> 
> On windows, the character is correctly encoded to the three dot
> character. However, on a Linux system it gets encoded to (this might
> not come out right in the email) this: â?¦
> 
> According to the W3C doc referenced in the comments -
> http://www.w3.org/TR/REC-html40/sgml/entities.html - which says:
> 
> <!ENTITY hellip   CDATA "&#8230;" -- horizontal ellipsis = three dot leader,
>                                      U+2026 ISOpub  -->
> 
> both &hellip; and &#8230; should be encoded to this ellipsis
> character. However, this is not the case.
> 
> You can get a minimal testcase of this error by doing just a
> StringBean on the entity solely:
> 
> Parser parser = new Parser();
> parser.setInputHTML("&#8230;");
> StringBean sb = new StringBean();
> parser.visitAllNodesWith(sb);
> System.out.println(sb.getStrings());
> 
> I've worked out that it's located in Translate.decode(..), casting int
> to char. This testcase shows that it doesn't work:
> 
> int num = 8230;
> char c = (char)num;
> System.out.println(c);
> 
> It would seem that it's something to do with the casting of the int to
> the char that must be platform dependent, and in this case it would
> seem incorrect when run on my Linux box.

It's highly unlikely that a simple type cast is platform dependent. A more
likely cause is a character encoding issue. Either the font set that you
use on your Linux installation can't render the character correctly, or
the character is encoded using the wrong character set. Note that
System.out.println uses the platform's default character encoding, which
may not support this character.

You could instead try to write the character to a file using
OutputStreamWriter with UTF-8 as the encoding. Then try opening this file in a
browser and make sure that it renders the file as UTF-8.
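
The check suggested above can be sketched in a few lines. This is a minimal illustration, not part of the original message: the class name EllipsisEncodingDemo and the helper utf8Hex are hypothetical, and the sketch writes to an in-memory buffer rather than a file so that the resulting bytes can be inspected directly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, JDK only.
class EllipsisEncodingDemo {

    /** Casts the value to char, writes it through a UTF-8 writer, and returns the bytes as hex. */
    static String utf8Hex(int value) {
        char c = (char) value; // the cast itself behaves the same on every platform
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // An explicit charset avoids depending on the platform default encoding.
        try (Writer out = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            out.write(c);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes.toByteArray()) {
            hex.append(String.format("%02X", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println((char) 8230 == '\u2026'); // true: the cast yields U+2026
        System.out.println(utf8Hex(8230));           // E280A6
    }
}
```

The three bytes E2 80 A6 are the UTF-8 encoding of U+2026. Read back as ISO-8859-1 they appear as "â", an unprintable control character, and "¦", which is consistent with the "â?¦" reported in the quoted message and points at the output encoding rather than the cast.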

-- 
Arjohn Kampman, Senior Software Engineer
Aduna - Guided Exploration
www.aduna-software.com

[Htmlparser-developer] Incorrect encoding of … on Linux systems

From: Ian M. <ia...@ia...> - 2007-04-20 16:48:43

I have encountered an interesting issue with the encoding of the
character &#8230;

On windows, the character is correctly encoded to the three dot
character. However, on a Linux system it gets encoded to (this might
not come out right in the email) this: â?¦

According to the W3C doc referenced in the comments -
http://www.w3.org/TR/REC-html40/sgml/entities.html - which says:

<!ENTITY hellip   CDATA "&#8230;" -- horizontal ellipsis = three dot leader,
                                     U+2026 ISOpub  -->

both &hellip; and &#8230; should be encoded to this ellipsis
character. However, this is not the case.

You can get a minimal testcase of this error by doing just a
StringBean on the entity solely:

Parser parser = new Parser();
parser.setInputHTML("&#8230;");
StringBean sb = new StringBean();
parser.visitAllNodesWith(sb);
System.out.println(sb.getStrings());

I've worked out that it's located in Translate.decode(..), casting int
to char. This testcase shows that it doesn't work:

int num = 8230;
char c = (char)num;
System.out.println(c);

It would seem that it's something to do with the casting of the int to
the char that must be platform dependent, and in this case it would
seem incorrect when run on my Linux box.

I'd welcome any suggestions people have as to how to fix this.

Thanks

Ian Macfarlane

Re: [Htmlparser-developer] Lexer STRICT_REMARKS

From: Derrick O. <der...@ro...> - 2007-02-27 00:09:45

Good idea.

Perhaps visibility isn't so difficult.
Adding a bunch of (static) properties on the Parser that just wiggle the
underlying static values might work.

Derrick

----- Original Message -----
From: Ian Macfarlane <ian...@gm...>
To: htmlparser-developer@lists.sourceforge.net
Sent: Monday, February 26, 2007 10:37:07 AM
Subject: [Htmlparser-developer] Lexer STRICT_REMARKS

I'm wondering if STRICT_REMARKS in Lexer should default to false
rather than true - that way it would parse more closely to how
browsers parse it, which are generally forgiving about these things.
This would stop people wondering why it works in their browser but not
in the HTML Parser. Those who want true strict parsing could choose to
enable this.

The real issue is probably more the visibility of this setting - it
would perhaps be better if we had a centralised way for switching
between standards compliance and being forgiving - equivalent to
'quirks mode' and 'standards compliance mode' of browsers, and set
this across the entire parser. I imagine this would be quite a task
however.

Ian
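
The idea of static properties on the Parser that forward to the underlying
static values could look roughly like the sketch below. Everything in it
is a stand-in: the nested Lexer and Parser classes only imitate the shape
of the real ones, and isStrictRemarks, setStrictRemarks and setLenient are
hypothetical names that do not exist in htmlparser. The only thing taken
from the thread is a static STRICT_REMARKS flag that defaults to true.

```java
// Self-contained sketch; these nested classes are stand-ins, not the real htmlparser classes.
final class StrictnessSketch {

    /** Stand-in for the lexer: only the static flag matters here. */
    static final class Lexer {
        /** Imitates the STRICT_REMARKS flag discussed in the thread (default true). */
        static boolean STRICT_REMARKS = true;
    }

    /** Stand-in for the parser, exposing the lexer flag as a static property. */
    static final class Parser {
        private Parser() {
        }

        /** Hypothetical getter that reads the underlying static value. */
        static boolean isStrictRemarks() {
            return Lexer.STRICT_REMARKS;
        }

        /** Hypothetical setter that writes the underlying static value. */
        static void setStrictRemarks(boolean strict) {
            Lexer.STRICT_REMARKS = strict;
        }

        /**
         * Hypothetical single switch in the spirit of a browser's quirks mode:
         * lenient == true relaxes every strictness flag the parser knows about.
         */
        static void setLenient(boolean lenient) {
            setStrictRemarks(!lenient);
            // Any further static strictness flags would be toggled here as well.
        }
    }

    public static void main(String[] args) {
        System.out.println(Parser.isStrictRemarks()); // prints "true"
        Parser.setLenient(true);
        System.out.println(Parser.isStrictRemarks()); // prints "false"
    }
}
```

Since the values stay static, a switch like this is global to the JVM
rather than per Parser instance, which is worth keeping in mind for code
that parses with different settings in parallel.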

[Htmlparser-developer] Lexer STRICT_REMARKS

From: Ian M. <ian...@gm...> - 2007-02-26 15:37:14

I'm wondering if STRICT_REMARKS in Lexer should default to false
rather than true - that way it would parse more closely to how
browsers parse it, which are generally forgiving about these things.
This would stop people wondering why it works in their browser but not
in the HTML Parser. Those who want true strict parsing could choose to
enable this.

The real issue is probably more the visibility of this setting - it
would perhaps be better if we had a centralised way for switching
between standards compliance and being forgiving - equivalent to
'quirks mode' and 'standards compliance mode' of browsers, and set
this across the entire parser. I imagine this would be quite a task
however.

Ian

Re: [Htmlparser-developer] How to store additional data objects in Tag class?

From: Axel <ax...@gm...> - 2007-01-26 17:13:45

On 1/26/07, Derrick Oswald <Der...@ro...> wrote:
> Hi Axel,
>
> Please log this as a feature request.
> http://sourceforge.net/tracker/?group_id=24399&atid=381402
Done.
See #1645471

-- 
Axel Kramer
WikiBlog: http://www.groovy-news.org/e/page/axelclk

Re: [Htmlparser-developer] How to store additional data objects in Tag class?

From: Derrick O. <Der...@Ro...> - 2007-01-26 12:36:54

Hi Axel,

Please log this as a feature request.
http://sourceforge.net/tracker/?group_id=24399&atid=381402

Derrick

Axel wrote:

>Hi
>
>I've asked this already some time ago.
>I would like to use some extra information in the tag nodes. Therefore
>I inserted the following methods in the Tag and TagNode classes.
>Does it make sense to have these simple methods in the standard
>htmlparser libraries?
>
>Or is it better to have something like a
>Tag#setExtraAttribute (String key, Object value) and
>Object Tag#getExtraAttribute (String key)
>methods which could store additional information per Tag as a special
>attribute?
>
>****** Insert in file Tag.java ******
>	/**
>	 * Get the customer information for this tag
>	 * @return
>	 */
>	public Object getCustomerInfo();
>	/**
>	 * Set the customer information for this tag if necessary
>	 */
>	public void setCustomerInfo(Object customerInfo);
>
>****** Insert in file TagNode.java ******
>    private Object mCustomerInfo=null;
>    public Object getCustomerInfo(){
>    	return mCustomerInfo;
>    }
>
>  	public void setCustomerInfo(Object customerInfo){
>  		mCustomerInfo= customerInfo;
>  	}
>
>  
>

[Htmlparser-developer] How to store additional data objects in Tag class?

From: Axel <ax...@gm...> - 2007-01-20 16:15:12

Hi

I've asked this already some time ago.
I would like to use some extra information in the tag nodes. Therefore
I inserted the following methods in the Tag and TagNode classes.
Does it make sense to have these simple methods in the standard
htmlparser libraries?

Or is it better to have something like a
Tag#setExtraAttribute (String key, Object value) and
Object Tag#getExtraAttribute (String key)
methods which could store additional information per Tag as a special
attribute?

****** Insert in file Tag.java ******
	/**
	 * Get the customer information for this tag
	 * @return
	 */
	public Object getCustomerInfo();
	/**
	 * Set the customer information for this tag if necessary
	 */
	public void setCustomerInfo(Object customerInfo);

****** Insert in file TagNode.java ******
    private Object mCustomerInfo=null;
    public Object getCustomerInfo(){
    	return mCustomerInfo;
    }

  	public void setCustomerInfo(Object customerInfo){
  		mCustomerInfo= customerInfo;
  	}
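
For comparison, the keyed alternative (setExtraAttribute and
getExtraAttribute) could be sketched as below. This is an illustration
under assumptions, not code from the library: the nested Tag interface and
TagNode class are reduced stand-ins, and the lazily created HashMap is
just one possible way to hold the values.

```java
import java.util.HashMap;
import java.util.Map;

// Self-contained sketch; Tag and TagNode here are reduced stand-ins, not the htmlparser types.
final class ExtraAttributeSketch {

    /** Only the two proposed methods, to keep the example short. */
    interface Tag {
        void setExtraAttribute(String key, Object value);

        Object getExtraAttribute(String key);
    }

    static final class TagNode implements Tag {
        // Created on first use so tags without extra data carry no map.
        private Map<String, Object> mExtraAttributes;

        @Override
        public void setExtraAttribute(String key, Object value) {
            if (mExtraAttributes == null) {
                mExtraAttributes = new HashMap<>();
            }
            mExtraAttributes.put(key, value);
        }

        @Override
        public Object getExtraAttribute(String key) {
            // Returns null both for a missing key and when nothing was ever stored.
            return mExtraAttributes == null ? null : mExtraAttributes.get(key);
        }
    }

    public static void main(String[] args) {
        Tag tag = new TagNode();
        tag.setExtraAttribute("customerInfo", Integer.valueOf(42));
        System.out.println(tag.getExtraAttribute("customerInfo")); // prints "42"
        System.out.println(tag.getExtraAttribute("missing"));      // prints "null"
    }
}
```

Compared with a single customerInfo field, the keyed form lets several
independent pieces of code attach data to the same tag without
overwriting each other, at the cost of one map per annotated tag.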

-- 
Axel Kramer
WikiBlog: http://www.groovy-news.org/e/page/axelclk

14 messages have been excluded from this view by a project administrator.
