htmlparser-developer Mailing List for HTML Parser (Page 27)
Brought to you by:
derrickoswald
From: <dha...@or...> - 2002-08-16 09:07:58
|
Hey Somik,

That's a lovely idea. We'll definitely try it out.

Dhaval
From: Somik R. <so...@ya...> - 2002-08-14 15:25:00
|
Hi Dhaval,

A word of advice: don't break your head over CVS and SSH (unless you have a couple of aspirins). Switch to Eclipse (www.eclipse.org), a free, open-source IDE and the best. :) It integrates with CVS, which suits team programming, and it has a vast number of winning features, not the least of which is refactoring.

If you can't make the switch, go for TortoiseCVS. It integrates with Windows Explorer, and using CVS over SSH gets a lot easier after that.

Another incentive for using Eclipse: I have developed a pair programming plugin for Eclipse at http://sangam.sourceforge.net, and I am releasing the next version this weekend. If you use Eclipse, we can pair program over the internet on htmlparser. :) The future is here!

Cheers,
Somik
From: <dha...@or...> - 2002-08-14 12:13:49
|
Hi Somik,

> be good to have a user-defined controlling mechanism, to choose if it should be system autodetected, or the particular end of line char to be used.

Yeah, I think that's a good idea.

> By the way, I am finding very little time to devote at present (for a couple of weeks) on the parser. If you or other developers can volunteer to work on this, it will really benefit the product and the community.

I'm up for it, as much as work here allows me to.

Dhaval
From: Somik R. <so...@ya...> - 2002-08-14 12:00:32
|
Hi Dhaval,

> Do let me know if you are interested in including it as a
> part of the standard library and what I will need to do for that
> purpose. I will feel a little bit more confident if someone goes through
> my code since this is my first time. I have changed the tag-scanners
> quite a fair bit from the time I last sent them to you.

That would be good. In fact, if you can check it into CVS, I can work on it the moment I find some time, maybe sometime tomorrow. You need to sign up as a developer at http://sourceforge.net/account/register.php and send me your id. I will then add you as a developer for htmlparser, and you can check in your code directly with CVS.

By the way, thank you for asking so many questions. I have been wanting to put all this info in the docs, but now that you've brought it all out, it's there in the mail archives for others. Of course, we still need good docs. :)

Cheers,
Somik
From: Somik R. <so...@ya...> - 2002-08-14 10:30:59
|
> I am observing one more strange occurrence in a HTMLStringNode.
>
> Whenever I have a string between 2 tags whose last character is \n it is
> returned to me appended by \r\n.

Yes, this is indeed a bug. I am not even sure we should autodetect the system default. What if you are creating the parsed file on Linux and sending it to Windows machines? We'd still have the nice squares. It would be good to have a user-defined controlling mechanism to choose whether the end-of-line character is autodetected from the system or set to a particular value.

By the way, I am finding very little time to devote to the parser at present (for a couple of weeks). If you or other developers can volunteer to work on this, it will really benefit the product and the community. The process is now streamlined, so you can easily make releases simply by using the Ant file (in CVS). Also, it is not good for a project that is used by so many to rest on one person.

Regards,
Somik
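For anyone picking this up, the user-defined mechanism described above could look roughly like the following. This is only a sketch under assumptions: `LineEndingPolicy` and its methods are hypothetical names invented for illustration, not part of htmlparser, and the normalization step is just one possible way to apply the choice.

```java
import java.util.Objects;
import java.util.regex.Matcher;

/** Hypothetical sketch: the caller decides which line terminator emitted text uses. */
final class LineEndingPolicy {
    private final String separator;

    private LineEndingPolicy(String separator) {
        this.separator = Objects.requireNonNull(separator);
    }

    /** Autodetect: use the terminator the running JVM reports for the host platform. */
    static LineEndingPolicy systemDefault() {
        return new LineEndingPolicy(System.lineSeparator());
    }

    /** User-defined: use a fixed terminator, e.g. "\r\n" for files destined for Windows. */
    static LineEndingPolicy fixed(String separator) {
        return new LineEndingPolicy(separator);
    }

    /** Rewrites every "\r\n", lone "\r", or lone "\n" to the chosen terminator. */
    String normalize(String text) {
        // "\r\n" is listed first so a CRLF pair is replaced once, not twice.
        return text.replaceAll("\r\n|\r|\n", Matcher.quoteReplacement(separator));
    }
}
```

With a fixed policy, a file produced on Linux can still be written with Windows line endings, which is the case raised in the message.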
From: Somik R. <so...@ya...> - 2002-08-14 08:49:53
|
Hi Dhaval,

> That's exactly what happens. Everything inside <OPTION ..> will be a tag
> and outside it will be HTMLStringNode. However, when I have to read
> another <OPTION ....> tag where the previous OPTION tag did not have a
> closing </OPTION>, the later <OPTION ....> tag gets read, and since it
> has been read once it is unavailable for scanning again as a new Option tag.
> Anyway, I seem to have made my test cases work by storing the previous
> node value, and in case </OPTION> is not present I take care of it
> accordingly. I have just added some more test cases to validate its
> robustness. For the time being I think it's done.

Good question. I faced the same thing with several other tags. To counter this issue, you will find a variable in the evaluate() method: previousOpenScanner. Suppose you are searching for </OPTION> and encounter an <OPTION> instead; evaluate() then lets you do something about it. At that point, you must fool the open scanner into believing that the previous tag got closed.

This is exactly what's done in HTMLLinkScanner. On seeing there was a previousOpenScanner, we accept it as true. And in scan(), the end tag (which wasn't there) is returned, putting in a correction so that the next tag still gets parsed (in elementEnd() positioning).

Let me know if you need more help. (You simply can't do this without test cases.)

Cheers,
Somik
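The idea above, treating a new <OPTION> as if the previous one had been closed, can be illustrated with a toy example. This is not HTMLLinkScanner's code and does not use the htmlparser API; `ImplicitOptionClose` and its token-list input are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a start tag of the same kind implicitly closes the open one. */
final class ImplicitOptionClose {
    /** Returns one "startTag|text" entry per OPTION found in the token list. */
    static List<String> options(List<String> tokens) {
        List<String> result = new ArrayList<>();
        String openTag = null;                  // the OPTION still waiting for its end tag
        StringBuilder text = new StringBuilder();
        for (String token : tokens) {
            String upper = token.toUpperCase();
            boolean startsOption = upper.startsWith("<OPTION");
            boolean endsOption = upper.startsWith("</OPTION") || upper.startsWith("</SELECT");
            if (startsOption || endsOption) {
                // Either a real end tag or a new OPTION: both finish the open one.
                if (openTag != null) {
                    result.add(openTag + "|" + text.toString().trim());
                }
                // The new start tag is not lost; it becomes the open tag.
                openTag = startsOption ? token : null;
                text.setLength(0);
            } else if (openTag != null && !token.startsWith("<")) {
                text.append(token);             // plain text between the tags
            }
        }
        if (openTag != null) {
            result.add(openTag + "|" + text.toString().trim());
        }
        return result;
    }
}
```

The point of the sketch is the branch where a start tag arrives while another is open: the open one is completed as though its end tag had been seen, and the new tag is kept for processing rather than consumed.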
From: <dha...@or...> - 2002-08-14 08:30:30
|
I am observing one more strange occurrence in an HTMLStringNode.

Whenever I have a string between two tags whose last character is \n, it is returned to me with \r\n appended.

Regards,

Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457
From: <dha...@or...> - 2002-08-14 08:16:58
|
Hi Somik,

That's exactly what happens. Everything inside <OPTION ..> will be a tag and outside it will be HTMLStringNode. However, when I have to read another <OPTION ....> tag where the previous OPTION tag did not have a closing </OPTION>, the later <OPTION ....> tag gets read, and since it has been read once it is unavailable for scanning again as a new Option tag.

Anyway, I seem to have made my test cases work by storing the previous node value, and in case </OPTION> is not present I take care of it accordingly. I have just added some more test cases to validate its robustness. For the time being I think it's done.

Thanks for the response nevertheless.

Regards,

Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457
From: Somik R. <so...@ya...> - 2002-08-14 07:43:29
|
Hi Dhaval,

Sorry, I've been really swamped.

> The problem with my input is that <OPTION value="AltaVista Search">
> would be read as an OptionTag, AltaVista would be read as the StringNode
> and then <OPTION value="Lycos Search"> would be read and since it is
> neither a StringNode nor an EndTag an OptionTag would be created for the
> above 2 values.

This idea is incorrect. <OPTION ....> is a tag. Nothing inside the Option tag is a string node.

    <OPTION ... >               (this is HTMLTag)
    some text here sdjklsdjk    (this is HTMLStringNode)
    </OPTION>                   (this is HTMLEndTag)

HTH.

Cheers,
Somik
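To make the three node kinds above concrete, here is a small standalone sketch that splits markup the same way. It is deliberately naive (it would be confused by a ">" inside an attribute value) and it is not the parser's real tokenizer; `NodeKinds` is a hypothetical name used only here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: label each piece of markup as TAG, END_TAG, or STRING. */
final class NodeKinds {
    enum Kind { TAG, END_TAG, STRING }

    static List<String> classify(String html) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < html.length()) {
            if (html.charAt(i) == '<') {
                // Everything from '<' through the next '>' is a tag or an end tag.
                int close = html.indexOf('>', i);
                int end = (close < 0) ? html.length() : close + 1;
                String raw = html.substring(i, end);
                Kind kind = raw.startsWith("</") ? Kind.END_TAG : Kind.TAG;
                out.add(kind + " " + raw);
                i = end;
            } else {
                // Everything up to the next '<' is text between tags.
                int next = html.indexOf('<', i);
                int end = (next < 0) ? html.length() : next;
                String text = html.substring(i, end).trim();
                if (!text.isEmpty()) {
                    out.add(Kind.STRING + " " + text);
                }
                i = end;
            }
        }
        return out;
    }
}
```

Run on `<OPTION value="AltaVista Search">AltaVista</OPTION>`, this yields one TAG, one STRING, and one END_TAG entry, mirroring the breakdown in the message.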
From: <dha...@or...> - 2002-08-14 07:06:41
|
Hi guys,

I am still trying to solve my problem with the scanner for my OPTION tag. I would really appreciate any help from the developers of the parsing engine. I think a solution may lie in knowing certain internals of the parser. Let me explain my problem in detail.

Assume the following two OPTION tags:

    <OPTION value="AltaVista Search">AltaVista
    <OPTION value="Lycos Search"></OPTION>

The OPTION tag does not explicitly require an end tag, hence the first line is valid.

My parsing logic in scan() is as follows:

1. Disable existing parsers.
2. Read elements from the Reader.
3. Check whether the element is an EndTag for OPTION or SELECT (since OPTION tags are always under SELECT). If so, create an OptionTag object with the necessary values.
4. If it is not an EndTag, check whether it is a StringNode (this would be the value between the <OPTION> and </OPTION> tags). If so, it is the text of the OPTION tag; store it temporarily. (This will be used later in the constructor.)
5. If it is neither, it could be an error or the beginning of another tag (possibly another <OPTION> tag as above), hence the current loop must be terminated and the option object must be constructed.

The problem with my input is that <OPTION value="AltaVista Search"> would be read as an OptionTag, AltaVista would be read as the StringNode, and then <OPTION value="Lycos Search"> would be read, and since it is neither a StringNode nor an EndTag, an OptionTag would be created for the above two values. However, since this tag has already been read, it will not qualify as a new OptionTag, and hence I am missing this tag in my parsing.

I hope I have been able to explain my problem clearly. If not, I would certainly like to clarify any points that are not understood.

A snippet of code from scan() of HTMLOptionTagScanner is given below:

    // Step 1: disable the existing scanners while this tag is being read.
    Vector lScannerVector = HTMLParserUtils.adjustScanners(pReader);
    do {
        // Step 2: read the next element from the reader.
        lNode = pReader.readElement();
        System.out.println(lNode.toHTML());
        if (lNode instanceof HTMLEndTag) {
            // Step 3: an OPTION or SELECT end tag finishes this option.
            lEndTag = (HTMLEndTag) lNode;
            String lEndTagString = lEndTag.getText().toUpperCase();
            if (lEndTagString.equals("OPTION") || lEndTagString.equals("SELECT")) {
                endTagFound = true;
            }
        } else if (lNode instanceof HTMLStringNode) {
            // Step 4: text between the tags is the option's text.
            lText.append(lNode.toHTML());
        } else if (lNode instanceof HTMLTag) {
            // Step 5: another tag ends the loop, but that tag has now been consumed.
            endTagFound = true;
        }
    } while (!endTagFound);
    HTMLOptionTag lOptionTag = new HTMLOptionTag(0, lNode.elementEnd(), pTag.getText(), lText.toString(), pCurrLine);
    // Restore the scanners that were disabled in step 1.
    HTMLParserUtils.restoreScanners(pReader, lScannerVector);

Regards,

Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457
From: <dha...@or...> - 2002-08-12 12:49:15
|
Hi Somik,

Writing my test cases has become much simpler by following your template code. I should be done with all test cases by tomorrow. I have finished writing tag-scanner pairs for the INPUT, TEXTAREA, OPTION, and SELECT tags.

Do let me know if you are interested in including them as part of the standard library and what I will need to do for that purpose. I will feel a little more confident if someone goes through my code, since this is my first time. I have changed the tag-scanners quite a bit since I last sent them to you.

Regards,

Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457
From: Somik R. <so...@ya...> - 2002-08-10 08:22:39
|
Dhaval Udani wrote:

> My team is building a framework which is used by many projects in my organization. All the other projects create HTML with their own look and feel. To use the framework, they need to convert these files into a JSP (using a tool developed by my team). The tool, apart from just changing the extension ;), also adds lots of JSP code and makes certain modifications to the HTML tags (not the presentation tags, though). After the JSP is created, if the layout changes, they will have to spend time again correcting this anomaly, and will need to keep doing it every time they change their HTML page or the tool is updated. Now I guess you can understand why I feel so strongly about maintaining layout.

I am not sure I fully understand. The other teams are creating HTML with their own look and feel. You are converting it to a JSP. Naturally, the alignment would have changed through your additions alone. Now, if the original HTML is preserved in functionality but not in the exact layout in which it arrived, I do not understand how that causes a problem for your other teams. Are they reading your JSP file through some program?

If you can give some more details, a clearer picture might emerge.

Regards,
Somik
From: Somik R. <so...@ya...> - 2002-08-10 08:17:21
|
Hi Claude,

You've again raised a good point. I will look into this for next week's release.

Regards,
Somik

----- Original Message -----
From: Claude Duguay
To: htm...@li...
Sent: Friday, August 09, 2002 12:58 AM
Subject: RE: [Htmlparser-developer] Re: [Htmlparser-user] Another Ill-Formed Example

Based on your description, there is a risk that calling hasMoreNodes a few times in a row without calling nextHTMLNode will not have the desired API semantics. If the parsing takes place in the call to hasMoreNodes, then the parser moves forward regardless of whether the nextHTMLNode method was called. This suggests that the method should be called something else, more indicative of this behavior, or that the behavior should be changed.
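One way the behavior could be changed so that repeated hasMoreNodes calls are harmless is to cache the node parsed ahead of time. The following is only a sketch of that general lookahead technique, not the actual HTMLEnumeration code; `LookaheadEnumeration` is a hypothetical class, and a plain `Iterator` stands in for the underlying parsing step.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch: hasMoreNodes() parses at most one node ahead and caches it. */
final class LookaheadEnumeration<T> {
    private final Iterator<T> source;   // stands in for the underlying parsing step
    private T pending;                  // node parsed ahead of time, if any
    private boolean hasPending;

    LookaheadEnumeration(Iterator<T> source) {
        this.source = source;
    }

    /** Safe to call repeatedly: the source only advances when no node is cached. */
    boolean hasMoreNodes() {
        if (!hasPending && source.hasNext()) {
            pending = source.next();
            hasPending = true;
        }
        return hasPending;
    }

    /** Hands out the cached node, parsing one first if the caller skipped hasMoreNodes(). */
    T nextNode() {
        if (!hasMoreNodes()) {
            throw new NoSuchElementException();
        }
        hasPending = false;
        T node = pending;
        pending = null;
        return node;
    }
}
```

With this shape, calling hasMoreNodes three times and then nextNode still returns the first node, so the familiar iterator-style interface is kept without the side effect described above.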
From: Somik R. <so...@ya...> - 2002-08-10 08:15:00
|
Hi Folks,

The next release (v1.2-2002-08-11) is out. From the change log:

[1] Fixed bug 590703 - Empty values don't get parsed.
[2] Fixed bug 591435 - Missing values cause keys to be missed.
[3] Removed all infinite loops in scanners, replaced with throwing HTMLParserException.
[4] Fixed bug in HTMLTitleScanner, allowing certain malformed title tags to be parsed.
[5] Modified HTMLReader - now accepts Reader instead of BufferedReader.
[6] HTMLParser constructor now throws HTMLParserException.
[7] Fixed bug 592355 - Empty tags throw exceptions from some scanners. Now, if the tag is empty, it is not passed down to scanners. Also fixed the related issue in HTMLStringNode, causing empty tags to be treated as tags and not strings.

A very significant fix is #3. I would highly recommend upgrading your copies asap. Also, following the suggestions of Amit Rana, the constructor itself throws HTMLParserException.

You can expect some more API changes in the coming weeks, as we attempt to integrate Claude's other contributions (Parser Feedback). We've got over 150 tests, all passing.

Regards,
Somik
From: Claude D. <CD...@ar...> - 2002-08-08 15:58:20
|
Based on your description there is a risk that calling hasMoreNodes without calling nextHTMLNode a few times in a row will not have the desired API semantics. If the parsing takes place in the call to hasMoreNodes, then the parser moves forward, regardless of whether the nextHTMLNode method was called. This suggests that the method should be called something else, more indicative of this behavior, or the behavior should be changed.
From: Somik R. <so...@ya...> - 2002-08-08 07:14:19
|
Hi Claude,

Thanks for the kind words.

> BTW: I was giving some thought to the calls that take place in HTMLEnumeration. As far as I could tell, many internal calls were made twice, by virtue of the hasMoreNodes/nextHTMLNode pattern. An alternate pattern is repeated calls to nextHTMLNode which should stop when a null response is returned. This pattern is used by the BufferedReader.readLine method, by the JDBC ResultSet.next method, etc. Based on the simple observation that calls to hasMoreNodes AND nextHTMLNode run some of the same underlying code, it seems that the speed of the parser could be positively influenced by reducing the interface to a single call. Any thoughts?

I am not so sure this would be a good idea, because we'd then have to compromise on the API. Users would have to check for null values. The iterator interface is also a popular one, so we have a familiarity factor here.

As far as optimization goes, nextHTMLNode doesn't do parsing; it simply returns the node that was parsed internally when hasMoreNodes() was called. So the only speedup would be the reduction of one call. I am not so sure that this would be the best place for such a speedup.

By the way, talking about speedups, the last release and the next one should see some tweaks, and the performance ought to have gotten better. Are you still doing the performance testing? Any results to share?

Cheers,
Somik
From: Claude D. <CD...@ar...> - 2002-08-07 15:48:15
|
You are not only talented but very kind! Thanks.

BTW: I was giving some thought to the calls that take place in HTMLEnumeration. As far as I could tell, many internal calls were made twice, by virtue of the hasMoreNodes/nextHTMLNode pattern. An alternate pattern is repeated calls to nextHTMLNode, which should stop when a null response is returned. This pattern is used by the BufferedReader.readLine method, by the JDBC ResultSet.next method, etc. Based on the simple observation that calls to hasMoreNodes AND nextHTMLNode run some of the same underlying code, it seems that the speed of the parser could be positively influenced by reducing the interface to a single call. Any thoughts?
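For reference, the single-call pattern mentioned above looks like this with BufferedReader.readLine, which returns null once the input is exhausted. The `PullLoop` wrapper class is a hypothetical name added only to make the example self-contained; it says nothing about how htmlparser itself would expose such a call.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one call both advances and signals the end by returning null. */
final class PullLoop {
    static List<String> readAll(String input) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(input))) {
            String line;
            // readLine() returns null at end of input, so no separate "has more" call is needed.
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

The trade-off is that every caller has to write the null check itself, whereas the two-call iterator style hides it.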
From: Somik R. <so...@ya...> - 2002-08-07 05:02:31
|
Hi Claude,

This has been handled, related to the earlier fix. All potential infinite loops have been removed, and there will be no more hangings, only HTMLParserExceptions from now on. There will be a release with all these fixes this weekend.

Regards,
Somik

----- Original Message -----
From: Claude Duguay
To: htm...@li...
Sent: Wednesday, August 07, 2002 3:35 AM
Subject: [Htmlparser-user] Another Ill-Formed Example

Here's some markup we found in another document that causes the HTMLParser to hang.

"<TITLE>KRP VALIDATION<PROCESS/TITLE>"

So far, we've had 4 documents cause our process to come to a grinding halt. I would much prefer a policy of exception throwing to hangs asap, followed by consideration of whether unusual markup can be handled more elegantly in a subsequent phase. Thanks to everyone, as always.
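The policy requested above, failing with an exception rather than hanging, can be sketched as a bounded search that reports a missing end tag. This is not the HTMLTitleScanner fix itself; `TitleScan` and `MalformedMarkupException` are hypothetical names, and the exception is unchecked here only to keep the example short.

```java
/** Hypothetical sketch: look for the end tag once, and fail loudly if it is missing. */
final class TitleScan {
    /** Stand-in for a parser exception; not the project's HTMLParserException. */
    static final class MalformedMarkupException extends RuntimeException {
        MalformedMarkupException(String message) {
            super(message);
        }
    }

    /** Returns the text between <TITLE> and </TITLE>, or null if there is no title tag. */
    static String titleText(String html) {
        int open = indexOfIgnoreCase(html, "<TITLE>", 0);
        if (open < 0) {
            return null;
        }
        int start = open + "<TITLE>".length();
        int close = indexOfIgnoreCase(html, "</TITLE>", start);
        if (close < 0) {
            // The search is bounded by the input length, so this is reached instead of a hang.
            throw new MalformedMarkupException("no </TITLE> found after offset " + start);
        }
        return html.substring(start, close);
    }

    /** Case-insensitive indexOf that never reads past the end of the input. */
    private static int indexOfIgnoreCase(String s, String needle, int from) {
        for (int i = from; i + needle.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }
}
```

For the markup quoted in the message, the sketch throws because no well-formed </TITLE> exists, leaving the question of more lenient handling to a later phase, as suggested.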
From: Somik R. <so...@ya...> - 2002-08-07 04:57:40
|
Hi Claude,

This bug has been fixed. By the way, a request: please enter bug reports from the site http://htmlparser.sourceforge.net.

Regards,
Somik

----- Original Message -----
From: Claude Duguay
To: htm...@li...
Sent: Tuesday, August 06, 2002 4:02 AM
Subject: [Htmlparser-developer] Malformed HTML

If the parser (1.2 integration build) encounters the following code it hangs:

    <html><head><TITLE> <html><head><TITLE> Double tags can hang the code </TITLE></head><body> <body><html>

I have created this reproducible source document but I am still trying to isolate the source of the problem.

BTW: The exception handling is excellent this way, Somik. There are a few conditions that hang the parser which should throw exceptions, but the framework is in place to get there now. Thanks.
From: Claude D. <CD...@ar...> - 2002-08-06 17:02:04
|
I should mention that it would not (in my view) be a good idea to tie this project to the org.xml.sax package just for the InputSource object. I would presume that a new InputSource object with the same semantics would be created for the HTMLParser project.

-----Original Message-----
From: Claude Duguay
Sent: Tuesday, August 06, 2002 9:42 AM
To: htm...@li...
Subject: RE: [Htmlparser-developer] Language Support

I've been reading further on the use of InputSource in XML and recall your interest in using a similar mechanism. In practice it seems easy enough to do this and I've provided some sample code to illustrate. The InputSource provides either a Reader, an InputStream or a System ID (usually a local file name), and they can be checked for existence, in that order.

I raise this issue (in this context) because one of the reasons the XML community adopted the InputSource was that it could contain additional information about the character set encoding (which is not used here but could be). This becomes much more important if you start considering internationalization.

Here's some sample code that is compilable (but untested - though it should work):

import java.io.*;
import org.xml.sax.*;
import com.kizna.html.util.HTMLParserException;

public class InputSourceReader extends BufferedReader {
    public InputSourceReader(InputSource source) throws HTMLParserException {
        super(getReaderFromInputSource(source));
    }

    // Tries the character stream, then the byte stream, then the system ID.
    protected static Reader getReaderFromInputSource(InputSource source)
            throws HTMLParserException {
        Reader reader = source.getCharacterStream();
        if (reader != null) {
            return reader;
        }

        InputStream input = source.getByteStream();
        if (input != null) {
            return new InputStreamReader(input);
        }

        // The system ID is treated as a local file name here.
        String systemId = source.getSystemId();
        if (systemId != null) {
            try {
                return new FileReader(systemId);
            } catch (FileNotFoundException e) {
                throw new HTMLParserException("Invalid InputSource", e);
            }
        }
        throw new HTMLParserException("Invalid InputSource");
    }
}

-----Original Message-----
From: Somik Raha [mailto:so...@ya...]
Sent: Monday, August 05, 2002 7:07 PM
To: htm...@li...
Subject: [Htmlparser-developer] Language Support

Hi Folks,

Amit Rana is a new developer on HTMLParser. He has considerable experience in internationalization, and he is currently working to enable language support and switching. Two languages high on my list are French and Finnish, considering we've had French and Finnish developers on this project. We also want to do Japanese support. The architecture that Amit is trying is nice - it will simply require publishing a standard English properties file, and for any language support, a corresponding translated properties file will be loaded. Amit --> you can probably give a more detailed explanation here.

Regards,
Somik |
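Claude's suggestion of a project-local InputSource "with the same semantics" could look roughly like the sketch below. The class name HtmlInputSource, its methods and the use of IOException are assumptions made for illustration; none of them come from the HTMLParser code base. The sketch also uses the encoding hint that the sample code above leaves unused.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

/** Hypothetical project-local stand-in for org.xml.sax.InputSource. */
class HtmlInputSource {
    private Reader characterStream;
    private InputStream byteStream;
    private String systemId;
    private String encoding; // e.g. "ISO-8859-1"; null means platform default

    void setCharacterStream(Reader reader) { characterStream = reader; }
    void setByteStream(InputStream input) { byteStream = input; }
    void setSystemId(String id) { systemId = id; }
    void setEncoding(String name) { encoding = name; }

    /** Resolves a Reader: character stream first, then byte stream, then system ID. */
    Reader openReader() throws IOException {
        if (characterStream != null) {
            return characterStream;
        }
        InputStream input = byteStream;
        if (input == null && systemId != null) {
            // Assumption carried over from the thread: the system ID is a local file name.
            input = new FileInputStream(systemId);
        }
        if (input == null) {
            throw new IOException("Invalid input source: nothing to read");
        }
        // Honor the encoding hint when one was supplied.
        return encoding != null
                ? new InputStreamReader(input, encoding)
                : new InputStreamReader(input);
    }

    public static void main(String[] args) throws IOException {
        HtmlInputSource source = new HtmlInputSource();
        byte[] latin1 = "<p>caf\u00e9</p>".getBytes(StandardCharsets.ISO_8859_1);
        source.setByteStream(new ByteArrayInputStream(latin1));
        source.setEncoding("ISO-8859-1");

        Reader reader = source.openReader();
        StringBuilder text = new StringBuilder();
        for (int c = reader.read(); c != -1; c = reader.read()) {
            text.append((char) c);
        }
        System.out.println(text);
    }
}
```

Keeping the class inside the project avoids the org.xml.sax dependency, at the cost of not being interchangeable with SAX's own InputSource.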
From: Claude D. <CD...@ar...> - 2002-08-06 16:41:59
|
I've been reading further on the use of InputSource in XML and recall your interest in using a similar mechanism. In practice it seems easy enough to do this and I've provided some sample code to illustrate. The InputSource provides either a Reader, an InputStream or a System ID (usually a local file name), and they can be checked for existence, in that order.

I raise this issue (in this context) because one of the reasons the XML community adopted the InputSource was that it could contain additional information about the character set encoding (which is not used here but could be). This becomes much more important if you start considering internationalization.

Here's some sample code that is compilable (but untested - though it should work):

import java.io.*;
import org.xml.sax.*;
import com.kizna.html.util.HTMLParserException;

public class InputSourceReader extends BufferedReader {
    public InputSourceReader(InputSource source) throws HTMLParserException {
        super(getReaderFromInputSource(source));
    }

    // Tries the character stream, then the byte stream, then the system ID.
    protected static Reader getReaderFromInputSource(InputSource source)
            throws HTMLParserException {
        Reader reader = source.getCharacterStream();
        if (reader != null) {
            return reader;
        }

        InputStream input = source.getByteStream();
        if (input != null) {
            return new InputStreamReader(input);
        }

        // The system ID is treated as a local file name here.
        String systemId = source.getSystemId();
        if (systemId != null) {
            try {
                return new FileReader(systemId);
            } catch (FileNotFoundException e) {
                throw new HTMLParserException("Invalid InputSource", e);
            }
        }
        throw new HTMLParserException("Invalid InputSource");
    }
}

-----Original Message-----
From: Somik Raha [mailto:so...@ya...]
Sent: Monday, August 05, 2002 7:07 PM
To: htm...@li...
Subject: [Htmlparser-developer] Language Support

Hi Folks,

Amit Rana is a new developer on HTMLParser. He has considerable experience in internationalization, and he is currently working to enable language support and switching. Two languages high on my list are French and Finnish, considering we've had French and Finnish developers on this project. We also want to do Japanese support. The architecture that Amit is trying is nice - it will simply require publishing a standard English properties file, and for any language support, a corresponding translated properties file will be loaded. Amit --> you can probably give a more detailed explanation here.

Regards,
Somik |
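The properties-file approach Somik describes for language support can be sketched with java.util.Properties, where a translated file falls back to the English one for any key it does not define. The thread gives no details of Amit's design, so the class name MessageCatalog, the message keys and the inline "file" contents below are all hypothetical; real files would be read from disk rather than from strings.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class MessageCatalog {
    /** Loads properties text; keys missing from it fall back to the given defaults. */
    static Properties load(String text, Properties defaults) throws IOException {
        Properties messages = new Properties(defaults);
        messages.load(new StringReader(text));
        return messages;
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for an English file and a partially translated French file.
        Properties english = load(
                "parse.error=Could not parse tag\nparse.done=Parsing finished\n", null);
        Properties french = load(
                "parse.error=Impossible d'analyser la balise\n", english);

        System.out.println(french.getProperty("parse.error")); // translated entry
        System.out.println(french.getProperty("parse.done"));  // falls back to English
    }
}
```

java.util.ResourceBundle offers the same fallback behaviour keyed by Locale and may be the more natural fit once the files live on the classpath.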
From: Claude D. <CD...@ar...> - 2002-08-06 16:19:06
|
My own expectations are fairly simple.

1) If the page is unparsable because it is ill-formed, the parser should throw an exception. This is a priority behavior in that it is better for the parser to report problems than it is for it to hang because the internal logic to handle ill-formed documents has gotten too complicated or unpredictable.

2) If it is possible for the parser to handle certain types of ill-formed documents, this should be considered a desirable feature, but never at the expense of handling properly formed documents or notifying the library user that something went wrong if it couldn't.

It may be best to consider these separate issues. Since item 1 is imperative and item 2 is a feature, you may want to consider making item 2 a selectable feature. That is to say, there may be a need to have a 'strict' mode that never handles ill-formed documents (which has plenty of value in and of itself, given that some folks actually want to recognize bad HTML), and another 'liberal' mode that does its best to compensate for flaws in the document.

The problem with compensating for ill-formed documents will always be that to handle it one way may interfere with an alternate interpretation, which in some cases may also be correct. In cases where there is no alternate interpretation, the solution is simple. In cases where an alternate interpretation is possible, the code is inevitably wrong to someone who wanted to see the alternate behavior. It's probably best, then, to further separate the compensation criteria to handle ONLY those cases where the interpretation is unambiguous.

-----Original Message-----
From: Somik Raha [mailto:so...@ya...]
Sent: Tuesday, August 06, 2002 12:11 AM
To: htm...@li...
Subject: Re: [Htmlparser-developer] Update (Claude - ur feedback needed)

Hi Kaarle,

It seems like we may have acted hastily in correcting this (even in HTMLImageScanner). I just tried Claude's page again, and I find that the image is not parsed. Amit also mentioned some time back that we ought to flag some kind of error.

Of course IE does not collapse - it continues parsing.

So - I think you should not put in this fix to parseParameters(). I should also roll back my fix and throw an error (?) - or probably throw a bad image tag, where you cannot retrieve the data.

OTOH - the other side of the coin is - if someday people decide to kick IE out, and write a new browser with this parser, such pages would work fine. In which case, it would be good to have fixes like this.

I find myself tilting to the former argument, however attractive the latter may sound. Amit, Claude --> what are your comments? Claude - as this bug was reported by you - I'd like to ask what you expect.

Regards,
Somik

----- Original Message -----
From: Kaarle Kaila <mailto:kaa...@kk...>
To: so...@ya... ; htm...@li...
Sent: Tuesday, August 06, 2002 4:07 PM
Subject: Re: [Htmlparser-developer] Update

I still had a look at the code and made a small addition that would accept <a b"c"> as <a b="c">. Would it be useful to have it inserted into CVS, or is it OK as it is?

regards
Kaarle

PS! I can't access CVS until the evening

---- Original Message ----
From: so...@ya...
To: htm...@li...
Subject: Re: [Htmlparser-developer] Update
Date: Tue, 6 Aug 2002 15:42:29 +0900

>Hi Kaarle,
> Thanks for the clarification.
>
>Regards,
>Somik
>
> >I did not really do that I think. I just made a testcase that seems
> >to verify that <a b"c"> will be assumed to be <a b>, same as <a b="">
> >
> >Oh - then what happens to c, is it ignored?
> >
> Yes! That's what seems to happen. As I said I only added a testcase
> to verify what happens. I did not change the code for this purpose.
>
> regards
> Kaarle
>
> >Cheers,
> >Somik
>
> -----------------------------
> Kaarle Kaila
> http://www.iki.fi/kaila
> mailto:kaa...@ik...

-----------------------------
Kaarle Kaila
http://www.iki.fi/kaila
mailto:kaa...@ik... |
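The 'strict' versus 'liberal' split Claude proposes can be illustrated on the attribute case from this thread. The sketch below is a toy, not HTMLParser code: the class AttributeModes, the Mode enum and parseAttribute are hypothetical names, and treating <a b"c"> as <a b="c"> in liberal mode simply follows Kaarle's proposed reading, which the thread had not settled on.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

class AttributeModes {
    enum Mode { STRICT, LIBERAL }

    /** Parses one attribute token such as b="c", b, or the malformed b"c". */
    static Map.Entry<String, String> parseAttribute(String token, Mode mode) {
        int eq = token.indexOf('=');
        int quote = token.indexOf('"');

        // Well-formed name=value: the '=' comes before any quote.
        if (eq > 0 && (quote < 0 || eq < quote)) {
            return new SimpleEntry<>(token.substring(0, eq), unquote(token.substring(eq + 1)));
        }
        // Valueless attribute such as "checked": reported with an empty value.
        if (eq < 0 && quote < 0) {
            return new SimpleEntry<>(token, "");
        }
        // Missing '=' before a quoted value: compensate only in liberal mode.
        if (mode == Mode.LIBERAL && quote > 0 && (eq < 0 || quote < eq)) {
            return new SimpleEntry<>(token.substring(0, quote), unquote(token.substring(quote)));
        }
        // Strict mode, or a case with no single obvious reading: report it.
        throw new IllegalArgumentException("Malformed attribute: " + token);
    }

    private static String unquote(String value) {
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    public static void main(String[] args) {
        Map.Entry<String, String> repaired = parseAttribute("b\"c\"", Mode.LIBERAL);
        System.out.println(repaired.getKey() + " -> " + repaired.getValue());
        try {
            parseAttribute("b\"c\"", Mode.STRICT);
        } catch (IllegalArgumentException e) {
            System.out.println("strict mode rejected it: " + e.getMessage());
        }
    }
}
```

Making the mode an explicit argument keeps item 1 (always report what cannot be handled) independent of item 2 (optional compensation), which is the separation argued for above.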
From: Claude D. <CD...@ar...> - 2002-08-06 16:05:56
|
Great! Check out the Log class in the JavaDocs for the Apache Commons project: http://jakarta.apache.org/commons/logging.html

It's intended to provide an abstraction that maps onto various logging libraries (Log4J, JDK14 logging, etc). The API for the Log class looks similar to the one I proposed. The main distinctions are that they've used Object types for the message (I'd presume they count on the toString method for logging) and they have more methods. I think there's room for adding methods in the Feedback API, but I'd be inclined to do it on an as-needed basis.

-----Original Message-----
From: Somik Raha [mailto:so...@ya...]
Sent: Monday, August 05, 2002 7:04 PM
To: htm...@li...
Subject: Re: [Htmlparser-developer] HTMLParserFeedback

Hi Claude,

No no, I wasn't planning to use log4j for the parser :) Just mentioning that the model is so similar. J2SDK 1.4.x of course has the same logging stuff in their APIs. I agree with your reasoning - we'll start putting in the feedback classes down the line. Let me see if I can find some time in the weekend to analyze this. If anyone else wants to try this integration - pls feel free.

Regards,
Somik

----- Original Message -----
From: Claude Duguay <mailto:CD...@ar...>
To: htm...@li...
Sent: Monday, August 05, 2002 1:04 PM
Subject: RE: [Htmlparser-developer] HTMLParserFeedback

Please don't introduce any dependencies on other libraries. The Feedback model is intended to allow users to redirect output to wherever they see fit for their application. The default sends output to the console but it's easy for implementers to make more local decisions based on their context, by replacing the default implementation, so long as the interface is valid. The whole idea of a library/framework is that the input/output is controllable by the developer using it. You don't want any coupling to other libraries. Let developers decide what's suitable for their application. It's similar to the ErrorHandler in SAX, though in their case, the output goes nowhere by default. It's up to users to decide what to do.

You'll notice that the Feedback classes introduce a model that library developers can use to direct output to a place that won't interfere with the library user/developer's notion of where things could go. I've been meaning to write something more specific about this design pattern but things just keep getting in the way. In any case, use the Feedback mechanism as a way of allowing users to decide where the output should go or whether it should be ignored. Consider it a replacement for System.out and System.err. Users can later decide whether the output (which falls into simple categories) should be logged, sent to the console, written to a GUI, rerouted to sockets, filtered by pipelines or simply ignored. The beauty of this design is all in the uncoupling, such that the library user decides what's relevant in their application.

-----Original Message-----
From: Somik Raha [mailto:so...@ya...]
Sent: Sun 8/4/2002 12:34 AM
To: htm...@li...
Cc:
Subject: [Htmlparser-developer] HTMLParserFeedback

Hi Developers,

This is to initiate a discussion on the next step, on integrating feedback into the parser. Claude had submitted the HTMLParserFeedback interface (in the util package), which allows us to log the activity of the parser, inform when errors occur, and show warnings.

I am familiar with log4j, and this sounds pretty similar - in terms of functionality, it sounds good. But in terms of performance, my questions are:

[1] Will this result in an unacceptable performance hit?
[2] Should we provide alternate constructors or modify the existing API? If we provide alternates, then what default behaviour would be best? Are we talking about default callback objects - if yes, the strings created for each call would slow down the parser.

It would be great to have some thoughts on this.

Regards,
Somik |
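The feedback model discussed in this thread, where the library reports through an interface and the application decides where the output goes, can be sketched as follows. The interface and method names here are illustrative assumptions and are not claimed to match the HTMLParserFeedback interface Claude submitted; the scan method is a toy stand-in for a parser run.

```java
import java.util.ArrayList;
import java.util.List;

class FeedbackDemo {
    /** Hypothetical callback interface; the method names are illustrative only. */
    interface Feedback {
        void info(String message);
        void warning(String message);
        void error(String message, Exception cause);
    }

    /** Default behaviour described in the thread: write to the console. */
    static final Feedback CONSOLE = new Feedback() {
        public void info(String message) { System.out.println("INFO: " + message); }
        public void warning(String message) { System.out.println("WARNING: " + message); }
        public void error(String message, Exception cause) {
            System.err.println("ERROR: " + message + " (" + cause + ")");
        }
    };

    /** Discards everything, for callers who want no output at all. */
    static final Feedback QUIET = new Feedback() {
        public void info(String message) { }
        public void warning(String message) { }
        public void error(String message, Exception cause) { }
    };

    /** Application-side implementation that keeps messages in memory. */
    static class CollectingFeedback implements Feedback {
        final List<String> messages = new ArrayList<>();
        public void info(String message) { messages.add("INFO: " + message); }
        public void warning(String message) { messages.add("WARNING: " + message); }
        public void error(String message, Exception cause) { messages.add("ERROR: " + message); }
    }

    /** Toy stand-in for a parser run that reports through the callback. */
    static void scan(String html, Feedback feedback) {
        feedback.info("Scanning " + html.length() + " characters");
        if (!html.contains("</html>")) {
            feedback.warning("No </html> end tag found");
        }
    }

    public static void main(String[] args) {
        scan("<html><body>", CONSOLE);

        CollectingFeedback collected = new CollectingFeedback();
        scan("<html><body>", collected);
        System.out.println(collected.messages.size() + " messages collected");
    }
}
```

On Somik's performance question, note that a no-op implementation such as QUIET avoids the output cost but not the cost of building each message string, since the caller concatenates it before the call.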
From: Somik R. <so...@ya...> - 2002-08-06 07:23:36
|
Hi Dhaval,

> I would like to know how "checked" would be reflected in the HTMLTag during the parsing procedure.

This is what "should" happen - the tag will be treated the same as

<INPUT type="checkbox" name="Authorize" value="Y" checked="">

However, this is not what actually happens - I've written a testcase to demonstrate this, and we should be fixing it soon. Kaarle - I've opened a bug report, can you check this? Thanks a lot.

Regards,
Somik

----- Original Message -----
From: dha...@or...
To: htm...@li...
Sent: Tuesday, August 06, 2002 4:04 PM
Subject: RE: [Htmlparser-user] Parsing query

Hi,

I have a small doubt. For a checkbox or a radio button the following kind of tag is very normal.

<INPUT type="checkbox" name="Authorize" value="Y" checked>

I would like to know how "checked" would be reflected in the HTMLTag during the parsing procedure.

Thanx in advance,
Dhaval |
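The behaviour Somik says "should" happen, a valueless attribute such as checked being reported as if it were checked="", can be illustrated with a small regex-based sketch. This is not the HTMLParser implementation: the class BooleanAttributes and the method parseAttributes are hypothetical, and the pattern only covers simple, well-formed attribute lists.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BooleanAttributes {
    // A name, optionally followed by = and a double-quoted or unquoted value.
    private static final Pattern ATTRIBUTE = Pattern.compile(
            "([A-Za-z][\\w-]*)(?:\\s*=\\s*(?:\"([^\"]*)\"|([^\\s\">]+)))?");

    /** Parses the text between < and >, e.g. INPUT type="checkbox" checked. */
    static Map<String, String> parseAttributes(String tagContents) {
        Map<String, String> attributes = new LinkedHashMap<>();
        String trimmed = tagContents.trim();
        int firstSpace = trimmed.indexOf(' ');
        if (firstSpace < 0) {
            return attributes; // tag name only, no attributes
        }
        Matcher matcher = ATTRIBUTE.matcher(trimmed.substring(firstSpace));
        while (matcher.find()) {
            String value = matcher.group(2) != null ? matcher.group(2) : matcher.group(3);
            // A valueless attribute is stored with an empty string.
            attributes.put(matcher.group(1).toLowerCase(), value != null ? value : "");
        }
        return attributes;
    }

    public static void main(String[] args) {
        Map<String, String> attributes = parseAttributes(
                "INPUT type=\"checkbox\" name=\"Authorize\" value=\"Y\" checked");
        System.out.println("has checked: " + attributes.containsKey("checked"));
        System.out.println("checked value is empty: " + attributes.get("checked").isEmpty());
    }
}
```

With this representation a caller tests for the presence of the key rather than for a particular value, so checked and checked="" behave identically.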
From: Somik R. <so...@ya...> - 2002-08-06 07:17:34
|
Hi Kaarle,

It seems like we may have acted hastily in correcting this (even in HTMLImageScanner). I just tried Claude's page again, and I find that the image is not parsed. Amit also mentioned some time back that we ought to flag some kind of error.

Of course IE does not collapse - it continues parsing.

So - I think you should not put in this fix to parseParameters(). I should also roll back my fix and throw an error (?) - or probably throw a bad image tag, where you cannot retrieve the data.

OTOH - the other side of the coin is - if someday people decide to kick IE out, and write a new browser with this parser, such pages would work fine. In which case, it would be good to have fixes like this.

I find myself tilting to the former argument, however attractive the latter may sound. Amit, Claude --> what are your comments? Claude - as this bug was reported by you - I'd like to ask what you expect.

Regards,
Somik

----- Original Message -----
From: Kaarle Kaila
To: so...@ya... ; htm...@li...
Sent: Tuesday, August 06, 2002 4:07 PM
Subject: Re: [Htmlparser-developer] Update

I still had a look at the code and made a small addition that would accept <a b"c"> as <a b="c">. Would it be useful to have it inserted into CVS, or is it OK as it is?

regards
Kaarle

PS! I can't access CVS until the evening

---- Original Message ----
From: so...@ya...
To: htm...@li...
Subject: Re: [Htmlparser-developer] Update
Date: Tue, 6 Aug 2002 15:42:29 +0900

>Hi Kaarle,
> Thanks for the clarification.
>
>Regards,
>Somik
>
> >I did not really do that I think. I just made a testcase that seems
> >to verify that <a b"c"> will be assumed to be <a b>, same as <a b="">
> >
> >Oh - then what happens to c, is it ignored?
> >
> Yes! That's what seems to happen. As I said I only added a testcase
> to verify what happens. I did not change the code for this purpose.
>
> regards
> Kaarle
>
> >Cheers,
> >Somik
>
> -----------------------------
> Kaarle Kaila
> http://www.iki.fi/kaila
> mailto:kaa...@ik...

-----------------------------
Kaarle Kaila
http://www.iki.fi/kaila
mailto:kaa...@ik... |