htmlparser-developer Mailing List for HTML Parser (Page 28)
Brought to you by:
derrickoswald
|
From: Kaarle K. <kaa...@kk...> - 2002-08-06 07:07:18
|
I had another look at the code and made a small addition that would accept <a b"c"> as <a b="c">.
Would it be useful to have it inserted into CVS, or is it OK as it is?
regards
Kaarle
PS! I can't access CVS until the evening
---- Original Message ----
From: so...@ya...
To: htm...@li...
Subject: Re: [Htmlparser-developer] Update
Date: Tue, 6 Aug 2002 15:42:29 +0900
>Hi Kaarle,
> Thanks for the clarification.
>
>Regards,
>Somik
>
> >I did not really do that I think. I just made a testcase that seems
> >to verify
> >that <a b"c"> will be assumed to be <a b> , same as <a b="">
> >
> >Oh - then what happens to c, is it ignored?
> >
> Yes! That's what seems to happen. As I said I only added a testcase
> to verify what happens. I did not change the code for this purpose.
>
> regards
> Kaarle
>
> >Cheers,
> >Somik
> >
> -----------------------------
> Kaarle Kaila
> http://www.iki.fi/kaila
> mailto:kaa...@ik...
>
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Welcome to geek heaven.
> http://thinkgeek.com/sf
> _______________________________________________
> Htmlparser-developer mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-developer
>
-----------------------------
Kaarle Kaila
http://www.iki.fi/kaila
mailto:kaa...@ik...
|
|
From: Somik R. <so...@ya...> - 2002-08-06 06:49:28
|
Hi Kaarle,
Thanks for the clarification.
Regards,
Somik
----- Original Message -----
From: Kaarle Kaila
To: so...@ya... ; htm...@li...
Sent: Tuesday, August 06, 2002 3:34 PM
Subject: Re: [Htmlparser-developer] Update
>I did not really do that I think. I just made a testcase that seems
>to verify
>that <a b"c"> will be assumed to be <a b> , same as <a b="">
>
>Oh - then what happens to c, is it ignored?
>
>
Yes! That's what seems to happen. As I said I only added a testcase
to verify what happens. I did not change the code for this purpose.
regards
Kaarle
>Cheers,
>Somik
>
-----------------------------
Kaarle Kaila
http://www.iki.fi/kaila
mailto:kaa...@ik...
|
|
From: Kaarle K. <kaa...@kk...> - 2002-08-06 06:34:11
|
>I did not really do that I think. I just made a testcase that seems
>to verify
>that <a b"c"> will be assumed to be <a b> , same as <a b="">
>
>Oh - then what happens to c, is it ignored?
>
Yes! That's what seems to happen. As I said I only added a testcase
to verify what happens. I did not change the code for this purpose.
regards
Kaarle
>Cheers,
>Somik
>
-----------------------------
Kaarle Kaila
http://www.iki.fi/kaila
mailto:kaa...@ik...
|
|
From: Somik R. <so...@ya...> - 2002-08-06 04:14:11
|
I did not really do that I think. I just made a testcase that seems to verify
that <a b"c"> will be assumed to be <a b> , same as <a b="">
Oh - then what happens to c, is it ignored?
Cheers,
Somik
|
|
From: Kaarle K. <kaa...@ik...> - 2002-08-06 04:10:32
|
On Tuesday 06 August 2002 04:36, Somik Raha wrote:
> Hi Folks
> An update to let you know Kaarle has fixed the bug 590703 and he has
> also made a modification in parseParameters(), by which tags of the form <a b"c">
> will be assumed to be
> <a b="c">
hi!
I did not really do that I think. I just made a testcase that seems to verify
that <a b"c"> will be assumed to be <a b> , same as <a b="">
regards
Kaarle
>
> Latest code is in CVS, all testcases passing.
>
> Regards,
> Somik
--
-------------------------------------------
Kaarle Kaila
mailto:kaa...@ik...
http://www.iki.fi/kaila
|
|
From: Somik R. <so...@ya...> - 2002-08-06 02:15:37
|
Hi Folks,
Would someone volunteer, or know someone who could volunteer, to write the documentation for this project? In fact a lot of quality docs are really needed.
Production release 1.2 must be accompanied by quality docs.
Regards,
Somik
|
|
From: Somik R. <so...@ya...> - 2002-08-06 02:13:55
|
Hi Folks,
Amit Rana is a new developer on HTMLParser. He has considerable experience in internationalization - and he is currently working to enable language support and switching. Two languages high on my list are French and Finnish, considering we've had French and Finnish developers on this project. We also want to do Japanese support.
The architecture that Amit is trying is nice - it will simply require publishing of a standard English properties file - and for any language support, a corresponding translated properties file will be loaded up.
Amit --> you can probably give a more detailed explanation here.
Regards,
Somik
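As a rough illustration of the properties-file approach described above, the sketch below loads an English default table and a translated table, and falls back to English when the translation lacks a key. The class name Messages, the keys, and the inline property text are all hypothetical; this is not Amit's actual design, only one way such a lookup could work with java.util.Properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical sketch: message lookup backed by properties tables.
final class Messages {
    private final Properties table;

    // 'defaults' is the standard English table; 'translated' overrides it.
    Messages(Properties defaults, Properties translated) {
        table = new Properties(defaults); // keys missing here fall back to English
        table.putAll(translated);
    }

    String get(String key) {
        // Returns the key itself if neither table defines it.
        return table.getProperty(key, key);
    }

    // Parses properties text; a real build would read a .properties file instead.
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for in-memory text
        }
        return p;
    }
}
```

With an English table defining greeting and farewell and a French table defining only greeting, get("greeting") returns the French text and get("farewell") falls back to the English one.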
|
|
From: Somik R. <so...@ya...> - 2002-08-06 02:10:44
|
Hi Claude,
No no, I wasn't planning to use log4j for the parser :)
Just mentioning that the model is so similar. J2SDK 1.4.x of course has the same logging stuff in their APIs.
I agree with your reasoning - we'll start putting in the feedback classes down the line. Let me see if I can find some time in the weekend to analyze this. If anyone else wants to try this integration - pls feel free.
Regards,
Somik
----- Original Message -----
From: Claude Duguay
To: htm...@li...
Sent: Monday, August 05, 2002 1:04 PM
Subject: RE: [Htmlparser-developer] HTMLParserFeedback
Please don't introduce any dependencies on other libraries. The Feedback model is intended to allow users to redirect output to wherever they see fit for their application. The default sends output to the console but it's easy for implementers to make more local decisions based on their context, by replacing the default implementation, so long as the interface is valid. The whole idea of a library/framework is that the input/output is controllable by the developer using it. You don't want any coupling to other libraries. Let developers decide what's suitable for their application. It's similar to the ErrorHandler in SAX, though in their case, the output goes nowhere by default. It's up to users to decide what to do.

You'll notice that the Feedback classes introduce a model that library developers can use to direct output to a place that won't interfere with the library user/developer's notion of where things could go. I've been meaning to write something more specific about this design pattern but things just keep getting in the way. In any case, use the Feedback mechanism as a way of allowing users to decide where the output should go or whether it should be ignored. Consider it a replacement for System.out and System.err. Users can later decide whether the output (which falls into simple categories) should be logged, sent to the console, written to a GUI, rerouted to sockets, filtered by pipelines or simply ignored. The beauty of this design is all in the uncoupling, such that the library user decides what's relevant in their application.

-----Original Message-----
From: Somik Raha [mailto:so...@ya...]
Sent: Sun 8/4/2002 12:34 AM
To: htm...@li...
Cc:
Subject: [Htmlparser-developer] HTMLParserFeedback
Hi Developers,
This is to initiate a discussion on the next step, on integrating feedback into the parser. Claude had submitted the HTMLParserFeedback interface (in the util package) - which allows us to log the activity of the parser, inform when errors occur, and show warnings.
I am familiar with log4j, and this sounds pretty similar - in terms of functionality, it sounds good. But in terms of performance, my questions are:
[1] Will this result in an unacceptable performance hit?
[2] Should we provide alternate constructors or modify existing API? If we provide alternates, then what default behaviour would be best? Are we talking about default callback objects - if yes, the strings created for each call would slow down the parser.
It would be great to have some thoughts on this.
Regards,
Somik
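The Feedback idea Claude describes above can be sketched roughly as follows. This is only an illustration under stated assumptions: the interface name Feedback, its method names, and the classes ConsoleFeedback, CollectingFeedback and ToyParser are hypothetical, and are not the actual HTMLParserFeedback signatures from the util package.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical feedback interface; the method names are assumptions,
// not the actual HTMLParserFeedback signatures.
interface Feedback {
    void info(String message);
    void warning(String message);
    void error(String message, Throwable cause);
}

// Default behaviour described in the thread: send output to the console.
final class ConsoleFeedback implements Feedback {
    public void info(String message)    { System.out.println("INFO: " + message); }
    public void warning(String message) { System.out.println("WARNING: " + message); }
    public void error(String message, Throwable cause) {
        System.err.println("ERROR: " + message);
        if (cause != null) cause.printStackTrace();
    }
}

// An application-supplied replacement that collects messages instead of printing.
final class CollectingFeedback implements Feedback {
    final List<String> messages = new ArrayList<>();
    public void info(String message)    { messages.add("INFO: " + message); }
    public void warning(String message) { messages.add("WARNING: " + message); }
    public void error(String message, Throwable cause) { messages.add("ERROR: " + message); }
}

// Stand-in for a library class that reports only through the interface.
final class ToyParser {
    private final Feedback feedback;

    ToyParser()                  { this(new ConsoleFeedback()); } // console by default
    ToyParser(Feedback feedback) { this.feedback = feedback; }

    void parse(String html) {
        feedback.info("parsing " + html.length() + " characters");
        if (!html.contains("<")) feedback.warning("no tags found");
    }
}
```

The library never writes to System.out or System.err directly; an application that wants logging, a GUI, or silence passes its own implementation to the constructor, and no other library is pulled in as a dependency.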
|
|
From: Somik R. <so...@ya...> - 2002-08-06 02:08:12
|
Dear Amit,
> trying to parse "www.google.com" gives <<Error! URL www.google.com Malformed!>> on linux.
> if i give "http://www.google.com", it works.
> 1. is this behaviour expected?
Yes, this is expected. Previously, the parser used to look for http. This would cause problems when the protocol was different - like ftp, or something else. In order to not restrict the protocol, this checking has been removed.
OTOH, I agree with your observation about throwing exceptions in the constructor. I have added that capability now, and it's in CVS.
Regards,
Somik
|
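The behaviour discussed in the message above comes from java.net.URL itself, whose String constructor rejects a location that has no protocol. A minimal sketch, assuming nothing about HTMLParser internals (the class UrlCheck and the method isWellFormed are hypothetical helpers):

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper showing how java.net.URL treats a missing protocol.
final class UrlCheck {
    // Returns true when java.net.URL accepts the string as written.
    static boolean isWellFormed(String location) {
        try {
            new URL(location); // throws if the protocol is missing or unknown
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }
}
```

Here isWellFormed("www.google.com") is false and isWellFormed("http://www.google.com") is true. On recent JDKs the String constructor of URL is marked deprecated, but it still behaves this way.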
|
From: Somik R. <so...@ya...> - 2002-08-06 01:52:10
|
> I was just looking at HTMLReader and realized that it extends BufferedReader AND takes a BufferedReader as an argument. This would, if I'm not mistaken, result in a nested pair of BufferedReaders. Is this intentional?
Thank you for finding this - it's a mistake. I've corrected it and modified HTMLReader so it only takes a Reader, and does not hold state. Some of the code in the parser is over 2 years old.. thanks for finding this.
Regards,
Somik
|
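The shape of the correction described above might look like the sketch below. LineSource is a hypothetical stand-in for HTMLReader, not the real class: it extends BufferedReader and accepts any Reader, so the caller does not have to wrap the source in a second buffer first.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for HTMLReader: one buffering layer over any Reader.
class LineSource extends BufferedReader {
    LineSource(Reader in) {
        super(in); // this class itself provides the buffering
    }

    // Convenience for the example: read the first line of some text.
    static String firstLine(String text) {
        try (LineSource src = new LineSource(new StringReader(text))) {
            return src.readLine();
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for in-memory text
        }
    }
}
```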
|
From: Somik R. <so...@ya...> - 2002-08-06 01:43:51
|
Hi Folks
An update to let you know Kaarle has fixed the bug 590703 and he has also made a modification in parseParameters(), by which tags of the form <a b"c"> will be assumed to be
<a b="c">
Latest code is in CVS, all testcases passing.
Regards,
Somik
|
|
From: Claude D. <CD...@ar...> - 2002-08-05 19:02:47
|
If the parser (1.2 integration build) encounters the following code it hangs:

<html><head><TITLE> <html><head><TITLE> Double tags can hang the code </TITLE></head><body> <body><html>

I have created this reproducible source document but I am still trying to isolate the source of the problem.

BTW: The exception handling is excellent this way Somik. There are a few conditions that hang the parser which should throw exceptions, but the framework is in place to get there now. Thanks.
|
|
From: Claude D. <CD...@ar...> - 2002-08-05 17:53:43
|
I was just looking at HTMLReader and realized that it extends BufferedReader AND takes a BufferedReader as an argument. This would, if I'm not mistaken, result in a nested pair of BufferedReaders. Is this intentional? |
|
From: Amit R. <ra...@ma...> - 2002-08-05 05:15:19
|
hi,
trying to parse "www.google.com" gives <<Error! URL www.google.com Malformed!>> on linux.
if i give "http://www.google.com", it works.
1. is this behaviour expected?
following code throws exception:
URL url = new URL(resourceLocn);
in HTMLParser, method openConnection() bombs. Although the API explains why this should bomb, I wanted to know whether this is expected. If not, should we prepend "http://" to the front?
Somik,
This is another reason why HTMLParser should throw an exception. If you see the HTMLStringFilter example, even though HTMLParser has bombed, there is no way in which HTMLStringFilter can know it should stop; instead it goes on with the rest of its processing.
Regards,
Amit.
|
|
From: Claude D. <CD...@ar...> - 2002-08-05 04:04:08
|
Please don't introduce any dependencies on other libraries. The Feedback model is intended to allow users to redirect output to wherever they see fit for their application. The default sends output to the console but it's easy for implementers to make more local decisions based on their context, by replacing the default implementation, so long as the interface is valid. The whole idea of a library/framework is that the input/output is controllable by the developer using it. You don't want any coupling to other libraries. Let developers decide what's suitable for their application. It's similar to the ErrorHandler in SAX, though in their case, the output goes nowhere by default. It's up to users to decide what to do.

You'll notice that the Feedback classes introduce a model that library developers can use to direct output to a place that won't interfere with the library user/developer's notion of where things could go. I've been meaning to write something more specific about this design pattern but things just keep getting in the way. In any case, use the Feedback mechanism as a way of allowing users to decide where the output should go or whether it should be ignored. Consider it a replacement for System.out and System.err. Users can later decide whether the output (which falls into simple categories) should be logged, sent to the console, written to a GUI, rerouted to sockets, filtered by pipelines or simply ignored. The beauty of this design is all in the uncoupling, such that the library user decides what's relevant in their application.

-----Original Message-----
From: Somik Raha [mailto:so...@ya...]
Sent: Sun 8/4/2002 12:34 AM
To: htm...@li...
Cc:
Subject: [Htmlparser-developer] HTMLParserFeedback
Hi Developers,
This is to initiate a discussion on the next step, on integrating feedback into the parser. Claude had submitted the HTMLParserFeedback interface (in the util package) - which allows us to log the activity of the parser, inform when errors occur, and show warnings.
I am familiar with log4j, and this sounds pretty similar - in terms of functionality, it sounds good. But in terms of performance, my questions are:
[1] Will this result in an unacceptable performance hit?
[2] Should we provide alternate constructors or modify existing API? If we provide alternates, then what default behaviour would be best? Are we talking about default callback objects - if yes, the strings created for each call would slow down the parser.
It would be great to have some thoughts on this.
Regards,
Somik
|
|
From: Somik R. <so...@ya...> - 2002-08-04 07:41:54
|
Hi Developers,
This is to initiate a discussion on the next step, on integrating feedback into the parser. Claude had submitted the HTMLParserFeedback interface (in the util package) - which allows us to log the activity of the parser, inform when errors occur, and show warnings.
I am familiar with log4j, and this sounds pretty similar - in terms of functionality, it sounds good. But in terms of performance, my questions are:
[1] Will this result in an unacceptable performance hit?
[2] Should we provide alternate constructors or modify existing API? If we provide alternates, then what default behaviour would be best? Are we talking about default callback objects - if yes, the strings created for each call would slow down the parser.
It would be great to have some thoughts on this.
Regards,
Somik
|
|
From: Somik R. <so...@ya...> - 2002-08-04 07:27:46
|
I forgot to mention - the most important bug fix in this release is in parseParameters() (588885), done by Kaarle Kaila, because of which we have been able to incorporate "intelligence" in the parsing, making Cedric Rosa a happy man.
Thanks a ton, Kaarle.
Cheers,
Somik
|
|
From: Somik R. <so...@ya...> - 2002-08-04 07:22:36
|
Hi Folks,
It's time again, another integration release is out. Check http://htmlparser.sourceforge.net.
So what's new? Major API change - the parser now has chained exceptions. If some problem occurs, your application will have a chance to take care of it, instead of simply crashing. Also, the exception messages are more meaningful, giving a better picture of what went wrong.
Thanks to Claude Duguay for the ChainedException classes, and bug reports. And many thanks to the best tester of HTMLParser - Cedric Rosa - for countless bug reports - pls keep up the good work.
From the change log,
[1] Fixed bug 590250, problem in HTMLStringNode, by which a single character on the last line was causing a parser crash
[2] Optimized and refactored HTMLParameterParser.parseParameters()
[3] Modified PerformanceTest to exclude first reading in average computation
[4] Fixed bug in HTMLParameterParser.parseParameters(), due to which params with spaces before = were not being picked up
[5] Made massive API changes - throwing exceptions and using HTMLEnumeration
[6] Fixed HTMLRemarkNode bug - we can recognize stuff like now.
[7] Fixed HTMLImageScanner bug - we can now fix image tags like IMG SRC"somepic.jpg" - the missing equal to can be deduced
[8] Fixed HTMLLinkScanner bug - end tags within a link were not being included inside the link data.
Please give your feedback regarding the API changes.
NOTE
[1] This release will break your existing applications due to the API change. Simply wrap the parsing in a try-catch block to catch an HTMLParserException and your apps should work again.
[2] There is one known bug (590703) caught by two testcases in parseParameters(). This is a minor bug which shouldn't affect applications, and should be fixed in the next release.
Cheers,
Somik
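For readers unfamiliar with chained exceptions, the try-catch advice in the note above can be sketched like this. ParserException below is a hypothetical stand-in written for the example; the real HTMLParserException may have different constructors, and readNode only simulates a low-level failure.

```java
import java.io.IOException;

// Stand-in for the parser's exception type (assumption: the real
// HTMLParserException may be constructed differently).
class ParserException extends Exception {
    ParserException(String message, Throwable cause) {
        super(message, cause); // keeps the original failure as the cause
    }
}

final class ChainedDemo {
    // Simulates a low-level failure inside a parser and wraps it.
    static void readNode(String resource) throws ParserException {
        try {
            throw new IOException("connection reset");
        } catch (IOException e) {
            throw new ParserException("could not read " + resource, e);
        }
    }

    // Application code: catch once, then walk the chain of causes.
    static String describe(String resource) {
        try {
            readNode(resource);
            return "ok";
        } catch (ParserException e) {
            StringBuilder sb = new StringBuilder(e.getMessage());
            for (Throwable t = e.getCause(); t != null; t = t.getCause()) {
                sb.append(" <- ").append(t.getMessage());
            }
            return sb.toString();
        }
    }
}
```

The application gets one exception type to catch, and the underlying cause is still available for a more meaningful error message.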
|
|
From: Somik R. <so...@ya...> - 2002-08-04 06:31:22
|
Hi Kaarle,
I've managed to fix this bug in HTMLImageScanner. Meanwhile, there is a small issue - it seems that parseParameters() cannot handle -
<tag name="">
I'd expect to have an empty string in the hashtable, but the testcase breaks (HTMLImageScannerTest.testMissingEqualTo()). For this release, we can go without this fix. I will put in a report soon.
Cheers,
Somik
----- Original Message -----
From: Kaarle Kaila
To: htm...@li...
Sent: Sunday, August 04, 2002 1:00 PM
Subject: Re: [Htmlparser-developer] Bug Report
On Sunday 04 August 2002 04:07, Somik Raha wrote:
> Hi Claude
> I've fixed this bug, but I found another on the page you sent which I
> don't know how to fix : <img src"/images/spacer.gif" width="1" height="1"
> alt="">
I would say that there is no reason to accept it as src="/images/spacer.gif"
but maybe it could be accepted as 'src/images/spacer.gif'
or "/images/spacer.gif" or something similar, i.e. as just a bad parameter
name without value. I don't know how parseParameters would take it
but it should probably do something like that.
regards
Kaarle
>
> This one is driving me crazy - how can a browser accept this!!
> Anyway, I am throwing exceptions now.. I need to think and see if it's
> possible to accept this as well.
>
> Regards,
> Somik
--
-------------------------------------------
Kaarle Kaila
mailto:kaa...@ik...
http://www.iki.fi/kaila
|
|
From: Somik R. <so...@ya...> - 2002-08-04 06:09:36
|
Hi Kaarle,
I was also thinking the fix might be done in parseParameters().
But the point is - as humans, we can easily tell that it should be taken as src=.
So this correction should be possible... I found that the current crash is happening in the HTMLImageScanner class - which means parseParameters can be left as is, and we could try to add this intelligence (correction) from the scanner end - and perhaps fix the tag and call for it to be parsed again.

A second reason is this kind of smart logic makes sense only in a particular context, and it might not be good to clutter parseParameters() which has to stay as optimal as possible. I will try to work on these lines, and see if a fix is possible.
Cheers,
Somik
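One possible shape for the scanner-side correction described above is to repair the tag text first and then hand it back for normal parsing. The sketch below is an assumption for illustration, not the actual HTMLImageScanner fix: the class TagRepair and the method repairMissingEquals are hypothetical, and only double-quoted values are handled.

```java
// Hypothetical helper: repairs a tag before it is parsed again.
final class TagRepair {
    // Inserts a missing '=' where an attribute name runs straight into an
    // opening double quote, e.g. src"x.gif" becomes src="x.gif".
    static String repairMissingEquals(String tag) {
        StringBuilder out = new StringBuilder(tag.length() + 4);
        boolean inQuotes = false;
        for (int i = 0; i < tag.length(); i++) {
            char c = tag.charAt(i);
            if (c == '"') {
                // An opening quote directly after a name character means '=' was dropped.
                if (!inQuotes && i > 0 && Character.isLetterOrDigit(tag.charAt(i - 1))) {
                    out.append('=');
                }
                inQuotes = !inQuotes;
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

Applied to the tag from this thread, <img src"/images/spacer.gif" width="1" height="1" alt=""> comes back with src="/images/spacer.gif", and tags that are already well formed are returned unchanged. Keeping the repair outside parseParameters() matches the concern above about not cluttering the general parameter parser.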
----- Original Message -----
From: Kaarle Kaila
To: htm...@li...
Sent: Sunday, August 04, 2002 1:00 PM
Subject: Re: [Htmlparser-developer] Bug Report
On Sunday 04 August 2002 04:07, Somik Raha wrote:
> Hi Claude
> I've fixed this bug, but I found another on the page you sent which I
> don't know how to fix : <img src"/images/spacer.gif" width="1" height="1"
> alt="">
I would say that there is no reason to accept it as src="/images/spacer.gif"
but maybe it could be accepted as 'src/images/spacer.gif'
or "/images/spacer.gif" or something similar, i.e. as just a bad parameter
name without value. I don't know how parseParameters would take it
but it should probably do something like that.
regards
Kaarle
>
> This one is driving me crazy - how can a browser accept this!!
> Anyway, I am throwing exceptions now.. I need to think and see if it's
> possible to accept this as well.
>
> Regards,
> Somik
--
-------------------------------------------
Kaarle Kaila
mailto:kaa...@ik...
http://www.iki.fi/kaila
|
|
From: Kaarle K. <kaa...@ik...> - 2002-08-04 04:02:34
|
On Sunday 04 August 2002 04:07, Somik Raha wrote:
> Hi Claude
> I've fixed this bug, but I found another on the page you sent which I
> don't know how to fix : <img src"/images/spacer.gif" width="1" height="1"
> alt="">
I would say that there is no reason to accept it as src="/images/spacer.gif"
but maybe it could be accepted as 'src/images/spacer.gif'
or "/images/spacer.gif" or something similar, i.e. as just a bad parameter
name without value. I don't know how parseParameters would take it
but it should probably do something like that.
regards
Kaarle
>
> This one is driving me crazy - how can a browser accept this!!
> Anyway, I am throwing exceptions now.. I need to think and see if it's
> possible to accept this as well.
>
> Regards,
> Somik
--
-------------------------------------------
Kaarle Kaila
mailto:kaa...@ik...
http://www.iki.fi/kaila
|
|
From: Somik R. <so...@ya...> - 2002-08-04 01:07:43
|
Hi Claude
I've fixed this bug, but I found another on the page you sent which I don't know how to fix :
<img src"/images/spacer.gif" width="1" height="1" alt="">
This one is driving me crazy - how can a browser accept this!!
Anyway, I am throwing exceptions now.. I need to think and see if it's possible to accept this as well.
Regards,
Somik
|
|
From: Somik R. <so...@ya...> - 2002-08-03 10:48:34
|
Hi Folks,
A quick update.. I have just integrated Claude's contribution of HTMLParserException. The idea of using chained exceptions is really cool.
Claude--> I couldn't find your HTMLEnumeration class, so I made my own interface.
Interesting thing - the performance has been enhanced quite a bit - bcos no HTMLNode class casts are needed.
However, the down side is, all existing apps based on the parser will need modification as the API has changed.
I should be able to make the release tomorrow, after fixing the bug that Claude has reported.
Cheers,
Somik
|
|
From: Somik R. <so...@ya...> - 2002-08-02 12:29:46
|
Hi Kaarle,
> isApo waits for next '-sign and
> isAmp waits for next "-sign. I guess isAmp should be called something else
> (isCitation?)
>
> I guess t stands for temp. Perhaps it could be e.g. item.
> st should perhaps be token but then
> the current token should be renamed to something like tokenSet.
Thanks for the clarifications. I will change the names tomorrow.
> >Once again - thanks so much for your quick action on this bug. By the way,
> >could you flag this bug as fixed on the htmlparser page with some comment,
> >for archiving purposes ? (You are a developer, so you can login and go to
> >the htmlparser bugs page from
> >http://htmlparser.sourceforge.net ).
>
> OK. I wrote there something. Hope that was what you meant.
Yeah - that was good. Can you also change the status of the bug to "fixed", and close the report (change the Open status to Closed). Thanks a lot.
Cheers,
Somik
|
|
From: Somik R. <so...@ya...> - 2002-08-02 06:33:37
|
Hi Claude,
From our point of view, a hang is devastating in that it does not allow the application to move forward. An exception would be ideal in that it would identify the problem without breaking the application.
Like I said earlier, your suggestions are the most important part of the 1.2 release - they are the last thing left - I am stuck trying to get some time - swamped with managing several other os projects.
I'd really appreciate it if some developers can come forward and help with implementing Claude's suggestions. I might be able to spend some time this weekend, but a collaborative effort is always better - this project is getting way too big for me to handle alone, and it has come this far only due to the suggestions and requirements of others.
Regards,
Somik
|