htmlparser-developer Mailing List for HTML Parser (Page 28)
Brought to you by:
derrickoswald
From: Kaarle K. <kaa...@kk...> - 2002-08-06 07:07:18
|
I had another look at the code and made a small addition that accepts <a b"c"> as <a b="c">.
Would it be useful to have it committed to CVS, or is it OK as it is?

regards
Kaarle

PS! I can't access CVS until the evening.

---- Original Message ----
From: so...@ya...
To: htm...@li...
Subject: Re: [Htmlparser-developer] Update
Date: Tue, 6 Aug 2002 15:42:29 +0900

>Hi Kaarle,
>    Thanks for the clarification.
>
>Regards,
>Somik
>
> >I did not really do that I think. I just made a testcase that seems to verify
> >that <a b"c"> will be assumed to be <a b>, same as <a b="">
> >
> >Oh - then what happens to c, is it ignored?
>
> Yes! That's what seems to happen. As I said I only added a testcase
> to verify what happens. I did not change the code for this purpose.
>
> regards
> Kaarle
>
> >Cheers,
> >Somik
>
> -----------------------------
> Kaarle Kaila
> http://www.iki.fi/kaila
> mailto:kaa...@ik...
>
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Welcome to geek heaven.
> http://thinkgeek.com/sf
> _______________________________________________
> Htmlparser-developer mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-developer

-----------------------------
Kaarle Kaila
http://www.iki.fi/kaila
mailto:kaa...@ik... |
From: Somik R. <so...@ya...> - 2002-08-06 06:49:28
|
Hi Kaarle,
    Thanks for the clarification.

Regards,
Somik

----- Original Message -----
From: Kaarle Kaila
To: so...@ya... ; htm...@li...
Sent: Tuesday, August 06, 2002 3:34 PM
Subject: Re: [Htmlparser-developer] Update

>I did not really do that I think. I just made a testcase that seems to verify
>that <a b"c"> will be assumed to be <a b>, same as <a b="">
>
>Oh - then what happens to c, is it ignored?

Yes! That's what seems to happen. As I said I only added a testcase to verify what happens. I did not change the code for this purpose.

regards
Kaarle

>Cheers,
>Somik

-----------------------------
Kaarle Kaila
http://www.iki.fi/kaila
mailto:kaa...@ik... |
From: Kaarle K. <kaa...@kk...> - 2002-08-06 06:34:11
|
>I did not really do that I think. I just made a testcase that seems >to verify >that <a b"c"> will be assume to be <a b> , same as <a b=""> > >Oh - then what happens to c, is it ignored? > Yes! That's what seems to happen. As I said I only added a testcase to verify what happens. I did not change the code for this purpose. regards Kaarle >Cheers, >Somik > ----------------------------- Kaarle Kaila http://www.iki.fi/kaila mailto:kaa...@ik... |
From: Somik R. <so...@ya...> - 2002-08-06 04:14:11
|
>I did not really do that I think. I just made a testcase that seems to verify
>that <a b"c"> will be assumed to be <a b>, same as <a b="">

Oh - then what happens to c, is it ignored?

Cheers,
Somik |
From: Kaarle K. <kaa...@ik...> - 2002-08-06 04:10:32
|
On Tuesday 06 August 2002 04:36, Somik Raha wrote:
> Hi Folks
>     An update to let you know Kaarle has fixed the bug 590703 and he has
> also made a modification in parseParameters(), by which tags of the form
> <a b"c"> will be assume to be <a b="c">

hi!

I did not really do that I think. I just made a testcase that seems to verify
that <a b"c"> will be assume to be <a b> , same as <a b="">

regards
Kaarle

> Latest code is in CVS, all testcases passing.
>
> Regards,
> Somik

--
-------------------------------------------
Kaarle Kaila
mailto:kaa...@ik...
http://www.iki.fi/kaila |
From: Somik R. <so...@ya...> - 2002-08-06 02:15:37
|
Hi Folks,
    Would someone volunteer, or know someone who could volunteer, to write the documentation for this project? In fact a lot of quality docs are really needed. Production release 1.2 must be accompanied by quality docs.

Regards,
Somik |
From: Somik R. <so...@ya...> - 2002-08-06 02:13:55
|
Hi Folks,
    Amit Rana is a new developer on HTMLParser. He has considerable experience in internationalization - and he is currently working to enable language support and switching. Two languages high on my list are French and Finnish, considering we've had French and Finnish developers on this project. We also want to do Japanese support.

    The architecture that Amit is trying is nice - it will simply require publishing of a standard English properties file - and for any language support, a corresponding translated properties file will be loaded up.

Amit --> you can probably give a more detailed explanation here.

Regards,
Somik |
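The properties-file approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `MessageCatalogSketch`, the message keys and the message texts are hypothetical and are not taken from HTMLParser or from Amit's work. The properties text is loaded from strings so the example is self-contained; a real setup would more likely keep the files on the classpath and use `java.util.ResourceBundle`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: class name, keys and message texts are illustrative only.
class MessageCatalogSketch {

    /** Parses properties-file text; keys missing here fall back to 'defaults'. */
    static Properties load(String propertiesText, Properties defaults) {
        Properties catalog = new Properties(defaults);
        try {
            catalog.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // A StringReader should not fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return catalog;
    }

    public static void main(String[] args) {
        // The standard English file is the base catalog.
        Properties english = load(
                "error.malformedUrl=Malformed URL\nwarning.noTags=No tags found\n", null);
        // A translated file only needs the keys that have been translated so far.
        Properties french = load("error.malformedUrl=URL incorrecte\n", english);

        System.out.println(french.getProperty("error.malformedUrl")); // translated entry
        System.out.println(french.getProperty("warning.noTags"));     // falls back to English
    }
}
```

Chaining the translated catalog onto the English one means a partially translated file still yields a message for every key.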
From: Somik R. <so...@ya...> - 2002-08-06 02:10:44
|
Hi Claude,
    No no, I wasn't planning to use log4j for the parser :)
    Just mentioning that the model is so similar. J2SDK 1.4.x of course has the same logging stuff in their APIs.
    I agree with your reasoning - we'll start putting in the feedback classes down the line. Let me see if I can find some time in the weekend to analyze this. If anyone else wants to try this integration - pls feel free.

Regards,
Somik

----- Original Message -----
From: Claude Duguay
To: htm...@li...
Sent: Monday, August 05, 2002 1:04 PM
Subject: RE: [Htmlparser-developer] HTMLParserFeedback

Please don't introduce any dependencies on other libraries. The Feedback model is intended to allow users to redirect output to wherever they see fit for their application. The default sends output to the console but it's easy for implementers to make more local decisions based on their context, by replacing the default implementation, so long as the interface is valid. The whole idea of a library/framework is that the input/output is controllable by the developer using it. You don't want any coupling to other libraries. Let developers decide what's suitable for their application. It's similar to the ErrorHandler in SAX, though in their case, the output goes nowhere by default. It's up to users to decide what to do.

You'll notice that the Feedback classes introduce a model that library developers can use to direct output to a place that won't interfere with the library user/developer's notion of where things could go. I've been meaning to write something more specific about this design pattern but things just keep getting in the way. In any case, use the Feedback mechanism as a way of allowing users to decide where the output should go or whether it should be ignored. Consider it a replacement for System.out and System.err. Users can later decide whether the output (which falls into simple categories) should be logged, sent to the console, written to a GUI, rerouted to sockets, filtered by pipelines or simply ignored. The beauty of this design is all in the uncoupling, such that the library user decides what's relevant in their application.

-----Original Message-----
From: Somik Raha [mailto:so...@ya...]
Sent: Sun 8/4/2002 12:34 AM
To: htm...@li...
Cc:
Subject: [Htmlparser-developer] HTMLParserFeedback

Hi Developers,
    This is to initiate a discussion on the next step, on integrating feedback into the parser. Claude had submitted the HTMLParserFeedback interface (in the util package), which allows us to log the activity of the parser, inform when errors occur, and show warnings.

    I am familiar with log4j, and this sounds pretty similar - in terms of functionality, it sounds good. But in terms of performance, my questions are:
[1] Will this result in an unacceptable performance hit?
[2] Should we provide alternate constructors or modify existing API? If we provide alternates, then what default behaviour would be best? Are we talking about default callback objects - if yes, the strings created for each call would slow down the parser.

    It would be great to have some thoughts on this.

Regards,
Somik |
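The decoupling Claude describes can be sketched as a small callback interface with a console default that callers may replace. This is a minimal sketch under assumptions, not the actual HTMLParserFeedback interface: the names `Feedback`, `ConsoleFeedback`, `CollectingFeedback` and `SketchParser`, and the three method signatures, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback interface; not the project's actual HTMLParserFeedback. */
interface Feedback {
    void info(String message);
    void warning(String message);
    void error(String message, Exception cause);
}

/** Default behaviour: write everything to the console. */
class ConsoleFeedback implements Feedback {
    public void info(String message) { System.out.println("INFO: " + message); }
    public void warning(String message) { System.out.println("WARNING: " + message); }
    public void error(String message, Exception cause) { System.err.println("ERROR: " + message); }
}

/** Alternative chosen by the application: keep messages in memory (e.g. for a GUI). */
class CollectingFeedback implements Feedback {
    final List<String> messages = new ArrayList<>();
    public void info(String message) { messages.add("INFO: " + message); }
    public void warning(String message) { messages.add("WARNING: " + message); }
    public void error(String message, Exception cause) { messages.add("ERROR: " + message); }
}

/** Hypothetical library class: it only talks to the interface, never to System.out. */
class SketchParser {
    private final Feedback feedback;

    SketchParser() { this(new ConsoleFeedback()); }            // default goes to the console
    SketchParser(Feedback feedback) { this.feedback = feedback; }

    void parse(String html) {
        feedback.info("parsing " + html.length() + " characters");
        if (!html.contains("<")) {
            feedback.warning("no tags found");
        }
    }

    public static void main(String[] args) {
        new SketchParser().parse("<html></html>");              // console output
        CollectingFeedback collected = new CollectingFeedback();
        new SketchParser(collected).parse("plain text");        // nothing printed by the parser
        System.out.println(collected.messages);
    }
}
```

On the performance question raised in the thread, note that the message strings are still built on every call in this sketch, whichever implementation receives them.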
From: Somik R. <so...@ya...> - 2002-08-06 02:08:12
|
Dear Amit,

> trying to parse "www.google.com" gives <<Error! URL www.google.com Malformed!>> on linux.
> if i give "http://www.google.com", it works.
> 1. is this behaviour expected?

Yes, this is expected. Previously, the parser used to look for http. This would cause problems when the protocol was different - like ftp, or something else. In order to not restrict the protocol, this checking has been removed.

OTOH, I agree with your observation about throwing exceptions in the constructor. I have added that capability now, and it's in CVS.

Regards,
Somik |
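The behaviour discussed here comes from `java.net.URL` itself: a string with no protocol is rejected with a `MalformedURLException`. A minimal sketch, where the class and method names `UrlSchemeSketch` and `isWellFormed` are hypothetical helpers and not part of HTMLParser:

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper for illustration; not part of HTMLParser.
class UrlSchemeSketch {

    /** Returns true if java.net.URL accepts the string exactly as given. */
    static boolean isWellFormed(String location) {
        try {
            new URL(location);
            return true;
        } catch (MalformedURLException e) {
            // Thrown when the protocol is missing or unknown.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("www.google.com"));        // no protocol given
        System.out.println(isWellFormed("http://www.google.com")); // protocol present
    }
}
```

Leaving the string untouched, rather than prepending "http://", keeps other protocols usable, which matches the reasoning given in the reply.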
From: Somik R. <so...@ya...> - 2002-08-06 01:52:10
|
> I was just looking at HTMLReader and realized that it extends BufferedReader AND takes a BufferedReader as an argument. This would, if I'm not mistaken, result in a nested pair of BufferedReaders. Is this intentional?

Thank you for finding this - its a mistake. I've corrected it and modified HTMLReader so it only takes a Reader, and does not hold state. Some of the code in the parser is over 2 years old.. thanks for finding this.

Regards,
Somik |
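The shape of that correction can be sketched as a subclass of `BufferedReader` whose constructor accepts any `Reader`, so a single layer of buffering is added. The class `SketchHtmlReader` and its method `nextNonBlankLine` are hypothetical stand-ins, not the real HTMLReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for HTMLReader; the real class is not reproduced here.
class SketchHtmlReader extends BufferedReader {

    /** Accepts any Reader: the superclass adds the one and only buffer. */
    SketchHtmlReader(Reader source) {
        super(source);
    }

    /** Returns the next line that is not blank, or null at end of input. */
    String nextNonBlankLine() {
        try {
            String line;
            while ((line = readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    return line;
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SketchHtmlReader reader =
                new SketchHtmlReader(new StringReader("\n\n<html>\n<body>\n"));
        System.out.println(reader.nextNonBlankLine());
        System.out.println(reader.nextNonBlankLine());
    }
}
```

Passing a `BufferedReader` to this constructor would still work, since it is a `Reader`, but callers are no longer required to create one.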
From: Somik R. <so...@ya...> - 2002-08-06 01:43:51
|
Hi Folks
    An update to let you know Kaarle has fixed the bug 590703 and he has also made a modification in parseParameters(), by which tags of the form <a b"c"> will be assumed to be <a b="c">

Latest code is in CVS, all testcases passing.

Regards,
Somik |
From: Claude D. <CD...@ar...> - 2002-08-05 19:02:47
|
If the parser (1.2 integration build) encounters the following code it hangs:

<html><head><TITLE> <html><head><TITLE> Double tags can hang the code </TITLE></head><body> <body><html>

I have created this reproducible source document but I am still trying to isolate the source of the problem.

BTW: The exception handling is excellent this way, Somik. There are a few conditions that hang the parser which should throw exceptions, but the framework is in place to get there now. Thanks. |
From: Claude D. <CD...@ar...> - 2002-08-05 17:53:43
|
I was just looking at HTMLReader and realized that it extends BufferedReader AND takes a BufferedReader as an argument. This would, if I'm not mistaken, result in a nested pair of BufferedReaders. Is this intentional? |
From: Amit R. <ra...@ma...> - 2002-08-05 05:15:19
|
hi,
    trying to parse "www.google.com" gives <<Error! URL www.google.com Malformed!>> on linux.
    if i give "http://www.google.com", it works.

1. is this behaviour expected?
    following code throws exception:
    URL url = new URL(resourceLocn);
in HTMLParser, method openConnection() bombs. The API explains why this should bomb, but I wanted to know whether this is expected. If not, should we prepend "http://" to the front?

Somik,
This is another reason why HTMLParser should throw an exception. If you see the HTMLStringFilter example, even though HTMLParser has bombed, there is no way in which HTMLStringFilter can know it should stop; instead it goes on with the rest of its processing.

Regards,
Amit. |
From: Claude D. <CD...@ar...> - 2002-08-05 04:04:08
|
Please don't introduce any dependencies on other libraries. The Feedback model is intended to allow users to redirect output to wherever they see fit for their application. The default sends output to the console but it's easy for implementers to make more local decisions based on their context, by replacing the default implementation, so long as the interface is valid. The whole idea of a library/framework is that the input/output is controllable by the developer using it. You don't want any coupling to other libraries. Let developers decide what's suitable for their application. It's similar to the ErrorHandler in SAX, though in their case, the output goes nowhere by default. It's up to users to decide what to do.

You'll notice that the Feedback classes introduce a model that library developers can use to direct output to a place that won't interfere with the library user/developer's notion of where things could go. I've been meaning to write something more specific about this design pattern but things just keep getting in the way. In any case, use the Feedback mechanism as a way of allowing users to decide where the output should go or whether it should be ignored. Consider it a replacement for System.out and System.err. Users can later decide whether the output (which falls into simple categories) should be logged, sent to the console, written to a GUI, rerouted to sockets, filtered by pipelines or simply ignored. The beauty of this design is all in the uncoupling, such that the library user decides what's relevant in their application.

-----Original Message-----
From: Somik Raha [mailto:so...@ya...]
Sent: Sun 8/4/2002 12:34 AM
To: htm...@li...
Cc:
Subject: [Htmlparser-developer] HTMLParserFeedback

Hi Developers,
    This is to initiate a discussion on the next step, on integrating feedback into the parser. Claude had submitted the HTMLParserFeedback interface (in the util package), which allows us to log the activity of the parser, inform when errors occur, and show warnings.

    I am familiar with log4j, and this sounds pretty similar - in terms of functionality, it sounds good. But in terms of performance, my questions are:
[1] Will this result in an unacceptable performance hit?
[2] Should we provide alternate constructors or modify existing API? If we provide alternates, then what default behaviour would be best? Are we talking about default callback objects - if yes, the strings created for each call would slow down the parser.

    It would be great to have some thoughts on this.

Regards,
Somik |
From: Somik R. <so...@ya...> - 2002-08-04 07:41:54
|
Hi Developers,
    This is to initiate a discussion on the next step, on integrating feedback into the parser. Claude had submitted the HTMLParserFeedback interface (in the util package), which allows us to log the activity of the parser, inform when errors occur, and show warnings.

    I am familiar with log4j, and this sounds pretty similar - in terms of functionality, it sounds good. But in terms of performance, my questions are:
[1] Will this result in an unacceptable performance hit?
[2] Should we provide alternate constructors or modify existing API? If we provide alternates, then what default behaviour would be best? Are we talking about default callback objects - if yes, the strings created for each call would slow down the parser.

    It would be great to have some thoughts on this.

Regards,
Somik |
From: Somik R. <so...@ya...> - 2002-08-04 07:27:46
|
I forgot to mention - the most important bug fix in this release is in parseParameters() (588885), done by Kaarle Kaila, because of which we have been able to incorporate "intelligence" in the parsing, making Cedric Rosa a happy man. Thanks a ton, Kaarle.

Cheers,
Somik |
From: Somik R. <so...@ya...> - 2002-08-04 07:22:36
|
Hi Folks,
    It's time again, another integration release is out. Check http://htmlparser.sourceforge.net.

    So what's new? Major API change - the parser now has chained exceptions. If some problem occurs, your application will have a chance to take care of it, instead of simply crashing. Also, the exception messages are more meaningful, giving a better picture of what went wrong.

    Thanks to Claude Duguay for the ChainedException classes, and bug reports. And many thanks to the best tester of HTMLParser - Cedric Rosa - for countless bug reports - pls keep up the good work.

From the change log,
[1] Fixed bug 590250, problem in HTMLStringNode, by which a single character on the last line was causing a parser crash
[2] Optimized and refactored HTMLParameterParser.parseParameters()
[3] Modified PerformanceTest to exclude first reading in average computation
[4] Fixed bug in HTMLParameterParser.parseParameters(), due to which params with spaces before = were not being picked up
[5] Made massive API changes - throwing exceptions and using HTMLEnumeration
[6] Fixed HTMLRemarkNode bug - we can recognize stuff like now.
[7] Fixed HTMLImageScanner bug - we can now fix image tags like IMG SRC"somepic.jpg" - the missing equals sign can be deduced
[8] Fixed HTMLLinkScanner bug - end tags within a link were not being included inside the link data.

Please give your feedback regarding the API changes.

NOTE
[1] This release will break your existing applications due to the API change. Simply wrap the parsing in a try-catch block to catch a HTMLParserException and your apps should work again.
[2] There is one known bug (590703) caught by two testcases in parseParameters(). This is a minor bug which shouldn't affect applications, and should be fixed in the next release.

Cheers,
Somik |
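The chained-exception idea and the try-catch advice in the release note can be sketched as below. This is an illustration only: `SketchParserException` and `ChainedExceptionSketch` are hypothetical names, and the sketch uses the cause support built into `java.lang.Throwable` (available since J2SE 1.4) rather than the project's own ChainedException classes, which are not reproduced here.

```java
/** Hypothetical checked exception that keeps the underlying cause attached. */
class SketchParserException extends Exception {
    SketchParserException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical example class; not part of HTMLParser.
class ChainedExceptionSketch {

    /** Low-level failure is wrapped with a more meaningful message. */
    static int parseWidth(String value) throws SketchParserException {
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            throw new SketchParserException("Bad width attribute: " + value, e);
        }
    }

    /** Application side: wrap the call in try-catch instead of crashing. */
    static String describe(String value) {
        try {
            return "width=" + parseWidth(value);
        } catch (SketchParserException e) {
            return e.getMessage()
                    + " (caused by " + e.getCause().getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("12"));
        System.out.println(describe("abc"));
    }
}
```

The application sees one exception type to catch, while the original failure stays reachable through `getCause()` for diagnosis.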
From: Somik R. <so...@ya...> - 2002-08-04 06:31:22
|
Hi Kaarle,
    I've managed to fix this bug in HTMLImageScanner. Meanwhile, there is a small issue - it seems that parseParameters() cannot handle
<tag name="">
I'd expect to have an empty string in the hashtable, but the testcase breaks (HTMLImageScannerTest.testMissingEqualTo()). For this release, though, we can go without this fix. I will put in a report soon.

Cheers,
Somik

----- Original Message -----
From: Kaarle Kaila
To: htm...@li...
Sent: Sunday, August 04, 2002 1:00 PM
Subject: Re: [Htmlparser-developer] Bug Report

On Sunday 04 August 2002 04:07, Somik Raha wrote:
> Hi Claude
>     I've fixed this bug, but I found another on the page you sent which I
> dont know how to fix : <img src"/images/spacer.gif" width="1" height="1"
> alt="">

I would say that there is no reason to accept it as src="/images/spacer.gif"
but maybe it could be accepted as 'src/images/spacer.gif'
or "/images/spacer.gif" or something similar, i.e. as just a bad parameter
name without value. I don't know how parseParameters would take it but it should
probably do something like that.

regards
Kaarle

> This one is driving me crazy - how can a browser accept this!!
> Anyway, I am throwing exceptions now.. I need to think and see if its
> possible to accept this as well.
>
> Regards,
> Somik

--
-------------------------------------------
Kaarle Kaila
mailto:kaa...@ik...
http://www.iki.fi/kaila |
From: Somik R. <so...@ya...> - 2002-08-04 06:09:36
|
Hi Kaarle,
    I was also thinking the fix might be done in parseParameters().
    But the point is - as humans, we can easily tell that it should be taken as src=.
    So this correction should be possible... I found that the current crash is happening in the HTMLImageScanner class, which means parseParameters can be left as is, and we could try to add this intelligence (correction) from the scanner end - and perhaps fix the tag and call for it to be parsed again.

    A second reason is that this kind of smart logic makes sense only in a particular context, and it might not be good to clutter parseParameters(), which has to stay as optimal as possible. I will try to work on these lines, and see if a fix is possible.

Cheers,
Somik

----- Original Message -----
From: Kaarle Kaila
To: htm...@li...
Sent: Sunday, August 04, 2002 1:00 PM
Subject: Re: [Htmlparser-developer] Bug Report

On Sunday 04 August 2002 04:07, Somik Raha wrote:
> Hi Claude
>     I've fixed this bug, but I found another on the page you sent which I
> dont know how to fix : <img src"/images/spacer.gif" width="1" height="1"
> alt="">

I would say that there is no reason to accept it as src="/images/spacer.gif"
but maybe it could be accepted as 'src/images/spacer.gif'
or "/images/spacer.gif" or something similar, i.e. as just a bad parameter
name without value. I don't know how parseParameters would take it but it should
probably do something like that.

regards
Kaarle

> This one is driving me crazy - how can a browser accept this!!
> Anyway, I am throwing exceptions now.. I need to think and see if its
> possible to accept this as well.
>
> Regards,
> Somik

--
-------------------------------------------
Kaarle Kaila
mailto:kaa...@ik...
http://www.iki.fi/kaila |
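The scanner-side correction proposed above (repair the tag text, then hand it back for normal parsing) might look roughly like this. It is a sketch under assumptions, not the HTMLImageScanner fix: the class `MissingEqualsSketch` and the method `repair` are hypothetical, and only double-quoted values are handled.

```java
// Hypothetical repair step; not the actual HTMLImageScanner code.
class MissingEqualsSketch {

    /**
     * Inserts '=' where an opening double quote directly follows an
     * attribute name, e.g. src"x" becomes src="x". Text inside quotes
     * is copied unchanged.
     */
    static String repair(String tag) {
        StringBuilder out = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < tag.length(); i++) {
            char c = tag.charAt(i);
            if (c == '"') {
                boolean gluedToName = !inQuotes && i > 0
                        && Character.isLetterOrDigit(tag.charAt(i - 1));
                if (gluedToName) {
                    out.append('=');   // the missing equals sign is deduced
                }
                inQuotes = !inQuotes;
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(repair("<a b\"c\">"));
        System.out.println(repair(
                "<img src\"/images/spacer.gif\" width=\"1\" height=\"1\" alt=\"\">"));
    }
}
```

Keeping the repair outside the general parameter parser follows the second reason given in the message: the guess is only applied where the context makes it plausible.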
From: Kaarle K. <kaa...@ik...> - 2002-08-04 04:02:34
|
On Sunday 04 August 2002 04:07, Somik Raha wrote:
> Hi Claude
>     I've fixed this bug, but I found another on the page you sent which I
> dont know how to fix : <img src"/images/spacer.gif" width="1" height="1"
> alt="">

I would say that there is no reason to accept it as src="/images/spacer.gif"
but maybe it could be accepted as 'src/images/spacer.gif'
or "/images/spacer.gif" or something similar, i.e. as just a bad parameter
name without value. I don't know how parseParameters would take it but it should
probably do something like that.

regards
Kaarle

> This one is driving me crazy - how can a browser accept this!!
> Anyway, I am throwing exceptions now.. I need to think and see if its
> possible to accept this as well.
>
> Regards,
> Somik

--
-------------------------------------------
Kaarle Kaila
mailto:kaa...@ik...
http://www.iki.fi/kaila |
From: Somik R. <so...@ya...> - 2002-08-04 01:07:43
|
Hi Claude
    I've fixed this bug, but I found another on the page you sent which I don't know how to fix:
<img src"/images/spacer.gif" width="1" height="1" alt="">

This one is driving me crazy - how can a browser accept this!!
Anyway, I am throwing exceptions now.. I need to think and see if it's possible to accept this as well.

Regards,
Somik |
From: Somik R. <so...@ya...> - 2002-08-03 10:48:34
|
Hi Folks,
    A quick update.. I have just integrated Claude's contribution of HTMLParserException. The idea of using chained exceptions is really cool.
    Claude --> I couldn't find your HTMLEnumeration class, so I made my own interface.
    Interesting thing - the performance has been enhanced quite a bit, because no HTMLNode class casts are needed.
    However, the down side is, all existing apps based on the parser will need modification as the API has changed.
    I should be able to make the release tomorrow, after fixing the bug that Claude has reported.

Cheers,
Somik |
From: Somik R. <so...@ya...> - 2002-08-02 12:29:46
|
Hi Kaarle,

> isApo waits for next '-sign and
> isAmp waits for next "-sign. I guess isAmp should be called something else
> (isCitation?)
>
> I guess t stands for temp. Perhaps it could be e.g. item.
> st should perhaps be token but then
> the current token should be renamed to something like tokenSet.

Thanks for the clarifications. I will change the names tomorrow.

> >Once again - thanks so much for your quick action on this bug. By the way,
> >could you flag this bug as fixed on the htmlparser page with some comment,
> >for archiving purposes? (You are a developer, so you can log in and go to
> >the htmlparser bugs page from http://htmlparser.sourceforge.net ).
>
> OK. I wrote there something. Hope that was what you meant.

Yeah - that was good. Can you also change the status of the bug to "fixed", and close the report (change the Open status to Closed). Thanks a lot.

Cheers,
Somik |
From: Somik R. <so...@ya...> - 2002-08-02 06:33:37
|
Hi Claude,

> From our point of view, a hang is devastating in that it does not allow the application to move forward. An exception would be ideal in that it would identify the problem without breaking the application.

Like I said earlier, your suggestions are the most important part of the 1.2 release - they are the last thing left. I am stuck trying to get some time, swamped with managing several other open-source projects.

I'd really appreciate it if some developers could come forward and help with implementing Claude's suggestions. I might be able to spend some time this weekend, but a collaborative effort is always better - this project is getting way too big for me to handle alone, and it has come this far only due to the suggestions and requirements of others.

Regards,
Somik |