htmlparser-developer Mailing List for HTML Parser (Page 9)
Brought to you by:
derrickoswald
From: Marc N. <ma...@ke...> - 2003-08-22 05:24:03
|
Derrick, these changes sound great! Thank you so much for putting so much work into creating a top-notch lexer package. I'll definitely put some time into going over your code, and I'll definitely help with testing out the integration once it gets underway.

Marc |
From: Derrick O. <Der...@ro...> - 2003-08-21 06:03:09
|
Marc, James, Somik, Joshua, Amit, et al.

I've just dropped some speed fixes to the lexer package, the new low level i/o subsystem I've been working on. It now appears to be 10% to 50% faster at getting raw nodes than the NodeReader/parserHelpers were.

It's not complete:
- it needs an EndNode class for speed and memory reasons
- I backed off multi-threading for speed
- character set detection isn't really working yet
- there's no constructor taking a file name

But the next logical step is probably integration into the real parser to run against real test cases. However, I think this will cause a *lot* of unit tests to fail. There are a number of reasons for this:
- attributes will have case preserved, I think I've gotten around this temporarily with a switch in the ParserTestCase class
- whitespace is preserved, a lot of this has to do with the different line endings handling
- the order of attributes in tags is preserved, so toHtml() output is completely different
- the count of nodes may be altered by the whitespace nodes, this may require changing the ParserTestCase counting strategy
- remark nodes store all the text, even the dashes
- I mostly only paid attention to the HTML specification, real HTML is somewhat more exotic

All these failing tests will need labour intensive manual attention to detail to get the tests correct again. In other words, once this is integrated there's no turning back. As with any animal that's having its spine replaced, there's bound to be a bit of pain.

So, before that happens, the code should go through a period of severe code review. That's what open source is about, right? So if you have some time, please go over the lexer package with a fine tooth comb. Add more test cases to the lexerTests package. Take a look at the toString() output (see testReal in LexerTests for example). Optimize the hell out of it. Bounce it around and see what methods would make you happy. Then add them.

I'm thinking two weeks minimum, so this period would span at least two integration builds. The first one will be August 24th, so if you don't have CVS access you'll need to start with that.

OK, let's have at 'er folks!

Derrick |
From: Amit R. <ami...@ya...> - 2003-08-18 07:17:14
|
Hi,

I looked into the problem briefly. While trying to parse for links on www.009.com, the parser loops infinitely on the following tags:

<IMG src="www_009_com home page_files/imode.gif" border=0 width="44" height="54"><A href="http://www.009.com/cgi/machine1.pl"> iモード対応ページ</A>

I will look in detail when I get time later.

Regards,
Amit.

NOTE: The parser successfully returns

<IMG src="www_009_com home page_files/qv10anim.gif" border=0 width="28" height="16"><BR><A href="http://www.009.com/suginami/">Digital Photo 杉並デジカメ探偵団 with Casio QV-10A</A> |
From: Derrick O. <Der...@ro...> - 2003-07-03 15:38:46
|
Stig,

Yes, you're correct. See bug #755929 Empty string attr. value causes attr parsing to be stopped
<http://sourceforge.net/tracker/index.php?func=detail&aid=755929&group_id=24399&atid=381399>

It's been fixed in version 1.4 though:
http://sourceforge.net/project/showfiles.php?group_id=24399&release_id=168419

Derrick |
From: Stig T. <st...@eu...> - 2003-07-02 21:22:46
|
Hey

I noticed that when the img tag takes the following form:

<img height=45 alt="" src="http://www2.incredimail.com/images/newsletter/021/maintitle.gif" width=460 border=0>

the rest of the attributes after alt (inclusive) disappear.

This happens when iterating over the nodes and when I call node.toHtml()

I suspect it's the empty alt attribute causing it, coz removing it will get the tags properly.

Stig Tanggaard

ps. this is with version 1.3. |
From: Derrick O. <Der...@ro...> - 2003-07-02 19:18:05
|
Joshua,

I added the test case to TableScannerTest but I couldn't reproduce it. Perhaps it's a Windows thing again (see for example the threads around bug #725338 StackOverflow Error). If you can reproduce it perhaps you can find the cause and fix it.

Derrick |
From: Joshua K. <jo...@in...> - 2003-06-29 19:35:33
|
Hi Derrick, Was testOverFlow working for you when you checked it in? It is failing for me. If we expect it to fail for the moment, I could temporarily put it in the temporaryFailures package that Somik setup. thanks jk |
From: Marc N. <ma...@ke...> - 2003-06-17 04:42:11
|
Thanks Somik! I'll try out your fix to ScriptScanner tomorrow.

Marc |
From: Somik R. <so...@ya...> - 2003-06-17 01:54:29
|
Hi Josh,

On second thoughts, I think what you suggest is a good idea... :)

After going through the code, I realized that the current work in progress is too complex for me to fix quickly. There are some particularly hairy cases of attribute parsing that I think are being worked on and may not be done soon.

I've separated the failing tests into the package org.htmlparser.tests.temporaryFailures

Derrick, Marc, Kaarle, Josh - Feel free to change the name if you prefer a more descriptive one. Also feel free to review this mechanism and change it if you do not like it.

Kaarle - I've refactored AttributeParser - the logic remains intact. I have tried to guess the logic as much as I could, but you would be in a better position to complete it (modify names, etc.).

Folks: we have a green bar now!

Regards,
Somik |
From: Somik R. <so...@ya...> - 2003-06-17 00:47:07
|
Hi Josh,

I don't think that is a good idea. It will encourage developer comfort with red, which will not help.

OTOH, I've fixed the ScriptScanner class. I went through the discussion between Marc and Derrick, and understood the problem. It was actually a simple problem: we had support to tackle fake script matches, but it would trigger only on a double quote. Now it also triggers on a single quote. The reason this might not have been apparent is that the class had smelly code, badly designed by me. I've refactored it to be intention-revealing, and hopefully manageable.

I am looking into some of the other failing tests.

Regards,
Somik |
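The "fake script match" problem Somik mentions can be illustrated with a small, self-contained sketch. This is not the ScriptScanner code; `ScriptEndSketch` and `findScriptEnd` are hypothetical names, and the approach (tracking whether the scan is inside a string literal opened by either kind of quote) is only an assumption about what such a check might look like.

```java
final class ScriptEndSketch {
    /** Returns the index of the first "</script>" outside a quoted string, or -1 if none. */
    static int findScriptEnd(String text) {
        final String endTag = "</script>";
        char quote = 0; // 0 means "not inside a string literal"
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (quote != 0) {
                if (c == '\\') {
                    i++;            // skip the escaped character
                } else if (c == quote) {
                    quote = 0;      // closing quote of the current literal
                }
            } else if (c == '"' || c == '\'') {
                quote = c;          // opening quote, double or single
            } else if (text.regionMatches(true, i, endTag, 0, endTag.length())) {
                return i;           // case-insensitive match outside any literal
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String js = "document.write('</script>'); var s = \"</SCRIPT>\";</script>";
        // Only the last end tag is outside a string literal.
        System.out.println(findScriptEnd(js) == js.lastIndexOf("</script>"));
    }
}
```

The sketch deliberately ignores JavaScript comments, so an apostrophe inside a `//` comment would confuse it; a real scanner would need to handle that case as well.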
From: Joshua K. <jo...@in...> - 2003-06-13 04:23:24
|
How would y'all feel about creating a package called failingTests that would not be a part of the AllTests suite? This package would be the place we store our failing or temporarily failing tests. It would also be the place we must go, continuously, to rid the package of its failing tests. This would ensure that the bar is always green from the AllTests perspective, while still acknowledging areas that need work.

thoughts, feelings, sarcasm?

best regards
jk |
From: Somik R. <so...@ya...> - 2003-06-12 21:21:21
|
Sounds like an AttributeParser bug.

You could write a test that provides this input and asserts the expected output in AttributeParserTest (look at the other tests). Then file a bug report and choose the category "AttributeParser". Kaarle Kaila will dive in to the rescue.

Regards,
Somik |
From: Joseph R. <jmr...@tg...> - 2003-06-12 19:17:59
|
I've run into an issue using the parser, and I'd like to file a bug, but I'd like some input as to which parts of the parser behavior are really wrong.

What I'm actually doing is parsing a page more than once. I download a page, parse it to remove a few unwanted tags, and output the results by calling toHtml() on all the nodes I want to keep. Later in the application, I parse the page again, making some changes to some StringNodes.

The problem is that these consecutive parses lead to mangled HTML. The output of the second parse doesn't match the output of the first. In particular, the problem that I'm having is with tags which have standalone attributes. For example, if I have a tag:

<SOMETAG FOO1="BAR1" FOO2="BAR2" FOO3>

FOO3 is a standalone attribute, which is perfectly valid in HTML. Many common tags use standalone attributes, like CHECKED on a checkbox, DISABLED on a text input, or NOWRAP on a table cell. The parser doesn't seem to like standalones, though, and assigns it an empty string as a value, so calling toHtml() gets me the result:

<SOMETAG FOO1="BAR1" FOO3="" FOO2="BAR2">

This seems wrong to me, but would not itself be disastrous. However, there also appears to be a bug in the parser such that it doesn't like empty-string values for attributes, and toHtml() chokes on them. If I run this first result through the parser again, calling toHtml() now gets me:

<SOMETAG FOO1="BAR1" FOO3="">

This is clearly wrong. I've lost the attribute FOO2 and its value entirely.

It would seem to me that parsing a page and producing HTML output should be consistent over multiple runs. If I have output from the parser, the parser should consider all of that to be valid HTML, and should produce identical output if the results are run through it again.

Thoughts?

Joe Robins
Thaumaturgix, Inc. |
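The round-trip property Joe asks for (parse, emit, parse again, get identical output) can be written down as a small check. The sketch below is self-contained and does not use the htmlparser API; `RoundTripSketch`, `Attr`, `parse` and `toHtml` are hypothetical names. It assumes a single start tag as input, keeps attributes in source order, and represents a standalone attribute with a null value so that it stays distinct from an empty string.

```java
import java.util.ArrayList;
import java.util.List;

final class RoundTripSketch {
    /** One name/value pair; a null value marks a standalone attribute such as CHECKED. */
    static final class Attr {
        final String name, value;
        Attr(String name, String value) { this.name = name; this.value = value; }
    }

    /** Splits "<NAME a="1" b>" into the tag name (first entry) and its attributes, in source order. */
    static List<Attr> parse(String tag) {
        String body = tag.substring(1, tag.length() - 1).trim(); // assumes leading '<' and trailing '>'
        List<Attr> out = new ArrayList<>();
        int i = 0, n = body.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(body.charAt(i))) i++;
            int start = i;
            while (i < n && !Character.isWhitespace(body.charAt(i)) && body.charAt(i) != '=') i++;
            if (i == start) break; // end of input or a stray '='
            String name = body.substring(start, i);
            String value = null;
            if (i < n && body.charAt(i) == '=') {
                i++;
                if (i < n && body.charAt(i) == '"') {
                    int end = body.indexOf('"', i + 1);
                    if (end < 0) end = n;              // unterminated quote: take the rest
                    value = body.substring(i + 1, end); // may be "", which is a valid value
                    i = Math.min(end + 1, n);
                } else {
                    int vs = i;
                    while (i < n && !Character.isWhitespace(body.charAt(i))) i++;
                    value = body.substring(vs, i);
                }
            }
            out.add(new Attr(name, value));
        }
        return out;
    }

    /** Emits the tag again; standalone attributes are written without "=". No escaping is done. */
    static String toHtml(List<Attr> parts) {
        StringBuilder sb = new StringBuilder("<");
        for (int k = 0; k < parts.size(); k++) {
            Attr a = parts.get(k);
            if (k > 0) sb.append(' ');
            sb.append(a.name);
            if (a.value != null) sb.append("=\"").append(a.value).append('"');
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        String once = toHtml(parse("<SOMETAG FOO1=\"BAR1\" FOO2=\"BAR2\" FOO3>"));
        String twice = toHtml(parse(once));
        System.out.println(once);
        System.out.println(once.equals(twice)); // the round-trip check
    }
}
```

A test along these lines in AttributeParserTest, feeding the output of the first pass back into the parser, would catch both the standalone-attribute and the empty-string problems described above.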
From: Derrick O. <Der...@ro...> - 2003-06-12 01:25:01
|
The failing tests are listed in the build notes:
http://sourceforge.net/project/shownotes.php?group_id=24399&release_id=162916

You can shut some of them up by pretending that it's version 1.3. Edit Parser.java:

    public final static double VERSION_NUMBER = 1.3;

Derrick |
From: Marc N. <ma...@ke...> - 2003-06-11 18:08:35
|
The failures in ScriptScannerTest I assume are the following:

testScanScriptWithTagsInComment()
testScanScriptWithJavascriptLineEndings()
testScanScriptWithTags()

I added those three tests after some refactoring of ScriptScanner on May 24 broke some of my code that depended on the "old" ScriptScanner behavior. Derrick has come up with some good ideas for some major global refactoring, but until he (or any other developer) has time to implement his refactoring ideas, these three tests will probably remain broken.

Marc |
From: <jo...@in...> - 2003-06-11 17:49:34
|
I'm about to do some refactoring experiments so I started by running AllTests. I'm getting 8 failures:

3 in ScriptScannerTest
2 in AttributeParserTest
3 in TagParserTest

Anyone know what is going on here? I want to begin with a green bar before I begin to refactor.

thanks
jk |
From: Somik R. <so...@ya...> - 2003-05-30 12:08:34
|
Dear Dhaval,

Thank you for being a part of this project, and best wishes for your higher studies!

Cheers
Somik |
From: Derrick O. <Der...@ro...> - 2003-05-30 10:59:27
|
Dhaval,

Your valuable input and extensive experience will be sorely missed. Best of luck in your new endeavors.

Derrick |
From: <dha...@po...> - 2003-05-30 07:13:00
|
Everyone,

I have been associated with this project for a shade less than one year. During this period I have made some small contributions to this project and identified a few bugs. Most of all what I have enjoyed is the tremendous learning that I have received both from a technical viewpoint and a design perspective. It's altered my methodology of software development. For one it has instilled JUnit into my development methodology. It has also showed me that redesign is not such a bad thing. On the whole it has been quite a great experience working with some amazing people like Somik, Derrick and many more amongst u all. I thank u all for the support that I have recvd, the quick bug-fixes, the quick-fix solutions and the exhilarating discussions that I have been involved in within this group.

I am moving on for higher studies in the field of management and I do not think I can keep so many things on my plate. So very sadly letting go of a few. One of them being the HTMLParser.

I wish it all the best in future and hope that the tool continues for a long long time to come.

Regards to all,
Dhaval |
From: <dha...@po...> - 2003-05-29 13:06:43
|
Hi Terry,

I had also felt the need for a root tag which would allow me to drill down using CompositeTag functionality. However, as a workaround you could register the HtmlScanner, and then you would obtain the HTML tag as the base tag under which all tags would be present. Instead of using registerScanners(), use registerDomScanners(). In addition to what registerScanners() registers, it also registers the HtmlScanner, HeadScanner and BodyScanner.

Dhaval |
From: <tez...@ya...> - 2003-05-29 11:53:13
|
The CompositeTag is quite heavy on tasty [Australian for good] functionality.

The way it seems to be implemented here is contrary to the 'Composite' Design Pattern. I'm having difficulty forming a composite of the whole document, say an abstract <PARSE_ROOT> object.

Currently I'm doing a hack, knowing there is a table:

    Node[] nodes = myParser.extractAllNodesThatAre(TableTag.class);
    TableTag table = (TableTag) nodes[0];
    TableTag htmlComposite = (TableTag) nodes[0];

I need to do this to access the CompositeTag functionality. Is there a simpler way?

Would it be useful to have a

    public CompositeTag getRootTag() {}

in Parser?

Terry.

------------------------------------------------------------
Terry Alexis Lurie          | 'Something witty that doesn't
Freelance Computer Engineer | look good with variable
United Kingdom              | width fonts' - Most nerds
|
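The idea of a synthetic root can be illustrated without the parser itself. The following is only a sketch under stated assumptions: `RootSketch`, `SimpleNode`, `rootOf` and `findAll` are hypothetical names invented for this example and are not part of the HTML Parser API. It shows how wrapping the top-level nodes in one `PARSE_ROOT` composite lets a single recursive operation cover the whole document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a document-wide composite root; not the library's classes. */
final class RootSketch {

    /** Minimal stand-in for a parsed node that may have children. */
    static class SimpleNode {
        final String name;
        final List<SimpleNode> children = new ArrayList<>();

        SimpleNode(String name) {
            this.name = name;
        }

        /** Adds a child and returns this node so calls can be chained. */
        SimpleNode add(SimpleNode child) {
            children.add(child);
            return this;
        }

        /** Depth-first search for all descendants with the given name (case-insensitive). */
        List<SimpleNode> findAll(String wanted) {
            List<SimpleNode> found = new ArrayList<>();
            collect(wanted, found);
            return found;
        }

        private void collect(String wanted, List<SimpleNode> found) {
            for (SimpleNode child : children) {
                if (child.name.equalsIgnoreCase(wanted)) {
                    found.add(child);
                }
                child.collect(wanted, found);
            }
        }
    }

    /** Wraps whatever top-level nodes a parse produced under one synthetic root. */
    static SimpleNode rootOf(List<SimpleNode> topLevel) {
        SimpleNode root = new SimpleNode("PARSE_ROOT");
        for (SimpleNode node : topLevel) {
            root.add(node);
        }
        return root;
    }

    public static void main(String[] args) {
        SimpleNode body = new SimpleNode("BODY")
                .add(new SimpleNode("TABLE"))
                .add(new SimpleNode("P").add(new SimpleNode("table")));
        SimpleNode root = rootOf(Arrays.asList(new SimpleNode("HEAD"), body));
        System.out.println(root.findAll("table").size());
    }
}
```

With a root like this, the caller no longer needs to know in advance that the page contains a table in order to reach the composite operations.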
From: <dha...@po...> - 2003-05-29 04:29:13
|
Marc,

Your requirement is quite common. Mostly, code inside a <SCRIPT> tag should be reproduced as it is. I think it's important that we have the test cases and appropriate fixes in the main codebase.

Dhaval

> -----Original Message-----
> From: htm...@li... [mailto:htm...@li...] On Behalf Of Marc Novakowski
> Sent: Wednesday, May 28, 2003 8:30 PM
> To: htm...@li...; htm...@li...
> Subject: RE: [Htmlparser-developer] RE: [Htmlparser-cvs] htmlparser/src/org/htmlparser/scanners CompositeTagScanner.java,1.52,1.53 ScriptScanner.java,1.21,1.22
>
> Derrick, if it's anybody's fault that my code is failing because of your change, it's mine. I should have checked in specific test cases that exercise my usage of the library. I apologise for not doing that earlier...
>
> Here are the main things that the new ScriptScanner does that breaks my code:
> 1) acts very strangely when it encounters "\" at a newline (doesn't just get rid of the newline, but it starts repeating the entire line about 6 times)
> 2) uppercases and auto-closes tags that aren't in quotes
>
> I have some specific test cases that demonstrate these. I'll check them in if you'd like. I have to admit that after playing with the internals of NodeReader, TagScanner, etc. I'm not 100% clear on how some of this low level scanning code works. Nor is it always clear from reading the code. That's why I am not confident that I will be able to refactor the existing code to handle my specific problems.
>
> I realize my usage of the parser may be quite different than 95% of the people who use the library, so if there isn't a solution that fits into the existing architecture I'll be happy to just make some local changes to fix things. I can always make my own scanner and not check it into the codeline (or just copy the old version of ScriptScanner into my code). However, if I'm running into this now, chances are somebody in the future will, also.
>
> Marc
>
> -----Original Message-----
> From: Derrick Oswald [mailto:Der...@ro...]
> Sent: Tue 5/27/2003 6:26 PM
> To: htm...@li...
> Subject: Re: [Htmlparser-developer] RE: [Htmlparser-cvs] htmlparser/src/org/htmlparser/scanners CompositeTagScanner.java,1.52,1.53 ScriptScanner.java,1.21,1.22
>
> You may need to back out the change, or at a minimum get the old code by going back a version and putting it in your ScriptScanner base class.
>
> I guess I screwed up. I saw your drop that allowed all the lines to be accumulated in a tag and I thought the two scanners were very close then (apart from the tags in quotes thing). My only excuse is it passed all the unit tests. Well, to be truthful, I changed two of the tests, but it was only extraneous newline stuff at the start and end of text.
>
> The script scanner is breaking your code because of uppercasing tags (not just within comments) and removing newlines after \, right?
>
> Marc Novakowski wrote:
>
> >I just realized that it's more complicated than that (for me, at least). In my application that uses htmlparser, I am extending certain scanners and tags (such as ScriptScanner but mostly CompositeTagScanner) to allow for "custom" tags in an HTML page. When the "HTML + custom tags" are run through my custom parser, the custom tags are converted into an object model which is then turned into dynamic javascript code.
> >
> >Long story short: some of these custom tags (i.e. the ones that extend ScriptScanner) _absolutely_ need the inner contents of the tag to remain unchanged. Also, since it's not always Javascript that is inside of the tags, adding extra rules to ignore tags in comments or strings won't always work. For example, one tag allows for arbitrary XML innards. Currently, the scanner will UPPERCASE all tags inside unless they're in quotes (which messes up the XML).
> >
> >The old ScriptScanner did exactly what I needed -- that is, it didn't scan for tags at all. It just looked for the exact (case-insensitive) string match of the end tag. It didn't look for "<" and it didn't defer to scanners. I took a look at the current code and I can't see any easy way to do this.
> >
> >Marc
> >
> >-----Original Message-----
> >From: Derrick Oswald [mailto:Der...@ro...]
> >Sent: Tuesday, May 27, 2003 2:39 PM
> >To: htm...@li...
> >Subject: Re: [Htmlparser-developer] RE: [Htmlparser-cvs] htmlparser/src/org/htmlparser/scanners CompositeTagScanner.java,1.52,1.53 ScriptScanner.java,1.21,1.22
> >
> >Marc,
> >
> >The text within <SCRIPT></SCRIPT> is supposed to be parsed as pure text or remarks. I guess the text scanner goes until it sees a <x... and then stops to defer to a tag scanner. I hadn't thought about those in comments, or about the \ end of lines.
> >
> >Perhaps, rather than write a new scanner, fix the StringScanner (the remark scanner should be OK), so that it does the correct behaviour when balance_quotes is true. Then the 'balance_quotes' flag could be called 'strict_script' or something.
> >
> >Derrick
|
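The behaviour Marc attributes to the old ScriptScanner (search only for a case-insensitive match of the end tag, never look for "<" in general, never defer to other scanners) can be sketched in a few lines. This is an illustration only, not the library's code: `RawContentSketch` and `rawContent` are hypothetical names, and the caller is assumed to already know the index just past the opening tag's ">".

```java
/** Hypothetical sketch of "scan to the end tag and keep everything verbatim". */
final class RawContentSketch {

    /**
     * Returns the text between openEnd (the index just after the opening tag)
     * and the first case-insensitive occurrence of "</name", unchanged.
     * Returns null when no end tag is found.
     */
    static String rawContent(String html, int openEnd, String name) {
        String needle = "</" + name;
        int last = html.length() - needle.length();
        for (int i = openEnd; i <= last; i++) {
            // regionMatches with ignoreCase=true compares without changing the text itself.
            if (html.regionMatches(true, i, needle, 0, needle.length())) {
                return html.substring(openEnd, i);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String html = "<SCRIPT>if (a < b) document.write('<td>\\\n');</script>tail";
        System.out.println(rawContent(html, "<SCRIPT>".length(), "script"));
    }
}
```

Because nothing inside the element is tokenized, embedded tags keep their case, a backslash before a newline is left alone, and arbitrary XML survives untouched. The trade-off is that an end tag appearing inside a string literal still terminates the element.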
From: Marc N. <ma...@ke...> - 2003-05-28 22:44:45
|
Derrick, I like your ideas, and I think that your suggested refactoring would make the lower-level code in htmlparser much less mysterious and (hopefully) easier to maintain and extend. Marc |
From: Derrick O. <Der...@ro...> - 2003-05-28 22:32:24
|
Marc,

I've been thinking about your problem and I think I have a solution. I'll rewrite the node reader.

OK, that's the bottom line, but I've said before that the lowest level should return a contiguous stream of nodes that have the original characters (not case converted) and include the formatting, like line endings and other whitespace, so that toHtml() gives you the exact same page that you started with.

I should make a picture, but see if you can follow me here.

The lowest level is a byte stream, right off the wire. This needs to support mark and reset in case the character set changes.

The second level is a character stream, after applying the decoding for a particular charset.

The third level is a string, which is a char array. The chars are copied from the second level, so that can be discarded, but only after the entire stream has been drained. If we want to do threaded access to the socket to provide for parallel parsing while reading, the characters need to be kept around to create whole new strings.

The fourth level is a stream of tags. Instead of keeping substrings though, the tags just keep character position, start and end, within the entire page, like a cursor, and a pointer to a new 'Page' object. That way as the Page reads more bytes from the stream, it accumulates more characters, which make a bigger string that represents the page read so far, and there's nothing preventing the older strings from being garbage collected.

The upper case thing goes away since the tags point to the original characters via their offsets. The end of line thing goes away because the reader just treats a newline as any other whitespace.

So what you have after a parse is a single (very large) string with a parallel stream of tag objects with a whole bunch of cursors pointing into the string.

I've experimented with reading all the characters up front and that breaks 67 test cases. If you erroneously substitute "\n" for "\r\n" (or vice versa) there are only 47 failed cases left. The reset on character set change test case is one of them. If you erroneously consume newlines at the front of string nodes the number of failing tests is only 33. And if you erroneously return no string nodes if that consumption leaves nothing left in the string, there are only 15 failing cases. These would have to be examined in detail for correctness, according to the HTML spec.

So it's doable. I just have to find the time. For now just include the entire original ScriptScanner.scan() code in a base class for your script scanners so that the evil CompositeTagScanner.scan() is overridden.

Derrick

Marc wrote:

>Here are the main things that the new ScriptScanner does that breaks my code:
>1) acts very strangely when it encounters "\" at a newline (doesn't just get rid of the newline, but it starts repeating the entire line about 6 times)
>2) uppercases and auto-closes tags that aren't in quotes
|
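The "single string plus cursors" arrangement described in this message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the planned implementation: `PageSketch`, `Span`, `spans` and `roundTrip` are hypothetical names, the byte and charset levels are skipped, and the tag boundary test is deliberately naive (a ">" inside a quoted attribute would end a tag early, although the round trip would still hold).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a growing page buffer plus nodes that only hold offsets into it. */
final class PageSketch {

    /** Characters read so far; grows as more input arrives. */
    private final StringBuilder text = new StringBuilder();

    void append(CharSequence chunk) {
        text.append(chunk);
    }

    /** A node is a kind plus a [start, end) cursor pair into the page. */
    static final class Span {
        final PageSketch page;
        final boolean isTag;
        final int start;
        final int end;

        Span(PageSketch page, boolean isTag, int start, int end) {
            this.page = page;
            this.isTag = isTag;
            this.start = start;
            this.end = end;
        }

        /** Returns the original characters, with case and whitespace untouched. */
        String toHtml() {
            return page.text.substring(start, end);
        }
    }

    /** Splits the page into contiguous tag and text spans; no character is dropped. */
    List<Span> spans() {
        List<Span> out = new ArrayList<>();
        int pos = 0;
        int n = text.length();
        while (pos < n) {
            boolean isTag = text.charAt(pos) == '<';
            int end;
            if (isTag) {
                int close = text.indexOf(">", pos);
                end = (close < 0) ? n : close + 1;
            } else {
                int open = text.indexOf("<", pos);
                end = (open < 0) ? n : open;
            }
            out.add(new Span(this, isTag, pos, end));
            pos = end;
        }
        return out;
    }

    /** Concatenates every span's text; the result should equal the input exactly. */
    static String roundTrip(String html) {
        PageSketch page = new PageSketch();
        page.append(html);
        StringBuilder sb = new StringBuilder();
        for (Span span : page.spans()) {
            sb.append(span.toHtml());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<Body BGCOLOR=red>\r\n  Hi <b>there</b>\n";
        System.out.println(roundTrip(html).equals(html));
    }
}
```

Since each span stores only offsets, attribute case, attribute order and line endings are never rewritten, which is the property that makes toHtml() reproduce the page that was read.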
From: Marc N. <ma...@ke...> - 2003-05-28 14:59:52
|
Derrick, if it's anybody's fault that my code is failing because of your change, it's mine. I should have checked in specific test cases that exercise my usage of the library. I apologise for not doing that earlier...

Here are the main things that the new ScriptScanner does that breaks my code:
1) acts very strangely when it encounters "\" at a newline (doesn't just get rid of the newline, but it starts repeating the entire line about 6 times)
2) uppercases and auto-closes tags that aren't in quotes

I have some specific test cases that demonstrate these. I'll check them in if you'd like. I have to admit that after playing with the internals of NodeReader, TagScanner, etc. I'm not 100% clear on how some of this low level scanning code works. Nor is it always clear from reading the code. That's why I am not confident that I will be able to refactor the existing code to handle my specific problems.

I realize my usage of the parser may be quite different than 95% of the people who use the library, so if there isn't a solution that fits into the existing architecture I'll be happy to just make some local changes to fix things. I can always make my own scanner and not check it into the codeline (or just copy the old version of ScriptScanner into my code). However, if I'm running into this now, chances are somebody in the future will, also.

Marc
|