[Htmlparser-user] Re: Htmlparser-user digest, Vol 1 #120 - 2 msgs
From: ope t. <op...@ho...> - 2002-09-16 16:23:28
I will update the parser and let you know the results. Thanks.

>From: htm...@li...
>Reply-To: htm...@li...
>To: htm...@li...
>Subject: Htmlparser-user digest, Vol 1 #120 - 2 msgs
>Date: Fri, 13 Sep 2002 12:08:16 -0700
>
>Send Htmlparser-user mailing list submissions to
>    htm...@li...
>
>To subscribe or unsubscribe via the World Wide Web, visit
>    https://lists.sourceforge.net/lists/listinfo/htmlparser-user
>or, via email, send a message with subject or body 'help' to
>    htm...@li...
>
>You can reach the person managing the list at
>    htm...@li...
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of Htmlparser-user digest..."
>
>
>Today's Topics:
>
>   1. Re: help desperately needed! parser wont parse properly (Somik Raha)
>   2. RE: Script tags bug (dha...@or...)
>
>--__--__--
>
>Message: 1
>From: "Somik Raha" <so...@ya...>
>To: <htm...@li...>
>Subject: Re: [Htmlparser-user] help desperately needed! parser wont parse properly
>Date: Fri, 13 Sep 2002 11:24:30 +0530
>Reply-To: htm...@li...
>
>Hi,
>    You can try the same thing with runParser http://www.amazon.com -l
>    It works fine for me, but from your code it looks like you are using
>htmlparser 1.1. That is very old.
>    Can you upgrade to the latest integration release?
>
>Regards,
>Somik
>----- Original Message -----
>From: "ope tomori" <op...@ho...>
>To: <htm...@li...>
>Sent: Friday, September 13, 2002 12:23 AM
>Subject: [Htmlparser-user] help desperately needed! parser wont parse properly
>
>
> > Hello anyone. I'm using this parser on a research project. I'm building a
> > browser in Java, using JEditorPane as the panel that displays the HTML of
> > the websites. I have succeeded in doing that.
> >
> > The next step was to parse the links on the website, and we came across this
> > parser. Anyway, I set up the kizna classes and used this piece of code:
> >
> > // this is in the actionPerformed function, when you press the "GO" button
> >
> > HTMLParser parser = new HTMLParser(urlAddress);
> > parser.registerScanners();
> > for (Enumeration e = parser.elements(); e.hasMoreElements();) {
> >     HTMLNode node = (HTMLNode) e.nextElement();
> >     if (node instanceof HTMLLinkTag) {
> >         HTMLLinkTag linkTag = (HTMLLinkTag) node;
> >         System.out.println("Link Tag is " + linkTag.getLink());
> >     }
> > }
> >
> > When I run the browser with, say, amazon.com, this is the result I get:
> > *****************************************************
> > Address : http://www.amazon.com
> > tagContents: a href="http://www.amazon.com/exec/obidos/subst/home/home.html/ref=wt_404page/"
> > Link Tag is http://www.amazon.com/exec/obidos/subst/home/home.html/ref=wt_404page/
> > tagContents: table border=0 align=center cellpadding=4
> > tagContents: a href="http://www.amazon.com/exec/obidos/subst/home/home.html/ref=404page/"
> > Link Tag is http://www.amazon.com/exec/obidos/subst/home/home.html/ref=404page/
> > *****************************************************
> >
> > When I checked the link tag, it redirects to the Amazon home page. Can
> > someone please tell me what I'm doing wrong?
> >
> > Thanks
> >
> > _______________________________________________
> > Htmlparser-user mailing list
> > Htm...@li...
> > https://lists.sourceforge.net/lists/listinfo/htmlparser-user
>
>--__--__--
>
>Message: 2
>From: dha...@or...
>Date: Fri, 13 Sep 2002 12:08:26 +0530
>Subject: RE: [Htmlparser-user] Script tags bug
>TO: htm...@li...
>Reply-To: htm...@li...
>
>The following bug only occurs if JavaScript is written within HTML
>comment tags. The same comment written outside of JavaScript comment
>tags works fine.
>
>One more parsing bug that we have come across and I'd like to report:
>
>If I have a tag as follows: <TEXTAREA name="JohnDoe" ></TEXTAREA> (note
>the space before the closing '>' of the TEXTAREA tag).
>
>On reproduction using toHTML() of the TEXTAREA, I get the following:
><TEXTAREA ="" name="JohnDoe"></TEXTAREA>
>
>I think this might have been introduced with the fix which took names
>without values and assigned blank strings to them.
>
>Regards,
>
>Dhaval Udani
>Senior Analyst
>M-Line, QPEG
>OrbiTech Solutions Ltd.
>+91-22-8290019 Extn. 1457
>
>
>-----Original Message-----
>From: Udani, Dhaval H.
>Sent: Thursday, September 12, 2002 2:05 PM
>To: htmlparser-user
>Cc: Udani, Dhaval H.
>Subject: [Htmlparser-user] Script tags bug
>
>
>Hi,
>
>The following code:
>
><SCRIPT Language="JavaScript">
><!--
>function validateForm()
>{
>var i = 10 ;
>if(i < 5)
>i = i - 1 ;
>return true;
>}
>// -->
>
>gets converted to:
>
><SCRIPT Language="JavaScript">
>if(i < 5)
>i = i - 1 ;
>return true;
>}
>// -->
></SCRIPT>
>
>
>We have analyzed that the problem is occurring because of the '<'
>character in the if statement. If the character is changed to, say, '=='
>then the problem does not occur. I think some parsing logic will need to
>be corrected for data within <SCRIPT> tags.
>
>Also, in many cases the ending script tag, i.e. </SCRIPT>, comes on the
>same line as the last tag, i.e. in this particular case on the line of
>"// -->". This will potentially cause </SCRIPT> to appear as a JavaScript
>comment.
>I think that, whatever the condition, </SCRIPT> should always be put on a
>new line.
>
>Regards,
>
>Dhaval Udani
>
>
>-----Original Message-----
>From: somik [mailto:so...@ya...]
>Sent: Wednesday, September 11, 2002 6:54 AM
>To: htmlparser-user
>Cc: somik
>Subject: Re: [Htmlparser-user] Anyone monitor this
>
>
>Hi Barry,
>    Which version are you using? Do you have the latest integration release?
>
>Regards,
>Somik
>----- Original Message -----
>From: "Barry Newman" <bar...@am...>
>To: <htm...@li...>
>Sent: Wednesday, September 11, 2002 2:30 AM
>Subject: [Htmlparser-user] Anyone monitor this
>
>
> > Don't know if anyone is monitoring this list, but I was wondering if anyone
> > had a patch for the problem where text before a comment tag is not parsed
> > correctly. I noticed on the SourceForge site that that bug was reported
> > and fixed, and I am experiencing the same problem. Wondering if anyone has
> > the code to fix this?
> >
> > Thanks.
> >
> > Barry Newman
> > Principal
> >
> > AMS
> > Bar...@AM...
>
>--__--__--
>
>End of Htmlparser-user Digest
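Since the original poster already renders pages with JEditorPane, the JDK's own Swing HTML parser is one alternative way to list a page's links while the htmlparser upgrade is pending. The sketch below swaps in `javax.swing.text.html.parser.ParserDelegator` with an `HTMLEditorKit.ParserCallback`; it is not the htmlparser API discussed in this thread, and the class name `SwingLinkExtractor` is made up for illustration. It collects the `href` of every `<a>` start tag.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: lists link targets using the JDK's Swing HTML parser,
// not the htmlparser library.
class SwingLinkExtractor {

    static List<String> extractLinks(Reader html) {
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Only anchor start tags carry the links we want.
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        try {
            // true = ignore any charset declared inside the document itself
            new ParserDelegator().parse(html, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<html><body><a href=\"/a.html\">A</a>"
                + "<p>text</p><a href=\"http://example.com/b\">B</a></body></html>";
        System.out.println(extractLinks(new StringReader(page)));
    }
}
```

For a live page, the `StringReader` can be replaced by a reader over the URL's input stream; printing the fetched page alongside the links makes it easy to see which document was actually parsed.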
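The TEXTAREA report in message 2 describes whitespace before the closing `>` being turned into an attribute with an empty name (`=""`). A minimal sketch of an attribute tokenizer that avoids this, assuming the tag body arrives as a string without the angle brackets: valueless attributes still get a blank value, but trailing whitespace simply ends the scan instead of creating an entry. `AttributeTokenizer` and its methods are hypothetical, not htmlparser's classes, and escaping of values on output is deliberately ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical attribute tokenizer; not htmlparser code.
class AttributeTokenizer {

    // Parses e.g. `TEXTAREA name="JohnDoe" ` into an ordered name -> value map.
    static Map<String, String> parseAttributes(String tagBody) {
        Map<String, String> attrs = new LinkedHashMap<>();
        int n = tagBody.length();
        int i = 0;
        // Skip the tag name itself.
        while (i < n && !Character.isWhitespace(tagBody.charAt(i))) i++;
        while (true) {
            while (i < n && Character.isWhitespace(tagBody.charAt(i))) i++;
            if (i >= n) break; // only whitespace was left: no further attribute
            int nameStart = i;
            while (i < n && !Character.isWhitespace(tagBody.charAt(i)) && tagBody.charAt(i) != '=') i++;
            String name = tagBody.substring(nameStart, i);
            while (i < n && Character.isWhitespace(tagBody.charAt(i))) i++;
            String value = ""; // valueless attributes get a blank value
            if (i < n && tagBody.charAt(i) == '=') {
                i++;
                while (i < n && Character.isWhitespace(tagBody.charAt(i))) i++;
                if (i < n && (tagBody.charAt(i) == '"' || tagBody.charAt(i) == '\'')) {
                    char quote = tagBody.charAt(i++);
                    int valueStart = i;
                    while (i < n && tagBody.charAt(i) != quote) i++;
                    value = tagBody.substring(valueStart, i);
                    if (i < n) i++; // skip the closing quote
                } else {
                    int valueStart = i;
                    while (i < n && !Character.isWhitespace(tagBody.charAt(i))) i++;
                    value = tagBody.substring(valueStart, i);
                }
            }
            if (!name.isEmpty()) attrs.put(name, value); // never emit an empty name
        }
        return attrs;
    }

    // Re-serialises a start tag from its name and attribute map.
    static String toHtml(String tagName, Map<String, String> attrs) {
        StringBuilder sb = new StringBuilder("<").append(tagName);
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            sb.append(' ').append(e.getKey()).append("=\"").append(e.getValue()).append('"');
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        Map<String, String> attrs = parseAttributes("TEXTAREA name=\"JohnDoe\" ");
        System.out.println(toHtml("TEXTAREA", attrs));
    }
}
```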
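For the SCRIPT report, one common approach (a swapped-in technique, not necessarily how htmlparser addressed it) is to treat everything after a `<SCRIPT ...>` start tag as raw text up to the next case-insensitive `</script`, so the `<` in `if(i < 5)` is never taken as the start of a tag. The hypothetical `ScriptScanner` class below also applies the rule requested in the thread of always putting `</SCRIPT>` on its own line when re-serialising.

```java
// Hypothetical script-body scanner; not htmlparser code.
class ScriptScanner {

    // Case-insensitive search that never changes string length (unlike toLowerCase).
    static int indexOfIgnoreCase(String s, String needle, int from) {
        for (int i = from; i + needle.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) return i;
        }
        return -1;
    }

    // bodyStart is the index just past the '>' of the <SCRIPT ...> start tag.
    // Returns the raw body; '<' characters inside it are not treated as tags.
    // If no closing tag exists, the rest of the input is returned.
    static String scriptBody(String html, int bodyStart) {
        int end = indexOfIgnoreCase(html, "</script", bodyStart);
        return end < 0 ? html.substring(bodyStart) : html.substring(bodyStart, end);
    }

    // Re-serialises a script element, forcing </SCRIPT> onto its own line.
    static String toHtml(String startTag, String body) {
        String separator = body.endsWith("\n") ? "" : "\n";
        return startTag + body + separator + "</SCRIPT>";
    }

    public static void main(String[] args) {
        String html = "<SCRIPT Language=\"JavaScript\">\n<!--\nif(i < 5)\ni = i - 1 ;\n// --></SCRIPT>";
        String body = scriptBody(html, html.indexOf('>') + 1);
        System.out.println(toHtml("<SCRIPT Language=\"JavaScript\">", body));
    }
}
```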
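Barry's question concerns text that precedes a comment being mis-parsed. A minimal sketch of the intended behaviour, assuming a flat scan over the input: any text before `<!--` is emitted as its own segment ahead of the comment, and an unterminated comment runs to the end of the input. `CommentSplitter` is a hypothetical name and not the patch referred to on the SourceForge site; a real parser would also handle tags inside the text segments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical splitter: separates text segments from <!-- ... --> comments.
class CommentSplitter {

    static List<String> split(String html) {
        List<String> parts = new ArrayList<>();
        int pos = 0;
        while (pos < html.length()) {
            int open = html.indexOf("<!--", pos);
            if (open < 0) {
                parts.add(html.substring(pos)); // trailing text, no more comments
                break;
            }
            if (open > pos) {
                parts.add(html.substring(pos, open)); // text before the comment is kept
            }
            int close = html.indexOf("-->", open + 4);
            int end = close < 0 ? html.length() : close + 3;
            parts.add(html.substring(open, end)); // the comment, delimiters included
            pos = end;
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("Hello <!-- note --> world"));
    }
}
```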