htmlparser-user Mailing List for HTML Parser (Page 91)
Brought to you by:
derrickoswald
|
From: Stephen H. <Ste...@tr...> - 2002-09-27 17:41:23
|
I have a simple document which I am trying to parse a link out of. Here is the code:
<html>
<body>
<DL>
<DT>YOUR QUERY WAS:
</DL>
Select one of the following documents to retrieve.
<P>
<HR>
<P><DL>
<DT><B>1:</B> <!-- hit --><A HREF="/cgi-bin/view_search?query_text=postdate>20020701&txt_clr=White&bg_clr=Red&url=http://localhost/Testing/Report 1.html">20020702 Report 1</A>
<DD><font size="-1">Score: 1000, Size: 7.4 kbytes, Type: URL file</font>
</DL>
</body>
</html>
The parser is getting confused by the '>' after the postdate. Instead of returning the whole link:
http://localhost/cgi-bin/view_search?query_text=postdate>20020701&txt_clr=White&bg_clr=Red&url=http://localhost/Testing/Report 1.html
only a portion of the link is returned:
http://localhost/cgi-bin/view_search?query_text
If the 'postdate>' is replaced by 'postdate=' then it functions properly. Seems like the parser is not looking at the double quotes.
I am using the latest integration build (1.2-2002_08_31).
Before digging into the source code and trying to fix the problem, I thought maybe someone might have run into this problem before.
Thanks,
--stephen
|
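The report above suggests the tag boundary is being found without regard to quotes. As a minimal sketch of the idea (this is not the htmlparser source; the class and method names are hypothetical), a scanner can track whether it is inside a quoted attribute value and only accept a '>' that appears outside quotes:

```java
public class QuoteAwareTagEnd {
    /**
     * Returns the index of the '>' that closes the tag beginning at start,
     * ignoring any '>' inside single- or double-quoted attribute values.
     * Returns -1 if no closing '>' is found.
     */
    static int findTagEnd(String html, int start) {
        char quote = 0; // 0 means "not inside a quoted value"
        for (int i = start; i < html.length(); i++) {
            char c = html.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0; // closing quote reached
                }
            } else if (c == '"' || c == '\'') {
                quote = c; // opening quote
            } else if (c == '>') {
                return i; // first '>' outside quotes ends the tag
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String tag = "<A HREF=\"/cgi-bin/view_search?query_text=postdate>20020701&txt_clr=White\">";
        int naive = tag.indexOf('>');        // stops inside the attribute value
        int aware = findTagEnd(tag, 0);      // stops at the real end of the tag
        System.out.println("naive: " + tag.substring(0, naive + 1));
        System.out.println("aware: " + tag.substring(0, aware + 1));
    }
}
```

With the sample link from the message, the naive search stops at the '>' after postdate, while the quote-aware search reaches the end of the tag. Unbalanced quotes in real-world HTML would need extra handling that this sketch does not attempt.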
|
From: Joe R. <jo...@le...> - 2002-09-19 16:25:09
|
Is there any way to modify the formAction tag? I tried passing a modified URL to formTag.setFormLocation(), but the new location isn't being output in the toHTML() conversion.
Regards,
Joe Ryburn
Technical Director
Lead Router LLC
Office 501-221-8865
Mobile 501-249-5015
|
|
From: Joe R. <jo...@le...> - 2002-09-19 14:45:27
|
I get the following when parsing the Yahoo travel home page...
http://travel.yahoo.com
Could not create parser object
com.kizna.html.util.HTMLParserException: Unexpected Exception occurred in HTMLParser.hasMoreNodes()http://travel.yahoo.com/;
com.kizna.html.util.HTMLParserException: HTMLReader.readElement() : Error occurred while trying to read the next element;
com.kizna.html.util.HTMLParserException: HTMLReader.readElement() : Error occurred while trying to decipher the tag using scanners;
com.kizna.html.util.HTMLParserException: HTMLTag.scan() : Error while scanning tag, tag contents = a href="/feature/special/fallvt/more/*http://www.yahoovacationstore.com/getaway/default.asp?.l=Y&.gt=Hello!", tagLine = href="/feature/special/fallvt/more/*http://www.yahoovacationstore.com/getaway/default.asp?.l=Y&.gt=Hello!">more...</;
com.kizna.html.util.HTMLParserException: HTMLLinkScanner.scan() : Error while scanning a link tag, current line = href="/feature/special/fallvt/more/*http://www.yahoovacationstore.com/getaway/default.asp?.l=Y&.gt=Hello!">more...</;
java.lang.StringIndexOutOfBoundsException: String index out of range: 0
at java.lang.String.charAt(String.java:455)
at com.kizna.html.scanners.HTMLLinkScanner.scan(HTMLLinkScanner.java:223)
Joe Ryburn
Technical Director
Lead Router LLC
Office 501-221-8865
Mobile 501-249-5015
|
|
From: <dha...@or...> - 2002-09-18 07:55:33
|
Hi Somik,
Just to update the below-mentioned list with a bug I had reported earlier:
If there are some special characters (we found a problem with <) within HTML comments, then all lines up to the line on which the character is present get deleted when you reprint the tag (using toHTML()). I have been using Node.toHTML() and I am assuming that the tag will get parsed as an HTMLRemarkNode and its toHTML() will get called. Whatever the case, the output is distinctly different from the input. Even the starting HTML comment marker, i.e. <!--, gets deleted.
Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457
-----Original Message-----
From: somik [mailto:so...@ya...]
Sent: Wednesday, September 18, 2002 11:55 AM
To: htmlparser-user
Cc: somik
Subject: Re: [Htmlparser-user] Parsing 'Base' tag ....
Hi Joe,
Thanks for bringing this up - it's been on my mind for a while. We should handle this before making the production release.
As of now, we have a couple of issues to sort out:
[1] Ensure all testcases pass on Linux
[2] Look into Dhaval's reports of modification of representation of tags
[3] Handle Base Tags as a special case of the link tag
[4] Add functionality to the feedback API
I might be able to spend some time on these from next week. But any help in any of these is highly appreciated.
Regards,
Somik
|
|
From: Somik R. <so...@ya...> - 2002-09-18 06:24:27
|
Hi Joe,
Thanks for bringing this up - it's been on my mind for a while. We should handle this before making the production release.
As of now, we have a couple of issues to sort out:
[1] Ensure all testcases pass on Linux
[2] Look into Dhaval's reports of modification of representation of tags
[3] Handle Base Tags as a special case of the link tag
[4] Add functionality to the feedback API
I might be able to spend some time on these from next week. But any help in any of these is highly appreciated.
Regards,
Somik
|
|
From: Joe R. <jo...@le...> - 2002-09-17 16:08:44
|
I believe any 'Base' tag encountered should be parsed as a special case of the Link tag, so the base URL is easily extracted. Has anyone made this modification?
Regards,
Joe Ryburn
Technical Director
Lead Router LLC
Office 501-221-8865
Mobile 501-249-5015
|
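Once a base URL has been extracted, relative links on the page can be resolved against it. The sketch below uses only the standard java.net.URI class and is independent of htmlparser; the class name, the helper name and the example.com addresses are assumptions made for illustration.

```java
import java.net.URI;

public class BaseHrefResolver {
    /**
     * Resolves a link found in the page against the value of the BASE HREF.
     * Absolute links are returned unchanged; relative ones are combined
     * with the base as URI.resolve() defines.
     */
    static String resolve(String baseHref, String link) {
        return URI.create(baseHref).resolve(link).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.example.com/docs/index.html"; // hypothetical BASE HREF value
        System.out.println(resolve(base, "guide/intro.html"));
        System.out.println(resolve(base, "/images/logo.gif"));
        System.out.println(resolve(base, "http://other.example.org/page.html"));
    }
}
```

Note that URI.create() rejects strings containing illegal characters such as unencoded spaces, so links taken from real pages may need to be escaped before they are resolved.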
|
From: ope t. <op...@ho...> - 2002-09-16 16:23:28
|
I will update the parser and let you know the results.
Thanks
|
|
From: Claude D. <CD...@ar...> - 2002-09-13 19:28:29
|
The integration builds are very stable, thanks to a comprehensive suite
of unit tests. We have tested several hundred thousand files in our
production system without any significant problems.
-----Original Message-----
From: Stephen Harrington [mailto:Ste...@tr...]
Sent: Friday, September 13, 2002 12:18 PM
To: htm...@li...
Subject: [Htmlparser-user] Production vs. Integration?
I am a relatively new user of the htmlparser. It is being utilized in a
production system which we deliver, so I am not so keen on using
"integration builds". I am utilizing Version 1.1
I saw the following in response to a question:
Hi,
You can try the same thing with runParser http://www.amazon.com -l
It works fine for me, but from your code it looks like you are using
htmlparser 1.1. That is very old.
Can u upgrade to the latest integration release ?
How stable are the integration builds? Could you suggest one which
would be appropriate for a production system?
Thanks,
--stephen
-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
Htmlparser-user mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-user
|
|
From: Stephen H. <Ste...@tr...> - 2002-09-13 19:20:48
|
I am a relatively new user of the htmlparser. It is being utilized in a
production system which we deliver, so I am not so keen on using
"integration builds". I am utilizing Version 1.1
I saw the following in response to a question:
Hi,
You can try the same thing with runParser http://www.amazon.com -l
It works fine for me, but from your code it looks like you are using
htmlparser 1.1. That is very old.
Can u upgrade to the latest integration release ?
How stable are the integration builds? Could you suggest one which
would be appropriate for a production system?
Thanks,
--stephen
|
|
From: <dha...@or...> - 2002-09-13 06:38:47
|
The script bug quoted below only occurs if the JavaScript is written within
HTML comment tags. The same code written without the HTML comment tags
works fine.
One more parsing bug that we have come across and I'd like to report.
If I have a tag as follows <TEXTAREA name="JohnDoe" ></TEXTAREA> (Note
the space before the closing '>' of TEXTAREA tag).
On reproduction using toHTML() of TEXTAREA I get the following
<TEXTAREA ="" name="JohnDoe"></TEXTAREA>
I think this might have been introduced with the fix which took names
without values and assigned blank strings to them.
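To illustrate the kind of check that would avoid the stray ="" attribute, here is a minimal sketch of an attribute parser that skips trailing whitespace instead of treating it as an empty attribute name. It is not the htmlparser implementation; the class and method names are hypothetical, and valueless names are mapped to an empty string as the message describes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AttributeSketch {
    /** Parses the text between '<' and '>' into attributes; valueless names map to "". */
    static Map<String, String> parseAttributes(String tagContents) {
        Map<String, String> attrs = new LinkedHashMap<>();
        int i = 0;
        int n = tagContents.length();
        // Skip the tag name itself.
        while (i < n && !Character.isWhitespace(tagContents.charAt(i))) i++;
        while (i < n) {
            while (i < n && Character.isWhitespace(tagContents.charAt(i))) i++;
            if (i >= n) break; // only trailing whitespace was left: no attribute here
            int nameStart = i;
            while (i < n && !Character.isWhitespace(tagContents.charAt(i))
                    && tagContents.charAt(i) != '=') i++;
            String name = tagContents.substring(nameStart, i);
            String value = "";
            if (i < n && tagContents.charAt(i) == '=') {
                i++;
                if (i < n && (tagContents.charAt(i) == '"' || tagContents.charAt(i) == '\'')) {
                    char quote = tagContents.charAt(i++);
                    int valueStart = i;
                    while (i < n && tagContents.charAt(i) != quote) i++;
                    value = tagContents.substring(valueStart, i);
                    if (i < n) i++; // step over the closing quote
                } else {
                    int valueStart = i;
                    while (i < n && !Character.isWhitespace(tagContents.charAt(i))) i++;
                    value = tagContents.substring(valueStart, i);
                }
            }
            if (!name.isEmpty()) attrs.put(name, value); // never record a nameless attribute
        }
        return attrs;
    }

    /** Rebuilds the opening tag from the parsed attributes. */
    static String render(String tagName, Map<String, String> attrs) {
        StringBuilder sb = new StringBuilder("<").append(tagName);
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            sb.append(' ').append(e.getKey()).append("=\"").append(e.getValue()).append('"');
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        // Note the space before the end of the tag contents, as in the report.
        Map<String, String> attrs = parseAttributes("TEXTAREA name=\"JohnDoe\" ");
        System.out.println(render("TEXTAREA", attrs));
    }
}
```

The two guards that matter are the break after skipping whitespace and the refusal to store an empty name; either one alone would keep the trailing space from turning into an attribute.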
Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457
-----Original Message-----
From: Udani, Dhaval H.
Sent: Thursday, September 12, 2002 2:05 PM
To: htmlparser-user
Cc: Udani, Dhaval H.
Subject: [Htmlparser-user] Script tags bug
Hi,
The following code :
<SCRIPT Language="JavaScript">
<!--
function validateForm()
{
var i = 10 ;
if(i < 5)
i = i - 1 ;
return true;
}
// -->
gets converted to :
<SCRIPT Language="JavaScript">
if(i < 5)
i = i - 1 ;
return true;
}
// -->
</SCRIPT>
We have analyzed that the problem is occurring because of the '<'
character in the if statement. If the character is changed to, say, '=='
then the problem does not occur. I think some parsing logic will need to
be corrected for data within <SCRIPT> tags.
Also, in many cases the ending script tag, i.e. </SCRIPT>, comes on the
same line as the last tag, i.e. in this particular case on the line of //
-->. This will potentially cause </SCRIPT> to appear as a JavaScript
comment. I think whatever the condition, </SCRIPT> should always be
put on a new line.
Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457
-----Original Message-----
From: somik [mailto:so...@ya...]
Sent: Wednesday, September 11, 2002 6:54 AM
To: htmlparser-user
Cc: somik
Subject: Re: [Htmlparser-user] Anyone monitor this
Hi Barry
Which version are u using ? Do u have the latest integration release ?
Regards,
Somik
----- Original Message -----
From: "Barry Newman" <bar...@am...>
To: <htm...@li...>
Sent: Wednesday, September 11, 2002 2:30 AM
Subject: [Htmlparser-user] Anyone monitor this
> Don't know if anyone is monitoring this list, but I was wondering if anyone
> had a patch for the problem where text before a comment tag is not parsed
> correctly. I noticed on the sourceforge site that that bug was reported
> and fixed and I am experiencing the same problem. Wondering if anyone has
> the code to fix this?
>
> Thanks.
>
>
>
>
> Barry Newman
> Principal
>
> AMS
> Bar...@AM...
>
>
>
>
|
|
From: Somik R. <so...@ya...> - 2002-09-13 05:54:44
|
Hi,
You can try the same thing with runParser http://www.amazon.com -l
It works fine for me, but from your code it looks like you are using
htmlparser 1.1. That is very old.
Can u upgrade to the latest integration release ?
Regards,
Somik
----- Original Message -----
From: "ope tomori" <op...@ho...>
To: <htm...@li...>
Sent: Friday, September 13, 2002 12:23 AM
Subject: [Htmlparser-user] help desperately needed! parser wont parse properly
>
>
> Hello anyone.. Im using this parser on a research project. Im building a
> browser in java, using JEditorPane as the panel that displays the html on
> the websites. I have succeeded in doing that.
>
> The next step was to parse the links on the website and we came across this
> parser, anyway, i set up the kizna classes and i used this piece of code:
>
> //this is in the actionPerformed function, when you press the "GO" Button
>
> HTMLParser parser = new HTMLParser(urlAddress);
> parser.registerScanners();
> for (Enumeration e = parser.elements();e.hasMoreElements();) {
> HTMLNode node = (HTMLNode)e.nextElement();
> if (node instanceof HTMLLinkTag) {
> HTMLLinkTag linkTag = (HTMLLinkTag)node;
> System.out.println("Link Tag is " + linkTag.getLink());
> }
> }
>
> when i run the browser with say, amazon.com, this is the result i get:
> *****************************************************
> Address : http://www.amazon.com
> tagContents: a
> href="http://www.amazon.com/exec/obidos/subst/home/home.html/ref=wt_404page/"
> Link Tag is
> http://www.amazon.com/exec/obidos/subst/home/home.html/ref=wt_404page/
> tagContents: table border=0 align=center cellpadding=4
> tagContents: a
> href="http://www.amazon.com/exec/obidos/subst/home/home.html/ref=404page/"
> Link Tag is
> http://www.amazon.com/exec/obidos/subst/home/home.html/ref=404page/
>
>
> ***********************************88
>
> when i checked the link tag, its redirects to the amazon home page. Can
> someone pls tell me what im doing wrong?
>
> Thanks
>
>
>
|
|
From: ope t. <op...@ho...> - 2002-09-12 18:54:07
|
Hello anyone.. I'm using this parser on a research project. I'm building a
browser in Java, using JEditorPane as the panel that displays the HTML on
the websites. I have succeeded in doing that.
The next step was to parse the links on the website and we came across this
parser. Anyway, I set up the kizna classes and I used this piece of code:
// this is in the actionPerformed function, when you press the "GO" Button
HTMLParser parser = new HTMLParser(urlAddress);
parser.registerScanners();
// Walk every node the parser produces and print the target of each link tag.
for (Enumeration e = parser.elements(); e.hasMoreElements();) {
    HTMLNode node = (HTMLNode) e.nextElement();
    if (node instanceof HTMLLinkTag) {
        HTMLLinkTag linkTag = (HTMLLinkTag) node;
        System.out.println("Link Tag is " + linkTag.getLink());
    }
}
When I run the browser with, say, amazon.com, this is the result I get:
*****************************************************
Address : http://www.amazon.com
tagContents: a
href="http://www.amazon.com/exec/obidos/subst/home/home.html/ref=wt_404page/"
Link Tag is
http://www.amazon.com/exec/obidos/subst/home/home.html/ref=wt_404page/
tagContents: table border=0 align=center cellpadding=4
tagContents: a
href="http://www.amazon.com/exec/obidos/subst/home/home.html/ref=404page/"
Link Tag is
http://www.amazon.com/exec/obidos/subst/home/home.html/ref=404page/
***********************************88
When I checked the link tag, it redirects to the Amazon home page. Can
someone please tell me what I'm doing wrong?
Thanks
|
|
From: <dha...@or...> - 2002-09-12 08:35:55
|
Hi,
The following code :
<SCRIPT Language="JavaScript">
<!--
function validateForm()
{
var i = 10 ;
if(i < 5)
i = i - 1 ;
return true;
}
// -->
gets converted to :
<SCRIPT Language="JavaScript">
if(i < 5)
i = i - 1 ;
return true;
}
// -->
</SCRIPT>
We have determined that the problem is caused by the '<'
character in the if statement. If the operator is changed to, say, '=='
then the problem does not occur. I think the parsing logic needs to
be corrected for data within <SCRIPT> tags.
Also, in many cases the ending script tag, i.e. </SCRIPT>, comes out on the
same line as the last line of the script, in this particular case on the
// --> line. This can cause </SCRIPT> to be read as part of a JavaScript
comment. I think that, whatever the condition, </SCRIPT> should always be
put on a new line.
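To illustrate the kind of correction I mean, here is a minimal sketch in Java. It assumes the scanner can treat script content as opaque text: once the <SCRIPT> start tag has been read, it does not look for tags at '<' at all and simply copies everything up to the next </SCRIPT (matched case-insensitively). The names ScriptBodyExtractor and extractBody are hypothetical and are not part of the HTMLParser API.

```java
final class ScriptBodyExtractor {

    /** Case-insensitive indexOf that never changes string length (unlike toLowerCase). */
    private static int indexOfIgnoreCase(String s, String needle, int from) {
        for (int i = Math.max(from, 0); i + needle.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Returns the raw text between the end of the first <script ...> start tag
     * and the next </script end tag, or null if either one is missing.
     * Limitation of this sketch: a '>' inside an attribute value of the start
     * tag would end the start tag too early.
     */
    static String extractBody(String html) {
        int open = indexOfIgnoreCase(html, "<script", 0);
        if (open < 0) {
            return null;
        }
        int startTagEnd = html.indexOf('>', open);
        if (startTagEnd < 0) {
            return null;
        }
        int bodyStart = startTagEnd + 1;
        // Inside the body, '<' has no special meaning; only "</script" ends it.
        int close = indexOfIgnoreCase(html, "</script", bodyStart);
        if (close < 0) {
            return null;
        }
        return html.substring(bodyStart, close);
    }

    public static void main(String[] args) {
        String page = "<SCRIPT Language=\"JavaScript\">\nif(i < 5)\ni = i - 1 ;\n</SCRIPT>";
        System.out.println(extractBody(page));
    }
}
```

With this approach the `if(i < 5)` line is kept because the '<' is never handed to the tag scanner.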
Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457
-----Original Message-----
From: somik [mailto:so...@ya...]
Sent: Wednesday, September 11, 2002 6:54 AM
To: htmlparser-user
Cc: somik
Subject: Re: [Htmlparser-user] Anyone monitor this
Hi Barry
Which version are you using? Do you have the latest integration release?
Regards,
Somik
----- Original Message -----
From: "Barry Newman" <bar...@am...>
To: <htm...@li...>
Sent: Wednesday, September 11, 2002 2:30 AM
Subject: [Htmlparser-user] Anyone monitor this
> Don't know if anyone is monitoring this list, but I was wondering if anyone
> had a patch for the problem where text before a comment tag is not parsed
> correctly. I noticed on the sourceforge site that that bug was reported
> and fixed and I am experiencing the same problem. Wondering if anyone has
> the code to fix this?
>
> Thanks.
>
>
>
>
> Barry Newman
> Principal
>
> AMS
> Bar...@AM...
>
>
>
>
|
|
From: Somik R. <so...@ya...> - 2002-09-11 01:23:38
|
Hi Barry
Which version are you using? Do you have the latest integration release?
Regards,
Somik
----- Original Message -----
From: "Barry Newman" <bar...@am...>
To: <htm...@li...>
Sent: Wednesday, September 11, 2002 2:30 AM
Subject: [Htmlparser-user] Anyone monitor this
> Don't know if anyone is monitoring this list, but I was wondering if anyone
> had a patch for the problem where text before a comment tag is not parsed
> correctly. I noticed on the sourceforge site that that bug was reported
> and fixed and I am experiencing the same problem. Wondering if anyone has
> the code to fix this?
>
> Thanks.
>
>
>
>
> Barry Newman
> Principal
>
> AMS
> Bar...@AM...
>
>
>
>
|
|
From: Barry N. <bar...@am...> - 2002-09-10 21:00:58
|
Don't know if anyone is monitoring this list, but I was wondering if anyone
had a patch for the problem where text before a comment tag is not parsed
correctly. I noticed on the sourceforge site that that bug was reported
and fixed and I am experiencing the same problem. Wondering if anyone has
the code to fix this?
Thanks.
Barry Newman
Principal
AMS
Bar...@AM...
|
|
From: Somik R. <so...@ya...> - 2002-09-06 09:49:19
|
Hi Folks,
I am on the road now, in Singapore. I will be travelling for the next 2
weeks, so I cannot be regular with mails till then.
Dhaval and Joe --> Indeed, I understand the issue, that the output sometimes
gets messed up. But storing all newlines would make the parser a whole lot more
complex than it is right now.
I am however open to examining it when I find some time (from October).
This does not stop anyone from doing an analysis to see what can be done.
That would be a good contribution for the community.
So please go ahead, and let us know what you think.
Cheers,
Somik
----- Original Message -----
From: <dha...@or...>
To: <htm...@li...>
Sent: Friday, September 06, 2002 1:00 PM
Subject: RE: [Htmlparser-user] Quick beginner question...
Yeah, that's the point. However, it is not only newlines but also any spaces,
any tabs etc. A well-formed HTML document given to the parser must come out
the same at the output of the parser.
Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457
-----Original Message-----
From: jryburn [mailto:jr...@ya...]
Sent: Thursday, September 05, 2002 7:14 PM
To: htmlparser-user
Cc: jryburn
Subject: RE: [Htmlparser-user] Quick beginner question...
I'd suggest that we also parse newlines on the input, and perhaps
store them as a tag as well. Then we can walk through the parsed code
and print all other tags without newlines, and the newline tag would
then be printed to output as a newline.
Joe Ryburn
Technical Director
Lead Router LLC
Office 501-221-8865
Mobile 501-249-5015
-----Original Message-----
From: htm...@li...
[mailto:htm...@li...] On Behalf Of
dha...@or...
Sent: Thursday, September 05, 2002 12:16 AM
To: htm...@li...
Subject: RE: [Htmlparser-user] Quick beginner question...
|
|
From: <dha...@or...> - 2002-09-06 05:00:48
|
Yeah, that's the point. However, it is not only newlines but also any spaces,
any tabs etc. A well-formed HTML document given to the parser must come out
the same at the output of the parser.
Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457
-----Original Message-----
From: jryburn [mailto:jr...@ya...]
Sent: Thursday, September 05, 2002 7:14 PM
To: htmlparser-user
Cc: jryburn
Subject: RE: [Htmlparser-user] Quick beginner question...
I'd suggest that we also parse newlines on the input, and perhaps
store them as a tag as well. Then we can walk through the parsed code
and print all other tags without newlines, and the newline tag would
then be printed to output as a newline.
Joe Ryburn
Technical Director
Lead Router LLC
Office 501-221-8865
Mobile 501-249-5015
-----Original Message-----
From: htm...@li...
[mailto:htm...@li...] On Behalf Of
dha...@or...
Sent: Thursday, September 05, 2002 12:16 AM
To: htm...@li...
Subject: RE: [Htmlparser-user] Quick beginner question...
|
|
From: Joe R. <jr...@ya...> - 2002-09-05 13:43:42
|
I'd suggest that we also parse newlines on the input, and perhaps
store them as a tag as well. Then we can walk through the parsed code
and print all other tags without newlines, and the newline tag would
then be printed to output as a newline.
Joe Ryburn
Technical Director
Lead Router LLC
Office 501-221-8865
Mobile 501-249-5015
-----Original Message-----
From: htm...@li...
[mailto:htm...@li...] On Behalf Of
dha...@or...
Sent: Thursday, September 05, 2002 12:16 AM
To: htm...@li...
Subject: RE: [Htmlparser-user] Quick beginner question...
|
|
From: <dha...@or...> - 2002-09-05 05:18:26
|
Hi Joe,
In the HTMLParser, all the tags are printed on new lines, so the output
that you are getting is the expected output. We have discussed this problem
on the list; you can probably check it out from the archives.
But more importantly, the HTMLParser is changing the HTML file given to
it, which it must not do (whether it is a browser bug or not should be
immaterial to the parser), i.e. the input and output of the parser must match,
especially in the presentation aspects. I too think something should be
done about it.
What do you say, Somik?
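To make the round-trip requirement concrete, here is a rough sketch in Java. It is not how HTMLParser works today; it assumes a much simplified tokenizer that keeps every character, including whitespace-only text between tags, so that joining the tokens reproduces the input exactly. LosslessTokenizer, tokenize and toHtml are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class LosslessTokenizer {

    /**
     * Splits html into alternating tag tokens ("<...>") and text tokens.
     * No character is dropped, so whitespace between tags survives as text.
     * Simplification: every '<' is treated as the start of a tag.
     */
    static List<String> tokenize(String html) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < html.length()) {
            int end;
            if (html.charAt(pos) == '<') {
                // Tag token: runs up to and including the next '>'.
                int gt = html.indexOf('>', pos);
                end = (gt < 0) ? html.length() : gt + 1;
            } else {
                // Text token: runs up to (not including) the next '<'.
                int lt = html.indexOf('<', pos);
                end = (lt < 0) ? html.length() : lt;
            }
            tokens.add(html.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    /** Writes the tokens back without inserting any separators. */
    static String toHtml(List<String> tokens) {
        return String.join("", tokens);
    }

    public static void main(String[] args) {
        String input = "<TD>\n<B>&#149;</B>&nbsp;\n</TD>";
        System.out.println(toHtml(tokenize(input)).equals(input));
    }
}
```

The design choice is that the writer never adds line breaks of its own; any newline in the output was a text token in the input.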
Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457
-----Original Message-----
From: jryburn [mailto:jr...@ya...]
Sent: Thursday, September 05, 2002 12:48 AM
To: htmlparser-user
Cc: jryburn
Subject: [Htmlparser-user] Quick beginner question...
I'm not sure if this is a browser bug or a parser bug, but the
following code...
<TABLE width="100%" cellspacing="0" cellpadding="1" border="0">
<TR>
<TD valign="top">
<FONT face="arial" size="-1"><B>•</B> </FONT>
</TD>
<TD>
<A href="s/15341"><FONT face="arial" size="-1">Bush vows to
seek Congress' OK on Iraq</FONT></A>
</TD>
</TR>
</TABLE>
when parsed by the html parser and rewritten, is output with the first
<TD> element broken up as follows...
<FONT face="arial" size="-1">
<B>
•
</B>
</FONT>
This renders differently than when they are joined. I didn't think
whitespace was supposed to affect presentation but here it seems to be
significant, in both Internet Explorer and Mozilla. This is from the
news headline table on the right of the main 'www.yahoo.com' page. Is
there a way to output this to render correctly using HTMLParser?
Regards,
Joe Ryburn
|
|
From: Joe R. <jr...@ya...> - 2002-09-04 19:17:56
|
I'm not sure if this is a browser bug or a parser bug, but the
following code...
<TABLE width="100%" cellspacing="0" cellpadding="1" border="0">
<TR>
<TD valign="top">
<FONT face="arial" size="-1"><B>•</B> </FONT>
</TD>
<TD>
<A href="s/15341"><FONT face="arial" size="-1">Bush vows to
seek Congress' OK on Iraq</FONT></A>
</TD>
</TR>
</TABLE>
when parsed by the html parser and rewritten, is output with the first
<TD> element broken up as follows...
<FONT face="arial" size="-1">
<B>
•
</B>
</FONT>
This renders differently than when they are joined. I didn't think
whitespace was supposed to affect presentation but here it seems to be
significant, in both Internet Explorer and Mozilla. This is from the
news headline table on the right of the main 'www.yahoo.com' page. Is
there a way to output this to render correctly using HTMLParser?
Regards,
Joe Ryburn
|
|
From: Somik R. <so...@ya...> - 2002-09-01 03:47:54
|
Hi Folks,
Integration Release 1.2-2002_08_31 is out.
Changes are :
[1] Feedback integrated into the API. Not yet functional - but will be
over the next few releases. The API change has been put in early. This
is the last planned change in the API for production release - 1.2.
[2] End of Line String implemented across all scanners. Some test cases
might fail if run in linux - this will be fixed over the next
integration release.
Regards,
Somik
|
|
From: Somik R. <so...@ya...> - 2002-08-29 03:13:30
|
Hi Bahman,
Yes, this is possible. Actually, we do have a form scanner to make
life easier. But the form scanner is not integrated (you can register
it yourself), as bad form tags cannot be corrected, and this parser has an
auto-correcting feature. We're in the process of finalizing v1.2 - maybe
we can put back the form scanner, since we have our exception handling
system in place.
To get started, download v1.2 (latest integration release), and look
at some sample applications in the com.kizna.html.parserapplications
package. It's very easy to get the tags you want, and render them back to
HTML, using toHTML(). In fact, check the threads in this list (from the
archives) from Dhaval Udani - he also tackled a very similar problem.
Regards,
Somik
----- Original Message -----
From: bk...@cs...
To: htm...@li...
Sent: Thursday, August 29, 2002 8:54 AM
Subject: [Htmlparser-user] Annotatting a HTML file
Hi,
I have a problem that I want to see if I can use this HTML parser to
solve it.
I need to read a HTML file from my file system, merge this file with
a user provided data on a HTML FORM and save it back to the file system. The
idea is that I want to annotate a HTML file.
Thanks in advance for your help.
--Bahman
---------------------------------------------
This message was sent using Endymion MailMan.
http://www.endymion.com/products/mailman/
|
|
From: <bk...@cs...> - 2002-08-28 23:42:05
|
Hi,
I have a problem that I want to see if I can use this HTML parser to
solve it.
I need to read a HTML file from my file system, merge this file with
a user provided data on a HTML FORM and save it back to the file system. The
idea is that I want to annotate a HTML file.
Thanks in advance for your help.
--Bahman
---------------------------------------------
This message was sent using Endymion MailMan.
http://www.endymion.com/products/mailman/
|
|
From: Somik R. <so...@ya...> - 2002-08-26 01:52:38
|
Hi Folks,
Integration Release 1.2-2002_08_26 is out. From the change log :
[1] Added new tag-scanner pairs for the following tags:
(i) INPUT
(ii) TEXTAREA
(iii) OPTION
(iv) SELECT
[2] The newline character being used in toHTML() was different across
different tags and also in some cases in scan() methods of scanners. It
has been changed to pick up the system property "line.separator" from
the JVM environment and that is used in toHTML() as well as scan().
Alternatively, users can set their own line separators using
HTMLParser.setLineSeparator() or HTMLNode.setLineSeparator(). Both have
the same effect.
[3] Fixed HTMLRemarkNode bugs (594301)
[1] and [2] were done by Dhaval Udani, of OrbiTech Solutions Ltd.
[3] was done by John Zook. John - thanks for sending in the bugs and
their fixes (it's a pleasure to receive the latter with the bug reports).
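For anyone curious about the idea behind [2], it can be sketched roughly as below: default to the JVM's "line.separator" system property and allow an explicit override. LineSeparatorConfig is a hypothetical stand-in written for this mail; it does not reproduce the actual code behind HTMLParser.setLineSeparator() or HTMLNode.setLineSeparator().

```java
final class LineSeparatorConfig {

    // Default comes from the JVM environment, e.g. "\r\n" on Windows, "\n" on Linux.
    private static String lineSeparator = System.getProperty("line.separator");

    /** Overrides the separator used when rendering output. */
    static void setLineSeparator(String separator) {
        if (separator == null) {
            throw new IllegalArgumentException("separator must not be null");
        }
        lineSeparator = separator;
    }

    static String getLineSeparator() {
        return lineSeparator;
    }

    /** Joins already-rendered pieces using the configured separator. */
    static String joinLines(String... lines) {
        return String.join(lineSeparator, lines);
    }

    public static void main(String[] args) {
        setLineSeparator("\n");
        System.out.println(joinLines("<B>", "text", "</B>"));
    }
}
```

Keeping the separator in one place is what makes the output consistent across tags and scanners; tests that hard-code one platform's separator would still fail on another platform unless they set it explicitly.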
Cheers,
Somik
|
|
From: Somik R. <so...@ya...> - 2002-08-18 00:57:15
|
Hi Chris,
This hasn't been attempted yet. If you do anything in this regard, we'd
be happy to include it in the parser.
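As a possible starting point, here is a rough sketch that uses only the standard java.net classes, outside the parser itself. It assumes the page is fetched through a URLConnection; how the parser would expose that connection is not settled here, so CookieHelper and its methods are hypothetical names.

```java
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CookieHelper {

    /** Builds a Cookie request header value such as "a=1; b=2". */
    static String toCookieHeader(Map<String, String> cookies) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : cookies.entrySet()) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    /** Attaches the cookies to the request; call before the connection is opened. */
    static void sendCookies(URLConnection connection, Map<String, String> cookies) {
        if (!cookies.isEmpty()) {
            connection.setRequestProperty("Cookie", toCookieHeader(cookies));
        }
    }

    /** Reads name=value pairs from the Set-Cookie response headers. */
    static Map<String, String> readCookies(URLConnection connection) {
        // Assumption: the server sends the header name spelled exactly "Set-Cookie".
        List<String> headers = connection.getHeaderFields().get("Set-Cookie");
        return parseSetCookie(headers == null ? new ArrayList<String>() : headers);
    }

    /** Keeps only the leading name=value of each header; Path, Expires etc. are ignored. */
    static Map<String, String> parseSetCookie(List<String> setCookieValues) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String value : setCookieValues) {
            String pair = value.split(";", 2)[0].trim();
            int eq = pair.indexOf('=');
            if (eq > 0) {
                cookies.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return cookies;
    }
}
```

The cookies returned by readCookies could be kept by the caller and passed to sendCookies on the next request, which is the simplest form of passing cookies through.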
Regards,
Somik
----- Original Message -----
From: "Chris Carey" <ch...@su...>
To: <htm...@li...>
Sent: Saturday, August 17, 2002 8:54 AM
Subject: [Htmlparser-user] Passing Cookies through parser
>
> I would like to pass cookies through the parser to the server, and have
> the server respond back with the cookie headers.
>
> Has anyone dealt with the issue of cookies through the parser?
>
> -Chris Carey
>
>
>
|