htmlparser-developer Mailing List for HTML Parser (Page 31)
Brought to you by:
derrickoswald
From: Somik R. <so...@ya...> - 2002-05-07 06:25:34
|
Hi Folks,

Following some nice suggestions from Sam Joseph, I have just completed some design modifications to the basic HTMLNode API. The modifications are:

[1] HTMLNode is no longer an interface, but an abstract class. There were two reasons for this change. Firstly, I couldn't think of a scenario where an object would be an HTML tag AND something else. Secondly, I wanted to enforce the implementation of toString(), which is usually not done if you implement from the interface (as Object has a default toString()).
[2] abstract toString() method - children have to implement this.
[3] print() and print(PrintWriter) - final methods. They make a call to toString(), and print to standard output and to the print writer respectively.
[4] toPlainTextString() - this method provides a string representation of a tag, if there is such a representation. If not, a blank string is returned. This has implications - our program to extract all strings from an HTML page is simplified to:

    HTMLNode node;
    for (Enumeration e = parser.elements(); e.hasMoreElements();) {
        node = (HTMLNode)e.nextElement();
        System.out.println(node.toPlainTextString()); // or whatever processing you want to do with the string
    }

[5] toRawString() - this method provides the complete HTML element (a reconstruction), thus allowing ripping programs to be really simple. So if you want to rip the HTML page to your local hard disk, your program would look like:

    PrintWriter pw = new PrintWriter(new FileWriter("..."));
    for (Enumeration e = parser.elements(); e.hasMoreElements();) {
        node = (HTMLNode)e.nextElement();
        pw.println(node.toRawString());
    }
    pw.close();

[6] Lots of bug fixes done - HTMLImageScanner had a bug, HTMLStyleScanner also had one - all caught with more testcases.

We have 100 testcases as of now, all of them passing.

To-do list for Release 1.2
------------------------------------
[1] Integration of Raghavender Srimantula's contribution - HTMLFrameScanner and HTMLFormScanner - into the parser. This will be integrated as soon as I get the testcases from Raghav.
[2] Adding an HTML ripping program in the parserApplications package.
[3] Improving the Robot Crawler (??)
[4] Fixes for any bugs that get reported in this period.

You can check out the latest code from CVS. Or you can go to http://htmlparser.sourceforge.net, click on the download link, and choose htmlparser1_2_20020507.zip

Feedback is welcome.

Regards,
Somik |
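The API changes listed in [1] to [5] above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's source: SketchNode, SketchText and SketchTag are hypothetical names, and only the method names toString(), print(), toPlainTextString() and toRawString() are taken from the message.

```java
import java.io.PrintWriter;

// Hypothetical stand-in for the abstract node class described in the message.
abstract class SketchNode {
    // Redeclaring toString() as abstract forces every subclass to implement it.
    // An interface cannot enforce this, because Object already supplies a default.
    @Override
    public abstract String toString();

    // Text content of the node, or an empty string if it has none.
    public abstract String toPlainTextString();

    // Reconstruction of the original markup for this node.
    public abstract String toRawString();

    // Final: subclasses customise output only through toString().
    public final void print() {
        System.out.println(toString());
    }

    public final void print(PrintWriter pw) {
        pw.println(toString());
    }
}

// Hypothetical text node: plain and raw forms are the same.
final class SketchText extends SketchNode {
    private final String text;

    SketchText(String text) { this.text = text; }

    @Override public String toString() { return "Text: " + text; }
    @Override public String toPlainTextString() { return text; }
    @Override public String toRawString() { return text; }
}

// Hypothetical tag node: no plain-text form, raw form restores the brackets.
final class SketchTag extends SketchNode {
    private final String contents; // everything between '<' and '>'

    SketchTag(String contents) { this.contents = contents; }

    @Override public String toString() { return "Tag: " + contents; }
    @Override public String toPlainTextString() { return ""; }
    @Override public String toRawString() { return "<" + contents + ">"; }
}
```

With this shape, concatenating toPlainTextString() over all nodes gives the page text, and concatenating toRawString() gives back the markup, which is what the two loops in the message rely on.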
From: Somik R. <so...@ya...> - 2002-05-03 09:26:01
|
Hi Folks,

A testing build is out - you can download it from http://htmlparser.sourceforge.net (choose the download link). This is a testing build with important bug fixes.

Regards,
Somik |
From: Somik R. <so...@ya...> - 2002-05-03 08:15:23
|
Hi Folks,

We seem to have a heroic parser now... You can check out the latest code from CVS.

Here's the fix. As you know - if we have an additional erroneous inverted comma in a tag, the parser cannot judge whether to treat this as erroneous or valid. Now the parser has some amount of intelligence - if it encounters an inverted comma and a close tag character, it does a check to see whether it should treat this as an error or as a valid character.

This decision making process is facilitated with a strictVector - which holds the tags for which it should not make allowances. Currently, there is only one - "INPUT" (should we have any more?). If the tag being parsed is not a strict tag like INPUT, then it is assumed that this is an erroneous tag and needs to be corrected.

The correction process is validated with some testcases in HTMLTag - particularly testStrictParsing. If you go through that testcase, you will see that the attributes are also correctly retrieved.

This solution doesn't break anything else - we have 82 testcases, all passing.

I'd be grateful if folks can test this version and let me know if this solution is acceptable.

Also - a general question - would you prefer something like nightly drop packages for downloading, or is a request to check out from CVS fine?

Thanks and Regards,
Somik |
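One possible reading of the strict-tag check described above, as a minimal sketch. The class LenientTagEnd, its method names and the exact rule are assumptions made for illustration; the message does not show the parser's actual code. The rule used here: if a quote that would open a value is immediately followed by '>', a non-strict tag treats the pair as the end of the tag, while a strict tag such as INPUT treats '>' as the start of a quoted value.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the HTMLParser library.
final class LenientTagEnd {
    // Tags for which no allowances are made (the message names only INPUT).
    private static final Set<String> STRICT = new HashSet<>(Arrays.asList("INPUT"));

    // Returns the index of the '>' that ends the tag opening at 'start', or -1 if none is found.
    static int find(String line, int start) {
        boolean strict = STRICT.contains(tagName(line, start));
        boolean inQuotes = false;
        for (int i = start + 1; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                boolean opening = !inQuotes;
                boolean closeTagFollows = i + 1 < line.length() && line.charAt(i + 1) == '>';
                if (opening && closeTagFollows && !strict) {
                    // An "opening" quote right before '>' suggests an earlier stray quote
                    // upset the count, so treat '>' as the real end of the tag.
                    return i + 1;
                }
                inQuotes = !inQuotes;
            } else if (c == '>' && !inQuotes) {
                return i;
            }
        }
        return -1;
    }

    // Upper-cased run of letters and digits directly after '<'.
    private static String tagName(String line, int start) {
        int end = start + 1;
        while (end < line.length() && Character.isLetterOrDigit(line.charAt(end))) {
            end++;
        }
        return line.substring(start + 1, end).toUpperCase(Locale.ROOT);
    }
}
```

The trade-off in this sketch is that a non-strict tag whose quoted value really does begin with '>' would be cut short, which is presumably why a list of strict tags is needed at all.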
From: Somik R. <so...@ya...> - 2002-05-02 03:30:52
|
Hi Folks,

Thanks to an interesting bug report by Roger Sollberger, a bug in HTMLStringNode has been fixed. Links of the type:

<a href="http://asgard.ch">[> ASGARD <]</a>

would get messed up because of the tag symbols, when they should really be a part of HTMLStringNode. This has been fixed (after the bug was reproduced in a testcase in HTMLStringNodeTest).

CVS code base updated.

Roger --> Thanks a lot for the report.

Regards,
Somik |
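To illustrate the kind of decision involved in the fix above, here is a minimal sketch of a splitter that keeps stray '<' and '>' characters inside the text. The class TextOrTag and the rule it applies (a '<' starts a tag only when followed by a letter, '/' or '!') are assumptions for illustration and are not taken from the HTMLStringNode source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the HTMLParser library.
final class TextOrTag {
    // A '<' is taken as a tag start only if a letter, '/' or '!' follows it.
    static boolean startsTag(String s, int i) {
        if (s.charAt(i) != '<' || i + 1 >= s.length()) {
            return false;
        }
        char next = s.charAt(i + 1);
        return Character.isLetter(next) || next == '/' || next == '!';
    }

    // Splits the input into alternating tag and text pieces.
    static List<String> split(String s) {
        List<String> pieces = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            if (startsTag(s, i)) {
                // Simplification: the tag ends at the next '>', ignoring quoted values.
                int end = s.indexOf('>', i);
                if (end < 0) {
                    end = s.length() - 1; // unterminated tag: take the rest of the input
                }
                if (text.length() > 0) {
                    pieces.add(text.toString());
                    text.setLength(0);
                }
                pieces.add(s.substring(i, end + 1));
                i = end + 1;
            } else {
                // Any other character, including a lone '<' or '>', is text.
                text.append(s.charAt(i));
                i++;
            }
        }
        if (text.length() > 0) {
            pieces.add(text.toString());
        }
        return pieces;
    }
}
```

For the link in the bug report, this yields three pieces: the opening anchor tag, the text "[> ASGARD <]", and the closing tag.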
From: Somik R. <so...@ya...> - 2002-05-02 03:11:27
|
Hi Folks,

If you've been following the latest exchange on htmlparser-user, Annette has shown us a crazy example of dirty HTML, which works in the browser but crashes the parser. The site is http://www.cia.gov

Search for this string -

<font face="Arial,"helvetica,"

and you will find it in the HTML. The erroneous inverted comma in front of helvetica should not be there.

This has been captured in a test case in HTMLTagTest.java (you can get it from CVS), and this test fails (testParsing()).

The problem is - the core parsing mechanism ignores anything within inverted commas. This is critical so as to be able to accept angular brackets in inverted commas. If we remove this feature from the parser, other tests will break.

So I need some suggestions on how we might modify our parsing - how do we intelligently understand that this is an error (how easy it is for us humans to figure this out)? Looks like linear approaches won't work anymore... Maybe we need to associate some intelligence - that if it's a font tag, then this kind of stuff is most definitely an error, whereas if it's a JSP tag, we can get more strict with our parsing. This will probably cause a fundamental shift in our core parsing technique.

Regards,
Somik |
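The trade-off described above can be reproduced with a small sketch. QuoteAwareScan is a hypothetical class written for this illustration, not the parser's code: it looks for the end of a tag while ignoring everything inside inverted commas, which handles angular brackets in attribute values but loses track once a stray quote appears.

```java
// Hypothetical helper, not part of the HTMLParser library.
final class QuoteAwareScan {
    // Returns the index of the first '>' found outside double quotes after 'start', or -1.
    static int findTagEnd(String line, int start) {
        boolean inQuotes = false;
        for (int i = start + 1; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes; // every quote simply flips the state
            } else if (c == '>' && !inQuotes) {
                return i;
            }
        }
        return -1; // ran off the end while still looking
    }
}
```

On a well-formed tag such as `<a title="a > b">` the scan skips the quoted '>' and stops at the real end. On `<font face="Arial,"helvetica,">Home</font>` the extra quote leaves the scanner inside a "value" for the rest of the line, so it never sees a closing bracket.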
From: Somik R. <so...@ya...> - 2002-05-02 02:59:22
|
Hi Annette,

Regarding your second problem, the parsing error occurs because -

<div align="center"><font face="Arial,"helvetica," sans-serif="sans-serif" size="2" color="#FFFFFF"><a href="/index.html" link="#000000" vlink="#000000"><font

In the above - font face="Arial,"helvetica," -- note the erroneous extra " in front of helvetica. Remove it and the parsing is fine. Now of course you can't remove it, because this site is not yours :). So, we do have to support this kind of dirty HTML. Thank you so much for bringing it to our notice. I have written a test case to reproduce this bug, and am working to resolve this.

Regards,
Somik

<div align="center"><font face="Arial,"helvetica," sans-serif="sans-serif" size="2" color="#FFFFFF"><a href="/index.html" link="#000000" vlink="#000000"><font color="#FFFFFF">Home</font></a>
| <a href="/cia/notices.html" link="#000000" vlink="#000000"><font color="#FFFFFF">Notices</font></a>
| <a href="/cia/notices.html#priv" link="#000000" vlink="#000000"><font color="#FFFFFF">Privacy</font></a>
| <a href="/cia/notices.html#sec" link="#000000" vlink="#000000"><font color="#FFFFFF">Security</font></a>
| <a href="/cia/contact.htm" link="#000000" vlink="#000000"><font color="#FFFFFF">Contact Us</font></a>
| <a href="/cia/sitemap.html" link="#000000" vlink="#000000"><font color="#FFFFFF">Site Map</font></a>
| <a href="/cia/siteindex.html" link="#000000" vlink="#000000"><font color="#FFFFFF">Index</font></a>
| <a href="/search" link="#000000" vlink="#000000"><font color="#FFFFFF">Search</font></a>
</font></div>

Stops at
TAG LINE FOUND <div align="center"><font face="Arial,"helvetica," sans-serif="sans-serif" size="2" color="#FFFFFF"><a href="/index.html" link="#000000" vlink="#000000"><font color="#FFFFFF">Home</font></a>
LINE is <div align="center"><font face="Arial,"helvetica," sans-serif="sans-serif" size="2" color="#FFFFFF"><a href="/index.html" link="#000000" vlink="#000000"><font color="#FFFFFF">Home</font></a>
POSITION IS 26
TAGLINE 197
Process completed.

Annette Doyle |
From: Raghavender S. <kin...@ho...> - 2002-04-29 23:37:03
|
Hi Somik,

I encountered a strange problem today. While I was running htmlparser, I got a java.lang.OutOfMemoryError. It seems that a lot of objects are being allocated. Where exactly is this happening? Could you give me an idea of where, or in which file, the potential problem could be?

Raghav

>From: "Somik Raha" <so...@ya...>
>Reply-To: htm...@li...
>To: <htm...@li...>
>CC: <htm...@li...>
>Subject: Re: [Htmlparser-user] Hints on how to change image tag locations and write out document
>Date: Sat, 27 Apr 2002 18:22:34 +0900
>
>Hi Annette,
> Pls find attached a program to get you started. This program will do
>what you want - you will need to modify the construct that checks for the
>image tag - and replace it with the location of your choice.
> Also - I found one bug thanks to this requirement - image tags params
>were not being correctly put in. Though it needs a deeper look, I have done
>a quick fix for now, and all test cases are passing (with one test case in
>HTMLImageScannerTest trapping this bug).
> Please check out the latest html parser source code from CVS.
>
>Regards,
>Somik
>
> ----- Original Message -----
> From: Doyle, Annette
> To: htm...@li...
> Sent: Friday, April 26, 2002 10:08 PM
> Subject: [Htmlparser-user] Hints on how to change image tag locations and write out document
>
> Could you please give me some hints as how to change only image tag
>locations and then, (or at the same time) write out the html document to
>file (with new image tag locations)?
>
> Thanks-
>
> Annette Doyle
>
><< ImageTagRetriever.java >> |
From: Somik R. <so...@ya...> - 2002-04-27 09:33:26
|
Hi Folks,

I am getting a lot of pain integrating html parser with Swing. It seems like Sun doesn't want folks to change their parser. I am trying to come to terms with the fact that I need 72 if-thens, for all kinds of tags. I had initially written an object framework to compare html parser parsed objects with the Swing parser objects, and it's a nightmare, because even simple tags are not being picked up correctly by the latter. Meta tags don't seem to work, and tags with attributes have the attributes not showing up.

I think it's crazy for one person to do all of this, but if I can have help, then I will put up this integration code, and maybe we'd be able to get this done in a month (??)

I guess this would be kind of prestigious if it gets finished - so developers, pls let me know who volunteers to help in this enterprise. (It's not hard really, but there is lots to be done.)

Cheers,
Somik |
From: Somik R. <so...@ya...> - 2002-04-26 03:43:51
|
Hi Annette,

I just figured out what is happening... Sorry for the previous mail - this is not a bug in the parser. You see, the tags which weren't getting reported as image tags were sandwiched between link tags: <A HREF="..."><IMG ..></A>. Hence, in your application, you will also need to watch out for link tags, and pick up the images from within should there be any.

Now - if this causes you additional headaches, then don't register all the scanners, so the link scanner will not interfere, and you will only get the image tags.

In order to prove that this analysis is correct, I added one more test case to HTMLImageScannerTest.java - testImageTagsFromYahooWithAllScannersRegistered(). This test case extracts the link and checks that the image is found within. The number of tags found is also verified. You can check out this code from CVS; it might help you if you are interested in getting image tags out of link tags. Correspondingly, there is also testImageTagsFromYahoo(), which passes (with only the html image scanner registered).

Let me know if you need further help.

Regards,
Somik

----- Original Message -----
From: Doyle, Annette
To: htm...@li...
Sent: Friday, April 26, 2002 1:32 AM
Subject: [Htmlparser-user] Not all image tags are returned

Is there any known problem about not all image tags being returned? I did the following code:

    HTMLParser parser = new HTMLParser(htmlOriginalFileLoc);
    // Registering all the common scanners
    parser.registerScanners();
    for (Enumeration e = parser.elements(); e.hasMoreElements();) {
        HTMLNode node = (HTMLNode)e.nextElement();
        if (node instanceof HTMLImageTag) {
            System.out.println();
            System.out.println(((HTMLImageTag)node).getTagLine());
            System.out.println();
            //imageTagsUrl.addElement(((HTMLImageTag)node).getImageLocation());
        }
    }

I was testing with another html parser and it found all the image tags. Attached is the source from www.yahoo.com when I ran the code above. |
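The advice above, to look inside link tags for images as well, can be sketched with a small self-contained model. DemoNode, DemoImage, DemoLink and ImageCollector are hypothetical classes invented for this illustration; they are not the HTMLParser node classes, whose accessors for nested nodes are not shown in the message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical node model, standing in for the parser's node types.
abstract class DemoNode { }

final class DemoImage extends DemoNode {
    final String location;

    DemoImage(String location) { this.location = location; }
}

// A link may wrap other nodes, e.g. <A HREF="..."><IMG ..></A>.
final class DemoLink extends DemoNode {
    final List<DemoNode> children;

    DemoLink(DemoNode... children) { this.children = Arrays.asList(children); }
}

final class ImageCollector {
    // Collects image locations from top-level nodes and from inside links.
    static List<String> imageLocations(List<? extends DemoNode> nodes) {
        List<String> found = new ArrayList<>();
        for (DemoNode node : nodes) {
            if (node instanceof DemoImage) {
                found.add(((DemoImage) node).location);
            } else if (node instanceof DemoLink) {
                // Images sandwiched between link tags are only reachable here.
                found.addAll(imageLocations(((DemoLink) node).children));
            }
        }
        return found;
    }
}
```

A loop that only tests `node instanceof DemoImage` at the top level would miss every image wrapped in a link, which matches the symptom reported for the Yahoo page.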
From: Somik R. <so...@ya...> - 2002-04-26 03:28:17
|
Hi Annette,

Thanks for the report. I wrote a functional testcase that does a raw check for IMG tags and a check with the parser, and could reproduce the bug. I don't think it's a problem with the image scanner code, because the unit tests are passing with the same yahoo tags.

Here's a quick solution for you: don't use registerScanners() for now. Since your app specifically needs to check only image tags, replace the line:

    parser.registerScanners();

with

    parser.addScanner(new HTMLImageScanner("-i"));

I checked that all the yahoo image tags come out fine with this change. The functional test has been checked into CVS (FunctionalTests.java), and the one with registerScanners() fails. The corresponding unit test in HTMLImageScanner passes.

Meanwhile, I am trying to find out which scanner is messing up..

Thanks again for your report.

Cheers,
Somik

----- Original Message -----
From: Doyle, Annette
To: htm...@li...
Sent: Friday, April 26, 2002 1:32 AM
Subject: [Htmlparser-user] Not all image tags are returned

Is there any known problem about not all image tags being returned? I did the following code:

    HTMLParser parser = new HTMLParser(htmlOriginalFileLoc);
    // Registering all the common scanners
    parser.registerScanners();
    for (Enumeration e = parser.elements(); e.hasMoreElements();) {
        HTMLNode node = (HTMLNode)e.nextElement();
        if (node instanceof HTMLImageTag) {
            System.out.println();
            System.out.println(((HTMLImageTag)node).getTagLine());
            System.out.println();
            //imageTagsUrl.addElement(((HTMLImageTag)node).getImageLocation());
        }
    }

I was testing with another html parser and it found all the image tags. Attached is the source from www.yahoo.com when I ran the code above. |
From: Somik R. <so...@ya...> - 2002-04-23 14:56:17
|
Hi Developers,

What do you think of Gordon Deudney's bug report at http://sourceforge.net/tracker/?group_id=24399&atid=381399

This is actually open for discussion.

Regards
Somik |
From: Somik R. <so...@ya...> - 2002-04-18 01:59:39
|
Hi Folks,

To all the developers - here is a to-do list for the project. You can pick any item to get involved:

[1] Swing integration - plug in htmlparser and demonstrate how it can be used instead of the HTML parser that comes with the Sun JDK.

[2] Set up a servlet which allows people to test the html parser online. The idea is:
(i) Enter your URL and click Parse.
(ii) The parser is launched on the server, and produces all the nodes (node.print()) on the display.
(iii) If an exception gets thrown, then this URL is saved into a database (??)
(iv) If no exception is thrown but there is an error in the parsing, then a report can be entered on the result page by the tester, telling us why he thinks the output is incorrect.
(v) We get notified every time there is a report, either of a crash or of a human-reported error.

The vision is to capitalize on distributed testing resources. Also, everyone has a tendency to desire simple testing, without downloading and wasting time going through manuals. I think we can get a lot of feedback if we can harness the power of the web.

To do this, some simple servlets will need to be written. And we will need to find hosting, either at sourceforge or myservlets.com

[3] Create AWT components which can understand HTML formatting. Since HTMLParser works with Java 1.1, no special download is required for it to run in standard browsers.

[4] Have a report section on the htmlparser site, where people can report and see how html parser is being used in various industry projects.

Pls feel free to add to this list - especially if you have any interesting insights or vision about where you see this project going. Once we are done with some basic brainstorming, we could probably set milestones for each of these tasks.

Cheers,
Somik |
From: Somik R. <so...@ya...> - 2002-04-17 03:40:48
|
Hi Folks,

HTMLParser 1.1 has just been released. This is a production release - HTMLParser finally moves out of the beta stage.

A whole lot of bug fixes, architecture modifications, and intense testing have gone into it.

You can get it from http://htmlparser.sourceforge.net

Thanks are due to a whole lot of people who helped with bug reports and suggestions for this release:
[1] Sam Joseph
[2] Raj Sharma
[3] Raghavender Srimantula

Regards,
Somik |
From: Somik R. <so...@ya...> - 2002-04-17 02:31:05
|
> Due to time constraints, I've decided to use the HTML parser in Swing
> for the time being, but I'd definitely like to see the effect of a
> better parser in Swing. Just try a search for 'JEditorPane' in the Bug
> Parade and you'll see how long Sun has had issues with this area...

Yes, I know the parser from Sun is not good.

> I think your idea of trying the integration after 1.1 release is good.

Ok - 1.1 should be out really soon. I am done with an ant script for building (phew!), and am giving the final touches to the code. We can expect a release this week.

Regards,
Somik |
From: Craig R. <cr...@qu...> - 2002-04-16 09:19:24
|
Due to time constraints, I've decided to use the HTML parser in Swing for the time being, but I'd definitely like to see the effect of a better parser in Swing. Just try a search for 'JEditorPane' in the Bug Parade and you'll see how long Sun has had issues with this area...

I think your idea of trying the integration after 1.1 release is good.

-craig |
From: Somik R. <so...@ya...> - 2002-04-16 02:59:53
|
Hi Craig, Asgher,

I finally had the time to check Swing integration. Boy - the parser design in Swing sucks!! Theoretically it's possible to do it, and I got started, but I then realized that in order to be compatible with Swing objects that do compile-time type checking against a particular tag, I would actually need 73 if statements to give the right tag to the callback. I have more important things to do at the moment, but will probably get back to this donkey work. *sigh*

I am thinking we should make release 1.1 and then try this. Any suggestions?

Regards,
Somik

----- Original Message -----
From: "Somik Raha" <so...@ya...>
To: <htm...@li...>
Sent: Thursday, April 04, 2002 11:20 AM
Subject: Re: [Htmlparser-user] Swing integration

> Hi Craig,
> Thanks a lot for the post. Please go ahead with your analysis. I will try
> to catch up this weekend.
> Regards,
> Somik
> ----- Original Message -----
> From: "Craig Raw" <cr...@qu...>
> To: "'Somik Raha'" <so...@ya...>
> Sent: Tuesday, April 02, 2002 3:32 PM
> Subject: RE: [Htmlparser-user] Swing integration
>
> > Hi Somik,
> >
> > A quick excerpt from javax.swing.text.html.HTMLEditorKit javadoc - which
> > is the driver behind JEditorPane's reading and writing HTML
> > capabilities.
> >
> > ---
> > Extendable/Scalable
> >
> > To maximize the usefulness of this kit, a great deal of effort has gone
> > into making it extendable. These are some of the features.
> > The parser is replaceable. The default parser is the Hot Java parser
> > which is DTD based. A different DTD can be used, or an entirely
> > different parser can be used. To change the parser, reimplement the
> > getParser method. The default parser is dynamically loaded when first
> > asked for, so the class files will never be loaded if an alternative
> > parser is used. The default parser is in a separate package called
> > parser below this package.
> >
> > The parser drives the ParserCallback, which is provided by HTMLDocument.
> > To change the callback, subclass HTMLDocument and reimplement the
> > createDefaultDocument method to return document that produces a
> > different reader. The reader controls how the document is structured.
> > Although the Document provides HTML support by default, there is nothing
> > preventing support of non-HTML tags that result in alternative element
> > structures.
> > ---
> >
> > I may find some time to look into this as well, although I am not sure
> > how much it would fix JEditorPane's somewhat buggy HTML rendering
> > capabilities....
> >
> > -craig
> >
> > -----Original Message-----
> > From: htm...@li...
> > [mailto:htm...@li...] On Behalf Of Somik Raha
> > Sent: 01 April 2002 05:28 PM
> > To: HTMLParser User List
> > Cc: HTMLParser Developer List
> > Subject: Re: [Htmlparser-user] Swing integration
> >
> > Hi Craig
> > Wow! That's a great question.
> > Actually, I doubt if I could replace Sun Microsystems' code with mine. I
> > don't think Java is that open (or is it?)
> > However, we could think of writing our own adapter for the html parser
> > that might plug in in some way...
> > I have never used Sun's html parser (if I had, I might not have started
> > this project).
> > I will need to study Sun's parser before I can answer your question..
> > But there do seem to be some interesting possibilities.
> >
> > Regards
> > Somik
> > ----- Original Message -----
> > From: "Craig Raw" <cr...@qu...>
> > To: <htm...@li...>
> > Sent: Monday, April 01, 2002 10:20 PM
> > Subject: [Htmlparser-user] Swing integration
> >
> > > Has the HTML Parser been integrated into Swing's HTMLEditorKit to
> > > provide a better implementation of JEditorPane's HTML viewing
> > > capabilities? HTML Parser would need to replace
> > > javax.swing.text.html.parser.Parser, which is currently somewhat buggy.
> > > Anyone tried this?
> > >
> > > -craig |
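The javadoc excerpt quoted in this thread says the parser is swapped by reimplementing getParser. A minimal sketch of that hook follows, assuming two hypothetical classes, PlainTextParser and CustomParserEditorKit, that are not part of HTML Parser or Swing. The stand-in parser does no tag recognition at all; a real adapter would have to translate each parsed node into the matching ParserCallback call, which is where the tag-matching work mentioned above comes in.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical stand-in parser: it reports the whole input as one text run
// and recognizes no tags. It only illustrates where an adapter would sit.
class PlainTextParser extends HTMLEditorKit.Parser {
    @Override
    public void parse(Reader in, HTMLEditorKit.ParserCallback callback,
                      boolean ignoreCharSet) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        int read;
        while ((read = in.read(buffer)) != -1) {
            text.append(buffer, 0, read);
        }
        callback.handleText(text.toString().toCharArray(), 0);
    }
}

// Hypothetical editor kit that hands out the replacement parser
// instead of the default DTD-based one.
class CustomParserEditorKit extends HTMLEditorKit {
    @Override
    protected HTMLEditorKit.Parser getParser() {
        return new PlainTextParser();
    }
}

class SwingParserDemo {
    // Runs the stand-in parser over a string and returns the text it reported.
    static String parseToText(String html) {
        StringBuilder seen = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                seen.append(data);
            }
        };
        try {
            new PlainTextParser().parse(new StringReader(html), callback, false);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(parseToText("plain text only"));
    }
}
```

To try this in a JEditorPane, the kit would be installed with setEditorKit(new CustomParserEditorKit()); whether that improves rendering depends entirely on the parser behind it.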
From: Somik R. <so...@ya...> - 2002-04-15 05:14:41
|
Hi Folks,

Thanks to Sam Joseph (creator of Neurogrid). Sam is using the parser in the Neurogrid project, and has pointed out a bug that slipped our attention. If link or image URLs contained spaces, those spaces were being absorbed. That is incorrect behaviour, especially if you have something of the form:

http://myservlet.com/someservlet?name=Sam Joseph&age=22

The same goes for images like

http://www.kizna.com/images/kizna corp.jpg

Also, newline characters were previously being converted to spaces. This has been modified - newline characters are now left as is. The responsibility to deal with them is now with the appropriate scanner. So, the link and image scanners specifically filter out the newline characters, whereas jsp tags, which might contain jsp code, preserve them.

Over 73 testcases now in the htmlparser, and all passing.

I think we're ready for release 1.1 now, unless I get any more bug reports this week. You can check out the latest code from CVS.

Regards,
Somik |
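The behaviour described above for the link and image scanners (drop newline characters, keep spaces) can be sketched as follows. This is only an illustration under assumptions: LinkUrlCleaner and removeLineBreaks are hypothetical names, not the code that was checked into CVS.

```java
// Hypothetical helper illustrating the described link/image scanner
// behaviour; it is not the parser's actual implementation.
class LinkUrlCleaner {
    // Removes carriage returns and line feeds but leaves spaces untouched.
    static String removeLineBreaks(String url) {
        StringBuilder cleaned = new StringBuilder(url.length());
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c != '\n' && c != '\r') {
                cleaned.append(c);
            }
        }
        return cleaned.toString();
    }

    public static void main(String[] args) {
        // A URL that was wrapped across two lines in the page source.
        String raw = "http://www.kizna.com/images/\nkizna corp.jpg";
        System.out.println(removeLineBreaks(raw));
    }
}
```

A scanner for jsp tags would simply skip this step, so the line breaks inside the embedded code survive.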
From: Raghavender S. <kin...@ho...> - 2002-04-12 03:38:39
|
Thanks somik. I will work on it. Raghav |
From: Somik R. <so...@ya...> - 2002-04-12 03:00:50
|
Hi Raghav You are right. That is indeed a bug. I have written a test case for it, captured it, and fixed it. Code is checked into CVS - it should work for you now. Regards, Somik |
From: Raghavender S. <kin...@ho...> - 2002-04-11 21:12:58
|
hi Somik,

The code snippet you mailed me seems to have a problem. Let me explain. The method call

isXMLTagFound(node,"OPTION")

would always return false. The reason: in the definition of that method we have

if (node instanceof HTMLTag) {
    System.out.println("node instanceof HTMLTag in tagscanner ");
    HTMLTag tag = (HTMLTag)node;
    if (tag.getText().equals(tagName)) {
        xmlTagFound=true;
    }
}

tag.getText() always gives me

OPTION value="#">Select a destination

which is not equal to the tagName, in this case tagName=OPTION.

Raghav |
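The failure reported in this message comes from comparing the whole tag text, attributes included, with the bare tag name. One possible way around it is to compare only the leading name. The sketch below assumes hypothetical helpers (TagNameMatcher, extractTagName, isTagNamed) that work on plain strings; it is not necessarily the fix that went into CVS.

```java
// Hypothetical string-based helpers; the real check lives in the parser's
// tag scanner and may be implemented differently.
class TagNameMatcher {
    // Returns the leading tag name from tag text such as: OPTION value="#"
    // Everything from the first whitespace or '>' onwards is ignored.
    static String extractTagName(String tagText) {
        String trimmed = tagText.trim();
        int end = 0;
        while (end < trimmed.length()
                && !Character.isWhitespace(trimmed.charAt(end))
                && trimmed.charAt(end) != '>') {
            end++;
        }
        return trimmed.substring(0, end);
    }

    // Name-only comparison, ignoring case and any attributes.
    static boolean isTagNamed(String tagText, String tagName) {
        return extractTagName(tagText).equalsIgnoreCase(tagName);
    }

    public static void main(String[] args) {
        String text = "OPTION value=\"#\">Select a destination";
        System.out.println(text.equals("OPTION"));       // whole-text comparison
        System.out.println(isTagNamed(text, "OPTION"));  // name-only comparison
    }
}
```

Comparing the extracted name rather than using startsWith avoids treating a longer name that merely begins with OPTION as a match.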
From: Somik R. <so...@ya...> - 2002-04-11 02:17:41
|
Hi Raghav,

I replied to your earlier query. Did you receive the mail (I forwarded it again)?

Regarding your current query, there are two ways to handle option tags.

[1] Like in the previous question, you will have to recognize an HTMLTag (begin tag), followed by an HTMLStringNode, and finally an HTMLEndTag.
[2] To make life easier, since this tag is basic xml, you can use a special XML parsing method provided in the superclass HTMLTagScanner.

The methods are:
(i) isXMLTagFound
(ii) extractXMLData

Both of them are static methods. You would use them like this:

HTMLNode node = reader.readElement();
if (isXMLTagFound(node,"OPTION")) {
    String option = extractXMLData(node,"OPTION",reader);
    // The string now contains the data within the option xml tag
    // So given an input : <OPTION value="#">Select a destination</OPTION>
    // option will hold "Select a destination"
}

But getting the value from the option tag itself would need to be handled separately.

Regards,
Somik |
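Approach [1] from this message (begin tag, then string node, then end tag) can be sketched without the parser itself. The Node class and Kind enum below are hypothetical stand-ins for HTMLTag, HTMLStringNode and HTMLEndTag, and extractData is an illustrative name; none of them are the library's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OptionTextExtractor {
    enum Kind { BEGIN_TAG, TEXT, END_TAG }

    // Hypothetical stand-in for the parser's node classes.
    static final class Node {
        final Kind kind;
        final String text;
        Node(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Collects the text found between each begin tag and end tag named tagName.
    static List<String> extractData(List<Node> nodes, String tagName) {
        List<String> results = new ArrayList<>();
        StringBuilder current = null; // non-null while inside the wanted tag
        for (Node node : nodes) {
            boolean named = node.kind != Kind.TEXT
                    && nameOf(node.text).equalsIgnoreCase(tagName);
            if (node.kind == Kind.BEGIN_TAG && named) {
                current = new StringBuilder();
            } else if (node.kind == Kind.END_TAG && named) {
                if (current != null) {
                    results.add(current.toString());
                    current = null;
                }
            } else if (node.kind == Kind.TEXT && current != null) {
                current.append(node.text);
            }
        }
        return results;
    }

    // First whitespace-delimited token of the tag text, i.e. the tag name.
    static String nameOf(String tagText) {
        String trimmed = tagText.trim();
        int space = trimmed.indexOf(' ');
        return space < 0 ? trimmed : trimmed.substring(0, space);
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
                new Node(Kind.BEGIN_TAG, "OPTION value=\"#\""),
                new Node(Kind.TEXT, "Select a destination"),
                new Node(Kind.END_TAG, "OPTION"));
        System.out.println(extractData(nodes, "OPTION"));
    }
}
```

With the real parser the loop would read nodes from the reader one at a time instead of walking a list, but the state kept between nodes would be the same.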
From: Raghavender S. <kin...@ho...> - 2002-04-11 00:23:02
|
hi Somik,

any ideas about my previous mail? Let us say we have

<OPTION value="#">Select a destination</OPTION>

When I do

node = reader.readElement();

where "reader" is an HTMLReader, the node I get is none of HTMLStringNode, HTMLEndTag or HTMLRemarkNode. How do I classify it if I want to do something with it?

Raghav

>From: "Somik Raha" <so...@ya...>
>To: "Raghavender Srimantula" <kin...@ho...>
>CC: <htm...@li...>
>Subject: Re: [Htmlparser-user] HTML parser 1.1
>Date: Mon, 8 Apr 2002 13:04:07 +0900
>
>Hi Raghav
> > when would be this HTMLparser 1.1 out?
>As soon as I can wrap it up. Technically, the code is ready and already checked into CVS. I need to go through the process of creating a release - make some documentation, check everything is ok, ..
>If I had some help I could wrap it up sooner.
>
> > I am not sure, but to me the way htmlparser parses is it gives me the tag
> > parameter of the first line in the above snippet of html code, when I do
> > Hashtable table = tag.parseParameters();
> > it is looking for parameters inside <FORM ..... >, but not <FORM
> > .....</FORM>
>
>Yes - parseParameters() will give you the stuff inside the FORM tag. That is what I call "microscopic" parsing. But to get the remaining tags - till you encounter </FORM> - you need to do "macroscopic" parsing. This is not hard - check HTMLAppletScanner as an example.
>
>In a nutshell - the concept is very simple. The scan method provides you with a reader. So you are to use that reader to read ahead and get the next tags. This is simple because the reader will automatically identify the correct tags, and the mechanism is very similar to using the parser to get the tags you want. The HTMLLinkScanner, among others, also works on the same principle.
>
>By the way - I think we should take this discussion to the Developer list.
>
>Regards,
>Somik
>----- Original Message -----
>From: "Raghavender Srimantula" <kin...@ho...>
>To: <htm...@li...>
>Sent: Monday, April 08, 2002 6:39 AM
>Subject: [Htmlparser-user] HTML parser 1.1
>
> > Hi Somik,
> > when would be this HTMLparser 1.1 out?
> > one more question. to parse the FORM tags, I have a small question.
> > let us say this is a form tag
> >
> > <FORM NAME="LoginForm" METHOD=POST ACTION="urltoInvoke">
> > <P>User name:
> > <INPUT TYPE="text" NAME="userName" SIZE="10">
> > <P>Password:
> > <INPUT TYPE="password" NAME="password" SIZE="12">
> > <P><INPUT TYPE="submit" VALUE="Log in">
> > <INPUT TYPE="button" VALUE="Cancel" onClick="window.close()">
> > </FORM>
> >
> > I am not sure, but to me the way htmlparser parses is it gives me the tag
> > parameter of the first line in the above snippet of html code, when I do
> > Hashtable table = tag.parseParameters();
> > it is looking for parameters inside <FORM ..... >, but not <FORM
> > .....</FORM>
> >
> > could you suggest me how to go ahead with this.
> > Raghav
> >
> > to extract the INPUT tag parameters |
From: Somik R. <so...@ya...> - 2002-04-09 14:42:52
|
Hi Raghav,

> Begin Tag : SELECT name="pulldown" class="smaller-text"; begins at : 0; ends
> at : 44
>
> this node which I get is of neither HTMLRemarkNode, HTMLStringNode,
> HTMLEndTag.

That's right; this is expected behaviour. The type of this node is HTMLTag. If you downcast to HTMLTag, you can get all the info.

Regards,
Somik
|
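The downcast suggested in the reply above can be sketched roughly as follows. This is only an illustration: the nested classes are hypothetical stand-ins that borrow the names used in the thread (HTMLNode, HTMLTag, HTMLEndTag, HTMLStringNode, parseParameters()), not the library's real classes, and the `name`, `text`, and `attributes` fields are assumptions made so that the example compiles on its own.

```java
import java.util.Hashtable;

public class NodeClassifier {
    // Hypothetical stand-ins: only the class and method names come from the thread.
    static abstract class HTMLNode {}

    static class HTMLStringNode extends HTMLNode {
        final String text;
        HTMLStringNode(String text) { this.text = text; }
    }

    static class HTMLEndTag extends HTMLNode {
        final String name;
        HTMLEndTag(String name) { this.name = name; }
    }

    static class HTMLTag extends HTMLNode {
        final String name;
        final Hashtable<String, String> attributes = new Hashtable<>();
        HTMLTag(String name) { this.name = name; }
        // Assumed to hand back the attribute table, as in
        // "Hashtable table = tag.parseParameters();" from the thread.
        Hashtable<String, String> parseParameters() { return attributes; }
    }

    // Decide what kind of node was read, then downcast to reach its details.
    static String classify(HTMLNode node) {
        if (node instanceof HTMLStringNode) {
            return "text: " + ((HTMLStringNode) node).text;
        }
        if (node instanceof HTMLEndTag) {
            return "end tag: " + ((HTMLEndTag) node).name;
        }
        if (node instanceof HTMLTag) {
            HTMLTag tag = (HTMLTag) node; // the downcast
            return "begin tag: " + tag.name + " " + tag.parseParameters();
        }
        return "unclassified";
    }

    public static void main(String[] args) {
        HTMLTag select = new HTMLTag("SELECT");
        select.attributes.put("name", "pulldown");
        System.out.println(classify(select));
        System.out.println(classify(new HTMLStringNode("Select a destination")));
        System.out.println(classify(new HTMLEndTag("SELECT")));
    }
}
```

In the stand-in hierarchy HTMLEndTag and HTMLTag are siblings, so the order of the checks does not matter. If the real HTMLEndTag turns out to extend HTMLTag, the end-tag check would have to come first, as it does here.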
From: Raghavender S. <kin...@ho...> - 2002-04-09 10:01:43
|
Hi Somik,

A question regarding form parsing. Let us say I have this tag:

<SELECT name="pulldown" class="smaller-text">

When I do

node = reader.readElement();

and then node.print(), I get

Begin Tag : SELECT name="pulldown" class="smaller-text"; begins at : 0; ends at : 44

The node I get is not an HTMLRemarkNode, an HTMLStringNode, or an HTMLEndTag, so I am not sure how to classify it. I need to classify the node before I can take any action on it. Could you help me out?

Raghav
|
From: Somik R. <so...@ya...> - 2002-04-08 04:06:58
|
Hi Raghav,

> when would be this HTMLparser 1.1 out?

As soon as I can wrap it up. Technically, the code is ready and already checked into CVS. I still need to go through the process of creating a release: write some documentation, check that everything is OK, and so on. If I had some help I could wrap it up sooner.

> I am not sure, but to me the way htmlparser parses is it gives me the tag
> parameter of the first line in the above snippet of html code, when I do
> Hashtable table = tag.parseParameters();
> it is looking for parameters inside <FORM ..... >, but not <FORM
> .....</FORM>

Yes, parseParameters() will give you the stuff inside the FORM tag. That is what I call "microscopic" parsing. But to get the remaining tags, up to the point where you encounter </FORM>, you need to do "macroscopic" parsing. This is not hard; check HTMLAppletScanner as an example.

In a nutshell, the concept is very simple. The scan method provides you with a reader, and you use that reader to read ahead and get the next tags. This is simple because the reader will automatically identify the correct tags, and the mechanism is very similar to using the parser to get the tags you want. HTMLLinkScanner, among others, works on the same principle.

By the way, I think we should take this discussion to the Developer list.

Regards,
Somik

----- Original Message -----
From: "Raghavender Srimantula" <kin...@ho...>
To: <htm...@li...>
Sent: Monday, April 08, 2002 6:39 AM
Subject: [Htmlparser-user] HTML parser 1.1

> Hi Somik,
> when would be this HTMLparser 1.1 out?
> one more question. to parse the FORM tags, I have a small question.
> let us say this is a form tag
>
> <FORM NAME="LoginForm" METHOD=POST ACTION="urltoInvoke">
> <P>User name:
> <INPUT TYPE="text" NAME="userName" SIZE="10">
> <P>Password:
> <INPUT TYPE="password" NAME="password" SIZE="12">
> <P><INPUT TYPE="submit" VALUE="Log in">
> <INPUT TYPE="button" VALUE="Cancel" onClick="window.close()">
> </FORM>
>
> I am not sure, but to me the way htmlparser parses is it gives me the tag
> parameter of the first line in the above snippet of html code, when I do
> Hashtable table = tag.parseParameters();
> it is looking for parameters inside <FORM ..... >, but not <FORM
> .....</FORM>
>
> could you suggest me how to go ahead with this.
> Raghav
>
> to extract the INPUT tag parameters
|
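The "macroscopic" read-ahead described in this message might look something like the sketch below. It is an illustration under stated assumptions, not the code in HTMLAppletScanner: the nested HTMLReader, HTMLTag, and HTMLEndTag classes are hypothetical stand-ins named after the ones in the thread, the reader is assumed to return null once the input is exhausted, and `collectInputs` is a made-up helper name.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.Iterator;
import java.util.List;

public class FormReadAhead {
    // Hypothetical stand-ins: only the class and method names come from the thread.
    static abstract class HTMLNode {}

    static class HTMLEndTag extends HTMLNode {
        final String name;
        HTMLEndTag(String name) { this.name = name; }
    }

    static class HTMLTag extends HTMLNode {
        final String name;
        final Hashtable<String, String> attributes = new Hashtable<>();
        HTMLTag(String name, String... keyValuePairs) {
            this.name = name;
            for (int i = 0; i + 1 < keyValuePairs.length; i += 2) {
                attributes.put(keyValuePairs[i], keyValuePairs[i + 1]);
            }
        }
        // "Microscopic" parsing: the attributes inside this one tag.
        Hashtable<String, String> parseParameters() { return attributes; }
    }

    // Stand-in reader: hands out prepared nodes one at a time, then null (assumption).
    static class HTMLReader {
        private final Iterator<HTMLNode> nodes;
        HTMLReader(List<HTMLNode> nodes) { this.nodes = nodes.iterator(); }
        HTMLNode readElement() { return nodes.hasNext() ? nodes.next() : null; }
    }

    // "Macroscopic" parsing: called once the opening FORM tag has been seen,
    // it reads ahead until </FORM> and keeps the parameters of each INPUT tag.
    static List<Hashtable<String, String>> collectInputs(HTMLReader reader) {
        List<Hashtable<String, String>> inputs = new ArrayList<>();
        HTMLNode node;
        while ((node = reader.readElement()) != null) {
            if (node instanceof HTMLEndTag
                    && "FORM".equalsIgnoreCase(((HTMLEndTag) node).name)) {
                break; // end of the form: stop reading ahead
            }
            if (node instanceof HTMLTag
                    && "INPUT".equalsIgnoreCase(((HTMLTag) node).name)) {
                inputs.add(((HTMLTag) node).parseParameters());
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        // The nodes that follow the opening FORM tag of the login form.
        List<HTMLNode> afterFormTag = new ArrayList<>();
        afterFormTag.add(new HTMLTag("P"));
        afterFormTag.add(new HTMLTag("INPUT", "TYPE", "text", "NAME", "userName"));
        afterFormTag.add(new HTMLTag("P"));
        afterFormTag.add(new HTMLTag("INPUT", "TYPE", "password", "NAME", "password"));
        afterFormTag.add(new HTMLEndTag("FORM"));
        afterFormTag.add(new HTMLTag("INPUT", "NAME", "outsideTheForm"));

        // Prints "userName" and then "password"; the tag after </FORM> is never read.
        for (Hashtable<String, String> params : collectInputs(new HTMLReader(afterFormTag))) {
            System.out.println(params.get("NAME"));
        }
    }
}
```

The loop stops at the first </FORM> it sees, so it does not handle nested or unclosed forms; a real scanner would need to decide what to do in those cases.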