htmlparser-user Mailing List for HTML Parser (Page 99)
Brought to you by:
derrickoswald
From: Craig R. <cr...@qu...> - 2002-04-01 13:20:43
|
Has the HTML Parser been integrated into Swing's HTMLEditorKit to provide a better implementation of JEditorPane's HTML viewing capabilities? HTML Parser would need to replace javax.swing.text.html.parser.Parser, which is currently somewhat buggy. Anyone tried this? -craig |
From: Somik R. <so...@ya...> - 2002-03-24 05:51:01
|
Dear Users, Thanks for using HTMLParser. HTMLParser is getting some new features, namely: [1] HTMLMetaTag scanner [2] Support for non-".html" pages - I am planning to bring dynamic pages under the purview of the parser as well, though I might need a bit of help with this. I wanted to get some feedback from the user community: what features would you really like to see added to the parser (or are you quite happy with the parser as is)? Regards, Somik |
From: Somik R. <so...@ya...> - 2002-03-22 16:41:00
|
Hi Folks, Release 1.04 is out. It has the following bug fixes: [1] Parsing JSP tags which had tags within inverted commas was causing problems. [2] A link with no link URL would cause the parser to crash with a null pointer exception. The above bugs were reported by Gordon Deudney and Robert Kausch. More test cases have been added. Regards, Somik |
From: Somik R. <so...@ya...> - 2002-03-16 10:52:44
|
Hi Gordon, This is in reply to your request for help on the sourceforge site. I couldn't find out how to format code and put it up there. Here's the sample code for you:

    // Needs: import java.util.Enumeration;
    HTMLNode node, node2;
    HTMLLinkTag linkTag;
    HTMLImageTag imageTag;
    for (Enumeration e = parser.elements(); e.hasMoreElements();) {
        node = (HTMLNode) e.nextElement();
        // Only if it is a link tag do we look for image tags within it
        if (node instanceof HTMLLinkTag) {
            linkTag = (HTMLLinkTag) node;
            // Go through the list of elements in this link
            for (Enumeration e2 = linkTag.linkData(); e2.hasMoreElements();) {
                node2 = (HTMLNode) e2.nextElement();
                // Only if the element is an image tag do we downcast
                if (node2 instanceof HTMLImageTag) {
                    imageTag = (HTMLImageTag) node2;
                    System.out.println("Image loc = " + imageTag.getImageLocation());
                }
            }
        }
    }

Regards, Somik |
From: Gordon D. <gde...@on...> - 2002-03-16 10:40:06
|
I was wondering how to extract an image tag and its img properties from within an anchor tag. I know HTMLLinkTag has a linkData() method which returns an enumeration; I have tried to convert that to a tag, but I have had no luck. Example of what I am trying to parse: (a href="asdf") (img src="asdf")(/a) <a href="test"><img src="test"> </a> I want to get the image tag info. I appreciate any help. -- Gordon Deudney gde...@on... - email (212) 894-3750 x7884 - voicemail/fax |
From: Somik R. <so...@ya...> - 2002-03-14 12:43:41
|
Here's your program attached. Run it on the attached file. I think this is what you wanted to do. Regards, Somik |
From: Somik R. <so...@ya...> - 2002-03-13 14:33:37
|
Yes, this should be possible. You have to process an HTMLTag and check whether it is a DIV tag. If it is, you can use parseParameters() to get the CLASS value. Regards, Somik
----- Original Message ----- From: "Kalyan Kumar Mudumbai" <mk...@wi...> To: "'Somik Raha'" <so...@ya...> Sent: Wednesday, March 13, 2002 2:56 PM Subject: RE: [Htmlparser-user] HTML Parsing
> Hi Somik,
> thanks a lot for the quick reply. I have had a feel of the parser. I wanted to obtain the attribute value of CLASS in DIV to do further processing in my application. But what I found from the initial run was that the DIV tag hasn't been handled (if I'm not wrong; please excuse my ignorance). I have also tried using the default Java parser, HTMLEditorKit.ParserCallback. That one is not handling it either. Can I handle this, and if so, how?
> Thanks for your input.
> Regards,
> Kalyan |
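To make the "check for DIV, then read CLASS" step above concrete, here is a minimal, self-contained sketch. It does not use the HTMLParser library: `DivClassSketch`, `attributes()` and `divClass()` are hypothetical names, and the regular expression is only a stand-in for whatever parseParameters() does internally. It assumes the input is the text found between `<` and `>` of a single tag.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HTMLParser: pulls attributes out of one tag's text.
public class DivClassSketch {
    // name=value pairs; the value may be double-quoted, single-quoted, or bare.
    private static final Pattern ATTRIBUTE = Pattern.compile(
            "([A-Za-z][\\w-]*)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))");

    /** Maps upper-cased attribute names to their values, in order of appearance. */
    public static Map<String, String> attributes(String tagText) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ATTRIBUTE.matcher(tagText);
        while (m.find()) {
            String value = m.group(2) != null ? m.group(2)
                         : m.group(3) != null ? m.group(3)
                         : m.group(4);
            result.put(m.group(1).toUpperCase(Locale.ROOT), value);
        }
        return result;
    }

    /** Returns the CLASS value if tagText is a DIV tag, otherwise null. */
    public static String divClass(String tagText) {
        // The tag name is the first whitespace-delimited token.
        String name = tagText.trim().split("\\s+", 2)[0];
        return name.equalsIgnoreCase("DIV") ? attributes(tagText).get("CLASS") : null;
    }

    public static void main(String[] args) {
        System.out.println(divClass("DIV CLASS=\"header\" ID=top")); // prints header
        System.out.println(divClass("SPAN CLASS=\"x\""));            // prints null
    }
}
```

With the real library, the same two checks (tag name, then parameter lookup) would be made on the HTMLTag object rather than on a raw string.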
From: Somik R. <so...@ya...> - 2002-03-12 07:47:43
|
Hi Kalyan, It seems like you are using something other than HtmlParser. Please download htmlparser from http://htmlparser.sourceforge.net You will find the documentation there as well, with all the info. To try it immediately after downloading, you can run:

    run.bat http://www.yahoo.com

Type run.bat on its own to get the options for your switches. In your question, you want to extract only links, so you can do:

    run.bat http://www.yahoo.com -l

This will show you only the links. Regards, Somik |
From: Kalyan K. M. <mk...@wi...> - 2002-03-11 10:51:55
|
Hi All, how do I parse an HTML document and obtain the value of a tag in that document? Suppose I have an HTML document named Table.html which contains a table with cells having an HREF to another document, which also contains a table. I should first be able to obtain the HREF and then the table content. I am not able to find a way to obtain a parser object from the HTMLEditorKit. Can someone please post a code snippet that parses the HTML and obtains the attribute value of any HREF, where the attribute can be specified from the command line? Something like

    java TestParser Table.html HREF

should read the Table.html file, and the output on the console has to be the value of HREF. Thanks, Kalyan This message is confidential and may also be legally privileged. If you are not the intended recipient, please notify us immediately. You should not copy it or use it for any purpose, nor disclose its contents to any other person. The views and opinions expressed in this e-mail message are the author's own and may not reflect the views and opinions of Wilco International. |
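For the Swing route asked about above, one possible approach (a sketch, not a recommendation from this list) is to skip HTMLEditorKit's protected getParser() and drive javax.swing.text.html.parser.ParserDelegator directly with an HTMLEditorKit.ParserCallback. The class name `HrefLister` and the method `hrefs()` are made up for illustration, and the sketch takes the HTML as a string rather than reading Table.html from the command line.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical example class: lists the HREF of every anchor in a page.
public class HrefLister {
    /** Returns the HREF value of each <a> start tag, in document order. */
    public static List<String> hrefs(String html) {
        final List<String> found = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        found.add(href.toString());
                    }
                }
            }
        };
        try {
            // The third argument tells the parser to ignore any charset declared in the page.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String page = "<html><body><table><tr><td>"
                + "<a href=\"other.html\">next</a>"
                + "</td></tr></table></body></html>";
        System.out.println(hrefs(page));
    }
}
```

To read a file instead, a FileReader could be passed to parse() in place of the StringReader; following each HREF and parsing the second table would repeat the same call on the linked document.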
From: Somik R. <so...@ya...> - 2002-03-04 14:28:40
|
HTMLParser 1.03 has been released. It contains a bug fix in HTMLRemarkNode, which was causing the parser to crash on pages with remarks spanning more than one line. A test case for the bug has been added in HTMLRemarkNodeTest. The release also contains the design documentation in the zip. Thanks to Serge Kruppa for pointing out the bug. Check http://htmlparser.sourceforge.net Regards Somik |
From: Somik R. <so...@ya...> - 2002-01-21 01:17:25
|
Hi Rohit, To include your own scanner type, you would need something like this:
[1] HTMLTableTag - the tag that stores the data of the table tags
[2] HTMLTableScanner - the class which does the scanning. Implement the two template methods:
  (i) evaluate() - returns true if the tag name is "TABLE", false otherwise
  (ii) scan() - returns the HTMLTableTag object built from the available text data. Here you will have the tag contents, and you will need to extract the relevant data, construct the table object appropriately, and return it.
Finally, you need to register this scanner. That's it - after this, table objects will be identified. All the scanners in the library were written with this architecture in mind. Check out the entire scanners package, in particular HTMLLinkScanner. Check out the corresponding test cases (in the scannersTests package), and you should get a clear idea of the usage.
Also, could you subscribe to the HTMLParser User's list and mail your queries to that single mail id? Cheers, Somik
----- Original Message ----- From: "Rohit Kelapure" <rke...@vt...> To: <fal...@mt...>; <kaa...@ik...>; <na...@us...>; <so...@ki...> Sent: Monday, January 21, 2002 10:07 AM Subject: HTML TABLE PARSER
> My name is Rohit Kelapure.
> I am a graduate student in Computer Science at Virginia Tech.
> I have been going through the source code of the HTML parser.
> I need to customize this so as to extract the items of a table on a HTML page and insert them in a database.
> From the code and documentation it is clear that I need to create my own scanner-tag pair.
> Could you give some more pointers on this? Which are the Java source files which I should be working with? Have any of you worked on this modification before?
> Your help and suggestions are greatly welcome.
> Thanks,
> Rohit Kelapure.
> Graduate Student Computer Science Virginia Tech USA. |
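The scanner/tag pairing described above can be sketched roughly as follows. This is a self-contained illustration of the pattern only: `TagScanner`, `TableTag`, `TableScanner` and `dispatch()` are hypothetical stand-ins, not the library's actual classes or signatures, and the real HTMLTableScanner would extract far more than the raw tag text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the evaluate()/scan()/register pattern; not HTMLParser's API.
public class ScannerSketch {
    /** Stand-in for the scanner contract: decide, then build. */
    public interface TagScanner {
        boolean evaluate(String tagContents); // is this tag mine?
        Object scan(String tagContents);      // build the tag object
    }

    /** Stand-in for the tag class that stores the table tag's data. */
    public static final class TableTag {
        public final String rawContents;
        public TableTag(String rawContents) {
            this.rawContents = rawContents;
        }
    }

    public static final class TableScanner implements TagScanner {
        @Override
        public boolean evaluate(String tagContents) {
            // The tag name is the first whitespace-delimited token.
            String name = tagContents.trim().split("\\s+", 2)[0];
            return name.equalsIgnoreCase("TABLE");
        }

        @Override
        public Object scan(String tagContents) {
            // A real scanner would extract rows and cells here.
            return new TableTag(tagContents.trim());
        }
    }

    /** Asks each registered scanner in turn; returns null if none accepts the tag. */
    public static Object dispatch(List<? extends TagScanner> registered, String tagContents) {
        for (TagScanner scanner : registered) {
            if (scanner.evaluate(tagContents)) {
                return scanner.scan(tagContents);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<TagScanner> registered = new ArrayList<>();
        registered.add(new TableScanner()); // the "register this scanner" step
        Object tag = dispatch(registered, "TABLE border=1");
        System.out.println(tag instanceof TableTag);              // prints true
        System.out.println(dispatch(registered, "DIV") == null);  // prints true
    }
}
```

Keeping evaluate() cheap and side-effect free lets the parser poll every registered scanner for each tag and hand the tag only to the one that claims it.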
From: Somik R. <so...@ya...> - 2002-01-16 14:09:44
|
Hi Folks, Check http://htmlparser.sourceforge.net for a totally new look. Design documentation with sample programs has been added. Feedback is welcome. Regards, Somik |
From: Somik R. <so...@ki...> - 2002-01-16 14:08:53
|
Hi Folks, Check http://htmlparser.sourceforge.net for a totally new look. Design documentation with sample programs has been added. Feedback is welcome. Regards, Somik |
From: Somik R. <so...@ya...> - 2002-01-09 16:36:59
|
Hi Folks, Another bug was detected in HTMLStyleScanner and has been immediately fixed. v1.02 has been released with this fix and another one, which allows scanning of Finnish pages to proceed properly. Regards, Somik |
From: Somik R. <so...@ya...> - 2002-01-08 17:35:06
|
Hi Folks, An important bug fix has been made. The parser was crashing on style tags; this has been fixed. Regards, Somik |
From: Somik R. <so...@ya...> - 2002-01-05 17:11:41
|
Hi Folks, Sorry about that, the zip file that was uploaded seems to have been corrupted. It's fixed, and you should be able to download it now. Regards, Somik |
From: Somik R. <so...@ya...> - 2002-01-03 20:05:24
|
Hi Folks, A new year present - HTMLParser 1.0 is released. We've finally made the transition from alpha to beta stage. Modifications henceforth will only be of a maintenance nature, and the API should remain constant. There are huge changes in the architecture, and lots of bug fixes. Thanks a lot to Kaarle Kaaila for some great support and ideas. Thanks also to Rodney Foley, for some nice ideas for improvement. And thanks to everyone else who's been supporting this project. Looking forward to your continuing support, and wishing you a very happy new year. Cheers, Somik |
From: Somik R. <so...@ya...> - 2001-11-13 16:56:18
|
Hi folks, I have modified the architecture to include the change I spoke of last. Now, the parser throws an exception if no scanners have been registered. This check can be turned off by setting a boolean flag, but by default it is set to true. Also, a static method called registerScanners is now available in HTMLParser, which will register some of the common scanners. Hopefully, this will alleviate much of the confusion being caused by the scanner registration process. Regards, Somik |