htmlparser-user Mailing List for HTML Parser (Page 98)
Brought to you by:
derrickoswald
From: Doyle, A. <Ann...@au...> - 2002-04-23 15:38:30
|
Is there any way to modify the image tag locations? I want to find the image tags in a document and change each location to a local one. I will download the images from the original location and save them locally. I then want to write the resulting HTML out to a local file, so that the document I write out has the new local locations for the images. Thanks, Annette |
From: Somik R. <so...@ya...> - 2002-04-23 15:25:30
|
Hi Mats,
> Have you still got this applet (reply to the previous question) so i can run it on my machine?
> Also, have you had a chance yet to have a look at my previous questions about extracting links using my
> program i attached.
> Thanks for your time and effort
Actually, I was looking into it right now; I finally had some time. Will let you know very soon.
Cheers, Somik |
From: Sodergren, M.G. <mg...@le...> - 2002-04-23 15:18:03
|
Hello, Have you still got this applet (from your reply to the previous question) so I can run it on my machine? Also, have you had a chance yet to look at my previous questions about extracting links using the program I attached? Thanks for your time and effort. Mats |
From: Somik R. <so...@ya...> - 2002-04-23 14:38:16
|
Hi Annette, Yes, HTML Parser works with Java 1.1. I tested it a few months back in an applet. Regards, Somik
----- Original Message -----
From: Doyle, Annette
To: htm...@li...
Sent: Tuesday, April 23, 2002 9:57 PM
Subject: [Htmlparser-user] Java version of html parser
What is the Java version of the html parser? (Is it Java 1.1 compatible?)
Annette |
From: Doyle, A. <Ann...@au...> - 2002-04-23 12:57:28
|
What is the Java version of the html parser? (Is it Java 1.1 compatible?) Annette |
From: Somik R. <so...@ya...> - 2002-04-22 04:37:37
|
Go to https://lists.sourceforge.net/lists/listinfo/htmlparser-user to do the same (this link is also in the footer). Cheers, Somik ----- Original Message ----- From: "sultan" <ma...@ho...> To: <htm...@li...> Sent: Monday, April 22, 2002 1:52 PM Subject: [Htmlparser-user] Unsubscribe me > > > _______________________________________________ > Htmlparser-user mailing list > Htm...@li... > https://lists.sourceforge.net/lists/listinfo/htmlparser-user |
From: sultan <ma...@ho...> - 2002-04-22 03:51:27
|
From: Sodergren, M.G. <mg...@le...> - 2002-04-21 13:18:20
|
Thanks for your reply. I would really appreciate it if, when you get back, you could find the time to code WebViewer and send it back to me, as I have to finish this by the end of next week. Thanks again, Mats Sodergren |
From: Somik R. <so...@ya...> - 2002-04-19 23:34:45
|
Hi Mats Unfortunately this weekend I am going to be out of station. Will try to help you when I get back. Cheers, Somik |
From: Sodergren, M.G. <mg...@le...> - 2002-04-19 10:49:44
|
Hello,
I was wondering if you could help me with the following. If you have not got time to do this, I understand. I have been trying to code this but cannot make it work, and I am stressed for time to complete it.
I want my WebViewer program (attached) to be run under emacs in Windows. It will display a window where I can enter a URL at the top. Once enter is pressed, I want WebViewer to work with htmlparser to extract the links of that HTML page and return them below in the window as a result. The way WebViewer works now, it only shows the HTML page of the URL entered.
So I need WebViewer to be changed to work with htmlparser, and I need the classes used from the html parser to be sent to me (as not all of them will be used for just extracting links). I want all my programs to be in one directory when I get them and run them, so that I don't have any package com.kizna....... import com.kizna........ etc. in the programs. Please run WebViewer and you will understand more clearly what I mean.
Thanks for your time, Mats |
From: Somik R. <so...@ya...> - 2002-04-17 03:41:45
|
Hi Folks, HTMLParser 1.1 has just been released. This is a production release: HTMLParser finally moves out of the beta stage. A whole lot of bug fixes, architecture modifications, and intense testing have gone into it. You can get it from http://htmlparser.sourceforge.net Regards, Somik |
From: Somik R. <so...@ya...> - 2002-04-17 02:29:38
|
Hi Mats,
HTMLParser.elements() returns an Enumeration, so you can enumerate through a list of nodes. This is actually the Iterator design pattern. HTMLNode is the interface that represents just about any kind of html element. The element might be a string node, a remark node, a tag, or an end tag. If it is a tag, then there are several types of tags, and those form another hierarchy. All this is explained with class diagrams at:
http://htmlparser.sourceforge.net/design/index.html
http://htmlparser.sourceforge.net/design/tags.html (this shows the HTMLNode particularly).
Using the parser is quite simple. From the user's perspective, you only need a loop:

    HTMLNode node;
    for (Enumeration e = parser.elements(); e.hasMoreElements();) {
        node = (HTMLNode) e.nextElement();
        // Now you have an object whose class implements HTMLNode.
        // Use instanceof if you are interested in particular types,
        // or use reflection to find out information about the object itself.
        // The former is what most folks use.
        // Suppose you only want to print strings: act only on HTMLStringNode.
        if (node instanceof HTMLStringNode) {
            // Yes, now we can downcast it to HTMLStringNode
            HTMLStringNode stringNode = (HTMLStringNode) node;
            // Print the contents of the string node
            System.out.println(stringNode.getText());
        }
    }

HTH. Please feel free to ask any questions that you have.
Regards, Somik
----- Original Message -----
From: "Sodergren, M.G." <mg...@le...>
To: <htm...@li...>
Sent: Tuesday, April 16, 2002 6:26 PM
Subject: [Htmlparser-user] the parser
I have a problem with the following: node = (HTMLNode)e.nextElement(); Please tell me what is the content and return type of HTMLParser.elements(), and how the HTMLNode is defined. |
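The loop in this reply can be tried without the library on the classpath. The following is a minimal sketch of the same Enumeration-plus-instanceof pattern, assuming stand-in types: `SketchNode`, `TextNode`, `TagNode`, and `NodeWalkSketch` are hypothetical names invented for illustration and are not the HTMLParser classes.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Vector;

class NodeWalkSketch {
    // Walk the enumeration and keep only the text nodes,
    // using instanceof to select the node type of interest.
    static List<String> collectText(Enumeration<SketchNode> nodes) {
        List<String> texts = new ArrayList<>();
        while (nodes.hasMoreElements()) {
            SketchNode node = nodes.nextElement();
            if (node instanceof TextNode) {
                // Safe downcast: the instanceof check has already succeeded.
                texts.add(((TextNode) node).getText());
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        Vector<SketchNode> page = new Vector<>();
        page.add(new TagNode("P"));
        page.add(new TextNode("Hello"));
        page.add(new TagNode("/P"));
        page.add(new TextNode("World"));
        for (String text : collectText(page.elements())) {
            System.out.println(text);
        }
    }
}

// Hypothetical stand-in for the library's node interface.
interface SketchNode {}

// Hypothetical stand-in for a string node.
final class TextNode implements SketchNode {
    private final String text;
    TextNode(String text) { this.text = text; }
    String getText() { return text; }
}

// Hypothetical stand-in for a tag node.
final class TagNode implements SketchNode {
    private final String name;
    TagNode(String name) { this.name = name; }
    String getName() { return name; }
}
```

Swapping the stand-ins for the real node types should leave the shape of the loop unchanged; only the class names in the instanceof test and the cast differ.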
From: Sodergren, M.G. <mg...@le...> - 2002-04-16 09:26:35
|
I have a problem with the following: node = (HTMLNode)e.nextElement(); Please tell me what is the content and return type of HTMLParser.elements(), and how the HTMLNode is defined. |
From: Somik R. <so...@ya...> - 2002-04-15 05:14:42
|
Hi Folks,
Thanks to Sam Joseph (creator of Neurogrid). Sam is using the parser in the neurogrid project, and has pointed out a bug that slipped our attention. If link or image URLs contained spaces, those spaces were being absorbed. That is incorrect behaviour, especially if you have something of the form:
http://myservlet.com/someservlet?name=Sam Joseph&age=22
The same goes for images like
http://www.kizna.com/images/kizna corp.jpg
Also, newline characters were previously being converted to spaces. This has been changed: newline characters are now left as is, and the responsibility for dealing with them lies with the appropriate scanner. So the link and image scanners specifically filter out the newline characters, whereas JSP tags, which might contain JSP code, preserve them.
There are now over 73 test cases in the htmlparser, all passing.
I think we're ready for release 1.1 now, unless I get any more bug reports this week. You can check out the latest code from CVS.
Regards, Somik |
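The behaviour described in this message (spaces inside a URL are kept, line breaks are filtered out by the link and image scanners) can be sketched in a few lines. This is an illustration only, not the scanner code in CVS; `UrlAttributeSketch` and `cleanUrl` are hypothetical names.

```java
class UrlAttributeSketch {
    // Remove line breaks that split a URL across source lines,
    // but leave interior spaces untouched.
    static String cleanUrl(String raw) {
        return raw.replace("\r", "").replace("\n", "");
    }

    public static void main(String[] args) {
        // Spaces survive: this URL is returned unchanged.
        System.out.println(cleanUrl("http://www.kizna.com/images/kizna corp.jpg"));
        // A URL broken over two lines is joined back together.
        System.out.println(cleanUrl("http://myservlet.com/someservlet?name=Sam Joseph\n&age=22"));
    }
}
```

A scanner for JSP tags would skip this step, since line breaks inside JSP code are significant.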
From: Somik R. <so...@ya...> - 2002-04-08 04:22:46
|
Hi Raghav
> when would be this HTMLparser 1.1 out?
As soon as I can wrap it up. Technically, the code is ready and already checked into CVS. I need to go through the process of creating a release: write some documentation, check that everything is ok, and so on. If I had some help I could wrap it up sooner.
> I am not sure, but to me the way htmlparser parses is it gives me the tag
> parameter of the first line in the above snippet of html code, when I do
> Hashtable table = tag.parseParameters();
> it is looking for parameters inside <FORM ..... >, but not <FORM
> .....</FORM>
Yes, parseParameters() will give you the stuff inside the FORM tag. That is what I call "microscopic" parsing. But to get the remaining tags, up to the point where you encounter </FORM>, you need to do "macroscopic" parsing. This is not hard; check HTMLAppletScanner as an example.
In a nutshell, the concept is very simple. The scan method provides you with a reader, and you use that reader to read ahead and get the next tags. This is simple because the reader will automatically identify the correct tags, and the mechanism is very similar to using the parser to get the tags you want. The HTMLLinkScanner, among others, works on the same principle.
By the way, I think we should take this discussion to the Developer list.
Regards, Somik
_________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com |
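The "microscopic" versus "macroscopic" distinction drawn in this reply can be illustrated without the library. The sketch below is an assumption-laden stand-in: it uses regular expressions rather than the parser's reader, and `FormScanSketch`, `collectInputs`, and this `parseParameters` are hypothetical helpers, not the HTMLParser methods of the same name. It parses the attributes of one tag (microscopic), and reads ahead from `<FORM ...>` collecting INPUT tags until `</FORM>` (macroscopic).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormScanSketch {
    // Matches an opening or closing tag: group 1 = "/" if closing,
    // group 2 = tag name, group 3 = everything up to the closing ">".
    private static final Pattern TAG = Pattern.compile("<(/?)(\\w+)([^>]*)>");
    // Matches NAME="value" or NAME=value inside a tag.
    private static final Pattern ATTR =
            Pattern.compile("(\\w+)\\s*=\\s*(?:\"([^\"]*)\"|([^\\s\"]+))");

    // "Microscopic" step: parse the attributes inside a single tag.
    static Map<String, String> parseParameters(String tagBody) {
        Map<String, String> params = new LinkedHashMap<>();
        Matcher m = ATTR.matcher(tagBody);
        while (m.find()) {
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            params.put(m.group(1).toUpperCase(Locale.ROOT), value);
        }
        return params;
    }

    // "Macroscopic" step: once <FORM ...> is seen, keep reading tags
    // and collect every INPUT tag until </FORM> is reached.
    static List<Map<String, String>> collectInputs(String html) {
        List<Map<String, String>> inputs = new ArrayList<>();
        boolean insideForm = false;
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            String name = m.group(2).toUpperCase(Locale.ROOT);
            if (name.equals("FORM")) {
                if (closing) {
                    break; // reached </FORM>
                }
                insideForm = true;
            } else if (insideForm && !closing && name.equals("INPUT")) {
                inputs.add(parseParameters(m.group(3)));
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        String html = "<FORM NAME=\"LoginForm\" METHOD=POST ACTION=\"urltoInvoke\">"
                + "<P>User name: <INPUT TYPE=\"text\" NAME=\"userName\" SIZE=\"10\">"
                + "<P>Password: <INPUT TYPE=\"password\" NAME=\"password\" SIZE=\"12\">"
                + "</FORM>";
        for (Map<String, String> input : collectInputs(html)) {
            System.out.println(input);
        }
    }
}
```

A regular expression is only adequate for simple markup like the login form in this thread; a scanner built on the parser's reader would cope with malformed or multi-line tags that this sketch does not.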
From: Somik R. <so...@ya...> - 2002-04-08 04:22:15
|
Mats wrote:
> So can i not some how get the htmlparser (just the extract link programs) to work with my WebPage Viewer so that i enter a
> search like before but this time the HTML page shown in my window is just the extracted links? Can I do this? If so how?
Yes, you can do this. Look at the program MailRipper.java in the com.kizna.html.parserapplications package. This program prints only the email addresses on a page (which are also link tags). It is even simpler to filter out all the tags except the link tags. Try this:

    HTMLParser parser = new HTMLParser("http://myurl.com");
    HTMLNode node;
    for (Enumeration e = parser.elements(); e.hasMoreElements();) {
        node = (HTMLNode) e.nextElement(); // Get the next html node
        if (node instanceof HTMLLinkTag) {
            // Yes - this is a web link. Now you can downcast and do what you want with it
            HTMLLinkTag linkTag = (HTMLLinkTag) node;
            System.out.println("link to " + linkTag.getLink() + ", link text = " + linkTag.getLinkText());
        }
    }

This should give you what you want.
Regards, Somik |
From: Raghavender S. <kin...@ho...> - 2002-04-07 21:40:11
|
Hi Somik,
When will HTMLParser 1.1 be out?
One more question, about parsing FORM tags. Let us say this is a form tag:

    <FORM NAME="LoginForm" METHOD=POST ACTION="urltoInvoke">
    <P>User name: <INPUT TYPE="text" NAME="userName" SIZE="10">
    <P>Password: <INPUT TYPE="password" NAME="password" SIZE="12">
    <P><INPUT TYPE="submit" VALUE="Log in">
    <INPUT TYPE="button" VALUE="Cancel" onClick="window.close()">
    </FORM>

I am not sure, but it seems to me that htmlparser gives me only the tag parameters of the first line in the above snippet of html code when I do
Hashtable table = tag.parseParameters();
It is looking for parameters inside <FORM ..... >, but not <FORM .....</FORM>.
Could you suggest how to go ahead with this, to extract the INPUT tag parameters?
Raghav
_________________________________________________________________ MSN Photos is the easiest way to share and print your photos: http://photos.msn.com/support/worldwide.aspx |
From: Sodergren, M.G. <mg...@le...> - 2002-04-06 13:53:55
|
Hello again,
I think I have figured out how to do it. It is as follows:
I have written two programs called WebPage (a static utility class that provides operations for downloading web pages) and WebPageViewer (which contains utilities to download and display HTML pages). If I run WebPageViewer, I just enter my search at the top (which builds the Yahoo search URL with my search) and it displays the Yahoo search page for this search, because the beginning of the URL is the standard Yahoo search URL.
So can I not somehow get the htmlparser (just the link extraction programs) to work with my WebPageViewer, so that I enter a search like before but this time the HTML page shown in my window is just the extracted links? Can I do this? If so, how? This would mean that I don't need a servlet or something else.
I have attached my two programs so you can have a look at them and run them if you want. They are quite basic. Really, I want them to contain a text field where I can enter my search without the basic Yahoo search URL being shown, and a search button I can press to perform the search instead of pressing enter.
Thanks, Mats |
From: Somik R. <so...@ya...> - 2002-04-05 07:14:43
|
Hi Folks, The dynamic page parsing bug is fixed, and as far as I've tested, I am able to correctly parse pages like http://search.yahoo.com/bin/search?p=dogs which Mats had posted earlier. We are now ready for release 1.1. I'd be grateful for some help in testing the parser, to see if there are any showstopper bugs for this release. (Get the latest code from CVS) Regards, Somik |
From: Somik R. <so...@ya...> - 2002-04-05 03:04:55
|
Hi Folks, An important bug has been pointed out by Raj Sharma, which would halt the parser if a page contained a link spread over two lines. This was a bug in HTMLTag, and I was able to find it quickly, thanks to the refactoring done earlier with the help of Arnaud. Also, HTMLLinkScanner and HTMLImageScanner have some small changes in connection with the fix. Please get the latest code from CVS. Regards, Somik |
From: Somik R. <so...@ya...> - 2002-04-04 15:55:35
|
Of course you can use the library in your servlet. The library isn't meant to be used through the command line; that is only provided as a quick demo and for testing. Putting it in an applet is also an interesting idea. I have kept the parser JDK 1.1 compatible, so you should be able to get it into an applet on IE.
Regards, Somik
----- Original Message -----
From: "Sodergren, M.G." <mg...@le...>
To: "Somik Raha" <so...@ya...>
Sent: Friday, April 05, 2002 12:50 AM
Subject: RE: [Htmlparser-user] link extractor
What i meant was if i have an address where i can store my web page (actually where i will run my programs from) is it easiest to make my programs into a applet, servlet, etc... or do i even have to do this at all. I am going to adapt the program so i can enter a url in a box instead of on the command line.
Thanks Mats
-----Original Message-----
From: Somik Raha [mailto:so...@ya...]
Sent: 04 April 2002 16:47
To: Sodergren, M.G.
Subject: Re: [Htmlparser-user] link extractor
Hi Mats,
>What is the easiest way to make the htmlparser (link extractor) internet
>based?
Sorry, didnt understand this question.
Regards, Somik |
From: Sodergren, M.G. <mg...@le...> - 2002-04-04 13:57:21
|
How come when you use the parser on most sites to extract links it works fine, but when you use it on a search engine, e.g. http://search.yahoo.com/bin/search?p=dogs which is a page with search results for dogs, it does not work? Thanks Mats |
From: Sodergren, M.G. <mg...@le...> - 2002-04-04 10:43:42
|
What is the easiest way to make the htmlparser (link extractor) internet based? Mats |
From: Somik R. <so...@ya...> - 2002-04-04 02:26:47
|
Hi Craig,
Thanks a lot for the post. Please go ahead with your analysis. I will try to catch up this weekend.
Regards, Somik
----- Original Message -----
From: "Craig Raw" <cr...@qu...>
To: "'Somik Raha'" <so...@ya...>
Sent: Tuesday, April 02, 2002 3:32 PM
Subject: RE: [Htmlparser-user] Swing integration
> Hi Somik,
>
> A quick excerpt from javax.swing.text.html.HTMLEditorKit javadoc - which
> is the driver behind JEditorPane's reading and writing HTML
> capabilities.
>
> ---
> Extendable/Scalable
>
> To maximize the usefulness of this kit, a great deal of effort has gone
> into making it extendable. These are some of the features.
> The parser is replaceable. The default parser is the Hot Java parser
> which is DTD based. A different DTD can be used, or an entirely
> different parser can be used. To change the parser, reimplement the
> getParser method. The default parser is dynamically loaded when first
> asked for, so the class files will never be loaded if an alternative
> parser is used. The default parser is in a separate package called
> parser below this package.
>
> The parser drives the ParserCallback, which is provided by HTMLDocument.
> To change the callback, subclass HTMLDocument and reimplement the
> createDefaultDocument method to return document that produces a
> different reader. The reader controls how the document is structured.
> Although the Document provides HTML support by default, there is nothing
> preventing support of non-HTML tags that result in alternative element
> structures.
> ---
>
> I may find some time to look into this as well, although I am not sure
> how much it would fix JEditorPane's somewhat buggy HTML rendering
> capabilities....
>
> -craig
>
>
> -----Original Message-----
> From: htm...@li...
> [mailto:htm...@li...] On Behalf Of Somik
> Raha
> Sent: 01 April 2002 05:28 PM
> To: HTMLParser User List
> Cc: HTMLParser Developer List
> Subject: Re: [Htmlparser-user] Swing integration
>
> Hi Craig
> Wow! Thats a great question.
> Actually, I doubt if I could replace Sun Microsystems' code with
> mine. I
> dont think Java is that open (or is it ?)
> However, we could think of writing our own adapter for the html parser
> that
> might plugin in some way...
> I have never used Sun's html parser (If I had, I might not have
> started
> this project).
> I will need to study Sun's parser before I can answer your
> question..
> But there does seem to be some interesting possibilities.
>
> Regards
> Somik
> ----- Original Message -----
> From: "Craig Raw" <cr...@qu...>
> To: <htm...@li...>
> Sent: Monday, April 01, 2002 10:20 PM
> Subject: [Htmlparser-user] Swing integration
>
>
> > Has the HTML Parser been integrated into Swing's HTMLEditorKit to
> > provide a better implementation of JEditorPane's HTML viewing
> > capabilities? HTML Parser would need to replace
> > javax.swing.text.html.parser.Parser, which is currently somewhat
> buggy.
> > Anyone tried this?
> >
> > -craig |
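The javadoc excerpt quoted in this message says the parser can be changed by reimplementing `getParser`. The following is a minimal sketch of that hook using only standard Swing classes. `CustomParserKitSketch`, `CountingParser`, and `extractText` are hypothetical names; the sketch simply wraps Swing's own `ParserDelegator` to mark where an adapter for another parser would go, and it does not plug HTML Parser in.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class CustomParserKitSketch extends HTMLEditorKit {

    // Hypothetical adapter: an alternative parser would drive the callback here.
    // This sketch only counts calls and forwards to Swing's default parser.
    static final class CountingParser extends HTMLEditorKit.Parser {
        private final HTMLEditorKit.Parser delegate = new ParserDelegator();
        int parseCalls = 0;

        @Override
        public void parse(Reader r, HTMLEditorKit.ParserCallback cb, boolean ignoreCharSet)
                throws IOException {
            parseCalls++;
            delegate.parse(r, cb, ignoreCharSet);
        }
    }

    private final CountingParser parser = new CountingParser();

    // The extension point named in the javadoc: return a different parser.
    @Override
    protected HTMLEditorKit.Parser getParser() {
        return parser;
    }

    // Demo helper: run the kit's parser over a string and gather the text it reports.
    static String extractText(String html) {
        CustomParserKitSketch kit = new CustomParserKitSketch();
        StringBuilder text = new StringBuilder();
        try {
            kit.getParser().parse(new StringReader(html), new HTMLEditorKit.ParserCallback() {
                @Override
                public void handleText(char[] data, int pos) {
                    text.append(data);
                }
            }, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(extractText("<html><body><p>Hello</p></body></html>"));
    }
}
```

An adapter of the kind Somik mentions would replace the body of `parse` with code that walks the other parser's nodes and calls the matching `ParserCallback` methods; whether that would improve JEditorPane's rendering is left open in the thread.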
From: Somik R. <so...@ya...> - 2002-04-01 15:22:00
|
Hi Craig
Wow! That's a great question.
Actually, I doubt if I could replace Sun Microsystems' code with mine. I don't think Java is that open (or is it?). However, we could think of writing our own adapter for the html parser that might plug in in some way...
I have never used Sun's html parser (if I had, I might not have started this project). I will need to study Sun's parser before I can answer your question. But there do seem to be some interesting possibilities.
Regards
Somik
----- Original Message -----
From: "Craig Raw" <cr...@qu...>
To: <htm...@li...>
Sent: Monday, April 01, 2002 10:20 PM
Subject: [Htmlparser-user] Swing integration
> Has the HTML Parser been integrated into Swing's HTMLEditorKit to
> provide a better implementation of JEditorPane's HTML viewing
> capabilities? HTML Parser would need to replace
> javax.swing.text.html.parser.Parser, which is currently somewhat buggy.
> Anyone tried this?
>
> -craig |