htmlparser-user Mailing List for HTML Parser (Page 30)
Brought to you by:
derrickoswald
From: Chen, Shui-S. <qoo...@gm...> - 2007-03-22 15:49:01
Hello everyone,

I have the same problem: I want to create a new node and attach it to an existing node. Here is my code:

    TagNode div = new Div();
    div.setAttribute("style", "xxxx");
    NodeList tmp = new NodeList();
    tmp.add(div);
    div.setEndTag(new Div()); // is this the right way to create an end tag?
    div.setChildren(traverseNode.getChildren());
    ((TagNode) traverseNode).setChildren(tmp); // traverseNode is the parent node (<li>) I want to attach it to

In the end I get this result:

    <li><DIV style="xxx"><DIV></li>

instead of:

    <li><DIV style="xxx"></DIV></li>

How can I create a correct end tag, i.e. </DIV>?

Stone

2007/1/11, Joel <jo...@ha...>:
> I just didn't realize the need for the end tag. That works.
>
> Thanks,
> Joel
>
> Martin Sturm wrote:
> > Hi,
> >
> > I did a quick test, and I think you forgot to add the end tag to the
> > toCreate object. You should add:
> >
> >     toCreate.setEndTag(new Span());
> >
> > before your final println statement.
> >
> > The following code snippet works for me:
> >
> >     TagNode toCreate = new Span();
> >     toCreate.setAttribute("key", "Test", '"');
> >     NodeList nl = new NodeList();
> >     nl.add(new TextNode("Test2"));
> >     toCreate.setChildren(nl);
> >     toCreate.setEndTag(new Span());
> >     System.out.println(toCreate.toHtml());
> >
> > Result: <SPAN key="Test">Test2<SPAN>
> >
> > Hope this will help you.
> >
> > -- Martin
> >
> > 2007/1/10, Joel <jo...@ha... <mailto:jo...@ha...>>:
> >
> >     I want to wrap a text string with a span tag. I've tried the
> >     following, but I'm running into a problem: the tag's children
> >     aren't being displayed.
> >
> >         // New <span key="x">some text here</span>
> >         TagNode toCreate = new Span();
> >         toCreate.setAttribute("key", getKey(str), '"');
> >         NodeList nl = new NodeList();
> >         nl.add(new TextNode(str));
> >         toCreate.setChildren(nl);
> >         System.out.println(toCreate.toHtml());
> >
> >     This ends up showing <span key="x"> without the text node and
> >     end tag. What am I doing wrong?
> >
> >     Joel
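Both printed results in this thread, <SPAN key="Test">Test2<SPAN> and <li><DIV style="xxx"><DIV></li>, suggest that passing a freshly constructed new Span() or new Div() to setEndTag() simply prints a second opening tag. The sketch below is a hypothetical toy model, not htmlparser code: it prints a tag as "<" + raw name + attributes + ">", so the only thing that turns a tag into a closing tag is a raw name that begins with "/". As far as I can tell htmlparser follows the same convention (its isEndTag() looks for a leading "/"), so an untested fix along those lines would be to rename the end tag before attaching it, e.g. Div end = new Div(); end.setTagName("/DIV"); div.setEndTag(end);. Please check this against the javadoc of the version you use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy model (NOT htmlparser's classes): a tag prints as
// '<' + raw name + attributes + '>', and it is an end tag only if the raw name starts with '/'.
class EndTagModel {
    static final class ToyTag {
        final String rawName;
        final Map<String, String> attributes = new LinkedHashMap<>();

        ToyTag(String rawName) { this.rawName = rawName; }

        boolean isEndTag() { return rawName.startsWith("/"); }

        String toHtml() {
            StringBuilder sb = new StringBuilder("<").append(rawName);
            for (Map.Entry<String, String> e : attributes.entrySet()) {
                sb.append(' ').append(e.getKey()).append("=\"").append(e.getValue()).append('"');
            }
            return sb.append('>').toString();
        }
    }

    // Print start tag, children and end tag, the way a composite tag is written out.
    static String render(ToyTag start, String children, ToyTag end) {
        return start.toHtml() + children + end.toHtml();
    }

    public static void main(String[] args) {
        ToyTag div = new ToyTag("DIV");
        div.attributes.put("style", "xxx");
        // Using a second start tag as the "end tag" reproduces the symptom from the thread.
        System.out.println(render(div, "", new ToyTag("DIV")));  // <DIV style="xxx"><DIV>
        // A raw name prefixed with '/' yields a proper closing tag.
        System.out.println(render(div, "", new ToyTag("/DIV"))); // <DIV style="xxx"></DIV>
    }
}
```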
From: Ian M. <ian...@gm...> - 2007-03-20 15:30:52
HasAttributeFilter doesn't yet support this. There was talk of doing this by adding a constructor that takes a case-sensitivity value in addition to the attribute value, but it hasn't been implemented yet. Feel free to write it yourself and submit it to the project :)

Ian

On 3/20/07, Luca Telloli <tuk...@gm...> wrote:
> Hello everyone,
> I'm a newbie of htmlparser libraries so apologies for asking a
> possibly easy question. I have different HTML pages to parse, with
> the same meta attributes in different case fashions: for instance:
>
> case 1: <meta name="description" ...>
> case 2: <meta name="Description" ...>
>
> etc. I wrote a NodeFilter as follows:
>
> NodeFilter DescriptionMetaTag = new AndFilter (new TagNameFilter
> ("meta"), new HasAttributeFilter("name", "Description"));
>
> but it doesn't match the case 1. How can I correct the filter to
> match any possible case of the "description" keyword?
>
> Thanks in advance,
> Luca
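Until such a constructor exists, the comparison Ian describes is easy to write by hand. The stdlib-only sketch below is hypothetical: it models a tag as a name plus a Map of attributes instead of using htmlparser types, and ignores letter case when comparing the tag name, the attribute name and the attribute value. In htmlparser the same test would presumably live in the accept(Node) method of a small class implementing org.htmlparser.NodeFilter (reading the value with Tag.getAttribute("name")), which could then take the place of HasAttributeFilter inside Luca's AndFilter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stdlib-only sketch of the check a case-insensitive attribute filter would perform.
// The Map is a hypothetical stand-in for a tag's attributes, not an htmlparser type.
class CaseInsensitiveAttributeMatch {
    static boolean accept(String tagName, Map<String, String> attributes,
                          String wantedTag, String attrName, String wantedValue) {
        if (!wantedTag.equalsIgnoreCase(tagName)) {
            return false;
        }
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            // Match the attribute name and its value without regard to letter case.
            if (e.getKey().equalsIgnoreCase(attrName)) {
                return e.getValue() != null && e.getValue().equalsIgnoreCase(wantedValue);
            }
        }
        return false; // attribute not present
    }

    public static void main(String[] args) {
        Map<String, String> lower = new LinkedHashMap<>();
        lower.put("name", "description");
        Map<String, String> mixed = new LinkedHashMap<>();
        mixed.put("NAME", "Description");
        System.out.println(accept("meta", lower, "META", "name", "Description")); // true
        System.out.println(accept("META", mixed, "meta", "name", "description")); // true
    }
}
```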
From: Luca T. <tuk...@gm...> - 2007-03-20 12:53:21
Hello everyone,

I'm new to the htmlparser library, so apologies for asking a possibly easy question. I have several HTML pages to parse that use the same meta attributes with different capitalization, for instance:

    case 1: <meta name="description" ...>
    case 2: <meta name="Description" ...>

and so on. I wrote a NodeFilter as follows:

    NodeFilter DescriptionMetaTag = new AndFilter(new TagNameFilter("meta"), new HasAttributeFilter("name", "Description"));

but it doesn't match case 1. How can I change the filter to match any capitalization of the "description" keyword?

Thanks in advance,
Luca
From: Derrick O. <der...@ro...> - 2007-03-13 22:51:49
See the FAQ:
http://htmlparser.sourceforge.net/faq.html

----- Original Message ----
From: Maryam <mk...@ya...>
To: htmlparser-user@lists.sourceforge.net
Sent: Tuesday, March 13, 2007 6:23:27 PM
Subject: [Htmlparser-user] How to Fix this problem
From: Maryam <mk...@ya...> - 2007-03-13 22:23:43
Hi,

I am trying to parse an HTML page with HTMLParser, but I get the following error and I don't know how to fix it. If you have any idea, please help me. The error is:

    org.htmlparser.util.EncodingChangeException: character mismatch (new: ? [0x161] != old: [0x9a?]) for encoding change from ISO-8859-1 to windows-1252 at character offset 279749
        at org.htmlparser.lexer.InputStreamSource.setEncoding(InputStreamSource.java:280)
        at org.htmlparser.lexer.Page.setEncoding(Page.java:865)
        at org.htmlparser.tags.MetaTag.doSemanticAction(MetaTag.java:150)
        at org.htmlparser.scanners.TagScanner.scan(TagScanner.java:69)
        at org.htmlparser.scanners.CompositeTagScanner.scan(CompositeTagScanner.java:160)
        at org.htmlparser.util.IteratorImpl.nextNode(IteratorImpl.java:92)
        at org.htmlparser.Parser.parse(Parser.java:701)
        at org.htmlparser.Parser.main(Parser.java:849)

Thanks
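The exception message spells out what happened: the page was being decoded as ISO-8859-1 when, judging by MetaTag.doSemanticAction in the trace, a META tag switched the encoding to windows-1252, and a character the lexer had already read decodes differently under the new encoding. The stdlib-only sketch below (it does not use htmlparser) reproduces that mismatch for byte 0x9A, matching the values reported as old [0x9a] and new [0x161]. How htmlparser expects a caller to recover from EncodingChangeException is covered by the FAQ linked in the reply.

```java
import java.nio.charset.Charset;

// Shows why the parser complains: byte 0x9A decodes to different characters
// under the two encodings named in the exception message.
class EncodingMismatchDemo {
    // Decode a single byte with the named charset and return the resulting char.
    static char decode(byte b, String charsetName) {
        return new String(new byte[] { b }, Charset.forName(charsetName)).charAt(0);
    }

    public static void main(String[] args) {
        byte b = (byte) 0x9A;
        // ISO-8859-1 maps 0x9A to the C1 control character U+009A.
        System.out.printf("ISO-8859-1:   U+%04X%n", (int) decode(b, "ISO-8859-1"));   // U+009A
        // windows-1252 maps 0x9A to LATIN SMALL LETTER S WITH CARON.
        System.out.printf("windows-1252: U+%04X%n", (int) decode(b, "windows-1252")); // U+0161
    }
}
```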
From: Martin S. <mst...@gm...> - 2007-03-13 08:28:40
2007/3/12, Eduardo David <edu...@gm...>:
> I'm new at htmlparser, and I would like to know if is possible
> to submit a form using the htmlparser.

Yes, that's possible. See http://htmlparser.sourceforge.net/faq.html#post
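The FAQ entry is the authoritative answer. As a rough illustration of the HTTP side, submitting a form amounts to POSTing URL-encoded fields to the form's action URL. The stdlib-only sketch below assumes you already know the action URL and field names (both are placeholders, taken in practice from the page's <form> element). The returned connection could then, as I understand it, be passed to htmlparser's Parser(URLConnection) constructor to parse the response.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Stdlib-only sketch: submit form fields as an HTTP POST and return the connection,
// whose response can then be handed to a parser.
class FormPost {
    // Encode fields as application/x-www-form-urlencoded (UTF-8 percent-encoding).
    static String encodeForm(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    // Send the POST to the form's action URL (placeholder supplied by the caller).
    static HttpURLConnection post(String actionUrl, Map<String, String> fields) throws IOException {
        byte[] body = encodeForm(fields).getBytes(StandardCharsets.US_ASCII);
        HttpURLConnection conn = (HttpURLConnection) URI.create(actionUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        return conn;
    }
}
```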
From: Eduardo D. <edu...@gm...> - 2007-03-12 14:04:28
Hello,

I'm new to htmlparser, and I would like to know whether it is possible to submit a form using htmlparser.

Regards
From: Kenneth F. <ke...@ne...> - 2007-03-12 11:06:06
Hi,

I will be away from 7 March until 14 March and will be back on the 15th. If you have any urgent matters, please contact the following people:

Technical issues: Ciek Yi (ci...@ne... / +60123260583)
Other issues: Edwin Tay (ed...@ne... / +60123165148)

Thank you.
From: <kas...@gm...> - 2007-03-12 11:05:50
Hello,

I'm trying to parse an HTML document from an HTTPS URL that brings up a certificate prompt/security alert in a web browser. Clicking "Yes" at the prompt shows the content, but when I try to use the Parser class, it throws the exception below. Is there a way to bypass the prompt, read the content and avoid the SSL alerts?

Regards

===========================================================
Exception in thread "main" org.htmlparser.util.ParserException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target;
javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:150)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1518)
    at com.sun.net.ssl.internal.ssl.Handshaker.fatalSE(Handshaker.java:174)
    at com.sun.net.ssl.internal.ssl.Handshaker.fatalSE(Handshaker.java:168)
    at com.sun.net.ssl.internal.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:848)
    at com.sun.net.ssl.internal.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:106)
    at com.sun.net.ssl.internal.ssl.Handshaker.processLoop(Handshaker.java:495)
    at com.sun.net.ssl.internal.ssl.Handshaker.process_record(Handshaker.java:433)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:818)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1030)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1057)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1041)
    at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:402)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:170)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:133)
    at org.htmlparser.http.ConnectionManager.openConnection(ConnectionManager.java:643)
    at org.htmlparser.http.ConnectionManager.openConnection(ConnectionManager.java:841)
    at org.htmlparser.Parser.setResource(Parser.java:398)
    at org.htmlparser.Parser.<init>(Parser.java:317)
=========================================================
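The browser prompt appears because the server's certificate does not chain to a trusted root; the JVM has no prompt, so the handshake fails with the PKIX error above. Rather than switching certificate validation off entirely (which also removes protection against impostor servers), one option is to trust that one certificate explicitly: export it from the browser, import it into a trust store with something like keytool -importcert -alias myserver -file server.cer -keystore mytrust.jks (file names and alias are placeholders), and make HTTPS connections use that store. The sketch below assumes, based on HttpsURLConnectionImpl appearing in the trace, that htmlparser opens its connections through the JDK's HttpsURLConnection, so replacing the JVM-wide default socket factory should affect it. Setting the javax.net.ssl.trustStore system property is a no-code alternative.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.KeyStore;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Sketch: trust an explicitly imported server certificate instead of disabling validation.
class CustomTrust {
    // Load a trust store file, e.g. one created with "keytool -importcert" (path is a placeholder).
    static KeyStore loadTrustStore(String path, char[] password) throws Exception {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = new FileInputStream(path)) {
            ks.load(in, password);
        }
        return ks;
    }

    // Build a TLS context whose trust decisions come only from the given store;
    // passing null falls back to the JRE's default trust store (cacerts).
    static SSLContext contextTrusting(KeyStore trustStore) throws Exception {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tmf.getTrustManagers(), null);
        return ctx;
    }

    // Make HttpsURLConnections created from now on use this context.
    static void installAsDefault(SSLContext ctx) {
        HttpsURLConnection.setDefaultSSLSocketFactory(ctx.getSocketFactory());
    }

    public static void main(String[] args) throws Exception {
        // args: path to the trust store, its password
        installAsDefault(contextTrusting(loadTrustStore(args[0], args[1].toCharArray())));
        // ...then construct the htmlparser Parser for the https URL as before.
    }
}
```

Because this context trusts only what is in the given store, a store holding just the one self-signed certificate will make other HTTPS sites fail; importing the certificate into a copy of the JRE's cacerts avoids that.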
From: <rei...@ba...> - 2007-03-05 14:44:36
Derrick,

Thank you for your helpful link (
http://sourceforge.net/tracker/index.php?func=detail&aid=1597147&group_id=24399&atid=381399
).

I will try what worked for Jorge Gelmetti this evening, because right now I'm at the office, where I have no way to build a new jar file with the modified method. So at home I will rebuild the htmlparser jar file, send it to my office and try it out. But it sounds good; I'm quite sure it will help. Whether it works or not, a mail will follow tomorrow (around 11 UTC).

Kind regards,
Reinhart

PS: I tried to "hardcode" it in my source (copied the piece of code from the modified source), that is, setting the "Proxy-Authorization" property myself. But when running the code (encoding the string made of ProxyUser, the colon and ProxyPassword), I get an error message from the getBytes() method saying that "ISO-8859-1" is not a valid charset... (and without the charset argument, where the method should use the system default charset, I get the same error as in my previous post).

***************************************************************************
This message and any attachments are confidential and are only intended for the recipient(s) to which they have been addressed. If you have received this message in error, please notify the sender immediately and delete the message from your system. The contents of this misdirected mail may not be saved, recorded or used for any purpose whatsoever or made available to unauthorised persons. This message has been prepared and sent with the greatest possible care, including scanning for viruses. In spite of this, we assume no liability whatsoever for the existence of any viruses.
***************************************************************************
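Reinhart reports that String.getBytes("ISO-8859-1") rejected the charset name while he was building a "Proxy-Authorization" header from the proxy user, a colon and the proxy password. Every Java platform is required to support ISO-8859-1, so that error points to something unusual in that particular runtime or in the name string actually passed. The sketch below sidesteps the by-name lookup entirely; it assumes Java 7 or later for StandardCharsets and Java 8 or later for java.util.Base64 (neither existed in 2007), and it builds a standard Basic credential rather than reproducing Jorge Gelmetti's patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Build the value of a Basic "Proxy-Authorization" header:
// "Basic " + base64(user + ":" + password).
class ProxyAuth {
    static String basicHeader(String user, String password) {
        // StandardCharsets.ISO_8859_1 is a Charset constant, so no charset-name lookup
        // (and no UnsupportedEncodingException) is involved.
        byte[] credentials = (user + ":" + password).getBytes(StandardCharsets.ISO_8859_1);
        return "Basic " + Base64.getEncoder().encodeToString(credentials);
    }

    public static void main(String[] args) {
        System.out.println(basicHeader("Aladdin", "open sesame")); // Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
    }
}
```

The result would be set with conn.setRequestProperty("Proxy-Authorization", ProxyAuth.basicHeader(user, password)) on a connection that has not been opened yet.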
From: Derrick O. <der...@ro...> - 2007-03-03 04:41:03
Reinhart,

You might want to look at:
http://sourceforge.net/tracker/index.php?func=detail&aid=1597147&group_id=24399&atid=381399
for a patch from Jorge Gelmetti regarding proxies.

Derrick

----- Original Message ----
From: "rei...@ba..." <reinhart.gruessl...@ba...>
To: htm...@li...
Sent: Friday, March 2, 2007 3:20:42 AM
Subject: [Htmlparser-user] Help for using a proxy needed - 1 step further
From: Derrick O. <der...@ro...> - 2007-03-03 01:49:21
Dan,

Looking into the surefire log, it seems that the two errors are due to changes in server responses. The HTML Parser test suites of necessity fetch a fair number of pages from random servers as a way to reproduce test cases that caused bugs. The two test failures are easily explained and can probably be ignored:

1) a change in the HTML after parsing and regurgitating, probably due to an illegal ETAGO (see the doc comment for the STRICT member):

    text length differed after encountering node <![CDATA[
     document.write('>

2) a change in server name from codeproject.com to www.codeproject.com:

    **** COMPLETE STRING EXPECTED ****
    http://codeproject.com/favicon.ico
    **** COMPLETE STRING ACTUAL***
    http://www.codeproject.com/favicon.ico

As far as activity goes, HTML Parser is a very, very stable codebase. It's been around for several years and will be around for several more. It's as active as you want to make it, really. The 2.0 SNAPSHOT has had only a little more than 2,000 downloads, compared to 28,000 for version 1.6, so it is probably a little green yet. My guess is May.

Derrick

----- Original Message ----
From: Dan Litwiller <dan...@ad...>
To: htmlparser-user@lists.sourceforge.net
Sent: Friday, March 2, 2007 4:55:12 PM
Subject: [Htmlparser-user] Test Failures in current 2.0 snapshot?
From: Dan L. <dan...@ad...> - 2007-03-02 22:26:10
Greetings,

When I retrieve the latest snapshot of htmlparser and attempt to build, I encounter two test failures:

    Failed tests:
      testFidelity(org.htmlparser.tests.lexerTests.LexerTests)
      testAbsoluteLink(org.htmlparser.tests.tagTests.BaseHrefTagTest)

    Tests run: 623, Failures: 2, Errors: 0, Skipped: 0

Is it safe to skip the unit tests to get a complete build?

Also, I'm trying to determine how active this project is, as it could become important to an ongoing project. When is 2.0 expected to go to production?

DL
From: <rei...@ba...> - 2007-03-02 08:20:57
@Derrick
Thank you for your reply. I tried it, additionally adding

    cm.setCookieProcessingEnabled(true);
    cm.setRedirectionProcessingEnabled(true);

as you suggested.

I still get an error, but a slightly different one. The error messages are:

    org.htmlparser.util.ParserException: Unexpected end of file from server;
    java.net.SocketException: Unexpected end of file from server
          at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

The adapted code (copied from my previous post, described in words):

First I set up two variables:

    ps is of class Parser
    cm is of class ConnectionManager

Then I assign to cm the return value of the method getConnectionManager from the Parser class.
Then I set cookie processing and redirection processing enabled to true (thanks to Derrick!).
In the next steps I assign the proxy values using the methods setProxyHost, setProxyPort, setProxyUser and setProxyPassword.
The next step is to create a new Parser instance with a URL as a parameter and assign it to the ps variable.

In this step I get the error (please see above).

Derrick's hints replaced the last line of the error messages,

    at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)

with

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

The two new lines seem to get me a little further, but not all the way.

Does anyone have another idea what I could try? Any help or hint would be appreciated!

In the meantime I'm playing around with the request properties: seeing which properties are set and maybe setting new ones.

Kind regards,
Reinhart
From: Derrick O. <der...@ro...> - 2007-03-02 01:37:51
|
Reinhart,

It seems that the problem is not with the proxy, in which case I think you would get an HTTP error such as 407 (Proxy Authentication Required), but perhaps with other request properties or maybe with redirect following. Look at the documentation on cm.setDefaultRequestProperties(). Try enabling cookie processing and redirection processing too:

    cm.setCookieProcessingEnabled (true);
    cm.setRedirectionProcessingEnabled (true);

Derrick

----- Original Message ----
From: "rei...@ba..." <reinhart.gruessl...@ba...>
To: htm...@li...
Sent: Thursday, March 1, 2007 4:48:43 AM
Subject: [Htmlparser-user] Once again: Help for using a proxy needed

Oops - I just saw that my code example didn't come through correctly; parts were filtered out.

Sorry for that!

I'll try to describe more clearly what I have done so far:

First I set up 2 variables:

ps of class Parser
cm of class ConnectionManager

Then I assign to cm the return value of the method getConnectionManager() of the Parser class.
In the next steps I assign the proxy values using the methods setProxyHost, setProxyPort, setProxyUser and setProxyPassword.
The next step is to create a new Parser instance with a URL as a parameter and assign it to the ps variable.

In this step I get the error:

org.htmlparser.util.ParserException: Unexpected end of file from server;
java.net.SocketException: Unexpected end of file from server
at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)

Can anyone give me a hint where the error is?
Where's my mistake, what is missing from the code?
Does anyone know if there's documentation or a "living" example of using a proxy with HTMLParser?
I was not able to find an example...

Any help/hint would be appreciated!

And sorry again for my first post with the garbled piece of code!!!

Kind regards,
Reinhart

***************************************************************************
This message and any attachments are confidential and are only intended for the recipient(s) to which they have been addressed. If you have received this message in error, please notify the sender immediately and delete the message from your system. The contents of this misdirected mail may not be saved, recorded or used for any purpose whatsoever or made available to unauthorised persons. This message has been prepared and sent with the greatest possible care, including scanning for viruses. In spite of this, we assume no liability whatsoever for the existence of any viruses.
***************************************************************************
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Htmlparser-user mailing list
H...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-user
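Derrick's two suggestions (cookie processing and redirect following) correspond to standard java.net settings. The sketch below is a JDK-only analogue, not HTMLParser's ConnectionManager: it installs a cookie store, routes a connection through an HTTP proxy, and switches on redirect following. The class and method names (ProxiedFetch, enableCookies, openViaProxy) are hypothetical, and the proxy host in the usage is a placeholder.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

// JDK-only analogue of "cookie processing" plus "redirection processing" for
// requests sent through an HTTP proxy. Hypothetical helper, not HTMLParser API.
class ProxiedFetch {

    // Installs a JVM-wide cookie store so Set-Cookie headers are replayed.
    // The default CookieManager policy is ACCEPT_ORIGINAL_SERVER.
    static CookieManager enableCookies() {
        CookieManager manager = new CookieManager();
        CookieHandler.setDefault(manager);
        return manager;
    }

    // Creates (but does not connect) an HTTP connection routed via the proxy.
    static HttpURLConnection openViaProxy(String url, String proxyHost, int proxyPort) {
        // createUnresolved defers the proxy's DNS lookup until connect time.
        Proxy proxy = new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(proxyHost, proxyPort));
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection(proxy);
            // true is already the JDK default; set explicitly for clarity.
            conn.setInstanceFollowRedirects(true);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `ProxiedFetch.openViaProxy("http://example.com/", "proxy.example.invalid", 8080)` returns a connection that only talks to the network once `getInputStream()` or `connect()` is called. Note that HttpURLConnection does not follow redirects that switch protocol (http to https), so a page that redirects that way still needs manual handling.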
From: <rei...@ba...> - 2007-03-01 09:48:54
Oops - I just saw that my code example didn't come through correctly; parts were filtered out.

Sorry for that!

I'll try to describe more clearly what I have done so far:

First I set up 2 variables:

ps of class Parser
cm of class ConnectionManager

Then I assign to cm the return value of the method getConnectionManager() of the Parser class.
In the next steps I assign the proxy values using the methods setProxyHost, setProxyPort, setProxyUser and setProxyPassword.
The next step is to create a new Parser instance with a URL as a parameter and assign it to the ps variable.

In this step I get the error:

org.htmlparser.util.ParserException: Unexpected end of file from server;
java.net.SocketException: Unexpected end of file from server
at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)

Can anyone give me a hint where the error is?
Where's my mistake, what is missing from the code?
Does anyone know if there's documentation or a "living" example of using a proxy with HTMLParser?
I was not able to find an example...

Any help/hint would be appreciated!

And sorry again for my first post with the garbled piece of code!!!

Kind regards,
Reinhart
From: <rei...@ba...> - 2007-02-28 16:48:01
Hello from an HTMLParser newbie!

My question is about using a proxy to connect to a given URL.

I tried the following, which leads to the error message shown after the piece of code:

---- code snippet --------
import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.OrFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.http.ConnectionManager;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

void getDataFromClient() throws ParserException
{
    org.htmlparser.Parser ps;
    ConnectionManager cm;

    // get the connection manager
    cm = Parser.getConnectionManager();

    // set the proxy data I found in the Javadocs of the library
    cm.setProxyHost("[the proxy of my company]");
    cm.setProxyPort([the proxy-port]);
    cm.setProxyUser("[my user-id]");
    cm.setProxyPassword("[my password]");

    ps = new org.htmlparser.Parser ("[url to get the html-code from]");

    OrFilter orf = new OrFilter();
    NodeFilter[] nfls = new NodeFilter[1];
    nfls[0] = new TagNameFilter("html");
    orf.setPredicates(nfls);

    NodeList nList = ps.parse(orf);
    Node node = nList.elementAt (0);
    this.parseTree(node);
}
---- end of code snippet --------

The error is the following:

org.htmlparser.util.ParserException: Unexpected end of file from server;
java.net.SocketException: Unexpected end of file from server
at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)

and occurs right after:

ps = new org.htmlparser.Parser ("[url to get the html-code from]");

Can anyone give me a hint where the error is?
Where's my mistake, what is missing from the code?
Does anyone know if there's documentation or a "living" example of using a proxy with HTMLParser?
I was not able to find an example...

Any help/hint would be appreciated!

Kind regards,
Reinhart
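The snippet hands a user-id and password to the proxy. Under the standard Basic scheme these travel as a base64-encoded "Proxy-Authorization" header; the JDK-only sketch below shows how that value is formed, which can help when checking by hand whether the proxy accepts the credentials at all. The class name ProxyAuth is hypothetical, encoding the credentials as UTF-8 is an assumption, and whether this particular corporate proxy uses Basic (rather than, say, NTLM) is not known from the thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Builds the value of an HTTP "Proxy-Authorization" header for the Basic
// scheme: "Basic " + base64(user ":" password). Credentials are only encoded,
// not encrypted. Hypothetical helper for illustration.
class ProxyAuth {

    static String basicHeader(String user, String password) {
        // Basic auth separates user and password with the first ':'.
        if (user.indexOf(':') >= 0) {
            throw new IllegalArgumentException("user-id must not contain ':'");
        }
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Placeholder credentials.
        System.out.println(basicHeader("user", "pass")); // Basic dXNlcjpwYXNz
    }
}
```

On a plain-HTTP HttpURLConnection opened through the proxy, the value could be set with `conn.setRequestProperty("Proxy-Authorization", ProxyAuth.basicHeader(user, password))`; for HTTPS tunnels the JDK sends the CONNECT request itself, and java.net.Authenticator is the usual route there.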
From: Ian M. <ian...@gm...> - 2007-02-28 16:41:29
You could have a helper thread that reads files from disk and hands them to a second thread that does the HTML processing. This doesn't really seem like an HTML Parser issue, unless there is a bug in HTML Parser that makes it slow at reading files from disk. You could check this by reading the file into a String first and then creating a parser with that String using Parser.setInputHTML() and then Parser.parse(null). If there's a noticeable difference in speed doing it this way, it might be worth looking into the code of the HTML Parser constructor you are using to see if there are any inefficiencies in it.

Ian

On 2/26/07, sajid khan <ass...@gm...> wrote:
> Hi,
> I am using HTMLParser for extracting the content of HTML pages. I
> have noticed that the bulk of the time is spent extracting the information
> rather than processing the data.
> The code looks like this:
>
> // inputStream is of type InputStream. It carries the page source of an HTML page.
> Page page = new Page(inputStream, null);
> Lexer lexer = new Lexer(page);
> Parser parser = new Parser(lexer);
> StringBean sb = new StringBean();
> parser.visitAllNodesWith (sb);
> String text = sb.getStrings();
> // Doing something with text.
>
> I should mention that I crawled a few pages with a crawler,
> so the HTML pages are on my hard disk.
>
> Can anybody please help me improve the speed of my program?
>
> regards
> Sajid Khan.
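Ian's helper-thread idea is a classic producer/consumer pipeline. The JDK-only sketch below has a reader thread load whole files into Strings and pass them through a bounded queue to the calling thread, which runs a processing function on each. ReadAheadPipeline is a hypothetical name, the UTF-8 decoding is an assumption about how the crawled pages were stored, and the HTMLParser step itself (feeding the String to the parser and running the StringBean) is left to the caller's function.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

// Two-thread pipeline: a reader thread loads whole files into Strings and hands
// them over a bounded queue; the calling thread runs `process` on each one.
class ReadAheadPipeline {

    // One queue entry: file contents, a read failure, or the end-of-input marker.
    private static final class Item {
        final String html;
        final IOException error;

        Item(String html, IOException error) {
            this.html = html;
            this.error = error;
        }
    }

    private static final Item END = new Item(null, null);

    static <R> List<R> run(List<Path> files, Function<String, R> process)
            throws IOException, InterruptedException {
        BlockingQueue<Item> queue = new ArrayBlockingQueue<>(8); // at most 8 files read ahead
        Thread reader = new Thread(() -> {
            try {
                for (Path file : files) {
                    try {
                        // Assumes UTF-8; malformed bytes become U+FFFD instead of failing.
                        byte[] bytes = Files.readAllBytes(file);
                        queue.put(new Item(new String(bytes, StandardCharsets.UTF_8), null));
                    } catch (IOException e) {
                        queue.put(new Item(null, e));
                        return; // stop reading; the consumer rethrows the error
                    }
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // consumer stopped early
            }
        }, "html-reader");
        reader.setDaemon(true);
        reader.start();

        List<R> results = new ArrayList<>();
        try {
            while (true) {
                Item item = queue.take();
                if (item == END) {
                    return results;
                }
                if (item.error != null) {
                    throw item.error;
                }
                results.add(process.apply(item.html));
            }
        } finally {
            reader.interrupt(); // unblocks the reader if we leave early
        }
    }
}
```

The bounded queue keeps memory in check: the reader can run at most eight files ahead of the parser, so disk reads overlap with parsing without loading the whole crawl at once.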
From: Ian M. <ian...@gm...> - 2007-02-28 16:35:30
Have a look at the NodeTreeWalker class. It includes a depth-first iteration option (which is probably how you'd want to do it), so you can start at the root node, repeatedly call next and print out the name of the current element. Since it works on a single Node, and parsing often yields several top-level nodes (common if there is a doctype and some whitespace text), you get a NodeList; simply iterate through the Nodes of that root NodeList and do the above for each one.

Ian

On 16 Feb 2007 02:20:16 -0000, Dipesh Sharma <dip...@re...> wrote:
>
> Hi there,
>
> I'm trying to build an HTML tree model using your parser. I have come up
> with this code so far:
>
> import org.htmlparser.Parser;
> import org.htmlparser.util.NodeList;
> import org.htmlparser.util.ParserException;
> import org.htmlparser.parserapplications.filterbuilder.HtmlTreeModel;
>
> class Test
> {
>     public static void main (String[] args)
>     {
>         try
>         {
>             Parser parser = new Parser ("http://www.deals2buy.com");
>             NodeList list = parser.parse (null);
>             HtmlTreeModel Tree = new HtmlTreeModel(list);
>             System.out.println (Tree.toString());
>         }
>         catch (ParserException pe)
>         {
>             pe.printStackTrace ();
>         }
>     }
> }
>
> However, the toString method simply prints the object's name. Could you please
> help me with a solution that prints out the whole HTML tree?
>
> Thanks,
> Chikki
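Ian's suggestion boils down to a pre-order, depth-first walk that prints each element's name. As a JDK-only illustration (not NodeTreeWalker itself), the sketch below walks a DOM built by the JDK's XML parser and prints element names indented by depth. The XML parser only accepts well-formed markup, so it is a stand-in for real-world HTML; with HTMLParser the same recursion would run over the parser's own nodes, or NodeTreeWalker would do the iteration, as Ian describes. DepthFirstOutline is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Builds an indented outline of element names in depth-first (document) order.
// Uses the JDK's XML DOM, so the input must be well-formed XHTML.
class DepthFirstOutline {

    static List<String> outline(String xhtml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        List<String> lines = new ArrayList<>();
        walk(doc.getDocumentElement(), 0, lines);
        return lines;
    }

    // Visit a node first, then each of its children in order (pre-order).
    private static void walk(Node node, int depth, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                line.append("  ");
            }
            out.add(line.append(node.getNodeName()).toString());
        }
        // Text and comment nodes are descended into but never printed.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        for (String line : outline("<html><head><title>t</title></head>"
                + "<body><p>x</p><p>y</p></body></html>")) {
            System.out.println(line);
        }
    }
}
```

Pre-order is what makes the output read like the page source: every parent is printed before any of its children.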
From: sajid k. <ass...@gm...> - 2007-02-26 11:41:13
Hi,

I am using HTMLParser for extracting the content of HTML pages. I have noticed that the bulk of the time is spent extracting the information rather than processing the data. The code looks like this:

import org.htmlparser.Parser;
import org.htmlparser.beans.StringBean;
import org.htmlparser.lexer.Lexer;
import org.htmlparser.lexer.Page;

// inputStream is of type InputStream. It carries the page source of an HTML page.
Page page = new Page(inputStream, null);
Lexer lexer = new Lexer(page);
Parser parser = new Parser(lexer);
StringBean sb = new StringBean();
parser.visitAllNodesWith (sb);
String text = sb.getStrings();
// Doing something with text.

I should mention that I crawled a few pages with a crawler, so the HTML pages are on my hard disk.

Can anybody please help me improve the speed of my program?

regards
Sajid Khan.
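Before optimising, it helps to confirm which phase actually dominates, which is the point of Ian's read-into-a-String-first check. The JDK-only sketch below times the disk read and the extraction separately for one file. PhaseTimer is a hypothetical name, UTF-8 on disk is an assumption, and the extraction function is supplied by the caller; with HTMLParser it would wrap the Parser/StringBean code above, which is not reproduced here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;

// Times the disk read and the text extraction separately for one file,
// to see which phase dominates before trying to optimise either.
class PhaseTimer {

    static final class Timing {
        final String text;
        final long readNanos;
        final long extractNanos;

        Timing(String text, long readNanos, long extractNanos) {
            this.text = text;
            this.readNanos = readNanos;
            this.extractNanos = extractNanos;
        }
    }

    static Timing time(Path file, UnaryOperator<String> extract) throws IOException {
        long start = System.nanoTime();
        // Assumes UTF-8 on disk; adjust if the crawler stored pages differently.
        String html = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        long afterRead = System.nanoTime();
        String text = extract.apply(html);
        long afterExtract = System.nanoTime();
        return new Timing(text, afterRead - start, afterExtract - afterRead);
    }
}
```

For a quick run without HTMLParser on the classpath, a naive tag stripper such as `s -> s.replaceAll("<[^>]*>", "")` can stand in for the StringBean; it is only a placeholder, not real text extraction. Single measurements are noisy (JIT warm-up, OS file cache), so it is better to time many files and compare the totals.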
From: prakash M <pri...@gm...> - 2007-02-23 05:01:34
Hi! It's me Prakash! I would like to invite you to join my network on WebMate.com. Once you join, you will immediately be connected to all the people in my social network. WebMate.com is an online service that lets you find old friends and meet new people. You may also show yourself here by updating your profile and creating a blog. WebMate.com will supply more online services in future. It's also fun, safe and free!! That's worth something. Join my network on WebMate.com now!!! Best regards, prakash pri...@gm...
From: Dipesh S. <dip...@re...> - 2007-02-16 02:23:33
Hi there,

I'm trying to build an HTML tree model using your parser. I have come up with this code so far:

import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;
import org.htmlparser.parserapplications.filterbuilder.HtmlTreeModel;

class Test
{
    public static void main (String[] args)
    {
        try
        {
            Parser parser = new Parser ("http://www.deals2buy.com");
            NodeList list = parser.parse (null);
            HtmlTreeModel Tree = new HtmlTreeModel(list);
            System.out.println (Tree.toString());
        }
        catch (ParserException pe)
        {
            pe.printStackTrace ();
        }
    }
}

However, the toString method simply prints the object's name. Could you please help me with a solution that prints out the whole HTML tree?

Thanks,
Chikki
From: Derrick O. <der...@ro...> - 2007-02-15 02:14:52
Hi Subbu,

This message is printed if a character set specified in a META tag can't be resolved to a real character set name.
See the doSemanticAction() of MetaTag:

    charset = getPage ().getCharset (getAttribute ("CONTENT"));

where the call to Page.getCharset() calls Page.findCharset(), which is where the message is generated.

Derrick

----- Original Message ----
From: Subramanya Sastry <sa...@cs...>
To: htm...@li...
Sent: Wednesday, February 14, 2007 3:17:51 AM
Subject: [Htmlparser-user] Warning message about charset

Hi there,

I have been using HTML Parser for a while now, and I'm happy with it. But I have now run into a minor glitch and am hoping someone here might be able to help me with it.

On some HTML pages, I get the following warning / error message:
    "unable to determine cannonical charset name for x-user-defined - using ISO-8859-1"
The rest of the parsing proceeds without a hitch.

I am not able to figure out where the error message is coming from; I know it is being printed during parsing. Can anyone tell me how I could catch this error and raise an exception? This error message corresponds to other bad things happening further down the pipeline in my application.

Thanks for any leads.

Subbu.
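Derrick's pointer shows the warning comes from a charset-name lookup that falls back to a default. The JDK-only sketch below performs the same kind of lookup with an added strict mode that throws instead of falling back, which is the behaviour Subbu asks for. CharsetResolver and UnknownCharsetException are hypothetical names and do not hook into HTMLParser; applying this to a parse would mean checking the META charset yourself or changing the code path Derrick names (MetaTag.doSemanticAction() / Page.findCharset()), and either route is an assumption not worked out in the thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

// Resolves a charset name (e.g. from a META tag) to its canonical Java name.
// Lenient mode falls back to ISO-8859-1, like the warning in the thread;
// strict mode throws so the caller can reject the page instead.
class CharsetResolver {

    static final class UnknownCharsetException extends RuntimeException {
        UnknownCharsetException(String name) {
            super("unable to resolve charset name: " + name);
        }
    }

    static String resolve(String name, boolean strict) {
        try {
            if (name != null && Charset.isSupported(name)) {
                return Charset.forName(name).name(); // canonical, e.g. "utf8" -> "UTF-8"
            }
        } catch (IllegalCharsetNameException e) {
            // Syntactically invalid names are treated the same as unknown ones.
        }
        if (strict) {
            throw new UnknownCharsetException(name);
        }
        return StandardCharsets.ISO_8859_1.name();
    }
}
```

A name like x-user-defined would take the fallback branch (or throw in strict mode) whenever the running JDK has no charset registered under that name.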
From: Subramanya S. <sa...@cs...> - 2007-02-14 08:18:05
Hi there,

I have been using HTML Parser for a while now, and I'm happy with it. But I have now run into a minor glitch and am hoping someone here might be able to help me with it.

On some HTML pages, I get the following warning / error message:
    "unable to determine cannonical charset name for x-user-defined - using ISO-8859-1"
The rest of the parsing proceeds without a hitch.

I am not able to figure out where the error message is coming from; I know it is being printed during parsing. Can anyone tell me how I could catch this error and raise an exception? This error message corresponds to other bad things happening further down the pipeline in my application.

Thanks for any leads.

Subbu.
From: Derrick O. <der...@ro...> - 2007-02-13 04:35:22
I don't think I have time to do a formal release.
Better go with the integration build ... if it suits.

Derrick

----- Original Message ----
From: sebb <se...@gm...>
To: htmlparser user list <htmlparser-u...@li...>
Sent: Sunday, February 11, 2007 4:15:04 PM
Subject: [Htmlparser-user] Formal release of 2.0 ?

I'm hoping to release an updated version of JMeter before too long, and would like to include the updated htmlparser 2.0 (with the new license, for which many thanks).

The current build of 2.0 is listed as an "Integration" build - is this OK to use, or is there going to be a formal release of 2.0?

S///