htmlparser-user Mailing List for HTML Parser (Page 81)
Brought to you by:
derrickoswald
From: Somik R. <so...@ya...> - 2003-03-16 21:36:46
Hi Folks,

This is a major milestone release. A massive refactoring has been completed (it took two weeks), which has brought all the robust error-handling cases into CompositeTagScanner. This means that all tags that have children will be able to do error correction uniformly. The form tag (and the table tags too) should be robust.

Table tags are not yet in the standard set of scanners (you still need to add them manually). They should make the cut next week.

We have a new method in Parser, registerDomScanners(), that allows you to build HTML DOM objects.

Interesting fact: as a result of the refactorings, the LOC of the scanners package has dropped from 1553 to 1355 (I was surprised at the digits).

Documentation has been updated: we've started putting up answers by our list members to common questions. Please feel free to update the Wiki and improve it. No login is required.

From the change log:

Integration build 1.3 - 20030316
--------------------------------
[1] Added method finishedParsing() to NodeVisitor
[2] LinkScanner uses CompositeTagScanner.scan()
[3] BulletScanner added
[4] FormScanner uses CompositeTagScanner.scan()
[5] AppletScanner uses CompositeTagScanner.scan()

We highly recommend an upgrade to this version.

Regards,
Somik
From: Derrick O. <Der...@ro...> - 2003-03-15 20:59:22
Guilherme,

I think what you need is in src/org/htmlparser/util/Translate.java. Something like this should work:

    String htmltext = Translate.encode (resultset.getString ("databasetext"));

If you have to do a lot of it, though, you'll probably want to rewrite that method. As it stands, it allocates one Character for each character in the input string. If you do want to rewrite it, you should probably adjust the Generate class in the same package instead, since the Translate.java source is created by running Generate.

Derrick

>To: htm...@li...
>Date: Fri, 14 Mar 2003 20:40:12 +0000 (WET)
>From: Guilherme Zambon <gz...@sa...>
>Subject: [Htmlparser-user] html code parsing
>
>Anyone using htmlparser to parse ", <, > from user input to
>&quot;, &lt; and &gt; ?
>I have the following scenario:
>my database has texts with these chars (", < and >) and I have to
>put them from database to a <textarea> in the html. Is there any
>taglib or other solution to filter this database information,
>to show in a html form field?
>
>Thanks in advance,
>
>Guilherme Zambon
>
>Example of code that I need to treat:
>
><textarea><%= rs.getString("databasetext") %></textarea>
>
>it generates something like
><textarea>a text with < won't work in a html</textarea>
>
>and I want something like
><textarea><sometag:encode string="<%=
>rs.getString("databasetext")" /></textarea>
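
For readers who do want a replacement that avoids the per-character allocation Derrick mentions, here is a minimal sketch. It is not the library's Translate implementation: the class name HtmlEscapeSketch and the method escape are made up for illustration, and it only handles the four characters from the question (&, <, > and "), not the full entity table that Translate covers.

```java
final class HtmlEscapeSketch {
    /**
     * Escapes the four characters that break HTML text and attribute values.
     * Appends straight into one buffer, so no object is created per character.
     */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for rs.getString("databasetext") from the question.
        System.out.println("<textarea>" + escape("a text with < and \"quotes\"") + "</textarea>");
    }
}
```

The ampersand has to be escaped along with the other three; otherwise text that already contains something like &lt; would be displayed as < after the browser decodes it.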
From: Guilherme Z. <gz...@sa...> - 2003-03-14 20:40:22
Anyone using htmlparser to parse ", <, > from user input to &quot;, &lt; and &gt; ?

I have the following scenario: my database has texts with these chars (", < and >) and I have to put them from the database into a <textarea> in the html. Is there any taglib or other solution to filter this database information, to show it in a html form field?

Thanks in advance,

Guilherme Zambon

Example of code that I need to treat:

    <textarea><%= rs.getString("databasetext") %></textarea>

it generates something like

    <textarea>a text with < won't work in a html</textarea>

and I want something like

    <textarea><sometag:encode string="<%= rs.getString("databasetext")" /></textarea>
From: Somik R. <so...@ya...> - 2003-03-14 06:36:50
> 1) does this mean that I will do the same way for TableRowScanner and
> TableColumnScanner or will I extend those from TableScanner.

No - actually TableScanner takes care of the related scanners (column and row). So if you register TableScanner alone, you should be fine.

> 2) and should this work in cases like <td ...><img src=..></td>

Yes, of course. Just make sure that you also call registerScanners() if you want to pick up image tags within the td.

Regards,
Somik

----- Original Message -----
From: "ja...@jo... Jokisalo" <jan...@ho...>
To: <htm...@li...>
Sent: Wednesday, March 12, 2003 9:52 PM
Subject: [Htmlparser-user] Re: Parsing td tr and table

> Thank you Somik!
>
> 1) does this mean that I will do the same way for TableRowScanner and
> TableColumnScanner or will I extend those from TableScanner.
>
> 2) and should this work in cases like <td ...><img src=..></td>
>
> Thanks for good product! --Janne
>
> ---------
> parser.registerScanners();
> parser.addScanner(new TableScanner(parser));
> Node[] tables =
> parser.extractAllNodesThatAre(TableTag.class);
> // you can cast each table to a TableTag and do
> // what you want..
>
> Regards,
> Somik
From: Derrick O. <Der...@ro...> - 2003-03-13 12:30:07
I gave this problem a cursory look, and it appears that the input stream opened with that charset doesn't return any lines. I didn't have time to construct a simple test case, and I'm not sure a US-English system is the best platform to test this on.

Derrick

>Message: 3
>From: "Somik Raha" <so...@ya...>
>To: <htm...@li...>
>Subject: Re: [Htmlparser-user] problem parsing Chinese character website
>Date: Tue, 11 Mar 2003 22:37:51 -0800
>
>Derrick, Amit - any ideas ?
>
>----- Original Message -----
>From: "Joe Lin" <gu...@ya...>
>To: <htm...@li...>
>Sent: Saturday, March 08, 2003 1:32 AM
>Subject: [Htmlparser-user] problem parsing Chinese character website
>
>>Hi,
>>
>>It seems that the parser has problem handling Chinese
>>characters. I experiment with a simple web page as
>>follows (I saved it as "test.html"):
>>
>><HTML>
>><HEAD>
>><TITLE>Hello</TITLE>
>><META http-equiv=Content-Type content="text/html;
>>charset=gb2312">
>></HEAD>
>><BODY bgColor=#ffffff>
>><h1>Hello</h1><br>
>></body>
>></html>
>>
>>I then run the parser as
>>java -jar htmlparser.jar file:test.html.
>>The parser output nothing but:
>>HTMLParser v1.3 (Integration Build Mar 02, 2003)
>>Parsing file:test.html
>>INFO: detected charset "gb2312", using "EUC-CN"
>>
>>Thanks for any help.
>>
>>Joe
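
The "simple test case" Derrick did not have time to build could look roughly like the sketch below. It uses only the standard java.io and java.nio.charset classes, not the parser itself, and the class name CharsetReadCheck is made up. It checks whether the JVM knows the charset names from Joe's log and whether a reader opened with them returns any lines. It does not reproduce or explain the parser's behaviour; it only isolates the reader.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class CharsetReadCheck {
    /** Decodes the bytes with the named charset and returns the lines read. */
    static List<String> readLines(byte[] data, String charsetName) {
        // Throws an unchecked exception if the JVM does not know the charset.
        Charset charset = Charset.forName(charsetName);
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // The test page in the report is pure ASCII, so ASCII bytes are enough here.
        byte[] page = "<h1>Hello</h1>\n</html>\n".getBytes(StandardCharsets.US_ASCII);
        for (String name : new String[] {"gb2312", "EUC-CN"}) {
            if (Charset.isSupported(name)) {
                System.out.println(name + ": " + readLines(page, name).size() + " line(s)");
            } else {
                System.out.println(name + ": not supported by this JVM");
            }
        }
    }
}
```

If a charset is reported as unsupported, that points at the JVM installation rather than at the parser.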
From: <ja...@jo...> - 2003-03-13 05:52:49
Thank you Somik!

1) does this mean that I will do the same way for TableRowScanner and TableColumnScanner or will I extend those from TableScanner.

2) and should this work in cases like <td ...><img src=..></td>

Thanks for good product! --Janne

---------
    parser.registerScanners();
    parser.addScanner(new TableScanner(parser));
    Node[] tables = parser.extractAllNodesThatAre(TableTag.class);
    // you can cast each table to a TableTag and do
    // what you want..

Regards,
Somik
From: Somik R. <so...@ya...> - 2003-03-13 05:21:44
> I'm just wondering if the limitation on the formscanner (i.e can't parse a
> form without the endtag) has been successfully removed. Here is my suggestion
> on how to implement it. If there is no endtag of formtag, the parser should
> know when it ends when it sees another formtag. if there is no another
> formtag in the html page, just parse it till it sees the end of the html
> code. thanks. I hope you guys can improve this as I really need this feature
> in my Harvester project. thank you.

Working hard on this one... I can't believe how much bad code I have written myself - one's bad code always comes back to haunt one! I am refactoring LinkScanner to use the CompositeTagScanner, which will let all composite tag scanners handle broken tags uniformly.

Regards,
Somik
From: Somik R. <so...@ya...> - 2003-03-12 22:58:29
    parser.registerScanners();
    parser.addScanner(new TableScanner(parser));
    Node[] tables = parser.extractAllNodesThatAre(TableTag.class);
    // you can cast each table to a TableTag and do
    // what you want..

Regards,
Somik

--- "ja...@jo... Jokisalo" <jan...@ho...> wrote:
> Hi!
>
> Is there any example of how to parse e.g. text
> inside td:s in a table and
> img inside a table td. There are a lot of webpages
> with tables with this
> kind of information.
>
> Maybe one can do it with TableColumn,
> TableColumnScanner, TableRow,
> TableRowScanner, TableScanner and TableTag but I
> have not figured out how.
>
> Thanks / Janne
From: <ja...@jo...> - 2003-03-12 20:42:29
Hello!

Is there any example of how to parse e.g. text inside td:s in a table and img inside a table td. There are a lot of webpages with tables with this kind of information.

Maybe one can do it with TableColumn, TableColumnScanner, TableRow, TableRowScanner, TableScanner and TableTag but I have not figured out how.

Thanks / Janne
From: Mohd-Taqiyuddin Z. <mt...@ec...> - 2003-03-12 20:41:23
Hi,

I'm just wondering if the limitation on the FormScanner (i.e. it can't parse a form without the end tag) has been successfully removed.

Here is my suggestion on how to implement it. If a form tag has no end tag, the parser should treat the form as ended when it sees another form tag. If there is no other form tag in the HTML page, it should just parse until it reaches the end of the HTML code.

Thanks. I hope you guys can improve this, as I really need this feature in my Harvester project. Thank you.
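
The rule suggested above can be sketched on a flat list of tag strings. This is only an illustration of the rule, not how FormScanner or CompositeTagScanner implement it; the class FormBoundarySketch and the token representation are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class FormBoundarySketch {
    /**
     * Returns one {start, endExclusive} index pair per form in the token list.
     * A form ends at its own end tag, else at the next form start tag,
     * else at the end of the input.
     */
    static List<int[]> formRanges(List<String> tokens) {
        List<int[]> ranges = new ArrayList<>();
        int open = -1; // index of the currently open form tag, or -1 if none
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i).toLowerCase();
            if (t.equals("<form>") || t.startsWith("<form ")) {
                if (open >= 0) {
                    ranges.add(new int[] {open, i}); // missing end tag: close at the next form
                }
                open = i;
            } else if (t.startsWith("</form") && open >= 0) {
                ranges.add(new int[] {open, i + 1}); // normal case: explicit end tag
                open = -1;
            }
        }
        if (open >= 0) {
            ranges.add(new int[] {open, tokens.size()}); // missing end tag: close at end of input
        }
        return ranges;
    }
}
```

For the tokens `<form>`, `<input>`, `<form>`, `<input>`, `</form>`, `<form>`, `<p>`, this yields three ranges: the first closed by the second form tag, the second by its end tag, and the third by the end of the input.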
From: <ja...@jo...> - 2003-03-12 20:39:49
Hi!

Is there any example of how to parse e.g. text inside td:s in a table and img inside a table td. There are a lot of webpages with tables with this kind of information.

Maybe one can do it with TableColumn, TableColumnScanner, TableRow, TableRowScanner, TableScanner and TableTag but I have not figured out how.

Thanks / Janne
From: Somik R. <so...@ya...> - 2003-03-12 20:33:31
Hi Marc,

What you say makes sense. A node should know on which line it began and on which line it ended. The reason we don't do this already is that we only used the line information to pick up the next node, which is on the same or the next line. Like you said, doing this is not hard, as the reader stores the line info. It should be in one of the integration releases (do add this as a feature request so we don't forget).

Regards,
Somik

--- Marc Novakowski <ma...@ke...> wrote:
> Hello,
>
> I just thought I'd start out by thanking everyone
> who has worked on the htmlparser project. I've been
> using it for only a few days now but the
> functionality it provides has saved me amazing
> amounts of work. So far I have found it very easy
> to integrate into my project.
>
> I am using the htmlparser library for what I'm
> guessing is a less traditional application. I've
> integrated it into a custom servlet filter which
> takes a processed JSP page and parses it for
> "custom" tags which I've defined. Using custom
> scanner and tag, I'm able to replace my "custom
> tags" with appropriate HTML/Javascript in the
> toHtml() method for each tag. However, I'd like to
> add some validation to my code to ensure certain
> constraints are observed, such as certain tags which
> REQUIRE a "name" attribute to be defined. I've done
> this easily enough by adding a "verify()" method to
> my custom tags and throwing a ParserException if a
> constraint is violated.
>
> However, just throwing an exception does not help
> the webpage developer determine where the problem is
> in the HTML. What would REALLY help me is if the
> Node object had a method on it called something like
> getLineNumber() which returned the line number at
> which that node was parsed.
>
> I've looked at the source code and this seems
> feasible. The NodeReader class keeps track of the
> current line number as it finds nodes in the HTML.
> Maybe the constructor for a Node() object could take
> in one more argument, the lineNumber, so that it
> could expose that lineNumber in a public method.
>
> Does this sound like a harebrained idea? Has this
> ever come up before?
>
> Thanks again,
> Marc Novakowski
From: Marc N. <ma...@ke...> - 2003-03-12 19:15:28
Hello,

I just thought I'd start out by thanking everyone who has worked on the htmlparser project. I've been using it for only a few days now, but the functionality it provides has saved me amazing amounts of work. So far I have found it very easy to integrate into my project.

I am using the htmlparser library for what I'm guessing is a less traditional application. I've integrated it into a custom servlet filter which takes a processed JSP page and parses it for "custom" tags which I've defined. Using a custom scanner and tag, I'm able to replace my "custom tags" with appropriate HTML/Javascript in the toHtml() method for each tag. However, I'd like to add some validation to my code to ensure certain constraints are observed, such as certain tags which REQUIRE a "name" attribute to be defined. I've done this easily enough by adding a "verify()" method to my custom tags and throwing a ParserException if a constraint is violated.

However, just throwing an exception does not help the webpage developer determine where the problem is in the HTML. What would REALLY help me is if the Node object had a method on it called something like getLineNumber() which returned the line number at which that node was parsed.

I've looked at the source code and this seems feasible. The NodeReader class keeps track of the current line number as it finds nodes in the HTML. Maybe the constructor for a Node() object could take in one more argument, the lineNumber, so that it could expose that lineNumber in a public method.

Does this sound like a harebrained idea? Has this ever come up before?

Thanks again,
Marc Novakowski
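
The idea of handing the reader's current line number to each node at construction time can be sketched with the standard java.io.LineNumberReader. Nothing below is htmlparser code: TagAtLine and findTags are hypothetical names, and the tag detection is deliberately naive (tags that span several lines are ignored).

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class LineTrackingSketch {
    /** Hypothetical node: the tag text plus the 1-based line it was found on. */
    static final class TagAtLine {
        final String tag;
        final int line;

        TagAtLine(String tag, int line) {
            this.tag = tag;
            this.line = line;
        }
    }

    /** Collects every single-line tag together with its line number. */
    static List<TagAtLine> findTags(String html) {
        List<TagAtLine> found = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(new StringReader(html))) {
            String text;
            while ((text = reader.readLine()) != null) {
                // After readLine() returns, getLineNumber() is the 1-based number of that line.
                int line = reader.getLineNumber();
                int from = 0;
                int open;
                while ((open = text.indexOf('<', from)) >= 0) {
                    int close = text.indexOf('>', open);
                    if (close < 0) {
                        break; // tag continues on a later line; not handled in this sketch
                    }
                    found.add(new TagAtLine(text.substring(open, close + 1), line));
                    from = close + 1;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }
}
```

A verify() method like the one Marc describes could then put the stored line into its exception message, which gives the page author a place to look.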
From: Bob L. <bob...@ya...> - 2003-03-12 16:41:01
In order to send cookies in your HTTP requests, all you need to do is set the Cookie HTTP header in the URL connection. Generally what I've done is first create a HttpURLConnection, create the Cookie objects that are needed, and set the HTTP header using those objects (see below for code to format the header value). Then I create the parser using the URLConnection, something like this:

    DefaultHTMLParserFeedback feedback =
        new DefaultHTMLParserFeedback(DefaultHTMLParserFeedback.DEBUG);
    HTMLReader reader = null;
    HTMLParser parser = null;
    String charset = HttpUtil.getCharacterSet(urlConn);
    InputStreamReader isr = new InputStreamReader(urlConn.getInputStream(), charset);
    reader = new HTMLReader(isr, 8192);
    parser = new HTMLParser(reader, feedback);

The HttpUtil.getCharacterSet method used above is basically just taken from the method of the same name in the HTMLParser class. That method is protected, so I had to duplicate it elsewhere.

    /**
     * Set cookies to send in a HttpURLConnection.
     * This method should only be called before any parameters are posted
     * and before the connection is made.
     * @param urlConn the HttpURLConnection to send the cookies through
     * @param cookies the cookies to send
     */
    public static void postCookies(HttpURLConnection urlConn, Cookie[] cookies) {
        if ((cookies == null) || (cookies.length == 0)) {
            return;
        }
        urlConn.setRequestProperty("cookie", generateCookieHeader(cookies));
    }

    /**
     * Generate a HTTP cookie header value string from an array of cookies.
     * @param cookies the cookies which should be set in the header value
     * @return A string containing the HTTP Cookie header value
     */
    private static String generateCookieHeader(Cookie[] cookies) {
        StringBuffer buf = new StringBuffer();
        for (int i = 0; i < cookies.length; i++) {
            buf.append(cookies[i].getName());
            buf.append("=");
            buf.append(cookies[i].getValue());
            // separate the pairs with "; " but leave nothing after the last one
            if (i + 1 != cookies.length) {
                buf.append("; ");
            }
        }
        return buf.toString();
    }

--- Shan Sivakolundhu <vss...@ya...> wrote:
>
> Hi,
>
> In order to access a particular site I need to have
> a cookie set. Is there any way I can set the cookie
> before I create a parser object ? Just like ...
>
> URLConnection.("Cookie", cookieValue);
>
> URLConnection.connect();
>
> Regards,
>
> Shan
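
The same idea can be tried without a Cookie class on the classpath. The sketch below keeps the cookies in a plain Map of names to values; the class CookieHeaderSketch and its method names are made up for this example, and only standard JDK calls (URLConnection.setRequestProperty, StringJoiner) are used.

```java
import java.net.URLConnection;
import java.util.Map;
import java.util.StringJoiner;

final class CookieHeaderSketch {
    /** Joins name=value pairs with "; " in the map's iteration order. */
    static String cookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> entry : cookies.entrySet()) {
            joiner.add(entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    /**
     * Sets the Cookie request header. Must be called before the connection is
     * opened; setRequestProperty throws IllegalStateException once connected.
     */
    static void applyCookies(URLConnection conn, Map<String, String> cookies) {
        if (cookies == null || cookies.isEmpty()) {
            return;
        }
        conn.setRequestProperty("Cookie", cookieHeader(cookies));
    }
}
```

Using a LinkedHashMap keeps the cookies in insertion order; no escaping of names or values is attempted here.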
From: Shan S. <vss...@ya...> - 2003-03-12 16:13:54
Hi,

In order to access a particular site I need to have a cookie set. Is there any way I can set the cookie before I create a parser object? Just like ...

    URLConnection.("Cookie", cookieValue);
    URLConnection.connect();

Regards,
Shan
From: Somik R. <so...@ya...> - 2003-03-12 06:36:20
Derrick, Amit - any ideas ?

----- Original Message -----
From: "Joe Lin" <gu...@ya...>
To: <htm...@li...>
Sent: Saturday, March 08, 2003 1:32 AM
Subject: [Htmlparser-user] problem parsing Chinese character website

> Hi,
>
> It seems that the parser has problem handling Chinese
> characters. I experiment with a simple web page as
> follows (I saved it as "test.html"):
>
> <HTML>
> <HEAD>
> <TITLE>Hello</TITLE>
> <META http-equiv=Content-Type content="text/html;
> charset=gb2312">
> </HEAD>
> <BODY bgColor=#ffffff>
> <h1>Hello</h1><br>
> </body>
> </html>
>
> I then run the parser as
> java -jar htmlparser.jar file:test.html.
> The parser output nothing but:
> HTMLParser v1.3 (Integration Build Mar 02, 2003)
> Parsing file:test.html
> INFO: detected charset "gb2312", using "EUC-CN"
>
> Thanks for any help.
>
> Joe
From: Somik R. <so...@ya...> - 2003-03-12 06:35:30
Hi Joe,

Changing links in script is not yet supported. Can you add it as a feature request ?

Inline javascript ought to be available from the attributes - though we don't have any tests yet. If we can get some help, it would speed us up.

Regards,
Somik

----- Original Message -----
From: "Joe Lin" <gu...@ya...>
To: <htm...@li...>
Sent: Tuesday, March 04, 2003 9:42 PM
Subject: [Htmlparser-user] Changing links embedded inside a script tag?

> Hi,
>
> I need to change links embedded inside the code of a
> script tag such as:
> <script language="Javascript">
> window.open("http://mysite/index.html");
> </script>
>
> There's only getScriptCode() in ScriptTag and no
> setScriptCode() available. Has anyone done changing
> links inside Javascript? Can you please suggest a good
> way to do this?
>
> Also, how about inline Java script such as
> <form ....>
> <input type="button" onClick="<script
> window.open..../>">
> </form>
>
> Thanks so much for the help!
>
> Joe
From: Somik R. <so...@ya...> - 2003-03-12 06:25:55
Devin Gillman wrote:
> I am also trying to parse a string containing html tags. I am just
> trying to pull the text from the string but I have been unsuccessful at it.

    String myStringWithTags = "<html><head>....</head><body>..</body></html>";
    Parser parser = Parser.createParser(myStringWithTags);
    TextExtractingVisitor visitor = new TextExtractingVisitor();
    parser.visitAllNodesWith(visitor);
    System.out.println(visitor.getExtractedText());

HTH.

Regards,
Somik
From: Amit M. <am...@ve...> - 2003-03-11 10:50:05
'help'

"Provocans ad volandum"
---------------------
+91-020-4367614
+91-0231-2663094
ami...@ya...
---------------------

----- Original Message -----
From: htm...@li...
Date: Tuesday, March 11, 2003 1:43 am
Subject: Htmlparser-user digest, Vol 1 #211 - 5 msgs

> Today's Topics:
>
> 1. Parsing From A String 2 (Devin Gillman)
> 2. compilation problem (Gokcen Ogutcu)
> 3. RE: compilation problem (dha...@or...)
> 4. RE: compilation problem (Gokcen Ogutcu)
> 5. RE: compilation problem (Dave Knipp)
>
> --__--__--
>
> Message: 1
> From: "Devin Gillman" <obi...@ho...>
> To: htm...@li...
> Date: Mon, 10 Mar 2003 03:41:57 -0600
> Subject: [Htmlparser-user] Parsing From A String 2
>
> Hi,
>
> I am also trying to parse a string containing html tags. I am just
> trying to pull the text from the string but I have been unsuccessful
> at it. I've tried creating a URL from the string and trying to use a
> HTMLReader or Reader to get at the information. I suppose I could
> write it to a file, but I would prefer not to have to go through all
> of that for a short simple string. Nothing has worked for me yet. I am
> sure there is a simple way, but I can't seem to find it. Any help
> would be appreciated.
>
> Thanks ahead of time,
>
> Devin Gillman
>
> --__--__--
>
> Message: 2
> Date: Mon, 10 Mar 2003 12:16:31 +0200 (EET)
> From: "Gokcen Ogutcu" <sca...@bi...>
> To: <htm...@li...>
> Subject: [Htmlparser-user] compilation problem
>
> hello all,
>
> i'm experiencing some compilation problems. i've saved one of the
> examples that comes with the documentation and tried to compile it,
> just to give it a try. but it gave errors (error message was "unable
> to resolve symbol"). i'm new to java, but this error was raised when
> the compiler couldn't find the relevant packages or classes (i think).
> source file and the "org" dir were in the same level, i didn't touch
> the directory structure of the "htmlparser".
> where am i going wrong? i'm using j2se, maybe it requires "ant"??
>
> thanks for your help,
> gokcen
>
> --__--__--
>
> Message: 3
> From: dha...@or...
> Date: Mon, 10 Mar 2003 15:50:59 +0530
> Subject: RE: [Htmlparser-user] compilation problem
> TO: htm...@li...
>
> You need to include "htmlparser.jar" in your classpath settings and
> then compile the example code. You do not need "ant".
>
> Regards,
>
> Dhaval Udani
>
> --__--__--
>
> Message: 4
> Date: Mon, 10 Mar 2003 15:28:56 +0200 (EET)
> Subject: RE: [Htmlparser-user] compilation problem
> From: "Gokcen Ogutcu" <sca...@bi...>
> To: <htm...@li...>
>
> i have tried,
>
> javac -classpath htmlparser.jar LinkExtractor.java
>
> and
>
> export CLASSPATH=$CLASSPATH:/home/x/htmlparser.jar
> javac LinkExtractor.java
>
> and they both didn't work, i'm probably mistyping the commands above.
> where am i going wrong?
>
> thanks again,
> gokcen
>
> > You need to include "htmlparser.jar" in your classpath settings
> > and then compile the example code. You do not need "ant".
> >
> > Regards,
> >
> > Dhaval Udani
>
> --__--__--
>
> Message: 5
> From: "Dave Knipp" <dav...@ho...>
> To: htm...@li...
> Subject: RE: [Htmlparser-user] compilation problem
> Date: Mon, 10 Mar 2003 08:06:06 -0600
>
> is your jar in the same folder as the entry point for your program?
> If not, you need to give the classpath the actual path to the
> htmlparser.jar. If it is, i would suggest adding more files to your
> classpath. For example, if you are compiling from the directory with
> your main in it, then just doing something like this:
>
> javac -classpath .;./htmlparser_location/htmlparser.jar YourClass.java
>
> just keep trying different configurations in your classpath and you
> are bound to get it to compile.
>
> good luck,
>
> Dave Knipp
>
> --__--__--
>
> End of Htmlparser-user Digest
From: Somik R. <so...@ya...> - 2003-03-11 02:50:58
|
Hi Joe,

One suggestion is to pass the stream to the visitor, so that you can close it outside the visitor once parsing has finished. However, it might be a good idea to support a parseCompleted() event on the visitor interface.

Regards,
Somik

--- Joe Lin <gu...@ya...> wrote:
> Hi,
>
> I wrote a visitor and registered it with the Parser.
> Basically, I was parsing a web page and dumping the
> result to a file. I close the FileOutputStream in my
> visitEndTag as such:
>
> public void visitEndTag(EndTag endTag)
> {
>     if ( endTag.getTagName().equalsIgnoreCase("HTML") )
>     {
>         // flush and close the file output stream
>     }
> }
>
> However, my program is getting an IOException saying
> that the output stream is closed while I was still
> trying to write to it. I then realized that the "if"
> statement in my visitEndTag is not a correct signal
> for determining that the parser is done parsing. Can
> anyone please help me find out whether there is any
> way to know that the parser has finished parsing? Thanks.
>
> Joe
|
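A minimal sketch of the first suggestion, in case it helps to see it spelled out. It does not use the htmlparser API: `TagVisitor`, `DumpVisitor`, and `StreamOwnerSketch` are hypothetical stand-ins, `parse()` merely imitates a parser calling the visitor once per end tag, and a `ByteArrayOutputStream` takes the place of the `FileOutputStream`. The point is only that the code that opens the stream also closes it, after the parse call has returned, so a callback that arrives after the HTML end tag can still write.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical stand-in for a parser visitor; NOT the htmlparser interface.
interface TagVisitor {
    void visitEndTag(String tagName);
}

// Visitor that writes to a stream it was handed and never closes it.
class DumpVisitor implements TagVisitor {
    private final OutputStream out;

    DumpVisitor(OutputStream out) {
        this.out = out;
    }

    @Override
    public void visitEndTag(String tagName) {
        try {
            out.write(("</" + tagName + ">\n").getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class StreamOwnerSketch {
    // Stand-in for the parser's traversal: one callback per end tag.
    static void parse(List<String> endTags, TagVisitor visitor) {
        for (String tag : endTags) {
            visitor.visitEndTag(tag);
        }
    }

    // The caller owns the stream: it is closed by try-with-resources
    // only after parse() has returned, not from inside a callback.
    static String dump(List<String> endTags) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (OutputStream out = buffer) {
            parse(endTags, new DumpVisitor(out));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With this arrangement the visitor needs no "am I done?" test at all; the end of the parse call is the completion signal.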
From: Joe L. <gu...@ya...> - 2003-03-11 00:55:24
|
Hi,

I wrote a visitor and registered it with the Parser. Basically, I was parsing a web page and dumping the result to a file. I close the FileOutputStream in my visitEndTag as such:

public void visitEndTag(EndTag endTag)
{
    if ( endTag.getTagName().equalsIgnoreCase("HTML") )
    {
        // flush and close the file output stream
    }
}

However, my program is getting an IOException saying that the output stream is closed while I was still trying to write to it. I then realized that the "if" statement in my visitEndTag is not a correct signal for determining that the parser is done parsing. Can anyone please help me find out whether there is any way to know that the parser has finished parsing? Thanks.

Joe
|
From: Dave K. <dav...@ho...> - 2003-03-10 14:06:16
|
Is your jar in the same folder as the entry point for your program? If not, you need to give the classpath the actual path to htmlparser.jar. If it is, I would suggest adding more entries to your classpath. For example, if you are compiling from the directory with your main in it, then try something like this:

javac -classpath .;./htmlparser_location/htmlparser.jar YourClass.java

Just keep trying different configurations in your classpath and you are bound to get it to compile.

Good luck,
Dave Knipp
|
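One detail worth checking when trying the command above: the classpath separator depends on the platform. It is `;` on Windows and `:` on Unix-like systems, where an unquoted `;` also ends the shell command. The sketch below is only an illustration of that point; `ClasspathSketch` and `joinClasspath` are made-up names, and the jar path is the one from the earlier message, to be replaced with wherever htmlparser.jar actually lives.

```java
import java.io.File;

class ClasspathSketch {
    // Joins classpath entries with the platform's separator:
    // ":" on Unix-like systems, ";" on Windows.
    static String joinClasspath(String... entries) {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        // "." is the current directory; the jar path is an assumption.
        String cp = joinClasspath(".", "/home/x/htmlparser.jar");
        // Prints the javac command line that matches this platform.
        System.out.println("javac -classpath " + cp + " LinkExtractor.java");
    }
}
```

Running it prints a command line that can be pasted into the local shell, which avoids guessing the separator by hand.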
From: Gokcen O. <sca...@bi...> - 2003-03-10 13:29:52
|
I have tried

javac -classpath htmlparser.jar LinkExtractor.java

and

export CLASSPATH=$CLASSPATH:/home/x/htmlparser.jar
javac LinkExtractor.java

and neither worked. I'm probably mistyping the commands above. What am I doing wrong?

Thanks again,
gokcen

> You need to include "htmlparser.jar" in your classpath settings and then
> compile the example code. You do not need "ant".
>
> Regards,
>
> Dhaval Udani
|
From: <dha...@or...> - 2003-03-10 10:44:09
|
You need to include "htmlparser.jar" in your classpath settings and then compile the example code. You do not need "ant".

Regards,

Dhaval Udani

-----Original Message-----
From: scapegoat [mailto:sca...@bi...]
Sent: Monday, March 10, 2003 3:47 PM
To: htmlparser-user
Cc: scapegoat
Subject: [Htmlparser-user] compilation problem

hello all,

i'm experiencing some compilation problems. i've saved one of the examples that comes with the documentation and try to compile it, just to give it a try. but it gave errors (error message was "unable to resolve symbol"), i'm new to java, but this error was raised when the compiler couldn't find the relevant packages or classes (i think) source file and the "org" dir were in the same level, i didn't touch the directory structure of the "htmlparser".
where am i doing wrong, i'm using j2se, maybe it requires "ant"??

thanks for your help,
gokcen
|
From: Gokcen O. <sca...@bi...> - 2003-03-10 10:17:22
|
Hello all,

I'm experiencing some compilation problems. I saved one of the examples that comes with the documentation and tried to compile it, just to give it a try, but it gave errors (the error message was "unable to resolve symbol"). I'm new to Java, but I think this error is raised when the compiler can't find the relevant packages or classes. The source file and the "org" dir were at the same level; I didn't touch the directory structure of "htmlparser".
What am I doing wrong? I'm using J2SE; maybe it requires "ant"?

Thanks for your help,
gokcen
|