htmlparser-user Mailing List for HTML Parser (Page 39)
Brought to you by:
derrickoswald
From: Derrick O. <Der...@Ro...> - 2006-04-13 03:44:25
Bram,

The mini <A/> tag in your test case is considered a content-less XML tag (see isEmptyXmlTag() in TagNode). This causes the parser to set the end tag reference to the start tag itself; this is done so that there will always be an end tag. That may be a bad design decision, but it was thought to be better than inventing a non-existent tag or returning null.

When you add an HREF attribute in this case, the only attribute (the tag name with the slash) is no longer the last attribute, so isEmptyXmlTag() returns false. The end tag reference, however, still points to the start tag, which causes the recursion.

I guess the add-attribute code could be smarter and detect this pathological situation, but I'm wondering if that's the real solution or just a band-aid. Was this discovered in the wild? Why is the XML syntax used for an empty link? Is this an XML file? Perhaps an XML parser would be a better choice.

Derrick

Bram wrote:
>Hello,
>
>I've isolated a problem where htmlparser sometimes runs into endless recursion when transforming links in HTML pages.
>
>Basically, I fetch and parse a document, change attributes on links, and reassemble it into an HTML page again. Without changing the attributes, the problem does not occur.
>
>See the attached JUnit test case.
>
>Best regards,
>Bram Avontuur
>
>------------------------------------------------------------------------
>
>import java.util.logging.Logger;
>
>import org.htmlparser.Node;
>import org.htmlparser.Parser;
>import org.htmlparser.filters.TagNameFilter;
>import org.htmlparser.nodes.TagNode;
>import org.htmlparser.util.NodeList;
>import org.htmlparser.util.ParserException;
>
>import junit.framework.TestCase;
>
>public class TestHtmlTransformer extends TestCase {
>    private static final Logger logger = Logger.global;
>
>    public void testHtmlTransformer() {
>        String html = "<a/>";
>        NodeList nodes = null;
>
>        try {
>            Parser parser = Parser.createParser(html, "UTF-8");
>            nodes = parser.parse(null);
>        } catch (ParserException e) {
>            logger.severe("ParserException: " + e.getMessage());
>            fail("ParserException");
>        }
>
>        nodes.toHtml(); //this works fine.
>        logger.info("First html conversion went OK");
>
>        NodeList links = nodes.extractAllNodesThatMatch(
>            new TagNameFilter("a"), true);
>
>        for (int i = 0; i < links.size(); i++) {
>            Node tmpNode = links.elementAt(i);
>            TagNode link = null;
>            if (tmpNode instanceof TagNode) {
>                link = (TagNode)links.elementAt(i);
>            } else {
>                continue;
>            }
>
>            link.setAttribute("href", "http://foo.com/");
>        }
>
>        try {
>            nodes.toHtml(); //now it starts endless recursive loop
>        } catch (StackOverflowError e) {
>            //e.printStackTrace();
>            fail("Endless recursion detected.");
>        }
>        logger.info("Passed second conversion"); //you never see this
>    }
>}

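To make the mechanism Derrick describes concrete, here is a self-contained toy model. MiniTag is hypothetical and is not htmlparser's TagNode or CompositeTag; it only copies the two facts from his explanation: the end-tag reference of an empty XML tag points at the tag itself, and the "empty" check looks only at the last attribute. The guarded variant shows the kind of "end tag is not this tag" check he calls a possible band-aid.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the self-referencing end tag; NOT the htmlparser classes.
public class SelfEndTagDemo {

    static class MiniTag {
        // The first "attribute" is the tag name as parsed, e.g. "a/" for "<a/>".
        final List<String> attributes = new ArrayList<>();
        MiniTag endTag;

        MiniTag(String name) {
            attributes.add(name);
        }

        // Same idea as isEmptyXmlTag(): only the LAST attribute is inspected.
        boolean isEmptyXmlTag() {
            return attributes.get(attributes.size() - 1).endsWith("/");
        }

        String startTagHtml() {
            return "<" + String.join(" ", attributes) + ">";
        }

        // Naive rendering: if the tag no longer looks empty but endTag == this,
        // rendering the "end tag" renders this tag again, without end.
        String toHtmlNaive() {
            StringBuilder sb = new StringBuilder(startTagHtml());
            if (!isEmptyXmlTag() && endTag != null) {
                sb.append(endTag.toHtmlNaive());
            }
            return sb.toString();
        }

        // Guarded rendering: never treat the start tag as its own end tag.
        String toHtmlGuarded() {
            StringBuilder sb = new StringBuilder(startTagHtml());
            if (!isEmptyXmlTag() && endTag != null && endTag != this) {
                sb.append(endTag.startTagHtml());
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        MiniTag link = new MiniTag("a/");                  // like parsing "<a/>"
        link.endTag = link;                                // end tag reference = start tag
        System.out.println(link.isEmptyXmlTag());          // true
        link.attributes.add("href=\"http://foo.com/\"");   // like setAttribute("href", ...)
        System.out.println(link.isEmptyXmlTag());          // false: "a/" is no longer last
        try {
            link.toHtmlNaive();
        } catch (StackOverflowError e) {
            System.out.println("naive rendering overflowed the stack");
        }
        System.out.println(link.toHtmlGuarded());
    }
}
```

Running main prints true, false, the overflow message, and finally <a/ href="http://foo.com/">.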
From: Bram <br...@av...> - 2006-04-12 12:31:43
Hello,

I've isolated a problem where htmlparser sometimes runs into endless recursion when transforming links in HTML pages.

Basically, I fetch and parse a document, change attributes on links, and reassemble it into an HTML page again. Without changing the attributes, the problem does not occur.

See the attached JUnit test case.

Best regards,
Bram Avontuur

--
Latest survey shows that 3 out of 4 people make up 75% of the world's population.

From: <ga...@la...> - 2006-04-12 03:31:51
Hello,

I will be out of the office from Friday April 7th through Wednesday April 12, returning to the office on the 13th. If this is an emergency, please call our tech support at 1-877-233-4951; otherwise I will respond to your email as soon as possible.

Thank you,
Bill Gamble

From: Ian M. <ian...@gm...> - 2006-04-11 21:20:23
HTMLParser is certainly capable of rewriting bits of HTML, but bear in mind that the rest of the output may not be exactly the same as the input.

To get HTMLParser to create a parser from a URL, use the constructor that takes the URL as a String. If you already have the page as a String, call Parser.setInputHTML instead. Then modify the document as you see fit, and call toHtml() on the root node list returned by parser.parse.

The API reference is here: http://htmlparser.sourceforge.net/javadoc/index.html?org/htmlparser/Parser.html

Ian

On 11/04/06, pre...@wi... <pre...@wi...> wrote:
> Hi All,
>
> Following is the requirement of my project.
>
> Requirement
>
> I need to write a Java proxy so that I have a handle on all the requests that are made to a particular site, so that I can charge the end user. I have written Java code that connects to a particular URL, gets the HTML data and forwards it to the client.
>
> Problem area
>
> I want to parse the HTML returned from the site requested by the user and append to or change the value of some tags, e.g.
>
> href="http://groups.google.co.in/grphp?hl=en&tab=wg&ie=UTF-8">
>
> to be changed to something like
>
> href="http://localhost:8080/pageRedirect?page="http://groups.google.co.in/grphp?hl=en&tab=wg&ie=UTF-8"">
>
> so that the request is redirected back to my proxy server, where I extract the value of the page parameter and forward the request.
>
> Can I achieve this parsing and translation using HTML Parser?
> Can you please provide me with any sample code?
>
> Thanks & Regards
> Prerna Sawhney

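One detail worth getting right in Prerna's rewrite: as written, the target href nests a quoted URL, with its own ?, & and = characters, inside the page parameter, which breaks both the attribute quoting and the query string. A minimal sketch, assuming the proxy prefix from her message, percent-encodes the original URL with the JDK's URLEncoder so it travels as one parameter value. The resulting string is what you would pass to whatever call sets the href (for example TagNode.setAttribute("href", ...), as used in Bram's test case in this thread). The class and method names are hypothetical, and the HTML-parsing part is not shown.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Sketch only: PROXY_PREFIX and the "page" parameter come from Prerna's
// message; ProxyLinks, toProxyUrl and fromProxyUrl are hypothetical names.
public class ProxyLinks {
    static final String PROXY_PREFIX = "http://localhost:8080/pageRedirect?page=";

    // Percent-encode the original URL so its own ':', '/', '?', '&' and '='
    // stay inside the single "page" parameter instead of splitting it.
    static String toProxyUrl(String original) {
        try {
            return PROXY_PREFIX + URLEncoder.encode(original, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Inverse operation, for code that handles the raw proxy URL itself.
    static String fromProxyUrl(String proxied) {
        if (!proxied.startsWith(PROXY_PREFIX)) {
            throw new IllegalArgumentException("not a proxy URL: " + proxied);
        }
        try {
            return URLDecoder.decode(proxied.substring(PROXY_PREFIX.length()), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String original = "http://groups.google.co.in/grphp?hl=en&tab=wg&ie=UTF-8";
        String proxied = toProxyUrl(original);
        System.out.println(proxied);
        System.out.println(fromProxyUrl(proxied).equals(original));
    }
}
```

This prints http://localhost:8080/pageRedirect?page=http%3A%2F%2Fgroups.google.co.in%2Fgrphp%3Fhl%3Den%26tab%3Dwg%26ie%3DUTF-8 and then true. On the proxy side, a servlet's request.getParameter("page") normally returns the value already decoded, so fromProxyUrl is only needed when working with the raw URL.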
From: Ian M. <ian...@gm...> - 2006-04-11 21:09:09
If you aren't running a web server (it's not clear from your email), then you will need to use file:/// 'URLs' instead, or read the documents in as Strings using Java's File and FileReader classes and then call Parser.setInputHTML.

Ian

On 11/04/06, Derrick Oswald <Der...@ro...> wrote:
> Jay,
>
> By 'directory is handled as a text Node', do you mean the web server is replying to the URL with a directory listing as a plain text node in the HTML page?
> Usually HTTP servers either don't reply with anything or, if configured more loosely, will provide a hyperlinked listing of the directory so the user can navigate through the directories.
> As far as I know there isn't any existing code to handle a plain-text listing of a directory and extract the tree structure, if that's what you're asking.
>
> Derrick
>
> HATTAT Jérémie wrote:
>
> > Hi everybody,
> >
> > I've been working with htmlparser for a few days.
> > I want to mirror an entire site; my starting point was the SiteCapturer example,
> > but I can't handle a site with a URL like "http://localhost/mysite", because a
> > directory is handled as a text Node.
> >
> > Is there a solution already implemented in the API, or should I develop the file-handling engine myself?
> >
> > Thanks in advance,
> >
> > Jay
>
> -------------------------------------------------------
> This SF.Net email is sponsored by xPML, a groundbreaking scripting language
> that extends applications into web and mobile media. Attend the live webcast
> and join the prime developer group breaking into this new coding territory!
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
> _______________________________________________
> Htmlparser-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-user

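For the second option Ian mentions, here is a small sketch of reading a saved page into a String and of turning a local path into a file:/// URL. It uses java.nio.file (Java 7+) rather than FileReader so that the character encoding is explicit; UTF-8 is an assumption about the saved files. The htmlparser calls are left as comments because they need the library on the classpath.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LocalPages {
    // Read a saved page into a String, decoding it explicitly as UTF-8
    // (FileReader would silently use the platform default encoding).
    static String readPage(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Turn a local path into a file:/// URL string, usable where a URL is expected.
    static String toFileUrl(Path file) {
        return file.toAbsolutePath().toUri().toString();
    }

    public static void main(String[] args) throws IOException {
        Path page = Paths.get(args.length > 0 ? args[0] : "index.html");
        System.out.println(toFileUrl(page));
        String html = readPage(page);
        System.out.println(html.length() + " characters read");
        // With htmlparser on the classpath (not compiled here):
        //   Parser parser = Parser.createParser(html, "UTF-8"); // as in Bram's test case
        //   // or, on an existing parser: parser.setInputHTML(html);
    }
}
```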
From: <pre...@wi...> - 2006-04-11 12:39:05
Hi All,

Following is the requirement of my project.

Requirement

I need to write a Java proxy so that I have a handle on all the requests that are made to a particular site, so that I can charge the end user. I have written Java code that connects to a particular URL, gets the HTML data and forwards it to the client.

Problem area

I want to parse the HTML returned from the site requested by the user and append to or change the value of some tags, e.g.

href="http://groups.google.co.in/grphp?hl=en&tab=wg&ie=UTF-8">

to be changed to something like

href="http://localhost:8080/pageRedirect?page="http://groups.google.co.in/grphp?hl=en&tab=wg&ie=UTF-8"">

so that the request is redirected back to my proxy server, where I extract the value of the page parameter and forward the request.

Can I achieve this parsing and translation using HTML Parser?
Can you please provide me with any sample code?

Thanks & Regards
Prerna Sawhney

From: Derrick O. <Der...@Ro...> - 2006-04-11 11:31:11
Jay,

By 'directory is handled as a text Node', do you mean the web server is replying to the URL with a directory listing as a plain text node in the HTML page?

Usually HTTP servers either don't reply with anything or, if configured more loosely, will provide a hyperlinked listing of the directory so the user can navigate through the directories.

As far as I know there isn't any existing code to handle a plain-text listing of a directory and extract the tree structure, if that's what you're asking.

Derrick

HATTAT Jérémie wrote:
> Hi everybody,
>
> I've been working with htmlparser for a few days.
> I want to mirror an entire site; my starting point was the SiteCapturer example,
> but I can't handle a site with a URL like "http://localhost/mysite", because a
> directory is handled as a text Node.
>
> Is there a solution already implemented in the API, or should I develop the file-handling engine myself?
>
> Thanks in advance,
>
> Jay

From: <jh...@gm...> - 2006-04-11 08:04:13
Hi everybody,

I've been working with htmlparser for a few days. I want to mirror an entire site; my starting point was the SiteCapturer example, but I can't handle a site with a URL like "http://localhost/mysite", because a directory is handled as a text Node.

Is there a solution already implemented in the API, or should I develop the file-handling engine myself?

Thanks in advance,

Jay

From: Bastian H. <ho...@fm...> - 2006-04-10 10:38:34
Hey,

I'm currently working on a robust utility class that converts HTML to XML documents. I'd like to say THANKS for this nice software library, which eases my work ;-)

Best regards,
Bastian

From: Java P. <jpr...@gm...> - 2006-04-07 09:43:39
On 4/6/06, Derrick Oswald <Der...@ro...> wrote:
>
> You might want to set the connect timeout in your mainline:
>     System.setProperty ("sun.net.client.defaultReadTimeout", "7000");
>     System.setProperty ("sun.net.client.defaultConnectTimeout", "7000");
>

Hello,

I don't know the reason, but it doesn't work at all for me. I checked http://java.sun.com/j2se/1.5.0/docs/guide/net/properties.html, so these properties should be honored in 1.5, but the partial code below still freezes for some URLs that don't respond.

    System.setProperty ("sun.net.client.defaultReadTimeout", "1500");
    System.setProperty ("sun.net.client.defaultConnectTimeout", "1500");
    String page = "", line = "";
    try {
        URL url = new URL("http://www.i-am-bad-url-and-dont-respond.com/");
        /*URLConnection urlconn = url.openConnection();
        InputStream in = urlconn.getInputStream();*/
        InputStream in = url.openStream();
        Reader r = new InputStreamReader(in);
        BufferedReader br = new BufferedReader(r);
        while ((line = br.readLine()) != null) {
            page += line;
        }
    } catch (MalformedURLException e) {
        System.out.println(e.getMessage());
    } catch (IOException e) {
        System.out.println(e.getMessage());
    }
    // it never gets here

It's a piece of code to test the properties, but I have exactly the same situation with HTMLParser, and I don't know where the problem is. Sorry to ask the same question again, but it's really frustrating not to have a simple timeout for the connection :( If anyone has another hint or clue, please answer.

Best regards,
Adr

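A different route from the global sun.net.client properties, sketched with the standard URLConnection API: setConnectTimeout and setReadTimeout (both added in Java 5) set the timeouts on one connection, so they do not depend on when, to my understanding once at class-initialization time, the JDK's networking code reads those system properties. As far as I know, neither timeout covers DNS resolution, and the read timeout applies to each blocking read rather than to the whole download. The UTF-8 decoding is an assumption. Whether this plugs into htmlparser depends on your version offering a Parser constructor that accepts a URLConnection; that is not shown or tested here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class TimedFetch {
    // Fetch a page with timeouts set on this one connection.
    // connectMillis bounds establishing the TCP connection; readMillis bounds
    // each blocking read. Neither bounds the total download time.
    static String fetch(String address, int connectMillis, int readMillis) throws IOException {
        URLConnection conn = URI.create(address).toURL().openConnection();
        conn.setConnectTimeout(connectMillis);
        conn.setReadTimeout(readMillis);
        StringBuilder page = new StringBuilder();
        // Assumes the page is UTF-8; real code should honour the Content-Type charset.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    public static void main(String[] args) {
        try {
            System.out.println(fetch(args[0], 1500, 1500));
        } catch (SocketTimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        } catch (IOException e) {
            System.out.println("failed: " + e);
        }
    }
}
```

Unlike the loop in the message above, this keeps line breaks and uses a StringBuilder instead of repeated String concatenation.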
From: Derrick O. <Der...@Ro...> - 2006-04-06 04:16:53
You might want to set the connect timeout in your mainline:

    System.setProperty ("sun.net.client.defaultReadTimeout", "7000");
    System.setProperty ("sun.net.client.defaultConnectTimeout", "7000");

Java Programmer wrote:
>Hello,
>I have a problem with hanging fetch threads: in a few cases the running threads stop when entering this method:
>
>public void fetch() throws ParserException{
>    Parser parser = new Parser(this.url);
>    visitor = new SimplePageVisitor();
>    parser.visitAllNodesWith(visitor);
>}
>
>I logged what causes the hang, and the code stops right at: Parser parser = new Parser(this.url);
>
>I saw several times that "java.net.SocketTimeoutException: Read timed out" was thrown, but not in this case (the thread just hangs for hours until I kill it by hand).
>
>I searched for a solution but without luck.
>
>If anyone can help, I'd appreciate it.
>
>Best regards,
>Adr

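Socket timeouts bound individual network operations, not the whole fetch. A different, swapped-in technique is a watchdog: run the fetch on a worker thread from java.util.concurrent and stop waiting after a deadline. This sketch (lambda syntax, so Java 8+) returns null on timeout. Its limitation: cancel(true) interrupts the worker, but a thread blocked in socket I/O may ignore the interrupt and stay stuck, which is why the pool uses daemon threads that cannot keep the JVM alive.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class Deadline {
    // Shared pool of daemon threads: a worker stuck in blocking I/O
    // will not prevent the JVM from exiting.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "fetch-worker");
        t.setDaemon(true);
        return t;
    });

    // Run a task, but stop waiting for it after the given number of milliseconds.
    // Returns null on timeout; exceptions thrown by the task are rethrown.
    static <T> T callWithDeadline(Callable<T> task, long millis) throws Exception {
        Future<T> future = POOL.submit(task);
        try {
            return future.get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);  // interrupt the worker; it may or may not respond
            return null;
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a fetch that hangs: it sleeps for 10 seconds.
        String slow = callWithDeadline(() -> { Thread.sleep(10_000); return "late"; }, 200);
        System.out.println(slow);
        String fast = callWithDeadline(() -> "page contents", 200);
        System.out.println(fast);
    }
}
```

main prints null and then page contents. With htmlparser the call might look like callWithDeadline(() -> new Parser(url), 30000), where url is a hypothetical variable; that usage is untested here.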
From: Java P. <jpr...@gm...> - 2006-04-05 15:06:14
Hello,

I have a problem with hanging fetch threads: in a few cases the running threads stop when entering this method:

    public void fetch() throws ParserException{
        Parser parser = new Parser(this.url);
        visitor = new SimplePageVisitor();
        parser.visitAllNodesWith(visitor);
    }

I logged what causes the hang, and the code stops right at: Parser parser = new Parser(this.url);

I saw several times that "java.net.SocketTimeoutException: Read timed out" was thrown, but not in this case (the thread just hangs for hours until I kill it by hand).

I searched for a solution but without luck.

If anyone can help, I'd appreciate it.

Best regards,
Adr

From: <xia...@gm...> - 2006-04-04 12:17:18
I tried to use ConnectionManager, but it could not keep the session with the web server: after you post your credentials to the server, the cookie is not kept for the second connection. I switched to HttpClient, which holds the cookie, and that solved it.

On 4/3/06, Derrick Oswald <Der...@ro...> wrote:
>
> I guess it would be considered top-down parsing. The lexer is under the
> direction of a scanner for each tag, which knows the 'production rule'.
>
> To figure it out, I think you look for 'push' or 'pull'. Is something
> pulling (see IteratorImpl.nextNode()), or is it event-driven, with
> something reacting to lower-level lexeme recognition?
>
> Vishal Monpara wrote:
>
> > Hi Everybody,
> >
> > I have used htmlparser in my project, and next week I have to
> > give a presentation on it. I have a few questions about htmlparser.
> >
> > 1) What kind of parsing technique was used to develop it: top-down
> > or bottom-up parsing?
> > 2) How can one figure out whether it uses a top-down or a
> > bottom-up parsing technique?
> >
> > Thanks in advance.
> >
> > Regards,
> > Vishal Monpara

--
Best Regards.

Xiaodong Han
MSN:hx...@ho...

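For keeping a login cookie across connections, a standard-library alternative (swapped in here; it is neither htmlparser's ConnectionManager nor HttpClient) is java.net.CookieManager, available since Java 6. Installed as the default CookieHandler, it lets HttpURLConnection store Set-Cookie headers and send matching cookies on later requests to the same site. The sketch simulates the login response instead of contacting a server; the host, paths and JSESSIONID value are hypothetical. Whether htmlparser's own connection code picks up the default handler is not verified here.

```java
import java.io.IOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class SessionCookies {
    // Install a JVM-wide in-memory cookie store. From then on, HttpURLConnection
    // records Set-Cookie headers and adds matching Cookie headers to later requests.
    static CookieManager installCookieStore() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ORIGINAL_SERVER);
        CookieHandler.setDefault(manager);
        return manager;
    }

    public static void main(String[] args) throws IOException {
        CookieManager manager = installCookieStore();

        // Simulate the Set-Cookie header of a (hypothetical) login response...
        URI login = URI.create("http://www.example.com/login");
        manager.put(login, Collections.singletonMap("Set-Cookie",
                Collections.singletonList("JSESSIONID=abc123; Path=/")));

        // ...then ask which Cookie header a later request to the same site would carry.
        Map<String, List<String>> headers = manager.get(
                URI.create("http://www.example.com/account"),
                Collections.<String, List<String>>emptyMap());
        System.out.println(headers.get("Cookie"));
    }
}
```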
From: Derrick O. <Der...@Ro...> - 2006-04-03 00:45:30
I think I see the problem now. For an XML tag like <hello />, the end tag is the tag itself. Change it to:

    allowed.add(tag.getRawTagName());
    allowed.add("/");

and it should work.

dust wrote:
>Hello,
>
>Am I doing something wrong in the attached code?
>
>It generates a stack overflow error when run with the default URL found in main.
>
>Exception in thread "main" java.lang.StackOverflowError
>    at java.lang.StringBuffer.append(Unknown Source)
>    at org.htmlparser.lexer.InputStreamSource.getCharacters(InputStreamSource.java:641)
>    at org.htmlparser.lexer.Page.getText(Page.java:1021)
>    at org.htmlparser.lexer.PageAttribute.getRawValue(PageAttribute.java:384)
>    at org.htmlparser.Attribute.toString(Attribute.java:730)
>    at org.htmlparser.nodes.TagNode.toHtml(TagNode.java:686)
>    at org.htmlparser.tags.CompositeTag.toHtml(CompositeTag.java:177)
>    at org.htmlparser.tags.CompositeTag.putEndTagInto(CompositeTag.java:167)
>    at org.htmlparser.tags.CompositeTag.toHtml(CompositeTag.java:182)
>    at org.htmlparser.tags.CompositeTag.putEndTagInto(CompositeTag.java:167)
>    at org.htmlparser.tags.CompositeTag.toHtml(CompositeTag.java:182)
>    at org.htmlparser.tags.CompositeTag.putEndTagInto(CompositeTag.java:167)
>    at org.htmlparser.tags.CompositeTag.toHtml(CompositeTag.java:182)
>    at org.htmlparser.tags.CompositeTag.putEndTagInto(CompositeTag.java:167)
>    at org.htmlparser.tags.CompositeTag.toHtml(CompositeTag.java:182)
>
>etc,
>
>--
>
>------------------------------------------------------------------------
>
>import java.util.HashSet;
>import java.util.Set;
>import java.util.Vector;
>
>import org.htmlparser.Attribute;
>import org.htmlparser.Node;
>import org.htmlparser.Parser;
>import org.htmlparser.Tag;
>import org.htmlparser.util.NodeList;
>import org.htmlparser.util.ParserException;
>import org.htmlparser.util.SimpleNodeIterator;
>
>public class HtmlParser
>{
>    Parser parser;
>
>    public HtmlParser (String link) throws ParserException
>    {
>        parser = new Parser (link);
>    }
>
>    public static void main (String[] args) throws ParserException
>    {
>        String link="http://mips.gsf.de/projects/fungi/fungi_db.html";
>        if(args.length>0)
>            link=args[0];
>
>        HtmlParser htmlParser = new HtmlParser (link);
>        String html = htmlParser.parse();
>        System.out.println(html);
>    }
>
>    private String parse() throws ParserException {
>
>        NodeList list = parser.parse(null);
>
>        recurse(list);
>        System.err.println("done, trying toHtml()");
>        return list.toHtml();
>    }
>
>    private NodeList recurse(NodeList list) {
>        if(list==null)
>            return null;
>        Node node;
>        SimpleNodeIterator iterator = list.elements();
>        while(iterator.hasMoreNodes())
>        {
>            node = iterator.nextNode();
>            if(node==null)
>                break;
>            if(node instanceof Tag)
>            {
>                Tag tag = (Tag)node;
>                removeAttributes(tag);
>                recurse(node.getChildren());
>            }
>        }
>        return null;
>    }
>
>    static private void removeAttributes(Tag tag) {
>        String[] allowedAttrs = {""};
>        Set allowed = new HashSet();
>        for(int i=0;i<allowedAttrs.length;i++)
>            allowed.add(allowedAttrs[i]);
>
>        allowed.add(tag.getRawTagName());
>        allowed.add("/"+tag.getRawTagName());
>
>        Vector attrs = tag.getAttributesEx();
>        for(int i=0;i<attrs.size();i++)
>        {
>            Attribute attr = (Attribute)attrs.get(i);
>            if(attr.getName()==null)
>                continue;
>            if(!allowed.contains(attr.getName()))
>            {
>                tag.removeAttribute(attr.getName());
>                System.out.println("Removed attr: "+attr.getName());
>            }
>        }
>    }
>}

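Derrick's fix works because an XML-style tag such as <hello /> carries a "/" attribute at the end; removing it makes the tag stop looking empty while its end tag still points at itself, the same loop as in Bram's report. The sketch below models attribute names as plain strings (a hypothetical model, not htmlparser's Attribute class) and keeps both the tag name and "/". It also shows a second thing worth checking in the attached loop: if getAttributesEx() returns the tag's live attribute list (an assumption about the library), removing entries while counting i upward skips the entry after each removal. An Iterator avoids that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Attribute names modelled as plain strings; hypothetical, not htmlparser code.
public class AttributeFilter {

    // Keep only whitelisted names, removing safely while iterating.
    // Null names (htmlparser uses them for whitespace) are left alone.
    static void keepOnly(List<String> attributeNames, Set<String> allowed) {
        Iterator<String> it = attributeNames.iterator();
        while (it.hasNext()) {
            String name = it.next();
            if (name != null && !allowed.contains(name)) {
                it.remove();
            }
        }
    }

    // The index-based pattern from the attached code: after remove(i), the next
    // element slides into slot i, and i++ then skips over it.
    static void keepOnlyIndexed(List<String> attributeNames, Set<String> allowed) {
        for (int i = 0; i < attributeNames.size(); i++) {
            String name = attributeNames.get(i);
            if (name != null && !allowed.contains(name)) {
                attributeNames.remove(i);
            }
        }
    }

    public static void main(String[] args) {
        // Roughly "<hello class='x' id='y' />": tag name, two attributes, then "/".
        Set<String> allowed = new HashSet<>(Arrays.asList("hello", "/"));

        List<String> safe = new ArrayList<>(Arrays.asList("hello", "class", "id", "/"));
        keepOnly(safe, allowed);
        System.out.println(safe);     // [hello, /]

        List<String> skipped = new ArrayList<>(Arrays.asList("hello", "class", "id", "/"));
        keepOnlyIndexed(skipped, allowed);
        System.out.println(skipped);  // [hello, id, /]  ("id" was skipped)
    }
}
```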
From: Derrick O. <Der...@Ro...> - 2006-04-03 00:33:46
I don't see anything obviously wrong with it...

...so it doesn't overflow if you don't remove the attributes?

dust wrote:
>Hello,
>
>Am I doing something wrong in the attached code?
>
>It generates a stack overflow error when run with the default URL found in main.

From: Derrick O. <Der...@Ro...> - 2006-04-03 00:15:36
I guess it would be considered top-down parsing. The lexer is under the direction of a scanner for each tag, which knows the 'production rule'.

To figure it out, I think you look for 'push' or 'pull'. Is something pulling (see IteratorImpl.nextNode()), or is it event-driven, with something reacting to lower-level lexeme recognition?

Vishal Monpara wrote:
> Hi Everybody,
>
> I have used htmlparser in my project, and next week I have to
> give a presentation on it. I have a few questions about htmlparser.
>
> 1) What kind of parsing technique was used to develop it: top-down
> or bottom-up parsing?
> 2) How can one figure out whether it uses a top-down or a
> bottom-up parsing technique?
>
> Thanks in advance.
>
> Regards,
> Vishal Monpara

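Derrick's push/pull test can be illustrated with a toy lexer. This is not htmlparser code: it only splits a string into tags and text runs. In the pull style the consumer calls next() whenever it wants a token (as with IteratorImpl.nextNode()); in the push style the lexer runs the loop and hands each token to a callback. Both produce the same tokens; what differs is which side drives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy illustration of pull vs. push; the "lexer" is hypothetical and
// only splits input into "<...>" tags and text runs.
public class PullVsPush {

    // Pull style: the caller asks for the next token when it wants one.
    static class PullLexer {
        private final String input;
        private int pos = 0;

        PullLexer(String input) { this.input = input; }

        boolean hasNext() { return pos < input.length(); }

        String next() {
            int end;
            if (input.charAt(pos) == '<') {
                end = input.indexOf('>', pos);            // a tag runs up to '>'
                end = (end < 0) ? input.length() : end + 1;
            } else {
                end = input.indexOf('<', pos);            // text runs up to the next '<'
                if (end < 0) end = input.length();
            }
            String token = input.substring(pos, end);
            pos = end;
            return token;
        }
    }

    // Push style: the lexer drives and calls back for every token it recognizes.
    static void pushLex(String input, Consumer<String> handler) {
        PullLexer lexer = new PullLexer(input);
        while (lexer.hasNext()) {
            handler.accept(lexer.next());
        }
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>world</b></p>";

        List<String> pulled = new ArrayList<>();
        PullLexer lexer = new PullLexer(html);
        while (lexer.hasNext()) {
            pulled.add(lexer.next());   // the consumer decides when to advance
        }

        List<String> pushed = new ArrayList<>();
        pushLex(html, pushed::add);     // the lexer decides when to deliver

        System.out.println(pulled);                 // [<p>, Hello , <b>, world, </b>, </p>]
        System.out.println(pulled.equals(pushed));  // true
    }
}
```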
From: dust <du...@i2...> - 2006-04-02 19:15:40
Hello,

Am I doing something wrong in the attached code?

It generates a stack overflow error when run with the default URL found in main.

Exception in thread "main" java.lang.StackOverflowError
    at java.lang.StringBuffer.append(Unknown Source)
    at org.htmlparser.lexer.InputStreamSource.getCharacters(InputStreamSource.java:641)
    at org.htmlparser.lexer.Page.getText(Page.java:1021)
    at org.htmlparser.lexer.PageAttribute.getRawValue(PageAttribute.java:384)
    at org.htmlparser.Attribute.toString(Attribute.java:730)
    at org.htmlparser.nodes.TagNode.toHtml(TagNode.java:686)
    at org.htmlparser.tags.CompositeTag.toHtml(CompositeTag.java:177)
    at org.htmlparser.tags.CompositeTag.putEndTagInto(CompositeTag.java:167)
    at org.htmlparser.tags.CompositeTag.toHtml(CompositeTag.java:182)
    at org.htmlparser.tags.CompositeTag.putEndTagInto(CompositeTag.java:167)
    at org.htmlparser.tags.CompositeTag.toHtml(CompositeTag.java:182)
    at org.htmlparser.tags.CompositeTag.putEndTagInto(CompositeTag.java:167)
    at org.htmlparser.tags.CompositeTag.toHtml(CompositeTag.java:182)
    at org.htmlparser.tags.CompositeTag.putEndTagInto(CompositeTag.java:167)
    at org.htmlparser.tags.CompositeTag.toHtml(CompositeTag.java:182)

etc,

--

From: Vishal M. <mon...@ho...> - 2006-04-02 16:46:52
Hi Everybody,

I have used htmlparser in my project, and next week I have to give a presentation on it. I have a few questions about htmlparser.

1) What kind of parsing technique was used to develop it: top-down or bottom-up parsing?
2) How can one figure out whether it uses a top-down or a bottom-up parsing technique?

Thanks in advance.

Regards,
Vishal Monpara

From: Derrick O. <Der...@Ro...> - 2006-03-31 12:46:31
The only documentation is in the http package javadocs:
http://htmlparser.sourceforge.net/javadoc/org/htmlparser/http/package-summary.html

?? wrote:
> In this topic you said the ConnectionManager can handle cookies and redirects,
> but how do I use this object to set a cookie for parsing another page?
> Parsing the other page needs the cookie to be set.

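Derrick points to ParserTest.testPOST() for the htmlparser way of posting a login form; that test is not reproduced here. As a stand-in, here is a minimal sketch of the same idea with plain HttpURLConnection: URL-encode the form fields, POST them, and read the response. The field names are hypothetical. HttpURLConnection normally re-issues a redirected POST as a GET, which matches the log-in-then-redirect flow described in this thread, but cookies only survive the redirect if a CookieHandler is installed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormLogin {
    // Encode fields as application/x-www-form-urlencoded, in map order.
    static String formBody(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (sb.length() > 0) sb.append('&');
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always supported", ex);
        }
        return sb.toString();
    }

    // POST the form and return the response body (decoded as UTF-8, an assumption).
    // Redirects are followed; cookies carry over only with a CookieHandler installed.
    static String post(String url, Map<String, String> fields) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setInstanceFollowRedirects(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        byte[] body = formBody(fields).getBytes(StandardCharsets.UTF_8);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("user", "alice");         // hypothetical field names and values
        fields.put("password", "s3cret!");
        System.out.println(formBody(fields)); // user=alice&password=s3cret%21
    }
}
```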
From: <xia...@gm...> - 2006-03-31 07:24:43
In this thread you said the ConnectionManager can handle cookies and redirects. But how do I use this object to set the cookie when parsing another page? Parsing that page requires the cookie to be set.

On 3/30/06, Derrick Oswald <Der...@ro...> wrote:
> No, there is a 'FollowRedirects' flag that automatically follows it.
> See the discussion in RFE #1436082 Follow redirections with cookie
> processing
> http://sourceforge.net/tracker/index.php?func=detail&aid=1436082&group_id=24399&atid=381402

--
Best Regards.

Xiaodong Han
MSN:hx...@ho...
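When the requests are made with HttpURLConnection directly, as in the login code posted in this thread, nothing carries the cookie over automatically. The Set-Cookie values of the login response have to be read and sent back as a Cookie header on the next request, for example with connection.setRequestProperty("Cookie", header), before that connection is handed to new Parser(connection). Below is a minimal sketch of building that header value. CookieHeader is a hypothetical helper, not part of htmlparser.

```java
import java.net.HttpCookie;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper: turns the Set-Cookie values of one response into the value
 * of a Cookie request header. It ignores Domain and Path, so it only suits a
 * sequence of requests to the same site.
 */
public class CookieHeader {
    public static String fromSetCookie(List<String> setCookieValues) {
        StringBuilder header = new StringBuilder();
        for (String value : setCookieValues) {
            for (HttpCookie cookie : HttpCookie.parse(value)) {
                if (cookie.hasExpired()) {
                    continue; // e.g. "Max-Age=0": the server is deleting this cookie
                }
                if (header.length() > 0) {
                    header.append("; ");
                }
                header.append(cookie.getName()).append('=').append(cookie.getValue());
            }
        }
        return header.toString();
    }

    public static void main(String[] args) {
        String header = fromSetCookie(Arrays.asList(
                "ASPSESSIONID=abc123; Path=/", "user=xd; Path=/", "old=x; Max-Age=0"));
        System.out.println(header); // prints "ASPSESSIONID=abc123; user=xd"
    }
}
```

The values can be collected from the login connection by looping over getHeaderFieldKey(i) and getHeaderField(i) and comparing keys case-insensitively. A real cookie store, such as the ConnectionManager or java.net.CookieManager, is the safer choice once more than one site or path is involved.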
From: Derrick O. <Der...@Ro...> - 2006-03-30 12:21:30
No, there is a 'FollowRedirects' flag that automatically follows it. See the discussion in RFE #1436082 Follow redirections with cookie processing:
http://sourceforge.net/tracker/index.php?func=detail&aid=1436082&group_id=24399&atid=381402

?? wrote:
> I tried that, but it still doesn't work. I researched the page flow and
> found that when you log in, the server redirects you to another page.
> Can testPOST() handle this?
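When a redirect is followed automatically, the 3xx response itself is never seen by the caller. To inspect it, or to copy its cookies across by hand, the usual JDK approach is to call HttpURLConnection.setInstanceFollowRedirects(false), then read getResponseCode() and getHeaderField("Location"). The hypothetical helper below covers the step that is easy to get wrong: deciding which codes are redirects and resolving a relative Location. It is not htmlparser's own redirect logic, which is discussed in the RFE above.

```java
import java.net.MalformedURLException;
import java.net.URL;

/** Hypothetical helper: where a response redirects to, if anywhere. */
public class Redirects {
    /**
     * Returns the absolute URL to request next, or null if the response is not
     * a redirect that should be followed (e.g. 200 or 304 Not Modified).
     */
    public static String nextUrl(String requestedUrl, int status, String location) {
        boolean redirect = status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
        if (!redirect || location == null || location.trim().isEmpty()) {
            return null;
        }
        try {
            // Location is often relative ("welcome.asp", "/2-hand/index.asp"),
            // so resolve it against the URL that produced the 3xx response.
            return new URL(new URL(requestedUrl), location.trim()).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad URL: " + requestedUrl + " / " + location, e);
        }
    }

    public static void main(String[] args) {
        String login = "http://www.example.com/reg/checklogin.asp";
        System.out.println(nextUrl(login, 302, "welcome.asp"));       // http://www.example.com/reg/welcome.asp
        System.out.println(nextUrl(login, 302, "/2-hand/index.asp")); // http://www.example.com/2-hand/index.asp
        System.out.println(nextUrl(login, 200, null));                // null
    }
}
```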
From: <xia...@gm...> - 2006-03-30 04:27:45
My code is something like this:

    url = new URL ("http://www.tjfw.com/2-hand/include/checklogin.asp");
    connection = (HttpURLConnection)url.openConnection ();
    connection.setRequestMethod ("POST");

    connection.setDoOutput (true);
    connection.setDoInput (true);
    connection.setUseCaches (false);

    connection.setRequestProperty("Referer","http://www.tjfw.com/reg/login.asp");

    buffer = new StringBuffer (1024);

    buffer.append("backUrl=");
    //buffer.append("http://www.tjfw.com/reg/login.asp");
    buffer.append("&");
    buffer.append ("username=");

    buffer.append ("&");
    buffer.append("password=");
    out = new PrintWriter (connection.getOutputStream ());
    out.print (buffer);
    out.close ();

    Parser parser=new Parser(connection);

On 3/30/06, ?? <xia...@gm...> wrote:
> I tried that, but it still doesn't work. I researched the page flow and
> found that when you log in, the server redirects you to another page.
> Can testPOST() handle this?

--
Best Regards.

Xiaodong Han
MSN:hx...@ho...
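The login code in this message appends the form fields to the POST body without encoding them. If a username or password contains `&`, `=`, a space or non-ASCII characters, the server will read different values than intended. A form POST body is normally URL-encoded, as in the minimal sketch below. FormBody is a hypothetical helper, and the charset must be the one the login page expects. For a Chinese-language site that might be GBK rather than UTF-8, but that is only a guess.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds an application/x-www-form-urlencoded request body. */
public class FormBody {
    /** Encodes each name and value, then joins the pairs with '&' in insertion order. */
    public static String encode(Map<String, String> fields, String charset) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), charset))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), charset));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unsupported charset: " + charset, e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("backUrl", "");
        fields.put("username", "han xd"); // made-up credentials
        fields.put("password", "p&ss=1");
        System.out.println(encode(fields, "UTF-8")); // backUrl=&username=han+xd&password=p%26ss%3D1
    }
}
```

In the posted code, `out.print (buffer)` would then become `out.print (FormBody.encode (fields, charset))`.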
From: <xia...@gm...> - 2006-03-30 03:35:26
I tried that, but it still doesn't work. I researched the page flow and found that when you log in, the server redirects you to another page. Can testPOST() handle this?

On 3/29/06, Derrick Oswald <Der...@ro...> wrote:
> You may need to 'POST' to the login form using the ConnectionManager
> with your credentials.
> See the doc-comments for src/org/htmlparser/tests/ParserTest.testPOST()
> for an example.
>
> ?? wrote:
> > I want to parse a web page that requires logging in, so I used the wiki
> > example, but it doesn't work: the cookie expires when the browser shuts down.
> > Can you tell me how to handle this situation?

--
Best Regards.

Xiaodong Han
MSN:hx...@ho...
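The flow described here works as follows. The login form is POSTed, the server answers with a 302 that also sets the session cookie, and the new page must then be requested with that cookie. Without a CookieHandler installed, HttpURLConnection's own redirect following does not send a cookie set on the 302 to the next page, which could explain the symptom. Below is a sketch of doing the three steps by hand with JDK classes only. Everything on the server side is invented so the sketch runs on its own: the embedded com.sun.net.httpserver server, the /login and /members paths, the SESSION cookie and the credentials. This is not how htmlparser implements it.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketch only: a throwaway local "site" plus a client that logs in and follows the redirect by hand.
public class LoginRedirectFlow {
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        for (int n; (n = in.read(chunk)) != -1; ) buf.write(chunk, 0, n);
        return buf.toByteArray();
    }

    static void reply(HttpExchange ex, int status, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        if (bytes.length == 0) {
            ex.sendResponseHeaders(status, -1); // -1 means "no response body"
            ex.close();
            return;
        }
        ex.sendResponseHeaders(status, bytes.length);
        try (OutputStream os = ex.getResponseBody()) { os.write(bytes); }
    }

    // Invented site: /login sets a session cookie and redirects; /members needs that cookie.
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/login", ex -> {
            readAll(ex.getRequestBody()); // a real site would check the posted credentials here
            ex.getResponseHeaders().add("Set-Cookie", "SESSION=abc123; Path=/");
            ex.getResponseHeaders().add("Location", "/members");
            reply(ex, 302, "");
        });
        server.createContext("/members", ex -> {
            String cookie = ex.getRequestHeaders().getFirst("Cookie");
            reply(ex, 200, "SESSION=abc123".equals(cookie) ? "welcome" : "please log in");
        });
        server.start();
        return server;
    }

    static String loginAndFetch(String base) throws IOException {
        // 1. POST the login form, but do not let HttpURLConnection follow the 302 itself.
        URL loginUrl = new URL(base + "/login");
        HttpURLConnection login = (HttpURLConnection) loginUrl.openConnection();
        login.setRequestMethod("POST");
        login.setDoOutput(true);
        login.setInstanceFollowRedirects(false);
        try (Writer w = new OutputStreamWriter(login.getOutputStream(), StandardCharsets.UTF_8)) {
            w.write("username=demo&password=secret");
        }
        int status = login.getResponseCode();
        String location = login.getHeaderField("Location");
        // 2. Keep the name=value part of every Set-Cookie header (header names are case-insensitive).
        StringBuilder cookies = new StringBuilder();
        for (int i = 1; login.getHeaderFieldKey(i) != null; i++) {
            if ("Set-Cookie".equalsIgnoreCase(login.getHeaderFieldKey(i))) {
                if (cookies.length() > 0) cookies.append("; ");
                cookies.append(login.getHeaderField(i).split(";", 2)[0].trim());
            }
        }
        login.disconnect();
        if (status / 100 != 3 || location == null) throw new IOException("expected a redirect, got " + status);
        // 3. Follow the redirect by hand: resolve a relative Location and send the cookie along.
        HttpURLConnection page = (HttpURLConnection) new URL(loginUrl, location).openConnection();
        page.setRequestProperty("Cookie", cookies.toString());
        try (InputStream in = page.getInputStream()) {
            return new String(readAll(in), StandardCharsets.UTF_8);
        }
    }

    /** Starts the fake site, runs the login flow against it, and returns the /members page body. */
    public static String demo() {
        try {
            HttpServer server = startServer();
            try {
                return loginAndFetch("http://127.0.0.1:" + server.getAddress().getPort());
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

`LoginRedirectFlow.demo()` returns the body of the /members page. That body is "welcome" only if the cookie from the 302 made it onto the second request. With htmlparser, the page connection from step 3 is what would be passed to `new Parser(connection)`.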
From: Derrick O. <Der...@Ro...> - 2006-03-29 23:36:51
You have to register as a ConnectionMonitor on the ConnectionManager. The HTTP response code of the HttpURLConnection passed to the postConnect() call should be 3xx in those cases.

Antony Sequeira wrote:
>Hi
>
>In my code I have an HTTP fetcher that puts received content into a
>file for a set of URLs. The content includes all the headers received
>(the complete stream of data for a request from the server).
>
>At a later time I parse it using the HTML parser. This part of the
>code seems to work, in the sense that I am able to extract links from
>those pages using the parser.
>
>My question is: how do I get hold of the HTTP status code,
>specifically when it is a 302, and then get hold of the new
>location?
>
>Thanks,
>-Antony
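Derrick's answer applies when the fetch goes through htmlparser's ConnectionManager. The monitor's postConnect() receives the HttpURLConnection, and the standard getResponseCode() and getHeaderField("Location") calls on it give the code and the new location. Antony's fetcher, however, has already written the raw response to a file, so the same two facts can also be recovered from the saved bytes. Below is a minimal sketch, assuming each file holds one complete HTTP/1.x response that starts with its status line. SavedResponse is a hypothetical helper, not part of htmlparser.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: status code and Location header of a saved raw HTTP/1.x response. */
public class SavedResponse {
    public final int status;
    public final String location; // null when there is no Location header

    private SavedResponse(int status, String location) {
        this.status = status;
        this.location = location;
    }

    /** Reads only the header block; the reader buffers ahead, so open a fresh stream for the body. */
    public static SavedResponse read(InputStream raw) throws IOException {
        // Header bytes are decoded as ISO-8859-1 (HTTP/1.x headers are normally ASCII);
        // readLine() accepts both CRLF and bare LF line endings.
        BufferedReader in = new BufferedReader(new InputStreamReader(raw, StandardCharsets.ISO_8859_1));
        String statusLine = in.readLine(); // e.g. "HTTP/1.1 302 Found"
        if (statusLine == null || !statusLine.startsWith("HTTP/")) {
            throw new IOException("not an HTTP status line: " + statusLine);
        }
        String[] parts = statusLine.split(" ", 3);
        if (parts.length < 2) {
            throw new IOException("no status code in: " + statusLine);
        }
        int status = Integer.parseInt(parts[1].trim());
        String location = null;
        // Headers end at the first empty line; everything after it is the body.
        for (String line; (line = in.readLine()) != null && !line.isEmpty(); ) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Location")) {
                location = line.substring(colon + 1).trim();
            }
        }
        return new SavedResponse(status, location);
    }

    /** Convenience for a response already held in memory as text. */
    public static SavedResponse parse(String rawText) {
        try {
            return read(new ByteArrayInputStream(rawText.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SavedResponse r = parse("HTTP/1.1 302 Found\r\n"
                + "Content-Type: text/html\r\n"
                + "Location: http://www.example.com/new.html\r\n"
                + "\r\n"
                + "<html>moved</html>");
        System.out.println(r.status + " -> " + r.location); // 302 -> http://www.example.com/new.html
    }
}
```

A relative Location value would still need to be resolved against the URL that was originally fetched before requesting it.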