Re: [Htmlparser-user] Cookie handling
From: Gavin G. <ga...@br...> - 2007-02-01 10:49:44
Ah, I see what you mean. Looking at the last URL, it seems even more redirects
are taking place before it arrives at the final URL - I've counted around 7 so
far.

Regardless, I'm still at a loss as to why the redirects aren't being processed
and the cookie stuff handled in the end.

Gavin.

On Wed, Jan 31, 2007 at 06:00:17PM -0800, Derrick Oswald wrote:
>
> That should be handled OK.
> But it doesn't look like what you want from that URL.
>
> ----- Original Message ----
> From: Gavin Gilmour <ga...@br...>
> To: htmlparser user list <htm...@li...>
> Sent: Tuesday, January 30, 2007 8:49:25 AM
> Subject: Re: [Htmlparser-user] Cookie handling
>
> Wow, good point. The URL seems to load fine here in my browser but on further
> inspection:
>
> sokar:~/junk% curl http://www3.interscience.wiley.com/cgi-bin/abstract/68504762/ABSTRACT\?CRETRY\=1\&SRETRY\=0
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <html><head>
> <title>302 Found</title>
> </head><body>
> <h1>Found</h1>
> <p>The document has moved <a
> href="http://www3.interscience.wiley.com/cookie_setting_error.html">here</a>.</p>
> </body></html>
>
> Seems to have already decided it's dud, weird.
>
> After a bit of investigating, the full story is a bit worse than I thought and
> is involving multiple redirects. The first link I need comes back from this service:
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=10629107&retmode=ref&cmd=prlinks
> - which just offers up a 302 or whatever and then issues a redirect.
>
> Fair enough, so:
>
> ---
> sokar:~/junk% curl 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=10629107&retmode=ref&cmd=prlinks'
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <html><head>
> <title>302 Found</title>
> </head><body>
> <h1>Found</h1>
> <p>The document has moved <a
> href="http://dx.doi.org/10.1002/(SICI)1097-4644(1999)75:32+<84::AID-JCB11>3.0.CO;2-M">here</a>.</p>
> </body></html>
> ---
>
> Is giving 'http://dx...' which is what the parser is trying next I'd imagine. So then:
>
> sokar:~/junk% curl 'http://dx.doi.org/10.1002/(SICI)1097-4644(1999)75:32+<84::AID-JCB11>3.0.CO;2-M'
> <HTML><HEAD><TITLE>Handle Redirect</TITLE></HEAD>
> <BODY><A
> HREF="http://doi.wiley.com/10.1002/%28SICI%291097-4644%281999%2975%3A32%2B%3C84%3A%3AAID-JCB11%3E3.0.CO%3B2-M">http://doi.wiley.com/10.1002/%28SICI%291097-4644%281999%2975%3A32%2B%3C84%3A%3AAID-JCB11%3E3.0.CO%3B2-M</A></BODY></HTML>
>
> Looking at this URL, it seems to be the 'final one' which is leading to the
> (desired) destination in a browser. (Does that output even look like something
> the parser would handle though?)
>
> What a mess :(
>
> Gavin.
>
> P.S. Sorry about the horribly formatted mail due to the unsightly urls
> involved.
>
> _______________________________________________
> Htmlparser-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-user
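
One way to see where in a chain like this the cookies go missing is to walk the redirects one hop at a time rather than letting the client follow them silently. The following is a minimal sketch of that idea in plain Java, using only java.net.HttpURLConnection and not HTMLParser's own connection handling. The class name RedirectTracer and all of its methods are hypothetical, and the single shared cookie map is a deliberate simplification: it ignores Domain, Path and expiry attributes, which a real client has to honour.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not part of HTMLParser.
class RedirectTracer {

    /** Resolve a Location header (absolute or relative) against the current URL. */
    static String resolve(String current, String location) {
        try {
            // java.net.URL is lenient about characters such as '<' and '>' in DOIs.
            return new URL(new URL(current), location).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad redirect target: " + location, e);
        }
    }

    /** Keep only the name=value part of each Set-Cookie header (attributes ignored). */
    static void storeCookies(Map<String, String> jar, List<String> setCookieHeaders) {
        if (setCookieHeaders == null) {
            return;
        }
        for (String header : setCookieHeaders) {
            String pair = header.split(";", 2)[0].trim();
            int eq = pair.indexOf('=');
            if (eq > 0) {
                jar.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
    }

    /** Build the value of the Cookie request header from everything seen so far. */
    static String cookieHeader(Map<String, String> jar) {
        StringJoiner joiner = new StringJoiner("; ");
        jar.forEach((name, value) -> joiner.add(name + "=" + value));
        return joiner.toString();
    }

    /** Follow 3xx responses by hand, printing each hop, and return the last URL reached. */
    static String trace(String start, int maxHops) throws IOException {
        Map<String, String> jar = new LinkedHashMap<>(); // simplification: one jar for all hosts
        String current = start;
        for (int hop = 0; hop < maxHops; hop++) {
            HttpURLConnection conn = (HttpURLConnection) new URL(current).openConnection();
            conn.setInstanceFollowRedirects(false); // we want to see every hop ourselves
            if (!jar.isEmpty()) {
                conn.setRequestProperty("Cookie", cookieHeader(jar));
            }
            int status = conn.getResponseCode();
            for (Map.Entry<String, List<String>> e : conn.getHeaderFields().entrySet()) {
                if ("Set-Cookie".equalsIgnoreCase(e.getKey())) {
                    storeCookies(jar, e.getValue());
                }
            }
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            System.out.println(status + " " + current + " cookies=" + jar.keySet());
            if (status < 300 || status >= 400 || location == null) {
                return current; // not a redirect: this is where the chain ends
            }
            current = resolve(current, location);
        }
        throw new IOException("gave up after " + maxHops + " redirects");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RedirectTracer.java <url>");
            return;
        }
        System.out.println("final: " + trace(args[0], 10));
    }
}
```

Run against the elink.fcgi URL above, this would print one line per hop with the status code and the cookie names collected so far, which should show at which hop the server expects a cookie that has not been sent back. A limit of 10 hops is an arbitrary choice that leaves room for the seven or so redirects counted in this thread.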