Re: [Htmlparser-user] Cookie handling


Ah, I see what you mean. Looking at the last URL, it seems even more redirects
take place before it arrives at the final URL; I've counted around 7 so far.
Regardless, I'm still at a loss as to why the redirects aren't being processed
and the cookies handled correctly in the end.
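
To make the failure mode concrete, here is a minimal Python sketch of following a redirect chain by hand while replaying cookies on every hop. It is not HTMLParser code and not the real Wiley behaviour: `follow_redirects`, `fake_site`, the `session` cookie and the `example.invalid` host are all hypothetical names made up for illustration, and cookie domain/path scoping is ignored entirely.

```python
from urllib.parse import urljoin

REDIRECTS = {301, 302, 303, 307, 308}


def follow_redirects(fetch, url, max_hops=10):
    """Follow 3xx responses manually, sending stored cookies on each hop.

    `fetch(url, cookies)` is a hypothetical callable that returns
    (status, headers), where headers is a list of (name, value) pairs.
    """
    cookies = {}
    chain = [url]
    for _hop in range(max_hops):
        status, headers = fetch(url, dict(cookies))
        location = None
        for name, value in headers:
            lname = name.lower()
            if lname == "set-cookie":
                # Keep only the "name=value" part; attributes such as
                # Path or Expires are ignored in this sketch.
                pair = value.split(";", 1)[0]
                key, _sep, val = pair.partition("=")
                cookies[key.strip()] = val.strip()
            elif lname == "location":
                location = value
        if status not in REDIRECTS or location is None:
            return status, chain, cookies
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        chain.append(url)
    raise RuntimeError("too many redirects")


def fake_site(url, cookies):
    """Stand-in for a cookie-checking site (assumed behaviour, not Wiley's)."""
    if url.endswith("/start"):
        return 302, [("Location", "/abstract"),
                     ("Set-Cookie", "session=abc; Path=/")]
    if url.endswith("/abstract"):
        if cookies.get("session") == "abc":
            return 200, []
        return 302, [("Location", "/cookie_setting_error.html")]
    return 200, []
```

Calling `follow_redirects(fake_site, "http://example.invalid/start")` ends on `/abstract`, whereas wrapping the fetcher so it drops cookies (`lambda u, c: fake_site(u, {})`) ends on `/cookie_setting_error.html`. That is the same symptom as the curl output quoted below, which is why I suspect the cookie set on an earlier hop isn't being sent on a later one.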

Gavin.

On Wed, Jan 31, 2007 at 06:00:17PM -0800, Derrick Oswald wrote:
> 
> That should be handled OK.
> But it doesn't look like what you want from that URL.
> 
> ----- Original Message ----
> From: Gavin Gilmour <ga...@br...>
> To: htmlparser user list <htm...@li...>
> Sent: Tuesday, January 30, 2007 8:49:25 AM
> Subject: Re: [Htmlparser-user] Cookie handling
> 
> Wow, good point. The URL seems to load fine here in my browser but on further
> inspection:
> 
> sokar:~/junk% curl http://www3.interscience.wiley.com/cgi-bin/abstract/68504762/ABSTRACT\?CRETRY\=1\&SRETRY\=0
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <html><head>
> <title>302 Found</title>
> </head><body>
> <h1>Found</h1>
> <p>The document has moved <a
> href="http://www3.interscience.wiley.com/cookie_setting_error.html">here</a>.</p>
> </body></html>
> 
> Seems to have already decided it's dud, weird.
> 
> After a bit of investigating, the full story is a bit worse than I thought and
> involves multiple redirects. The first link I need comes back from this service:
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=10629107&retmode=ref&cmd=prlinks
> - which just answers with a 302 (or similar) redirect.
> 
> Fair enough, so:
> 
> ---
> sokar:~/junk% curl 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=10629107&retmode=ref&cmd=prlinks'
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <html><head>
> <title>302 Found</title>
> </head><body>
> <h1>Found</h1>
> <p>The document has moved <a
> href="http://dx.doi.org/10.1002/(SICI)1097-4644(1999)75:32+&lt;84::AID-JCB11&gt;3.0.CO;2-M">here</a>.</p>
> </body></html>
> ---
> 
> This gives 'http://dx...', which is what the parser tries next, I'd imagine. So then:
> 
> sokar:~/junk% curl 'http://dx.doi.org/10.1002/(SICI)1097-4644(1999)75:32+<84::AID-JCB11>3.0.CO;2-M'
> <HTML><HEAD><TITLE>Handle Redirect</TITLE></HEAD>
> <BODY><A
> HREF="http://doi.wiley.com/10.1002/%28SICI%291097-4644%281999%2975%3A32%2B%3C84%3A%3AAID-JCB11%3E3.0.CO%3B2-M">http://doi.wiley.com/10.1002/%28SICI%291097-4644%281999%2975%3A32%2B%3C84%3A%3AAID-JCB11%3E3.0.CO%3B2-M</A></BODY></HTML>
> 
> Looking at this URL, it seems to be the 'final one' which is leading to the
> (desired) destination in a browser. (Does that output even look like something
> the parser would handle though?)
> 
> What a mess :(
> 
> Gavin.
> 
> P.S. Sorry about the horribly formatted mail due to the unsightly urls
> involved.
> 

> _______________________________________________
> Htmlparser-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-user