Re: [Htmlparser-user] Cookie handling
From: Derrick O. <der...@ro...> - 2007-02-01 02:00:37
That should be handled OK.
But it doesn't look like what you want from that URL.

----- Original Message ----
From: Gavin Gilmour <gavin@brokentrain.net>
To: htmlparser user list <htm...@li...urceforge.net>
Sent: Tuesday, January 30, 2007 8:49:25 AM
Subject: Re: [Htmlparser-user] Cookie handling

Wow, good point. The URL seems to load fine here in my browser but on further
inspection:

sokar:~/junk% curl http://www3.interscience.wiley.com/cgi-bin/abstract/68504762/ABSTRACT\?CRETRY\=1\&SRETRY\=0
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a
href="http://www3.interscience.wiley.com/cookie_setting_error.html">here</a>.</p>
</body></html>

Seems to have already decided it's dud, weird.

After a bit of investigating, the full story is a bit worse than I thought and
is involving multiple redirects. The first link I need comes back from this service:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=10629107&retmode=ref&cmd=prlinks
- which just offers up a 302 or whatever and then issues a redirect.

Fair enough, so:

---
sokar:~/junk% curl 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=10629107&retmode=ref&cmd=prlinks'
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a
href="http://dx.doi.org/10.1002/(SICI)1097-4644(1999)75:32+<84::AID-JCB11>3.0.CO;2-M">here</a>.</p>
</body></html>
---

Is giving 'http://dx...' which is what the parser is trying next I'd imagine.
So then:

sokar:~/junk% curl 'http://dx.doi.org/10.1002/(SICI)1097-4644(1999)75:32+<84::AID-JCB11>3.0.CO;2-M'
<HTML><HEAD><TITLE>Handle Redirect</TITLE></HEAD>
<BODY><A
HREF="http://doi.wiley.com/10.1002/%28SICI%291097-4644%281999%2975%3A32%2B%3C84%3A%3AAID-JCB11%3E3.0.CO%3B2-M">http://doi.wiley.com/10.1002/%28SICI%291097-4644%281999%2975%3A32%2B%3C84%3A%3AAID-JCB11%3E3.0.CO%3B2-M</A></BODY></HTML>

Looking at this URL, it seems to be the 'final one' which is leading to the
(desired) destination in a browser. (Does that output even look like something
the parser would handle though?)

What a mess :(

Gavin.

P.S. Sorry about the horribly formatted mail due to the unsightly urls
involved.
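
For anyone retracing the redirect chain above without curl, here is a minimal Java sketch. It uses only the JDK, not HTML Parser's own API, and assumes an in-memory java.net.CookieManager is an acceptable cookie store. The names RedirectTrace, firstHref and trace are made up for illustration. trace() requests each URL with automatic redirects switched off, prints the status, and follows the Location header itself, so a cookie set on one hop is sent on the next. firstHref() pulls the link out of a "document has moved" body like the ones quoted above, which may help with a page such as the dx.doi.org one if it turns out to arrive without a usable Location header.

```java
import java.io.IOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HTML Parser.
final class RedirectTrace {

    // First href="..." attribute, in any letter case (matches both href= and HREF=).
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the first double-quoted href value in the HTML, or null if none. */
    static String firstHref(String html) {
        Matcher m = HREF.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    /**
     * Follows up to maxHops redirects by hand, printing each hop,
     * and returns the first URL that does not redirect.
     */
    static String trace(String start, int maxHops) throws IOException {
        // Assumption: a JVM-wide in-memory cookie store is fine for this test.
        // HttpURLConnection consults the default CookieHandler on every request.
        if (CookieHandler.getDefault() == null) {
            CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
        }
        String url = start;
        for (int hop = 0; hop < maxHops; hop++) {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setInstanceFollowRedirects(false); // we want to see every hop
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            System.out.println(status + " " + url);
            conn.disconnect();
            if (status < 300 || status >= 400 || location == null) {
                return url; // not a redirect: this is the final URL
            }
            // Location may be relative, so resolve it against the current URL.
            url = new URL(new URL(url), location).toString();
        }
        throw new IOException("gave up after " + maxHops + " redirects");
    }
}
```

Calling RedirectTrace.trace(...) with the eutils URL and a limit of, say, 10 hops should print one line per hop. If the last line is still cookie_setting_error.html even with the cookie store installed, the problem lies somewhere other than cookies being dropped between hops.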