Heritrix: Internet Archive Web Crawler

Tracker: Bugs

8 RIS#readFullyOrUntil IOE/timeout - ID: 1177462
Last Update: Comment added ( karl-ia )

Igor, crawling through the proxy on dev02:8080, is failing
to get this page:

http://www.odpm.gov.uk/stellent/groups/odpm_control/documents/contentservertemplate/odpm_index.hcst?n=1537&l=1

It looks like httpclient is not shutting down the
connection. httpclient is sending a 'Proxy-Connection:
keep-alive' header, which seems to be related to the
failure.

What we were seeing was that httpclient read all of the
content but never got the -1 (end of stream) at the end of
the doc, so the read ran on until the timeout.

Needs fixing so Igor can use the proxy for testing.


Submitted: Michael Stack ( stack-sf ) - 2005-04-06 00:44
Priority: 8
Status: Closed
Resolution: Fixed
Assigned: Nobody/Anonymous
Category: None
Group: None
Visibility: Public


Comments ( 3 )

Date: 2007-03-14 00:22
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-387 -- please add further
comments at that location.


Date: 2005-04-07 21:45
Sender: stack-sf (Project Admin)

Did some more work on this issue:

Another detail on '[ 1177462 ] RIS#readFullyOrUntil IOE/timeout',
prompted by mail on this list this morning asking about Heritrix
and proxies.
* src/java/org/archive/httpclient/HttpRecorderMethod.java
  (handleAddProxyConnectionHeader): Add new method to handle
  addProxyConnectionHeader. Changes the 'keep-alive' directive to
  'close' until we have support for keep-alive.
* src/java/org/archive/httpclient/HttpRecorderGetMethod.java
* src/java/org/archive/httpclient/HttpRecorderPostMethod.java
  Override addProxyConnectionHeader. Pass handling to
  HttpRecorderMethod#handleAddProxyConnectionHeader.
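
The header change described in this commit can be illustrated with a minimal, self-contained sketch. It does not use the httpclient API and is not the Heritrix code: the class ProxyConnectionHeaderSketch and the method forceProxyConnectionClose are hypothetical names, and request headers are modelled as a plain map purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the actual HttpRecorderMethod code.
final class ProxyConnectionHeaderSketch {
    static final String PROXY_CONNECTION = "Proxy-Connection";

    /**
     * Returns a copy of the given headers in which Proxy-Connection is
     * always "close", replacing any "keep-alive" value. The input map is
     * left unmodified.
     */
    static Map<String, String> forceProxyConnectionClose(Map<String, String> headers) {
        Map<String, String> result = new LinkedHashMap<>();
        boolean seen = false;
        for (Map.Entry<String, String> e : headers.entrySet()) {
            // Header names are case-insensitive, so match without regard to case.
            if (e.getKey().equalsIgnoreCase(PROXY_CONNECTION)) {
                result.put(PROXY_CONNECTION, "close");
                seen = true;
            } else {
                result.put(e.getKey(), e.getValue());
            }
        }
        if (!seen) {
            // No directive present: add an explicit request to close.
            result.put(PROXY_CONNECTION, "close");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Host", "example.org");
        headers.put("Proxy-Connection", "keep-alive");
        System.out.println(forceProxyConnectionClose(headers));
    }
}
```

Asking the proxy to close the connection means the end of the response is signalled by end of stream, which is what the reader was waiting for in the original report.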



Date: 2005-04-06 19:23
Sender: stack-sf (Project Admin)

The failing URL is actually
http://www.odpm.gov.uk/stellent/groups/odpm_control/documents/homepage/odpm_home_page.hcsp

Here is a fix. Here is the commit:

Fix for '[ 1177462 ] RIS#readFullyOrUntil IOE/timeout'
* src/java/org/archive/crawler/fetcher/FetchHTTP.java
  Added more help text to timeout attributes.
* src/java/org/archive/io/RecordingInputStream.java
  Implement FIXME. When the server is doing keep-alive, even though
  we asked it not to, we were hanging on the socket timeout and
  then, when it expired, throwing an IOException that killed the
  fetch. Do as the FIXME suggests: on socket timeout, rather than
  throwing an IOE, fall into the overall timeout check.
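
The timeout handling described in this commit can be sketched as follows. This is a simplified illustration, not the RecordingInputStream code: the class ReadUntilTimeoutSketch, the helper keptAlive, and the two-argument readFullyOrUntil signature are assumptions made for the example. The helper simulates a server that sends the whole body but keeps the connection open, so every later read ends in a socket timeout instead of -1.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketTimeoutException;

// Hypothetical illustration only; not the actual RecordingInputStream code.
final class ReadUntilTimeoutSketch {

    /**
     * Reads until end of stream or until overallTimeoutMs has elapsed.
     * A per-read socket timeout is not treated as fatal: it falls through
     * to the overall timeout check. Returns the number of bytes read.
     */
    static long readFullyOrUntil(InputStream in, long overallTimeoutMs) throws IOException {
        long start = System.currentTimeMillis();
        byte[] buf = new byte[4096];
        long total = 0;
        while (true) {
            try {
                int n = in.read(buf);
                if (n == -1) {
                    break; // clean end of stream
                }
                total += n;
            } catch (SocketTimeoutException e) {
                // Socket read timed out; keep what was read so far and let
                // the overall timeout check below decide whether to stop.
            }
            if (System.currentTimeMillis() - start >= overallTimeoutMs) {
                break; // overall timeout reached
            }
        }
        return total;
    }

    /** Test stream: yields the payload, then times out on every further read. */
    static InputStream keptAlive(byte[] payload, long socketTimeoutMs) {
        ByteArrayInputStream data = new ByteArrayInputStream(payload);
        return new InputStream() {
            @Override
            public int read() throws IOException {
                int b = data.read();
                return b != -1 ? b : stall();
            }

            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                int n = data.read(b, off, len);
                return n != -1 ? n : stall();
            }

            private int stall() throws IOException {
                try {
                    Thread.sleep(socketTimeoutMs); // simulate blocking on the socket
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
                throw new SocketTimeoutException("Read timed out");
            }
        };
    }

    public static void main(String[] args) throws IOException {
        long n = readFullyOrUntil(keptAlive("hello".getBytes(), 20), 100);
        System.out.println("bytes read: " + n);
    }
}
```

With this shape, a connection that is held open costs at most the overall timeout, and the content already read is kept rather than discarded by an exception.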



Attached File

No Files Currently Attached

Changes ( 3 )

Field          Old Value  Date              By
status_id      Open       2005-04-06 19:23  stack-sf
resolution_id  None       2005-04-06 19:23  stack-sf
close_date     -          2005-04-06 19:23  stack-sf