Heritrix: Internet Archive Web Crawler

Tracker: Bugs

crawl.log entries do not reflect 'no space left' error - ID: 1395637
Last Update: Comment added ( karl-ia )

When ARCWriter hits 'no space left on device' error,
crawl.log should have entries that reflect this error.
Maybe a new error code or annotation should be added.

i.





Submitted: Igor Ranitovic ( ia_igor ) - 2006-01-02 23:24
Priority: 7
Status: Closed
Resolution: None
Assigned: Karl Thiessen
Category: General
Group: 1.8.0
Visibility: Public


Comments ( 4 )

Date: 2007-03-14 01:04
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-532 -- please add further
comments at that location.


Date: 2006-05-03 21:03
Sender: karl-ia

Logged In: YES
user_id=1269624

Closing; separate RFE for better disk-full Heritrix behaviour.


Date: 2006-01-17 01:58
Sender: gojomo (Project Admin)

Logged In: YES
user_id=144912

ARCWriterProcessor already catches and logs (as a
localizedError) IOExceptions during writing. Made a 1-line
change which ensures all localizedErrors also cause an
annotation to be added to the current CrawlURI. Format is:

le:[short-name-of-throwable]@[current-processor-name]

This does not specifically mark the crawl.log line with a
'disk full' note -- the annotation may look the same for other
IOExceptions -- but I suspect even that generic markup will
be sufficient for the operator's needs.
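
To illustrate the annotation format described above, here is a minimal sketch of how such a string could be assembled. The class and method names are hypothetical and are not taken from the Heritrix source; it also assumes that "short-name-of-throwable" means the simple class name of the exception, and that the processor name is passed in as a plain string.

```java
import java.io.IOException;

public class LocalizedErrorAnnotation {

    // Hypothetical helper (not Heritrix's actual API): builds an annotation
    // of the form le:[short-name-of-throwable]@[current-processor-name].
    // Assumption: the "short name" is the throwable's simple class name.
    public static String annotationFor(Throwable t, String processorName) {
        return "le:" + t.getClass().getSimpleName() + "@" + processorName;
    }

    public static void main(String[] args) {
        // A disk-full failure surfaces as an ordinary IOException, so the
        // annotation cannot distinguish it from other I/O errors.
        Throwable diskFull = new IOException("No space left on device");
        // "ARCWriterProcessor" is used here as an illustrative processor name.
        System.out.println(annotationFor(diskFull, "ARCWriterProcessor"));
        // prints "le:IOException@ARCWriterProcessor"
    }
}
```

As the comment notes, an operator reading crawl.log would see the same annotation for any IOException raised in that processor, which is the generic markup the fix settles for.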

Commit comment:
Fix for [ 1395637 ] crawl.log entries do not reflect 'no
space left' error
* CrawlURI.java
add log annotation for all localizedErrors, of form
"le:[throwable]@[processor]"

We should definitely have a 'fail-gracefully-when-disk-full'
test case (or several). Possible approach: create a temp ram
disk, have crawling write ARCs there, ensure quick failure
(after at least a few successful writes) is reported properly.
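
One possible shape for such a test, sketched under loud assumptions: instead of a temp ram disk, this swaps in an in-memory stream with a fixed byte budget that throws an IOException once the budget is used up. All names here (DiskFullSketch, LimitedOutputStream, writeUntilFailure) are hypothetical and nothing below touches real Heritrix classes; it only shows the "a few successful writes, then a reported failure" pattern.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class DiskFullSketch {

    /** Accepts at most 'capacity' bytes, then fails the way a full disk would. */
    public static class LimitedOutputStream extends OutputStream {
        private final OutputStream out;
        private long remaining;

        public LimitedOutputStream(OutputStream out, long capacity) {
            this.out = out;
            this.remaining = capacity;
        }

        @Override
        public void write(int b) throws IOException {
            if (remaining < 1) {
                throw new IOException("No space left on device");
            }
            out.write(b);
            remaining--;
        }
    }

    /** Writes records until one fails; returns the number of fully written records. */
    public static int writeUntilFailure(OutputStream out, byte[] record, int maxRecords) {
        int ok = 0;
        for (int i = 0; i < maxRecords; i++) {
            try {
                out.write(record);
                ok++;
            } catch (IOException e) {
                // A real test would assert on the crawl.log annotation here.
                System.out.println("write " + (i + 1) + " failed: " + e.getMessage());
                break;
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        // 35-byte budget with 10-byte records: three writes fit, the fourth fails.
        OutputStream disk = new LimitedOutputStream(new ByteArrayOutputStream(), 35);
        int ok = writeUntilFailure(disk, new byte[10], 10);
        System.out.println("successful writes: " + ok);
    }
}
```

An in-memory stand-in keeps the test fast and portable, but it does not exercise filesystem-level behaviour; a ram disk, as suggested above, would cover that.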

Assigning to Karl for implementation of such a test, time
permitting.


Date: 2006-01-17 01:07
Sender: gojomo (Project Admin)

Logged In: YES
user_id=144912

Annotation seems better: the code is primarily meant as a
'fetchStatusCode' (though in the case of major unhandled
errors, we've overloaded the code with meanings drawn from
non-fetch conditions).



Changes ( 6 )

Field              Old Value  Date              By
status_id          Open       2006-05-03 21:03  karl-ia
close_date         -          2006-05-03 21:03  karl-ia
artifact_group_id  None       2006-03-17 23:13  gojomo
assigned_to        gojomo     2006-01-17 01:58  gojomo
priority           6          2006-01-17 01:07  gojomo
assigned_to        nobody     2006-01-17 01:07  gojomo