Heritrix: Internet Archive Web Crawler

Tracker: Bugs

5 ARCReader crashes if zero-length gzip record - ID: 1045736
Last Update: Comment added ( karl-ia )

Reported by two people on the list. This means we are
sometimes writing zero-length GZIP records.
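
For readers unfamiliar with the failure mode, here is a minimal sketch of how a zero-length GZIP record can come about with the standard `java.util.zip` classes: a `GZIPOutputStream` that is opened and closed without any writes still emits a header and a trailer, so the bytes on disk look like a record but decompress to nothing. The class and method names below are hypothetical and are not taken from Heritrix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class EmptyGzipMember {
    /** Returns the bytes of a gzip member that was opened and closed with no payload. */
    public static byte[] emptyMember() {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            GZIPOutputStream gz = new GZIPOutputStream(sink); // gzip header is written here
            gz.close();                                       // trailer is written here
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Counts how many bytes a gzip member decompresses to. */
    public static int payloadLength(byte[] member) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(member))) {
            int total = 0;
            byte[] buf = new byte[1024];
            for (int n; (n = in.read(buf)) != -1; ) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] member = emptyMember();
        System.out.println("bytes on disk: " + member.length);
        System.out.println("payload bytes: " + payloadLength(member));
    }
}
```

A reader that expects every record to start with a metadata line will hit end-of-stream immediately on such a member, which is consistent with the crash reported here.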


Submitted: Michael Stack ( stack-sf ) - 2004-10-12 21:41
Priority: 5
Status: Closed
Resolution: Fixed
Assigned: Michael Stack
Category: None
Group: None
Visibility: Public


Comments ( 3 )

Date: 2007-03-14 00:16
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-258 -- please add further
comments at that location.


Date: 2004-10-14 01:18
Sender: stack-sf (Project Admin)

Actually close.


Date: 2004-10-12 22:11
Sender: stack-sf (Project Admin)

Fixed. Here is the commit:

Fixes for [ 1045736 ] ARCReader crashes if zero-length gzip record.
Have ARCReader issue a warning and move to the next record if an empty gzip
record is found, rather than crash. Also, in ARCWriter, move the checks to
before the getting of a GZIP writer; if we failed consistency checks, we were
leaving an empty GZIP record in an ARC.
* src/java/org/archive/io/arc/ARCReader.java
  Added new RecoverableIOException. If caught in next, and hasNext is true
  (hasNext fixes up the underlying stream so it points at the head of the next
  gzip record), we'll return this next record, else null.
  Fixed our not noticing metalines that are too short.
  Converted 'Hit EOF before..' to a RecoverableIOException.
  Added more detailed output on metaline being too big.
* src/java/org/archive/io/arc/ARCWriter.java
  Do checks on metadata line before calling preWriteRecordTasks.
  This latter has side effects -- it writes out the gzip header, meaning
  there'll be empty records if the record metadata is bad.
  Formatting.
  (getMetaLine): Added.
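
The two halves of the fix described in this commit can be sketched with a toy model. This is not the Heritrix code: only the names RecoverableIOException, hasNext and next come from the commit message, while `ToyReader`, `ToyWriter`, `openRecord` and the use of strings as stand-ins for gzip records are assumptions made for illustration. The reader skips a bad record with a warning when another record is available; the writer validates the metadata line before doing anything with side effects.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SkipEmptyRecords {
    /** Stand-in for the RecoverableIOException named in the commit. */
    static class RecoverableIOException extends IOException {
        RecoverableIOException(String message) { super(message); }
    }

    /** Toy reader: each list element stands in for one gzip record in an ARC. */
    static class ToyReader implements Iterator<String> {
        private final List<String> records;
        private int position = 0;
        final List<String> warnings = new ArrayList<>();

        ToyReader(List<String> records) { this.records = records; }

        @Override public boolean hasNext() { return position < records.size(); }

        /** Reads one record; an empty one is reported as a recoverable error. */
        private String parse() throws RecoverableIOException {
            int index = position++;
            String raw = records.get(index);
            if (raw.isEmpty()) {
                throw new RecoverableIOException("empty record at index " + index);
            }
            return raw;
        }

        /** Returns the next readable record, or null if none remains. */
        @Override public String next() {
            while (hasNext()) {
                try {
                    return parse();
                } catch (RecoverableIOException e) {
                    warnings.add(e.getMessage()); // warn and move on instead of crashing
                }
            }
            return null;
        }
    }

    /** Toy writer: validates the metadata line before opening a record. */
    static class ToyWriter {
        final List<String> out = new ArrayList<>();

        /** Opening a record has a visible side effect, like writing a gzip header. */
        private int openRecord() {
            out.add("");
            return out.size() - 1;
        }

        void write(String metaLine, String body) {
            // Check first: a failure here leaves 'out' untouched.
            if (metaLine == null || metaLine.trim().isEmpty()) {
                throw new IllegalArgumentException("bad metadata line");
            }
            int slot = openRecord();
            out.set(slot, metaLine + "\n" + body);
        }
    }

    public static void main(String[] args) {
        ToyReader reader = new ToyReader(Arrays.asList("first", "", "second"));
        for (String r = reader.next(); r != null; r = reader.next()) {
            System.out.println("record: " + r);
        }
        System.out.println("warnings: " + reader.warnings);
    }
}
```

The sketch loops in `next` so that several bad records in a row are all skipped; the commit message describes a single retry guarded by hasNext, so the real behaviour may differ. If the check in `write` ran after `openRecord`, a rejected metadata line would leave an empty entry behind, which is the situation the ARCWriter change avoids.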



Attached File

No Files Currently Attached

Changes ( 4 )

Field          Old Value  Date              By
assigned_to    nobody     2004-10-14 01:37  stack-sf
status_id      Open       2004-10-14 01:18  stack-sf
resolution_id  None       2004-10-14 01:18  stack-sf
close_date     -          2004-10-14 01:18  stack-sf