Heritrix: Internet Archive Web Crawler

Tracker: Feature Requests

5 ARCWriter makes FAT gzip header - ID: 982909
Last Update: Comment added ( karl-ia )

ARCWriter lets Java write the gzip header, and Java sets the
header's OS byte to FAT (MS-DOS, OS/2, NT/Win32). The Alexa
toolset always writes an OS of UNIX ('3'). When ARC records are
compared, this one byte is all that differs between records
written by ARCWriter and records written by the Alexa toolset,
so we should change ARCWriter to always say UNIX.
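
To see which OS value a given JVM writes, the header byte can be read back directly. The following is a minimal sketch, not Heritrix code: the class name GzipOsByteProbe and its methods are hypothetical, and the only assumption about the format is the RFC 1952 layout, where the OS byte sits at zero-based offset 9 of the fixed 10-byte header. The value printed depends on the Java version, so no expected output is shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not part of Heritrix.
class GzipOsByteProbe {
    // Zero-based offset of the OS byte in the fixed 10-byte gzip header (RFC 1952).
    static final int OS_OFFSET = 9;

    // Gzip the payload in memory and return the complete gzip member.
    static byte[] gzip(byte[] payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(payload);
        }
        return bytes.toByteArray();
    }

    // Return the OS byte as an unsigned value (0 = FAT, 3 = Unix, 255 = unknown).
    static int osByte(byte[] gzipMember) {
        return gzipMember[OS_OFFSET] & 0xff;
    }

    public static void main(String[] args) throws IOException {
        byte[] member = gzip("example record".getBytes(StandardCharsets.US_ASCII));
        System.out.println("OS byte written by this JVM: " + osByte(member));
    }
}
```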

Igor spotted this one.


Submitted: Michael Stack ( stack-sf ) - 2004-06-30 18:09
Priority: 5
Status: Closed
Resolution: None
Assigned: Michael Stack
Category: i/o
Group: None
Visibility: Public


Comments ( 2 )

Date: 2007-03-14 01:31
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-790 -- please add further
comments at that location.


Date: 2004-06-30 18:28
Sender: stack-sf (Project Admin)

Fixed.

Index: src/java/org/archive/io/arc/ARCWriter.java
===================================================================
RCS file: /cvsroot/archive-crawler/ArchiveOpenCrawler/src/java/org/archive/io/arc/ARCWriter.java,v
retrieving revision 1.13
diff -u -r1.13 ARCWriter.java
--- src/java/org/archive/io/arc/ARCWriter.java 24 Jun 2004 00:32:01 -0000 1.13
+++ src/java/org/archive/io/arc/ARCWriter.java 30 Jun 2004 18:25:25 -0000
@@ -437,9 +437,9 @@
 // Set the GZIP FLG header to '4' which says that the GZIP header
 // has extra fields. Then insert the alex {'L', 'X', '0', '0', '0,
 // '0'} 'extra' field. The IA GZIP header will also set byte
-// 9, the OS byte, to 3 (Unix). We won't do that since its java
-// that is doing the gzipping.
+// 9 (zero-based), the OS byte, to 3 (Unix). We'll do the same.
 gzippedMetaData[3] = 4;
+gzippedMetaData[9] = 3;
 byte [] assemblyBuffer = new byte[gzippedMetaData.length +
 ARC_GZIP_EXTRA_FIELD.length];
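
The header patching in the diff above can be sketched outside ARCWriter roughly as follows. This is an illustration under stated assumptions, not the Heritrix implementation: the class GzipHeaderPatch, its methods, and the contents of EXTRA_FIELD are hypothetical. In particular, the extra field bytes here are an assumed RFC 1952 encoding (two-byte XLEN, subfield ID 'L' 'X', two-byte length, four zero data bytes) and may not match the real ARC_GZIP_EXTRA_FIELD.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not part of Heritrix.
class GzipHeaderPatch {
    static final int HEADER_LENGTH = 10; // fixed gzip header size (RFC 1952)
    static final int FLG_OFFSET = 3;     // flags byte
    static final int OS_OFFSET = 9;      // operating system byte
    static final byte FEXTRA = 4;        // flag bit: an extra field follows the header
    static final byte OS_UNIX = 3;       // OS value for Unix

    // ASSUMPTION: XLEN=8 (little-endian), subfield 'L','X', LEN=4, four zero bytes.
    static final byte[] EXTRA_FIELD = {8, 0, 'L', 'X', 4, 0, 0, 0, 0, 0};

    // Gzip the payload in memory and return the complete gzip member.
    static byte[] gzip(byte[] payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(payload);
        }
        return bytes.toByteArray();
    }

    // Copy the member, set FLG and OS, and splice the extra field after the header.
    static byte[] patch(byte[] member) {
        byte[] patched = new byte[member.length + EXTRA_FIELD.length];
        System.arraycopy(member, 0, patched, 0, HEADER_LENGTH);
        patched[FLG_OFFSET] = FEXTRA;
        patched[OS_OFFSET] = OS_UNIX;
        System.arraycopy(EXTRA_FIELD, 0, patched, HEADER_LENGTH, EXTRA_FIELD.length);
        System.arraycopy(member, HEADER_LENGTH, patched,
                HEADER_LENGTH + EXTRA_FIELD.length, member.length - HEADER_LENGTH);
        return patched;
    }

    // Decompress a gzip member, used here to check the patched bytes still parse.
    static byte[] gunzip(byte[] member) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(member))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

Calling gunzip(patch(gzip(data))) should return the original data, since a conforming gzip reader skips the extra field and ignores the OS byte.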


Attached File

No Files Currently Attached

Changes ( 2 )

Field       Old Value   Date              By
status_id   Open        2004-06-30 18:28  stack-sf
close_date  -           2004-06-30 18:28  stack-sf