From: <bi...@us...> - 2008-07-14 21:21:43
Revision: 2440
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2440&view=rev
Author:   binzino
Date:     2008-07-14 14:21:52 -0700 (Mon, 14 Jul 2008)

Log Message:
-----------
Add file offset to ARCRecord created when reading WARC records. Related to JIRA: WAX-12.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/archive/src/java/org/archive/nutchwax/ArcReader.java

Modified: trunk/archive-access/projects/nutchwax/archive/src/java/org/archive/nutchwax/ArcReader.java
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/src/java/org/archive/nutchwax/ArcReader.java	2008-07-14 02:59:21 UTC (rev 2439)
+++ trunk/archive-access/projects/nutchwax/archive/src/java/org/archive/nutchwax/ArcReader.java	2008-07-14 21:21:52 UTC (rev 2440)
@@ -212,7 +212,7 @@
       arcMetadataFields.put( ARCConstants.MIMETYPE_FIELD_KEY, header.getHeaderValue( null ) ); // We don't know the MIME type of the *payload* in a WARC (yet)
       arcMetadataFields.put( ARCConstants.LENGTH_FIELD_KEY, header.getHeaderValue( WARCConstants.CONTENT_LENGTH ) );
       arcMetadataFields.put( ARCConstants.VERSION_FIELD_KEY, header.getHeaderValue( null ) ); // FIXME: Do we need actual values for these?
-      arcMetadataFields.put( ARCConstants.ABSOLUTE_OFFSET_KEY, header.getHeaderValue( null ) ); // FIXME: Do we need actual values for these?
+      arcMetadataFields.put( ARCConstants.ABSOLUTE_OFFSET_KEY, header.getOffset( ) );
 
       // Dates must be converted from WARC format to 14-digit format,
       // that is, from YYYY-MM-DDTHH:MM:SSZ to YYYYMMDDHHMMSS
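
Note on the trailing context lines: the comment describes converting a WARC date (YYYY-MM-DDTHH:MM:SSZ) to the 14-digit ARC form (YYYYMMDDHHMMSS). The diff does not show how ArcReader.java actually does this, so the following is only a minimal sketch of that conversion. It assumes a modern JDK with java.time (which did not exist when this revision was committed), and the class and method names are hypothetical, not taken from NutchWAX.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of NutchWAX: illustrates the date
// conversion described in the diff's context comment.
final class WarcDateSketch {

    // 14-digit ARC timestamp, always rendered in UTC.
    private static final DateTimeFormatter ARC_14 =
        DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);

    // Parses an ISO-8601 UTC instant such as "2008-07-14T21:21:52Z"
    // and reformats it as YYYYMMDDHHMMSS.
    public static String toArc14(String warcDate) {
        return ARC_14.format(Instant.parse(warcDate));
    }

    public static void main(String[] args) {
        // Uses the commit timestamp from this message as sample input.
        System.out.println(toArc14("2008-07-14T21:21:52Z")); // prints 20080714212152
    }
}
```

Instant.parse throws DateTimeParseException on input that is not a valid ISO-8601 instant, so a caller reading untrusted WARC headers would probably want to catch that and decide how to handle the record.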