From: <Ism...@te...> - 2013-06-07 14:09:36
Hi,

I am performing file format conversions on WARC record payloads (for example, converting .doc files to .pdf). Based on the WARC specification, I believe the way to do this is to create a new WARC record in this format:

    WARC-Type: conversion
    WARC-Target-URI: same as the original record
    WARC-Date: the date-time when this conversion record is created
    WARC-Record-ID: a new unique ID
    WARC-Refers-To: the original record's ID
    Content-Type: the new content type
    ...

(For an example conversion record, see Appendix C.7 of ISO 28500:2009.)

I would like to use WARC files containing conversion records with WBM. Specifically, I would like a configuration option that tells WBM to always serve the most recent conversion record's payload when I request the target URI. In other words, the most recent conversion record's payload and content type would be used in place of the original record's.

Will this feature be implemented in WBM, and is there a timescale for it? I think it is an important feature for WBM to have, because the alternative way of carrying out file format conversions requires modifying the contents of the WARC file itself, which would compromise its integrity. That is something I'd like to avoid.

- Ismail
Programmer, Tessella Ltd
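For illustration only, here is a minimal Python sketch of the record layout and the replay behaviour requested above, assuming the WARC records have already been parsed into header dictionaries. The `WarcRecord` class, the `make_conversion_record` and `resolve` functions, and the `prefer_conversions` switch are all hypothetical names. They are not part of WBM or any WARC library; a real implementation would sit in WBM's index/replay layer.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional
import uuid


@dataclass
class WarcRecord:
    """Hypothetical in-memory stand-in for a parsed WARC record."""
    headers: dict
    payload: bytes = b""


def make_conversion_record(original: WarcRecord, new_payload: bytes,
                           new_content_type: str,
                           created: Optional[datetime] = None) -> WarcRecord:
    """Build a conversion record pointing back at `original` (cf. ISO 28500 C.7)."""
    when = created if created is not None else datetime.now(timezone.utc)
    when = when.astimezone(timezone.utc)  # naive datetimes are treated as local time
    headers = {
        "WARC-Type": "conversion",
        "WARC-Target-URI": original.headers["WARC-Target-URI"],
        # WARC 1.0 dates are fixed-width UTC W3C-ISO8601, e.g. 2013-06-07T14:09:36Z
        "WARC-Date": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "WARC-Record-ID": f"<urn:uuid:{uuid.uuid4()}>",
        "WARC-Refers-To": original.headers["WARC-Record-ID"],
        "Content-Type": new_content_type,
        "Content-Length": str(len(new_payload)),
    }
    return WarcRecord(headers, new_payload)


def resolve(records: List[WarcRecord], target_uri: str,
            prefer_conversions: bool = True) -> Optional[WarcRecord]:
    """Pick the record to replay for `target_uri` (hypothetical policy).

    Take the latest original capture; if `prefer_conversions` is set and any
    conversion records refer to that capture, return the newest of them instead.
    String comparison of WARC-Date is valid because the format is fixed-width UTC.
    """
    originals = [r for r in records
                 if r.headers.get("WARC-Target-URI") == target_uri
                 and r.headers.get("WARC-Type") in ("response", "resource")]
    if not originals:
        return None
    original = max(originals, key=lambda r: r.headers["WARC-Date"])
    if not prefer_conversions:
        return original
    conversions = [r for r in records
                   if r.headers.get("WARC-Type") == "conversion"
                   and r.headers.get("WARC-Refers-To") == original.headers["WARC-Record-ID"]]
    if not conversions:
        return original
    return max(conversions, key=lambda r: r.headers["WARC-Date"])
```

Because `resolve` only reads records and never rewrites the original, the WARC files stay unmodified. The conversion is simply an additional record that the replay side can choose to prefer.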