From: Paolo <oo...@us...> - 2007-08-06 23:31:45
On Mon, Aug 06, 2007 at 09:31:29PM +0200, Gerrit E.G. Hobbelt wrote:
>
> - start each file with a versioned header (I'll come back to that later)

that's well established for Fidelis' OSBF

> The way to provide the forward portability would be through providing an
> export/import mechanism (already exists for a few formats: cssdump)
...
> The versioned header should contain enough information for an
> export/import function to operate correctly:
> a) import all acceptable data, or

there's a catch, as the original arch on which to do the export 1st might
not be avail anymore ...

> b) report the incompatibility and hence the need to 'recreate/relearn'
> the files.

... and b) might not always be an option.

> Especially (b) is important as that'd enable (automated) upgrades to
> properly interact with the users: one would then be able to select

yep, but I'd consider it a bug (which might be just a TODO) if a conversion
util/function is unable to properly convert our own stuff from arch1 to
arch2, both ways, whatever arch* are. Such converters won't be exactly
trivial (byte swapping, aligning, padding, etc) but feasible.

> The binary format header will include these information items (at least):
>
> - the crm version used to create the file
> - the platform (integer size and format (endianness), floating point size
>   and format, structure alignment, etc.)
> - the classifier used to create the file
> - the data content type (some classifiers use multiple files)
> - space for future expansion (this is a research tool too: allow folks
>   to add their own stuff which may not fit the header items above)

+file-format version and, since there'll be plenty of space, plain-text
file-format blurb and summary file-stats, so that head -x css would be
just fine to report the relevant things.
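To make the header idea a bit more concrete, here's a rough sketch in Python. It is not CRM114 code and not an agreed format: the magic tag, the field widths, the 256-byte size and every function name below are made up for illustration. It only covers the items listed above plus the file-format version and a text blurb, and the byte-swapping part handles plain 32-bit values only (no alignment or padding fixes).

```python
import struct
import sys

# Hypothetical layout for illustration only; not CRM114's actual on-disk format.
MAGIC = b"CRM114H\x00"   # made-up 8-byte tag
FORMAT_VERSION = 1       # file-format version, separate from the crm version
HEADER_SIZE = 256        # fixed size leaves room for future expansion
# The header itself is always little-endian so any platform can read it;
# the fields inside describe the platform that wrote the payload.
FIXED = struct.Struct("<8sHBBBB16s16s16s")
ORDERS = ("little", "big")


def pack_header(crm_version, classifier, content_type, blurb="",
                byte_order=sys.byteorder, int_size=4, float_size=8,
                alignment=8):
    """Build a fixed-size header: binary fields, then a NUL-padded text blurb."""
    fixed = FIXED.pack(MAGIC, FORMAT_VERSION, ORDERS.index(byte_order),
                       int_size, float_size, alignment,
                       crm_version.encode("ascii"),
                       classifier.encode("ascii"),
                       content_type.encode("ascii"))
    room = HEADER_SIZE - FIXED.size
    return fixed + blurb.encode("ascii")[:room].ljust(room, b"\0")


def unpack_header(raw):
    """Parse a header; raise ValueError if it is missing or malformed."""
    if len(raw) < HEADER_SIZE:
        raise ValueError("file too short to hold a header")
    (magic, fmt_version, order, int_size, float_size, alignment,
     crm_version, classifier, content_type) = FIXED.unpack_from(raw)
    if magic != MAGIC:
        raise ValueError("no versioned header found")
    if order >= len(ORDERS):
        raise ValueError("unknown byte-order code")

    def text(field):
        return field.rstrip(b"\0").decode("ascii")

    return {
        "format_version": fmt_version,
        "byte_order": ORDERS[order],
        "int_size": int_size,
        "float_size": float_size,
        "alignment": alignment,
        "crm_version": text(crm_version),
        "classifier": text(classifier),
        "content_type": text(content_type),
        "blurb": text(raw[FIXED.size:HEADER_SIZE]),
    }


def import_action(header, native_order=sys.byteorder):
    """Choose between using the file as-is, converting it, or giving up."""
    if header["format_version"] > FORMAT_VERSION:
        return "incompatible"   # case (b): report it to the user
    if header["byte_order"] != native_order:
        return "convert"        # case (a), via binary-to-binary conversion
    return "native"


def convert_u32(payload, src_order, dst_order):
    """Re-encode a run of unsigned 32-bit values from one byte order to another."""
    if len(payload) % 4:
        raise ValueError("payload is not a whole number of 32-bit values")
    count = len(payload) // 4
    src = ("<" if src_order == "little" else ">") + f"{count}I"
    dst = ("<" if dst_order == "little" else ">") + f"{count}I"
    return struct.pack(dst, *struct.unpack(src, payload))
```

Since the platform description travels with the file, a converter written this way does not need the original arch to be around: a reader would call unpack_header() on the first 256 bytes, ask import_action() what to do, and run the payload through something like convert_u32() when the answer is "convert".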
> The approach includes the existence of an export/import tool to convert
> the data to/from a cross-platform portable text format, where

that's the current CSV inter-format, though the converter should be able
to do it at once binary-2-binary.

> What are your thoughts on this matter? Is this worth pursuing (and hence
> augmenting the code to support such a header from now on) or is this,
> well...

for spam filtering, it's easier (and usually better) to start from
scratch, but in other applications the hashes DB might be precious stuff,
so as people extend crm114 use to other tasks, such a tool might become
highly desirable.

--
paolo