From: Larry Stone <lcs@MIT.EDU> - 2008-02-19 22:42:20
> I've noticed that the history manager writes a lot of files on our
> disks and takes up a lot of space (and inodes)...
> ...what's it for?
The original History system was intended to record every modification
to the DSpace data model (e.g. Bitstreams, Items, Collections, etc.) so
the provenance of every object can be established. For example, if
someone submits a work, and later on somebody else changes the
descriptive metadata, the original submitter can look at the history to
see who made the changes and when.
This is essential if you are concerned about digital preservation, since it
effectively gives you an audit trail (aka _provenance_) showing the object's
original submission and every change since.
The old HistoryManager writes at least one file under [dspace]/history
for every "transaction" that changes the data model. Each file is an
independent record, in RDF/XML. If you do not need provenance for your
archive, it is safe to delete the files under [dspace]/history -- although
you'll have to keep deleting them periodically. Some sites modify the code
(see class HistoryManager) to disable the writing of history.
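
If you go the periodic-deletion route, something like the following could
be run on a schedule. This is only a rough sketch, not anything shipped
with DSpace: the function name prune_history, the age cutoff, and the
dry_run flag are all my own assumptions, and you would pass in whatever
path [dspace]/history resolves to on your system. Only use it if you are
sure you do not need the provenance records.

```python
import os
import time


def prune_history(history_dir, max_age_days, dry_run=True):
    """Remove regular files under history_dir older than max_age_days.

    Hypothetical helper, not part of DSpace. Returns the list of paths
    that were removed (or would be removed when dry_run is True).
    """
    cutoff = time.time() - max_age_days * 86400  # seconds per day
    removed = []
    for root, _dirs, files in os.walk(history_dir):
        for name in files:
            path = os.path.join(root, name)
            # Compare the file's last-modified time against the cutoff.
            if os.path.getmtime(path) < cutoff:
                if not dry_run:
                    os.remove(path)
                removed.append(path)
    return removed
```

Run it with dry_run=True first to see what it would delete, then switch
to dry_run=False once the list looks right. Directories are left in place;
only the files are removed.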
Note that the old History system was removed for the 1.5 release, as
part of the Event System changes. I wrote a prototype of a new
implementation based on Events, which records more meaningful (and
documented) data and uses a modern RDF database (the OpenRDF
triple-store) so it is also a lot more efficient. See the wiki:
This is only a prototype, and it's fallen out of date, so it would
have to be ported to the current 1.5 release and packaged with Maven
to become an add-on. It's got a lot of advantages over the old system,
most notably that you can get a history report for an object showing all
its activity - extracting the provenance. The old system had no queries