We're contemplating using Fedora as our digital object datastream store.
Towards that end, we'd like to implement some sort of hashing
algorithm to generate fairly random-looking filesystem paths to the
objects, so that all our filesystems fill up evenly, and space
allocation doesn't depend on any attribute of an object (when it was
created, what collection it belongs to, its name, etc.).
Fedora by default stores objects in paths generated using a datetime
algorithm, based on when each object was added to the repository, such as:
Instead, we'd like to see something like:
[ ... ]
where you take a datastream, calculate a hash from something (unique
identifier, filename+timestamp, whatever) and use that to break out your
path into subdirectories. Since a good hash spreads its outputs evenly
across the output range no matter what the inputs look like, we can break
up the directory structure into big filesystem "buckets" that all fill up
at roughly the same rate.
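
To make the idea concrete, here is a rough standalone sketch in Java of the kind of mapping I have in mind. It does not use any Fedora classes. The class name HashPathSketch, the method pathFor, the choice of MD5, the two-level directory layout, and the example store path and identifier are all my own assumptions for illustration, not anything Fedora provides.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: maps an identifier to <storeBase>/<xx>/<yy>/<name>,
// where xx and yy are the first two bytes of the identifier's MD5 digest in hex.
class HashPathSketch {

    static String pathFor(String storeBase, String id) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(id.getBytes(StandardCharsets.UTF_8));

            // Two directory levels of 256 entries each gives 65,536 buckets.
            String level1 = String.format("%02x", digest[0] & 0xff);
            String level2 = String.format("%02x", digest[1] & 0xff);

            // Simplistic file name: only ':' is replaced. Real code would need
            // proper, collision-free escaping of the identifier.
            String fileName = id.replace(':', '_');

            return storeBase + "/" + level1 + "/" + level2 + "/" + fileName;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Example store base and identifier are made up.
        System.out.println(pathFor("/data/datastreams", "demo:1+DS1+DS1.0"));
    }
}
```

The same identifier always maps to the same path, so nothing extra has to be stored to find a file again, and the top-level buckets can be spread across separate filesystems.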
As far as I can tell, this would not be too difficult in Fedora 2.2:
path generation seems to be handled in the package
fedora.server.storage.lowlevel, and you configure which implementation
to use at runtime in the main "fedora.fcfg" file, specifying which
PathAlgorithm implementation class to use:
<comment>The java class used to determine the path algorithm;
So to implement the suggested hashing algorithm, it would simply be a
matter of writing our own
fedora.server.storage.lowlevel.MyHashPathAlgorithm class, and
configuring it in fedora.fcfg, correct?
Has anyone done anything like this? Any caveats or pieces I'm missing here?
Library, Instructional, and Research Applications (LIRA)
Division of Information Technology (DoIT)
University of Wisconsin - Madison