Below is a forwarded thread from the OAI-PMH list. I haven't received much feedback there at all, so I'll try these lists.
Sorry it's a bit long, but the issues are a little icky. This affects those running (or planning to run) 'live' DSpaces whose metadata is harvested via OAI-PMH; if you are running such a system, please read this and let me know what you think (rather than ignoring it and then complaining about what I do later ;-)
Basically, there are a number of OAI-related issues.
1/ Updating to 1.2 'touches' every record in the system, so OAI harvesters will have to re-harvest all your metadata.
Side-effect: If you update a production DSpace instance, you may need to brace yourself (and raise your database connection limits) for a week or so of heavy OAI harvesting activity.
The only feasible way to avoid this I can think of is to have the DSpace update process not really 'touch' the records -- i.e. to make the required changes but set the 'last modified' date back to what it was, pretending that the item hasn't really changed.
2/ There have been various requests to switch DSpace's OAI metadata record identifiers from Handles to OAI-style identifiers. The scenario that tipped the balance for me was two DSpace instances containing the same item with the same Handle: the corresponding OAI metadata record identifiers should be different, because the two DSpace instances might have different metadata to export, and a harvester would need to be able to distinguish between them.
As well as resulting in a complete re-harvest, as in 1, the old records with Handle IDs would need to be flagged as 'deleted' so that harvesters aren't left with old, duplicate records. This would involve remembering these Handles and the date of the update, so that future harvests of a date range including that date would get the 'deleted' flags. Not impossible, but awkward.
3/ The default DSpace OAI 'base URL' causes confusion because it ends with a /, and this also means it's hard to add other stuff to your DSpace OAI application such as stylesheet transforms and the like. However, changing the base URL effectively means you are starting a new repository, as OAI has no agreed repository lifecycle procedures or mechanisms. Essentially, your old OAI repository would just disappear, and you'd have to register the new base URL, presumably causing harvesters to re-harvest from scratch.
4/ Since the set structure is changing (mainly due to the new sub-community change in the data model), the set spec of many OAI records is changing. However, there are problems with this; if you're interested, see:
So, I'm wondering whether the easiest thing isn't to do this: change to the OAI identifier scheme and change the default base URL, so that upgrading to DSpace 1.2 means re-registering your OAI repository. Announcing the change on the OAI lists would probably also be a good idea, so that those running harvesters can update their systems.
This means the problems with changing identifiers and sets are addressed, albeit in a rather inelegant way. However, as far as I can tell from http://www.openarchives.org/Register/BrowseSites.pl only 4 DSpaces are actually registered as OAI data providers, so maybe the impact won't be that bad.
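For concreteness, here is a minimal sketch of the oai:(dspace hostname):(handle) identifier form proposed in the forwarded thread below. The function names and the second hostname are hypothetical, not existing DSpace code. Lowercasing the hostname is also an assumption (DNS names are case-insensitive). The sketch shows that the same Handle held by two DSpace instances yields two distinct record identifiers, and that the Handle can be recovered from the identifier.

```python
"""Sketch of the oai:(dspace hostname):(handle) identifier scheme.

Hypothetical helpers for illustration only; not part of DSpace.
"""


def oai_identifier(hostname: str, handle: str) -> str:
    """Map a Handle to an OAI identifier scoped to one DSpace host."""
    if not hostname or ":" in hostname:
        raise ValueError(f"bad hostname: {hostname!r}")
    if not handle:
        raise ValueError("empty handle")
    # Assumption: normalise the host part to lowercase, since DNS names
    # are case-insensitive; the Handle is kept exactly as issued.
    return f"oai:{hostname.lower()}:{handle}"


def handle_from_oai_identifier(identifier: str, hostname: str) -> str:
    """Recover the Handle, rejecting identifiers minted by another host."""
    # maxsplit=2 keeps everything after the second colon as the Handle.
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai":
        raise ValueError(f"not an oai identifier: {identifier!r}")
    _, host, handle = parts
    if host.lower() != hostname.lower():
        raise ValueError(f"identifier belongs to {host}, not {hostname}")
    return handle


if __name__ == "__main__":
    # The same item (same Handle) held by two DSpace instances gets two
    # distinct OAI identifiers, so a harvester can tell the records apart.
    a = oai_identifier("dspace.mit.edu", "1721.1/1234")
    b = oai_identifier("dspace.example.org", "1721.1/1234")  # hypothetical host
    print(a)
    print(b)
    assert a != b
    assert handle_from_oai_identifier(a, "dspace.mit.edu") == "1721.1/1234"
```

Putting the hostname in the namespace part is what makes the two records distinguishable. The Handle stays intact as the local part, so nothing about the item's persistent identifier changes.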
What do others think?
Robert Tansley / Digital Media Systems Programme / HP Labs
> > > On Thu, 10 Jun 2004, Tansley, Robert wrote:
> > > > - Are Handles appropriate IDs for the OAI metadata records?
> > > > Originally, it seemed pointless to me to use anything else;
> > > > it forces people to jump through yet another hoop to get yet
> > > > another unique ID set up. However, on occasion people have
> > > > indicated they'd rather use an oai:-prefixed identifier. If
> > > > anyone can offer reasons and suggestions either way I'd
> > > > appreciate it.
> > >
> > > Aside from the possibility of changing ids to get around the
> > > changing sets problem, I can't see why an oai-identifier
> > > would be preferred to a handle.
> > One potential scenario I've thought of is where more than one
> > DSpace contains the same item, with the same Handle (resource ID).
> > Each DSpace might have different/additional metadata, though,
> > and in any case they probably shouldn't use the same ID for
> > OAI-PMH. So, perhaps an easy way forward is to use an OAI
> > identifier of the form:
> > oai:(dspace hostname):(handle)
> > E.g.
> > oai:dspace.mit.edu:1721.1/1234
> So, is there any agreed process for the lifecycle of OAI data
> providers? If we make the change proposed above in DSpace
> (Handles -> OAI scheme identifiers) there seem to be two basic
> choices:
> 1/ 'Delete' all the metadata records with Handles so that
> harvesters know those records have gone. This would be a pain
> to implement, as DSpace would need to keep a record of the old
> Handles and when they were retired.
> 2/ Change the base URL, so effectively a DSpace would become
> a 'new' OAI-PMH data provider (the old URL would just stop working)
> I'd be hesitant to do 2, since that would mean for a lot of
> harvesters DSpace at MIT and a whole bunch of other places
> would seem to just 'disappear'. Openarchives.org doesn't
> seem to offer any way of changing or removing an entry, and
> in any case, do service providers actually regularly pay
> attention to this?
> I'm guessing that unless the relevant people are paying
> attention to this list, most service providers wouldn't see the
> change; even then they'd only catch the MIT change, and other
> sites running DSpace would have to announce when theirs changed.
> Do any others have experience of doing something like this?
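Option 1 above, and item 2 at the top of this message, essentially mean keeping tombstones. The following is a minimal in-memory sketch. It assumes each record has a single day-granularity datestamp and treats the OAI-PMH from/until bounds as inclusive. The class and method names are hypothetical, not DSpace's API. Retiring a Handle-based identifier records the retirement date. Any later ListIdentifiers-style request whose range covers that date reports the old identifier as deleted, alongside the new identifier.

```python
"""Sketch of option 1: keep tombstones for retired Handle-based identifiers.

Hypothetical in-memory model for illustration only; a real data provider
would keep this state in its database.
"""
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Header:
    """Roughly what an OAI-PMH record header carries."""
    identifier: str
    datestamp: date
    deleted: bool = False


def in_range(d: date, from_: Optional[date], until: Optional[date]) -> bool:
    # OAI-PMH 'from' and 'until' are both inclusive; None means unbounded.
    return (from_ is None or d >= from_) and (until is None or d <= until)


class DataProvider:
    def __init__(self) -> None:
        self.live: Dict[str, date] = {}     # identifier -> last-modified date
        self.retired: Dict[str, date] = {}  # identifier -> date it was retired

    def rename(self, old_id: str, new_id: str, when: date) -> None:
        """Move a record to a new identifier, leaving a tombstone behind."""
        self.live.pop(old_id, None)
        self.retired[old_id] = when
        # The record under its new identifier counts as modified on 'when'.
        self.live[new_id] = when

    def list_identifiers(
        self, from_: Optional[date] = None, until: Optional[date] = None
    ) -> List[Header]:
        """Selective harvest by datestamp, including deleted-record headers."""
        headers = [
            Header(i, d) for i, d in self.live.items() if in_range(d, from_, until)
        ]
        headers += [
            Header(i, d, deleted=True)
            for i, d in self.retired.items()
            if in_range(d, from_, until)
        ]
        return sorted(headers, key=lambda h: (h.datestamp, h.identifier))


if __name__ == "__main__":
    dp = DataProvider()
    dp.live["1721.1/1234"] = date(2004, 3, 1)
    upgrade_day = date(2004, 7, 1)  # hypothetical upgrade date
    dp.rename("1721.1/1234", "oai:dspace.mit.edu:1721.1/1234", upgrade_day)
    # A harvest covering the upgrade day sees the old ID as deleted
    # and the new ID as a fresh record.
    for header in dp.list_identifiers(from_=date(2004, 6, 1)):
        print(header)
```

Keeping the tombstones forever corresponds to the OAI-PMH 'persistent' deletedRecord policy. Under a 'transient' policy they could eventually be pruned, which would reduce the bookkeeping that makes this option awkward.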