As promised (and faster than expected), I have added regular-expression-based ID manipulation to VuFind’s OAI-PMH harvester. It should now be possible to munge IDs however you like and get consistent results from the harvest, import and delete tools. See the new idSearch/idReplace parameters in harvest/oai.ini, available as of VuFind r3060.
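For anyone who has not used regex-based ID rewriting before, here is a minimal Python sketch of the general idea. It uses an ordered list of search patterns paired with replacements and applies them one after another, the way PHP's preg_replace behaves when given arrays. The patterns, the `lib-` prefix and the `munge_id` name are made up for illustration. This is not VuFind's code, and the exact syntax oai.ini expects should be checked against the comments in that file.

```python
import re

# Hypothetical rules in the spirit of paired idSearch/idReplace settings.
# Each pattern is applied in order to the output of the previous step.
ID_SEARCH = [
    r"^oai:library\.example\.edu:",  # strip a made-up OAI identifier prefix
    r"^(\d+)$",                      # match IDs that are purely numeric...
]
ID_REPLACE = [
    "",
    r"lib-\1",                       # ...and give them a source prefix
]


def munge_id(raw_id: str) -> str:
    """Apply every search/replace pair in order and return the final ID."""
    result = raw_id
    for pattern, replacement in zip(ID_SEARCH, ID_REPLACE):
        result = re.sub(pattern, replacement, result)
    return result


if __name__ == "__main__":
    for raw in ["oai:library.example.edu:12345", "oai:library.example.edu:abc"]:
        print(raw, "->", munge_id(raw))
```

Order matters here. The second rule only matches because the first has already stripped the OAI prefix, so the non-numeric ID `abc` passes through without a prefix.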


- Demian


From: Demian Katz
Sent: Thursday, October 21, 2010 8:36 AM
To: 'mikan.d.dspace listmail';
Subject: RE: [VuFind-Tech] Automated importing


Eoghan covered most of the details here, but there is one more thing: I will soon be adding features to the OAI-PMH harvester that allow more ID manipulation at the harvest stage. Hopefully this will simplify the process of distinguishing between MARC records from different sources. I was initially reluctant to add extra complexity to the harvester, but Fang Peng convinced me that it’s worthwhile: normalizing IDs within the harvester makes incremental updates and deletes much easier to handle. I’ll post more details when this is ready.
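To show why normalizing at harvest time helps, here is a small Python model. It is not VuFind code. A dict stands in for the Solr index, and each harvested event is a hypothetical (action, raw_id, payload) tuple. Updates and deletes both pass through the same `normalize` function, so a later delete always finds the record that an earlier update created. The `srcA.` prefix and the function names are assumptions made for the example.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical normalization rule: drop the OAI prefix, add a source code.
SOURCE_PREFIX = "srcA."


def normalize(raw_id: str) -> str:
    """Rewrite an OAI identifier into the ID stored in the index."""
    return SOURCE_PREFIX + re.sub(r"^oai:[^:]+:", "", raw_id)


def apply_harvest(index: Dict[str, str],
                  events: List[Tuple[str, str, str]]) -> None:
    """Apply (action, raw_id, payload) events to a dict standing in for Solr."""
    for action, raw_id, payload in events:
        key = normalize(raw_id)           # same rule for updates and deletes
        if action == "update":
            index[key] = payload          # a re-harvested record overwrites
        elif action == "delete":
            index.pop(key, None)          # deleting an unknown ID is a no-op


index: Dict[str, str] = {}
apply_harvest(index, [
    ("update", "oai:example.org:1", "first version"),
    ("update", "oai:example.org:2", "other record"),
    ("update", "oai:example.org:1", "revised version"),
    ("delete", "oai:example.org:2", ""),
])
print(index)  # {'srcA.1': 'revised version'}
```

If the import tool applied a prefix but the delete step did not, the delete would look for `2` instead of `srcA.2` and silently leave a stale record behind. Doing the rewrite once, in the harvester, is meant to remove that class of mismatch.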


- Demian


From: mikan.d.dspace listmail
Sent: Thursday, October 21, 2010 4:21 AM
Subject: [VuFind-Tech] Automated importing


We're planning to import a large amount of data from different sources into VuFind. Some items have IDs that may overlap across sources, so they might need an additional prefix to fit cleanly into Solr. What would be the preferred way of doing this kind of conversion? Do the import scripts have the means or tools for this, or should I write a script of my own?
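If you do end up writing your own pre-processing script, the prefixing step itself is small. Below is a minimal Python sketch. It assumes each batch of raw IDs is already grouped under a short source code of your choosing; `libA`, `libB` and the dot separator are hypothetical. The script first reports which raw IDs actually collide, then builds prefixed IDs that are unique across sources, provided each source's own IDs are unique.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical input: raw IDs grouped by a short, stable source code.
batches: Dict[str, List[str]] = {
    "libA": ["1001", "1002", "1003"],
    "libB": ["1002", "2001"],
}


def overlapping_ids(groups: Dict[str, List[str]]) -> List[str]:
    """Return the raw IDs that occur in more than one source."""
    counts = Counter(raw for ids in groups.values() for raw in set(ids))
    return sorted(raw for raw, n in counts.items() if n > 1)


def prefix_ids(groups: Dict[str, List[str]], sep: str = ".") -> List[str]:
    """Prepend each raw ID with its source code (assumes IDs are unique per source)."""
    return [f"{source}{sep}{raw}" for source, ids in groups.items() for raw in ids]


print(overlapping_ids(batches))  # ['1002']
print(prefix_ids(batches))       # ['libA.1001', 'libA.1002', 'libA.1003', 'libB.1002', 'libB.2001']
```

Whatever prefix you pick, keep it stable over time. Nightly updates and deletes only match an existing record if they produce exactly the same ID as the original import.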

Since the other data sources are still active, I need to run these batch imports nightly to keep VuFind up to date. Does anyone have experience arranging this kind of automation? Are there any considerations, problems or good practices I should be aware of?
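For nightly runs, OAI-PMH itself supports incremental harvesting. The ListRecords verb accepts a `from` argument, an inclusive lower bound on record datestamps, and pages large result sets with `resumptionToken`. The Python sketch below covers only the bookkeeping around that: it remembers the date of the last successful run and builds the request URLs. The state-file handling, function names, base URL and `marc21` metadata prefix are assumptions. Fetching and parsing the responses is left out.

```python
from datetime import date
from pathlib import Path
from typing import Optional
from urllib.parse import urlencode


def load_last_success(state_file: Path) -> Optional[date]:
    """Return the date of the last successful run, or None (full harvest) on the first run."""
    if not state_file.exists():
        return None
    return date.fromisoformat(state_file.read_text().strip())


def save_last_success(state_file: Path, day: date) -> None:
    """Call only after harvest and import both succeeded, passing the day the run started."""
    state_file.write_text(day.isoformat())


def list_records_url(base_url: str, metadata_prefix: str,
                     from_date: Optional[date] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords URL.

    A follow-up request carries only the verb and the resumptionToken;
    the first request carries metadataPrefix and, optionally, 'from'.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            params["from"] = from_date.isoformat()
    return f"{base_url}?{urlencode(params)}"


print(list_records_url("https://example.org/oai", "marc21", date(2010, 10, 20)))
# https://example.org/oai?verb=ListRecords&metadataPrefix=marc21&from=2010-10-20
```

The date is saved only after both harvest and import succeed, so a failed night is simply retried from the same point. Because `from` is inclusive, the boundary day is fetched again, which is harmless as long as updates overwrite records by ID. It is also worth checking the deletedRecord value in each source's Identify response (`no`, `transient` or `persistent`). With `no`, deletions never show up in an incremental harvest and need separate handling.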

Thanks for the tips,
Mika Stenberg