First of all, I would like to express my thanks for the excellent support here in the forum.
I noticed the following case, for which I cannot find an explanation.
When I query our OAI Provider with <URL>provider?verb=ListIdentifiers&metadataPrefix=iso19139&from=2010-09-13T15:39:41z
I get the following response
If I check on operating system level
Size: 23911 Blocks: 48 IO Block: 4096 regular file
Device: 902h/2306d Inode: 417462 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 502/ rss) Gid: ( 500/ gisc)
Access: 2010-06-15 15:00:46.678466254 +0000
Modify: 2010-06-15 15:00:46.679465679 +0000
Change: 2010-06-15 15:00:46.679465679 +0000
So the OAI response tells me that the file was inserted or changed at 2010-09-14T11:56:38Z, while at operating system level the last access was at 2010-06-15 15:00:46.678466254 +0000, nearly 3 months earlier.
For synchronization purposes I would like to use the from attribute to avoid unnecessary synchronization of files that have not changed. I am now a bit confused that OAI tells me the file was changed in September while the last access was in June.
Could you please help me to clarify this point? Maybe I am making a wrong assumption.
You can reliably use the 'from' specifier in the OAI protocol to perform incremental harvests from a jOAI data provider.
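As an illustration of such an incremental request, here is a minimal Python sketch that builds a ListIdentifiers URL with a 'from' value. The base URL is a made-up placeholder and the function name is an assumption for this example, not part of jOAI. The OAI-PMH specification writes datestamps in UTC with an uppercase 'Z', so the sketch formats the value that way.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def list_identifiers_url(base_url, metadata_prefix, last_harvest):
    """Build a ListIdentifiers request for records changed since last_harvest."""
    # Convert to UTC and format with second granularity and an uppercase 'Z'.
    stamp = last_harvest.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    query = urlencode(
        {
            "verb": "ListIdentifiers",
            "metadataPrefix": metadata_prefix,
            "from": stamp,
        },
        safe=":",  # keep the colons in the timestamp readable
    )
    return f"{base_url}?{query}"


url = list_identifiers_url(
    "http://example.org/oai/provider",  # hypothetical base URL
    "iso19139",
    datetime(2010, 9, 13, 15, 39, 41, tzinfo=timezone.utc),
)
print(url)
```

A harvester would typically store the time of its last successful harvest and pass it as last_harvest on the next run.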
jOAI uses an internal index that keeps track of each record's modification datestamp, but note that this datestamp does not correspond directly to the file modification date that you see on the file system. Instead, the modification datestamp in jOAI corresponds to the date and time at which the record was *indexed* by jOAI. When a new record is indexed, the time of indexing becomes its OAI modification datestamp. jOAI then monitors the file (every 8 hours by default) to see if it changes. When a change is detected, the record is updated in the index and the time of that update becomes the new OAI modification datestamp for the record.
In this way the record datestamps are maintained within jOAI to support incremental harvests, but they do differ from the datestamps of the corresponding files as seen on the file system.
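The behaviour described above can be modelled with a small Python sketch. This is a toy illustration only, not jOAI's actual implementation: the class DatestampIndex, its refresh method, and the numeric timestamps are all assumptions made up for the example. The point it shows is that the record datestamp is the time of indexing, and that it moves only when the file's modification time differs from the one seen at the previous indexing.

```python
import os
import tempfile


class DatestampIndex:
    """Toy model: a record's datestamp is when it was indexed, not the file mtime."""

    def __init__(self):
        # path -> (file mtime seen at indexing time, record datestamp)
        self._entries = {}

    def refresh(self, path, now):
        """Re-check one file and return the record datestamp after the check."""
        mtime = os.path.getmtime(path)
        entry = self._entries.get(path)
        if entry is None or entry[0] != mtime:
            # New file, or mtime changed since last indexing: stamp with 'now'.
            self._entries[path] = (mtime, now)
        return self._entries[path][1]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "record.xml")
    with open(path, "w") as fh:
        fh.write("<record/>")
    os.utime(path, (1000, 1000))            # pin the file's mtime
    index = DatestampIndex()
    first = index.refresh(path, now=5000)   # first indexing
    second = index.refresh(path, now=6000)  # file untouched since then
    os.utime(path, (2000, 2000))            # simulate an edit to the file
    third = index.refresh(path, now=7000)   # change detected

print(first, second, third)
```

In this model the datestamp after the second check is still the one assigned at first indexing, even though neither value matches the file's own modification time.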
Thank you very much for your fast reply. I'm sorry that I was not clear enough and have to follow up.
In my example above I inserted a new record into my repository and ran a reindex manually. When the indexing had finished, I sent the OAI query <URL>provider?verb=ListIdentifiers&metadataPrefix=iso19139&from=2010-09-13T15:39:41z
The response contains my new record, as expected, but also files that have not been touched for 3 months. During these 3 months I inserted a couple of other records but never touched the listed records (see the OS stat output). This is what puzzles me. In this case a harvester would harvest those records and believe there had been a change when there was none.
I hope I could clarify my problem a bit more.
If you would like to suggest a test procedure, please let me know.
The OAI datestamp should only change if the underlying file has changed, as indicated by a change in the modification time returned by Java. So what you describe is not the expected behavior.
Are all of your records getting new datestamps, or just some? Is there anything in common among those records that might explain a pattern? In terms of testing, can you determine what triggers the datestamps to be updated when they should not be? For example, do the datestamps change only when one record is modified, or do they change even when no other records are updated?
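One possible way to run such a test is to save a ListIdentifiers response before and after a reindex and compare the datestamps per identifier. The following Python sketch assumes the standard OAI-PMH 2.0 XML namespace; the function names and the sample identifiers rec-a and rec-b are made up for illustration, and it ignores resumption tokens, so it only covers a single response page.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def datestamps(list_identifiers_xml):
    """Map each record identifier in a ListIdentifiers response to its datestamp."""
    root = ET.fromstring(list_identifiers_xml)
    result = {}
    for header in root.iterfind(".//oai:header", OAI_NS):
        ident = header.findtext("oai:identifier", namespaces=OAI_NS)
        stamp = header.findtext("oai:datestamp", namespaces=OAI_NS)
        result[ident] = stamp
    return result


def changed_datestamps(before, after):
    """Return identifiers present in both snapshots whose datestamp differs."""
    return {
        ident: (before[ident], after[ident])
        for ident in before
        if ident in after and before[ident] != after[ident]
    }


# Made-up sample response; only the datestamps vary between snapshots.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>rec-a</identifier><datestamp>{a}</datestamp></header>
    <header><identifier>rec-b</identifier><datestamp>{b}</datestamp></header>
  </ListIdentifiers>
</OAI-PMH>"""

before = datestamps(SAMPLE.format(a="2010-06-15T15:00:46Z", b="2010-06-15T15:00:46Z"))
after = datestamps(SAMPLE.format(a="2010-06-15T15:00:46Z", b="2010-09-14T11:56:38Z"))
changed = changed_datestamps(before, after)
print(changed)
```

Any identifier reported here whose file was not edited between the two snapshots would be a candidate for closer inspection.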
Is your OAI data provider public? If so, would you mind passing along the Base URL for me to look at?
Thank you very much for your help.
Here is the Base URL of our OAI Provider
I will come back to your other questions later.
I noticed that the files which show this behaviour exist twice in the repository, with names that differ only in case:
e.g. de.dwd.routine.meg.gme.germany.eddl.xml and de.dwd.routine.meg.gme.germany.EDDL.xml
But I do not get an index error. I guess a good strategy is to insert only files whose names are unique when case is ignored, to avoid this effect.
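A check for such names could look like the following Python sketch, which groups file names that are identical when case is ignored. The function name is an assumption for this example, and the third file name in the sample list is made up; the first two are the ones mentioned above.

```python
from collections import defaultdict


def case_collisions(filenames):
    """Return groups of names that are equal when case is ignored."""
    groups = defaultdict(list)
    for name in filenames:
        # casefold() gives a case-insensitive key for comparison.
        groups[name.casefold()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


names = [
    "de.dwd.routine.meg.gme.germany.eddl.xml",
    "de.dwd.routine.meg.gme.germany.EDDL.xml",
    "some.other.record.xml",  # made-up name with no collision
]
collisions = case_collisions(names)
for group in collisions:
    print("Same name ignoring case:", ", ".join(group))
```

In practice the list of names could come from os.listdir on the record directory before new files are added to the repository.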
Thank you very much - you helped me a lot to understand this problem.
Thank you for this information. I agree that an error should be thrown if the file names are the same ignoring case. I will look into that.
I assume that your problem has been solved, however, please let me know if not.