OAI datestamp vs. stat on OS

Markus
2010-09-14
2012-12-12
  • Markus

    Markus - 2010-09-14

    Hi all,

    First of all, I would like to express my thanks for the excellent support here in the forum.

    I noticed the following case, for which I cannot find an explanation.

    When I query our OAI Provider with <URL>provider?verb=ListIdentifiers&metadataPrefix=iso19139&from=2010-09-13T15:39:41z

    I get the following response
    <header>
    <identifier>int.wmo.gts.smaa01_edzw</identifier>
    <datestamp>2010-09-14T11:56:38Z</datestamp>
    <setSpec>EDZW</setSpec>
    </header>

    If I check at the operating system level:
    stat int.wmo.gts.smaa01_edzw.xml
      File: `int.wmo.gts.smaa01_edzw.xml'
      Size: 23911           Blocks: 48         IO Block: 4096   regular file
    Device: 902h/2306d      Inode: 417462      Links: 1
    Access: (0644/-rw-r--r--)  Uid: (  502/     rss)   Gid: (  500/    gisc)
    Access: 2010-06-15 15:00:46.678466254 +0000
    Modify: 2010-06-15 15:00:46.679465679 +0000
    Change: 2010-06-15 15:00:46.679465679 +0000

    So the OAI response tells me that the file was inserted or changed at 2010-09-14T11:56:38Z, while at the operating system level the last access was 2010-06-15 15:00:46.678466254 +0000, nearly 3 months earlier.

    For synchronization purposes I would like to use the from attribute to avoid unnecessarily synchronizing files that have not changed. I'm now a bit confused that OAI tells me the file was changed in September while the last access was in June.
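
    A minimal Python sketch of this comparison, under stated assumptions: the helper names (list_identifiers_url, parse_headers, file_mtime_utc) are hypothetical, the base URL is whatever the provider exposes, and the datestamps are assumed to use the seconds-granularity form shown in the response above. It builds the ListIdentifiers request, reads identifier/datestamp pairs out of a response, and reads a file's modification time in UTC so the two can be compared.

```python
import os
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from urllib.parse import urlencode


def list_identifiers_url(base_url, metadata_prefix, from_stamp):
    """Build a ListIdentifiers request URL with a 'from' argument."""
    query = urlencode({
        "verb": "ListIdentifiers",
        "metadataPrefix": metadata_prefix,
        "from": from_stamp,
    })
    return f"{base_url}?{query}"


def parse_headers(xml_text):
    """Return {identifier: datestamp} for every <header>, ignoring namespaces."""
    result = {}
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag.rsplit("}", 1)[-1] != "header":
            continue
        fields = {child.tag.rsplit("}", 1)[-1]: child.text for child in elem}
        stamp = datetime.strptime(fields["datestamp"], "%Y-%m-%dT%H:%M:%SZ")
        result[fields["identifier"]] = stamp.replace(tzinfo=timezone.utc)
    return result


def file_mtime_utc(path):
    """Modification time of a file as reported by the OS, as an aware UTC datetime."""
    return datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc)
```

    For each identifier, a datestamp later than the file's modification time (as in the example above) means the provider reports a change that the file system does not show.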

    Could you please help me clarify this point? Maybe I am making a wrong assumption.

    Cheers,
    Markus

     
  • John Weatherley

    John Weatherley - 2010-09-15

    Hi Markus,

    You can reliably use the 'from' specifier in the OAI protocol to perform incremental harvests from a jOAI data provider.

    jOAI uses an internal index that keeps track of a given record's modification datestamp, but note that this datestamp does not correspond directly to the file modification date that you see on the file system. Instead, the modification datestamp in jOAI corresponds to the date and time at which the record was *indexed* by jOAI. When a new record is indexed, the time of indexing becomes its OAI modification datestamp. jOAI then monitors the file (every 8 hours by default) to see if it changes. When a change is detected, the record is updated in the index and the time of that update becomes the new OAI modification datestamp for the record.

    In this way the record datestamps are maintained within jOAI to support incremental harvests, but they do differ from the datestamps for the corresponding files seen on the file system.
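
    The behaviour described above can be modelled roughly as follows. This is only an illustrative Python sketch, not jOAI's actual code; the class and method names are made up for the example, and the real index stores far more than this.

```python
from datetime import datetime, timezone


class DatestampIndex:
    """Toy model of an index that keeps one OAI datestamp per record."""

    def __init__(self):
        # identifier -> (file modification time last seen, OAI datestamp)
        self._entries = {}

    def check(self, identifier, file_mtime, now):
        """Run one indexing pass for a record and return its OAI datestamp.

        The datestamp is set to `now` when the record is first indexed or
        when the file's modification time differs from the one last seen.
        Otherwise the stored datestamp is left untouched.
        """
        seen = self._entries.get(identifier)
        if seen is None or seen[0] != file_mtime:
            self._entries[identifier] = (file_mtime, now)
        return self._entries[identifier][1]
```

    In this model a file last modified in June but first indexed in September carries a September datestamp, and later passes leave that datestamp alone as long as the file's modification time stays the same.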

    -john

     
  • Markus

    Markus - 2010-09-15

    Hi John,

    Thank you very much for your fast reply. I'm sorry that I was not clear enough and have to follow up.

    In my example above I insert a new record into my repository and execute a reindex manually. When the index process is finished, I send the OAI query <URL>provider?verb=ListIdentifiers&metadataPrefix=iso19139&from=2010-09-13T15:39:41z

    The response contains my new record, as expected, but also files not touched for 3 months. During these 3 months I have inserted a couple of other records but never touched the listed records (see the OS stat output). This puzzles me: in this case a harvester would harvest the records and believe there was a change when there was none.

    I hope this clarifies my problem a bit more.

    If you would like to suggest a test procedure, please let me know.

    Cheers,
    Markus

     
  • John Weatherley

    John Weatherley - 2010-09-15

    Hi Markus,

    The OAI datestamp should only change if the underlying file has changed, as indicated by a change in the modification time returned by Java. So what you describe is not the expected behavior.

    Is it all of your records that are seeing new datestamps or just some? Is there anything in common about those records that might explain a pattern? In terms of testing, can you determine what triggers the datestamps to be updated when they should not be? For example, do the datestamps only change when one record is modified or do they change even when no other records are updated?
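
    One way to run such a test, sketched in Python: collect the identifier/datestamp pairs twice (for example before and after a reindex, or before and after modifying a single record) and compare the two snapshots. The function names and the dictionary layout are assumptions made for illustration.

```python
def changed_datestamps(before, after):
    """Identifiers present in both snapshots whose datestamp differs."""
    common = before.keys() & after.keys()
    return sorted(i for i in common if before[i] != after[i])


def added_records(before, after):
    """Identifiers that appear only in the later snapshot."""
    return sorted(after.keys() - before.keys())
```

    Each snapshot is a dictionary mapping identifier to datestamp. Records returned by changed_datestamps whose files were not touched between the two snapshots are the ones showing the unexpected behaviour.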

    Is your OAI data provider public? If so, would you mind passing along the Base URL for me to look at?

    -john

     
  • Markus

    Markus - 2010-09-15

    Hi John,
    Thank you very much for your help.
    Here is the Base URL of our OAI Provider:
    http://china2.dwd.de:8080/oai/provider

    I will come back to your other questions later.

    Best wishes,
    Markus

     
  • Markus

    Markus - 2010-09-15

    Hi John,

    I noticed that the files which show this behaviour appear twice in the repository, with names that differ only in case:

    e.g. de.dwd.routine.meg.gme.germany.eddl.xml and de.dwd.routine.meg.gme.germany.EDDL.xml

    But I do not get an index error. I guess it may be a good strategy to insert only files whose names are unique when case is ignored, to avoid this effect.
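
    A check like this could be run before inserting files. The following is a small Python sketch with hypothetical function names; it groups file names by their case-folded form and reports every group with more than one member.

```python
import os
from collections import defaultdict


def case_collisions(names):
    """Map each case-folded name to the names sharing it, if more than one."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


def collisions_in_directory(path):
    """Apply case_collisions to the entries of one repository directory."""
    return case_collisions(os.listdir(path))
```

    An empty result means every file name in the directory is unique when case is ignored.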

    Thank you very much - you helped me a lot to understand this problem.

    Cheers,
    Markus

     
  • John Weatherley

    John Weatherley - 2010-09-15

    Hi Markus,

    Thank you for this information. I agree that an error should be thrown if the file names are the same ignoring case. I will look into that.

    I assume that your problem has been solved; if not, please let me know.

    Thanks,

    -john

     
