I harvested the following URL using JOAI
I got 167 records. However, the records don't pass xmllint. I get the message "Input:1: namespace error : Namespace prefix xsi for schemaLocation on dc is not defined". I tried to post the full message from xmllint, but SourceForge thinks that the message is spam.
I think this is a bug in JOAI. Please let me know if it is not.
This appears to be a problem with the records returned by the data provider. They are missing the namespace declaration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance".
Without this declaration, the xsi prefix is not bound, so the records are not namespace-well-formed and cannot be validated.
The root element in the records should look like this:
But instead they look like:
See this example record from the OAI-PMH specification here:
Ideally, the data provider should fix this on their end. Alternatively, you could post-process the records to add the namespace declaration yourself.
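For the post-processing route, a minimal Java sketch could patch each harvested record file by inserting the declaration into the root start tag when it is missing. It works on the raw text rather than through a parser, because a namespace-aware parser rejects these records outright. The class and method names are hypothetical, and the sketch assumes UTF-8 files, no markup inside comments before the root element, and no `>` inside the root element's attribute values.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XsiPostProcessor {
    static final String XSI_DECL =
            "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"";

    // First start tag that is not an XML declaration/processing instruction (<?)
    // or a comment/DOCTYPE (<!). Group 1 = element name, group 2 = attributes.
    // Assumption: no markup inside comments before the root, no '>' in root attribute values.
    private static final Pattern ROOT_START_TAG =
            Pattern.compile("<(?![?!])([^\\s/>]+)([^>]*)>");

    // An xmlns:xsi declaration among the root's attributes.
    private static final Pattern HAS_XSI_DECL = Pattern.compile("\\sxmlns:xsi\\s*=");

    /** Returns the record with xmlns:xsi added to its root start tag if it is missing. */
    static String addXsiDeclaration(String xml) {
        Matcher m = ROOT_START_TAG.matcher(xml);
        if (!m.find()) {
            return xml; // no element found; leave the text unchanged
        }
        if (HAS_XSI_DECL.matcher(m.group(2)).find()) {
            return xml; // the root already declares the xsi prefix itself
        }
        int insertAt = m.end(1); // just after the root element's name
        return xml.substring(0, insertAt) + " " + XSI_DECL + xml.substring(insertAt);
    }

    /** Rewrites each record file named on the command line in place (assumes UTF-8). */
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path file = Paths.get(arg);
            String xml = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            String fixed = addXsiDeclaration(xml);
            if (!fixed.equals(xml)) {
                Files.write(file, fixed.getBytes(StandardCharsets.UTF_8));
                System.out.println("fixed: " + file);
            }
        }
    }
}
```

Run it over the harvested files (for example `java XsiPostProcessor record1.xml record2.xml`); files whose root already carries the declaration are left untouched, so running it twice is harmless.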
I have looked at the XML returned by
At the top level of the document, the string xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" DOES appear. My understanding of the XML standard is that the document is not required to repeat this namespace declaration, so JOAI would have to copy it down into the individual files.
I believe you are correct that the XML standard only requires the XML schema namespace to be declared once in a given document, so from that point of view the ListRecords response is valid.
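To illustrate the distinction, here is a small Java DOM sketch run against a hypothetical, heavily trimmed ListRecords response (not the provider's actual output). The xsi prefix is in scope on the `oai_dc:dc` element because it is inherited from the `OAI-PMH` root, yet the element does not declare it itself, and that inherited declaration is exactly what is lost when the subtree is written out as its own file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XsiScopeDemo {
    // Hypothetical, trimmed ListRecords response: xmlns:xsi is declared only on
    // the OAI-PMH root element, not on the oai_dc:dc metadata root.
    static final String RESPONSE =
            "<OAI-PMH xmlns=\"http://www.openarchives.org/OAI/2.0/\""
            + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">"
            + "<ListRecords><record><metadata>"
            + "<oai_dc:dc xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\""
            + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\""
            + " xsi:schemaLocation=\"http://www.openarchives.org/OAI/2.0/oai_dc/"
            + " http://www.openarchives.org/OAI/2.0/oai_dc.xsd\">"
            + "<dc:title>Example</dc:title>"
            + "</oai_dc:dc></metadata></record></ListRecords></OAI-PMH>";

    /** Parses the response namespace-aware and returns the first oai_dc:dc element. */
    static Element metadataRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for namespace lookups
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Element) doc.getElementsByTagNameNS(
                    "http://www.openarchives.org/OAI/2.0/oai_dc/", "dc").item(0);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse response", e);
        }
    }

    public static void main(String[] args) {
        Element dc = metadataRoot(RESPONSE);
        // Inherited from the OAI-PMH ancestor, so the response as a whole is valid.
        // prints "xsi in scope: http://www.w3.org/2001/XMLSchema-instance"
        System.out.println("xsi in scope: " + dc.lookupNamespaceURI("xsi"));
        // Not declared on oai_dc:dc itself, so copying only this subtree verbatim
        // into its own file leaves the xsi prefix unbound.
        // prints "declared on dc: false"
        System.out.println("declared on dc: "
                + dc.hasAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xsi"));
    }
}
```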
The OAI-PMH protocol, however, requires that the XML schema namespace also be included in the root tag inside the <metadata> portion of the ListRecords and GetRecord responses. From section 2.5 of the protocol specification (http://www.openarchives.org/OAI/openarchivesprotocol.html#Record):
"xml schema namespace - every metadata part must include the attribute xmlns:xsi, the value of which must always be the URI shown in the example, which is the namespace URI for XML schema."
The example is shown as follows:
I believe the intention of the protocol requirements is that the contents of the <metadata> elements be complete, valid XML documents that can stand on their own without modification.
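One way to test that requirement mechanically is to parse each metadata part on its own with a namespace-aware parser, which roughly mirrors the xmllint check from the original report. This Java sketch is illustrative only; the class name and the sample records are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StandaloneRecordCheck {
    /** Returns true if the metadata part is namespace-well-formed as a document on its own. */
    static boolean parsesStandalone(String metadataXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // unbound prefixes are only errors in this mode
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrows fatal errors, prints nothing
            builder.parse(new InputSource(new StringReader(metadataXml)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. the xsi prefix on xsi:schemaLocation is not bound
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical records, identical except for the xmlns:xsi declaration.
        String withoutDecl = "<oai_dc:dc xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\""
                + " xsi:schemaLocation=\"http://www.openarchives.org/OAI/2.0/oai_dc/"
                + " http://www.openarchives.org/OAI/2.0/oai_dc.xsd\"/>";
        String withDecl = withoutDecl.replace("<oai_dc:dc ",
                "<oai_dc:dc xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" ");
        System.out.println(parsesStandalone(withoutDecl)); // false
        System.out.println(parsesStandalone(withDecl));    // true
    }
}
```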
Given that some of the repositories I am harvesting declare xmlns:xsi at the document level rather than on each record, I would like to modify how the record files are produced so that the declaration is added when necessary. Is it possible to get the source code in order to do that? If not, is that something that could be added?
Information about how to get the source code and build jOAI is available here:
When fetching from CVS, use release tag oai_v3_1_1_2 (current release).
The code that writes the metadata files out is in Harvester.java; see the method extractContent(), which starts on line 991:
It would be nice to roll this functionality back into jOAI (e.g., detect whether the schema namespace is declared in the metadata record and, if not, write it there). If you are willing, please let me know and I can assist in integrating the code back into the project.
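A sketch of that detect-and-write step, assuming the harvester has the full response available as a namespace-aware DOM. Whether jOAI's extractContent() actually works that way is not shown in this thread, so `MetadataExtractor` and `extractRecords` are hypothetical names, not jOAI code. Because xsi is still in scope in the full response, the response parses fine here; the declaration only has to be added to the copied subtree.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class MetadataExtractor {
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    /** Serializes the root element of every <metadata> part as a standalone document. */
    static List<String> extractRecords(String responseXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document response = builder.parse(new InputSource(new StringReader(responseXml)));
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            List<String> records = new ArrayList<>();
            NodeList metadataParts = response.getElementsByTagNameNS(OAI_NS, "metadata");
            for (int i = 0; i < metadataParts.getLength(); i++) {
                Element root = firstChildElement(metadataParts.item(i));
                if (root == null) {
                    continue; // empty <metadata>; nothing to write
                }
                Document record = builder.newDocument();
                Element copy = (Element) record.importNode(root, true);
                record.appendChild(copy);
                // Detect: does the metadata root declare xsi itself (not just inherit it)?
                if (!copy.hasAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xsi")) {
                    // If not, write it there, as OAI-PMH section 2.5 requires.
                    copy.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:xsi",
                            XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI);
                }
                StringWriter out = new StringWriter();
                serializer.transform(new DOMSource(record), new StreamResult(out));
                records.add(out.toString());
            }
            return records;
        } catch (ParserConfigurationException | SAXException | IOException
                | TransformerException e) {
            throw new IllegalArgumentException("cannot process response", e);
        }
    }

    /** Returns the first element child of the given node, or null if there is none. */
    private static Element firstChildElement(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        return null;
    }
}
```

Each returned string is a self-contained metadata record; hooking this into the place where the harvester currently writes record files is the integration step and is not shown here.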
I emailed you a one-line change (two lines with comments) to Harvester.java that fixes the problem described in this thread. Did you get the email? I am installing JOAI on a stripped-down machine and would appreciate a binary distribution (for some reason, the binary I compile works on the machine I compiled it on, but not on the one I want to run it on).
I got the code. I'll send separate info about obtaining a binary distribution with this included…
John - Any progress?
I'm waiting on a couple other contributions for the next release. Should be soon…
The latest release includes this enhancement.