
UIUC OAI Metadata Harvesting Project

File Release Notes and Changelog

Release Name: 3.0

Notes:
This package contains version 3.0 of a well-documented
ActiveX DLL that can be used to write programs on the Windows
platform for harvesting Open Archives Initiative (OAI) metadata
repositories. Two harvesters written using this DLL are
included.

Reap.wsf is a feature-rich, command-line harvester which has been 
in use for several years at the Grainger Engineering Library and 
Information Center at the University of Illinois at Urbana-Champaign. 
It creates a separate file for each harvested OAI record.

BigHarvest.wsf is a simpler, less well-tested, command-line 
harvester which creates one large XML file for the entire harvest.
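An OAI-PMH harvest is typically driven by repeated ListRecords requests. Each page of results may end with a resumptionToken that is sent back to fetch the next page, and an empty or missing token marks the end. The Python sketch below illustrates only that loop; it is not code from the DLL, Reap.wsf, or BigHarvest.wsf. The `fetch` callable is a hypothetical stand-in for the HTTP layer, and only the OAI-PMH 2.0 namespace is handled.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
NS = {"oai": OAI_NS}


def list_records(base_url: str, fetch: Callable[[str], str],
                 metadata_prefix: str = "oai_dc") -> Iterator[ET.Element]:
    """Yield every <record> element of a ListRecords harvest.

    `fetch` is a hypothetical callable: it takes a request URL and returns
    the response body as text (for example, a thin wrapper around urllib).
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        error = root.find("oai:error", NS)
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # an empty result set, not a failure
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        list_elem = root.find("oai:ListRecords", NS)
        if list_elem is None:
            raise ValueError("response contains neither ListRecords nor error")
        yield from list_elem.findall("oai:record", NS)
        token = list_elem.find("oai:resumptionToken", NS)
        # A missing or empty resumptionToken marks the final page.
        if token is None or not (token.text or "").strip():
            return
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

A Reap-style harvester could write each yielded record to its own file, while a BigHarvest-style one could append them all to a single output document; the paging loop is the same in both cases.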


Changes:
# The format of the string returned by the AllObjects.Class property of all classes has been changed to a concatenation of App.Title, TypeName, App.Major, App.Minor, and the SourceSafe revision. This string is now used in all error messages.
# When an object is first instantiated, a check is run to ensure that all dependent libraries are present. If a dependency is missing, a (hopefully useful) message is displayed.
# In general, error handling has improved, particularly for non-compliant OAI repositories. For details, see the change log for each file.
# Support has been added for the optional provenance container that may be present in OAI records. The following properties and methods have been added to OAIRecordObj to support provenance:
  * OAIRecordObj.HasProvenance returns True if the record has a provenance container; otherwise, it returns False.
  * OAIRecordObj.OriginalRecord: if the record has a provenance container, returns an OAIRecordObj for the original record as indicated by the provenance.
  * OAIRecordObj.OriginalRepository: if the record has a provenance container, returns an OAIRepositoryObj for the original repository as indicated by the provenance.
  * OAIRecordObj.OriginalRepositoryBaseURL: if the record has a provenance container, returns the baseURL of the original repository as indicated by the provenance.
# There is a new OAIRecordObj.StrippedRecord property, which returns an MSXML2.DOMDocument2 object containing a version of the record with all namespaces stripped out. This supports some older XML-based databases that lack namespace support.
# The OAIRegRepoListObj.CurrentRepository property now supports two optional parameters, usr and pwd, so that repositories requiring a user ID and password can be instantiated using this property.
# The OAIRegRepoListObj class now supports several ListFriends formats, not just the one returned by http://www.openarchives.org/Register/ListFriends. These include plain-text lists of baseURLs, the list format used by OLAC, and the format used by the optional friends container in an Identify response. OAIRegRepoListObj.ListFriendsURL can be set to any of these types of lists.
# Various new properties and methods have been added to the OAIRegRepoListObj class to set or retrieve HTTP headers that control how the friends list is retrieved, such as OAIRegRepoListObj.From, OAIRegRepoListObj.UserAgent, OAIRegRepoListObj.SetBasicAuthorization, and OAIRegRepoListObj.Get/SetTimeouts.
# There is a new OAIRepositoryObj.ListFriends property that can be used to instantiate an OAIRegRepoListObj from a repository that includes the optional friends container in its Identify response.
# The class library now supports compression through a new property, OAIRepositoryObj.UseCompression, which can be set to True or False to indicate whether compression should be used for repositories that support it. The default is False. The gzip and deflate compression schemes are currently supported via zlib.dll. There are also a number of new error codes that may occur due to compression errors.
# The class library has improved support for HTTPS (secure HTTP).
  There is a new property, OAIRepositoryObj.IgnoreServerCertErrors, which can be set to a combination of:
  * SXH_SERVER_CERT_IGNORE_NONE (0) to not ignore any errors (the default)
  * SXH_SERVER_CERT_IGNORE_UNKNOWN_CA (256) to ignore unknown certificate authorities
  * SXH_SERVER_CERT_IGNORE_WRONG_USAGE (512) to ignore incorrect certificate usage
  * SXH_SERVER_CERT_IGNORE_CERT_CN_INVALID (4096) to ignore invalid certificate names
  * SXH_SERVER_CERT_IGNORE_CERT_DATE_INVALID (8192) to ignore invalid certificate dates
  * SXH_SERVER_CERT_IGNORE_ALL_SERVER_ERRORS (13056) to ignore all of the above
# When OAIRepositoryObj.ResetBaseURLFromIdentifyResponse is set to True, the new property OAIRepositoryObj.OriginalBaseURL returns the URL as it was originally set by the OAIRepositoryObj.BaseURL property.
# There are two new properties for retrieving the HTTP headers of the most recent response. OAIRepositoryObj.LastResponseServer returns just the Server header value. OAIRepositoryObj.LastResponseHTTPHeaders returns a string containing all of the HTTP headers returned as part of the last response.
# There is a new property, OAIRepositoryObj.IdentifyNamespaceURI, which returns the XML namespace of the root element of the Identify response. This can be useful for identifying which version of the protocol is being used, or whether the response is valid at all.
# The included reap.wsf command-line harvesting script has undergone significant development. It now supports about 25 different parameters for very flexible and robust harvesting, and it can be scheduled using the Windows Task Scheduler to automate both full and incremental harvests. For details, see the REAP ReadMe file.
# There is a new command-line harvester called BigHarvest.wsf. This is a less developed harvester that writes harvest results to standard output as one large XML file. It is less well-tested and would probably have difficulty with non-conformant OAI data providers, but it is simple and could easily be customized to meet local needs.
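The provenance container comes from the OAI-PMH implementation guidelines: inside a record's about section, a provenance element holds an originDescription with the source baseURL and identifier, and nested originDescription elements describe earlier harvests. The Python sketch below shows one way to find the original source of a record. It assumes the innermost originDescription names the original repository, which is a reasonable reading of the guidelines but not a description of how OAIRecordObj.OriginalRepositoryBaseURL is actually computed; `original_source` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

PROV_NS = "http://www.openarchives.org/OAI/2.0/provenance"
NS = {"prov": PROV_NS}


def original_source(record_xml: str) -> Optional[Tuple[str, str]]:
    """Return (baseURL, identifier) of the earliest source named in a record's
    provenance container, or None if the record has no provenance."""
    root = ET.fromstring(record_xml)
    desc = root.find(".//prov:provenance/prov:originDescription", NS)
    if desc is None:
        return None  # the equivalent of HasProvenance being False
    # Assumption: each nested originDescription describes an earlier harvest,
    # so the innermost one names the original repository.
    while (inner := desc.find("prov:originDescription", NS)) is not None:
        desc = inner
    base_url = desc.findtext("prov:baseURL", default="", namespaces=NS).strip()
    identifier = desc.findtext("prov:identifier", default="", namespaces=NS).strip()
    return base_url, identifier
```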
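OAIRecordObj.StrippedRecord is built on MSXML inside the DLL. As a rough Python analogue of the same transformation (not the DLL's code), the sketch below uses xml.etree.ElementTree, which stores namespaced names as '{uri}local'; stripping namespaces then amounts to dropping the '{uri}' prefix from every element and attribute name. Any approach of this kind shares one caveat: two attributes that differ only by namespace collapse into a single name.

```python
import xml.etree.ElementTree as ET


def local_name(name: str) -> str:
    """Turn '{uri}local' into 'local'; names without a namespace pass through."""
    return name.split("}", 1)[1] if name.startswith("{") else name


def strip_namespaces(xml_text: str) -> ET.Element:
    """Parse xml_text and remove namespaces from all element and attribute names."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = local_name(elem.tag)
        # Namespaced attributes such as xsi:schemaLocation lose their prefix too;
        # if two attributes share a local name, the last one wins.
        attrs = {local_name(k): v for k, v in elem.attrib.items()}
        elem.attrib.clear()
        elem.attrib.update(attrs)
    return root
```

Serializing the result with ET.tostring(..., encoding="unicode") yields markup with no xmlns declarations or prefixes, which is the shape a namespace-unaware database would expect.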
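These notes do not say exactly what OAIRegRepoListObj.SetBasicAuthorization, or the usr and pwd parameters of CurrentRepository, send on the wire. Assuming they use the standard HTTP Basic scheme (RFC 7617), the Authorization header value would be built as in the Python sketch below; `basic_authorization` is a hypothetical helper name, not part of the DLL.

```python
import base64


def basic_authorization(user: str, password: str) -> str:
    """Value for an HTTP 'Authorization' header using the Basic scheme."""
    # The scheme is "Basic " followed by base64("user:password").
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token
```

Because Basic credentials are only base64-encoded, not encrypted, they are best sent to repositories over HTTPS.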
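Two of the ListFriends formats mentioned in these notes are easy to illustrate: a plain-text list with one baseURL per line, and the friends container (namespace http://www.openarchives.org/OAI/2.0/friends/) that can appear in the description section of an Identify response. The Python sketch below handles just those two. The OLAC and Register/ListFriends formats are not covered, and `parse_friends` is a hypothetical helper rather than part of OAIRegRepoListObj.

```python
import xml.etree.ElementTree as ET
from typing import List

FRIENDS_NS = "http://www.openarchives.org/OAI/2.0/friends/"


def parse_friends(text: str) -> List[str]:
    """Return baseURLs from a plain-text list or from XML holding a friends container."""
    stripped = text.lstrip()
    if not stripped.startswith("<"):
        # Plain text: every non-blank line is taken as one baseURL.
        return [line.strip() for line in text.splitlines() if line.strip()]
    root = ET.fromstring(stripped)
    # Only baseURLs in the friends namespace count; an Identify response's own
    # <baseURL> is in the OAI namespace and is therefore ignored.
    return [(el.text or "").strip()
            for el in root.iter(f"{{{FRIENDS_NS}}}baseURL")
            if (el.text or "").strip()]
```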
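HTTP compression of the kind UseCompression enables normally works in two steps: the client lists the schemes it accepts in an Accept-Encoding request header, then reverses whichever scheme the server names in the Content-Encoding response header. The Python sketch below shows the decoding step with the standard gzip and zlib modules standing in for zlib.dll; `decode_body` and `REQUEST_HEADERS` are illustrative names, not the DLL's API. It also tolerates servers that label raw deflate data as "deflate", a common interoperability wrinkle.

```python
import gzip
import zlib
from typing import Optional

# Header a client would send to advertise the two supported schemes.
REQUEST_HEADERS = {"Accept-Encoding": "gzip, deflate"}


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo a gzip or deflate Content-Encoding on an HTTP response body."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding in ("identity", ""):
        return body
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            # Standards-conformant "deflate" is zlib-wrapped (RFC 1950).
            return zlib.decompress(body)
        except zlib.error:
            # Some servers send raw deflate data (RFC 1951) instead.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    raise ValueError(f"unsupported Content-Encoding: {content_encoding}")
```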
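The IgnoreServerCertErrors values are bit flags, so "some combination" means a bitwise OR, and the all-errors value of 13056 is exactly the OR (here also the sum) of the four individual flags. The short Python check below reproduces that arithmetic using the constant values listed in these notes; the class and member names merely mirror the SXH_* names and are not code from the DLL.

```python
from enum import IntFlag


class ServerCertIgnore(IntFlag):
    """Mirrors the SXH_SERVER_CERT_IGNORE_* values listed in the release notes."""
    NONE = 0
    UNKNOWN_CA = 256           # 0x0100
    WRONG_USAGE = 512          # 0x0200
    CERT_CN_INVALID = 4096     # 0x1000
    CERT_DATE_INVALID = 8192   # 0x2000
    ALL_SERVER_ERRORS = UNKNOWN_CA | WRONG_USAGE | CERT_CN_INVALID | CERT_DATE_INVALID


# Ignore only name mismatches and invalid certificate dates.
flags = ServerCertIgnore.CERT_CN_INVALID | ServerCertIgnore.CERT_DATE_INVALID
print(int(flags))                                  # 12288
print(int(ServerCertIgnore.ALL_SERVER_ERRORS))     # 13056
print(bool(flags & ServerCertIgnore.UNKNOWN_CA))   # False
```

In a script, the resulting integer (12288 in this example) is the value that would be assigned to the property.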
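LastResponseServer can be thought of as a lookup into the same header text that LastResponseHTTPHeaders returns. As a sketch, assuming that text is a block of CRLF-separated "Name: value" lines (the format MSXML's getAllResponseHeaders produces), the Server value could be extracted in Python as below. Folded multi-line header values are not handled, and both function names are hypothetical.

```python
from typing import Dict, List


def parse_header_block(raw: str) -> Dict[str, List[str]]:
    """Split a raw 'Name: value' header block into a case-insensitive multimap."""
    headers: Dict[str, List[str]] = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            continue  # skips blank lines and lines without a colon, e.g. a status line
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return headers


def server_header(raw: str) -> str:
    """Return the first Server header value, or '' if there is none."""
    values = parse_header_block(raw).get("server", [])
    return values[0] if values else ""
```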
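IdentifyNamespaceURI turns protocol-version detection into a string comparison. The Python sketch below derives the root namespace of a response with ElementTree and classifies it. The 2.0 namespace is the single namespace defined by OAI-PMH 2.0; treating anything under .../OAI/1.0/ or .../OAI/1.1/ as the older versions is an assumption based on their per-verb schema namespaces. Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def root_namespace(xml_text: str) -> str:
    """Namespace URI of the document's root element ('' if it has none)."""
    tag = ET.fromstring(xml_text).tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def guess_protocol_version(namespace_uri: str) -> str:
    """Classify an Identify root namespace as an OAI-PMH version, or 'unknown'."""
    if namespace_uri == "http://www.openarchives.org/OAI/2.0/":
        return "2.0"
    # Assumed: 1.x responses use per-verb namespaces under these prefixes.
    for version in ("1.1", "1.0"):
        if namespace_uri.startswith(f"http://www.openarchives.org/OAI/{version}/"):
            return version
    return "unknown"  # possibly not a valid OAI response at all
```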
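Incremental harvesting in OAI-PMH rests on the optional `from` argument of ListRecords: only records whose datestamp is on or after that value are returned. The Python sketch below builds the first request of a full or incremental harvest. It is not how reap.wsf is implemented (its parameters are documented in the REAP ReadMe file), and `list_records_url` is a hypothetical helper. Whether a repository accepts second-level timestamps is stated by the granularity element of its Identify response.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     last_harvest: Optional[datetime] = None,
                     seconds_granularity: bool = False) -> str:
    """Build the first ListRecords request for a full or incremental harvest.

    A full harvest omits `from`; an incremental one asks only for records
    whose datestamp is on or after the previous harvest time, in UTC.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if last_harvest is not None:
        ts = last_harvest.astimezone(timezone.utc)
        params["from"] = (ts.strftime("%Y-%m-%dT%H:%M:%SZ") if seconds_granularity
                          else ts.strftime("%Y-%m-%d"))
    return base_url + "?" + urlencode(params)
```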