METSBUILDER README
=================

Contents
---------

1.	General Information
2.	METSbuilder inputs/outputs
3.	METSbuilder code structure
4.	Development Resources



1.	General Information
====================

METSbuilder was created during the JISC/RLUK 19th Century Pamphlets digitisation project, which was part of the JISC Digitisation Programme Phase 2.

http://www.jisc.ac.uk/whatwedo/programmes/digitisation/pamphlets.aspx

The tool takes scanned TIFF images and produces a digital object comprising those images, OCR text generated from them by a third-party package, and metadata. The metadata combines technical information extracted from the files with pre-existing bibliographic metadata for the source objects.

The tool is supplied as is, direct from the production environment of the BOPCRIS Digitisation Centre at the University of Southampton, where the digitisation for the project took place. The tool has not been modified to remove any of the project-specific code, such as database interfaces and local filenaming conventions, except to remove database connection strings for reasons of security. The project initially hoped to be able to modify the tool for more generic use, but due to constraints of time this was not possible.

The core of the tool was designed to be flexible, so that other projects can modify the specifics of the tool for their own implementation. Because the tool is still at a developmental stage, this will require a .NET developer.



2.	METSbuilder inputs/outputs
==========================

Inputs
-------

METSbuilder is designed to work with files of any type. 

It can extract technical image metadata from image files. This includes the extraction of information from TIFF headers.
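
As an illustration of what reading a TIFF header involves, here is a minimal sketch in Python. It is not METSbuilder's own code (which is VB.NET, in ImageFunctions.vb); the function name read_tiff_tags is hypothetical. It assumes a baseline TIFF and reads only the image width and length from the first image file directory (IFD).

```python
import struct

# Baseline TIFF tag numbers for the two values read in this sketch.
TAG_NAMES = {256: "ImageWidth", 257: "ImageLength"}


def read_tiff_tags(data: bytes) -> dict:
    """Return ImageWidth/ImageLength from the first IFD of a TIFF byte string.

    Hypothetical helper for illustration; not part of METSbuilder.
    """
    # Bytes 0-1 give the byte order: "II" little-endian, "MM" big-endian.
    order = data[:2]
    if order == b"II":
        endian = "<"
    elif order == b"MM":
        endian = ">"
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")

    # Bytes 2-3 hold the magic number 42, bytes 4-7 the offset of the first IFD.
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a TIFF file: magic number is not 42")

    # The IFD starts with a 2-byte entry count, followed by 12-byte entries:
    # tag (2 bytes), field type (2), value count (4), value or offset (4).
    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    found = {}
    for i in range(entry_count):
        pos = ifd_offset + 2 + 12 * i
        tag, field_type, count = struct.unpack_from(endian + "HHI", data, pos)
        if tag not in TAG_NAMES or count != 1:
            continue
        if field_type == 3:      # SHORT: value sits in the first 2 bytes
            fmt = "H"
        elif field_type == 4:    # LONG: value uses all 4 bytes
            fmt = "I"
        else:
            continue
        (value,) = struct.unpack_from(endian + fmt, data, pos + 8)
        found[TAG_NAMES[tag]] = value
    return found
```

Typical use would be read_tiff_tags(open(path, "rb").read()). A production extractor would also need to handle further tags, multiple IFDs and values stored by offset.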

Outputs
--------

The primary output from METSbuilder is a METS document.

The METS document contains technical image metadata in MIX/Z39.87 format, and preservation/technical file metadata in PREMIS, including MD5 and SHA-1 checksums for all component files.
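
To show what the checksum step amounts to, here is a minimal sketch in Python using the standard hashlib module. It is an illustration only, not the code in FileChecksum.vb; the function name file_checksums and the chunk size are assumptions.

```python
import hashlib


def file_checksums(path, chunk_size=65536):
    """Compute MD5 and SHA-1 digests of a file in one pass.

    Hypothetical helper for illustration; not part of METSbuilder.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as fh:
        # Read in chunks so large master TIFFs are not loaded into memory whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {"MD5": md5.hexdigest(), "SHA-1": sha1.hexdigest()}
```

The returned hex strings are the kind of values that would be recorded against each component file in the PREMIS metadata.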

Code has also been provided to include a mapping of plain text files (TAB or comma delimited) to Dublin Core.
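
The idea of mapping delimited text to Dublin Core can be sketched as follows in Python. This is not the mapping in Biblio.vb: the column names, the COLUMN_TO_DC table, the wrapping "record" element and the function name are all assumptions made for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from input column headings to Dublin Core element names.
COLUMN_TO_DC = {"Title": "title", "Author": "creator", "Year": "date"}


def rows_to_dublin_core(text, delimiter="\t"):
    """Turn delimited text (with a header row) into one XML element per row.

    Hypothetical helper for illustration; not part of METSbuilder.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    records = []
    for row in reader:
        record = ET.Element("record")  # wrapper element name is an assumption
        for column, dc_name in COLUMN_TO_DC.items():
            value = (row.get(column) or "").strip()
            if value:  # skip empty cells rather than emit empty elements
                ET.SubElement(record, f"{{{DC_NS}}}{dc_name}").text = value
        records.append(record)
    return records
```

Pass delimiter="," for comma-delimited input. Each returned element can be serialised with ET.tostring and wrapped in a METS descriptive metadata section.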

Further code has been provided that will locate XML metadata, for example in MODS, and include that in the METS document.

Additional Functionality
----------------------

Ability to compress TIFF images.

Ability to derive JPEG images from master TIFFs.

Ability to package all metadata and component files into a ZIP file (with third-party, freeware class).
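
The packaging step can be pictured with this minimal Python sketch using the standard zipfile module. METSbuilder itself uses a third-party freeware class for this; the function name package_directory is hypothetical, and the sketch assumes the ZIP file is written outside the directory being packaged.

```python
import os
import zipfile


def package_directory(source_dir, zip_path):
    """Write every file under source_dir into a ZIP, keeping relative paths.

    Hypothetical helper for illustration; not part of METSbuilder.
    Assumes zip_path lies outside source_dir.
    """
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Store paths relative to the object directory, with "/" separators.
                arc = os.path.relpath(full, source_dir).replace(os.sep, "/")
                zf.write(full, arc)
                names.append(arc)
    return names
```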




3.	METSbuilder code structure
=========================

METSbuilder is a Visual Basic .NET 2005 project (.NET Framework 2.x).

The core module is METSbuilder.vb, which drives the creation of the METS document. The core class is METS.vb, which contains METS.Document. This writes the METS XML. It has interfaces for metadata at the object level (generally bibliographic metadata) and the page level (generally technical image and file metadata). For a new project, new classes can be written that implement these interfaces and source additional metadata in any way you choose for inclusion in a METS document.

Biblio.vb contains a class that implements the object-level interface of METS.Document, mapping plain text to Dublin Core.

TechMD.vb contains a class that wraps the technical metadata extractors in ImageFunctions.vb and implements the page-level interface. It uses the mappings from technical metadata to MIX/Z39.87 and PREMIS that appear in MIX.vb and PREMIS.vb. Checksums are generated using code in FileChecksum.vb.

FileCrawler.vb can traverse nested directories and reorganise, compress or derive image files.

Config.vb loads configuration files that control the operation of the tool, from defining file groups and their relationships to file locations. The configuration files are located in config\.
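
As a rough picture of directory traversal combined with file groups, here is a minimal Python sketch. It does not reflect the actual behaviour of FileCrawler.vb or the format of the files in config\; the FILE_GROUPS table, the group names and the function name crawl are assumptions for the example.

```python
from pathlib import Path

# Hypothetical file groups keyed by extension; METSbuilder defines its
# groups in configuration files whose format is not shown here.
FILE_GROUPS = {
    ".tif": "master",
    ".tiff": "master",
    ".jpg": "derivative",
    ".txt": "ocr",
}


def crawl(root):
    """Walk nested directories under root and sort files into groups.

    Hypothetical helper for illustration; not part of METSbuilder.
    """
    groups = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        group = FILE_GROUPS.get(path.suffix.lower())
        if group is None:
            continue  # ignore files that belong to no configured group
        groups.setdefault(group, []).append(path.relative_to(root).as_posix())
    return groups
```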

All classes with names starting "prj" are project-specific and will require updating to use local filenaming conventions etc.

Unfortunately, detailed documentation is not available: this code is provided from the production environment in which it was created, and there was no time during the project to produce any. If you have any queries or require assistance with your own developments of METSbuilder, please feel free to contact the developer via SourceForge. I hope that this code may prove useful for your digitisation project.




4.	Development Resources
=======================

By way of indication, METSbuilder has been modified for use in production by two other projects. In each case the work was completed within one full-time week (35 hours) of development and testing.

For further information please contact the developer via SourceForge.
Source: METSbuilder-README.txt, updated 2009-10-12