10_-_Creating_a_metadata_package_wrapper

Now we need to implement the metadata wrapper. We know that:

The metadata will require the department name and the user name.
All Xena metadata wrappers should store some kind of ID to use when we denormalise. The exception to this is the empty metadata wrapper.

What does the metadata wrapper actually do?

The metadata wrapper is responsible for two things:

writing metadata
extracting metadata from files normalised using that particular metadata wrapper.

But what if I don't want any metadata? For that case Xena has a built-in empty metadata wrapper that doesn't write anything. Its name is Empty Metadata Wrapper and its class is:

au.gov.naa.digipres.xena.kernel.metadatawrapper.EmptyWrapper

The wrapper's name is available as a public static final String in the MetaDataWrapperManager, EMPTY_WRAPPER_NAME, and the wrapper can be set programmatically with code such as:

xena.setActiveMetaDataWrapperPlugin(MetaDataWrapperManager.EMPTY_WRAPPER_NAME);

Extend the DemoInfoProvider

Before we start work on our metadata wrapper, we will first extend our DemoInfoProvider to provide the department name. Like the user name and department code, this will be a random choice from a list of department names. Here is the updated DemoInfoProvider.java file:

package au.gov.naa.digipres.xena.demo.orgx;

import java.util.Random;

public class DemoInfoProvider implements InfoProvider {

    private String userName;
    private String departmentCode;
    private String departmentName;

    private String randomUserNames[] = {"Homer", "Karl", "Kenny", "Monty Burns", "Smithers"};
    private String randomDepartmentNames[] = {"Sector 7G", "Sector 7A", "Corporate", "Administration"}; 
    private String randomDepartmentCodes[] = {"S7G", "S7A", "COR", "ADM"};

    private Random random = new Random();

    /**
     * Return the username if it is set, or a random one from randomUserNames if it is not.
     */
    public String getUserName() {
        if (userName == null) {
            userName = randomUserNames[random.nextInt(randomUserNames.length)];
        }
        return userName;
    }

    /**
     * Return the departmentCode if it is set, or a random one from randomDepartmentCodes if it is not.
     * @return the department code.
     */
    public String getDepartmentCode() {
        if (departmentCode == null) {
            departmentCode = randomDepartmentCodes[random.nextInt(randomDepartmentCodes.length)];
        }
        return departmentCode;
    }

    /**
     * @return Returns the departmentName.
     */
    public String getDepartmentName() {
        if (departmentName == null) {
            departmentName = randomDepartmentNames[random.nextInt(randomDepartmentNames.length)];
        }
        return departmentName;
    }

}
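The "choose once, then reuse" pattern in getDepartmentName() is what lets several components share one provider and see the same value. Here is a minimal, self-contained sketch of that pattern on its own. The class name CachedChoiceSketch and its get() method are hypothetical and are not part of Xena or the demo plugin:

```java
import java.util.Random;

// Hypothetical stand-in, not part of Xena: picks a random value on first use
// and returns that same value on every later call.
final class CachedChoiceSketch {
    private final String[] choices;
    private final Random random = new Random();
    private String chosen; // stays null until the first call to get()

    CachedChoiceSketch(String... choices) {
        this.choices = choices.clone();
    }

    String get() {
        if (chosen == null) {
            chosen = choices[random.nextInt(choices.length)];
        }
        return chosen;
    }
}
```

Calling get() twice on the same instance returns the same string, whereas two separate instances may disagree. This is why the same provider object needs to be handed to every component that uses it.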

Create the metadata wrapper

The metadata wrapper object extends the AbstractMetaDataWrapper, and since it is a concrete class, it must implement the abstract methods:

package au.gov.naa.digipres.xena.demo.orgx;

import au.gov.naa.digipres.xena.kernel.XenaException;
import au.gov.naa.digipres.xena.kernel.XenaInputSource;
import au.gov.naa.digipres.xena.kernel.metadatawrapper.AbstractMetaDataWrapper;

public class OrgXMetaDataWrapper extends AbstractMetaDataWrapper {

    @Override
    public String getOpeningTag() {
        //Default method stub
        return null;
    }

    @Override
    public String getSourceId(XenaInputSource input) throws XenaException {
        //Default method stub
        return null;
    }

    @Override
    public String getSourceName(XenaInputSource input) throws XenaException {
        //Default method stub
        return null;
    }

    @Override
    public String getName() {
        //Default method stub
        return null;
    }

}

Create static constants

We need to make static final constants for all the things that aren't going to change. A quick chat to the Organisation X metadata standards steering committee reveals the following tag names:

OPENING TAG: orgx
METADATA SECTION: meta
DEPARTMENT NAME: department
USER NAME: user_name
XENA ID: input_name
ORGANISATION X ID: orgx_id

So with this information, we can implement the tag names and make the constants. We also add a constant for the record_data tag, which will hold the normalised content:

public class OrgXMetaDataWrapper extends AbstractMetaDataWrapper {

    public static final String ORGX_OPENING_TAG = "orgx";    
    public static final String ORGX_META_TAG = "meta";    
    public static final String ORGX_DEPARTMENT_TAG = "department";    
    public static final String ORGX_USER_TAG = "user_name";    
    public static final String ORGX_INPUT_NAME_TAG = "input_name";    
    public static final String ORGX_CONTENT_TAG = "record_data";   
    public static final String ORGX_ID_TAG = "orgx_id";

    @Override
    public String getOpeningTag() {
        //Default method stub
        return null;
    }

    @Override
    public String getSourceId(XenaInputSource input) throws XenaException {
        //Default method stub
        return null;
    }

    @Override
    public String getSourceName(XenaInputSource input) throws XenaException {
        //Default method stub
        return null;
    }

    @Override
    public String getName() {
        //Default method stub
        return null;
    }

}

Implement the methods

In the Xena util package, there is an object called the TagContentFinder. It has a single static method, which finds the content of a specified tag within an XML document. Using this method, we can look up a given tag in a single line. So, let's have a go at implementing these methods:

    @Override
    public String getOpeningTag() {
        return ORGX_OPENING_TAG;
    }

    @Override
    public String getSourceId(XenaInputSource input) throws XenaException {
        return TagContentFinder.getTagContents(input, ORGX_ID_TAG);
    }

    @Override
    public String getSourceName(XenaInputSource input) throws XenaException {
        return TagContentFinder.getTagContents(input, ORGX_INPUT_NAME_TAG);
    }

    @Override
    public String getName() {
        return "OrgX Meta Data Wrapper";
    }
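To give a feel for what a tag lookup like this involves, here is a small sketch using only the JDK's SAX parser. It is not Xena's TagContentFinder: the class name TagLookupSketch and the method firstTagContents are hypothetical, and it takes the XML as a String rather than a XenaInputSource.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not Xena's TagContentFinder: returns the text of the
// first element named tagName, or null if there is no such element.
final class TagLookupSketch {

    static String firstTagContents(String xml, String tagName) throws Exception {
        Finder finder = new Finder(tagName);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), finder);
        return finder.found ? finder.text.toString() : null;
    }

    private static final class Finder extends DefaultHandler {
        private final String tagName;
        private final StringBuilder text = new StringBuilder();
        private boolean inside; // true while we are within the wanted element
        private boolean found;  // true once the wanted element has been closed

        Finder(String tagName) {
            this.tagName = tagName;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (!found && qName.equals(tagName)) {
                inside = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inside) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (inside && qName.equals(tagName)) {
                inside = false;
                found = true;
            }
        }
    }
}
```

The sketch assumes the tag is not nested inside an element of the same name, which holds for the flat metadata section used in this tutorial.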

The AbstractMetaDataWrapper object extends the default SAX filter implementation, XMLFilterImpl. By default, XMLFilterImpl implements all the XMLFilter interface methods by passing each event on unchanged to the next handler. When a document is to be normalised:

1. The metadata wrapper is obtained.

2. The startDocument method is called.

3. The normaliser.parse() method is called on the input source.

4. After the normaliser is done, the endDocument method is called.

In order for us to actually write some metadata, the metadata wrapper must override the startDocument and endDocument methods. Properties are set for the metadata wrapper to reference the normaliser. When normalising, we call MetaDataWrapper.startDocument(), then normaliser.parse(), then MetaDataWrapper.endDocument().

The metadata will be created as follows:

1. Write the opening tag.

2. Open the metadata tag.

3. Close the metadata tag.

4. Open the record content tag.

5. Normalise the input source.

6. Close the record content tag and the opening tag.

To do all this, we will set up the content handler by calling super.startDocument(), which goes all the way up to the default XML filter implementation. Then we will get our content handler and use it to write out our information. We are not using namespaces for this demonstration, so the namespace string is always set to null and we use unqualified tag names, which means the local and qualified names are the same. Since none of these elements has attributes at the moment, we can create a single empty AttributesImpl object and pass it to all the startElement calls:

    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
        ContentHandler th = getContentHandler();
        AttributesImpl att = new AttributesImpl();
        th.startElement(null, ORGX_OPENING_TAG, ORGX_OPENING_TAG, att);
        th.startElement(null, ORGX_META_TAG, ORGX_META_TAG, att);
        th.endElement(null, ORGX_META_TAG, ORGX_META_TAG);
        th.startElement(null, ORGX_CONTENT_TAG, ORGX_CONTENT_TAG, att); 
    }

    @Override
    public void endDocument() throws SAXException {
        ContentHandler th = getContentHandler();
        th.endElement(null, ORGX_CONTENT_TAG, ORGX_CONTENT_TAG);
        th.endElement(null, ORGX_OPENING_TAG, ORGX_OPENING_TAG);
        super.endDocument();
    }
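To see the effect of these two methods without running Xena, here is a self-contained sketch. It assumes a plain XMLFilterImpl subclass in place of AbstractMetaDataWrapper and a simple recording handler in place of Xena's output. The names WrapSketch, Envelope and Recorder are hypothetical. The sketch also passes an empty string rather than null as the namespace URI, which is the value the SAX documentation specifies for elements with no namespace.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

final class WrapSketch {

    // Records SAX events as a compact XML-like string so the order is visible.
    static final class Recorder extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.append('<').append(qName).append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }
    }

    // Same shape as the wrapper methods above, without the Xena base class.
    static final class Envelope extends XMLFilterImpl {
        private static final AttributesImpl NONE = new AttributesImpl();

        @Override
        public void startDocument() throws SAXException {
            super.startDocument();
            ContentHandler th = getContentHandler();
            th.startElement("", "orgx", "orgx", NONE);
            th.startElement("", "meta", "meta", NONE);
            th.endElement("", "meta", "meta");
            th.startElement("", "record_data", "record_data", NONE);
        }

        @Override
        public void endDocument() throws SAXException {
            ContentHandler th = getContentHandler();
            th.endElement("", "record_data", "record_data");
            th.endElement("", "orgx", "orgx");
            super.endDocument();
        }
    }

    static String wrap(String payload) throws SAXException {
        Recorder recorder = new Recorder();
        Envelope envelope = new Envelope();
        envelope.setContentHandler(recorder);

        envelope.startDocument();
        // Stand-in for normaliser.parse(): send the payload through the filter.
        char[] chars = payload.toCharArray();
        envelope.characters(chars, 0, chars.length);
        envelope.endDocument();
        return recorder.out.toString();
    }
}
```

Calling WrapSketch.wrap("DATA") returns `<orgx><meta></meta><record_data>DATA</record_data></orgx>`: the metadata section is written before the content, and the closing tags are written after it.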

To retrieve the information for our metadata we will use our InfoProvider interface, just like we did for the file namer. The same DemoInfoProvider object will be passed to both the OrgXFileNamer and the OrgXMetaDataWrapper, ensuring that the (random) information returned will be the same for each. The cool thing about this is that when we completely change the way we retrieve the information in step 11, we won't have to change the structure of the startDocument method, since the only thing that will change is within the metadata tags. So, let's do it!

    private InfoProvider myInfoProvider = null;

    public InfoProvider getInfoProvider() {
        return myInfoProvider;
    }

    public void setInfoProvider(InfoProvider infoProvider) {
        myInfoProvider = infoProvider;
    }

    @Override
    public void startDocument() throws SAXException {
        // Read each value once so the array and its length always match
        char[] departmentName = getInfoProvider().getDepartmentName().toCharArray();
        char[] userName = getInfoProvider().getUserName().toCharArray();

        super.startDocument();
        ContentHandler th = getContentHandler();
        AttributesImpl att = new AttributesImpl();
        th.startElement(null, ORGX_OPENING_TAG, ORGX_OPENING_TAG, att);
        th.startElement(null, ORGX_META_TAG, ORGX_META_TAG, att);

        // The department name
        th.startElement(null, ORGX_DEPARTMENT_TAG, ORGX_DEPARTMENT_TAG, att);
        th.characters(departmentName, 0, departmentName.length);
        th.endElement(null, ORGX_DEPARTMENT_TAG, ORGX_DEPARTMENT_TAG);

        // The user name
        th.startElement(null, ORGX_USER_TAG, ORGX_USER_TAG, att);
        th.characters(userName, 0, userName.length);
        th.endElement(null, ORGX_USER_TAG, ORGX_USER_TAG);

        th.endElement(null, ORGX_META_TAG, ORGX_META_TAG);
        th.startElement(null, ORGX_CONTENT_TAG, ORGX_CONTENT_TAG, att);
    }

Generate the Organisation X ID

Now all we need is the code to generate the Organisation X ID and the input ID. For the moment we will construct the Organisation X ID by combining the filename, department name and user name. But therein lies a problem: what do we call the file? Should the whole URI source string be used, or only the filename component? Or should some of the folders be included? Fortunately Xena has a facility to set a base path, so that when a file is normalised we can get the name of the file relative to that path. If this doesn't work, Xena will simply return the whole name from the URI. This is done in several places in Xena, so it has been added to a utility class, SourceURIParser, which is the class called in the code below.

Since we will be using the department and user names, the code has been slightly refactored:

    @Override
    public void startDocument() throws SAXException {
        String departmentName = getInfoProvider().getDepartmentName();
        String userName = getInfoProvider().getUserName();
        String fileName = "";
        try {
            XenaInputSource xis = (XenaInputSource)getProperty("http://xena/input");
            if (xis != null) {
                fileName = SourceURIParser.getRelativeSystemId(xis, metaDataWrapperManager.getPluginManager());
            }
        } catch (SAXException saxe) {
            fileName = "Unknown";
        }

        super.startDocument();
        ContentHandler th = getContentHandler();
        AttributesImpl att = new AttributesImpl();
        th.startElement(null, ORGX_OPENING_TAG, ORGX_OPENING_TAG, att);
        th.startElement(null, ORGX_META_TAG, ORGX_META_TAG, att);

        // department name
        th.startElement(null, ORGX_DEPARTMENT_TAG, ORGX_DEPARTMENT_TAG, att);
        th.characters(departmentName.toCharArray(), 0, departmentName.toCharArray().length);
        th.endElement(null, ORGX_DEPARTMENT_TAG, ORGX_DEPARTMENT_TAG);

        // user name
        th.startElement(null, ORGX_USER_TAG, ORGX_USER_TAG, att);
        th.characters(userName.toCharArray(), 0, userName.toCharArray().length);
        th.endElement(null, ORGX_USER_TAG, ORGX_USER_TAG);

        // input name
        th.startElement(null, ORGX_INPUT_NAME_TAG, ORGX_INPUT_NAME_TAG, att);
        th.characters(fileName.toCharArray(), 0, fileName.toCharArray().length);
        th.endElement(null, ORGX_INPUT_NAME_TAG, ORGX_INPUT_NAME_TAG);

        // org x ID
        th.startElement(null, ORGX_ID_TAG, ORGX_ID_TAG, att);
        String orgx_id = fileName + "_" + departmentName + "_" + userName + "_";
        th.characters(orgx_id.toCharArray(), 0, orgx_id.toCharArray().length);
        th.endElement(null, ORGX_ID_TAG, ORGX_ID_TAG);

        th.endElement(null, ORGX_META_TAG, ORGX_META_TAG);
        th.startElement(null, ORGX_CONTENT_TAG, ORGX_CONTENT_TAG, att);

    }
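Each metadata field above repeats the same startElement, characters, endElement triple. One possible tidy-up, shown here only as a sketch, is to factor that triple into a helper. The class ElementWriterSketch and both of its methods are hypothetical and are not part of Xena. The sketch also uses an empty string for the namespace URI where the tutorial code passes null.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper class, not part of Xena or the demo plugin.
final class ElementWriterSketch {
    private static final AttributesImpl NO_ATTRIBUTES = new AttributesImpl();

    // Writes <tag>text</tag> to the given handler, with no namespace and no attributes.
    static void writeElement(ContentHandler th, String tag, String text) throws SAXException {
        char[] chars = text.toCharArray();
        th.startElement("", tag, tag, NO_ATTRIBUTES);
        th.characters(chars, 0, chars.length);
        th.endElement("", tag, tag);
    }

    // Builds the ID the same way as the tutorial code: file name, department
    // name and user name, each followed by an underscore.
    static String buildOrgXId(String fileName, String departmentName, String userName) {
        return fileName + "_" + departmentName + "_" + userName + "_";
    }
}
```

With a helper like this, each field in startDocument would become a single call, for example writeElement(th, ORGX_DEPARTMENT_TAG, departmentName).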

Create a class to remove the metadata on export

When we export, we need a class to remove all the metadata we added previously. This class is expected by the MetaDataWrapperManager when a new metadata wrapper is loaded.

When Xena exports something, it does the following:

1. Attempts to find the MetaDataWrapper that wrapped the object during normalisation.

2. If it can figure out which one it is (this is a matter of looking for and recognising the opening tag of the XML document), it then unwraps the metadata XML from the file and finds the tag that is the opening tag of the actual content. This is done by creating an XML filter, then making the unwrapper be the content handler for that filter.

3. The unwrapper parses the document, checks to see if it is within the actual content, and if so calls its content handler to handle the output.

4. Our XML filter returns the opening tag of the content, and this is used to identify the normaliser for this tag.

When we have a denormaliser that can handle the content, almost the same thing happens, only this time the unwrapper will have the appropriate denormaliser set as its content handler, and this will perform the appropriate denormalisation.

If the empty metadata wrapper was used and there is no metadata at all, Xena will attempt to identify the normaliser based on the opening tag of the document. The default XMLFilterImpl object is used as the unwrapper.

All we will do is make an object that extends XMLFilterImpl and overrides the startElement, endElement and characters methods. When we are within the normalised part of the XML, we call the matching super method (for example super.startElement()), which passes the event on for us. Here it is:

package au.gov.naa.digipres.xena.demo.orgx;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

public class OrgXUnwrapper extends XMLFilterImpl {
        int packagesFound = 0;

        boolean contentFound = false;

        @Override
        public void startElement(String namespaceURI, String localName, String qName, Attributes atts) throws SAXException {

                if (contentFound) {
                        super.startElement(namespaceURI, localName, qName, atts);
                }
                if (qName.equals(OrgXMetaDataWrapper.ORGX_CONTENT_TAG)) {
                        contentFound = true;
                }
        }

        @Override
        public void endElement(String namespaceURI, String localName, String qName) throws SAXException {
                if (qName.equals(OrgXMetaDataWrapper.ORGX_CONTENT_TAG)) {
                        contentFound = false;
                }
                if (contentFound) {
                        super.endElement(namespaceURI, localName, qName);
                }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
                if (contentFound) {
                        super.characters(ch, start, length);
                }
        }

        protected boolean pass() {
                return contentFound;
        }
}
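To try the unwrapping logic outside Xena, here is a self-contained sketch that runs a small wrapped document through a filter with the same logic and records what comes out the other side. The class names UnwrapSketch, ContentOnlyFilter and Recorder are hypothetical, and the filter is repeated here only so that the sketch compiles on its own.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

final class UnwrapSketch {
    static final String CONTENT_TAG = "record_data";

    // Same filtering logic as OrgXUnwrapper: only events inside the content tag pass.
    static final class ContentOnlyFilter extends XMLFilterImpl {
        private boolean contentFound = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
            if (contentFound) {
                super.startElement(uri, localName, qName, atts);
            }
            if (qName.equals(CONTENT_TAG)) {
                contentFound = true;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (qName.equals(CONTENT_TAG)) {
                contentFound = false;
            }
            if (contentFound) {
                super.endElement(uri, localName, qName);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (contentFound) {
                super.characters(ch, start, length);
            }
        }
    }

    // Records what reaches the far side of the filter, plus the first tag seen.
    static final class Recorder extends DefaultHandler {
        final StringBuilder out = new StringBuilder();
        String firstTag;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (firstTag == null) {
                firstTag = qName;
            }
            out.append('<').append(qName).append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }
    }

    static Recorder unwrap(String wrappedXml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        ContentOnlyFilter filter = new ContentOnlyFilter();
        filter.setParent(reader);
        Recorder recorder = new Recorder();
        filter.setContentHandler(recorder);
        filter.parse(new InputSource(new StringReader(wrappedXml)));
        return recorder;
    }
}
```

Given a document whose record_data element contains `<foo>hello</foo>`, the recorder sees only `<foo>hello</foo>`, and its firstTag field holds "foo". That first tag plays the role of the opening tag of the content described in the export steps above.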

Update the OrgXPlugin class

Now we need to update the OrgXPlugin class so that our metadata wrapper (and associated unwrapper) are loaded. We will implement the getMetaDataWrappers method. This method needs to return a map of AbstractMetaDataWrappers to XMLFilters, so we will create a map that associates our OrgXMetaDataWrapper with our OrgXUnwrapper:

        @Override
        public Map<AbstractMetaDataWrapper, XMLFilter> getMetaDataWrappers() {
                Map<AbstractMetaDataWrapper, XMLFilter> wrapperMap = new HashMap<AbstractMetaDataWrapper, XMLFilter>();
                wrapperMap.put(new OrgXMetaDataWrapper(), new OrgXUnwrapper());
                return wrapperMap;
        }

Test the metadata wrapper

To test, we will use the same NormaliseTester as for the FileNamer, but we will add some code to look at the wrappers. Here it is:

public static void main(String[] argv) {
        Xena xena = new Xena();

        // our orgx jar will already be on the class path, so load it by name...
        Vector<String> pluginList = new Vector<String>();
        pluginList.add("au.gov.naa.digipres.xena.demo.orgx.OrgXPlugin");
        xena.loadPlugins(pluginList);

        // set the base path to be the current working directory
        xena.setBasePath(System.getProperty("user.dir"));
        System.out.println(System.getProperty("user.dir"));

        // create the new input source
        File f = new File("../../../data/example_file.foo");
        XenaInputSource xis = new XenaInputSource(f);

        // guess its type
        Guess fooGuess = xena.getBestGuess(xis);

        //print the guess...
        System.out.println("Here is the best guess returned by Xena: ");
        System.out.println(fooGuess.toString());
        System.out.println("-----------------------------------------");

        // normalise the file!
        NormaliserResults results = xena.normalise(xis);
        System.out.println("Here are the results of the normalisation:");
        System.out.println(results.toString());
        System.out.println("-----------------------------------------");

        System.out.println("Meta data wrappers...");
        for (String metaDataWrapperName : xena.getPluginManager().getMetaDataWrapperManager().getMetaDataWrapperNames()) {
                // print the name of each metadata wrapper that has been loaded
                System.out.println(metaDataWrapperName);
        }

        System.out.println("Active wrapper:");
        System.out.println(xena.getPluginManager().getMetaDataWrapperManager().getActiveWrapperPlugin().getName());
        System.out.println("-----------------------------------------");

}

And the output from this:

# java -cp orgx.jar:../../../xena/xena.jar au.gov.naa.digipres.xena.demo.orgx.test.NormaliseTester
/home/dpuser/workspace/plugin-howto/08_meta_data_package_wrapper/orgx_plugin/dist
Here is the best guess returned by Xena:
Guess... type: Binary
possible: Unknown
dataMatch:Unknown
magicNumber: Unknown
extensionMatch: Unknown
mimeMatch: Unknown
certain: Unknown
priority: LOW
-----------------------------------------
Here are the results of the normalisation:
example_file.foo_Smithers_S7G_0000.xena
-----------------------------------------
Metadata wrappers...
Default Metadata wrapper
Empty Metadata Wrapper
OrgX Metadata Wrapper
Active wrapper:
OrgX Metadata Wrapper
-----------------------------------------

However we are mainly interested in the contents of our normalised file, to see the metadata wrapping that has been added. Here are the contents of the normalised file:

<orgx>
        <meta>
                <department>Sector 7G</department>
                <user_name>Smithers</user_name>
                <input_name>file:/../../../data/example_file.foo</input_name>
                <orgx_id>file:/../../../data/example_file.foo_Sector 7G_Smithers_</orgx_id>
        </meta>
        <record_data>
                <binary-object:binary-object xmlns:binary-object="http://preservation.naa.gov.au/binary-object/1.0" 
                        description="The following data is a MIME-compliant (RFC 2045) PEM base64 (RFC 1421) representation of the original file contents.">
                        fmJlZ2luRm9vfnRoaXMgaXMgdGhlIGZpcnN0IHBhcnQgb2YgdGhlIGZvbyBmaWxlfnRoaXMgaXMg
                        dGhlIHNlY29uZCBwYXJ0LiBcfnRoaXMgaXMgc3RpbGwgdGhlIHNlY29uZCBwYXJ0IGFzIHdlIHVz
                        ZWQgdGhlIGVzY2FwZSBjaGFyYWN0ZXIu
                </binary-object:binary-object>
        </record_data>
</orgx>

If we run the program again we can see that the metadata has changed:

<orgx>
        <meta>
                <department>Administration</department>
                <user_name>Monty Burns</user_name>
                <input_name>file:/../../../data/example_file.foo</input_name>
                <orgx_id>file:/../../../data/example_file.foo_Administration_Monty Burns_</orgx_id>
        </meta>
        <record_data>
                <binary-object:binary-object xmlns:binary-object="http://preservation.naa.gov.au/binary-object/1.0" 
                        description="The following data is a MIME-compliant (RFC 2045) PEM base64 (RFC 1421) representation of the original file contents.">
                        fmJlZ2luRm9vfnRoaXMgaXMgdGhlIGZpcnN0IHBhcnQgb2YgdGhlIGZvbyBmaWxlfnRoaXMgaXMg
                        dGhlIHNlY29uZCBwYXJ0LiBcfnRoaXMgaXMgc3RpbGwgdGhlIHNlY29uZCBwYXJ0IGFzIHdlIHVz
                        ZWQgdGhlIGVzY2FwZSBjaGFyYWN0ZXIu
                </binary-object:binary-object>
        </record_data>
</orgx>

And that completes our metadata wrapper! Next we will use the Properties component to allow the user to decide the values which will be entered into the metadata.


NO LONGER MAINTAINED