Now we need to implement the metadata wrapper.
The metadata wrapper is responsible for two things:
But what if we don't want any metadata? For that case there is a built-in empty metadata wrapper that doesn't write anything. Its name is Empty Metadata Wrapper and its class is:
au.gov.naa.digipres.xena.kernel.metadatawrapper.EmptyWrapper
The name is available as a public static final String in MetaDataWrapperManager, EMPTY_WRAPPER_NAME, so the empty wrapper can be selected programmatically with code such as:
xena.setActiveMetaDataWrapperPlugin(MetaDataWrapperManager.EMPTY_WRAPPER_NAME);
Before we start work on our metadata wrapper we will first extend our DemoInfoProvider to provide the department name. Like the user name and department code this will also be a random choice from a list of department names. Here is the updated DemoInfoProvider.java file:
    package au.gov.naa.digipres.xena.demo.orgx;

    import java.util.Random;

    public class DemoInfoProvider implements InfoProvider {

        private String userName;
        private String departmentCode;
        private String departmentName;

        private String[] randomUserNames = {"Homer", "Karl", "Kenny", "Monty Burns", "Smithers"};
        private String[] randomDepartmentNames = {"Sector 7G", "Sector 7A", "Corporate", "Administration"};
        private String[] randomDepartmentCodes = {"S7G", "S7A", "COR", "ADM"};

        private Random random = new Random();

        /**
         * Return the username if it is set, or a random one from randomUserNames if it is not.
         * The choice is remembered, so repeated calls return the same name.
         * @return the user name.
         */
        public String getUserName() {
            if (userName == null) {
                userName = randomUserNames[random.nextInt(randomUserNames.length)];
            }
            return userName;
        }

        /**
         * Return the departmentCode if it is set, or a random one from randomDepartmentCodes if it is not.
         * @return the department code.
         */
        public String getDepartmentCode() {
            if (departmentCode == null) {
                departmentCode = randomDepartmentCodes[random.nextInt(randomDepartmentCodes.length)];
            }
            return departmentCode;
        }

        /**
         * Return the departmentName if it is set, or a random one from randomDepartmentNames if it is not.
         * @return the department name.
         */
        public String getDepartmentName() {
            if (departmentName == null) {
                departmentName = randomDepartmentNames[random.nextInt(randomDepartmentNames.length)];
            }
            return departmentName;
        }
    }
The metadata wrapper object extends the AbstractMetaDataWrapper, and since it is a concrete class, it must implement the abstract methods:
    package au.gov.naa.digipres.xena.demo.orgx;

    import au.gov.naa.digipres.xena.kernel.XenaException;
    import au.gov.naa.digipres.xena.kernel.XenaInputSource;
    import au.gov.naa.digipres.xena.kernel.metadatawrapper.AbstractMetaDataWrapper;

    public class OrgXMetaDataWrapper extends AbstractMetaDataWrapper {

        @Override
        public String getOpeningTag() {
            // Default method stub
            return null;
        }

        @Override
        public String getSourceId(XenaInputSource input) throws XenaException {
            // Default method stub
            return null;
        }

        @Override
        public String getSourceName(XenaInputSource input) throws XenaException {
            // Default method stub
            return null;
        }

        @Override
        public String getName() {
            // Default method stub
            return null;
        }
    }
We need to make static final constants for all the things that aren't going to change. A quick chat to the Organisation X metadata standards steering committee reveals the following tag names:
OPENING TAG: orgx
METADATA SECTION: meta
DEPARTMENT NAME: department
USER NAME: user_name
XENA ID: input_name
ORGANISATION X ID: orgx_id
So with this information, we can implement the tag names and make the constants:
    public class OrgXMetaDataWrapper extends AbstractMetaDataWrapper {

        public static final String ORGX_OPENING_TAG = "orgx";
        public static final String ORGX_META_TAG = "meta";
        public static final String ORGX_DEPARTMENT_TAG = "department";
        public static final String ORGX_USER_TAG = "user_name";
        public static final String ORGX_INPUT_NAME_TAG = "input_name";
        public static final String ORGX_CONTENT_TAG = "record_data";
        public static final String ORGX_ID_TAG = "orgx_id";

        @Override
        public String getOpeningTag() {
            // Default method stub
            return null;
        }

        @Override
        public String getSourceId(XenaInputSource input) throws XenaException {
            // Default method stub
            return null;
        }

        @Override
        public String getSourceName(XenaInputSource input) throws XenaException {
            // Default method stub
            return null;
        }

        @Override
        public String getName() {
            // Default method stub
            return null;
        }
    }
In the Xena util package, there is an object called the TagContentFinder. It has a single static method, which finds the content of a specified tag within an XML document. Using this method, we can look up a given tag in a single line. So, let's have a go at implementing these methods:
    @Override
    public String getOpeningTag() {
        return ORGX_OPENING_TAG;
    }

    @Override
    public String getSourceId(XenaInputSource input) throws XenaException {
        // The Organisation X ID written into the metadata identifies the source
        return TagContentFinder.getTagContents(input, ORGX_ID_TAG);
    }

    @Override
    public String getSourceName(XenaInputSource input) throws XenaException {
        // The input name written into the metadata is the source name
        return TagContentFinder.getTagContents(input, ORGX_INPUT_NAME_TAG);
    }

    @Override
    public String getName() {
        return "OrgX Metadata Wrapper";
    }
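To make the idea of a one-line tag lookup more concrete, here is a minimal sketch of how such a lookup could be written with plain SAX. It is not Xena's TagContentFinder: the class name TagContentSketch is hypothetical, and it takes the XML as a String rather than a XenaInputSource so that it can run with the JDK alone.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for a tag lookup; this is not Xena's TagContentFinder. */
final class TagContentSketch {

    /** Returns the text of the first element named tagName, or null if there is none. */
    static String getTagContents(String xml, String tagName) {
        final StringBuilder text = new StringBuilder();
        final boolean[] found = {false};

        DefaultHandler handler = new DefaultHandler() {
            private boolean inside = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(tagName)) {
                    inside = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inside) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (inside && qName.equals(tagName)) {
                    found[0] = true;
                    // Abort the parse: nothing after the closing tag is needed.
                    throw new SAXException("tag found, stopping early");
                }
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            if (!found[0]) {
                throw new IllegalArgumentException("could not parse document", e);
            }
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return found[0] ? text.toString() : null;
    }

    public static void main(String[] args) {
        String xml = "<orgx><meta><orgx_id>a.foo_Sector 7G_Smithers_</orgx_id></meta></orgx>";
        System.out.println(getTagContents(xml, "orgx_id"));
    }
}
```

Stopping the parse as soon as the tag closes matters here because the metadata sits at the top of the file, ahead of what may be a very large block of normalised content.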
The AbstractMetaDataWrapper object extends the default SAX Filter implementation, XMLFilterImpl. XMLFilterImpl implements all the XML Filter interface methods with empty bodies. When a document is to be normalised:
1. The metadata wrapper is obtained.
2. The startDocument method is called.
3. The normaliser.parse() method is called on the input source.
4. After the normaliser is done, the endDocument method is called.
In order for us to actually write some metadata, the metadata wrapper must override the startDocument and endDocument methods. Properties are set for the metadata wrapper to reference the normaliser. When normalising, we call MetaDataWrapper.startDocument(), then normaliser.parse(), then MetaDataWrapper.endDocument().
The metadata will be created as follows:
1. Write the opening tag.
2. Open the metadata tag.
3. Close the metadata tag.
4. Open the record content tag.
5. Normalise the input source.
6. Close the record content tag and the opening tag.
To do all this, we will set up the content handler by calling super.startDocument(), which goes all the way up to the default XML filter implementation. Then we will get our content handler and use it to write out our information. We are not using namespaces for this demonstration (we could, but we will do without them for now), so the namespace string is always set to null and the local and qualified names are the same unqualified tag name. Also, since none of these elements have attributes at the moment, we can create a single empty AttributesImpl object and pass it to every startElement call:
    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
        ContentHandler th = getContentHandler();
        AttributesImpl att = new AttributesImpl();
        th.startElement(null, ORGX_OPENING_TAG, ORGX_OPENING_TAG, att);
        th.startElement(null, ORGX_META_TAG, ORGX_META_TAG, att);
        th.endElement(null, ORGX_META_TAG, ORGX_META_TAG);
        th.startElement(null, ORGX_CONTENT_TAG, ORGX_CONTENT_TAG, att);
    }

    @Override
    public void endDocument() throws SAXException {
        ContentHandler th = getContentHandler();
        th.endElement(null, ORGX_CONTENT_TAG, ORGX_CONTENT_TAG);
        th.endElement(null, ORGX_OPENING_TAG, ORGX_OPENING_TAG);
        super.endDocument();
    }
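To see what this wrapping produces without running Xena, the following sketch drives a filter of the same shape by hand. Everything in it is an assumption made for illustration: EnvelopeFilter mimics the wrapper, fakeNormalise stands in for normaliser.parse(), and Recorder is a throwaway handler that turns SAX events back into text. It passes "" as the namespace URI, which is the usual SAX convention for "no namespace", where the tutorial code passes null.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

final class WrapperLifecycleSketch {

    /** Records SAX events as XML-like text so the result can be inspected. */
    static final class Recorder extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.append('<').append(qName).append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }
    }

    /** Same shape as the wrapper: opens the envelope at the start, closes it at the end. */
    static final class EnvelopeFilter extends XMLFilterImpl {
        private static final Attributes NO_ATTS = new AttributesImpl();

        @Override
        public void startDocument() throws SAXException {
            super.startDocument();
            ContentHandler ch = getContentHandler();
            ch.startElement("", "orgx", "orgx", NO_ATTS);
            ch.startElement("", "meta", "meta", NO_ATTS);
            ch.endElement("", "meta", "meta");
            ch.startElement("", "record_data", "record_data", NO_ATTS);
        }

        @Override
        public void endDocument() throws SAXException {
            ContentHandler ch = getContentHandler();
            ch.endElement("", "record_data", "record_data");
            ch.endElement("", "orgx", "orgx");
            super.endDocument();
        }
    }

    /** Hypothetical stand-in for normaliser.parse(): sends one element through the wrapper. */
    static void fakeNormalise(ContentHandler target) throws SAXException {
        char[] text = "data".toCharArray();
        target.startElement("", "foo", "foo", new AttributesImpl());
        target.characters(text, 0, text.length);
        target.endElement("", "foo", "foo");
    }

    static String run() {
        try {
            Recorder recorder = new Recorder();
            EnvelopeFilter wrapper = new EnvelopeFilter();
            wrapper.setContentHandler(recorder);
            wrapper.startDocument();   // envelope opened before the normaliser runs
            fakeNormalise(wrapper);    // normalised content passes straight through
            wrapper.endDocument();     // envelope closed after the normaliser is done
            return recorder.out.toString();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because the filter does not override startElement, endElement or characters, XMLFilterImpl forwards the normaliser's events untouched, and they land between the record content tags.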
To retrieve the information for our metadata we will use our InfoProvider interface, just like we did for the file namer. The same DemoInfoProvider object will be passed to both the OrgXFileNamer and OrgXMetaDataWrapper, ensuring that the (random) information returned will be the same for each. The cool thing about this is that when we completely change the way we retrieve the information in step 11, we won't have to change the structure of the startDocument method, since the only thing that will change is within the metadata tags. So, let's do it!
    private InfoProvider myInfoProvider = null;

    public InfoProvider getInfoProvider() {
        return myInfoProvider;
    }

    public void setInfoProvider(InfoProvider infoProvider) {
        myInfoProvider = infoProvider;
    }

    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
        ContentHandler th = getContentHandler();
        AttributesImpl att = new AttributesImpl();
        th.startElement(null, ORGX_OPENING_TAG, ORGX_OPENING_TAG, att);
        th.startElement(null, ORGX_META_TAG, ORGX_META_TAG, att);

        // The department name
        th.startElement(null, ORGX_DEPARTMENT_TAG, ORGX_DEPARTMENT_TAG, att);
        th.characters(getInfoProvider().getDepartmentName().toCharArray(), 0, getInfoProvider().getDepartmentName().toCharArray().length);
        th.endElement(null, ORGX_DEPARTMENT_TAG, ORGX_DEPARTMENT_TAG);

        // The user name
        th.startElement(null, ORGX_USER_TAG, ORGX_USER_TAG, att);
        th.characters(getInfoProvider().getUserName().toCharArray(), 0, getInfoProvider().getUserName().toCharArray().length);
        th.endElement(null, ORGX_USER_TAG, ORGX_USER_TAG);

        th.endElement(null, ORGX_META_TAG, ORGX_META_TAG);
        th.startElement(null, ORGX_CONTENT_TAG, ORGX_CONTENT_TAG, att);
    }
Now all we need is the code to generate the Organisation X ID and the input name. For the moment we will construct the Organisation X ID by combining the file name, department name and user name. But therein lies a problem: what do we call the file? Should the whole URI source string be used, or only the filename component? Or should some of the folders be included? Fortunately Xena has a facility to set a base path, so that when a file is normalised we can get the name of the file relative to that path. If this doesn't work, Xena will simply return the whole name from the URI. This is done in several places in Xena, so it has been added to the utility class SourceURIParser.
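The "relative to the base path, otherwise the whole URI" behaviour can be illustrated with the JDK's own URI class. This is only a sketch of the idea, not the code inside Xena's utility class; the class and method names are hypothetical, and the paths are made up for the example.

```java
import java.net.URI;

final class RelativeNameSketch {

    /**
     * Returns the source name relative to basePath when the source lies under it,
     * or the full URI string when it does not.
     */
    static String relativeName(URI basePath, URI source) {
        // URI.relativize hands back the given URI unchanged if it is not under the base.
        return basePath.relativize(source).toString();
    }

    public static void main(String[] args) {
        URI base = URI.create("file:/home/dpuser/work/");
        System.out.println(relativeName(base, URI.create("file:/home/dpuser/work/data/example_file.foo")));
        System.out.println(relativeName(base, URI.create("file:/tmp/example_file.foo")));
    }
}
```

With these inputs the first call yields data/example_file.foo and the second yields the untouched file:/tmp/example_file.foo, which is the same fallback that shows up in the input_name element when the source file is not under the base path.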
Since we will be using the department and user names, the code has been slightly refactored:
    @Override
    public void startDocument() throws SAXException {
        String departmentName = getInfoProvider().getDepartmentName();
        String userName = getInfoProvider().getUserName();
        String fileName = "";
        try {
            XenaInputSource xis = (XenaInputSource)getProperty("http://xena/input");
            if (xis != null) {
                fileName = SourceURIParser.getRelativeSystemId(xis, metaDataWrapperManager.getPluginManager());
            }
        } catch (SAXException saxe) {
            fileName = "Unknown";
        }

        super.startDocument();
        ContentHandler th = getContentHandler();
        AttributesImpl att = new AttributesImpl();
        th.startElement(null, ORGX_OPENING_TAG, ORGX_OPENING_TAG, att);
        th.startElement(null, ORGX_META_TAG, ORGX_META_TAG, att);

        // department name
        th.startElement(null, ORGX_DEPARTMENT_TAG, ORGX_DEPARTMENT_TAG, att);
        th.characters(departmentName.toCharArray(), 0, departmentName.toCharArray().length);
        th.endElement(null, ORGX_DEPARTMENT_TAG, ORGX_DEPARTMENT_TAG);

        // user name
        th.startElement(null, ORGX_USER_TAG, ORGX_USER_TAG, att);
        th.characters(userName.toCharArray(), 0, userName.toCharArray().length);
        th.endElement(null, ORGX_USER_TAG, ORGX_USER_TAG);

        // input name
        th.startElement(null, ORGX_INPUT_NAME_TAG, ORGX_INPUT_NAME_TAG, att);
        th.characters(fileName.toCharArray(), 0, fileName.toCharArray().length);
        th.endElement(null, ORGX_INPUT_NAME_TAG, ORGX_INPUT_NAME_TAG);

        // org x ID
        th.startElement(null, ORGX_ID_TAG, ORGX_ID_TAG, att);
        String orgx_id = fileName + "_" + departmentName + "_" + userName + "_";
        th.characters(orgx_id.toCharArray(), 0, orgx_id.toCharArray().length);
        th.endElement(null, ORGX_ID_TAG, ORGX_ID_TAG);

        th.endElement(null, ORGX_META_TAG, ORGX_META_TAG);
        th.startElement(null, ORGX_CONTENT_TAG, ORGX_CONTENT_TAG, att);
    }
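The start, characters, end triple is repeated for every metadata field, so one possible tidy-up is to pull it into a small helper. The sketch below shows that idea in isolation; writeTextElement, metaBlock and Recorder are hypothetical names introduced for this example, and "" is used as the namespace URI where the tutorial code passes null.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

final class TextElementSketch {

    private static final Attributes NO_ATTS = new AttributesImpl();

    /** Writes <tag>text</tag> to the handler as three SAX events. */
    static void writeTextElement(ContentHandler handler, String tag, String text) throws SAXException {
        char[] chars = text.toCharArray();
        handler.startElement("", tag, tag, NO_ATTS);
        handler.characters(chars, 0, chars.length);
        handler.endElement("", tag, tag);
    }

    /** Records SAX events as XML-like text so the result can be inspected. */
    static final class Recorder extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.append('<').append(qName).append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }
    }

    /** Builds the metadata section the same way startDocument does, using the helper. */
    static String metaBlock(String fileName, String departmentName, String userName) {
        try {
            Recorder recorder = new Recorder();
            recorder.startElement("", "meta", "meta", NO_ATTS);
            writeTextElement(recorder, "department", departmentName);
            writeTextElement(recorder, "user_name", userName);
            writeTextElement(recorder, "input_name", fileName);
            // Same ID recipe as above: file name, department name, user name, each followed by "_".
            writeTextElement(recorder, "orgx_id", fileName + "_" + departmentName + "_" + userName + "_");
            recorder.endElement("", "meta", "meta");
            return recorder.out.toString();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(metaBlock("example_file.foo", "Sector 7G", "Smithers"));
    }
}
```

Converting the string to a char array once and reusing it for the length also avoids calling toCharArray() twice per field.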
When we export, we need a class to remove all the metadata we added previously. This class is expected by the MetaDataWrapperManager when a new metadata wrapper is loaded.
When Xena exports something, it does the following:
1. Attempts to find the MetaDataWrapper that wrapped the object during normalisation.
2. If it can figure out which one it is (this is a matter of looking for and recognising the opening tag of the XML document), it then unwraps the metadata XML from the file and finds the tag that is the opening tag of the actual content. This is done by creating an XML filter, then making the unwrapper the content handler for that filter.
3. The metadata wrapper parses the document, checks to see if it is within the actual content, and if so calls its content handler to parse the output.
4. Our XML filter returns the opening tag of the content, and this is used to identify the normaliser for this tag.
When we have a denormaliser that can handle the content, almost the same thing happens, only this time the unwrapper will have the appropriate denormaliser set as its content handler, and this will perform the appropriate denormalisation.
If the empty metadata wrapper was used, so that there is no metadata at all, Xena will attempt to identify the normaliser based on the opening tag of the document. The default XMLFilterImpl object is used as the unwrapper.
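Recognising a wrapper by the opening tag of the document amounts to reading the root element name and looking it up. The sketch below shows one way that could be done with plain SAX; it is an illustration only, not Xena's lookup code. The class name, the method names and the map of opening tags to wrapper names are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Collections;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class OpeningTagSketch {

    /** Returns the qualified name of the document's root element. */
    static String openingTag(String xml) {
        final String[] root = {null};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts)
                    throws SAXException {
                root[0] = qName;
                // The first start tag is all we need, so stop parsing here.
                throw new SAXException("root element read, stopping early");
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            if (root[0] == null) {
                throw new IllegalArgumentException("could not read the opening tag", e);
            }
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return root[0];
    }

    /** Looks the opening tag up in a (hypothetical) registry, falling back to the empty wrapper. */
    static String wrapperNameFor(String xml, Map<String, String> wrappersByOpeningTag) {
        return wrappersByOpeningTag.getOrDefault(openingTag(xml), "Empty Metadata Wrapper");
    }

    public static void main(String[] args) {
        Map<String, String> registry = Collections.singletonMap("orgx", "OrgX Metadata Wrapper");
        System.out.println(wrapperNameFor("<orgx><meta/><record_data/></orgx>", registry));
        System.out.println(wrapperNameFor("<foo>data</foo>", registry));
    }
}
```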
All we will do is make a class that extends XMLFilterImpl and overrides the startElement, endElement and characters methods. When we are within the normalised part of the XML, we call the corresponding super method (for example super.startElement()), and it will handle all of that for us. Here it is:
    package au.gov.naa.digipres.xena.demo.orgx;

    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.XMLFilterImpl;

    public class OrgXUnwrapper extends XMLFilterImpl {

        int packagesFound = 0;
        boolean contentFound = false;

        @Override
        public void startElement(String namespaceURI, String localName, String qName, Attributes atts) throws SAXException {
            if (contentFound) {
                super.startElement(namespaceURI, localName, qName, atts);
            }
            if (qName.equals(OrgXMetaDataWrapper.ORGX_CONTENT_TAG)) {
                contentFound = true;
            }
        }

        @Override
        public void endElement(String namespaceURI, String localName, String qName) throws SAXException {
            if (qName.equals(OrgXMetaDataWrapper.ORGX_CONTENT_TAG)) {
                contentFound = false;
            }
            if (contentFound) {
                super.endElement(namespaceURI, localName, qName);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (contentFound) {
                super.characters(ch, start, length);
            }
        }

        protected boolean pass() {
            return contentFound;
        }
    }
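To check that a filter of this shape really strips the envelope, the sketch below runs one over a small wrapped document using only the JDK's SAX parser. ContentOnlyFilter repeats the logic above with the tag name hard-coded, and Recorder is a throwaway handler that turns the surviving events back into text; both names are hypothetical and exist only for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

final class UnwrapSketch {

    /** Forwards only the events that occur inside the record_data element. */
    static final class ContentOnlyFilter extends XMLFilterImpl {
        private static final String CONTENT_TAG = "record_data";
        private boolean contentFound = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (contentFound) {
                super.startElement(uri, localName, qName, atts);
            }
            if (qName.equals(CONTENT_TAG)) {
                contentFound = true;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (qName.equals(CONTENT_TAG)) {
                contentFound = false;
            }
            if (contentFound) {
                super.endElement(uri, localName, qName);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (contentFound) {
                super.characters(ch, start, length);
            }
        }
    }

    /** Records SAX events as XML-like text so the result can be inspected. */
    static final class Recorder extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.append('<').append(qName).append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }
    }

    /** Parses the wrapped document through the filter and returns what reaches the handler. */
    static String unwrap(String wrappedXml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            ContentOnlyFilter filter = new ContentOnlyFilter();
            Recorder recorder = new Recorder();
            filter.setParent(reader);            // the filter sits between parser and handler
            filter.setContentHandler(recorder);
            filter.parse(new InputSource(new StringReader(wrappedXml)));
            return recorder.out.toString();
        } catch (SAXException | ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(unwrap("<orgx><meta><user_name>Smithers</user_name></meta>"
                + "<record_data><foo>data</foo></record_data></orgx>"));
    }
}
```

For that input, only the foo element and its text reach the recorder. Note that the check order matters: the content tag itself is never forwarded, because the flag is set after the start tag has been tested and cleared before the end tag is tested.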
Now we need to update the OrgXPlugin class so that our metadata wrapper (and associated unwrapper) are loaded. We will implement the getMetaDataWrappers method. This method needs to return a map of AbstractMetaDataWrappers to XMLFilters, so we will create a map that associates our OrgXMetaDataWrapper with our OrgXUnwrapper:
    @Override
    public Map<AbstractMetaDataWrapper, XMLFilter> getMetaDataWrappers() {
        Map<AbstractMetaDataWrapper, XMLFilter> wrapperMap = new HashMap<AbstractMetaDataWrapper, XMLFilter>();
        wrapperMap.put(new OrgXMetaDataWrapper(), new OrgXUnwrapper());
        return wrapperMap;
    }
To test, we will use the same NormaliseTester as for the FileNamer, but we will add some code to look at the wrappers. Here it is:
    public static void main(String[] argv) {
        Xena xena = new Xena();

        // our orgx jar will already be on the class path, so load it by name...
        Vector<String> pluginList = new Vector<String>();
        pluginList.add("au.gov.naa.digipres.xena.demo.orgx.OrgXPlugin");
        xena.loadPlugins(pluginList);

        // set the base path to be the current working directory
        xena.setBasePath(System.getProperty("user.dir"));
        System.out.println(System.getProperty("user.dir"));

        // create the new input source
        File f = new File("../../../data/example_file.foo");
        XenaInputSource xis = new XenaInputSource(f);

        // guess its type
        Guess fooGuess = xena.getBestGuess(xis);

        // print the guess...
        System.out.println("Here is the best guess returned by Xena: ");
        System.out.println(fooGuess.toString());
        System.out.println("-----------------------------------------");

        // normalise the file!
        NormaliserResults results = xena.normalise(xis);
        System.out.println("Here are the results of the normalisation:");
        System.out.println(results.toString());
        System.out.println("-----------------------------------------");

        // list every metadata wrapper that has been loaded, by name
        System.out.println("Metadata wrappers...");
        for (String metaDataWrapperName : xena.getPluginManager().getMetaDataWrapperManager().getMetaDataWrapperNames()) {
            System.out.println(metaDataWrapperName);
        }

        // show which wrapper will be used for normalisation
        System.out.println("Active wrapper:");
        System.out.println(xena.getPluginManager().getMetaDataWrapperManager().getActiveWrapperPlugin().getName());
        System.out.println("-----------------------------------------");
    }
And the output from this:
    # java -cp orgx.jar:../../../xena/xena.jar au.gov.naa.digipres.xena.demo.orgx.test.NormaliseTester
    /home/dpuser/workspace/plugin-howto/08_meta_data_package_wrapper/orgx_plugin/dist
    Here is the best guess returned by Xena:
    Guess... type: Binary possible: Unknown dataMatch:Unknown magicNumber: Unknown extensionMatch: Unknown mimeMatch: Unknown certain: Unknown priority: LOW
    -----------------------------------------
    Here are the results of the normalisation:
    example_file.foo_Smithers_S7G_0000.xena
    -----------------------------------------
    Metadata wrappers...
    Default Metadata wrapper
    Empty Metadata Wrapper
    OrgX Metadata Wrapper
    Active wrapper:
    OrgX Metadata Wrapper
    -----------------------------------------
However we are mainly interested in the contents of our normalised file, to see the metadata wrapping that has been added. Here are the contents of the normalised file:
    <orgx>
      <meta>
        <department>Sector 7G</department>
        <user_name>Smithers</user_name>
        <input_name>file:/../../../data/example_file.foo</input_name>
        <orgx_id>file:/../../../data/example_file.foo_Sector 7G_Smithers_</orgx_id>
      </meta>
      <record_data>
        <binary-object:binary-object xmlns:binary-object="http://preservation.naa.gov.au/binary-object/1.0" description="The following data is a MIME-compliant (RFC 2045) PEM base64 (RFC 1421) representation of the original file contents.">
    fmJlZ2luRm9vfnRoaXMgaXMgdGhlIGZpcnN0IHBhcnQgb2YgdGhlIGZvbyBmaWxlfnRoaXMgaXMg
    dGhlIHNlY29uZCBwYXJ0LiBcfnRoaXMgaXMgc3RpbGwgdGhlIHNlY29uZCBwYXJ0IGFzIHdlIHVz
    ZWQgdGhlIGVzY2FwZSBjaGFyYWN0ZXIu
        </binary-object:binary-object>
      </record_data>
    </orgx>
If we run the program again we can see that the metadata has changed:
    <orgx>
      <meta>
        <department>Administration</department>
        <user_name>Monty Burns</user_name>
        <input_name>file:/../../../data/example_file.foo</input_name>
        <orgx_id>file:/../../../data/example_file.foo_Administration_Monty Burns_</orgx_id>
      </meta>
      <record_data>
        <binary-object:binary-object xmlns:binary-object="http://preservation.naa.gov.au/binary-object/1.0" description="The following data is a MIME-compliant (RFC 2045) PEM base64 (RFC 1421) representation of the original file contents.">
    fmJlZ2luRm9vfnRoaXMgaXMgdGhlIGZpcnN0IHBhcnQgb2YgdGhlIGZvbyBmaWxlfnRoaXMgaXMg
    dGhlIHNlY29uZCBwYXJ0LiBcfnRoaXMgaXMgc3RpbGwgdGhlIHNlY29uZCBwYXJ0IGFzIHdlIHVz
    ZWQgdGhlIGVzY2FwZSBjaGFyYWN0ZXIu
        </binary-object:binary-object>
      </record_data>
    </orgx>
And that completes our metadata wrapper! Next we will use the Properties component to allow the user to decide the values which will be entered into the metadata.