
4 - Creating a basic normaliser

Allan Cunliffe

It is time to start writing the normaliser for our plugin. We will begin with a normaliser that simply outputs a piece of static text; later we will extend it to parse the input file and produce an XML representation of the file's contents. The normaliser must extend the abstract class AbstractNormaliser. Initially it will look like this:

package au.gov.naa.digipres.xena.demo.foo;

public class FooNormaliser extends AbstractNormaliser {

    public String getName() {
        //auto generated method stub
        return null;
    }

    public void parse(InputSource source, NormaliserResults results) throws SAXException {
        //auto generated method stub
    }
}

We will implement the getName() method first. It simply returns "Foo":

public String getName() {
    return "Foo";
}

Create parse method

Initially, we will simply get our content handler and write out a start tag and an end tag, using the AttributesImpl class, the general-purpose implementation of the SAX Attributes interface. When the content handler writes out an element, we need to give it several pieces of information. Since Xena assumes that namespace processing is always being done, we need to provide the namespace URI. (An element that is not in any namespace is given the empty string, "", as its URI.) For the moment we will store the namespace URI in a public static final String, in case anyone else needs to access it, and for the sake of argument give it the value http://preservation.naa.gov.au/foo/0.1

The local name will be "data" and the qualified name will be "foo:data"; both will also be public static final Strings. Strictly speaking, when namespace processing is on (the SAX default), SAX requires the namespace URI and the local name, while the qualified name is optional. We supply all three so that the element can be written out with the foo: prefix. With that in mind, let us look at the code required for the parse method:

public static final String FOO_URI = "http://preservation.naa.gov.au/foo/0.1";
public static final String FOO_OPENING_ELEMENT_LOCAL_NAME = "data";
public static final String FOO_OPENING_ELEMENT_QUALIFIED_NAME = "foo:data";

public void parse(InputSource source, NormaliserResults results) throws SAXException {
    // The content handler receives the SAX events that make up the normalised XML
    ContentHandler contentHandler = getContentHandler();
    // The opening element has no attributes, so an empty AttributesImpl is enough
    AttributesImpl openingAttribute = new AttributesImpl();

    // <foo:data>
    contentHandler.startElement(FOO_URI, FOO_OPENING_ELEMENT_LOCAL_NAME, FOO_OPENING_ELEMENT_QUALIFIED_NAME, openingAttribute);
    // Placeholder text for now; the contents of the Foo file will go here later
    char[] message = "The foo file contents will go in here!".toCharArray();
    contentHandler.characters(message, 0, message.length);
    // </foo:data>
    contentHandler.endElement(FOO_URI, FOO_OPENING_ELEMENT_LOCAL_NAME, FOO_OPENING_ELEMENT_QUALIFIED_NAME);
}
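The event sequence in parse can be tried outside Xena. The sketch below is a standalone class (FooSaxSketch is a hypothetical name, not part of Xena) that sends the same startElement, characters and endElement calls to the JDK's identity TransformerHandler and returns the resulting XML as a string. It makes two additions that FooNormaliser.parse does not: it wraps the events in startDocument/endDocument, which a serializer driven directly needs (in Xena, the normaliser's events end up inside the larger <xena> document), and it declares the foo prefix with startPrefixMapping, as SAX producers conventionally do when using a prefixed qualified name.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class FooSaxSketch {
    // Same values as the constants defined in FooNormaliser
    static final String FOO_URI = "http://preservation.naa.gov.au/foo/0.1";
    static final String FOO_LOCAL_NAME = "data";
    static final String FOO_QUALIFIED_NAME = "foo:data";

    // Sends the same element and character events as FooNormaliser.parse.
    // startDocument/endDocument are needed because this sketch drives the serializer itself;
    // startPrefixMapping declares the foo prefix explicitly.
    static void emitFooData(ContentHandler handler) throws SAXException {
        handler.startDocument();
        handler.startPrefixMapping("foo", FOO_URI);
        handler.startElement(FOO_URI, FOO_LOCAL_NAME, FOO_QUALIFIED_NAME, new AttributesImpl());
        char[] message = "The foo file contents will go in here!".toCharArray();
        handler.characters(message, 0, message.length);
        handler.endElement(FOO_URI, FOO_LOCAL_NAME, FOO_QUALIFIED_NAME);
        handler.endPrefixMapping("foo");
        handler.endDocument();
    }

    // Turns the SAX events into text using the JDK's identity TransformerHandler
    static String render() {
        try {
            SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out)); // must be set before startDocument
            emitFooData(handler);
            return out.toString();
        } catch (TransformerException | SAXException e) {
            throw new IllegalStateException("Could not serialise the Foo events", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```

Running main should print a single foo:data element carrying the xmlns:foo declaration and the placeholder text, the same shape as the content element of the .xena file produced later on this page.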

Add imports

The AttributesImpl class comes from the org.xml.sax.helpers package, while ContentHandler, InputSource and SAXException come from the org.xml.sax package. We also need to import the java.io.IOException class.

import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import java.io.IOException;

import au.gov.naa.digipres.xena.kernel.normalise.AbstractNormaliser;
import au.gov.naa.digipres.xena.kernel.normalise.NormaliserResults;

Define the output type of the normaliser

Now that we have a Foo normaliser, we must tell Xena what kind of output it produces. We know that the output of the normaliser will be an XML file created by Xena, so we can represent it with a class that extends the abstract class XenaFileType. We will call this class XenaFooFileType, and start with default implementations of its abstract methods.

package au.gov.naa.digipres.xena.demo.foo;

import au.gov.naa.digipres.xena.kernel.type.XenaFileType;

public class XenaFooFileType extends XenaFileType {

    @Override
    public String getTag() {
        //auto generated method stub
        return null;
    }

    @Override
    public String getNamespaceUri() {
        //auto generated method stub
        return null;
    }
}

Next, we will make the required methods do something useful. The tag being returned is the opening tag: when we reach this tag, we know that what follows is the preserved data of the Foo file, not metadata. The namespace URI is the URI we defined for the Foo namespace. Both values are conveniently defined as public static final fields in the normaliser, so we simply return them:

@Override
public String getTag() {
    return FooNormaliser.FOO_OPENING_ELEMENT_QUALIFIED_NAME;
}

@Override
public String getNamespaceUri() {
    return FooNormaliser.FOO_URI;
}

Get Xena to use the normaliser

Now we must get Xena to use the normaliser, and tell it about our new type, XenaFooFileType. First, we update FooPlugin.java to tell Xena that our Foo plugin now contains the Foo normaliser and the XenaFooFileType. We also tell Xena that when it wants to normalise a FooFileType it should use the Foo normaliser, and that the output produced by the Foo normaliser is a file represented by the XenaFooFileType.

This information is stored in two maps: an input map and an output map. Each maps a normaliser object to a set of types, which means we can describe a normaliser that has multiple inputs and/or outputs. We override the getNormaliserInputMap and getNormaliserOutputMap methods to return the new normaliser and our types:

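// Imports needed in FooPlugin.java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import au.gov.naa.digipres.xena.kernel.type.Type;

@Override
public Map<Object, Set<Type>> getNormaliserInputMap() {
    Map<Object, Set<Type>> inputMap = new HashMap<Object, Set<Type>>();

    // The Foo normaliser accepts Foo files as input
    FooNormaliser normaliser = new FooNormaliser();
    Set<Type> normaliserSet = new HashSet<Type>();
    normaliserSet.add(new FooFileType());
    inputMap.put(normaliser, normaliserSet);

    return inputMap;
}

@Override
public Map<Object, Set<Type>> getNormaliserOutputMap() {
    Map<Object, Set<Type>> outputMap = new HashMap<Object, Set<Type>>();

    // The Foo normaliser produces XenaFooFileType files as output
    FooNormaliser normaliser = new FooNormaliser();
    Set<Type> normaliserSet = new HashSet<Type>();
    normaliserSet.add(new XenaFooFileType());
    outputMap.put(normaliser, normaliserSet);

    return outputMap;
}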
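To see how the two maps fit together, here is a standalone sketch with the same Map<Object, Set<Type>> shape. It uses simplified stand-ins for Type, FooFileType, XenaFooFileType and FooNormaliser (they are not the real Xena classes, and the "Xena Foo" type name is made up), plus a hypothetical outputTypeNamesFor method that follows a normaliser from the input map to the output map. This page does not describe how Xena itself performs that lookup; the sketch only illustrates the data structure.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class NormaliserMapSketch {
    // Stand-ins that only mimic the shape of the tutorial's classes; they are not Xena's classes
    abstract static class Type {
        abstract String getName();
    }

    static class FooFileType extends Type {
        @Override
        String getName() { return "Foo"; }
    }

    static class XenaFooFileType extends Type {
        @Override
        String getName() { return "Xena Foo"; } // hypothetical display name
    }

    static class FooNormaliser {
    }

    // Same shape as getNormaliserInputMap(): normaliser -> types it accepts
    static Map<Object, Set<Type>> inputMap() {
        Map<Object, Set<Type>> inputMap = new HashMap<Object, Set<Type>>();
        Set<Type> normaliserSet = new HashSet<Type>();
        normaliserSet.add(new FooFileType());
        inputMap.put(new FooNormaliser(), normaliserSet);
        return inputMap;
    }

    // Same shape as getNormaliserOutputMap(): normaliser -> types it produces
    static Map<Object, Set<Type>> outputMap() {
        Map<Object, Set<Type>> outputMap = new HashMap<Object, Set<Type>>();
        Set<Type> normaliserSet = new HashSet<Type>();
        normaliserSet.add(new XenaFooFileType());
        outputMap.put(new FooNormaliser(), normaliserSet);
        return outputMap;
    }

    // Hypothetical lookup: names of the output types reachable from an input type.
    // Each map holds its own FooNormaliser instance and Object.equals compares identity,
    // so normalisers are paired up by class rather than by equals().
    static Set<String> outputTypeNamesFor(String inputTypeName) {
        Map<Object, Set<Type>> outputs = outputMap();
        Set<String> result = new HashSet<String>();
        for (Map.Entry<Object, Set<Type>> input : inputMap().entrySet()) {
            if (!containsTypeNamed(input.getValue(), inputTypeName)) {
                continue;
            }
            for (Map.Entry<Object, Set<Type>> output : outputs.entrySet()) {
                if (output.getKey().getClass() == input.getKey().getClass()) {
                    for (Type type : output.getValue()) {
                        result.add(type.getName());
                    }
                }
            }
        }
        return result;
    }

    static boolean containsTypeNamed(Set<Type> types, String name) {
        for (Type type : types) {
            if (type.getName().equals(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(outputTypeNamesFor("Foo"));    // prints [Xena Foo]
        System.out.println(outputTypeNamesFor("Binary")); // prints []
    }
}
```

Because each map value is a set, a normaliser that accepted several input types or produced several output types would simply have more entries in the corresponding set.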

Test the normaliser

To test the normaliser, we will first modify the PluginLoadTester to output the names of all loaded normalisers. With luck, we will see the string "Foo" in the output. At the bottom of the body of the main method, add the following code:

System.out.println("Normalisers...");
for (Object element : xena.getPluginManager().getNormaliserManager().getAll()) {
    AbstractNormaliser normaliser = (AbstractNormaliser) element;
    System.out.println(normaliser.getName());
}
System.out.println("------------------------------------------------");

Before we recompile the plugin, we have to import the AbstractNormaliser class into the PluginLoadTester, along with the IOException and XenaException classes.

import java.io.IOException;
import au.gov.naa.digipres.xena.kernel.XenaException;
import au.gov.naa.digipres.xena.kernel.normalise.AbstractNormaliser;

Here is the output of the load tester so far, as run from within the dist folder (on Windows, separate the classpath entries with ; rather than :):

#java -cp foo.jar:../../../xena/xena.jar au.gov.naa.digipres.xena.demo.foo.test.PluginLoadTester
Types...
Foo
Xena type, tag -->> binary-object:binary-object
Binary

----------------------------->>>>

Next, we write a NormaliseTester class that loads the Foo plugin, asks Xena to guess the type of an example Foo file, and then normalises it:

package au.gov.naa.digipres.xena.demo.foo.test;

import java.util.Vector;
import au.gov.naa.digipres.xena.core.Xena;
import au.gov.naa.digipres.xena.kernel.XenaInputSource;
import au.gov.naa.digipres.xena.kernel.XenaException;
import au.gov.naa.digipres.xena.kernel.guesser.Guess;
import au.gov.naa.digipres.xena.kernel.normalise.NormaliserResults;
import java.io.IOException;
import java.io.File;

public class NormaliseTester {
    public static void main(String[] argv) throws XenaException, IOException {
        Xena xena = new Xena();
        // our foo jar will already be on the class path, so load it by name...
        Vector<String> pluginList = new Vector<String>();
        pluginList.add("au.gov.naa.digipres.xena.demo.foo.FooPlugin");
        xena.loadPlugins(pluginList);
        // create the new input source
        File f = new File("../../../data/example_file.foo");
        XenaInputSource xis = new XenaInputSource(f);
        xena.setBasePath(System.getProperty("user.dir"));
        // guess its type
        Guess fooGuess = xena.getBestGuess(xis);
        //print the guess...
        System.out.println("Here is the best guess returned by Xena: ");
        System.out.println(fooGuess.toString());
        System.out.println("-----------------------------------------");
        // normalise the file!
        NormaliserResults results = xena.normalise(xis);
        System.out.println("Here are the results of the normalisation:");
        System.out.println(results.getResultsDetails());
        System.out.println("-----------------------------------------");                
    }
}

A quick breakdown of this program reveals that, of the 20 lines within the main method:

  • six print output to System.out
  • five are comments
  • four instantiate Xena and load the 'Foo' plugin
  • two create our input source
  • one sets Xena's base path
  • one guesses the input source type
  • one normalises the input source.

The contents of example_file.foo are as follows:

~beginFoo~this is our first foo file! hooray!

After running the NormaliseTester with the latest version of Xena, the results look something like this:

#java -cp foo.jar:../../../xena/xena.jar au.gov.naa.digipres.xena.demo.foo.test.NormaliseTester
/home/dpuser/workspace/plugin-howto/03_basic_normaliser_part_i/foo_plugin/dist
Here is the best guess returned by Xena:
Guess... type: Foo
possible: Unknown
dataMatch:True
magicNumber: True
extensionMatch: True
mimeMatch: Unknown
certain: Unknown
priority: Default
-----------------------------------------
Here are the results of the normalisation:
Normalisation successful.
The input source name file:/home/dpuser/workspace/plugin-howto/03_basic_normaliser_part_i/foo_plugin/dist/../../../data/example_file.foo
normalised to: example_file.foo_Foo.xena
with normaliser: "Foo"
to the folder: /home/dpuser/workspace/plugin-howto/03_basic_normaliser_part_i/foo_plugin/dist
and the Xena id is: file:/../../../data/example_file.foo
-----------------------------------------

Listing the contents of the current folder shows that a new file has been created: example_file.foo_Foo.xena. Its contents are pretty simple:

<xena>
        <meta_data>
                <meta_data_wrapper_name>Default Package Wrapper</meta_data_wrapper_name>
                <normaliser_name>au.gov.naa.digipres.xena.demo.foo.FooNormaliser</normaliser_name>
                <input_source_uri>file:/example_file.foo</input_source_uri>
        </meta_data>
        <content>
                <foo:data xmlns:foo="http://preservation.naa.gov.au/foo/0.1">The foo file contents will go in here!</foo:data>
        </content>
</xena>
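The tag and namespace URI returned by XenaFooFileType are what mark the preserved data inside a .xena file like the one above. As a rough illustration, the sketch below uses the JDK's namespace-aware DOM parser (not any Xena API) to pull the text of the foo:data element out of such a document. XenaContentReader and readFooData are hypothetical names, and SAMPLE is an abbreviated copy of the file above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class XenaContentReader {
    // The values returned by XenaFooFileType.getNamespaceUri() and getTag()
    static final String FOO_URI = "http://preservation.naa.gov.au/foo/0.1";
    static final String FOO_TAG = "foo:data";

    // Abbreviated copy of example_file.foo_Foo.xena
    static final String SAMPLE =
        "<xena><meta_data><normaliser_name>au.gov.naa.digipres.xena.demo.foo.FooNormaliser</normaliser_name></meta_data>"
        + "<content><foo:data xmlns:foo=\"http://preservation.naa.gov.au/foo/0.1\">"
        + "The foo file contents will go in here!</foo:data></content></xena>";

    // Returns the text inside the first foo:data element in the Foo namespace, or null if there is none
    static String readFooData(String xenaXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default; needed for namespace-based lookup
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xenaXml.getBytes(StandardCharsets.UTF_8)));
            // Match on namespace URI and local name, then confirm the qualified name is the expected tag
            NodeList matches = doc.getElementsByTagNameNS(FOO_URI, "data");
            for (int i = 0; i < matches.getLength(); i++) {
                Element element = (Element) matches.item(i);
                if (FOO_TAG.equals(element.getTagName())) {
                    return element.getTextContent();
                }
            }
            return null;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse the .xena document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readFooData(SAMPLE)); // prints The foo file contents will go in here!
    }
}
```

An element called data that is not in the Foo namespace is ignored, which is why the namespace URI matters as much as the tag.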
