Descriptors

This section explains in detail how JOELib calculates and stores descriptors.

Get and calculate descriptors

There are different descriptor types; the two main types are native value descriptors and atom property descriptors. There are also methods for getting lists of descriptor names, which are required, for example, to calculate the descriptors.

Example 9-10. Get a list of all available descriptors

Enumeration descs = DescriptorHelper.instance().descriptors();
System.out.println("Descriptors:");
String descName;
while (descs.hasMoreElements())
{
  descName = (String) descs.nextElement();
  System.out.println(descName);
}

Example 9-11. Get a list of all native value descriptors

Vector nativeDescs = DescriptorHelper.instance().getNativeDescs();
System.out.println("Native value descriptors:");
int size = nativeDescs.size();
String descName;
for (int i = 0; i < size; i++)
{
  descName = (String) nativeDescs.get(i);
  System.out.println(descName);
}

Example 9-12. Get a list of all atom property descriptors

Vector atomPropDescs = DescriptorHelper.instance().getAtomPropDescs();
System.out.println("Atom property descriptors:");
int size = atomPropDescs.size();
String descName;
for (int i = 0; i < size; i++)
{
  descName = (String) atomPropDescs.get(i);
  System.out.println(descName);
}

A descriptor can be obtained by creating a new descriptor instance for each calculation. This is slow when done repeatedly, because the instance is not cached internally.

Example 9-13. Get and calculate descriptors, always using a new instance (slow)

DescResult result = null;
// the molecule must already be available in the mol object
// descName contains the descriptor name which should be calculated
try
{
  // calculate descriptor for the given molecule
  result = DescriptorHelper.instance().descFromMol(mol, descName);
}
catch (DescriptorException ex)
{
  // TODO: descriptor calculation preconditions are not valid
  // TODO: handle exception
}
if (result == null)
{
  // TODO: descriptor can not be calculated
  // TODO: handle this case
}
else
{
  // add calculated descriptor result to this molecule
  JOEPairData dp = new JOEPairData();
  // use default descriptor name to store result
  dp.setAttribute(descName);
  dp.setValue(result);
  // add descriptor result to molecule without
  // overwriting old result with the same descriptor name
  mol.addData(dp);
}

Alternatively, create a descriptor instance in a first step and then reuse it for multiple descriptor calculations (fast).

Example 9-14. Get and calculate descriptors by creating only one descriptor calculation instance (fast)

Descriptor descriptor=null;
DescDescription descDescription=null;
try
{
  descriptor = DescriptorFactory.getDescriptor(descName);
  if (descriptor == null)
  {
    // TODO: descriptor calculation method can not be loaded
  }
  else
  {
    descDescription = descriptor.getDescription();
  }
}
catch (DescriptorException ex)
{
  // TODO: descriptor calculation preconditions are not valid
  // TODO: handle exception
}

// calculate descriptors for a set of molecules
...
  // TODO: iterate over a set of molecules
  // or load a set of molecules
  // precondition for the following lines: mol contains a molecule
  DescResult result = null;
  try
  {
    // initialize descriptor calculation properties
    // no properties are used here; a property could be, e.g.,
    // an atom property when calculating the autocorrelation function
    Hashtable calculationProperties = new Hashtable();
    descriptor.clear();
    result = descriptor.calculate(mol, calculationProperties);
    if (result == null)
    {
      // TODO: descriptor can not be calculated
      // TODO: handle this case
    }
    else
    {
      // add calculated descriptor result to this molecule
      JOEPairData dp = new JOEPairData();
      // use default descriptor name to store result
      dp.setAttribute(descName);
      dp.setValue(result);
      // add descriptor result to molecule without
      // overwriting old result with the same descriptor name
      mol.addData(dp);
    }
  }
  catch (DescriptorException ex)
  {
    // TODO: descriptor calculation preconditions are not valid
    // TODO: handle exception
  }
...
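
The difference between Examples 9-13 and 9-14 can be pictured without JOELib at all. The following is only a minimal sketch: SketchDescriptor, create() and the toy "Char_count" descriptor are hypothetical stand-ins invented for illustration, not JOELib classes or methods. It merely counts how often the construction step runs in each pattern.

```java
class DescriptorReuseSketch {
    // Hypothetical stand-in for a descriptor calculation object (not a JOELib type).
    interface SketchDescriptor {
        double calculate(String molecule);
    }

    // Counts how often the construction step runs.
    static int instancesCreated = 0;

    // Stand-in for the costly "look up and instantiate" step (assumption).
    static SketchDescriptor create(String descName) {
        instancesCreated++;
        // Toy descriptor: the number of characters in the molecule string.
        return molecule -> molecule.length();
    }

    public static void main(String[] args) {
        String[] molecules = {"CCO", "c1ccccc1", "CC(=O)O"};

        // Slow pattern: a new instance for every molecule.
        instancesCreated = 0;
        for (String mol : molecules) {
            create("Char_count").calculate(mol);
        }
        System.out.println("new instance per call: " + instancesCreated);

        // Fast pattern: create the instance once, then reuse it.
        instancesCreated = 0;
        SketchDescriptor descriptor = create("Char_count");
        for (String mol : molecules) {
            descriptor.calculate(mol);
        }
        System.out.println("single reused instance: " + instancesCreated);
    }
}
```

With three molecules the first loop constructs three instances and the second only one; how much time this saves in JOELib depends on how expensive the real instantiation is.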

Create own descriptor classes

There are different abstraction levels for creating your own descriptor calculation methods. For native value descriptors, some simple abstract class implementations already exist, so we begin with this simple example.

The simple abstract native descriptor classes exist for boolean, double and int value descriptors. The abstract simple classes are joelib/desc/SimpleBooleanDesc.java, joelib/desc/SimpleDoubleDesc.java, joelib/desc/SimpleIntDesc.java, joelib/desc/AtomsCounter.java and joelib/desc/SMARTSCounter.java. The abstract methods that must be implemented are getBooleanValue(JOEMol), getDoubleValue(JOEMol) and getIntValue(JOEMol). All other required methods are already implemented, and you can ignore those implementations for these simple descriptors.
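
The division of labour described above (the base class supplies the plumbing, the subclass supplies one value method) is the template method pattern. The sketch below is independent of JOELib and assumes nothing about its internals: DoubleDescBase, CarbonCount and the String-based "molecule" are hypothetical names chosen only for illustration.

```java
class SimpleDescSketch {
    // Hypothetical stand-in for an abstract helper class (not a JOELib type).
    abstract static class DoubleDescBase {
        // The only method a subclass has to supply.
        abstract double getDoubleValue(String molecule);

        // Shared plumbing, written once in the base class.
        final String calculate(String molecule) {
            return name() + "=" + getDoubleValue(molecule);
        }

        String name() {
            return getClass().getSimpleName();
        }
    }

    // A concrete descriptor only implements the value method.
    static class CarbonCount extends DoubleDescBase {
        @Override
        double getDoubleValue(String molecule) {
            // Toy calculation: count upper-case 'C' characters.
            return molecule.chars().filter(c -> c == 'C').count();
        }
    }

    public static void main(String[] args) {
        System.out.println(new CarbonCount().calculate("CCO")); // prints "CarbonCount=2.0"
    }
}
```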

Example 9-15. Create own native descriptor calculation classes

// use default descriptor calculation package
package joelib.desc.types;

// import base classes and the molecule class
import joelib.desc.DescriptorHelper;
import joelib.desc.DescriptorInfo;
import joelib.desc.SimpleDoubleDesc;
import joelib.molecule.JOEMol;

// import logging tool
import org.apache.log4j.Category;

// native value descriptor that returns the molecular weight as a double value
public class MyMolecularWeight extends SimpleDoubleDesc
{
  // initialize logging tool for this class
  private static Category logger=Category.getInstance("joelib.desc.types.MyMolecularWeight");

  // initialize public DESC_KEY (descriptor name) by which this descriptor can be
  // calculated
  // IMPORTANT: This should always be a 'public static final' variable
  // IMPORTANT: to avoid misinterpretations during runtime
  public static final String DESC_KEY = "My_molecular_weight";

  public MyMolecularWeight()
  {
    // show basic logging message if debugging is enabled
    if (logger.isDebugEnabled())
    {
      logger.debug("Initialize " + this.getClass().getName());
    }

    // IMPORTANT: initialize descriptor information
    // IMPORTANT: use DescriptorHelper to facilitate this task
    // IMPORTANT: relevant parameters are the descriptor name, the
    // IMPORTANT: calculation representation and the descriptor result

    descInfo=DescriptorHelper.generateDescInfo(DESC_KEY,this.getClass(),
				DescriptorInfo.TYPE_NO_COORDINATES,null,
				"joelib.desc.result.DoubleResult");
  }

  // get double value for molecular weight
  public double getDoubleValue(JOEMol mol)
  {
    double mw;
    mw = mol.getMolWt();
    return mw;
  }
}

JOELib must now be told that there is a new descriptor calculation method. Add the following line to the joelib.properties file: joelib.descriptor.60.representation=joelib.desc.types.MyMolecularWeight. If you have already implemented other descriptors, use another number, e.g. 61. The numbers must increase consecutively in steps of 1, because the descriptor factory class stops loading as soon as the next higher number is not available.

The abstract simple classes for atom properties are joelib/desc/SimpleDoubleAtomProperty.java and joelib/desc/SimpleDynamicAtomProperty.java.
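
Why a gap in the numbering hides later entries can be seen in a small stand-alone sketch. This is not JOELib's factory code; it only imitates the behaviour described above. The method loadRepresentations and its firstIndex parameter are assumptions made for illustration (the text does not state at which number the real factory starts counting), and the key for number 61 is left out on purpose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class DescriptorNumberingSketch {
    // Collects class names from consecutively numbered keys and
    // stops at the first number that is missing.
    static List<String> loadRepresentations(Properties props, int firstIndex) {
        List<String> classNames = new ArrayList<>();
        for (int i = firstIndex; ; i++) {
            String className =
                props.getProperty("joelib.descriptor." + i + ".representation");
            if (className == null) {
                break; // the first gap ends the scan
            }
            classNames.add(className);
        }
        return classNames;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("joelib.descriptor.60.representation",
                          "joelib.desc.types.MyMolecularWeight");
        // Number 61 is deliberately missing.
        props.setProperty("joelib.descriptor.62.representation",
                          "joelib.desc.types.ElectronegativityAllredRochow");

        // Only entry 60 is found; entry 62 is never reached.
        System.out.println(loadRepresentations(props, 60));
    }
}
```

In this sketch the entry numbered 62 is silently ignored, which is the kind of symptom to look for when a newly registered descriptor does not show up.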

Example 9-16. Create own atom property descriptor calculation classes

// use default descriptor calculation package
package joelib.desc.types;

// import base classes and the molecule class
import joelib.desc.DescriptorHelper;
import joelib.desc.DescriptorInfo;
import joelib.desc.SimpleDynamicAtomProperty;
import joelib.desc.result.DynamicArrayResult;
import joelib.molecule.JOEAtom;
import joelib.molecule.JOEMol;
import joelib.util.iterator.AtomIterator;

// import logging tool
import org.apache.log4j.Category;

public class ElectronegativityAllredRochow extends SimpleDynamicAtomProperty
{
        // initialize logging tool for this class
	private static Category logger =
		Category.getInstance("joelib.desc.types.ElectronegativityAllredRochow");

        // initialize public DESC_KEY (descriptor name) by which this descriptor can be
        // calculated
        // IMPORTANT: This should always be a 'public static final' variable
        // IMPORTANT: to avoid misinterpretations during runtime
	public static final String DESC_KEY = "Electronegativity_allred_rochow";

	public ElectronegativityAllredRochow()
	{
	        // show basic logging message if debugging is enabled
		if (logger.isDebugEnabled())
			logger.debug("Initialize " + this.getClass().getName());

                // IMPORTANT: initialize descriptor information
                // IMPORTANT: use DescriptorHelper to facilitate this task
                // IMPORTANT: relevant parameters are the descriptor name, the
                // IMPORTANT: calculation representation and the descriptor result
		descInfo =
			DescriptorHelper.generateDescInfo(
				DESC_KEY,
				this.getClass(),
				DescriptorInfo.TYPE_NO_COORDINATES,
				null,
				"joelib.desc.result.AtomDynamicResult");

	}

        // get array with atom properties
        // typically we use already deprotonated
        // molecules without hydrogens
	public Object getAtomPropertiesArray(JOEMol mol)
	{
		// get the Allred-Rochow electronegativity for all atoms
		JOEAtom atom;
		AtomIterator ait = mol.atomIterator();
		double enAllredRochow[] =
			(double[]) DynamicArrayResult.getNewArray(
				DynamicArrayResult.DOUBLE,
				mol.numAtoms());
		int i = 0;
		while (ait.hasNext())
		{
			atom = ait.nextAtom();
			enAllredRochow[i++] = atom.getENAllredRochow();
		}
		return enAllredRochow;
	}
}

For more complex descriptors, e.g. the Moreau-Broto autocorrelation, all required interface methods of joelib/desc/Descriptor.java must be implemented, and no abstract helper class is available. Because this is a complex task, it is recommended to copy and rename an already implemented descriptor class and to adapt its implementation to your requirements.