JOELib Tutorial: A Java based cheminformatics/computational chemistry package
Chapter 9. JOELib examples and code snippets
This section explains JOELib's facilities for calculating and storing descriptors in detail.
There are different descriptor types; the two main ones are native value descriptors and atom property descriptors. Several methods return lists of descriptor names, which are needed, for example, to request the calculation of a descriptor by name.
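Example 9-10. Get a list of all available descriptors

    // requires java.util.Enumeration and joelib.desc.DescriptorHelper
    // note: 'enum' is a reserved word since Java 5, so the variable is named 'descriptors'
    Enumeration descriptors = DescriptorHelper.instance().descriptors();
    System.out.println("Descriptors:");
    String descName;
    while (descriptors.hasMoreElements()) {
        descName = (String) descriptors.nextElement();
        System.out.println(descName);
    }
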
Example 9-11. Get a list of all native value descriptors

    // requires java.util.Vector and joelib.desc.DescriptorHelper
    Vector nativeDescs = DescriptorHelper.instance().getNativeDescs();
    System.out.println("Native value descriptors:");
    int size = nativeDescs.size();
    String descName;
    for (int i = 0; i < size; i++) {
        // the raw Vector returns Object, so a cast is required
        descName = (String) nativeDescs.get(i);
        System.out.println(descName);
    }

Example 9-12. Get a list of all atom property descriptors

    // requires java.util.Vector and joelib.desc.DescriptorHelper
    Vector atomPropDescs = DescriptorHelper.instance().getAtomPropDescs();
    System.out.println("Atom property descriptors:");
    int size = atomPropDescs.size();
    String descName;
    for (int i = 0; i < size; i++) {
        // the raw Vector returns Object, so a cast is required
        descName = (String) atomPropDescs.get(i);
        System.out.println(descName);
    }

The first way to get descriptors creates a new descriptor instance for every call. This is slow when used repeatedly, because the instance is not cached internally.
Example 9-13. Get and calculate descriptors, always using a new instance (slow)

    DescResult result = null;
    // the molecule must already be available in the mol object
    // descName contains the name of the descriptor to calculate
    try {
        // calculate descriptor for the given molecule
        result = DescriptorHelper.instance().descFromMol(mol, descName);
    } catch (DescriptorException ex) {
        // TODO: descriptor calculation preconditions are not valid
        // TODO: handle exception
    }
    if (result == null) {
        // TODO: descriptor can not be calculated
        // TODO: handle this case
    } else {
        // add calculated descriptor result to this molecule
        JOEPairData dp = new JOEPairData();
        // use default descriptor name to store result
        dp.setAttribute(descName);
        dp.setValue(result);
        // add descriptor result to molecule without
        // overwriting old result with the same descriptor name
        mol.addData(dp);
    }

The second way creates a descriptor instance in a first step and then reuses it for the descriptor calculation of many molecules (fast).
Example 9-14. Get and calculate descriptors creating only one descriptor calculation instance (fast)

    Descriptor descriptor = null;
    DescDescription descDescription = null;
    try {
        // descName contains the name of the descriptor to calculate
        descriptor = DescriptorFactory.getDescriptor(descName);
        if (descriptor == null) {
            // TODO: descriptor calculation method can not be loaded
        } else {
            descDescription = descriptor.getDescription();
        }
    } catch (DescriptorException ex) {
        // TODO: descriptor calculation preconditions are not valid
        // TODO: handle exception
    }

    // calculate descriptors for a set of molecules
    ...
    // TODO: iterate over a set of molecules
    // or load a set of molecules
    // precondition for the following lines: mol contains a molecule
    DescResult result = null;
    try {
        // initialize descriptor calculation properties
        // we will here use no properties, this can be e.g.
        // an atom property when calculating the autocorrelation function
        Hashtable calculationProperties = new Hashtable();
        descriptor.clear();
        result = descriptor.calculate(mol, calculationProperties);
        if (result == null) {
            // TODO: descriptor can not be calculated
            // TODO: handle this case
        } else {
            // add calculated descriptor result to this molecule
            JOEPairData dp = new JOEPairData();
            // use default descriptor name to store result
            dp.setAttribute(descName);
            dp.setValue(result);
            // add descriptor result to molecule without
            // overwriting old result with the same descriptor name
            mol.addData(dp);
        }
    } catch (DescriptorException ex) {
        // TODO: descriptor calculation preconditions are not valid
        // TODO: handle exception
    }
    ...

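The difference between the slow and the fast approach can be illustrated without JOELib on the classpath. The following is only a minimal sketch: `Calculator`, `CountingFactory` and the descriptor name `Sum_of_masses` are hypothetical stand-ins invented for this illustration, not JOELib types, and the counter merely makes visible how often an instance is created.

```java
final class DescriptorReuseSketch {

    // Hypothetical stand-in for a descriptor calculation object (not a JOELib type).
    interface Calculator {
        double calculate(double[] atomMasses);
    }

    // Hypothetical factory that counts how many calculator instances it builds.
    static final class CountingFactory {
        int created = 0;

        Calculator create(String name) {
            created++; // in a real library, lookup and setup work would happen here
            return atomMasses -> {
                double sum = 0.0;
                for (double mass : atomMasses) {
                    sum += mass;
                }
                return sum;
            };
        }
    }

    // Slow pattern: a fresh instance is requested for every molecule.
    static int newInstancePerMolecule(CountingFactory factory, double[][] molecules) {
        for (double[] molecule : molecules) {
            factory.create("Sum_of_masses").calculate(molecule);
        }
        return factory.created;
    }

    // Fast pattern: one instance is requested up front and then reused.
    static int oneSharedInstance(CountingFactory factory, double[][] molecules) {
        Calculator calculator = factory.create("Sum_of_masses");
        for (double[] molecule : molecules) {
            calculator.calculate(molecule);
        }
        return factory.created;
    }

    public static void main(String[] args) {
        double[][] molecules = { {12.0, 1.0}, {16.0}, {14.0, 1.0, 1.0} };
        System.out.println(newInstancePerMolecule(new CountingFactory(), molecules)); // 3
        System.out.println(oneSharedInstance(new CountingFactory(), molecules));      // 1
    }
}
```

With three molecules the first pattern creates three instances and the second only one; the gap grows linearly with the size of the data set.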
There are different abstraction levels at which you can create your own descriptor calculation methods. For native value descriptors some simple abstract class implementations already exist, so we begin with this simple case.
Simple abstract native descriptor classes exist for boolean, double and int value descriptors. They are joelib/desc/SimpleBooleanDesc.java, joelib/desc/SimpleDoubleDesc.java, joelib/desc/SimpleIntDesc.java, joelib/desc/AtomsCounter.java and joelib/desc/SMARTSCounter.java. Depending on the value type, the abstract method that must be implemented is getBooleanValue(JOEMol), getDoubleValue(JOEMol) or getIntValue(JOEMol). All other required methods are already implemented, and you can ignore them for these simple descriptors.
Example 9-15. Create own native descriptor calculation classes

    // use default descriptor calculation package
    package joelib.desc.types;

    // import base classes and the molecule class
    import joelib.desc.DescriptorHelper;
    import joelib.desc.DescriptorInfo;
    import joelib.desc.SimpleDoubleDesc;
    import joelib.molecule.JOEMol;

    // import logging tool
    import org.apache.log4j.Category;

    // simple double value descriptor returning the molecular weight
    public class MyMolecularWeight extends SimpleDoubleDesc {
        // initialize logging tool for this class
        private static Category logger =
            Category.getInstance("joelib.desc.types.MyMolecularWeight");

        // initialize public DESC_KEY (descriptor name) by which this descriptor can be
        // calculated
        // IMPORTANT: This should always be a 'public static final' variable
        // IMPORTANT: to avoid misinterpretations during runtime
        public static final String DESC_KEY = "My_molecular_weight";

        // the constructor name must match the class name
        public MyMolecularWeight() {
            // show basic logging message if debugging is enabled
            if (logger.isDebugEnabled())
                logger.debug("Initialize " + this.getClass().getName());

            // IMPORTANT: initialize descriptor informations
            // IMPORTANT: use DescriptorHelper to facilitate this task
            // IMPORTANT: relevant parameters are the descriptor name, the
            // IMPORTANT: calculation representation and the descriptor result
            descInfo = DescriptorHelper.generateDescInfo(
                DESC_KEY, this.getClass(),
                DescriptorInfo.TYPE_NO_COORDINATES, null,
                "joelib.desc.result.DoubleResult");
        }

        // get double value for molecular weight
        public double getDoubleValue(JOEMol mol) {
            double mw;
            mw = mol.getMolWt();
            return mw;
        }
    }

JOELib must now be told that there is a new descriptor calculation method. Add the following line to the joelib.properties file:

    joelib.descriptor.60.representation=joelib.desc.types.MyMolecularWeight

If you have already registered other descriptors, use the next free number instead, e.g. 61. The numbers must be consecutive, each one higher than the previous by exactly 1, because the descriptor factory class stops the loading process as soon as the next number (increased by 1) is not available. Any entry after a gap is therefore never loaded.
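The consequence of a gap in the numbering can be seen in a small stand-alone sketch. This is not JOELib's actual loader: the class `NumberedPropertyScan`, its methods and the start index passed to `scan` are assumptions made for illustration, and only the key pattern `joelib.descriptor.<n>.representation` is taken from the text above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch of a loader that stops at the first missing number.
final class NumberedPropertyScan {

    // Collects the values of "joelib.descriptor.<n>.representation",
    // starting at firstIndex and stopping at the first number without an entry.
    static List<String> scan(Properties props, int firstIndex) {
        List<String> classNames = new ArrayList<>();
        for (int n = firstIndex; ; n++) {
            String value = props.getProperty("joelib.descriptor." + n + ".representation");
            if (value == null) {
                break; // gap found: higher-numbered entries are never reached
            }
            classNames.add(value.trim());
        }
        return classNames;
    }

    // Parses properties from an in-memory string instead of a file.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException ex) {
            throw new IllegalStateException(ex); // not expected for an in-memory string
        }
        return props;
    }

    public static void main(String[] args) {
        // number 61 is missing on purpose
        String text =
            "joelib.descriptor.60.representation=joelib.desc.types.MyMolecularWeight\n"
          + "joelib.descriptor.62.representation=joelib.desc.types.ElectronegativityAllredRochow\n";
        System.out.println(scan(parse(text), 60));
        // prints [joelib.desc.types.MyMolecularWeight]
    }
}
```

Entry 62 is silently skipped because entry 61 does not exist, which is the kind of failure to look for when a newly registered descriptor does not show up in the descriptor list.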
For atom property descriptors the simple abstract classes are joelib/desc/SimpleDoubleAtomProperty.java and joelib/desc/SimpleDynamicAtomProperty.java.
Example 9-16. Create own atom property descriptor calculation classes

    // use default descriptor calculation package
    package joelib.desc.types;

    // import base classes and the molecule class
    import joelib.desc.DescriptorHelper;
    import joelib.desc.DescriptorInfo;
    import joelib.desc.SimpleDynamicAtomProperty;
    import joelib.desc.result.DynamicArrayResult;
    import joelib.molecule.JOEAtom;
    import joelib.molecule.JOEMol;
    import joelib.util.iterator.AtomIterator;

    // import logging tool
    import org.apache.log4j.Category;

    public class ElectronegativityAllredRochow extends SimpleDynamicAtomProperty {
        // initialize logging tool for this class
        private static Category logger =
            Category.getInstance("joelib.desc.types.ElectronegativityAllredRochow");

        // initialize public DESC_KEY (descriptor name) by which this descriptor can be
        // calculated
        // IMPORTANT: This should always be a 'public static final' variable
        // IMPORTANT: to avoid misinterpretations during runtime
        public static final String DESC_KEY = "Electronegativity_allred_rochow";

        public ElectronegativityAllredRochow() {
            // show basic logging message if debugging is enabled
            if (logger.isDebugEnabled())
                logger.debug("Initialize " + this.getClass().getName());

            // IMPORTANT: initialize descriptor informations
            // IMPORTANT: use DescriptorHelper to facilitate this task
            // IMPORTANT: relevant parameters are the descriptor name, the
            // IMPORTANT: calculation representation and the descriptor result
            descInfo = DescriptorHelper.generateDescInfo(
                DESC_KEY, this.getClass(),
                DescriptorInfo.TYPE_NO_COORDINATES, null,
                "joelib.desc.result.AtomDynamicResult");
        }

        // get array with atom properties
        // typically we use already deprotonated
        // molecules without hydrogens
        public Object getAtomPropertiesArray(JOEMol mol) {
            // get Allred-Rochow electronegativities for all atoms
            JOEAtom atom;
            AtomIterator ait = mol.atomIterator();
            double enAllredRochow[] = (double[]) DynamicArrayResult.getNewArray(
                DynamicArrayResult.DOUBLE, mol.numAtoms());
            int i = 0;
            while (ait.hasNext()) {
                atom = ait.nextAtom();
                enAllredRochow[i++] = atom.getENAllredRochow();
            }
            return enAllredRochow;
        }
    }

For more complex descriptors, e.g. the Moreau-Broto autocorrelation, all required methods of the interface joelib/desc/Descriptor.java must be implemented yourself, because no abstract helper class is available. Since this is a complex task, the recommended approach is to copy and rename an already implemented descriptor class and then adapt its implementation to your requirements.
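To give an idea of why such a descriptor needs more than a single value method, here is a minimal stand-alone sketch of a topological autocorrelation in the Moreau-Broto sense: for each bond distance d it sums the products p_i * p_j of an atom property over all atom pairs that are d bonds apart. The class and method names are hypothetical, the molecule is a plain adjacency list rather than a JOEMol, and the conventions used here (each pair counted once, d = 0 as the sum of squares) are assumptions that may differ from JOELib's own implementation.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

// Hypothetical sketch; not the JOELib autocorrelation descriptor.
final class AutocorrelationSketch {

    // Breadth-first search: number of bonds from 'start' to every atom (-1 if unreachable).
    static int[] distancesFrom(int start, int[][] neighbors) {
        int[] dist = new int[neighbors.length];
        Arrays.fill(dist, -1);
        dist[start] = 0;
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            int atom = queue.remove();
            for (int next : neighbors[atom]) {
                if (dist[next] < 0) {
                    dist[next] = dist[atom] + 1;
                    queue.add(next);
                }
            }
        }
        return dist;
    }

    // result[d] = sum of property[i] * property[j] over atom pairs i <= j at distance d.
    static double[] autocorrelation(double[] property, int[][] neighbors, int maxDistance) {
        double[] result = new double[maxDistance + 1];
        for (int i = 0; i < property.length; i++) {
            int[] dist = distancesFrom(i, neighbors);
            for (int j = i; j < property.length; j++) {
                int d = dist[j];
                if (d >= 0 && d <= maxDistance) {
                    result[d] += property[i] * property[j];
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // three atoms in a chain: 0 - 1 - 2
        int[][] chain = { {1}, {0, 2}, {1} };
        double[] property = { 1.0, 2.0, 3.0 };
        System.out.println(Arrays.toString(autocorrelation(property, chain, 2)));
        // prints [14.0, 8.0, 3.0]
    }
}
```

The result depends on an atom property, on the molecular graph and on a distance range, and it is a vector rather than a single number, which is why the simple abstract classes for int, double and boolean values are not sufficient here.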