Data entries.
Table 3-1. Predefined data types/names
Data type | Data name | Allowed occurrence |
---|---|---|
JOEDataType.JOE_UNDEFINED_DATA | Undefined | multiple |
JOEDataType.JOE_VIRTUAL_BOND_DATA | VirtualBondData | multiple |
JOEDataType.JOE_ROTAMER_LIST | RotamerList | multiple |
JOEDataType.JOE_EXTERNAL_BOND_DATA | ExternalBondData | multiple |
JOEDataType.JOE_COMPRESS_DATA | CompressData | multiple |
JOEDataType.JOE_COMMENT_DATA | Comment | multiple |
JOEDataType.JOE_ENERGY_DATA | EnergyData | multiple |
JOEDataType.JOE_PAIR_DATA | PairData | single per attribute name |
The typical data type for storing descriptors is JOEDataType.JOE_PAIR_DATA.
Every descriptor can be accessed by its name. For efficient access, the descriptor data entries are stored in a dictionary. Therefore each descriptor can occur only once in a molecule.
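The consequence of name-keyed storage can be illustrated with a minimal sketch. The class `DescriptorStoreSketch` and its methods are hypothetical stand-ins built on `java.util.Map`, not JOELib code, and the choice to let a later entry replace an earlier one is an assumption of this sketch; how JOELib itself treats a duplicate name is not stated here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a molecule's descriptor storage; not part of JOELib.
class DescriptorStoreSketch {
    // Attribute name -> value. A map holds at most one entry per key.
    private final Map<String, String> pairData = new LinkedHashMap<>();

    // Stores a descriptor and returns the value it replaced,
    // or null if the attribute name was not present before.
    String addData(String attribute, String value) {
        return pairData.put(attribute, value);
    }

    // Looks up a descriptor by name; returns null if it does not exist.
    String getData(String attribute) {
        return pairData.get(attribute);
    }

    int size() {
        return pairData.size();
    }

    public static void main(String[] args) {
        DescriptorStoreSketch mol = new DescriptorStoreSketch();
        mol.addData("LogP", "1.2");
        mol.addData("LogP", "1.5"); // same name: replaces the first entry
        System.out.println(mol.getData("LogP") + " " + mol.size()); // prints "1.5 1"
    }
}
```

Lookup by name is a single map access, which is the efficiency argument made above; the price is that two values cannot share one attribute name.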
Example 3-1. Getting descriptor data entries
    // getting an iterator over all data elements
    // including SSSR informations and other stuff
    GenericDataIterator gdit = mol.genericDataIterator();
    while ( gdit.hasNext() )
    {
        // get the next data element
        genericData = gdit.nextGenericData();
        // use only the data elements which contains descriptor
        // or user defined data
        if ( genericData.getDataType() == JOEDataType.JOE_PAIR_DATA )
        {
            // write this descriptor data as typical data block
            // to an SD file
            ps.printf( "> <%s>", genericData.getAttribute() );
            pairData = ( JOEPairData ) genericData;
            // write data in SD format, lines not longer than 80 characters
            // per line and remove empty lines in data entries with
            // ? or a character of your choice
            ps.println( pairData.toString( IOTypeHolder.instance().getIOType( "SDF" ) ) );
        }
    }
Example 3-2. Setting descriptor data entries
    // add a user defined data entry to the molecule
    JOEPairData dp = new JOEPairData();
    // the data entry has the name 'attribute'
    dp.setAttribute( attribute );
    // and a typical String value
    // own types must have the fromString and toString method !!!
    dp.setValue( dataEntry.toString() );
    mol.addData( dp );
A big advantage is that you can use descriptors from other programs. If JOELib has no calculation routine for a descriptor, the unknown descriptor (e.g. an additional data element in an SDF file) is handled as a String. If you know the data type, you can simply define your own data parser/writer. All known descriptors can be defined in joelib/data/plain/knownResults.txt. If you access a data element with mol.getData("DataName"), it is parsed automatically if its data type is known (e.g. atom or bond properties, matrices, ...).
You can suppress data parsing by using mol.getData("DataName", false), which can be useful if you do not want to modify all data elements (and should be faster).
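The difference between the two access modes can be sketched as follows. This is a simplified model under stated assumptions, not the JOELib implementation: `LazyParseSketch`, `registerParser` and the caching of the parsed value are hypothetical, and only the two-argument shape of `getData` mirrors the calls mentioned above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of parse-on-access data storage; not part of JOELib.
class LazyParseSketch {
    // Data name -> raw String or already parsed object.
    private final Map<String, Object> data = new HashMap<>();
    // Data name -> parser, standing in for the list of known data types.
    private final Map<String, Function<String, Object>> knownParsers = new HashMap<>();

    void registerParser(String name, Function<String, Object> parser) {
        knownParsers.put(name, parser);
    }

    void addData(String name, String rawValue) {
        data.put(name, rawValue);
    }

    // Default access: parse the entry if its type is known.
    Object getData(String name) {
        return getData(name, true);
    }

    // With parse == false the stored entry is returned untouched.
    Object getData(String name, boolean parse) {
        Object value = data.get(name);
        if (parse && value instanceof String) {
            Function<String, Object> parser = knownParsers.get(name);
            if (parser != null) {
                value = parser.apply((String) value);
                data.put(name, value); // assumption: keep the parsed form
            }
        }
        return value;
    }

    public static void main(String[] args) {
        LazyParseSketch mol = new LazyParseSketch();
        mol.registerParser("LogP", Double::valueOf);
        mol.addData("LogP", "1.5");
        System.out.println(mol.getData("LogP", false).getClass().getSimpleName());
        System.out.println(mol.getData("LogP").getClass().getSimpleName());
    }
}
```

In this sketch, skipping the parse step avoids both the parsing cost and the replacement of the stored entry, which is the motivation given for the `false` flag.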
If you have special atom or bond properties, you should always implement joelib.molecule.types.AtomProperties or joelib.molecule.types.BondProperties, which guarantees that the data elements can be accessed by the atom index or bond index used in JOELib. All implemented result classes are available in joelib/desc/result; they include simple types like int or double as well as complex types like double arrays or int matrices. If you want to use these data types in different file formats, add your requirements to the fromString(IOType ioType, String sValue) and toString(IOType ioType) methods.
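A format-aware result type might look roughly like the sketch below. Everything in it is hypothetical: `FormatSketch` stands in for `IOType`, `DoubleArrayResultSketch` is not one of the classes in joelib/desc/result, and the two text layouts (one value per line versus space-separated) are assumptions chosen only to show why the format is passed to both methods.

```java
// Hypothetical stand-in for IOType; the two layouts are assumptions.
enum FormatSketch { SDF, FLAT }

// Hypothetical double-array result with format-dependent text form.
class DoubleArrayResultSketch {
    private double[] values = new double[0];

    // Assumed layouts: SDF writes one value per line, FLAT one line.
    private static String separator(FormatSketch format) {
        return format == FormatSketch.SDF ? "\n" : " ";
    }

    // Parses whitespace-separated numbers, which covers both layouts.
    // Returns false and keeps the old values if any token is not a number.
    boolean fromString(FormatSketch format, String sValue) {
        String trimmed = sValue.trim();
        if (trimmed.isEmpty()) {
            values = new double[0];
            return true;
        }
        String[] tokens = trimmed.split("\\s+");
        double[] parsed = new double[tokens.length];
        try {
            for (int i = 0; i < tokens.length; i++) {
                parsed[i] = Double.parseDouble(tokens[i]);
            }
        } catch (NumberFormatException e) {
            return false;
        }
        values = parsed;
        return true;
    }

    // Writes the values using the separator of the requested format.
    String toString(FormatSketch format) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(separator(format));
            }
            sb.append(values[i]);
        }
        return sb.toString();
    }

    int size() {
        return values.length;
    }

    public static void main(String[] args) {
        DoubleArrayResultSketch result = new DoubleArrayResultSketch();
        result.fromString(FormatSketch.FLAT, "1.5 2.0 -3.25");
        System.out.println(result.toString(FormatSketch.SDF));
    }
}
```

Keeping parsing and writing in the result type itself means a new file format only requires a new branch in these two methods, not changes in the code that uses the descriptor.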
Kier Shape 1 descriptor [tc00] (see also the Section called Kier Shape 1 in Chapter 5)
Number of hydrogen bond donors (see also the Section called Number of Hydrogen Bond Donors (HBD) 1 in Chapter 5)
Number of nitrogen atoms (see also the Section called Number of nitrogen atoms in Chapter 5)
External rotational symmetry or graph potentials [wy96] (see also the Section called Graph potentials in Chapter 5)
Partial charges after Gasteiger-Marsili [gm78] (see also the Section called Gasteiger-Marsili in Chapter 5)
Descriptor calculation example: joelib.test.TestDescriptor
All new descriptors should implement the joelib.desc.Descriptor interface and be defined in the joelib.properties file.
A simple example is the Kier descriptor joelib.desc.types.KierShape1. If you have a group of similar descriptors which use the same initialization and result class, you can write a wrapper class like joelib.desc.SMARTSCounter, which makes it very easy to create many SMARTS pattern count descriptors, e.g. joelib.desc.types.HBD1 to count the number of hydrogen bond donors in a molecule.
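The wrapper idea can be sketched without any chemistry toolkit. The classes below are hypothetical and do not reproduce joelib.desc.SMARTSCounter: instead of SMARTS matching, the shared base class counts atoms whose element symbol satisfies a predicate, and the descriptor names are invented for illustration. The point is only the structure, where each concrete descriptor supplies a name and a pattern while the counting logic and the int result are shared.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical shared base: counts atoms matching a pattern.
class CounterDescriptorSketch {
    private final String name;
    private final Predicate<String> pattern;

    CounterDescriptorSketch(String name, Predicate<String> pattern) {
        this.name = name;
        this.pattern = pattern;
    }

    String getName() {
        return name;
    }

    // Shared calculation used by every derived count descriptor.
    int calculate(List<String> atomSymbols) {
        int count = 0;
        for (String symbol : atomSymbols) {
            if (pattern.test(symbol)) {
                count++;
            }
        }
        return count;
    }
}

// Each concrete descriptor only supplies its name and pattern.
class NitrogenCountSketch extends CounterDescriptorSketch {
    NitrogenCountSketch() {
        super("NitrogenCountSketch", "N"::equals);
    }
}

class HalogenCountSketch extends CounterDescriptorSketch {
    HalogenCountSketch() {
        super("HalogenCountSketch",
              symbol -> Arrays.asList("F", "Cl", "Br", "I").contains(symbol));
    }
}

class CounterDemo {
    public static void main(String[] args) {
        List<String> atoms = Arrays.asList("C", "N", "Cl", "N", "O", "Br");
        CounterDescriptorSketch[] descriptors = {
            new NitrogenCountSketch(), new HalogenCountSketch()
        };
        for (CounterDescriptorSketch d : descriptors) {
            System.out.println(d.getName() + " = " + d.calculate(atoms));
        }
    }
}
```

Adding another count descriptor in this scheme is a few lines of subclass code, which is the benefit the wrapper approach is meant to provide.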
To remain user- and developer-friendly, you should always produce a simple set of documentation files (XML, HTML, RTF) in the docs directory:
The easiest way is to create an XML DocBook documentation file in the docbook/descriptors directory. These files can easily be transformed to HTML, RTF and PDF files. If you want to use formatting in these descriptor documentation files, you must use <sect1>...</sect1> or <sect2>...</sect2>, because the <chapter> entries are already used by the tutorial book. Furthermore, you can use list items, tables or analogous elements. All these single descriptor documentation files will be generated by the Ant build mechanism (calling ant tutorial) and will be available as HTML and RTF files in the docs/tutorial/descriptors/documentation directory.