JOELib Tutorial: A Java based cheminformatics/computational chemistry package

Chapter 3. Molecule operation methods and classes
Supported file types can be defined in the joelib.properties file, found in the joelib/src directory if you have a source distribution or at joelib/lib/joelib.jar:joelib.properties if you have a binary distribution. If no representation is defined for an input/output type, JOELib will try to load a default interpreting class, which is defined in joelib.io.IOTypeHolder. Here is an example definition:
Example 3-4. Definition of IO types
    joelib.filetypes.1.name = SDF
    joelib.filetypes.1.representation = joelib.io.types.MDLSD
    joelib.filetypes.2.name = SMILES
    joelib.filetypes.2.representation = joelib.io.types.Smiles
    joelib.filetypes.3.name = CTX
    joelib.filetypes.4.name = CML
    joelib.filetypes.5.name = POV
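To make the lookup rule concrete, here is a minimal sketch of how numbered entries of this kind could be read with plain java.util.Properties. The class FileTypeDefinitions and its parse method are hypothetical and not part of JOELib, and the sketch assumes that entries are numbered consecutively starting at 1. A type without a representation entry maps to null, which is the case where JOELib would fall back to the default class from joelib.io.IOTypeHolder.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Illustrative reader for joelib.filetypes.N.* entries; NOT part of JOELib. */
class FileTypeDefinitions {

    /**
     * Maps each defined type name to its representation class name,
     * or to null when no representation entry is present.
     * Assumption: entries are numbered consecutively, starting at 1.
     */
    static Map<String, String> parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        Map<String, String> types = new LinkedHashMap<>();
        for (int i = 1; ; i++) {
            String name = props.getProperty("joelib.filetypes." + i + ".name");
            if (name == null) {
                break; // the first missing index ends the list
            }
            types.put(name, props.getProperty("joelib.filetypes." + i + ".representation"));
        }
        return types;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "joelib.filetypes.1.name = SDF",
            "joelib.filetypes.1.representation = joelib.io.types.MDLSD",
            "joelib.filetypes.2.name = SMILES",
            "joelib.filetypes.2.representation = joelib.io.types.Smiles",
            "joelib.filetypes.3.name = CTX");
        parse(text).forEach((name, rep) ->
            System.out.println(name + " -> " + (rep != null ? rep : "(default class)")));
    }
}
```

A LinkedHashMap is used so that the types come back in the order in which they were numbered.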
The input/output definitions can be easily accessed through two helper classes, joelib.io.SimplerReader and joelib.io.SimplerWriter, which should help you understand how to obtain a suitable reader or writer object for your needs. Here is a list of the supported input/output types:
Table 3-3. Supported file formats
Name | is readable | is writeable | Description |
---|---|---|---|
UNDEFINED | false | false | Undefined |
BMP | false | true | Windows Bitmap (BMP) image |
CML | true | true | Chemical Markup Language (CML) |
CTX | true | true | CACTVS clear text format (CTX) |
POV | false | true | Persistence Of Vision (POV) Ray Tracer |
FLAT | true | true | Flat file format |
Gaussian | true | true | Gaussian |
GIF | false | true | CompuServe Graphics Interchange (GIF) image |
JPEG | false | true | JPEG image |
JCAMP | false (but in progress) | false (but in progress) | Joint Committee on Atomic and Molecular Physical Data |
MOL2 | true | true | Sybyl Mol2 |
MOLCONNZ | true | false | MolConnZ |
MOPACOUT | true | false | MOPAC Output |
PDB | false (but in progress) | true | Protein Data Bank |
PDF | false | true | Portable Adobe Document Format (PDF) |
PNG | false | true | Portable Network Graphics (PNG) image |
PPM | false | true | Portable Pixelmap (PPM) image |
PREP | true | false | Amber PREP |
SDF | true | true | MDL SD file |
SMILES | true | true | Simplified Molecular Input Line Entry System (SMILES) |
TINKER | false | true | Tinker XYZ |
XYZ | true | true | XYZ |
ZIP | true | true | Compressed ZIP file format |
XML, specifically Chemical Markup Language (CML) [mr99, mr01a, mr01b, wil01] (in progress)
For more information, see http://www.xml-cml.org/. The CML output type can be defined in the joelib.properties file:
    ###########
    # CML
    # version: 1.0 and 2.0
    # ouput: attributearray, array, large, huge
    # delimiter: if you comment this line, standard white space will be used
    # force.formalCharge: formal charges will be always written, even when they are zero
    # partialCharge: write partial atom charge
    # hydrogenCount: write number of implicite+explicite hydrogens
    ###########
    ## use slower memory saving preparser for avoiding to load the complete data set into memory
    ## This flag will be automatically switched 'ON' for CML files in compressed ZIP files !
    ## The basic convert does not need it, because it uses already another sequential
    ## SAX reader (forced by a callback)
    joelib.io.types.ChemicalMarkupLanguage.useSlowerMemorySavingPreparser=false
    ###########
    joelib.io.types.ChemicalMarkupLanguage.output.defaultVersion=2.0
    joelib.io.types.ChemicalMarkupLanguage.defaultDelimiter=\u0020
    #joelib.io.types.ChemicalMarkupLanguage.defaultDelimiter=|
    joelib.io.types.ChemicalMarkupLanguage.output=huge
    joelib.io.types.ChemicalMarkupLanguage.output.force.formalCharge=false
    joelib.io.types.ChemicalMarkupLanguage.output.partialCharge=true
    joelib.io.types.ChemicalMarkupLanguage.output.hydrogenCount=true
    joelib.io.types.ChemicalMarkupLanguage.output.useNamespace=true
    joelib.io.types.ChemicalMarkupLanguage.output.namespace=cml
    joelib.io.types.ChemicalMarkupLanguage.output.xmlDeclaration=http://www.xml-cml.org/schema/cml2/core
    joelib.io.types.ChemicalMarkupLanguage.DTD.resourceDir=joelib/io/types/cml/data/
    ###########
    ## a first step to 'reproducable' descriptor calculation algorithms
    joelib.io.types.ChemicalMarkupLanguage.output.storeChemistryKernelInfo=true
    ## these informations are not really a CML standard
    ###########
    joelib.io.types.ChemicalMarkupLanguage.output.symmetryInformations=false
    ###########
CACTVS's clear text format (CTX) [gas95] http://www2.chemie.uni-erlangen.de/software/cactvs/index.html
The image output properties can be defined in the joelib.properties-file:
    # General image writer
    joelib.gui.render.Mol2Image.defaultWidth=300
    joelib.gui.render.Mol2Image.defaultHeight=200
    # General 2D rendering options
    joelib.gui.render.Renderer2DModel.bond.length=30.0
    joelib.gui.render.Renderer2DModel.bond.distance=6.0
    joelib.gui.render.Renderer2DModel.bond.width=2.0
    joelib.gui.render.Renderer2DModel.drawNumbers=false
    joelib.gui.render.Renderer2DModel.useKekuleStructure=false
    joelib.gui.render.Renderer2DModel.showEndCarbons=true
    joelib.gui.render.Renderer2DModel.atomColoring=false
    joelib.gui.render.Renderer2DModel.orthoLineOffset=20
    joelib.gui.render.Renderer2DModel.arrowOffset=10
    joelib.gui.render.Renderer2DModel.arrowSize=5
    joelib.gui.render.Renderer2DModel.background.color.r=255
    joelib.gui.render.Renderer2DModel.background.color.g=255
    joelib.gui.render.Renderer2DModel.background.color.b=255
    joelib.gui.render.Renderer2DModel.foreground.color.r=0
    joelib.gui.render.Renderer2DModel.foreground.color.g=0
    joelib.gui.render.Renderer2DModel.foreground.color.b=0
    joelib.gui.render.Renderer2DModel.highlight.color.r=255
    joelib.gui.render.Renderer2DModel.highlight.color.g=0
    joelib.gui.render.Renderer2DModel.highlight.color.b=0
    joelib.gui.render.Renderer2DModel.number.color.r=0
    joelib.gui.render.Renderer2DModel.number.color.g=0
    joelib.gui.render.Renderer2DModel.number.color.b=255
    joelib.gui.render.Renderer2DModel.conjugatedRing.color.r=0
    joelib.gui.render.Renderer2DModel.conjugatedRing.color.g=0
    joelib.gui.render.Renderer2DModel.conjugatedRing.color.b=0
    joelib.gui.render.Renderer2DModel.arrow.color.r=0
    joelib.gui.render.Renderer2DModel.arrow.color.g=255
    joelib.gui.render.Renderer2DModel.arrow.color.b=0
    joelib.gui.render.Renderer2DModel.orthogonalLine.color.r=0
    joelib.gui.render.Renderer2DModel.orthogonalLine.color.g=0
    joelib.gui.render.Renderer2DModel.orthogonalLine.color.b=255
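Each colour in this listing is given as three separate entries ending in .color.r, .color.g and .color.b. The following sketch shows one way such a triple could be combined into a single 0xRRGGBB value. The class RendererColorReader is hypothetical and not part of JOELib, and the assumption that each component lies in the range 0 to 255 is inferred from the example values only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative helper for the *.color.r/g/b entries; NOT part of JOELib. */
class RendererColorReader {
    static final String PREFIX = "joelib.gui.render.Renderer2DModel.";

    /** Loads properties from text; a StringReader should not fail in practice. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /**
     * Returns the colour of the given element (for example "highlight")
     * packed as 0xRRGGBB, or fallbackRgb if any component is missing.
     * Assumption: every component is an integer in the range 0..255.
     */
    static int readRgb(Properties props, String element, int fallbackRgb) {
        int rgb = 0;
        for (String channel : new String[] {"r", "g", "b"}) {
            String value = props.getProperty(PREFIX + element + ".color." + channel);
            if (value == null) {
                return fallbackRgb;
            }
            int component = Integer.parseInt(value.trim());
            if (component < 0 || component > 255) {
                throw new IllegalArgumentException(
                    element + ".color." + channel + " out of range: " + component);
            }
            rgb = (rgb << 8) | component; // shift previous channels left, append this one
        }
        return rgb;
    }

    public static void main(String[] args) {
        Properties props = load(String.join("\n",
            PREFIX + "highlight.color.r=255",
            PREFIX + "highlight.color.g=0",
            PREFIX + "highlight.color.b=0"));
        System.out.printf("highlight = #%06X%n", readRgb(props, "highlight", 0x000000));
    }
}
```

Returning a packed integer keeps the sketch free of any GUI dependency; a real renderer would more likely build a colour object from the three components.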
Joint Committee on Atomic and Molecular Physical Data (JCAMP) format [dl93, dw88, ghhjs91, lhdl94] (in progress)
The Protein Data Bank (PDB) is an archive of experimentally determined three-dimensional structures of biological macromolecules, serving a global community of researchers, educators, and students. The archives contain atomic coordinates, bibliographic citations, primary and secondary structure information, as well as crystallographic structure factors and NMR experimental data.
The PDF output properties can be defined in the joelib.properties-file:
    # PDF writer
    joelib.io.types.PDF.fontSize=10
    joelib.io.types.PDF.fontOffset=2
    joelib.io.types.PDF.pageBorder=20
Additionally, the general image writer properties must be set; they are described in detail in the Section called Image writers (BMP, GIF, JPEG, PPM).
Persistence Of Vision (POV) Ray Tracer: http://www.povray.org/
Here are some features:
Three supported visualisation types: Spheres, Balls & Sticks, Sticks.
Aromatic rings are visualized as torus elements. p orbitals can also be visualized as simple lines. Please let me know if you have a good p orbital element for POVRay.
You can use atom properties for atom coloring.
The PovRay output type can be defined in the joelib.properties-file:
    # PovRay
    # ouput type can be: stick, sphere, ball_and_stick
    joelib.io.types.POVRay.output=ball_and_stick
    joelib.io.types.POVRay.atomPropertyColoring=false
    joelib.io.types.POVRay.atomProperty=Gasteiger_Marsili
Structured Data File (SDF) format or MDL molfile format [sdf]. This is the most commonly used molecule format in JOELib. Please let me know if you have suggestions for improvement.
Golden rule: Empty lines are not allowed in the data block of MDL SD files, because an empty line signals the end of the current data entry. Because the internal data representation relies on this precondition, file format loaders must convert every empty line inside a data entry to ?, 0.0 or any other value you like, as long as it is not an empty line.
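The golden rule can be sketched as a small preprocessing step that a loader applies to a multi-line data value before storing it. The class SdDataSanitizer below is hypothetical and not part of JOELib. Treating whitespace-only lines as empty is a conservative assumption of this sketch, and the value is expected without a trailing line terminator.

```java
/** Illustrative preprocessing for multi-line data values; NOT part of JOELib. */
class SdDataSanitizer {

    /**
     * Replaces every empty (or whitespace-only) line in a data value with the
     * given placeholder, so that the value can never terminate an SD data entry.
     * The value is expected WITHOUT a trailing line terminator.
     */
    static String replaceEmptyLines(String dataValue, String placeholder) {
        if (placeholder.trim().isEmpty()) {
            throw new IllegalArgumentException("placeholder must not be empty");
        }
        // Limit -1 keeps trailing empty strings, so no empty line is dropped silently.
        String[] lines = dataValue.split("\r?\n", -1);
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                result.append('\n');
            }
            result.append(lines[i].trim().isEmpty() ? placeholder : lines[i]);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String value = "1.5\n\n2.5";
        System.out.println(replaceEmptyLines(value, "?"));
    }
}
```

Whether ? or 0.0 is the better placeholder depends on how the data entry is interpreted later; a numeric descriptor is probably better served by a numeric placeholder.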
You can force JOELib to write kekulized molecular structures instead of molecular structures with assigned aromaticity, by using:
    #SD Files
    joelib.io.types.MDLSD.writeAromaticityAsConjugatedSystem=false
Simplified Molecular Input Line Entry System (SMILES) [smiles, wei88, wei89].
The structure of SMILES input and output lines can be defined in the joelib.properties file:
    #SMILES
    joelib.io.types.Smiles.canonical=false
    joelib.io.types.Smiles.lineStructure=SMILES|TITLE
    joelib.io.types.Smiles.lineStructure.delimiter=|
    joelib.io.types.Smiles.lineStructure.input.delimiter=\u0020\t\n\r
    joelib.io.types.Smiles.lineStructure.output.delimiter=\u0020
The canonical (unique) SMILES representation of a molecule is calculated if the canonical property is set to true (see the Section called Morgan: Unique atom numbering in Chapter 6).
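To illustrate how the line structure settings fit together, the following sketch splits one input line into named fields. It is not JOELib's actual parser: the class SmilesLineParser is hypothetical, the fallback values in parseWithConfig simply mirror the example above, and each field is assumed to be a single token, so titles containing whitespace are not handled. When the file is loaded with java.util.Properties, the escapes in the input delimiter entry are decoded to a space, a tab, a newline and a carriage return.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.StringTokenizer;

/** Illustrative SMILES line splitter; NOT JOELib's actual parser. */
class SmilesLineParser {
    static final String KEY = "joelib.io.types.Smiles.lineStructure";

    /** Loads properties from text; escape sequences in values are decoded here. */
    static Properties loadProperties(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /**
     * Pairs the field names from lineStructure (split at structureDelimiter)
     * with the tokens of the line (split at any character of inputDelimiters).
     */
    static Map<String, String> parseLine(String line, String lineStructure,
                                         String structureDelimiter, String inputDelimiters) {
        StringTokenizer fields = new StringTokenizer(lineStructure, structureDelimiter);
        StringTokenizer values = new StringTokenizer(line, inputDelimiters);
        Map<String, String> result = new LinkedHashMap<>();
        while (fields.hasMoreTokens() && values.hasMoreTokens()) {
            result.put(fields.nextToken(), values.nextToken());
        }
        return result;
    }

    /** Reads the three settings; the fallbacks are assumptions mirroring the example. */
    static Map<String, String> parseWithConfig(Properties props, String line) {
        return parseLine(line,
            props.getProperty(KEY, "SMILES|TITLE"),
            props.getProperty(KEY + ".delimiter", "|"),
            props.getProperty(KEY + ".input.delimiter", " \t\n\r"));
    }

    public static void main(String[] args) {
        Properties props = loadProperties(String.join("\n",
            KEY + "=SMILES|TITLE",
            KEY + ".delimiter=|",
            KEY + ".input.delimiter=\\u0020\\t\\n\\r"));
        System.out.println(parseWithConfig(props, "c1ccccc1 benzene"));
    }
}
```

Swapping the entry to TITLE|SMILES would make the same code accept files that list the title first.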
A mol2 file (.mol2) is a complete, portable representation of a SYBYL molecule. It is an ASCII file which contains all the information needed to reconstruct a SYBYL molecule.
Some Fortran programs such as Tinker are very sensitive to whether a file uses Unix or Windows line endings. Keep this in mind if you have problems under Unix with files generated under Windows.
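If a file written under Windows causes trouble, its line endings can be converted before it is passed on. The following is a minimal sketch and not a JOELib feature; the class LineEndingNormalizer is hypothetical. It rewrites a file with Unix line endings and leaves all other bytes untouched.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Illustrative line ending converter; NOT part of JOELib. */
class LineEndingNormalizer {

    /** Converts Windows (CR LF) and lone CR line endings to Unix (LF). */
    static String toUnix(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java LineEndingNormalizer <input> <output>");
            return;
        }
        // ISO-8859-1 maps every byte to exactly one char, so bytes other than
        // the line terminators pass through unchanged.
        String content = new String(
            Files.readAllBytes(Paths.get(args[0])), StandardCharsets.ISO_8859_1);
        Files.write(Paths.get(args[1]),
            toUnix(content).getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

The CR LF pairs are replaced first so that a Windows line ending becomes one line feed rather than two.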
Missing a file format? Write your own import and export filter! Use the MoleculeFileType class as the abstract parent and implement its methods. Formatting the input or output is fairly easy with the ScanfFormat and PrintfFormat classes from John E. Lloyd at:
http://www.cs.ubc.ca/~lloyd/java/doc/cformat.html
Internally, atoms have special atom types, which are defined as SMARTS patterns in the joelib/data/plain/atomtype.txt file. These types make it easy to export to other file formats, especially those of force field or ab initio programs. For that task there is the joelib.data.JOETypeTable helper class, which uses the default conversion types in joelib/data/plain/types.txt.