Input and output

Supported input and output formats

Supported file types are defined in the joelib.properties file, which is located in the joelib/src directory of a source distribution, or inside joelib/lib/joelib.jar (as joelib.properties) in a binary distribution. If no representation is defined for an input/output type, JOELib tries to load a default interpreting class, which is defined in joelib.io.IOTypeHolder. Here is an example definition:

Example 3-4. Definition of IO types

joelib.filetypes.1.name           = SDF
joelib.filetypes.1.representation = joelib.io.types.MDLSD
joelib.filetypes.2.name           = SMILES
joelib.filetypes.2.representation = joelib.io.types.Smiles
joelib.filetypes.3.name           = CTX
joelib.filetypes.4.name           = CML
joelib.filetypes.5.name           = POV
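
The numbered entries above can be read with nothing more than java.util.Properties. The following is a minimal sketch of that idea, not JOELib's own loader: the class FileTypeDefinitions, the USE_DEFAULT marker and the assumption that entries are numbered without gaps are all hypothetical and only serve the illustration. An entry without a representation line is mapped to the marker, standing in for the default lookup via joelib.io.IOTypeHolder.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Illustrative sketch only: this is NOT JOELib's own loader.
class FileTypeDefinitions {

    // Hypothetical marker meaning "no representation configured, use the default class".
    static final String USE_DEFAULT = "(default)";

    /** Parses properties text held in memory. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Maps each type name to its representation class name, in definition order. */
    static Map<String, String> parse(Properties props) {
        Map<String, String> types = new LinkedHashMap<>();
        // Assumption: entries are numbered 1, 2, 3, ... without gaps.
        for (int i = 1; ; i++) {
            String prefix = "joelib.filetypes." + i + ".";
            String name = props.getProperty(prefix + "name");
            if (name == null) {
                break;
            }
            String representation = props.getProperty(prefix + "representation", USE_DEFAULT);
            types.put(name.trim(), representation.trim());
        }
        return types;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "joelib.filetypes.1.name           = SDF",
            "joelib.filetypes.1.representation = joelib.io.types.MDLSD",
            "joelib.filetypes.2.name           = CTX");
        parse(load(text)).forEach((name, cls) -> System.out.println(name + " -> " + cls));
    }
}
```

Keeping the result in a LinkedHashMap preserves the order of the definitions, which makes it easy to compare the parsed result with the properties file.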

The defined input/output types can be accessed easily through two helper classes, joelib.io.SimpleReader and joelib.io.SimpleWriter, which also illustrate the mechanism for obtaining a suitable reader or writer object for your needs. Here is a list of supported input/output types:

Table 3-3. Supported file formats

Name       Is readable              Is writeable             Description
---------  -----------------------  -----------------------  ------------------------------------------------------
UNDEFINED  false                    false                    Undefined
BMP        false                    true                     Windows Bitmap (BMP) image
CML        true                     true                     Chemical Markup Language (CML)
CTX        true                     true                     CACTVS clear text format (CTX)
POV        false                    true                     Persistence Of Vision (POV) Ray Tracer
FLAT       true                     true                     Flat file format
Gaussian   true                     true                     Gaussian
GIF        false                    true                     CompuServe Graphics Interchange (GIF) image
JPEG       false                    true                     JPEG image
JCAMP      false (but in progress)  false (but in progress)  Joint Committee on Atomic and Molecular Physical Data
MOL2       true                     true                     Sybyl Mol2
MOLCONNZ   true                     false                    MolConnZ
MOPACOUT   true                     false                    MOPAC Output
PDB        false (but in progress)  true                     Protein Data Bank
PDF        false                    true                     Portable Adobe Document Format (PDF)
PNG        false                    true                     Portable Network Graphics (PNG) image
PPM        false                    true                     Portable Pixelmap (PPM) image
PREP       true                     false                    Amber PREP
SDF        true                     true                     MDL SD file
SMILES     true                     true                     Simplified Molecular Input Line Entry System (SMILES)
TINKER     false                    true                     Tinker XYZ
XYZ        true                     true                     XYZ
ZIP        true                     true                     Compressed ZIP file format

XML respective Chemical Markup Language (CML)

XML, or more precisely the Chemical Markup Language (CML) [mr99, mr01a, mr01b, wil01] (in progress)

For more information, see http://www.xml-cml.org/. The CML output type can be defined in the joelib.properties file:

###########
# CML
# version:            1.0 and 2.0
# output:             attributearray, array, large, huge
# delimiter:          if you comment out this line, standard white space will be used
# force.formalCharge: formal charges are always written, even when they are zero
# partialCharge:      write partial atom charges
# hydrogenCount:      write the number of implicit + explicit hydrogens
###########
## Use the slower, memory-saving preparser to avoid loading the complete data set into memory.
## This flag is switched 'ON' automatically for CML files in compressed ZIP files!
## The basic convert does not need it, because it already uses another sequential
## SAX reader (forced by a callback).
joelib.io.types.ChemicalMarkupLanguage.useSlowerMemorySavingPreparser=false
###########
joelib.io.types.ChemicalMarkupLanguage.output.defaultVersion=2.0
joelib.io.types.ChemicalMarkupLanguage.defaultDelimiter=\u0020
#joelib.io.types.ChemicalMarkupLanguage.defaultDelimiter=|
joelib.io.types.ChemicalMarkupLanguage.output=huge
joelib.io.types.ChemicalMarkupLanguage.output.force.formalCharge=false
joelib.io.types.ChemicalMarkupLanguage.output.partialCharge=true
joelib.io.types.ChemicalMarkupLanguage.output.hydrogenCount=true
joelib.io.types.ChemicalMarkupLanguage.output.useNamespace=true
joelib.io.types.ChemicalMarkupLanguage.output.namespace=cml
joelib.io.types.ChemicalMarkupLanguage.output.xmlDeclaration=http://www.xml-cml.org/schema/cml2/core
joelib.io.types.ChemicalMarkupLanguage.DTD.resourceDir=joelib/io/types/cml/data/
###########
## a first step towards 'reproducible' descriptor calculation algorithms
joelib.io.types.ChemicalMarkupLanguage.output.storeChemistryKernelInfo=true
## this information is not really part of the CML standard
###########
joelib.io.types.ChemicalMarkupLanguage.output.symmetryInformations=false
###########
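
One detail of the listing above is easy to miss: the delimiter is written as the escape \u0020 because a plain trailing space would be invisible and fragile in a properties file. The sketch below shows how java.util.Properties decodes that escape and how a fallback could be applied when the line is commented out. It is an illustration under stated assumptions, not JOELib code: the class CmlSettingsSketch and its helper methods are hypothetical, and a single space is assumed as the "standard white space" fallback.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative sketch only: the class and method names are hypothetical.
class CmlSettingsSketch {

    static final String PREFIX = "joelib.io.types.ChemicalMarkupLanguage.";

    /** Parses properties text held in memory. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the configured delimiter; assumes a single space if the entry is commented out. */
    static String delimiter(Properties props) {
        return props.getProperty(PREFIX + "defaultDelimiter", " ");
    }

    /** Reads a boolean entry such as "output.partialCharge"; a missing entry counts as false. */
    static boolean flag(Properties props, String key) {
        return Boolean.parseBoolean(props.getProperty(PREFIX + key, "false").trim());
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the escape intact in Java source; Properties decodes it.
        Properties props = load(
            PREFIX + "defaultDelimiter=\\u0020\n"
            + PREFIX + "output.partialCharge=true\n");
        System.out.println("delimiter is a space: " + " ".equals(delimiter(props)));
        System.out.println("partialCharge: " + flag(props, "output.partialCharge"));
    }
}
```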

Image writers (BMP, GIF, JPEG, PPM)

The image output properties can be defined in the joelib.properties file:

# General image writer
joelib.gui.render.Mol2Image.defaultWidth=300
joelib.gui.render.Mol2Image.defaultHeight=200

# General 2D rendering options
joelib.gui.render.Renderer2DModel.bond.length=30.0
joelib.gui.render.Renderer2DModel.bond.distance=6.0
joelib.gui.render.Renderer2DModel.bond.width=2.0
joelib.gui.render.Renderer2DModel.drawNumbers=false
joelib.gui.render.Renderer2DModel.useKekuleStructure=false
joelib.gui.render.Renderer2DModel.showEndCarbons=true
joelib.gui.render.Renderer2DModel.atomColoring=false
joelib.gui.render.Renderer2DModel.orthoLineOffset=20
joelib.gui.render.Renderer2DModel.arrowOffset=10
joelib.gui.render.Renderer2DModel.arrowSize=5

joelib.gui.render.Renderer2DModel.background.color.r=255
joelib.gui.render.Renderer2DModel.background.color.g=255
joelib.gui.render.Renderer2DModel.background.color.b=255

joelib.gui.render.Renderer2DModel.foreground.color.r=0
joelib.gui.render.Renderer2DModel.foreground.color.g=0
joelib.gui.render.Renderer2DModel.foreground.color.b=0

joelib.gui.render.Renderer2DModel.highlight.color.r=255
joelib.gui.render.Renderer2DModel.highlight.color.g=0
joelib.gui.render.Renderer2DModel.highlight.color.b=0

joelib.gui.render.Renderer2DModel.number.color.r=0
joelib.gui.render.Renderer2DModel.number.color.g=0
joelib.gui.render.Renderer2DModel.number.color.b=255

joelib.gui.render.Renderer2DModel.conjugatedRing.color.r=0
joelib.gui.render.Renderer2DModel.conjugatedRing.color.g=0
joelib.gui.render.Renderer2DModel.conjugatedRing.color.b=0

joelib.gui.render.Renderer2DModel.arrow.color.r=0
joelib.gui.render.Renderer2DModel.arrow.color.g=255
joelib.gui.render.Renderer2DModel.arrow.color.b=0

joelib.gui.render.Renderer2DModel.orthogonalLine.color.r=0
joelib.gui.render.Renderer2DModel.orthogonalLine.color.g=0
joelib.gui.render.Renderer2DModel.orthogonalLine.color.b=255

Joint Committee on Atomic and Molecular Physical Data (JCAMP)

Joint Committee on Atomic and Molecular Physical Data (JCAMP) format [dl93, dw88, ghhjs91, lhdl94] (in progress)

Protein Data Bank (PDB)

Protein Data Bank

The Protein Data Bank (PDB) is an archive of experimentally determined three-dimensional structures of biological macromolecules, serving a global community of researchers, educators, and students. The archives contain atomic coordinates, bibliographic citations, primary and secondary structure information, as well as crystallographic structure factors and NMR experimental data.

Portable Adobe Document Format (PDF)

The PDF output properties can be defined in the joelib.properties file:

# PDF writer
joelib.io.types.PDF.fontSize=10
joelib.io.types.PDF.fontOffset=2
joelib.io.types.PDF.pageBorder=20

In addition, the general image writer properties must be set. They are described in detail in the Section called Image writers (BMP, GIF, JPEG, PPM).

Persistence Of Vision (POV) Ray Tracer

Persistence Of Vision (POV) Ray Tracer: http://www.povray.org/

Some features:

  • Three supported visualisation types: Spheres, Balls & Sticks, Sticks.

  • Aromatic rings are visualized as torus elements. p orbitals can also be visualized as simple lines. Please let me know if you have a good p orbital element for POVRay.

  • You can use atom properties for atom coloring.

The POVRay output type can be defined in the joelib.properties file:

# PovRay
# output type can be: stick, sphere, ball_and_stick
joelib.io.types.POVRay.output=ball_and_stick
joelib.io.types.POVRay.atomPropertyColoring=false
joelib.io.types.POVRay.atomProperty=Gasteiger_Marsili

Structured Data File (SDF)

Structured Data File (SDF) format or MDL molfile format [sdf]. This is the most commonly used molecule format in JOELib. Please let me know if you have suggestions for further improvement.

Golden rule: Empty lines are not allowed in the data block of MDL SD files, because an empty line signals the end of the current data entry. Because the internal data representation relies on this precondition, file format loaders must convert every empty line inside a data entry to ?, 0.0 or any other value, anything except an empty line.

You can force JOELib to write kekulized molecular structures instead of molecular structures with assigned aromaticity, by using:

#SD Files
joelib.io.types.MDLSD.writeAromaticityAsConjugatedSystem=false

Simplified Molecular Input Line Entry System (SMILES)

Simplified Molecular Input Line Entry System (SMILES) [smiles, wei88, wei89].

The structure of a SMILES input/output line can be defined in the joelib.properties file:

#SMILES
joelib.io.types.Smiles.canonical=false
joelib.io.types.Smiles.lineStructure=SMILES|TITLE
joelib.io.types.Smiles.lineStructure.delimiter=|
joelib.io.types.Smiles.lineStructure.input.delimiter=\u0020\t\n\r
joelib.io.types.Smiles.lineStructure.output.delimiter=\u0020
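
To make the interplay of these entries concrete, the sketch below splits an input line into named fields according to a line structure such as SMILES|TITLE and writes the fields back with the output delimiter. It is a simplified illustration and not the JOELib parser: the class SmilesLineSketch and its methods are hypothetical, and it assumes that every field consists of exactly one token, so a title containing spaces would be cut off after its first word.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Illustrative sketch only: NOT the JOELib SMILES parser.
class SmilesLineSketch {

    /** Assigns the tokens of one input line to the field names of the line structure. */
    static Map<String, String> parseLine(String line, String lineStructure,
                                         String structureDelimiter, String inputDelimiters) {
        StringTokenizer names = new StringTokenizer(lineStructure, structureDelimiter);
        // Every character of inputDelimiters separates tokens (space, tab, line breaks).
        StringTokenizer values = new StringTokenizer(line, inputDelimiters);
        Map<String, String> fields = new LinkedHashMap<>();
        // Simplification: one token per field; surplus tokens are ignored.
        while (names.hasMoreTokens() && values.hasMoreTokens()) {
            fields.put(names.nextToken(), values.nextToken());
        }
        return fields;
    }

    /** Writes the fields in line-structure order, separated by the output delimiter. */
    static String formatLine(Map<String, String> fields, String lineStructure,
                             String structureDelimiter, String outputDelimiter) {
        StringBuilder line = new StringBuilder();
        StringTokenizer names = new StringTokenizer(lineStructure, structureDelimiter);
        while (names.hasMoreTokens()) {
            String value = fields.get(names.nextToken());
            if (value == null) {
                continue; // field not present in this record
            }
            if (line.length() > 0) {
                line.append(outputDelimiter);
            }
            line.append(value);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = parseLine("CCO\tethanol", "SMILES|TITLE", "|", " \t\n\r");
        System.out.println(fields);
        System.out.println(formatLine(fields, "SMILES|TITLE", "|", " "));
    }
}
```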

The canonical (unique) SMILES representation of a molecule is calculated if the canonical property is set to true (see the Section called Morgan: Unique atom numbering in Chapter 6).

Sybyl (MOL2)

Tripos Mol2 File Format

A mol2 file (.mol2) is a complete, portable representation of a SYBYL molecule. It is an ASCII file which contains all the information needed to reconstruct a SYBYL molecule.

Tinker (TINKER)

Tinker

Some Fortran programs, such as Tinker, are very sensitive to whether a file uses Unix or Windows line endings (newline characters). Keep this in mind if files generated under Windows cause problems on Unix systems.
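
If such a file has to be repaired, converting the line endings is usually enough. The following is a minimal sketch using only the Java standard library; the class LineEndingSketch is hypothetical and not part of JOELib. It turns Windows (CR LF) and lone CR line endings into Unix (LF) line endings.

```java
// Illustrative sketch only: a small stand-alone helper, not part of JOELib.
class LineEndingSketch {

    /** Converts CR LF pairs and lone CR characters to LF. */
    static String toUnix(String text) {
        // Replace the two-character sequence first, so a pair does not become two line breaks.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String windowsText = "first line\r\nsecond line\r\n";
        String unixText = toUnix(windowsText);
        System.out.println("contains CR after conversion: " + (unixText.indexOf('\r') >= 0));
    }
}
```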

Writing your own import/export filter

Are you missing a file format? Write your own import and export filter! Use the MoleculeFileType class as the abstract parent and fill its methods with functionality. Formatting the input or output is fairly easy with the ScanfFormat and PrintfFormat classes by John E. Lloyd, available at:

http://www.cs.ubc.ca/~lloyd/java/doc/cformat.html
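
The kind of fixed-width formatting meant here can be sketched as follows. Note that this example deliberately swaps in the Java standard library (String.format and java.util.Scanner) instead of the ScanfFormat and PrintfFormat classes, whose exact API is not reproduced here. The class AtomLineFormatSketch and the chosen column widths are hypothetical; a real filter has to follow the column layout of its target format.

```java
import java.util.Locale;
import java.util.Scanner;

// Illustrative sketch only: standard-library stand-in for printf/scanf style classes.
class AtomLineFormatSketch {

    /** Writes one fixed-width line: element symbol followed by x, y and z. */
    static String format(String element, double x, double y, double z) {
        // Locale.ROOT guarantees a decimal point regardless of the system locale.
        return String.format(Locale.ROOT, "%-2s %12.6f %12.6f %12.6f", element, x, y, z);
    }

    /** Reads the three coordinates back from a line produced by format(). */
    static double[] parseCoordinates(String line) {
        try (Scanner in = new Scanner(line)) {
            in.useLocale(Locale.ROOT);
            in.next(); // skip the element symbol
            return new double[] {in.nextDouble(), in.nextDouble(), in.nextDouble()};
        }
    }

    public static void main(String[] args) {
        String line = format("C", 0.0, 1.5, -0.25);
        System.out.println(line);
        double[] xyz = parseCoordinates(line);
        System.out.println(xyz[0] + " " + xyz[1] + " " + xyz[2]);
    }
}
```

Fixing the locale matters for chemistry file formats: with a German system locale, the default formatter would write a decimal comma, which most other programs cannot read.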

Internally, atoms have special atom types, which are defined as SMARTS patterns in the joelib/data/plain/atomtype.txt file. These types make it easy to export to other file formats, especially those of force field or ab initio programs. For this task, the joelib.data.JOETypeTable helper class is available, which uses the default type conversions in joelib/data/plain/types.txt.

Golden rule: Empty lines are not allowed in the data block of MDL SD files, because an empty line signals the end of the current data entry. Because the internal data representation relies on this precondition, file format loaders must convert every empty line inside a data entry to ?, 0.0 or any other value, anything except an empty line.
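
A loader can enforce the golden rule with a small helper before it stores a data entry. The following is a minimal sketch of that idea and not JOELib code: the class SdDataSanitizer and its method are hypothetical, whitespace-only lines are treated as empty by assumption, and the value is expected without a trailing line terminator (a trailing line break would otherwise count as one more empty line).

```java
// Illustrative sketch only: hypothetical helper, not part of JOELib.
class SdDataSanitizer {

    /** Replaces every empty or whitespace-only line of a data value with the placeholder. */
    static String fillEmptyLines(String value, String placeholder) {
        // The limit -1 keeps trailing empty strings, so no line is silently dropped.
        String[] lines = value.split("\\r?\\n", -1);
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                result.append('\n');
            }
            result.append(lines[i].trim().isEmpty() ? placeholder : lines[i]);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // A three-line data value whose middle line is empty.
        System.out.println(fillEmptyLines("1.5\n\n2.5", "?"));
    }
}
```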