The JOELib Wiki is now online
http://joelib.sourceforge.net/wiki/
An algorithm and feature dictionary is now available online at http://wiki.cubic.uni-koeln.de/dokuwiki/doku.php?id=wiki:joelib_algorithmdictionary
The Blue Obelisk Movement http://www.blueobelisk.org/
Blog http://wiki.cubic.uni-koeln.de/planetbo/
BibTeX entry and link-out for this article:
@article{ghhmrsww06,
author = {R. Guha and M.T. Howard and G.R. Hutchison and P. Murray-Rust and H. Rzepa and C. Steinbeck and J. Wegner and E. L. Willighagen},
title = {{T}he {B}lue {O}belisk--{I}nteroperability in {C}hemical {I}nformatics},
journal = {Journal of Chemical Information and Modeling},
year = {2006},
url = {http://dx.doi.org/10.1021/ci050400b},
doi = {10.1021/ci050400b}
}
Two recent papers have been published in the journal 'QSAR & Combinatorial Science': http://dx.doi.org/10.1002/qsar.200510135 and http://dx.doi.org/10.1002/qsar.200510009
The Maximum Common Substructure (MCS) method uses single atom properties, which can also be numeric.
The Optimal Assignment (Graph) Kernel works on multiple chemical atom and bond labels and is an extremely fast chemical similarity measure. Because it yields a positive definite kernel matrix, this measure can be used with any SVM-based learning method, for example for classification, regression, and clustering.
We present two lectures at the ACS meeting (http://www.chemistry.org) discussing results for graph mining, data mining and clustering. All results were obtained using JOELib (including some unpublished work) and the Weka and Spider machine learning libraries.
Lecture details can be found at
http://www-ra.informatik.uni-tuebingen.de/mitarb/wegner/
The Le Verrier-Faddeev-Frame method was added to JOELib2. It allows characteristic polynomials to be calculated for general weighted graphs with user-defined atom and bond labels.
For theoretical and application details see:
1. Trinajstic, N. Chemical Graph Theory. CRC Press, Florida, U.S.A., 1992.
2. Bonchev, D. & Rouvray, D.H. (eds.) Chemical Graph Theory: Introduction and Fundamentals. Gordon and Breach Science Publishers, 1990.
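
For readers who want to see the idea in code, here is a minimal, self-contained sketch of the Le Verrier-Faddeev recurrence. It is not the JOELib2 implementation and does not use the JOELib2 API. The class and method names are hypothetical. It assumes the weighted graph is given as a symmetric matrix with atom labels (weights) on the diagonal and bond labels (weights) off the diagonal.

```java
/** Illustrative sketch only; class and method names are hypothetical, not JOELib2 API. */
final class CharacteristicPolynomialSketch {

    /**
     * Returns c[0..n] such that det(x*I - A) = c[0]*x^n + c[1]*x^(n-1) + ... + c[n],
     * using the recurrence M_k = A*M_(k-1) + c_(k-1)*I and c_k = -trace(A*M_k)/k.
     */
    static double[] coefficients(double[][] a) {
        int n = a.length;
        double[] c = new double[n + 1];
        c[0] = 1.0;
        double[][] m = new double[n][n]; // M_0 is the zero matrix
        for (int k = 1; k <= n; k++) {
            // M_k = A * M_(k-1) + c_(k-1) * I
            double[][] next = multiply(a, m);
            for (int i = 0; i < n; i++) {
                next[i][i] += c[k - 1];
            }
            m = next;
            // c_k = -trace(A * M_k) / k
            double[][] am = multiply(a, m);
            double trace = 0.0;
            for (int i = 0; i < n; i++) {
                trace += am[i][i];
            }
            c[k] = -trace / k;
        }
        return c;
    }

    /** Plain O(n^3) product of two square matrices of equal size. */
    static double[][] multiply(double[][] x, double[][] y) {
        int n = x.length;
        double[][] r = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int l = 0; l < n; l++) {
                    sum += x[i][l] * y[l][j];
                }
                r[i][j] = sum;
            }
        }
        return r;
    }

    /**
     * Hypothetical helper: atom weights go on the diagonal, bond weights on the
     * symmetric off-diagonal entries. bonds[b] = {atomIndex1, atomIndex2}.
     */
    static double[][] weightedMatrix(double[] atomWeights, int[][] bonds, double[] bondWeights) {
        int n = atomWeights.length;
        double[][] a = new double[n][n];
        for (int i = 0; i < n; i++) {
            a[i][i] = atomWeights[i];
        }
        for (int b = 0; b < bonds.length; b++) {
            a[bonds[b][0]][bonds[b][1]] = bondWeights[b];
            a[bonds[b][1]][bonds[b][0]] = bondWeights[b];
        }
        return a;
    }
}
```

With all atom weights set to 0 and all bond weights set to 1, the matrix reduces to the plain adjacency matrix. For a three-atom chain, this gives the coefficients of x^3 - 2x.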
After finishing the most critical refactorings and optimizing the stability metric (the distance to the optimal instability/abstractness line), we are now switching from pre-alpha to alpha.
The good news for all users is that the interface will be frozen at this stage. Any further interface changes would then go into another major, hypothetical JOELib3 release.
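
The stability metric mentioned above can be sketched in a few lines. This assumes it refers to Robert C. Martin's package metrics, as reported by tools such as JDepend: instability I = Ce / (Ca + Ce), abstractness A = abstract types / all types, and distance D = |A + I - 1| from the line A + I = 1. The class and method names below are hypothetical and are not part of JOELib.

```java
/** Illustrative sketch only; assumes Martin's package metrics, names are hypothetical. */
final class MainSequenceSketch {

    /** Instability I = Ce / (Ca + Ce); defined as 0 for a package with no couplings. */
    static double instability(int afferentCouplings, int efferentCouplings) {
        int total = afferentCouplings + efferentCouplings;
        return total == 0 ? 0.0 : (double) efferentCouplings / total;
    }

    /** Abstractness A = abstract classes and interfaces / all types in the package. */
    static double abstractness(int abstractTypes, int totalTypes) {
        return totalTypes == 0 ? 0.0 : (double) abstractTypes / totalTypes;
    }

    /** Normalized distance D = |A + I - 1| from the line A + I = 1 (0 is optimal). */
    static double distance(double abstractness, double instability) {
        return Math.abs(abstractness + instability - 1.0);
    }
}
```

For example, a package with 3 incoming and 1 outgoing dependencies and 1 abstract type out of 4 has I = 0.25, A = 0.25 and D = 0.5. Making more of its types abstract would move it closer to the line.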
The initial pre-alpha-JOELib2 release is now available via CVS under joelib2 or in the download section. This release is a heavily refactored version of the original JOELib version, with an emphasis on software design and future trends.
The tutorial now contains a formal definition of a molecular graph.
I've started with a code cleanup to reduce the PMD warnings a little bit.
Kind regards, Joerg
Hi all,
The new JOELib release adds an extensive SMARTS testing framework, which can be used to test and apply SMARTS patterns. For verification, the results can be checked against molecule files (addressed by molecule name).
Have fun, Joerg
New Quantitative Structure Activity Relationship (QSAR) project opened.
http://sourceforge.net/projects/qsar/
Sorry, the sequential CML2 reader (for uncompressed files ONLY!) caused null pointer exceptions for complex descriptor values such as arrays, matrices, atom pairs, ...
This was fixed and added to CVS.
An RSS feed addressing topics in Quantitative Structure Activity Relationship (QSAR), Ligand Based Drug Design (LBDD) and Structure Based Drug Design (SBDD) has been established. It is still a test and also contains structural information based on CML2.
http://joelib.sourceforge.net/rss/index.xml
See the recent CML-RSS paper in the Journal of Chemical Information and Computer Sciences (JCICS):
Peter Murray-Rust, Henry S. Rzepa, Mark J. Williamson, and Egon L. Willighagen, "Chemical Markup, XML, and the World Wide Web. 5. Applications of Chemical Metadata in RSS Aggregators", DOI: 10.1021/ci034244p
A sequential SAX parser was added for uncompressed and compressed files.
This release contains updated CML2 support, including namespaces, descriptors and stereochemistry. By the way, the missing stereochemistry output was added to the MDL SD format!
If you are interested in QSAR, here we go. All of the descriptor calculation methods mentioned are part of the current JOELib distribution!
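
To illustrate what sequential (SAX) reading of compressed and uncompressed files looks like, here is a minimal sketch using only the standard Java XML and gzip classes. It is not JOELib's reader. The class and method names are hypothetical, and it assumes that molecules appear as elements with the local name "molecule" and that compressed files are gzip streams.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative sketch only; not JOELib's CML reader, names are hypothetical. */
final class SequentialCmlSketch {

    /** Streams through the XML once and counts elements whose local name is "molecule". */
    static int countMolecules(InputStream in, boolean gzipped) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Nested molecule elements, if present, are counted as well.
                if ("molecule".equals(localName)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so localName is filled in for namespaced CML
            InputStream source = gzipped ? new GZIPInputStream(in) : in;
            factory.newSAXParser().parse(source, handler);
        } catch (IOException | SAXException | ParserConfigurationException e) {
            throw new IllegalStateException("Could not parse input", e);
        }
        return count[0];
    }

    /** Convenience wrapper for small in-memory, uncompressed documents. */
    static int countMoleculesInString(String xml) {
        return countMolecules(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), false);
    }
}
```

Because the handler only reacts to events and keeps no document tree in memory, the same approach works for files that are too large to load at once.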
Part I - Data preparation and feature selection:
http://dx.doi.org/10.1021/ci0342324
Part II - Human Intestinal Absorption:
http://dx.doi.org/10.1021/ci034233w
Regards, Joerg
The missing descriptions for the new descriptors were added and are now also available in HTML format in the joelib/src/docs/descs directory. The startup warnings are now a thing of the past.
Newly published JOELib descriptor calculation classes (+AtomPair descriptor):
Atom_in_acceptor
Atom_in_conjugated_environment
Atom_in_donor_or_acceptor
Atom_in_donor
Atom_in_ring
Atom_in_terminal_carbon
Atom_is_negative
Atom_is_positive
Atom_mass
Atom_property_breadth_first_search
Atom_property_distance_matrix
Atom_valence
Atom_van_der_waals_volume
Auto_correlation
Breadth_first_search
Burden_modified_eigenvalues
Conjugated_electrotopological_state_index
Conjugated_topological_distance
Depth_first_search
Distance_matrix
Electrogeometrical_state_index
Electron_affinity
Electronegativity_pauling
Electrotopological_state_index
Fraction_of_rotatable_bonds
Gasteiger_Marsili
Geometrical_diameter
Geometrical_distance_matrix
Geometrical_radius
Geometrical_shape_coefficient
Global_topological_charge_index
Graph_potentials
Graph_shape_coefficient
Intrinsic_state
Kier_shape_1
Kier_shape_2
Kier_shape_3
LogP
MolarRefractivity
Molecular_weight
Number_of_B_atoms
Number_of_Br_atoms
Number_of_C_atoms
Number_of_Cl_atoms
Number_of_F_atoms
Number_of_HBA_1
Number_of_HBA_2
Number_of_HBD_1
Number_of_HBD_2
Number_of_I_atoms
Number_of_NO2_groups
Number_of_N_atoms
Number_of_OSO_groups
Number_of_O_atoms
Number_of_P_atoms
Number_of_SO2_groups
Number_of_SO_groups
Number_of_S_atoms
Number_of_acidic_groups
Number_of_aliphatic_OH_groups
Number_of_aromatic_OH_groups
Number_of_aromatic_bonds
Number_of_atoms
Number_of_bonds
Number_of_halogen_atoms
Number_of_heavy_bonds
Number_of_heterocycles
Number_of_hydrophobic_groups
Number_of_rotatable_bonds
Pharmacophore_fingerprint_1
PolarSurfaceArea
RDF
Topological_atom_pair
Topological_diameter
Topological_radius
Weighted_burden_modified_eigenvalues
Zagreb_group_index_1
Zagreb_group_index_2
The whitespace character extension caused a bug on Linux systems, which is now fixed.
The improved PDF writer now also stores descriptor entries. More complex descriptor entries such as arrays and matrices are truncated, because they would otherwise break the layout.
2D rendering facility for molecules using AWT and image creation facility for BMP, GIF, JPEG, PDF, PNG and PPM.
The package path in the LibGhemical JNI interface was corrected.
The source code was checked for duplicate code in the descriptor calculation classes, and simple atom property calculation classes were added to help remove it.
The API documentation was improved a little, especially for the descriptor and io classes.
The tutorial was extended with 26 pages of explained examples (source code snippets) for standard tasks.
Unfortunately, the duplicate code has now moved into these helper classes! ;-) Greetings to the PrettyMessDetection (PMD) crew (http://pmd.sourceforge.net/cgi-bin/webpmd.pl): removing duplicated code is much more complicated than expected.
Developer (!) interfaces to Weka (http://www.cs.waikato.ac.nz/ml/weka/) and Ghemical (http://bioinformatics.org/project/?group_id=41) were added to CVS. Ghemical works ONLY with Windows/Cygwin/SUN-JRE1.4 and Linux/IBM-JRE1.4.