From: Peter Murray-R. <pm...@ca...> - 2005-02-26 12:29:37
|
Until recently the public discussion on CML has been fairly limited and contributions have come from a core of 5-10 groups or individuals, many of whom have met IRL. With the increasing acceptance and deployment of CML, the release of JUMBO4.6 (the first version whose architecture I have confidence in :-) and a number of projects which have adopted CML, we want to widen the discussion and the involvement of the community. This mail sets out some history, suggests ground rules, thanks current contributors and invites and welcomes new ones.

In my own mind, CML started in about 1980 when I was writing software to analyze crystal structures in the Cambridge database. With Sam Motherwell and Jim Raftery we created a system (GEOSTAT) which read numbers of database entries, carried out 2- and 3-D substructure search, correlated fragments and mapped them, calculated geometrical parameters, carried out statistics and rendered the results. There was no formal data structure or externalisation and it is clear that these were major impediments. As an example, the system would only read data in the database format and would share it through COMMON blocks. This meant there was no way that it could be used to analyse (say) the structure of ligands in PDB. It is an imminent hope that, 25 years later, we can represent the same functionality in a CML system! We're nearly there...

1980s technology could not support proper software development and we saw the explosion of "file formats" which currently bedevil us - confusing storage, data structures and formalisation of semantics, and generally locking chemistry into a visual rather than a semantic subject. However, other disciplines began to adopt newer ways and I have been particularly influenced by: libraries (e.g. NAG), packages (e.g. SPSS), workflows (AVS) and of course OO-programming.

Henry and I met in the early 1990s - I can't remember exactly how we made contact but he visited Glaxo (where I then was), probably in ca. 1992.
Over that period we have evolved a symbiotic relationship and meet frequently IRL. I tend to take the initial lead in code development and Henry in areas of web deployment, but everything ends up as a joint work, especially the concept "CML". Most of the JUMBO system is written by me, at least in the first instance, but increasing numbers of people (listed in the distrib) are starting to contribute.

In our experience most Open projects benefit from a benign dictatorship, based largely on Eric Raymond's principles - contribution is fundamental.

CML is an architecture, and architectures are difficult to create. They usually require continual refactoring and that is why JUMBO4.6 is called a "major release". The architecture also embodies a vision and this has changed over the years from a toolkit to a visualiser and back to a semantic toolkit. Among the forces driving this has been the availability of technology. For JUMBO1 I had to create a complete DOM, tree renderer and editor, molecular visualiser, SGML parser, chemical perception engine and more, all in AWT 1.02 (i.e. with little library help). With the availability of quality tools (both generic and molecular) JUMBO has now converged to a toolkit. Because of the contributions from the Open community and the slow but increasing availability of CML-aware commercial tools, many components can be removed from JUMBO. It may sound Zen-like but I feel extremely happy when I am able to delete a piece of functionality from JUMBO.

Therefore the core of CML is currently Henry and me - not a committee, not a voting system and not a standards body. Maybe at some stage CML will become a formal standard somewhere (this has been requested) but it would be inappropriate at this stage. Standards are there to enforce conformance or to resolve questions in law courts. However, we do take conformance very seriously and are currently devising a de facto approach towards this.
While it will not be an ISO/OASIS/IUPAC standard, it will be available to support the chemical community. For example, we work very closely with IUPAC on XML.

CML is also not universal. It is not intended to be the only way of doing chemistry in XML. For example, the analytical community has developed AniML, which represents a more formal and complete way of representing data and spectra. We have frequent interaction with AniML and design CML so that it will provide "hooks" into it where required. The same is true of ThermoML.

CML is mainly driven by existing practice rather than by trying to change the conceptualisation of chemistry. In a few cases (e.g. CMLReact/CMLSnap) we think there is an opportunity for representing chemistry in a slightly different way, but in general there are few surprises, just formalisation. However, chemistry has so much implicit semantics that this formalisation is very challenging. In general, therefore, CML is driven by current examples. We have developed CMLReact by asking "can we represent these reactions in CML?" We'd be very grateful for contributions of examples - these might well stress the system. We are doing the same with CMLComp.

To summarise this and other mails: we invite collaborations and contributions - all are formally credited. Suggested areas are

* bugfixing
* examples
* documentation and tutorials
* wrappers
* conversion kits to/from other XML and legacy systems
* databases
* interfaces and adapters

and perhaps in conjunction with other projects

* editing tools
* renderers

We are happy to work with anyone (Open, non-Open, non-commercial, commercial, etc.). The contributions of the Open community are publicly visible on many lists. We particularly want to acknowledge the support over several years of Dan Zaharevitz at the National Cancer Institute for parts of JUMBO.
We are pleased to see commercial products which include CML, but we make it clear that we have not been involved in these and offer no assertion that they are conformant. (If a product makes inappropriate claims or representations we shall contact its developers.) We do not normally approach commercial companies about incorporating CML but are happy to be approached by them if they want to explore CML.

P.

Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069 |