Hi all,

We are tantalisingly close to using Maven as the primary way to build the CDK. Using a standardised build will not only make it easier to develop, but the separate source trees will also give a different perspective on how the project is organised.

Egon recently showed an overview of the current module dependencies: http://egonw.tumblr.com/post/70669027045/current-overview-of-cdk-module-dependencies-in. There’s a bit of a hairball in the middle but it generally looks okay. The true picture is a little more complicated as inherited dependencies are not shown there.

One thing I’ve been thinking about recently is grouping the modules together into ‘super’ modules. This coarser partition of functionality breaks the code down into a much clearer view of what is available and where. In the scheme below I have organised the current modules into 6 (7) top level groups. Unfortunately some of the best names that would be obvious to use (e.g. ‘io’, ‘qsar’) are already taken, so it’s a bit difficult to think of concise, descriptive names for these. I’ve tried to stick to nouns as the verbs are a bit yuk. We could also go plural, ‘ios’, ‘qsars’, ‘descriptors’ etc...
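
To make the idea concrete, one of these ‘super’ modules could be a Maven aggregator POM that simply lists its child modules. This is only a rough sketch using the ‘display’ group from the scheme below: the groupId, artifactId, version and directory names are assumptions for illustration, not a worked-out layout.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- Coordinates are hypothetical placeholders -->
  <groupId>org.openscience.cdk</groupId>
  <artifactId>cdk-display</artifactId>
  <version>1.0-SNAPSHOT</version>

  <!-- 'pom' packaging: this module only groups its children -->
  <packaging>pom</packaging>

  <!-- Each entry is a child directory holding its own pom.xml;
       the directory names here are assumed to match the module names -->
  <modules>
    <module>render</module>
    <module>renderawt</module>
    <module>renderbasic</module>
    <module>renderextra</module>
  </modules>
</project>
```

Building from the group directory would then build just those children, while a top level POM could list the groups in the same way.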

If you have suggestions for (or objections to) the group names, please let me know. Also, there are still many corners of the library whose function / utility I’m not 100% sure of. Let me know if any parts are out of place.

Thanks,
John

Scheme:

base - contains the object-model and ‘core’ parts that everything else builds upon, alternative name: ‘domain’?
annotation
atomtype
core
data
silent
datadebug
dict
interfaces
reaction
standard
valencycheck
tool - contains general utilities and tools that are utilised together or separately in other modules
fingerprint
builder3d
builder3dtools
forcefield
  charges
  cip
formula
fragment
group
hash
isomorphism - could go in base
pcore
sdg
signature
smarts
structgen
tautomer
smsd - this could also be a top level module as it sits atop everything else and was originally a separate project

storage - contains functionality for storing data in different formats. Ideally we would use ‘io’ but that module is quite essential (contains Molfile readers) and it would not be feasible to refactor the naming. alternative names: ios, store (verb), persist (verb) - e.k. very corporate
inchi
io
ioformats
ionpot
iordf
libiocml
libiomd
pdb
pdbcml
smiles

prediction - contains quantitative modelling descriptors. We could perhaps use ‘qsar’ here and change that module’s name to ‘qsar-core’ as it only contains base classes and no descriptors. alternative names: describe, descriptors
qsar
qsaratomic
qsarbond
qsarcml
qsarionpot
qsarmolecular
qsarprotein

display - contains functionality for drawing depictions (not laying out atoms). This would also include ‘render-svg’, ‘render-eps’ etc. alternative names: depict (verb)
render
renderawt
renderbasic
renderextra

deprecated - contains old code that is not used in the library but is required for backwards compatibility, only included in downstream projects when needed

miscellaneous - bits I can’t place anywhere else, alternative name: other
control - undo/redo framework and modifications to structures
extra   - dumping ground for classes that need a home, includes IO, matrix utilities and everything but the kitchen sink :)
diff    - primarily used in tests to display and describe difference in attributes, it could go in base but I see this as auxiliary functionality
log4j   - implementation of CDK's own logging framework, could go in base but it would be good to switch to SLF4J at some point instead of maintaining a custom logging facade. Placing it here highlights that.
qm      - quantum mechanics? not in tool as there only seem to be a few data structures and no real functionality? also contains a renderer eh?