Re: [cml/ccml-discuss] ECCP wiki and some questions

At 10:06 25/02/2005 -0600, Jeongnim Kim wrote:
> >
> > Thanks very much for your comments. The eCCP1 project has set up an
> > email list and this will be useful for these discussions (eccp-data 'at'
> > forge.nesc.ac.uk). Any discussion will also be important for the CML
> > community, since Peter Murray-Rust is interested in building some of this
> > into CML. For now I have copied in both mail lists.

I think this is a good strategy.

> >
> > It was decided that the initial approach in the design of the markup
> > would be minimalist and as certain information was found to be useful
> > then it could be added.

Yes. Our experience has been to start small and put in the essential things
as they are required. Trying to create a universal approach usually fails
and is also much more difficult to implement in software.

> >
> > The number of primitives will be fixed for each contraction within a
> > basisGroup. The number of Gaussians could therefore be provided by an
> > attribute of the basisGroup element-type - perhaps:
> >
> > <basisGroup minL="0" maxL="0" nprim="6">
> > <exponents>3047.524880 457.369518 103.948685 29.210155 9.286663
> > 3.163927</exponents>
> > <contraction>0.001835 0.014037 0.068843 0.232184 0.467941
> > 0.362312</contraction>
> > </basisGroup>
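
To make the proposal above concrete, here is a minimal sketch (mine, in Python, not part of any agreed schema) of how a reader might consume such a fragment. It assumes the element and attribute names exactly as quoted (basisGroup, exponents, contraction, nprim); the function name read_basis_group is hypothetical. The only check it makes is that nprim agrees with the number of values actually present.

```python
import xml.etree.ElementTree as ET

# The fragment proposed above, joined onto single lines for readability.
SNIPPET = """
<basisGroup minL="0" maxL="0" nprim="6">
  <exponents>3047.524880 457.369518 103.948685 29.210155 9.286663 3.163927</exponents>
  <contraction>0.001835 0.014037 0.068843 0.232184 0.467941 0.362312</contraction>
</basisGroup>
"""


def read_basis_group(xml_text):
    """Parse one basisGroup and check nprim against the data (hypothetical helper)."""
    group = ET.fromstring(xml_text)
    nprim = int(group.get("nprim"))
    exponents = [float(x) for x in group.findtext("exponents").split()]
    # A basisGroup may hold several contractions over the same exponents.
    contractions = [
        [float(x) for x in c.text.split()] for c in group.findall("contraction")
    ]
    if len(exponents) != nprim:
        raise ValueError("nprim does not match the number of exponents")
    for coeffs in contractions:
        if len(coeffs) != nprim:
            raise ValueError("nprim does not match a contraction length")
    return nprim, exponents, contractions


nprim, exponents, contractions = read_basis_group(SNIPPET)
print(nprim, len(exponents), len(contractions))
```

Note that nprim is redundant with the data, so any software reading it has to decide what to do when the two disagree; the sketch simply refuses the input.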

I think that there is considerable scope for new components here. I suggest 
that we try to keep it to no more than 10 (excluding trivial ones like 
FooList to hold Foos). Remember that for each new element we now require:
         - a schema entry
         - one or more examples (which will be algorithmically converted to
           code)
         - a stylesheet
         - a Tool (if there are non-trivial semantics) including unit tests
         - a renderer (if it has non-textual representation(s))
These are not as frightening as they sound but it's a warning not to create 
50 new elements!

In addition they will generate needs for:
         - conversion from (to) legacy (though this primarily scales with the
           number of legacy apps)

I have no strong preconceptions here. As mentioned before we exclude
pseudopotentials.

> >
> > Which QC codes are you working with?
>
>I plan to support Gaussian, Crystal and NWChem in our QMC code to start
>with. It is just because they are supported at our centers.
>
>I realized that all these codes use different definitions of the
>contraction (e.g., include/exclude the normalization factors), the
>ordering of the m channels for a given l and, depending upon l, the sign of
>the spherical functions etc. Since I need the correct information to do
>anything, I had to come up with a generic way to deal with all the
>variation. At this point, I temporarily decided to treat the package as
>one of the attributes to tell what the complete basis set and eigenvectors
>should mean.

I would personally find it extremely helpful to have a summary of the 
different approaches and some examples. It sounds as if this is one of the 
major problems we have to address. Without it all solutions will be 
program-dependent and we want to avoid that.

Are these approaches semantically equivalent, or are they philosophically 
distinct? As an example, crystallographers can represent a unit cell as 
lengths+angles, lattice vectors, or metric tensor. They all contain the 
same basic information, though lattice vectors also contain an orientation. 
Because I am a crystallographer, and most chemical crystallographers use 
lengths+angles, CML started with the first. Under pressure from the solid 
state community, latticeVector is now being explored. It means, of course,
that CML needs to know how these representations relate to one another.
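
As an illustration of what "knowing the relationship" means for the unit cell, here is a short Python sketch of the three representations. It is only a sketch under stated assumptions: the function names are hypothetical, angles are in degrees, and the orientation chosen for the lattice vectors (a along x, b in the xy plane) is one common convention, not something CML prescribes. That arbitrary choice is exactly the extra orientation information that lengths+angles and the metric tensor do not carry.

```python
import math


def cell_to_vectors(a, b, c, alpha, beta, gamma):
    """Lengths+angles -> lattice vectors, with a along x and b in the xy plane."""
    al, be, ga = (math.radians(x) for x in (alpha, beta, gamma))
    cx = c * math.cos(be)
    cy = c * (math.cos(al) - math.cos(be) * math.cos(ga)) / math.sin(ga)
    cz = math.sqrt(c * c - cx * cx - cy * cy)
    return [
        [a, 0.0, 0.0],
        [b * math.cos(ga), b * math.sin(ga), 0.0],
        [cx, cy, cz],
    ]


def metric_tensor(vectors):
    """Lattice vectors -> metric tensor G, where G[i][j] is the dot product v_i . v_j."""
    return [[sum(p * q for p, q in zip(u, v)) for v in vectors] for u in vectors]


def metric_to_cell(g):
    """Metric tensor -> lengths+angles (the orientation is not recoverable)."""
    a, b, c = (math.sqrt(g[i][i]) for i in range(3))
    alpha = math.degrees(math.acos(g[1][2] / (b * c)))
    beta = math.degrees(math.acos(g[0][2] / (a * c)))
    gamma = math.degrees(math.acos(g[0][1] / (a * b)))
    return a, b, c, alpha, beta, gamma


# Round trip on an arbitrary triclinic cell (made-up numbers).
cell = (5.0, 6.0, 7.0, 80.0, 95.0, 105.0)
print(metric_to_cell(metric_tensor(cell_to_vectors(*cell))))
```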

In this case it may be useful to take an arbitrary decision and adopt one 
approach as the CMLComp one and to provide conversion to the others. Comments?
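
To show the kind of conversion I have in mind, here is a hedged Python sketch for one of the variations mentioned: whether the contraction coefficients include the primitive normalization factors. It is restricted to s-type primitives, where a normalized primitive exp(-alpha r^2) carries the factor (2 alpha / pi)^(3/4). The function names and the two "conventions" are illustrative assumptions; I am not asserting which package uses which.

```python
import math

# Exponents and coefficients from the basisGroup example earlier in this thread.
EXPONENTS = [3047.524880, 457.369518, 103.948685, 29.210155, 9.286663, 3.163927]
COEFFS = [0.001835, 0.014037, 0.068843, 0.232184, 0.467941, 0.362312]


def s_primitive_norm(alpha):
    """Normalization factor of an s-type primitive exp(-alpha * r**2)."""
    return (2.0 * alpha / math.pi) ** 0.75


def to_unnormalized(exponents, coeffs):
    """Coefficients of normalized primitives -> coefficients of raw primitives."""
    return [d * s_primitive_norm(a) for a, d in zip(exponents, coeffs)]


def to_normalized(exponents, coeffs):
    """Coefficients of raw primitives -> coefficients of normalized primitives."""
    return [c / s_primitive_norm(a) for a, c in zip(exponents, coeffs)]


def self_overlap(exponents, coeffs):
    """Self-overlap of an s contraction whose coefficients multiply normalized primitives."""
    total = 0.0
    for ai, di in zip(exponents, coeffs):
        for aj, dj in zip(exponents, coeffs):
            total += di * dj * (2.0 * math.sqrt(ai * aj) / (ai + aj)) ** 1.5
    return total


raw = to_unnormalized(EXPONENTS, COEFFS)
print(raw)
print(self_overlap(EXPONENTS, COEFFS))
```

If the self-overlap comes out close to 1 under one convention and far from it under the other, that is a useful hint about which convention a given file is using, though it is no substitute for the convention being stated explicitly in the markup.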

>Unless QC packages decide to produce a cml file as an output, I have to
>somehow convert the output from a package to "something" anyway. I would
>hope that cml can define this "something" without ambiguity.

I fully agree. Externalising the semantics is an enormous benefit in 
constructing programs, and more recently workflows and web services.

>In addition, the similar concept of basisGroup has been used for
>slater-type orbitals and numerical orbitals stored in hdf5 files in our
>code using ref as the mapping.

I look forward to seeing the details. Ideally we would share this
information on wikis. SF doesn't provide wikis, and we cannot easily set
up our own. However, the FSATOM group does have a wiki - I have contributed
there before - and I think this would be a good place to explore.

Very excited by your posting,

P.

>Jeongnim

Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069