From: <ten...@gm...> - 2007-07-09 09:52:17
Hi,

sorry for answering so late, but I was quite busy. I read the last 30 mails or so and would like to give some general comments on the atom typing/perception discussion (some of this may not be new, because I may have skipped a mail or cannot remember, sorry for that).

Atom typing depends on the definition of the atom types. As Egon suggested, one could build our own super-list or adopt a "commercial" one as the standard. From this standard list one could generate a conversion list. The definition of atom types should be independent of the given hydrogens; these are suggested or assigned after the atom typing.

All commercial implementations of atom type perception (as far as I know) are based on SMARTS (it is a pity that we cannot use MQL for this). The question is: what information does one need to perceive the atom type of a given atom? It is the charge, valence electrons, max bond order (sum), formal neighbours, hybridisation and the environment. Not all of this information is needed for every atom type, and not every format provides all of it.

The first step is to distinguish between PDB-like file formats and the MDL and SMILES formats. PDB-like file formats will need a connection and bond guesser for the ligand. After this step, an MDL-like atom typing can be performed. MDL-like file formats and SMILES should both work fine for non-metal elements with the information they contain. Metals are a problem in general and should be treated in an extension.

In the atom typing process one should have the option to assign a dummy type to non-typed atoms. This kind of atom type is used in MMFF94 and other force fields, mostly for metals. After this step I could think of an AtomTypeGuesser (as also mentioned in a previous mail), which then assigns the best-fitting atom type to an atom that was assigned a Dummy type or is UNSET.

Atom typing/perception is quite important because a lot of things in chemoinformatics depend on it, e.g. hydrogen assignment, and therefore structure generation, descriptors, force fields, etc. That also implies that the quality of the atom type guesser is crucial. Of course one could assign atom types to a molecule by hand, but that is maybe feasible for a handful of compounds in a nice GUI, not in a library.

IMHO atom typing depends strongly on SMARTS support. If one cannot use SMARTS, one has to go the way I did with the MMFF94 atom types, which is an ugly one. Once the SMARTS support is good, one can start to think about which atom type list to use as the standard and implement it. This list should then be used and assigned by default, maybe directly in the file format readers, to ensure that the other functions/methods will work. This is a little against the idea of CDK, so it would be necessary to create an extended FILEREADER class which uses the single file readers and the atom typing. If one would like to use another atom typing as the standard, one could still use the single classes.

A conversion list is not always easy, because one can run into the problem of multiple possibilities, where one would again need the AtomTypeGuesser (and maybe SMARTS patterns for the non-standard atom types).

My conclusion: without SMARTS or MQL, no fast and reliable atom typing will be possible.

Best regards,
Christian

-------- Original Message --------
Date: Tue, 3 Jul 2007 09:58:31 -0400
From: Rajarshi Guha <rg...@in...>
To: Egon Willighagen <ego...@gm...>
CC: cdkdevel <cdk...@li...>
Subject: Re: [Cdk-devel] VERY IMPORTANT: hybridization atom type woes and a rant

>
> On Jul 3, 2007, at 1:10 AM, Egon Willighagen wrote:
>
> > On 7/2/07, Rajarshi Guha <rg...@in...> wrote:
> >> On Jul 2, 2007, at 12:55 AM, Egon Willighagen wrote:
> >>> 1. find the best matching atom type (e.g. prefer C.sp2 over C)
> >>
> >> I disagree on this.
> >> If one is using an atom typing scheme X that
> >> defines a set of type {A,B,C}, then if an atom in a molecule does not
> >> match any of the defined types the matcher should return NULL.
> >
> > NULL surely makes sense: the algorithm could not figure it out, but
> > what if the atom type list has backups? What about Sybyl atom types
> > like Any, Hal, Du, Du.C then?
> >
> > (trying to think this out...)
>
> Certainly!
>
> My question remains: so the typing scheme is unsure of the exact atom
> type and returns a list of possible atom types. How does the caller
> choose one?
>
> > Let's define a meaning for a returned NULL:
> >
> > A. insufficient informations to decide?
> > B. broken atom, e.g. C#C#C ?
> > B. not matching atom type in the list?
> > C. either/all of them ...
> >>
>
> A, B and C seems the correct way to go. Options B and C would allow
> the caller to realize that either their molecule is broken and/or the
> atom typing scheme needs to be updated with a new atom type.
>
> Option A is the one where we need a good decision: insufficient
> information could lead to multiple possible atom types. In such a
> case, what would the behavior be?
>
> >> But my
> >> earlier question still stands: given a list of possible atom types,
> >> on what basis does one select a specific type?
> >
> > A structure generator might want to try them all...
>
> OK - but if we're planning on a super list of atom types, we also
> need a general solution. One possibility is that for applications
> which need only a single atom type they would assume that a list of
> multiple atom types is the same as a NULL. Those that can handle
> multiple possible atom types will be happy with a list
>
> >>> I also think there should be room for C.default, as some FF atom type
> >>> lists have such a concept too. That may actually be C.sp3, but we need
> >>> to check that.
> >>
> >> Hmm, I'm wary of this.
> >> Having a default atom type would let us avoid
> >> returning a NULL type in some cases, but it still doesn't explicitly
> >> flag missing types.
> >
> > To me, this suggests to return NULL if the IAtom is broken: either no
> > information is available at all (new Atom(Elements.CARBON)), or
> > incorrect, like a carbon with two triple bonds.
>
> I agree
>
> > I can also imagine that while the super atom type lists has a Du atom
> > type for whatever left over (though not for broken atoms), some
> > certainly do not, and then the translation from super-to-specific list
> > might be NULL too.
>
> Yes
>
> -------------------------------------------------------------------
> Rajarshi Guha <rg...@in...>
> GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
> -------------------------------------------------------------------
> A bug in the code is worth two in the documentation.
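
P.S. To make the meanings of a returned NULL discussed in the thread a bit more concrete, here is a minimal Java sketch. It is only an illustration under loud assumptions: none of these names (AtomTypingSketch, Status, Result, perceiveCarbon) exist in CDK, the perception is a toy rule for carbon based on heavy-atom bond orders only, and a null bond-order array stands in for "no information available". The point is that a result object can keep "insufficient information", "broken atom" and "no matching type in the list" apart, while a caller that needs exactly one type can still treat anything else as NULL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not CDK API: every name here is invented for illustration.
final class AtomTypingSketch {

    /** The three non-MATCHED values mirror the NULL meanings discussed in the thread. */
    enum Status { MATCHED, INSUFFICIENT_INFORMATION, BROKEN_ATOM, NO_MATCHING_TYPE }

    /** A status plus zero, one or several candidate type names. */
    static final class Result {
        final Status status;
        final List<String> candidates;

        Result(Status status, List<String> candidates) {
            this.status = status;
            this.candidates = candidates;
        }

        /** For callers that need one type: anything but a unique match counts as NULL. */
        String singleTypeOrNull() {
            return status == Status.MATCHED ? candidates.get(0) : null;
        }
    }

    /**
     * Toy perception for a carbon atom from the orders (1, 2 or 3) of its bonds.
     * A null array means nothing is known about the bonds; scheme is the atom type list in use.
     */
    static Result perceiveCarbon(int[] bondOrders, Set<String> scheme) {
        if (bondOrders == null) {
            // Insufficient information: every type in the list stays possible.
            return new Result(Status.INSUFFICIENT_INFORMATION,
                    new ArrayList<>(new TreeSet<>(scheme)));
        }
        int sum = 0;
        int doubles = 0;
        int triples = 0;
        for (int order : bondOrders) {
            if (order < 1 || order > 3) {
                return new Result(Status.BROKEN_ATOM, Collections.<String>emptyList());
            }
            sum += order;
            if (order == 2) doubles++;
            if (order == 3) triples++;
        }
        if (sum > 4) {
            // Broken atom, e.g. the middle carbon of C#C#C.
            return new Result(Status.BROKEN_ATOM, Collections.<String>emptyList());
        }
        String name;
        if (triples == 1 || doubles == 2) {
            name = "C.sp";
        } else if (doubles == 1) {
            name = "C.sp2";
        } else {
            name = "C.sp3";
        }
        if (!scheme.contains(name)) {
            // The atom is fine, but the atom type list has no entry for it.
            return new Result(Status.NO_MATCHING_TYPE, Collections.<String>emptyList());
        }
        return new Result(Status.MATCHED, Collections.singletonList(name));
    }

    public static void main(String[] args) {
        Set<String> scheme = new TreeSet<>(Arrays.asList("C.sp", "C.sp2", "C.sp3"));
        Result broken = perceiveCarbon(new int[] {3, 3}, scheme);
        System.out.println(broken.status + " -> " + broken.singleTypeOrNull());
        Result unknown = perceiveCarbon(null, scheme);
        System.out.println(unknown.status + " -> candidates " + unknown.candidates);
    }
}
```

A structure generator could iterate over the candidates of an INSUFFICIENT_INFORMATION result, whereas a descriptor or force field that needs exactly one type would call singleTypeOrNull() and handle NULL. Whether a Dummy type or a guesser step should sit in between is left open here.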