From: Teng Lin <tlin@nd...> - 2004-03-18 23:12:05

Hi folks,

I have some questions before settling on the design, and I need some suggestions from you guys.

In my opinion, the Molecular Dynamics and Energy Minimization module should focus only on the mechanics itself. We propose to use a ForceFieldManager to handle the different force fields such as AMBER and CHARMM. Since these force fields use something like X or C* to represent the atom type, joergy's suggestion sounds good. We could then have a model which sets up everything and exposes the necessary interface to the MD/EM module.

As to Energy Minimization:

(1) Which kinds of minimization methods should we support?
Since the energy function is nonlinear, what we actually encounter is a nonlinear optimization problem.

  (a) Direct search methods only need to know the energy.
  (b) Steepest Descent and Conjugate Gradient methods need to know the gradient as well as the energy.
  (c) Newton-family methods need to know the Hessian as well as the gradient and energy. However, some methods of the Newton family use the gradient to approximate the Hessian.

If we do that, we had better separate the algorithm and the model.

(2) Do we need to support bond constraints during minimization?
Optimization problems can also be divided into two categories: unconstrained and constrained. A constrained problem is not more complicated than an unconstrained one. I don't think we need to support the constrained case, or to develop a constrained optimization algorithm. However, it would be nice to support bond constraints during minimization by using an unconstrained optimization algorithm. As far as I know, only a few packages support this feature, using the SHAKE algorithm. CHARMM is one of them. An unpublished package named OOPSE, developed in our group, can also do that. Unfortunately, the SHAKE algorithm can only be combined with Steepest Descent and Conjugate Gradient methods.

(3) Do we need to support minimization of rigid bodies?
As far as I know, only Tinker and X-PLOR support it. If we do, the gradient will include force and torque.

As to Molecular Dynamics, I have to admit the design of molecular dynamics is not that easy.

(1) Which kind of integrator should we provide?
Verlet, Velocity Verlet, Leap-Frog or even a Multiple Time Step integrator (RESPA)? Or we may just pick one, let's say Velocity Verlet.

(2) Which kinds of ensemble should we provide?
NVE, NVT, NPT, NHT, etc.

(3) Do we need to support minimization of rigid bodies?
Once again, if we do, the angular velocity etc. will come into play.

The web page below is about Molecular Dynamics simulation packages:
http://mrflip.com/resources/MDPackages.html

I have to go now, talk to you guys later.

teng
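The point about separating the algorithm from the model can be illustrated with a minimal Python sketch. Everything named here (`HarmonicBond`, `steepest_descent`, the one-dimensional two-atom system, the step-halving rule) is a hypothetical illustration and not part of CDK or any package mentioned in the thread. The minimizer only sees two callables, an energy and a gradient, so a different force field or a different algorithm could be swapped in without touching the other side.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class HarmonicBond:
    """Toy model: two atoms on a line joined by a harmonic bond."""
    k: float   # force constant (arbitrary units)
    r0: float  # equilibrium bond length

    def energy(self, x: Sequence[float]) -> float:
        r = x[1] - x[0]
        return 0.5 * self.k * (r - self.r0) ** 2

    def gradient(self, x: Sequence[float]) -> List[float]:
        r = x[1] - x[0]
        d = self.k * (r - self.r0)
        return [-d, d]  # dE/dx0, dE/dx1


def steepest_descent(
    energy: Callable[[Sequence[float]], float],
    gradient: Callable[[Sequence[float]], List[float]],
    x0: Sequence[float],
    step: float = 0.1,
    tol: float = 1e-6,
    max_iter: int = 10000,
) -> Tuple[List[float], float]:
    """Steepest descent with simple step halving (backtracking)."""
    x = list(x0)
    e = energy(x)
    for _ in range(max_iter):
        g = gradient(x)
        gnorm = sum(gi * gi for gi in g) ** 0.5
        if gnorm < tol:
            break
        trial = step
        while True:
            x_new = [xi - trial * gi for xi, gi in zip(x, g)]
            e_new = energy(x_new)
            if e_new < e:
                break          # accept the first step that lowers the energy
            trial *= 0.5       # otherwise halve the step and retry
            if trial < 1e-12:
                return x, e    # no downhill step found; give up
        x, e = x_new, e_new
    return x, e


if __name__ == "__main__":
    bond = HarmonicBond(k=100.0, r0=1.5)
    coords, final_energy = steepest_descent(bond.energy, bond.gradient, [0.0, 2.0])
    print(coords[1] - coords[0], final_energy)
```

A direct search method would need only `energy`, while a Newton-family method would require a third callable for the Hessian (or an approximation built from successive gradients), which is why the interface the model exposes depends on which of (a) to (c) is chosen.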
From: Peter Murray-Rust <pm286@ca...> - 2004-03-19 09:02:14

At 18:12 18/03/2004 -0500, Teng Lin wrote:

Thanks very much for your offer. Some general comments.

Firstly, your architecture should be as modular as possible. Ideally it consists of at least:
- job control/workflow
- user parameterization
- atom type management
- I/O (ideally this should be XML, and we have made good progress with representing FF in CMLComp)
- minimization algorithms
- force field functional forms (including derivatives)

In principle it should be possible to choose the force field and algorithm on the fly. However, this is a lot of work.

Next, why does CDK need this? Not to compete with GROMACS, CHARMM etc. These programs have involved hundreds of years' work, if not thousands. IMO CDK needs it for quick and dirty optimisation of crude structures (and that will be very valuable). But if/when you want greater quality it will be preferable to interface to existing codes.

>Hi folks,
> I have some questions before settling down the design. I need some
> suggestions from your guys.
> In my own opinion, Molecular Dynamics and Energy Minimization Module
> should only focus on the mechanics itself. We propose to use a
> ForceFieldManager to handle the different force fields such as amber and
> CHARMM. Since these force field use X or C* something like that to
> represent the atom type, joergy's suggestion sounds good. And we could
> have a model which setup everything and expose necessary interface to
> MD/EM module.
>
>As to Energy Minimization,
>
>(1)Which kind of minimization methods we should support?
> Since energy function is nonlinear, what we actually encounter is
> nonlinear optimization problem.
>
> (a) Direct search method only need to know the energy
> (b) Steepest Descents and Conjugate Gradient Methods need to know the
> gradient, as well as energy
> (c) Newton family methods need to know hessian as well as gradient and
> energy. However, some methods of newton family use gradient to
> approximate the hessian.

I think it depends on whether you already have the first and/or second derivatives in the code. Creating and coding analytical 2nd derivs is a lot of work. However, if you already have them then I would use them. Here again it depends on what you want to do. Conjugate gradient and BFGS are widely used. Simple linear search will explode on bad structures.

> If we do that, we had better separate the algorithm and the model.

Yes

>(2)Do we need to support bond constraint during minimization?
> Optimization problem can also divide into two categories, one is
> unconstrained, the other is constrained. Constrained problem is not more
> complicated than unconstrained one. I don't think we need to support
> constrained one. I don't think we need to develop the constrained
> optimization algorithm. However, it will be nice to support
> bond constraint during minimization by using unconstrained optimization
> algorithm. As far as I know, only a few packages support this feature
> using SHAKE algorithm. CHARMM is one of them. An unpublished package
> named OOPSE developed in our group can also do that. Unfortunately, SHAKE
> algorithm can only combined with Steepest Descents and Conjugate Gradient
> Methods.
>
>(3)Do we need to support minimization of rigid body?
> As far as I know, only tinker and XPLOR support it. If we do, the
> gradient will include force and torque.

IMO not at the start. The user will not want to select the rigid groups and you will not find it easy to identify them automatically.

>As to Molecular Dynamics, I have to admit the design of molecular dynamics
>is not that easy.
>
>(1) Which kind of integrator we should provide?
> Verlet, Velocity Verlet, LeapFrog or even Multiple Time Step integrator
> (RESPA)?
> Or we may just pick one, let's say velocity verlet.
>(2)which kind of ensemble we should provide?
> NVE, NVT, NPT, NHT, etc.

What do you want dynamics for? For serious science I suggest using a link to GROMACS. For animations or optimisation I don't think the ensemble will matter.

>(3)Do we need to support minimization of rigid body?
> Once again, if we do, the angular velocity etc will come into play.
>
>Below webpage is about Molecular Dynamics Simulation Packages
>http://mrflip.com/resources/MDPackages.html
>
>I have to go now, talk to your guys later.

In haste

P.

Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +441223763069
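For readers unfamiliar with the integrator singled out in the quoted question, a velocity Verlet step can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `velocity_verlet`, the flat coordinate list, the single shared mass and the harmonic test force are all assumptions made for the example, not anything proposed in the thread. It samples the plain NVE ensemble; thermostats or barostats for NVT/NPT would be additional machinery.

```python
from typing import Callable, List, Sequence, Tuple


def velocity_verlet(
    x: Sequence[float],
    v: Sequence[float],
    force: Callable[[Sequence[float]], List[float]],
    mass: float,
    dt: float,
    n_steps: int,
) -> Tuple[List[float], List[float]]:
    """Propagate positions x and velocities v for n_steps of size dt."""
    x = list(x)
    v = list(v)
    f = force(x)
    for _ in range(n_steps):
        # half-step velocity update with the current forces
        v_half = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]
        # full-step position update
        x = [xi + dt * vh for xi, vh in zip(x, v_half)]
        # recompute forces at the new positions, then finish the velocity step
        f = force(x)
        v = [vh + 0.5 * dt * fi / mass for vh, fi in zip(v_half, f)]
    return x, v


if __name__ == "__main__":
    # Hypothetical test system: unit-mass harmonic oscillator, F = -x
    xs, vs = velocity_verlet([1.0], [0.0], lambda q: [-q[0]], 1.0, 0.01, 1000)
    total_energy = 0.5 * vs[0] ** 2 + 0.5 * xs[0] ** 2
    print(xs[0], vs[0], total_energy)
```

Only one force evaluation per step is needed, and the total energy of the test oscillator stays close to its initial value of 0.5, which is the usual reason this scheme is picked as a default.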
From: Christoph Steinbeck <c.steinbeck@un...> - 2004-03-21 13:03:11

> Next, why does CDK need this? Not to compete with GROMACS, CHARMM etc.

Absolutely not.
For me the major driving force is

a) to learn how Force Fields/MD work in detail
b) to give CDK a module for generating reasonable 3D geometries quickly

With respect to a) I have learned during the recent student project here at CUBIC how many pretty basic things about Force Fields I do *not* know :)

Cheers,

Chris

--
Dr. rer. nat. habil. Christoph Steinbeck (c.steinbeck@...)
Groupleader Junior Research Group for Applied Bioinformatics
Cologne University BioInformatics Center (http://www.cubic.unikoeln.de)
Zülpicher Str. 47, 50674 Cologne
Tel: +49(0)2214707426 Fax: +49 (0) 2214707786

What is man but that lofty spirit - that sense of enterprise.
... Kirk, "I, Mudd," stardate 4513.3..
From: Peter Murray-Rust <pm286@ca...> - 2004-03-22 09:05:59

At 14:01 21/03/2004 +0100, Christoph Steinbeck wrote:
>>Next, why does CDK need this? Not to compete with GROMACS, CHARMM etc.
>
>Absolutely not.
>For me the major driving force is
>
>a) to learn how that Force Fields/MD work in detail

Good point!

>b) to give CDK a module for generating reasonable 3D geometries quickly

Agreed. I think the main emphasis would be:
- organic compounds (even p-block - SF4 - is non-trivial)
- starting with very poor geometry (perhaps even "flat" - i.e. 2D)

The sort of thing I would like is:
- enter a connection table or SMILES
- CDK generates 2D coordinates
- CDK identifies non-planar centres and adds chiral constraints
- CDK identifies constrained planar double bonds
- CDK sets bond radii as sum of covalent radii. If bond order is known, make Pauling correction (trivial)
- at this stage we don't care about precise force constants, just that atoms don't bump
- set bond, angle constraints hard; neglect torsion
- minimise. This geometry will be poor but is a starting point for better methods. It can be displayed, *with warnings*
- send result to (cheap) standalone method (MM2, MOPAC, whatever else you have)

There is a big need in Open Source to bridge the 2D-3D gap and I think this should be the main effort. Many "drawing" programs have some sort of functionality here. Sometimes it is OK, sometimes hideous. We should do the same.

P.

>With respect to a) I have learned during the recent student project here
>at CUBIC how many pretty basic things about Force Fields I do *not* know :)

Even building an interface to an existing prog is non-trivial.

Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +441223763069
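The "sum of covalent radii plus Pauling correction" step in the workflow above can be sketched as follows. This is a rough illustration under stated assumptions: the radii in `COVALENT_RADIUS` are approximate values in Ångström chosen for the example, the function name `ideal_bond_length` is hypothetical, and the correction uses the commonly quoted form of Pauling's relation, d(n) = d(1) - c * log10(n) with c of about 0.71 Å. The thread itself does not specify which constants or radii were intended.

```python
import math

# Approximate single-bond covalent radii in Angstrom (illustrative values only).
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def ideal_bond_length(elem_a: str, elem_b: str, order: float = 1.0,
                      c: float = 0.71) -> float:
    """Target bond length: sum of covalent radii, shortened for bond order.

    Uses the assumed relation d(n) = d(1) - c * log10(n); for order 1 the
    correction vanishes and the plain sum of radii is returned.
    """
    if order <= 0:
        raise ValueError("bond order must be positive")
    single = COVALENT_RADIUS[elem_a] + COVALENT_RADIUS[elem_b]
    return single - c * math.log10(order)


if __name__ == "__main__":
    for a, b, n in [("C", "C", 1), ("C", "C", 2), ("C", "O", 2), ("C", "H", 1)]:
        print(a, b, n, round(ideal_bond_length(a, b, n), 3))
```

Such target lengths are crude (the C=C value comes out somewhat short of typical experimental values), which fits the stated intent: the hard bond and angle targets only need to be good enough to stop atoms bumping before a better method takes over.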
From: Joerg K. Wegner <wegnerj@in...> - 2004-03-22 09:36:44

Hi,

taking a chance to mention my hopefully objective arguments!

>> b) to give CDK a module for generating reasonable 3D geometries quickly

Mmmh, you should definitely use well-known approaches for this task:
Use fast pseudopotential and torsion libraries:

1. Short summary and references:
http://citeseer.ist.psu.edu/finn99computational.html
Computational Approaches to Drug Design

2. Complete summary (German only), efficient data structure, search trees, steric overlaps, ...:
@PHDTHESIS{sch01,
  author = {C. Schwab},
  title = {{K}onformative {F}lexibilit{\"a}t von {L}iganden im {W}irkstoffdesign},
  school = {Erlangen},
  year = {2001}
}

3. Algorithms in drug design (German only):
http://wwwra.informatik.unituebingen.de/lehre/ws03/wirkstoffdesign_ausarbeitung/Christoph_Wilke_Docking.pdf
http://wwwra.informatik.unituebingen.de/lehre/ss03/pro_wirkstoffdesign_ausarbeitung/MarcLohrer.pdf
http://wwwra.informatik.unituebingen.de/lehre/ss02/pro_wirkstoffdesign_ausarbeitung/jens_joachim.pdf
http://wwwra.informatik.unituebingen.de/lehre/ss01/pro_wirkstoffdesign/ingste.pdf

If you want to be competitive, I strongly recommend a SMARTS-based torsion angle library, as already mentioned.

So, use as much available open source code as possible and try to plan an approach which has no short-term dead ends, e.g. avoid hard-coded atom type assignments, as in the stalled biomer project.

So are you planning to use JOELib and the still unfinished rotamer generation from there? Or are you planning to invent your own LGPLed wheel? If yes, additionally start with a SMARTS algorithm implementation and, until it is finished, use any other library.

A design recommendation:
From the medicinal chemistry standpoint, global structure optimization (vacuum) is not that important, so docking or MD would be much more interesting.

Kind regards, Joerg

--
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
EMail: mailto:wegnerj@...
WWW: http://wwwra.informatik.unituebingen.de
--
Never mistake motion for action.
 (E. Hemingway)

Never mistake action for meaningful action.
 (Hugo Kubinyi, 2004)
From: Peter Murray-Rust <pm286@ca...> - 2004-03-22 13:31:39

At 10:37 22/03/2004 +0100, Joerg K. Wegner wrote:
>Hi,
>
>taking a chance to mention my hopefully objective arguments !
>
>>>b) to give CDK a module for generating reasonable 3D geometries quickly
>Mmmh, you should definitely use well-known approaches for this task:

Of course - and I am not always up-to-date with the literature :)

>Use fast pseudopotential and torsion libraries:
>1. Short summary and references:
>http://citeseer.ist.psu.edu/finn99computational.html
>Computational Approaches to Drug Design

I definitely favour a library-driven approach.
The question is whether the resources you quote are open. For this we need crystal structures (or good calculations). We have 250000 MOPAC calculations and, with the arrival of crystal ePrints at Southampton, we may well have an open source of crystals. Then we have to build the library - I have been proofing this and can share it, but has anyone made more progress? Essentially the idea is to create a large library of fragments, including variance.

>2. Complete summary (german only), efficient data structure, search trees,
>steric overlaps, ...:
>@PHDTHESIS{sch01,
> author = {C. Schwab},
> title = {{K}onformative {F}lexibilit{\"a}t von {L}iganden im
> {W}irkstoffdesign},
> school = {Erlangen},
> year = {2001}
>}
>
>3. algorithms in drug design (german only):
>http://wwwra.informatik.unituebingen.de/lehre/ws03/wirkstoffdesign_ausarbeitung/Christoph_Wilke_Docking.pdf
>http://wwwra.informatik.unituebingen.de/lehre/ss03/pro_wirkstoffdesign_ausarbeitung/MarcLohrer.pdf
>http://wwwra.informatik.unituebingen.de/lehre/ss02/pro_wirkstoffdesign_ausarbeitung/jens_joachim.pdf
>http://wwwra.informatik.unituebingen.de/lehre/ss01/pro_wirkstoffdesign/ingste.pdf
>
>If you want to be competitive, i recommend strongly a SMARTS based torsion
>angle library, as already mentioned.

This seems a good idea if the library is available.

Best

P.

Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +441223763069
From: Joerg K. Wegner <wegnerj@in...> - 2004-03-22 14:20:31

Hi,

> I definitely favour a library-driven approach.
> The question is whether the resources you quote are open. For this we
> need crystal structures (or good calculations). We have 250000 MOPAC
> calculations and, with the arrival of crystal ePrints at Southampton we
> may well have an open source of crystals. Then we have to build the
> library - I have been proofing this and can share it, but if anyone has
> made more progress? Essentially the idea is to create a large library of
> fragments, including variance

1. Own library. Excellent! By using the OpenBabel/JOELib internal atom types you can easily create such a library! I think 250000 are more than enough!

2. Published one. If I remember this correctly, a small Sybyl atom type based library was published in:
@ARTICLE{km94,
  author = {G. Klebe and T. Mietzner},
  title = {{A} fast and efficient method to generate biologically relevant conformations},
  journal = {J. Comput.Aid. Mol. Des.},
  year = {1994},
  volume = {8},
}

3.1. A library-based approach using the pseudopotential is of course the fastest.

3.2. Another approach is to create 'useful' rotamers, based on torsion rotation increments and SMARTS patterns; these can then be the input into a MM method.
For larger search spaces you can then apply an evolutionary algorithm. Our group has a Java-based library (we are still using it heavily!) for such tasks. It will be going online for the public community in the next months.
http://wwwra.informatik.unituebingen.de/forschung/javaeva/welcome_e.html

Kind regards, Joerg

--
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
EMail: mailto:wegnerj@...
WWW: http://wwwra.informatik.unituebingen.de
--
Never mistake motion for action.
 (E. Hemingway)

Never mistake action for meaningful action.
 (Hugo Kubinyi, 2004)
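The rotamer idea in point 3.2 can be made concrete with a small Python sketch. It assumes the rotatable bonds have already been identified (for instance by SMARTS matching, which is not shown here), and simply enumerates every combination of torsion settings at a fixed increment; the function name `enumerate_rotamers` and the 120 degree default are hypothetical choices for illustration.

```python
from itertools import product
from typing import List, Tuple


def enumerate_rotamers(n_torsions: int, increment_deg: int = 120) -> List[Tuple[int, ...]]:
    """Return every combination of torsion angles (degrees) on a regular grid."""
    if n_torsions < 0:
        raise ValueError("n_torsions must be non-negative")
    if increment_deg <= 0 or 360 % increment_deg != 0:
        raise ValueError("increment must be a positive divisor of 360")
    angles = range(0, 360, increment_deg)
    return list(product(angles, repeat=n_torsions))


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, "rotatable bonds ->", len(enumerate_rotamers(n)), "rotamers")
```

The count grows as (360 / increment) to the power of the number of rotatable bonds, so ten bonds at 120 degree steps already give 59049 candidates. That growth is the practical reason for switching from exhaustive enumeration to a stochastic search such as an evolutionary algorithm on larger search spaces.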
From: Peter Murray-Rust <pm286@ca...> - 2004-03-23 08:24:05

At 15:21 22/03/2004 +0100, Joerg K. Wegner wrote:
>Hi,
>
>>I definitely favour a library-driven approach.
>>The question is whether the resources you quote are open. For this we
>>need crystal structures (or good calculations). We have 250000 MOPAC
>>calculations and, with the arrival of crystal ePrints at Southampton we
>>may well have an open source of crystals. Then we have to build the
>>library - I have been proofing this and can share it, but if anyone has
>>made more progress? Essentially the idea is to create a large library of
>>fragments, including variance
>1. Own library. Excellent ! By using of openbabel/joelib internal atom
>type you can easily create such a library ! I think 250000 are more than
>enough !

Please let YY know how you would like these structures. We have them in CML. All atoms including H are present, and so is the molecular charge. The correspondence with the original NCI number is sometimes broken, so please use them simply as high quality structures.

>2. Published one. If i remind this correctly a small Sybyl atom type based
>library was published in:
>@ARTICLE{km94,
> author = {G. Klebe and T. Mietzner},
> title = {{A} fast and efficient method to generate biologically
> relevant conformations},
> journal = {J. Comput.Aid. Mol. Des.},
> year = {1994},
> volume = {8},
>}
>
>3.1. A library based approach is of course the fastest using the
>pseudopotential.

Have you done this?

>3.2. Another approach is to create 'useful' rotamers, based on torsion
>rotation increments and SMARTS patterns, then this can be the input into a
>MM method.
>For larger search spaces you can then apply an evolutionary algorithm. Our
>group has a Java based library (we are still using it heavily!) for such
>tasks. It will be going online for the public community in the next months.
>http://wwwra.informatik.unituebingen.de/forschung/javaeva/welcome_e.html

Best.

P.

>Kind regards, Joerg
>
>
>Dipl. Chem. Joerg K. Wegner
>Center of Bioinformatics Tuebingen (ZBIT)
>Department of Computer Architecture
>Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
>Phone: (+49/0) 7071 29 78970
>Fax: (+49/0) 7071 29 5091
>EMail: mailto:wegnerj@...
>WWW: http://wwwra.informatik.unituebingen.de
>
>Never mistake motion for action.
> (E. Hemingway)
>
>Never mistake action for meaningful action.
> (Hugo Kubinyi, 2004)

Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +441223763069
From: Joerg K. Wegner <wegnerj@in...> - 2004-03-23 11:31:47

Hi,

>> 1. Own library. Excellent ! By using of openbabel/joelib internal atom
>> type you can easily create such a library ! I think 250000 are more
>> than enough !
> Please let YY know how you would like these structures. We have them in
> CML. All atoms including H are present. And the molecular charge. The
> correspondence with the original NCI number is sometimes broken so
> please use them simply as high quality structures.

If I remember this correctly, a statistic, or more exactly an occurrence histogram, is needed for all a1-a2-a3-a4 torsion angles. From this histogram we can calculate a pseudopotential (please look at the references in the previous mail).

So we need something like:

1. atomType1to4="a1 a2 a3 a4" (OpenBabel/JOELib internal types are recommended over the Sybyl ones, because they are less descriptive). A full iteration over all pairs (no inverse) is required. It can easily be created by using unique SMARTS matching of these types, taken from their definition in types.txt. (Which bond connection to use: any, or single/double, triple, ring, aromatic? This must be looked up in the references!!! I'm not sure about this.)
2. torsion angle
3. occurrence of the torsion angle

for every molecule, and then the full statistic over the complete data set.

>> 2. Published one. If i remind this correctly a small Sybyl atom type
>> based library was published in:
>> @ARTICLE{km94,
>> author = {G. Klebe and T. Mietzner},
>> title = {{A} fast and efficient method to generate biologically
>> relevant conformations},
>> journal = {J. Comput.Aid. Mol. Des.},
>> year = {1994},
>> volume = {8},
>> }
>> 3.1. A library based approach is of course the fastest using the
>> pseudopotential.
> Have you done this?

No, that's outside my current focus (data mining), but we are now technically able to do this. The required Morgan algorithm for detecting the central bond of the molecule is available, but overall this is not an easy task. A lot of heuristics are required, which must all be implemented. We need a full-time student or somebody who has time to understand and implement the details. I already have 5 students working for me (MCS, pharmacophores, data mining) and I cannot reasonably advise more.

A 'slow' first shot can be implemented in a short time, but the details which make this approach really fast take a while. Remember that two or three Ph.D. students in the group of Prof. Gasteiger were needed to implement Corina! Which is, of course, the leading software program in this area for such a task.

So an EA approach is nearer to my focus, because my colleagues are developing these algorithms and I can 'simply' use them.

Best, J.

--
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
EMail: mailto:wegnerj@...
WWW: http://wwwra.informatik.unituebingen.de
--
Never mistake motion for action.
 (E. Hemingway)

Never mistake action for meaningful action.
 (Hugo Kubinyi, 2004)
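The histogram-to-pseudopotential step described above can be sketched in Python. Note the assumptions: the message does not say how the pseudopotential is derived from the histogram, so this sketch uses a simple Boltzmann inversion against a uniform reference, E_i = -kT ln(p_i / p_uniform), with a pseudocount to avoid taking the log of zero for empty bins. The function names, the 30 degree bin width and the kT value (about 0.593 kcal/mol near room temperature) are illustrative choices, not taken from the cited references.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _sub(a: Point, b: Point) -> List[float]:
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]


def _dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Point, b: Point) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p1: Point, p2: Point, p3: Point, p4: Point) -> float:
    """Torsion angle a1-a2-a3-a4 in degrees, in the range [-180, 180]."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def torsion_histogram(angles_deg: Sequence[float], bin_width: int = 30) -> List[int]:
    """Occurrence histogram over [-180, 180) with bins of bin_width degrees."""
    n_bins = 360 // bin_width
    counts = [0] * n_bins
    for a in angles_deg:
        idx = int(((a + 180.0) % 360.0) // bin_width)
        counts[min(idx, n_bins - 1)] += 1  # guard against float edge cases
    return counts


def pseudopotential(counts: Sequence[int], kT: float = 0.593,
                    pseudocount: float = 1.0) -> List[float]:
    """Assumed Boltzmann inversion: E_i = -kT * ln(p_i / p_uniform)."""
    n_bins = len(counts)
    total = sum(counts) + pseudocount * n_bins
    uniform = 1.0 / n_bins
    return [-kT * math.log(((c + pseudocount) / total) / uniform) for c in counts]


if __name__ == "__main__":
    observed = [61.0, 65.0, 70.0, -60.0, 180.0]  # made-up torsion angles
    hist = torsion_histogram(observed)
    print(hist)
    print([round(e, 2) for e in pseudopotential(hist)])
```

In a real library one such histogram would be accumulated per atom type quadruple "a1 a2 a3 a4" over the whole data set, so the per-molecule counting in this sketch would sit inside a loop over SMARTS matches; that outer loop depends on the toolkit and is left out here.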