From: Tommi H. <tha...@me...> - 2004-02-24 11:15:38
|
On Mon, 23 Feb 2004, Joerg K. Wegner wrote:

> 1. JOELib has no structure editor functionality, so it is not possible
> to use it directly for such tasks.
> BUT if you publish the HTTP address with the command line for the CGI
> script i would be really interested. I hope it will be possible to
> install such a service on other servers as well (e.g. our APACHE server).

The CGI program will be made freely available, so you could set up a local service. In fact I have neither the means nor plans to set up a wider service here.

> 2. Technical comment: What about JSP instead of CGI (i'm not familiar
> with it, but it seems to be state-of-the-art)?
> Egon: Are you familiar with JSP?

The server program is C++, so that's why CGI fits well.

Regards,
Tommi
|
From: Joerg K. W. <we...@in...> - 2004-02-23 14:04:40
|
Hi Tommi,

> Some time ago I got interested in the "Jmol" molecular viewer/editor, and I
> got an idea about an alternative way to connect Jmol and libghemical:
> by creating a CGI "web service" to calculate energies, do geometry
> optimizations etc. The plan is to make a Jmol plugin (which should work
> for other CDK-based programs as well?) that sends requests to this service
> and reads the results back. It does not work yet, but if you are interested
> you should look at the jmol-developers archives for the last week, there is
> some more about it.

Sounds interesting. Egon also said that we can/should establish an interface between JOELib and Jmol. Unfortunately i have little time for such things, although i would be REALLY interested. Nonetheless i would be really interested in such a service, too. And i surely know some people who would be interested, too!

Furthermore, some services already exist:
http://www.dkfz-heidelberg.de/spec/mgms/web_award/beitraege2003.php

1. JOELib has no structure editor functionality, so it is not possible to use it directly for such tasks. BUT if you publish the HTTP address with the command line for the CGI script i would be really interested. I hope it will be possible to install such a service on other servers as well (e.g. our APACHE server).

2. Technical comment: What about JSP instead of CGI (i'm not familiar with it, but it seems to be state-of-the-art)? Egon: Are you familiar with JSP?

Regards, Joerg

--
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW: http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action. E. Hemingway
|
From: E.L. W. <eg...@us...> - 2004-02-22 02:14:34
|
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thursday 19 February 2004 12:17, Joerg K. Wegner wrote:
> sorry EGON ! I was focused on the technical question.
> Would be great !!! I've also some time ago also used the commercial
> Marvin as interface for testing purpose, but never really needed this
> functionality, because i used other tools.
>
> If you have an actual JMol package, which uses the JOELib import/export
> from GUI i will be glad to add this as optional package to the file
> downloads, additional to Ghemical, Weka and the Software design libraries.
> http://sourceforge.net/project/showfiles.php?group_id=39708

(Jmol with a lower case m... :)

What I would love to see is a JOELib plugin for Jmol... I'm not sure what the main function of JOELib is (descriptor calculation, I guess...), but it is relatively easy to put that into a Jmol plugin... but, second uncertainty... I'm not sure what the best way would be to interact with JOELib... In other words, what would be most interesting for Jmol to depict?

Anyway, I'm seeing a plugin that would calculate the JOELib descriptors for the shown structure... more ideas? Have a look at http://cdk.sf.net/plugins.html to see what the plugins are about...

> If such things are already available i can add a short description to
> the XML DocBook tutorial or you can add it and your name as author ...
> as you like. In fact im using SGML not XML, ...
>
> But this does not change my time priorities, so i would be really happy
> to add any interfaces, but i'm not able to maintain them actively,
> because my actual focus lies on our internal JCompChem cheminformatics
> library with data mining and maximum common substructure search
> algorithms.

Sure. I have your two recent articles on my desk, but have not found time yet to read them...

BTW, CDK already provides MCSS code... why not use that? It has the best algorithm available at this moment... unless you're interested in finding a better one...

> These things are really alpha, because the packages are
> refactored yet, because we have found some blind alleys, which restricts
> further algorithm development, so ... in progress. Eventually these
> things will be publicly available in the future, too ... as part of
> JOELib ... and after (hopefully) publishing some nice combinations ...
>
> Regards, Joerg
>
> P.S.: This reminds me to update the CDK installation instruction ... and
> add some more instructions to the tutorial ... my actual deprecated CDK
> version has some problems in 2D layout ...

Yes, please do keep up... we're gaining momentum every month...

Egon

- --
eg...@sc...
PhD on Molecular Representation in Chemometrics
Nijmegen University
http://www.cac.sci.kun.nl/people/egonw/
GPG: 1024D/D6336BA6
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (SunOS)

iD8DBQFANJ3zd9R8I9Yza6YRAqljAJwLuwh44ehn/Ew6PWNMYU3m7IPXawCaAyDY
xNHnzDaLo7nWwVYbcDoqi1Y=
=/HaH
-----END PGP SIGNATURE-----
|
From: E.L. W. <eg...@sc...> - 2004-02-21 00:55:40
|
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thursday 19 February 2004 10:12, Joerg K. Wegner wrote:
> > 1) thank you for your immediate response. I can wait till I can see the
> > atom symbols in 3D (i feel it is very much required for chemist to
> > understand the molecule in 3D view)
> > I will try the 2D technique suggested by you.
>
> I agree, but it the priority is still very low for me :-)

Hi karthikeyan,

Jmol (jmol.sf.net) is an excellent 3D viewer which can label atoms by element and by number ... it's not based on Java3D and has excellent performance. It shares at least the CML formats with JOELib, so it should interoperate without much trouble....

Joerg, have you considered distributing Jmol with JOELib? It's not based on Java3D (a plus or minus, does not really matter), but is still actively developed...

Egon

- --
eg...@sc...
PhD on Molecular Representation in Chemometrics
Nijmegen University
http://www.cac.sci.kun.nl/people/egonw/
GPG: 1024D/D6336BA6
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (SunOS)

iD8DBQFANIfsd9R8I9Yza6YRAraNAJ9xOo/cuB4vQOTiBj3h+Zp0Ydd8bgCguaEG
aVqzT18LxiE2FWxRQ/E136M=
=21oG
-----END PGP SIGNATURE-----
|
From: Joerg K. W. <we...@in...> - 2004-02-19 12:09:50
|
Hi Egon,

first: if the e-mail address of C. Steinbeck is not correct, please feel free to forward this message. He might be interested in such things, too. I've read his current JCICS CASE paper.

> (Jmol with a lower case m... :)

Jmol :-)

> In other words, what would be most interesting for Jmol to depict?

Import & export and descriptors. As you surely know, i use the OELib kernel, also the OpenBabel kernel, so we can easily add all the types they support if anyone finds the time to port the C++ to Java, which is VERY easy ... but still costs time ... The important point is the atom types, and these are available.

> Anyway, I'm seeing a plugin that would calculate the JOELib descriptors for
> the shown structure... more ideas? Have a look at
> http://cdk.sf.net/plugins.html to see what the plugins are about...

I'll have a short look ... and again ...

>> But this does not change my time priorities, so i would be really happy
>> to add any interfaces, but i'm not able to maintain them actively,
>> because my actual focus lies on our internal JCompChem cheminformatics
>> library with data mining and maximum common substructure search
>> algorithms.

> Sure. I have your two recent articles on my desk, but have not found time yet
> to read them...

Machine learning and algorithm stuff ...

> BTW, CDK already provides MCSS code... why not use that? It has the best
> algorithm available at this moment... unless you're interested in finding a
> better one...

I'm not interested in finding a better one ... i'm interested in using them with a generally defined approach, so that we can adapt the algorithm to our cheminformatics requirements. The CDK implementation uses, if i understand it correctly, the association graph method, so i use the same association matrix method with a generalized atomType assignment, which also contains the ESTATE or my improved CESTATE one ... which already works great using some hashing ... The association graph is not the problem, but the missing atom types and atom properties (descriptors) in CDK.

The relevant clique detection algorithm in my implementation is just an interface and can be replaced by any other implementation ... depending on your needs, because fast implementations use, of course, heuristic approaches! BTW, the original reference for this kind of MCS is from 1976! The performance depends on the clique detection algorithm used; references can also be found in my papers, because i already use two implementations ... there exist a lot of them ... but i do not believe that it is possible to improve the performance easily, because a lot of work has already been done by graph experts.

We are working on a multiple interpretation and reimplementing three other literature approaches, to be able to do multiple MCS, which is much more interesting if you are interested in finding a pharmacophore-based description based on ligands ... so this is in progress ... or if you like some CASE-relevant analysis stuff ... At the moment we have reached pre-alpha, so we are testing, testing and testing ... to find suitable combinations and parameters ...

Regards, Joerg

>> These things are really alpha, because the packages are
>> refactored yet, because we have found some blind alleys, which restricts
>> further algorithm development, so ... in progress. Eventually these
>> things will be publicly available in the future, too ... as part of
>> JOELib ... and after (hopefully) publishing some nice combinations ...
>>
>> Regards, Joerg
>>
>> P.S.: This reminds me to update the CDK installation instruction ... and
>> add some more instructions to the tutorial ... my actual deprecated CDK
>> version has some problems in 2D layout ...
>
> Yes, please do keep up... we're gaining momentum every month...
>
> Egon
|
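A minimal sketch of the association-graph approach to MCS discussed in the message above, written in Python rather than Java for brevity. It is not the JOELib, JCompChem or CDK code. It assumes atoms are compared by a plain type label (a generalized atom type such as ESTATE or CESTATE would take its place), ignores bond orders, and uses a plain Bron-Kerbosch search with pivoting as the replaceable clique detection step. The result is a maximum common induced subgraph, which may be disconnected. All function and variable names are illustrative.

```python
from itertools import combinations


def association_graph(atoms_a, bonds_a, atoms_b, bonds_b):
    """Build the association (modular product) graph of two molecules.

    atoms_*: list of atom-type labels, index = atom id.
    bonds_*: set of frozenset({i, j}) pairs.
    A node (i, j) pairs atom i of A with atom j of B when their types match.
    Two nodes are connected when the pairing is consistent: the two atoms are
    bonded in both molecules or bonded in neither.
    """
    nodes = [(i, j)
             for i, type_a in enumerate(atoms_a)
             for j, type_b in enumerate(atoms_b)
             if type_a == type_b]
    adjacency = {node: set() for node in nodes}
    for (i1, j1), (i2, j2) in combinations(nodes, 2):
        if i1 == i2 or j1 == j2:
            continue  # an atom cannot be mapped twice
        bonded_a = frozenset((i1, i2)) in bonds_a
        bonded_b = frozenset((j1, j2)) in bonds_b
        if bonded_a == bonded_b:
            adjacency[(i1, j1)].add((i2, j2))
            adjacency[(i2, j2)].add((i1, j1))
    return adjacency


def max_clique(adjacency):
    """Exact maximum clique via Bron-Kerbosch with pivoting (exponential)."""
    best = []

    def expand(clique, candidates, excluded):
        nonlocal best
        if not candidates and not excluded:
            if len(clique) > len(best):
                best = list(clique)
            return
        pivot = max(candidates | excluded,
                    key=lambda v: len(adjacency[v] & candidates))
        for v in list(candidates - adjacency[pivot]):
            expand(clique + [v],
                   candidates & adjacency[v],
                   excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand([], set(adjacency), set())
    return best


# Heavy-atom skeletons: ethanol (C-C-O) and 1-propanol (C-C-C-O).
ETHANOL = (["C", "C", "O"], {frozenset((0, 1)), frozenset((1, 2))})
PROPANOL = (["C", "C", "C", "O"],
            {frozenset((0, 1)), frozenset((1, 2)), frozenset((2, 3))})

mapping = sorted(max_clique(association_graph(*ETHANOL, *PROPANOL)))
print(mapping)  # [(0, 1), (1, 2), (2, 3)]
```

Because `max_clique` only needs the adjacency dictionary, a heuristic clique finder could be swapped in without touching the graph construction, which is the kind of separation the message describes.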
From: Joerg K. W. <we...@in...> - 2004-02-19 11:23:00
|
Hi,

sorry EGON! I was focused on the technical question. Would be great!!! Some time ago i also used the commercial Marvin as an interface for testing purposes, but never really needed this functionality, because i used other tools.

If you have a current JMol package which uses the JOELib import/export from the GUI, i will be glad to add it as an optional package to the file downloads, in addition to Ghemical, Weka and the software design libraries.
http://sourceforge.net/project/showfiles.php?group_id=39708

If such things are already available i can add a short description to the XML DocBook tutorial, or you can add it with your name as author ... as you like. In fact i'm using SGML, not XML ...

But this does not change my time priorities, so i would be really happy to add any interfaces, but i'm not able to maintain them actively, because my current focus lies on our internal JCompChem cheminformatics library with data mining and maximum common substructure search algorithms. These things are really alpha, because the packages are still being refactored, because we have found some blind alleys which restrict further algorithm development, so ... in progress. Eventually these things will be publicly available, too ... as part of JOELib ... and after (hopefully) publishing some nice combinations ...

Regards, Joerg

P.S.: This reminds me to update the CDK installation instructions ... and add some more instructions to the tutorial ... my current, outdated CDK version has some problems in 2D layout ...

E.L. Willighagen wrote:
> On Thursday 19 February 2004 10:12, Joerg K. Wegner wrote:
>
>>> 1) thank you for your immediate response. I can wait till I can see the
>>> atom symbols in 3D (i feel it is very much required for chemist to
>>> understand the molecule in 3D view)
>>> I will try the 2D technique suggested by you.
>>
>> I agree, but it the priority is still very low for me :-)
>
> Hi karthikeyan,
>
> Jmol (jmol.sf.net) is an excellent 3D viewer which can label atoms by element
> and by number ... it's not based on Java3D and has excellent performance. It
> shares at least the CML formats with JOELib, so it should interoperate
> without much trouble....
>
> Joerg, have you considered distributing Jmol with JOELib? It's not based on
> Java3D (a plus or minus, does not really matter), but is still actively
> developed...
>
> Egon
|
From: Joerg K. W. <we...@in...> - 2004-02-19 09:17:35
|
Hi karthikeyan,

> 1) thank you for your immediate response. I can wait till I can see the
> atom symbols in 3D (i feel it is very much required for chemist to understand the
> molecule in 3D view)
> I will try the 2D technique suggested by you.

I agree, but the priority is still very low for me :-)

> 2) Second query: (pl.)
> I have problem in reading Molconnz (.s) file in Joelib. I get an error:
> Molecule entry (#1) skipped: .. io.MoleculeIOexception: Line 14(1)
> should contain 7
> descriptor not 11: 2.828427 2.00000 1.4142 1.000 0.0000 etc.,
> I am using the molconnz output directly

This depends on your MolConnZ version! The supported format is defined in
joelib\src\joelib\data\plain\molconnz350.txt
and the first lines should contain:
id nvx nrings ncirc nelem fw aname ...
So if you are using any other version you must supply your own file definition, e.g. molconnz400.txt, and define these descriptors in
joelib\src\joelib\data\plain\knownResults.txt
where you must define whether your descriptors are double, integer, boolean, or whatever. If you do not define these things, JOELib can of course still load the entries and select them, but always as strings! So if you plan to normalize or filter your data, or something else, i recommend the definition in knownResults.txt.

> or should I convert into some format before submitting to joelib?

That is possible, too, e.g. SDF, which is my preferred format, but still not the best one, because you must still define the descriptors in knownResults.txt. CML is the most verbose format in JOELib, because the descriptors already carry a format there (once defined in knownResults.txt)! Then they can be loaded without any definitions! Furthermore, the CML reader/writer is consistent within JOELib, but possibly not up to date with the Murray-Rust CML2 implementation in his Java library and in OpenBabel, because he develops a huge amount of code and i'm not able to follow so fast. And i'm not sure about their descriptor abilities, because OpenBabel has no descriptor storing facility. Please correct me, anybody, if this is not true. The conversion of XML files should not be too difficult, so ...

> The final objective is: to read all molconnz descriptor and 'optionally'
> write in a clean format (col/row)
>
> mol1 d1 d2 d3 d4..
> mol2 d1 d2 d3 d4..

The SMILES/flat file format is supported, too, in
sh convertSkip.sh
where the flat file format should be defined in a separate file format.txt with
mol1-ID d1 d2 d3 d4
or in joelib.properties for the SMILES, but i think i've also added a command line switch!!!

> 3) finally... regarding compiling using ant, the output is directed to
> build directory
> and how to run from main *.bat files if the output is in build
> directory?
> as a shortcut I copied all the *.bat files to build directory.. and it
> is working ok

As already discussed, i do not like the bat files, and they are not really supported or up to date. If possible in any way, i recommend cygwin!!! It is a unix shell for Windows, so you can use the shell scripts!!! Of course the bat files will work if all required libraries are added to the classpath (the classes in the build directory and all lib/*jar files), but that's boring work and it changes often ... The shell scripts or ant will resolve the dependencies automatically!

Regards, Joerg

> regards
>
> --
> M. Karthikeyan, Ph.D., Scientist
> National Chemical Laboratory
> Pune - 411 008, INDIA
> Ph: +91-(0)20-5893 457 FAX: 5893 973
> http://www.ncl-india.org/
|
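A minimal sketch of the "clean col/row" export asked for above, in Python and independent of JOELib. It reads the data items of an SD file (`>  <Name>` followed by the value and a blank line) and casts them with a small type map, so untyped descriptors stay strings as the message describes. The `KNOWN_TYPES` dictionary is only a hypothetical stand-in for knownResults.txt, whose real syntax is not shown in this thread, and the parser assumes each record has a non-empty title line.

```python
def parse_sd_records(text):
    """Yield (title, {item name: raw string}) for each record of an SD file."""
    for block in text.split("$$$$"):
        lines = block.strip("\n").splitlines()
        if not any(line.strip() for line in lines):
            continue  # nothing after the last record separator
        title = lines[0].strip()
        data, name = {}, None
        for line in lines:
            if line.startswith(">") and "<" in line:
                # data header such as ">  <LogP>"
                name = line[line.index("<") + 1:line.rindex(">")]
                data[name] = []
            elif name is not None:
                if line.strip():
                    data[name].append(line.strip())
                else:
                    name = None  # a blank line ends the data item
        yield title, {key: " ".join(value) for key, value in data.items()}


# Hypothetical type table; untyped descriptors are kept as strings.
KNOWN_TYPES = {"LogP": float, "nrings": int}


def to_flat_table(records, columns, known_types=KNOWN_TYPES):
    """Return rows of [title, d1, d2, ...] with values cast by known_types."""
    rows = []
    for title, data in records:
        values = [known_types.get(column, str)(data[column])
                  for column in columns]
        rows.append([title] + values)
    return rows


def format_rows(rows):
    """Render rows as whitespace-separated lines: mol1 d1 d2 ..."""
    return "\n".join(" ".join(str(value) for value in row) for row in rows)


SAMPLE = """mol1
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
>  <LogP>
1.4008

>  <nrings>
1

$$$$
mol2
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
>  <LogP>
2.2822

>  <nrings>
1

$$$$
"""

rows = to_flat_table(parse_sd_records(SAMPLE), ["LogP", "nrings"])
print(format_rows(rows))
```

With the types declared, `LogP` arrives as a float and `nrings` as an integer, which is what makes later normalization or filtering possible; a column missing from `KNOWN_TYPES` would simply come through as text.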
From: Joerg K. W. <we...@in...> - 2004-02-13 11:12:14
|
Hi all,

if you are interested in QSAR, here we go. All the descriptor calculation methods mentioned are part of the current JOELib distribution. So decide on your own if you like them.

Part I - Data preparation and feature selection:
http://dx.doi.org/10.1021/ci0342324
Part II - Human Intestinal Absorption:
http://dx.doi.org/10.1021/ci034233w

Regards, Joerg
|
From: Joerg K. W. <we...@in...> - 2004-02-10 12:24:36
|
Hi all,

The tutorial was updated. The PDF version grew from 77 to 90 pages :-)
The missing HTML descriptions for the new descriptors were added to CVS and are available directly from CVS or with the next release (in 2-3 weeks).

Regards, Joerg
|
From: Joerg K. W. <we...@in...> - 2004-01-16 13:44:37
|
Hi all,

i've published a large number of descriptor calculation methods (78 classes in total). This enables you to calculate, e.g., up to 3500 descriptors.

The definitions for the CESTATE can be found in the publications using JOELib:
http://www-ra.informatik.uni-tuebingen.de/software/joelib/users.html

If you are interested, the AtomPair descriptor is now available, too! Please let me know if you use it, because a detailed analysis could possibly be used to publish a paper. So, be a nice scientist and let me know!!!

Regards, Joerg
|
From: Joerg K. W. <we...@in...> - 2004-01-12 16:33:47
|
Hi Christoph,

wrong ... mmhh ... depends on the standpoint of the observer. You surely know, and this is still a 'small' current problem for JOELib and OpenBabel and a 'big' problem for all NOT-OPENSOURCE programs. Which means: the descriptor calculation process depends on four different expert systems (aromaticity, hybridization, implicit valence and finally atom type). See:
http://www-ra.informatik.uni-tuebingen.de/software/joelib/tutorial/atomtyper.html
and current discussions:
http://sourceforge.net/mailarchive/message.php?msg_id=6905792
http://sourceforge.net/mailarchive/forum.php?thread_id=3732947&forum_id=3042

Nearly ALL descriptors depend on the assigned atom type, so the result for each descriptor will change from program to program, because most of them have their own atomTyper. JOELib and OpenBabel use the same one and are open source, but there is still room for improvement to be more general.

So the group contribution definitions are
joelib\src\joelib\data\plain\LogP.contributions
joelib\src\joelib\data\plain\MR.contributions
joelib\src\joelib\data\plain\PSA.contributions

If you use hydrogens from other programs, they will be used to assign the SMARTS patterns; if you remove the hydrogens, JOELib/OpenBabel will calculate the implicit hydrogen count on its own, which is, taking my descriptor calculation experience into account, really good.

For the definition of the SMARTS group parts, see the original literature reference for these definitions or ask Stephen Jelfs (Gillet/Willet group) for more details, because he implemented this algorithm.
s....@sh...

Furthermore, you surely know that the LogP is not really good. Some time ago i published a paper about LogP/LogS prediction, and this LogP was one of the worst, although my models were overfitted (this can be seen in recently accepted papers, available in 2-3 weeks). See the publication section:
http://www-ra.informatik.uni-tuebingen.de/mitarb/wegner

So yes, i recommend removing all hydrogens as long as no general public definition FOR ALL PROGRAMS is available ... see the mailing list discussion above.

Often people believe that descriptors are program independent. Especially taking into account that descriptors depend on a complex atom assignment process, this IS JUST WRONG!!! I complain about this in my two current papers, and it is currently one of the big problems. So i believe that nearly all descriptor calculation programs produce different results (most use their own atom typer), even if the descriptor calculation algorithm is exactly the same, which is also not always true.

Regards, Joerg

> I have a question regarding the descriptors LogP and MolarRefractivity:
>
> If I calculate these descriptors for the molecule Oc1ccccc1OC (which is also
> used
> in the class GroupContributionTest) using a 3D-structure (see file "tst.sdf"
> attached at the end of the mail)
> with the command:
>
> convert.sh +d tst.sdf
>
> I get the following wrong results:
>
> >  <MolarRefractivity>
> 4.3751E1
>
> >  <LogP>
> 2.2822
>
> But if I delete all H-Atoms by using:
> convert.sh -h +d tst.sdf
>
> I get the following results,
> which are correct according to the information in GroupContributionTest:
>
> >  <MolarRefractivity>
> 3.46588E1
>
> >  <LogP>
> 1.4007999999999998
>
> My question is: Why do I get the wrong results, if I do not delete the H-Atoms?
> Do I am on the save side, if I always delete H-Atoms when using a
> group contribution method??
> (For the "PolarSurfaceArea" it seems, that it doesn't matter if I use "-h" or
> not.....)
>
> (By the way, I've used version 2003-08-04)
>
> All the best and thank you very much in advance,
> Christoph Niederalt
> _________________________________________
> Bayer Technology Services GmbH
> PT-AS-CS
> Leverkusen, K 9
> Tel.: +49 (0)214 30 75414
> Fax: +49 (0)214 30 64801
> E-Mail: chr...@ba...
> Internet : http://www.bayertechnology.com
>
> tst.sdf:
> ===snip===
> Model1
>   Cerius2 01120415373D 1   1.00000
> Structure written by MSI Cerius2 SD Exporter
>  17 17  0  0  0  0  0  0  0  0999 V2000
>     1.0524   -1.7053   -0.0293 C   0  0  0  0  0  0
>     2.1287   -1.6085   -0.0417 H   0  0  0  0  0  0
>     0.2023   -0.5743    0.0099 C   0  0  0  0  0  0
>     0.6723    0.7567    0.0358 O   0  0  0  0  0  0
>    -1.1895   -0.7709    0.0242 C   0  0  0  0  0  0
>    -2.0332    0.3430    0.0629 O   0  0  0  0  0  0
>    -1.7189   -2.0726   -0.0001 C   0  0  0  0  0  0
>    -2.7901   -2.2293    0.0108 H   0  0  0  0  0  0
>    -0.8683   -3.1781   -0.0387 C   0  0  0  0  0  0
>    -1.2814   -4.1784   -0.0573 H   0  0  0  0  0  0
>     0.5140   -2.9945   -0.0532 C   0  0  0  0  0  0
>     1.1714   -3.8540   -0.0831 H   0  0  0  0  0  0
>     2.0823    0.9796    0.0217 C   0  0  0  0  0  0
>    -3.0188    0.1062    0.0707 H   0  0  0  0  0  0
>     2.2712    2.0723    0.0457 H   0  0  0  0  0  0
>     2.5572    0.5287    0.9194 H   0  0  0  0  0  0
>     2.5320    0.5731   -0.9095 H   0  0  0  0  0  0
>   1  2  1  0  0  0
>   1  3  1  0  0  0
>   1 11  2  0  0  0
>   3  4  1  0  0  0
>   3  5  2  0  0  0
>   4 13  1  0  0  0
>   5  6  1  0  0  0
>   5  7  1  0  0  0
>   6 14  1  0  0  0
>   7  8  1  0  0  0
>   7  9  2  0  0  0
>   9 10  1  0  0  0
>   9 11  1  0  0  0
>  11 12  1  0  0  0
>  13 15  1  0  0  0
>  13 16  1  0  0  0
>  13 17  1  0  0  0
> M  END
> $$$$
|
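A minimal sketch of the hydrogen suppression recommended in the reply above, in Python. It does not reproduce what `convert.sh -h` or the JOELib atom typer actually do; it only shows the bookkeeping involved: explicit H atoms are dropped, heavy atoms are renumbered, and the number of removed hydrogens is recorded per heavy atom so it stays available as an implicit count. The atom and bond lists are copied from the tst.sdf in this thread; the function name is illustrative.

```python
def strip_explicit_hydrogens(symbols, bonds):
    """Remove explicit H atoms from a molfile-style atom and bond list.

    symbols: element symbols, list index = atom number - 1.
    bonds: (begin, end, order) tuples using 1-based atom numbers.
    Returns (heavy_symbols, heavy_bonds, h_count), where h_count maps each
    renumbered heavy atom to the number of hydrogens that were attached.
    """
    keep = [number for number, symbol in enumerate(symbols, start=1)
            if symbol != "H"]
    renumber = {old: new for new, old in enumerate(keep, start=1)}
    h_count = {new: 0 for new in renumber.values()}
    heavy_bonds = []
    for begin, end, order in bonds:
        if begin in renumber and end in renumber:
            heavy_bonds.append((renumber[begin], renumber[end], order))
        elif begin in renumber:
            h_count[renumber[begin]] += 1  # bond to a hydrogen
        elif end in renumber:
            h_count[renumber[end]] += 1
    heavy_symbols = [symbols[number - 1] for number in keep]
    return heavy_symbols, heavy_bonds, h_count


# Atom and bond blocks of tst.sdf (Oc1ccccc1OC with explicit hydrogens).
SYMBOLS = ["C", "H", "C", "O", "C", "O", "C", "H", "C",
           "H", "C", "H", "C", "H", "H", "H", "H"]
BONDS = [(1, 2, 1), (1, 3, 1), (1, 11, 2), (3, 4, 1), (3, 5, 2), (4, 13, 1),
         (5, 6, 1), (5, 7, 1), (6, 14, 1), (7, 8, 1), (7, 9, 2), (9, 10, 1),
         (9, 11, 1), (11, 12, 1), (13, 15, 1), (13, 16, 1), (13, 17, 1)]

heavy_symbols, heavy_bonds, h_count = strip_explicit_hydrogens(SYMBOLS, BONDS)
print(len(heavy_symbols), len(heavy_bonds), sum(h_count.values()))  # 9 9 8
```

For this molecule the 17 atoms reduce to 9 heavy atoms and 9 bonds, with 8 hydrogens recorded as counts (three of them on the methyl carbon). Whether an atom typer sees those hydrogens as separate atoms or as counts is exactly the kind of difference that can change pattern-based group contributions.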
From: <chr...@ba...> - 2004-01-12 16:03:46
|
Hi all,

I have a question regarding the descriptors LogP and MolarRefractivity:

If I calculate these descriptors for the molecule Oc1ccccc1OC (which is also used in the class GroupContributionTest) using a 3D structure (see the file "tst.sdf" attached at the end of the mail) with the command:

convert.sh +d tst.sdf

I get the following wrong results:

>  <MolarRefractivity>
4.3751E1

>  <LogP>
2.2822

But if I delete all H atoms by using:

convert.sh -h +d tst.sdf

I get the following results, which are correct according to the information in GroupContributionTest:

>  <MolarRefractivity>
3.46588E1

>  <LogP>
1.4007999999999998

My question is: Why do I get the wrong results if I do not delete the H atoms? Am I on the safe side if I always delete H atoms when using a group contribution method?? (For the "PolarSurfaceArea" it seems that it doesn't matter whether I use "-h" or not.....)

(By the way, I've used version 2003-08-04)

All the best and thank you very much in advance,
Christoph Niederalt
_________________________________________
Bayer Technology Services GmbH
PT-AS-CS
Leverkusen, K 9
Tel.: +49 (0)214 30 75414
Fax: +49 (0)214 30 64801
E-Mail: chr...@ba...
Internet : http://www.bayertechnology.com

tst.sdf:
===snip===
Model1
  Cerius2 01120415373D 1   1.00000
Structure written by MSI Cerius2 SD Exporter
 17 17  0  0  0  0  0  0  0  0999 V2000
    1.0524   -1.7053   -0.0293 C   0  0  0  0  0  0
    2.1287   -1.6085   -0.0417 H   0  0  0  0  0  0
    0.2023   -0.5743    0.0099 C   0  0  0  0  0  0
    0.6723    0.7567    0.0358 O   0  0  0  0  0  0
   -1.1895   -0.7709    0.0242 C   0  0  0  0  0  0
   -2.0332    0.3430    0.0629 O   0  0  0  0  0  0
   -1.7189   -2.0726   -0.0001 C   0  0  0  0  0  0
   -2.7901   -2.2293    0.0108 H   0  0  0  0  0  0
   -0.8683   -3.1781   -0.0387 C   0  0  0  0  0  0
   -1.2814   -4.1784   -0.0573 H   0  0  0  0  0  0
    0.5140   -2.9945   -0.0532 C   0  0  0  0  0  0
    1.1714   -3.8540   -0.0831 H   0  0  0  0  0  0
    2.0823    0.9796    0.0217 C   0  0  0  0  0  0
   -3.0188    0.1062    0.0707 H   0  0  0  0  0  0
    2.2712    2.0723    0.0457 H   0  0  0  0  0  0
    2.5572    0.5287    0.9194 H   0  0  0  0  0  0
    2.5320    0.5731   -0.9095 H   0  0  0  0  0  0
  1  2  1  0  0  0
  1  3  1  0  0  0
  1 11  2  0  0  0
  3  4  1  0  0  0
  3  5  2  0  0  0
  4 13  1  0  0  0
  5  6  1  0  0  0
  5  7  1  0  0  0
  6 14  1  0  0  0
  7  8  1  0  0  0
  7  9  2  0  0  0
  9 10  1  0  0  0
  9 11  1  0  0  0
 11 12  1  0  0  0
 13 15  1  0  0  0
 13 16  1  0  0  0
 13 17  1  0  0  0
M  END
$$$$
|
From: Joerg K. W. <we...@in...> - 2004-01-07 08:39:31
|
Joerg K. Wegner wrote:

> Message: 1
> Reply-To: "Chris Morley" <c.m...@ga...>
> From: "Chris Morley" <c.m...@ga...>
> To: <ope...@li...>
> Date: Sun, 4 Jan 2004 15:09:00 -0000
> Subject: [Open Babel] Suggested modified conversion framework
>
> "Geoff Hutchison" wrote (some time back)
>
> >> So as a self-proclaimed "maintainer" of the project, I have to keep
> >> some idea in the back of my head how we'll get to 2.0, 2.1, 2.2, 3.0?,
> >> etc. releases.
> ....
> >> (What new formats are needed? What new features? What architecture
> >> changes are needed?)
> >> 3) Discussions on plans/roadmaps for things that don't go into the 2.0
> >> release.
>
> Here are some suggestions for longer-term mods to the conversion process
> in OpenBabel. I feel that, although they may not be backward compatible,
> these features would be desirable to provide flexibility and
> maintainability for the future. Most have been previously discussed.
>
> - A clearer separation of functions - the chemistry needs to be more
>   separated from the conversion. (e.g. OBMol should not be where the
>   input and output file formats are stored). The FileFormat class could
>   be beefed up to do this.
>
> - Each format needs to be self contained. A new format should not
>   require any changes in old code. (This is what abstract base classes
>   are for.) In a Windows system you might want to have each as a
>   precompiled DLL - a plugin - which would ease upgrading and allow
>   installation only of relevant formats. I guess something similar is
>   possible in Unix. Even without this feature the formats would be
>   dynamic - information about them and their options would be retrieved
>   at run time.
>
> - The user interface for conversion would have a clear interface to the
>   conversion process itself to allow alternatives (GUIs etc). This would
>   include handling of formats (file extensions, options) in a dynamic way.
>
> - The input and output routines need to be more aware of the conversion
>   process so that they can adjust. Examples are the previously discussed
>   need to deal on-the-fly with generated molecules during CML input, and
>   the need for a conditional <cml>...</cml> wrapper during output.
>
> - The conversion framework needs to handle more than just OBMol.
>   I'm sure use would be made of the facility to convert different types of
>   molecule, sets of molecules (conformers?), reactions, sets of reactions,
>   etc.
>
> - There should be more support for non-expert users. There is a big
>   activation barrier to using the program, which is not appropriate if it is
>   to be used to just convert the format of some files. I realise precompiled
>   code is not the Unix way, but Windows (and Mac?) users expect it.
>   It would be nice to support it on appropriate platforms, while keeping
>   a single version of the source code.
>
> I've put together some working code which implements all of these
> features, described in more detail at
> http://www.arcl02.dsl.pipex.com/OB/obframework.html
>
> It is written so that it can be deployed as separate DLLs containing:
> the main chemistry (the code is almost as at present);
> the conversion process (a new class);
> one or more formats (can be the existing code with a small wrapper);
>
> A user interface exe file makes use of these DLLs. The console interface
> feels much the same as at present and there is a Windows GUI interface
> which is a drop-in replacement.
>
> Alternatively, the code can be compiled together, as at present, without
> changing the source code. I hope it is platform independent except for
> the GUI and the deployment of the DLLs.
>
> Separating all the parts so that they can be separately compiled has been a
> challenge, because I wanted the conversion DLL and the user interfaces
> not to use directly any of the chemistry - they do not #include mol.h.
> This has meant the use of C++ a bit more adventurous than in the current
> code. For instance I found it necessary to use a smart pointer from
> the Boost library. This is not part of the standard language (although
> pretty close). I also need to point out that I am not a C++ expert -
> but it is all working ok at present.
>
> Using the DLLs, existing applications can add a much broader input format
> compatibility while not needing to be recompiled when the OB code changes.
>
> To illustrate the use of a non-OBMol conversion I have added a format
> for converting to and from a RXN file describing a reaction.
>
> The Windows interface has a novel feature that uses the text description
> of the various conversion options (previously output as help in the command
> line interface) to dynamically construct a set of checkboxes, etc
> appropriate to the requested file format. You can try a statically-linked
> compiled version of the GUI-driven framework with a few formats by just
> downloading, extracting and running
> http://www.arcl02.dsl.pipex.com/OB/OBGUIs.zip (407K)
> It should work on any 32bit Windows system.
>
> It may be that making changes like this to a project that puts the emphasis
> on the chemistry rather than programming is a bit over the top. Is it worth
> developing a non-backward compatible framework like this any
> further?
>
> Chris Morley
>
> --__--__--
>
> Message: 2
> Date: Sun, 04 Jan 2004 17:34:04 +0000
> To: <ope...@li...>
> From: Peter Murray-Rust <pm...@ca...>
> Subject: Re: [Open Babel] Suggested modified conversion framework
>
> At 15:09 04/01/2004 +0000, Chris Morley wrote:
>
> >>"Geoff Hutchison" wrote (some time back)
> >>> > So as a self-proclaimed "maintainer" of the project, I have to keep
> >>> > some idea in the back of my head how we'll get to 2.0, 2.1, 2.2, 3.0?,
> >>> > etc. releases.
> >>....
> >>> > (What new formats are needed? What new features? What architecture
> >>> > changes are needed?)
> >>> > 3) Discussions on plans/roadmaps for things that don't go into the 2.0
> >>> > release.
> >>
> >>Here are some suggestions for longer-term mods to the conversion process
> >>in OpenBabel. I feel that, although they may not be backward compatible,
> >>these features would be desirable to provide flexibility and maintainability
> >>for the future. Most have been previously discussed.
>
> I'd like to support the discussion here and encourage refactoring of babel.
> Having spent the last ca 2 weeks rewriting the C++ support for CML (it's
> virtually ready), I think it's critical that Babel's design evolves along
> modular lines as suggested here and elsewhere.
>
> My vision of babel development is that it should be an API/plugin type of
> approach. A developer should be able to write the readFoo and writeFoo
> modules by using an API rather than having to understand the whole
> architecture of the program. This depends, however, on having very clear
> and open architecture and clear understanding of the semantics/ontology
> (i.e. exactly what each piece of information means).
>
> I am currently going through this process with CML - it now has about 100
> elements ("objects"). Probably about half of these correspond to concepts
> in Babel. I am optimistic that most of the concepts in chemistry are
> universal - the difficulties lie in different representations. A few
> concepts (e.g. aromaticity) depend critically on the algorithms used and so
> there is a need for these to be spelled out clearly. (I do not care whether
> pyrrole is aromatic or not. However, if system A decrees that it is, and
> system B does not, we may need both those algorithms to convert between A's
> representation and B's.) Such concepts therefore depend on "perception" and
> it is critical that the perception is modularised (and in principle
> variable on demand).
> > Most concepts are easier - they depend on careful definition rather than > perception - so it is important to define carefully what is meant by (say) > hydrogen count , e.g. in B2H6. > > The core of OB, therefore, is a representation of these concepts. (Whether > it is in C++, Java, XML, UML or RDF/OWL is probably unimportant. At present > the OB core is a mixture of the data fields in mol.h and the ancillary > files (e.g. aromatic.txt). It is important that developers are able to find > the concepts they need quickly and accurately - then writing code is much > easier. In fact I am working towards a system where CML++ code is generated > automatically from the schema. > > A Foo developer therefore could follow the following steps: > - identify the concepts in Foo > - map them onto Babel API concepts. > - where they map precisely code the Foo syntax onto the OB API. This can be > almost trivial. > > where they do not match, the developer has the options: > - ignore the concept. An good example is that OB ignores bibliographic > info. The information is then lost in the conversion process. > - convert the data to an equivalent OB concept. Examples may be wedge/hatch > bonds converted to atom Parities (though this is not always possible - some > wedges do not correspond to atom-centered stereo). Conversion might be > provided by babel or might be added by the Foo developer. > - write code to add information (an example is molecular formula/mass - not > supported in OB) which can be algorithmically generated. > > Where possible it will help if the concepts and representations are > consistent over the OpenSource chemistry community. > > > >>- A clearer separation of functions - the chemistry needs to be more > >>separated from the conversion. (e.g. OBMol should not be where the > >>input and output file formats are stored). The FileFormat class could > >>be beefed up to do this. 
> > > Pattern-based design suggests that specialist modules should be created to > manage generic tasks and subclassed where necessary. Thus in CML software > there are decorators which add functionality to classes (e.g. a > moleculeDecorator can wrap a molecule and add getMolecularMass() to it. > Similarly there are serializers (writers) for output and eventReaders > (SAX-like) for input. Each of these is subclassed for different file > formats. A typical pattern for Foo could be > > FooReader extends AbstractMolReader implements MolReader > FooWriter extends AbstractMolWriter implements MolWriter > > > > >>- Each format needs to be self contained. A new format should not > >>require any changes in old code. (This is what abstract base classes > >>are for.) In a Windows system you might want to have each as a > >>precompiled DLL - a plugin - which would ease upgrading and allow > >>installation only of relevant formats. I guess something similar is > >>possible in Unix. Even without this feature the formats would be > >>dynamic - information about them and their options would be retieved > >>at run time. > >> > >>- The user interface for conversion would have a clear interface to the > >>conversion process itself to allow alternatives (GUIs etc). This would > >>include handling of formats (file extensions, options) in a dynamic way. > >> > >>- The input and output routines need to be more aware of the conversion > >>process so that they can adjust. Examples are the previously discussed > >>need deal on-the-fly with generated molecules during CML input, and > >>the need for a conditional <cml>...</cml> wrapper during output. > > > This is a generic problem for multiple molecules and could be something > like > > MolWriter.setMultipleMolecules(bool) > MolWriter.addOutputMolecule(mol) // fails unless MolWriter allows > multiple mols. > > > >>- The conversion framework needs to handle more than just OBMol. 
> >>I'm sure use would be made of the facility to convert different types of > >>molecule, sets of molecules(conformers?), reactions, sets of > reactions, etc. > > > Yes. > > It is important to have a clear data structure for these. CML has been > extended to support these concepts > > > >>- There should be more support for non-expert users. There is a big > >>activation barrier to using the program, which is not appropriate if > it is > >>to be used to just convert the format of some files. I realise > precompiled > >>code is not the Unix way, but Windows (and Mac?) users expect it. > >>It would be nice to support it on appropriate platforms, while keeping > >>a single version of the source code. > > > Agreed. We have mounted some *.exe on our site - but they tend to get > dated. It is a really tough problem even for smart people to compile C++ on > Windows - make and configure are useless. Note that sourceforge has compile > farms so it should be possible to get a whole range of compilers. > > The main problem is commitment to making this happen. It's hard work and > not normally recognised by those outside the development process. If you > write a better architecture and lose 1% functionality you get few thanks! > > P. > > >>Peter Murray-Rust > > > Unilever Centre for Molecular Informatics > Chemistry Department, Cambridge University > Lensfield Road, CAMBRIDGE, CB2 1EW, UK > Tel: +44-1223-763069 Hi, Happy new year to all ! Interesting ... i see two basic problems: The molecule representation and their definition is based on four expert system. And for conversion you need any kind of atom typer. OpenBabel/JOELib process to assign atom types. http://www-ra.informatik.uni-tuebingen.de/software/joelib/tutorial/atomtyper.html So what is the problem ? 1. There are some algorithms required, like SSSR, SMARTS (also partly based on this assigning process), aromaticTyper. 
Some of these code fragments are modified versions of already published algorithms, but the changes are not standalone enough to publish the fragments again.

2. Definitions are needed for the assignment, like aromatic.txt, atomtype.txt, phmodel.txt ..., which are also based on SMARTS. That is fine, because users can then define their own protonation model.

FINALLY:
1. The separation is recommended from the object-oriented design point of view! This will be a great benefit! Good work!
2. As long as no abstract definition (pseudocode or something else) of the assignment process is available, disconnecting the molecules from the conversion is NOT possible. Special cases can be treated, but nothing else. Otherwise every molecule must have its own atom typer, which is, in my opinion, a huge performance problem. But what about an atomTyperCache instead of Singleton classes (JOELib) or static data/methods (C++), as in the current implementations?

I do not know of any publication or exact definition of what an OpenBabel/JOELib molecule is, because this is really complex. One possibility would be to define a huge molecular data set with atom types and formulate a classification+optimization problem. For a computer scientist this is the most transparent and most correct way, but who will create and publish such a huge database (manpower?). What about tautomers in this database? Can this be another classification task?

I would like to have such a database, because I'm interested in the optimization and data mining approach, but time is scarce ...

Regards, Joerg

--
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW: http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action. E. Hemingway |
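The atomTyperCache idea raised above can be sketched roughly as follows. This is only a minimal illustration, not JOELib or OpenBabel code: the AtomTyper and AtomTyperCache classes and the definition names passed in are hypothetical, and a real typer would parse the SMARTS-based text definitions instead of just remembering their name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a typer built from one set of text definitions
// (for example the contents of aromatic.txt). Not a JOELib class.
final class AtomTyper {
    private final String definitionName;

    AtomTyper(String definitionName) {
        this.definitionName = definitionName;
    }

    String definitionName() {
        return definitionName;
    }
}

// Keeps one typer per definition set. Molecules that use the same
// definitions share a typer; different definitions can coexist,
// which a Singleton or static data cannot offer.
final class AtomTyperCache {
    private final Map<String, AtomTyper> typers = new ConcurrentHashMap<>();
    private final Function<String, AtomTyper> loader;

    AtomTyperCache(Function<String, AtomTyper> loader) {
        this.loader = loader;
    }

    AtomTyper get(String definitionName) {
        // The loader runs only the first time a definition set is requested.
        return typers.computeIfAbsent(definitionName, loader);
    }

    int size() {
        return typers.size();
    }
}

class AtomTyperCacheDemo {
    public static void main(String[] args) {
        AtomTyperCache cache = new AtomTyperCache(AtomTyper::new);
        AtomTyper standard = cache.get("aromatic.txt");
        AtomTyper custom = cache.get("my-aromatic.txt"); // hypothetical user file
        System.out.println(standard == cache.get("aromatic.txt"));
        System.out.println(standard == custom);
    }
}
```

With such a cache the cost of building a typer is paid once per definition set rather than once per molecule, while two models with different definitions could still be compared side by side.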
From: Joerg K. W. <we...@in...> - 2004-01-07 08:37:41
|
Message: 1 Reply-To: "Chris Morley" <c.m...@ga...> From: "Chris Morley" <c.m...@ga...> To: <ope...@li...> Date: Sun, 4 Jan 2004 15:09:00 -0000 Subject: [Open Babel] Suggested modified conversion framework "Geoff Hutchison" wrote (some time back) >> So as a self-proclaimed "maintainer" of the project, I have to keep >> some idea in the back of my head how we'll get to 2.0, 2.1, 2.2, 3.0?, >> etc. releases. .... >> (What new formats are needed? What new features? What architecture >> changes are needed?) >> 3) Discussions on plans/roadmaps for things that don't go into the 2.0 >> release. >> Here are some suggestions for longer-term mods to the conversion process in OpenBabel. I feel that, although they may not be backward compatible, these features would be desirable to provide flexibility and maintainability for the future. Most have been previously discussed. - A clearer separation of functions - the chemistry needs to be more separated from the conversion. (e.g. OBMol should not be where the input and output file formats are stored). The FileFormat class could be beefed up to do this. - Each format needs to be self contained. A new format should not require any changes in old code. (This is what abstract base classes are for.) In a Windows system you might want to have each as a precompiled DLL - a plugin - which would ease upgrading and allow installation only of relevant formats. I guess something similar is possible in Unix. Even without this feature the formats would be dynamic - information about them and their options would be retieved at run time. - The user interface for conversion would have a clear interface to the conversion process itself to allow alternatives (GUIs etc). This would include handling of formats (file extensions, options) in a dynamic way. - The input and output routines need to be more aware of the conversion process so that they can adjust. 
Examples are the previously discussed need deal on-the-fly with generated molecules during CML input, and the need for a conditional <cml>...</cml> wrapper during output. - The conversion framework needs to handle more than just OBMol. I'm sure use would be made of the facility to convert different types of molecule, sets of molecules(conformers?), reactions, sets of reactions, etc. - There should be more support for non-expert users. There is a big activation barrier to using the program, which is not appropriate if it is to be used to just convert the format of some files. I realise precompiled code is not the Unix way, but Windows (and Mac?) users expect it. It would be nice to support it on appropriate platforms, while keeping a single version of the source code. I've put together some working code which implements all of these features, described in more detail at http://www.arcl02.dsl.pipex.com/OB/obframework.html It is written so that it can be deployed as separate DLLs containing: the main chemistry (the code is almost as at present); the conversion process (a new class); one or more formats (can be the existing code with a small wrapper); A user interface exe file makes use of these DLLs. The console interface feels much the same as at present and there is a Windows GUI interface which is a drop-in replacement. Alternatively, the code can be compiled together, as at present, without changing the source code. I hope it is platform independent except for the GUI and the deployment of the DLLs. Separating all the parts so that they can be separately compiled has been a challenge, because I wanted the conversion DLL and the user interfaces not to use directly any of the chemistry - they do not #include mol.h. This has meant the use of C++ a bit more adventurous than in the current code. For instance I found it necessary to use a smart pointer from the Boost library. This is not part of the standard language (although pretty close). 
I also need to point out that I am not a C++ expert - but it is all working ok at present. Using the DLLs, existing applications can add a much broader input format compatibility while not needing to be recompiled when the OB code changes. To illustrate the use of a non-OBMol conversion I have added a format for converting to and from a RXN file describing a reaction. The Windows interface has a novel feature that uses the text description of the various conversion options (previously output as help in the command line interface) to dynamically construct a set of checkboxes, etc appropriate to the requested file format. You can try a statically-linked compiled version of the GUI-driven framework with a few formats by just downloading, extracting and running http://www.arcl02.dsl.pipex.com/OB/OBGUIs.zip (407K) It should work on any 32bit Windows system. It may be that making changes like this to a project that puts the emphasis on the chemistry rather than programing is a bit over the top. Is it worth developing a non-backward compatible framework like this any further? Chris Morley --__--__-- Message: 2 Date: Sun, 04 Jan 2004 17:34:04 +0000 To: <ope...@li...> From: Peter Murray-Rust <pm...@ca...> Subject: Re: [Open Babel] Suggested modified conversion framework At 15:09 04/01/2004 +0000, Chris Morley wrote: >>"Geoff Hutchison" wrote (some time back) > >>> > So as a self-proclaimed "maintainer" of the project, I have to keep >>> > some idea in the back of my head how we'll get to 2.0, 2.1, 2.2, 3.0?, >>> > etc. releases. > >>.... > >>> > (What new formats are needed? What new features? What architecture >>> > changes are needed?) >>> > 3) Discussions on plans/roadmaps for things that don't go into the 2.0 >>> > release. >>> > > >> >>Here are some suggestions for longer-term mods to the conversion process >>in OpenBabel. 
I feel that, although they may not be backward compatible, >>these features would be desirable to provide flexibility and maintainability >>for the future. Most have been previously discussed. I'd like to support the discussion here and encourage refactoring of babel. Having spent the last ca 2 weeks rewriting the C++ support for CML (it's virtually ready I think it's critical that Babel's design evolves along modular lines as suggested here and elsewhere. My vision of babel development is that it should be an API/plugin type of approach. A developer should be able to write the readFoo and writeFoo modules by using an API rather than having to understand the whole architecture of the program. This depends, however, on having very clear and open architecture and clear understanding of the semantics/ontology (i.e. exactly what each piece of information means). I am currently going through this process with CML - it now has about 100 elements ("objects"). Probably about half of these correspond to concepts in Babel. I am optimistic that most of the concepts in chemistry are universal - the difficulties lie in different representations. A few concepts (e.g. aromaticity) depend critically on the algorithms used and so there is a need for these to be spelled out clearly. (I do not care whether pyrrole is aromatic or not. however if system A decrees that it is, and system B does not, we may need both those algorithms to convert between A's representation and B's.) Such concepts therefore depend on "perception" and it is critical that the perception is modularised (and in principle variable on demand). Most concepts are easier - they depend on careful definition rather than perception - so it is important to define carefully what is meant by (say) hydrogen count , e.g. in B2H6. The core of OB, therefore, is a representation of these concepts. (Whether it is in C++, Java, XML, UML or RDF/OWL is probably unimportant. 
At present the OB core is a mixture of the data fields in mol.h and the ancillary files (e.g. aromatic.txt). It is important that developers are able to find the concepts they need quickly and accurately - then writing code is much easier. In fact I am working towards a system where CML++ code is generated automatically from the schema. A Foo developer therefore could follow the following steps: - identify the concepts in Foo - map them onto Babel API concepts. - where they map precisely code the Foo syntax onto the OB API. This can be almost trivial. where they do not match, the developer has the options: - ignore the concept. An good example is that OB ignores bibliographic info. The information is then lost in the conversion process. - convert the data to an equivalent OB concept. Examples may be wedge/hatch bonds converted to atom Parities (though this is not always possible - some wedges do not correspond to atom-centered stereo). Conversion might be provided by babel or might be added by the Foo developer. - write code to add information (an example is molecular formula/mass - not supported in OB) which can be algorithmically generated. Where possible it will help if the concepts and representations are consistent over the OpenSource chemistry community. >>- A clearer separation of functions - the chemistry needs to be more >>separated from the conversion. (e.g. OBMol should not be where the >>input and output file formats are stored). The FileFormat class could >>be beefed up to do this. Pattern-based design suggests that specialist modules should be created to manage generic tasks and subclassed where necessary. Thus in CML software there are decorators which add functionality to classes (e.g. a moleculeDecorator can wrap a molecule and add getMolecularMass() to it. Similarly there are serializers (writers) for output and eventReaders (SAX-like) for input. Each of these is subclassed for different file formats. 
A typical pattern for Foo could be FooReader extends AbstractMolReader implements MolReader FooWriter extends AbstractMolWriter implements MolWriter >>- Each format needs to be self contained. A new format should not >>require any changes in old code. (This is what abstract base classes >>are for.) In a Windows system you might want to have each as a >>precompiled DLL - a plugin - which would ease upgrading and allow >>installation only of relevant formats. I guess something similar is >>possible in Unix. Even without this feature the formats would be >>dynamic - information about them and their options would be retieved >>at run time. >> >>- The user interface for conversion would have a clear interface to the >>conversion process itself to allow alternatives (GUIs etc). This would >>include handling of formats (file extensions, options) in a dynamic way. >> >>- The input and output routines need to be more aware of the conversion >>process so that they can adjust. Examples are the previously discussed >>need deal on-the-fly with generated molecules during CML input, and >>the need for a conditional <cml>...</cml> wrapper during output. This is a generic problem for multiple molecules and could be something like MolWriter.setMultipleMolecules(bool) MolWriter.addOutputMolecule(mol) // fails unless MolWriter allows multiple mols. >>- The conversion framework needs to handle more than just OBMol. >>I'm sure use would be made of the facility to convert different types of >>molecule, sets of molecules(conformers?), reactions, sets of reactions, etc. Yes. It is important to have a clear data structure for these. CML has been extended to support these concepts >>- There should be more support for non-expert users. There is a big >>activation barrier to using the program, which is not appropriate if it is >>to be used to just convert the format of some files. I realise precompiled >>code is not the Unix way, but Windows (and Mac?) users expect it. 
>>It would be nice to support it on appropriate platforms, while keeping >>a single version of the source code. Agreed. We have mounted some *.exe on our site - but they tend to get dated. It is a really tough problem even for smart people to compile C++ on Windows - make and configure are useless. Note that sourceforge has compile farms so it should be possible to get a whole range of compilers. The main problem is commitment to making this happen. It's hard work and not normally recognised by those outside the development process. If you write a better architecture and lose 1% functionality you get few thanks! P. >>Peter Murray-Rust Unilever Centre for Molecular Informatics Chemistry Department, Cambridge University Lensfield Road, CAMBRIDGE, CB2 1EW, UK Tel: +44-1223-763069 -- Dipl. Chem. Joerg K. Wegner Center of Bioinformatics Tuebingen (ZBIT) Department of Computer Architecture Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany Phone: (+49/0) 7071 29 78970 Fax: (+49/0) 7071 29 5091 E-Mail: mailto:we...@in... WWW: http://www-ra.informatik.uni-tuebingen.de -- Never mistake motion for action. E. Hemingway |
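The self-contained format and MolWriter ideas from this thread could be sketched along the following lines. This is a hedged illustration in Java only: MolWriter, AbstractMolWriter, setMultipleMolecules and addOutputMolecule follow the names used in the discussion, while CmlLikeWriter, FormatRegistry and the toy output format are assumptions made for the example. A molecule is reduced to a title string, and nothing here is Open Babel, JOELib or CML library code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical writer API; a "molecule" is reduced to its title string.
interface MolWriter {
    void setMultipleMolecules(boolean allowed);
    void addOutputMolecule(String title);
    String result();
}

// Shared bookkeeping, so a concrete format only implements result().
abstract class AbstractMolWriter implements MolWriter {
    protected final List<String> titles = new ArrayList<>();
    private boolean multipleAllowed;

    @Override
    public void setMultipleMolecules(boolean allowed) {
        multipleAllowed = allowed;
    }

    @Override
    public void addOutputMolecule(String title) {
        // Fails unless the writer allows multiple molecules.
        if (!multipleAllowed && !titles.isEmpty()) {
            throw new IllegalStateException("writer accepts only one molecule");
        }
        titles.add(title);
    }
}

// Toy CML-like format: the <cml> wrapper is written only for several molecules.
final class CmlLikeWriter extends AbstractMolWriter {
    @Override
    public String result() {
        StringBuilder sb = new StringBuilder();
        for (String t : titles) {
            sb.append("<molecule title=\"").append(t).append("\"/>");
        }
        return titles.size() > 1 ? "<cml>" + sb + "</cml>" : sb.toString();
    }
}

// Formats are registered by file extension; adding one touches no old code.
final class FormatRegistry {
    private final Map<String, Supplier<MolWriter>> writers = new HashMap<>();

    void register(String extension, Supplier<MolWriter> factory) {
        writers.put(extension.toLowerCase(Locale.ROOT), factory);
    }

    MolWriter writerFor(String extension) {
        Supplier<MolWriter> factory = writers.get(extension.toLowerCase(Locale.ROOT));
        if (factory == null) {
            throw new IllegalArgumentException("no writer for ." + extension);
        }
        return factory.get();
    }
}

class FormatRegistryDemo {
    public static void main(String[] args) {
        FormatRegistry registry = new FormatRegistry();
        registry.register("cml", CmlLikeWriter::new);
        MolWriter writer = registry.writerFor("CML");
        writer.setMultipleMolecules(true);
        writer.addOutputMolecule("benzene");
        writer.addOutputMolecule("pyrrole");
        System.out.println(writer.result());
    }
}
```

The user interface only needs the registry and the MolWriter interface, which is the kind of separation between conversion and chemistry that the proposal asks for.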
From: Joerg K. W. <we...@in...> - 2003-11-27 09:08:34
|
Hi Wayne,

> I grabbed JOELib-bin-20031117 and some data from the nci database, as I
> wanted to compare property calculations from JOELib to other
> calculators. I queried for all structures that had an experimental log
> P associated with them, 3576 structures came back.
>
> I saw a number of discrepancies between what's calculated by JOELib and
> what was recorded in the NCI dataset. For example :
>
> - In 16% of the cases # of rotational bonds disagreed
>
> - In ~29% of the cases, Number_of_HBA2 disagreed with the number of
> acceptors in the nci database
>
> - In ~1% of the cases, Number_of_HBD1 disagreed with the number of
> donors in the nci database
>
> - JOELib logP had a correlation of 0.64 with the experimental log P
> values; KOW & acd labs predictions in nci had correlations of 0.98 and 0.92
>
> Below is a snippet of the code I'm using - before I look any further at
> the reasons for the differences noted above, could you let me know if
> I'm using the library correctly?

Sounds reasonable!

I have just submitted two 'model-building papers' which include the following sentences:
'For comparing models it should be guaranteed that the descriptors are using all the same atom typer, aromaticity- and hybridization-model. Because many programs use text definitions for the atom types [JOELib,OpenBabel] we recommend to use the same definitions or the same data processing workflow to avoid bad prediction results for new molecules.'

As already mentioned several times, the descriptor calculation process is the LAST step after processing four expert systems:
http://www-ra.informatik.uni-tuebingen.de/software/joelib/tutorial/atomtyper.html

In my opinion most programs have their own atom typer, which is really critical! Based on my descriptor calculation experience, I mostly trust JOELib and OpenBabel, because both use the same atom typing definitions, which are open-source, open-content and based on text files!

Let's say these expert systems fail for some compounds: we can at least be sure that they will also fail for analogous compounds, so we will have a systematic error. Because these models have a long tradition they are still really good, in my opinion. My cooperation partner told me that the models are sometimes better than Sybyl.

So much for the first point regarding rotatable bonds, H-donors and H-acceptors. The second point is YOUR definition of these descriptors. JOELib supports, for example, two different kinds of donors and acceptors, and there will never be a guarantee of completeness! Most authors in the literature either give their SMARTS patterns for this definition or, which is VERY BAD, just say "we used program XYZ". From a computer scientist's point of view a program is not transparent!
Always use SMARTS or detailed descriptions of these descriptors!

On LogP: I've already published a paper on LogP prediction. As you surely know there are two main ways to predict values:
1. Group contribution approach: the open-source model in JOELib is one of these. The model is really not that good (I checked this). See the literature reference in the source code.
2. Descriptor/data mining approach: part of my published paper:
J. K. Wegner, A. Zell, Prediction of Aqueous Solubility and Partition Coefficient Optimized by a Genetic Algorithm Based Descriptor Selection Method, Journal of Chemical Information and Computer Sciences (JCICS), 2003, 43(3), 1077-1084, DOI: 10.1021/ci034006u

To conclude: the main work I'm paid for is chemical data mining, so I know a lot of the problems in this area. Please don't hesitate to ask me, although these topics can be a little bit off-topic for this mailing list.

Do you know the JOELib interface to Weka?

Regards, Joerg

> Thanks,
> Wayne
>
> public class test {
>
>     public test() {
>     }
>
>     /**
>      * @param args the command line arguments
>      */
>     public static void main(String[] args) throws Exception {
>         SimpleReader sdfile = new SimpleReader(args[0]);
>         JOEMol mol = new JOEMol();
>
>         PrintStream out = new PrintStream(new FileOutputStream("out.dat"));
>
>         DescResult LogP = null;
>
>         out.println("E_NSC\tjoe_logP\tkow_LogP\texp_logP\tacd_logP");
>         while (sdfile.readNext(mol)) {
>             //System.out.println(mw.getDoubleValue(mol));
>             LogP = DescriptorHelper.instance().descFromMol(mol, "LogP");
>
>             String kow_LogP = convert(mol.getData("E_LOGP"));
>             String exp_LogP = convert(mol.getData("E_LOGP/2"));
>             String acd_LogP = convert(mol.getData("E_LOGP/3"));
>
>             String nsc = mol.getData("E_NSC").toString();
>
>             out.print(nsc + "\t" + LogP + "\t" + kow_LogP + "\t");
>             out.println(exp_LogP + "\t" + acd_LogP);
>         }
>     }
>
>     static String convert(joelib.data.JOEGenericData in) {
>         String result = "";
>         if (in != null) {
>             result = in.toString().trim().substring(0,
>                 in.toString().indexOf(' ', 1));
>         }
>         if (result == null) {
>             return "";
>         } else {
>             return result;
>         }
>     }
> }
>
> The "convert" function is used to clean off spaces & a zero appended to
> the field containing the predicted log p values. I don't know why the
> sd file dump from the nci has that.
>
> ------------------------------------------------------------------------
> This email may contain material that is confidential and privileged and
> is for the sole use of the intended recipient. Any review, reliance or
> distribution by others or forwarding without express permission is
> strictly prohibited. If you are not the intended recipient, please
> contact the sender and delete all copies.

--
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW: http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action. E. Hemingway |
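For the comparison described in this message, the two generic steps (cleaning a data field and correlating predicted with experimental values) can be sketched without any JOELib classes. The following is a minimal sketch under stated assumptions: the class LogPComparison and its methods are hypothetical, the field format "value, spaces, trailing zero" is taken from Wayne's description, and the numbers in main are invented purely to exercise the code.

```java
// Hypothetical helper, independent of JOELib; it works on plain strings and arrays.
final class LogPComparison {

    // Returns the first whitespace-delimited token, or "" for null or blank
    // input. Unlike a substring/indexOf combination, this cannot throw when
    // the field contains no space at all.
    static String firstToken(String raw) {
        if (raw == null) {
            return "";
        }
        String trimmed = raw.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        return trimmed.split("\\s+")[0];
    }

    // Pearson correlation coefficient between two equally long series.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two series of equal length >= 2");
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }
}

class LogPComparisonDemo {
    public static void main(String[] args) {
        // Invented example values, not taken from the NCI data set.
        double[] predicted = {1.2, 2.9, 0.4, 3.8};
        double[] experimental = {1.0, 3.1, 0.7, 3.5};
        System.out.println(LogPComparison.firstToken("  2.13   0"));
        System.out.println(LogPComparison.pearson(predicted, experimental));
    }
}
```

Computing the correlation only over compounds where both values parse cleanly avoids counting empty or malformed fields as disagreements.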
From: Joerg K. W. <we...@in...> - 2003-11-27 08:42:43
|
Hi Hongmei,

> How could I remove all H atoms in a JOEMol?

JOEMol mol;
mol=YourGetMolecule();
mol.deleteHydrogens();

BTW, I've found a dead-loop bug in mol.deleteHydrogen(); which works independently of mol.deleteHydrogens(); so all will be fine for you. I will check in the bug fixes with some code cleanup (hopefully) this week.

Regards, Joerg

>
> Happy Thanksgiving.
>
> Thanks.
>
> Hongmei
>
> -----Original Message-----
> From: Joerg K. Wegner [mailto:we...@in...]
> Sent: Thursday, November 13, 2003 2:20 AM
> To: Sun, Hongmei*
> Cc: 'joe...@li...'
> Subject: Re: [Joelib-help] aromatize a molecule
>
> Hi all,
>
> if you have loaded a molecule and then call, e.g.
> JOEAtom atom=mol.getAtom();
> boolean isAromatic=atom.isAromatic();
>
> JOELib will determine the aromaticity on its own; to be faster, this
> happens only once as long as you do not change atoms or bonds. That's
> one reason for the beginModify and endModify flags when working on
> molecules.
>
> When you want to check the state on your own you can call:
> boolean assignedAlready=molecule.hasAromaticPerceived();
>
> or you can assign the flags on your own:
> JOEAromaticTyper.instance().assignAromaticFlags(molecule);
>
> For a short description see:
> http://www-ra.informatik.uni-tuebingen.de/software/joelib/tutorial/atomtyper.html
> and all TEXT DEFINITION files in joelib/src/joelib/data/plain or online at:
> http://cvs.sourceforge.net/viewcvs.py/joelib/joelib/src/joelib/data/plain/
> which are based on SMARTS, so if you want to change things in the expert
> system models feel free to do so.
>
> Regards, Joerg
>

--
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW: http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action. E. Hemingway |
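The "perceive only once unless the molecule changes" behaviour described in the quoted message can be illustrated with a small standalone sketch. This is not JOELib code: LazyPerceptionMol is a hypothetical class, atoms are plain symbols indexed from 0, and the perception rule (a lower-case symbol counts as aromatic, as in SMILES notation) is a placeholder for the real SMARTS-based expert system. Only the method names beginModify, endModify and hasAromaticPerceived are borrowed from the message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical molecule that caches aromaticity flags until it is modified.
final class LazyPerceptionMol {
    private final List<String> atoms = new ArrayList<>();
    private final List<Boolean> aromatic = new ArrayList<>();
    private boolean modifying;
    private boolean aromaticPerceived;
    private int perceptionRuns;

    void beginModify() {
        modifying = true;
    }

    void addAtom(String symbol) {
        if (!modifying) {
            throw new IllegalStateException("call beginModify() first");
        }
        atoms.add(symbol);
    }

    void endModify() {
        modifying = false;
        aromaticPerceived = false; // cached flags are stale after any change
    }

    boolean hasAromaticPerceived() {
        return aromaticPerceived;
    }

    int perceptionRuns() {
        return perceptionRuns;
    }

    boolean isAromatic(int index) {
        if (modifying) {
            throw new IllegalStateException("call endModify() before querying");
        }
        if (!aromaticPerceived) {
            perceive(); // runs at most once per modification cycle
        }
        return aromatic.get(index);
    }

    // Placeholder rule: lower-case symbols are treated as aromatic.
    private void perceive() {
        aromatic.clear();
        for (String symbol : atoms) {
            aromatic.add(Character.isLowerCase(symbol.charAt(0)));
        }
        aromaticPerceived = true;
        perceptionRuns++;
    }
}

class LazyPerceptionDemo {
    public static void main(String[] args) {
        LazyPerceptionMol mol = new LazyPerceptionMol();
        mol.beginModify();
        mol.addAtom("c");
        mol.addAtom("C");
        mol.endModify();
        System.out.println(mol.isAromatic(0) + " " + mol.isAromatic(1));
        System.out.println("perception runs: " + mol.perceptionRuns());
    }
}
```

Bracketing edits with begin/end calls lets the molecule invalidate its cached flags once per batch of changes instead of after every single atom or bond operation.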
From: Joerg K. W. <we...@in...> - 2003-11-05 21:47:07
|
Hi Boryeu,

I can speak only for JOELib:

1.1. Download the latest source code release.
1.2. Replace joelib/data/JOEAromaticTyper by the file given at
http://sourceforge.net/tracker/index.php?func=detail&aid=823931&group_id=39708&atid=425969
1.3. Compile the complete project with:
joelib> sh build.sh compile (works without ant installation)
OR
joelib/ant> ant compile (ant must be installed)

2.1. Get the complete new CVS version (recommended, to avoid side effects).
Follow the instructions at:
http://sourceforge.net/cvs/?group_id=39708
Use:
cvs -d:pserver:ano...@cv...:/cvsroot/joelib co joelib
2.2. Compile the complete project with:
joelib> sh build.sh compile (works without ant installation)
OR
joelib/ant> ant compile (ant must be installed)

For OpenBabel you should add the extended root atom picking to the aromatic typer, which is not too difficult.

Regards, Joerg |
From: Joerg K. W. <we...@in...> - 2003-11-04 07:53:59
|
Hi all,

the SMARTS bug for bridging root atoms (also in inner rings!), where two root atoms were sometimes selected in the same ring, is fixed. The changes were also added to CVS.
http://sourceforge.net/tracker/index.php?func=detail&aid=823931&group_id=39708&atid=425969

OELib/OpenBabel works fine in this case because it uses the primitive root atom picking, which in turn causes problems in inner rings. This should be fixed. JOELib works with both cases; please see joelib.data.JOEAromaticTyper and its 'root atom picking' method.

By the way: the improved PDF writer in CVS also writes descriptor information and can be used to visualize SMARTS if you have a little programming experience. In my opinion a very useful tool. Every structure+descriptors entry starts on a new page.

Regards, Joerg |
From: Joerg K. W. <we...@in...> - 2003-09-03 08:14:30
|
Hi all,

the package path in the LibGhemical JNI interface was corrected. Now the precompiled shared libraries should work correctly !;-)

Regards, Joerg |
From: Joerg K. W. <we...@in...> - 2003-08-26 16:39:01
|
Hi,

thanks! Sorry for this distribution bug (in the ant XML script); both are now available in the new release.
http://sourceforge.net/project/showfiles.php?group_id=39708

BTW, when talking about the atom typer ... the aromaticity model bug for complex rings is fixed:
http://sourceforge.net/tracker/index.php?func=detail&aid=795313&group_id=39708&atid=425969

The weka library and patch mechanism are now available as an optional library.
http://sourceforge.net/project/showfiles.php?group_id=39708

Regards, Joerg

> I've found that the three files:
> LogP.contributions
> MR.contributions
> PSA.contributions
> are missing from the source distribution.
>
> Furthermore it seems that "atomtyper.html" is missing from the "tutorial" directory (even from CVS!)
>
> Regards
>
> Ambrogio |
From: Joerg K. W. <we...@in...> - 2003-08-26 12:19:02
|
Hi Bo,

yes and no: I definitely fixed the bug last week, and you should check out the latest CVS version. Sorry, but my first fix still contained an inconsistency and was breaking out of the 'new root atom picker' too early.

Your examples work in my SMARTS testing environment.

Regards, Joerg

> I tried matching the SMARTS, Oc1ccccc1O, against the SMILES string of
> 4,5-Dihydroxypyrene, Oc1c(O)c2cccc3ccc4cccc1c4c23.
>
> But there is no match.
>
> I think there might be some problem related to pyrene.
>
> I made it with the SMARTS, Oc1ccccc1O, against the SMILES string of
> 9,10-Dihydroxyphenanthrene, Oc1c(O)c2ccccc2c3ccccc13.
>
> Please check out that problem.
>
> Thanks for your effort.
>
> Bo Hou.
> UM-BBD from University of Minnesota. |
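The two SMILES strings in the report differ in how many rings are fused together, which is where the inner-ring root atom problem shows up. As a rough illustration only (this is not the JOELib matcher and says nothing about how the root atom picker works), the following Java sketch counts ring-closure bonds in a SMILES string; for a connected molecule that number equals the number of rings. The class and method names are hypothetical, and the sketch assumes single-digit ring-closure labels (no %nn form).

```java
import java.util.HashSet;
import java.util.Set;

class RingClosureCount
{
	// Counts ring-closure bonds in a SMILES string that uses only
	// single-digit ring labels. Digits inside [...] are skipped because
	// there they denote isotopes, charges or hydrogen counts.
	static int countRingClosures(String smiles)
	{
		Set<Character> open = new HashSet<Character>();
		int closures = 0;
		boolean inBracket = false;
		for (int i = 0; i < smiles.length(); i++)
		{
			char c = smiles.charAt(i);
			if (c == '[')
			{
				inBracket = true;
			}
			else if (c == ']')
			{
				inBracket = false;
			}
			else if (!inBracket && Character.isDigit(c))
			{
				// First occurrence opens a ring bond, second one closes it.
				if (open.remove(c))
				{
					closures++;
				}
				else
				{
					open.add(c);
				}
			}
		}
		return closures;
	}

	public static void main(String[] args)
	{
		// 4,5-Dihydroxypyrene from the report: prints 4
		System.out.println(countRingClosures("Oc1c(O)c2cccc3ccc4cccc1c4c23"));
		// 9,10-Dihydroxyphenanthrene from the report: prints 3
		System.out.println(countRingClosures("Oc1c(O)c2ccccc2c3ccccc13"));
	}
}
```

The pyrene derivative has four fused rings with atoms shared between three rings, while the phenanthrene derivative has three rings and no such inner atoms, which is consistent with only the pyrene case having triggered the bug.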
From: Joerg K. W. <we...@in...> - 2003-08-20 06:24:19
|
From: Gert S. <Ger...@re...> - 2003-08-19 10:19:13
|
> So if you want to use your own formatter, please use analogue options.

I have been using Emacs for quite some time now and would appreciate it if I could continue doing so. Is anyone aware of ports of the Eclipse Java source formatting to Emacs? And how should I change this formatting manually?

Thanks, Gert |
From: Joerg K. W. <we...@in...> - 2003-08-19 10:00:19
|
Hi,

to avoid CVS diff complications I recommend using the same source code formatter for all source code files.

1. I use Eclipse. Eventually we can also use another tool as an ant task, but I would still prefer Eclipse, e.g. for also formatting imports and for the good warning and error styles when editing.
2. For all comments I recommend
// TODO: my question or todo tag
because Eclipse shows nice blue targets in the editor for these tags, and not:
// QUESTION
// ???
// XXX

Link: http://www.eclipse.org/

Eclipse formatter preferences to use:
New Lines:
(x) insert before opening brace
(x) insert in control statements
( ) clear blank lines
( ) insert new line between else-if
(x) insert inside empty block
Line Splitting: 80
Style:
( ) Compact assignment
(x) Insert space after cast
(x) Insert tabs, not spaces
space indentation level: 4

So if you want to use your own formatter, please use analogous options.

Regards, Joerg |
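For readers without Eclipse at hand, here is a small Java sketch of what source formatted with roughly these preferences might look like. It is an interpretation, not output of the Eclipse formatter: the class and method names are hypothetical, and it assumes that "insert in control statements" puts else on its own line and that leaving "insert new line between else-if" unchecked keeps "else if" together.

```java
class FormatterStyleExample
{
	// TODO: decide whether negative inputs should be rejected instead
	static int clampedHalf(double value)
	{
		// Space after the cast, as requested in the Style section.
		int result = (int) Math.round(value / 2.0);
		if (result < 0)
		{
			result = 0;
		}
		else if (result > 100)
		{
			result = 100;
		}
		return result;
	}

	// An empty block still gets its braces on separate lines.
	static void reset()
	{
	}

	public static void main(String[] args)
	{
		System.out.println(clampedHalf(7.0)); // prints 4
	}
}
```

The block is indented with tabs, opening braces sit on their own lines, and the open question is marked with a TODO tag rather than QUESTION, ??? or XXX.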
From: Joerg K. W. <we...@in...> - 2003-08-14 10:50:38
|
Hi Gert,

>>So i think adding somewhere a note in the tutorial would be fine, because
>>i've an internal CVS version also. That's my way, when developing
>>'machine learning' tools and hacking them.
> This would indeed be the least we should do. And in the build.xml file.

Documentation was added to the API, build.xml and the tutorial.

For the next release I will publish a weka.jar for use with JOELib, analogous to the libghemical binaries (which is some more stuff to administrate !;-(
Eventually a simple weka_build.xml file should work ... does anyone already have some experience with patching source code? If not, I will look at the ant tutorial on my own ....

Regards, Joerg |