From: Joerg K. W. <we...@in...> - 2003-06-25 13:42:32
Dear Dr. Ambrogio,

> first of all thanks for making available JOELib.

Thanks ...
I'm cross-posting this message to the JOELib-help mailing list as well. You can join if you want:
http://sourceforge.net/mail/?group_id=39708

> I successfully used the Convert application for adding descriptors to SD files. However I do not need all the descriptors. Then I tried to use DescriptorSelection to "select" the descriptors I need.

That is exactly how I use it, especially when calculating autocorrelation or other more extended descriptors (not public).

> The problem is that the syntax is not clear to me, in particular the format of the descriptor list and what the "delimiter" is.

Let's try to resolve that...

> I tried:
> C:\FreeSoft\JOELib>java -cp %CLASSPATH% joelib.test.Convert -iSDF -oSDF +d c:\temp\sample.sdf c:\temp\sample_wd.sdf

All fine!

> Then:
> C:\FreeSoft\JOELib>java.exe -cp %CLASSPATH% joelib.test.DescriptorSelection -iSDF c:\temp\sample_wd.sdf -oSDF c:\temp\sample_wd2.sdf c:\temp\Sample_desc.txt normal ;
> 13:34:44 [INFO ] joelib.data.JOEElementTable - Using element table: joelib/data/plain/element.txt
> 13:34:44 [INFO ] joelib.io.IOTypeHolder - 12 input/output types loaded.
> 13:34:44 [INFO ] joelib.process.filter.FilterFactory - 5 filter informations loaded.
> Exception in thread "main" java.lang.NullPointerException
>         at joelib.test.DescriptorSelection.parseCommandLine(DescriptorSelection.java:167)
>         at joelib.test.DescriptorSelection.main(DescriptorSelection.java:301)
>
> where c:\temp\Sample_desc.txt is:
>
> Number_of_bonds;LogP;Topological_radius

No, the file should simply be:

Number_of_bonds
LogP
Topological_radius

So you can simply import the JOELib statistic file into Excel and extract the descriptor names you want! ;-)

> Can you help me in getting DescriptorSelection work?
> Thanks in advance.

It was a pleasure.
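
For anyone who already has a semicolon-separated list like the one quoted above, here is a minimal sketch of the conversion in plain Java. It assumes the descriptor file wants one name per line and that descriptor names contain no whitespace. The class `DescriptorListSketch` and its methods are hypothetical helpers for illustration only; they are not part of JOELib and do not call any JOELib API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of JOELib. */
final class DescriptorListSketch {
    private DescriptorListSketch() {}

    /**
     * Splits a raw descriptor list on semicolons and/or whitespace
     * and drops empty entries. Assumes names contain no whitespace.
     */
    static List<String> splitNames(String raw) {
        List<String> names = new ArrayList<>();
        for (String token : raw.split("[;\\s]+")) {
            if (!token.isEmpty()) {
                names.add(token);
            }
        }
        return names;
    }

    /** Joins the cleaned names with one name per line. */
    static String toOnePerLine(String raw) {
        return String.join("\n", splitNames(raw));
    }

    public static void main(String[] args) {
        // The list from the failing run, rewritten one name per line.
        System.out.println(toOnePerLine("Number_of_bonds;LogP;Topological_radius"));
    }
}
```

The printed text could then be saved as the descriptor file (c:\temp\Sample_desc.txt in the example above) before rerunning DescriptorSelection.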

> By the way, I also have a couple of "bug reports":
> - In the Tutorial I've noticed that "bor", "chlor", "brom", "iod", "fluor" and "phosphor" are used instead of "boron", "chlorine", "bromine", "iodine", "fluorine" and "phosphorus".
>
> - In the .bat files, there are extra semicolons at the end of the "set CLASSPATH" instructions.

Thanks, I will fix it.

> Thanks again.
>
> Ambrogio

Regards, Joerg

P.S.: BTW, there is also a descriptor normalization under
joelib.process.types.DescVarianceNorm
if you plan to normalize your data to mean=0, stddev=1. Using it requires a little bit of programming, e.g. you can simply add this process to the selection process pipe in
joelib.test.DescriptorSelection

P.P.S.: All descriptors listed in
joelib\data\plain\desc2ignore.txt
which is defined in the joelib.properties file as:
joelib.process.types.DescVarianceNorm.descriptors2ignore=joelib/data/plain/desc2ignore.txt
will be ignored for normalization. This is extremely useful if you have a nominal classification problem or the SDF file also contains some IDs, ...

-- 
Dipl. Chem. Joerg K. Wegner
Univ. Tuebingen, Computer Architecture, Sand 1, D-72076 Tuebingen, Germany
Tel. (+49/0) 7071 29 78970, Fax (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW: http://www-ra.informatik.uni-tuebingen.de