From: Joerg W. <we...@in...> - 2004-06-04 15:15:46
Hi,

this is interesting :-)

1. Profiling is recommended. I would be happy if you have time for this,
   once you have found a good, free Java profiling tool.

2. Only the CVS history can help here. Around this time I added all the
   descriptors, so the bottleneck could be the parsing process that uses
   the regular expression patterns in
   joelib/src/joelib/data/plain/knownResults.txt
   Replacing this file with one that contains no regular expressions may
   therefore be a solution.

As I have already tested myself, the descriptors are the first bottleneck.
An SDF reader class that does not convert them may help: in the previous
version they were stored as unparsed entries, but I changed this to avoid
problems with CML2 export.
Try your own MyMDLSD.java (with the descriptor parsing removed) and use it
to replace the current MDLSD reader in joelib.properties.

Kind regards, Joerg

On Thu, 3 Jun 2004, Oliva, Ambrogio wrote:

> Hi.
>
> I've found big differences in the performance of SimpleReader between older
> and newer versions of the joelib library. Running a simple application that
> counts the number of molecules in an SDF file with different versions of
> joelib, I got the following results.
>
> - with version 20040116:
>
> C:\eclipse\workspace\MolCounter>java -cp .;log4j.jar;itext-0.94.jar;joelib-20040116.jar MolCounter sample.sdf
> 16:04:49 [INFO ] joelib.data.JOEElementTable - Using element table: joelib/data/plain/element.txt
> 16:04:49 [INFO ] joelib.io.IOTypeHolder - 13 input/output types loaded.
> 16:04:50 [INFO ] joelib.io.SimpleReader - ... 500 molecules successful loaded in 922 ms.
> Done: 500 found
>
> - with version 20040323:
>
> C:\eclipse\workspace\MolCounter>java -cp .;log4j.jar;itext-0.94.jar;joelib-20040323.jar MolCounter sample.sdf
> 16:04:37 [INFO ] joelib.data.JOEElementTable - Using element table: joelib/data/plain/element.txt
> 16:04:37 [INFO ] joelib.io.IOTypeHolder - 22 input/output types loaded.
> 16:04:38 [INFO ] joelib.desc.DescriptorHelper - 78 descriptor informations loaded.
> 16:04:38 [INFO ] joelib.data.JOEAtomTyper - Using atom type model: joelib/data/plain/atomtype.txt
> 16:04:38 [INFO ] joelib.data.JOEPhModel - Using pH value correction model: joelib/data/plain/phmodel.txt
> 16:04:42 [INFO ] joelib.io.SimpleReader - ... 500 molecules successful loaded in 4844 ms.
> Done: 500 found
>
> Could someone explain to me the different behaviour of the two libraries,
> and how to speed up the process using the newer versions?
>
> The source code of MolCounter is below.
>
> // Imports
> import org.apache.log4j.*;
> import joelib.io.*;
> import joelib.molecule.*;
> import java.io.*;
>
> public class MolCounter {
>
>     // Obtain a suitable logger.
>     private static Category logger = Category.getInstance("MolCounter");
>
>     public static void main(String[] args) {
>
>         SimpleReader reader = null; // input SDF file
>         IOType inputType = IOTypeHolder.instance().getIOType("SDF");
>         try {
>             reader = new SimpleReader(new FileInputStream(args[0]), inputType);
>         } catch (Exception ex) {
>             // without a reader there is nothing to count, so stop here
>             ex.printStackTrace();
>             System.exit(1);
>         }
>         JOEMol mol = new JOEMol(inputType, inputType);
>         long lCounter = 0;
>         try {
>             while (reader.readNext(mol)) {
>                 lCounter++;
>             }
>         } catch (IOException ex) {
>             // occurs if the file cannot be read
>             ex.printStackTrace();
>         } catch (MoleculeIOException ex) {
>             // occurs if a molecule entry is invalid
>             ex.printStackTrace();
>         }
>         reader.close();
>
>         System.out.println("Done: " + lCounter + " found");
>         System.exit(0);
>     }
> }
>
> Thanks in advance.
>
> Ambrogio

Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW: http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action. (E. Hemingway)
Never mistake action for meaningful action. (Hugo Kubinyi, 2004)
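
The reply above suggests that keeping descriptor values as unparsed entries avoids the regular-expression cost while reading. A minimal sketch of that idea follows; it does not use the JOELib API at all. The class name LazyDescriptorStore, its methods, and the number pattern are hypothetical and exist only for illustration: raw strings are stored as read, and the regular expression runs only when a numeric value is actually requested.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch (not JOELib code): data fields stay raw until someone asks for a number. */
public class LazyDescriptorStore {
    // Assumed format of a numeric field: optional sign, digits, optional fraction.
    private static final Pattern NUMBER =
            Pattern.compile("\\s*([-+]?\\d+(?:\\.\\d+)?)\\s*");

    private final Map<String, String> raw = new LinkedHashMap<>();
    private final Map<String, Double> parsed = new HashMap<>();
    private int parseCount = 0;

    /** Stores the field exactly as read; no regular expression is applied here. */
    public void put(String name, String value) {
        raw.put(name, value);
        parsed.remove(name); // drop any stale parsed value
    }

    /** Returns the unparsed text, or null if the field is unknown. */
    public String getRaw(String name) {
        return raw.get(name);
    }

    /** Parses on first access and caches the result for later calls. */
    public double getDouble(String name) {
        Double cached = parsed.get(name);
        if (cached != null) {
            return cached;
        }
        String text = raw.get(name);
        if (text == null) {
            throw new IllegalArgumentException("No field named " + name);
        }
        Matcher m = NUMBER.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Field " + name + " is not numeric: " + text);
        }
        parseCount++;
        double value = Double.parseDouble(m.group(1));
        parsed.put(name, value);
        return value;
    }

    /** Number of times the regular expression was actually applied. */
    public int getParseCount() {
        return parseCount;
    }

    public static void main(String[] args) {
        LazyDescriptorStore store = new LazyDescriptorStore();
        store.put("LogP", " 2.35 ");
        store.put("Comment", "measured in octanol/water");
        System.out.println("Parses after loading: " + store.getParseCount());
        System.out.println("LogP = " + store.getDouble("LogP"));
        System.out.println("Parses after one lookup: " + store.getParseCount());
    }
}
```

A reader that only counts molecules, like MolCounter, would never call getDouble and so would never pay for the pattern matching. Whether this maps cleanly onto a custom MDLSD reader depends on how JOELib hands data fields to the molecule, which this sketch does not attempt to model.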