From: Joerg W. <we...@in...> - 2002-09-07 11:59:18
Hi all,

here are some theoretical aspects of loading molecule files with JOELib:

0. TEST: loading speed
   07/09/2002, AMD1400+, ASUS board, 1GB DDR RAM, Win2K, SUN JDK1.4.0-beta2-b77

   At the moment the loading process is very transparent, because it uses
   text-based files, and maximally flexible, because descriptors can be simple
   integer/double values but also user-defined values, like integer/double
   arrays/matrices, the mixed input formats used for CTX files, or anything
   else you can imagine. For descriptor development and processing that's
   really great, but let's now talk about speeding up the loading process...

1. Molecular data only (molecules without descriptors)
   10000 molecules successfully loaded in 11406 ms.
   20000 molecules successfully loaded in 22562 ms.
   30000 molecules successfully loaded in 33228 ms.
   --> 1.1 seconds/1000 molecules

2. Molecular and descriptor data (204 double-valued descriptors)
   10000 molecules successfully loaded in 92663 ms.
   20000 molecules successfully loaded in 185727 ms.
   30000 molecules successfully loaded in 273794 ms.
   --> 9.13 seconds/1000 molecules

OPTIMIZATION POSSIBILITIES:

The question is what exactly we want...

0. Techniques
   a. Can we define a faster SDF loader or a user-defined loader?
      YES, import/export types can be defined dynamically.
   b. The text loading process can be optimized by defining a loader that
      works directly on the input stream, which requires a stream-based SDF
      parser. One possibility is to write our own parser, another is to use
      the Java Compiler Compiler (JavaCC) to generate one. Both are a lot of
      work; I assume the loading process can be sped up by a factor of
      1.3 to 1.9.
   c. Use a binary import format for which you can define a loader. That's
      less flexible and less transparent, but the speed-up should be very
      high (I assume a factor greater than 2).

1. Molecular data
   a. Speeding up the loading of molecular data specifically is only possible
      with the techniques from 0.

2. Descriptor data
   a. Speeding up the loading of descriptor data specifically is possible
      with a text-based or binary flat file format, or with the techniques
      from 0.

Descriptor data sets have a greater potential for optimization.

Regards, Joerg K. Wegner

Dipl. Chem. Joerg K. Wegner
Univ. Tuebingen, Computer Architecture, Sand 1, D-72076 Tuebingen, Germany
Tel. (+49/0) 7071 29 78970, Fax (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW: http://www-ra.informatik.uni-tuebingen.de
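
To make the binary flat file idea from points 0.c and 2.a more concrete, here is a minimal sketch of what such a descriptor file could look like. It is not JOELib code: the class name DescriptorFlatFile, its methods, and the file layout (row count, column count, then all values as raw doubles) are assumptions made purely for illustration, and it uses current Java syntax rather than JDK 1.4. Only standard java.io classes are used.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper, NOT part of JOELib: a fixed-width binary flat file
// holding one row of double descriptor values per molecule.
class DescriptorFlatFile {

    // Assumed layout: int rows, int columns, then rows * columns doubles
    // in row-major order (big-endian, as written by DataOutputStream).
    static void write(File file, double[][] descriptors) throws IOException {
        int rows = descriptors.length;
        int cols = rows == 0 ? 0 : descriptors[0].length;
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeInt(rows);
            out.writeInt(cols);
            for (double[] row : descriptors) {
                if (row.length != cols) {
                    throw new IllegalArgumentException("ragged descriptor matrix");
                }
                for (double value : row) {
                    out.writeDouble(value);
                }
            }
        }
    }

    // Reads the file back without any text parsing: every value is
    // 8 bytes at a known position, so no tokenizing or number parsing is needed.
    static double[][] read(File file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            int rows = in.readInt();
            int cols = in.readInt();
            double[][] descriptors = new double[rows][cols];
            for (int r = 0; r < rows; r++) {
                for (int c = 0; c < cols; c++) {
                    descriptors[r][c] = in.readDouble();
                }
            }
            return descriptors;
        }
    }

    public static void main(String[] args) throws IOException {
        // 3 molecules with 204 descriptors each, matching the descriptor
        // count used in the timing test; the values are dummy data.
        double[][] data = new double[3][204];
        for (int r = 0; r < data.length; r++) {
            for (int c = 0; c < data[r].length; c++) {
                data[r][c] = r + c / 1000.0;
            }
        }
        File file = File.createTempFile("descriptors", ".bin");
        file.deleteOnExit();
        write(file, data);
        double[][] loaded = read(file);
        System.out.println("rows read back: " + loaded.length);
    }
}
```

The trade-off is the one described in 0.c: such a file can only hold plain double values in a fixed-size matrix, so the flexibility of user-defined descriptor types is lost, and the file is no longer human-readable. Whether the speed-up is as large as assumed would have to be measured against the text loader.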