Assuming the tool compares the two jars by running them sequentially in the same JVM...

Correct me if I am wrong, but I think the default garbage collection mechanism is serial (or maybe it now depends on your platform?). With a serial garbage collector, after the first run there could be a large number of objects scheduled for GC that are all collected in one hit during the second run. That would hit performance.
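One way to check is to print the collector statistics between the two runs. A minimal sketch using the standard java.lang.management API (GcStats and the label strings are just illustrative, not part of the benchmark tool):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    /** Prints how many collections each collector has done and the time spent so far. */
    static void printGcActivity(String label) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(label + ": " + gc.getName()
                    + " collections=" + gc.getCollectionCount()
                    + " time=" + gc.getCollectionTime() + " ms");
        }
    }
}

Calling printGcActivity("after jar 1") and printGcActivity("after jar 2") would show whether the second run is paying for the first run's garbage; running the JVM with -verbose:gc gives similar information on stdout.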

Try running the test n times to see whether it gets slower each time or whether more memory is required each time. (Run 'top' in a shell or open the Windows Task Manager and look at the resources used by the java/javaw process.)
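You can also watch the heap from inside the JVM. A minimal sketch, where runBenchmarkOnce() is just a placeholder for whatever the tool executes:

public class RepeatRun {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        for (int i = 0; i < 10; i++) {
            runBenchmarkOnce();   // placeholder for the actual test body
            long usedBytes = rt.totalMemory() - rt.freeMemory();
            System.out.println("run " + i + ": used heap = " + (usedBytes / 1024) + " kB");
        }
    }

    static void runBenchmarkOnce() {
        // the code under test would go here
    }
}

If the used-heap figure keeps climbing from run to run, something is holding on to objects.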

It could also be a memory leak due to 'invisible' references in the Java code, which can be much more difficult to trace. For example, if you have a thread whose run method takes a long time to execute, references assigned inside, say, a try block should be cleared (set to null) after use, otherwise the objects only become eligible for GC when the run method finally returns. See the sketch below.
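A minimal sketch of that situation (the class name, buffer size and helper methods are made up):

public class LongRunningWorker implements Runnable {
    public void run() {
        byte[] buffer = null;
        try {
            buffer = new byte[10 * 1024 * 1024];   // large temporary object
            process(buffer);
        } finally {
            buffer = null;   // without this, the array can stay reachable until run() returns
        }
        doLotsOfOtherWork();   // keeps running long after the buffer is no longer needed
    }

    private void process(byte[] data) { /* ... */ }
    private void doLotsOfOtherWork() { /* ... */ }
}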

You could also get a trial version of YourKit to check whether there are instances of any objects hanging around that could be filling up memory...

On a similar note, I am currently looking at the performance of the calculate method of the Tanimoto class. I am experimenting to see if it is quicker to calculate the index using the current method compared with an alternative (but very similar) method.

The current method: divide the cardinality of and(..) by the union count, which is obtained by subtraction (|A| + |B| - |A and B|).
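For comparison, a sketch of what the subtraction-based calculation looks like (calculateBySubtraction is an illustrative name, not the exact CDK code):

static float calculateBySubtraction(BitSet fingerprint1, BitSet fingerprint2) {
    BitSet andSet = (BitSet) fingerprint1.clone();
    andSet.and(fingerprint2);                            // intersection
    float commonBits = andSet.cardinality();
    float unionBits = fingerprint1.cardinality()
            + fingerprint2.cardinality() - commonBits;   // |A| + |B| - |A and B|
    return commonBits / unionBits;
}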

The new method: divide the cardinality of and(..) by the union count, obtained directly from the cardinality of or(..), e.g.
static float calculate(BitSet fingerprint1, BitSet fingerprint2) {
    // intersection: work on a clone so fingerprint1 is still intact for the union step
    BitSet andSet = (BitSet) fingerprint1.clone();
    andSet.and(fingerprint2);
    // union: computed in place, so note that this call modifies fingerprint1
    fingerprint1.or(fingerprint2);
    BitSet orSet = fingerprint1;
    float tanimoto = (float) andSet.cardinality() / (float) orSet.cardinality();
    return tanimoto;
}
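A quick way to exercise it (the bit positions are made up; note that fingerprint1 is modified in place by the or step):

BitSet fp1 = new BitSet(160);
BitSet fp2 = new BitSet(160);
fp1.set(3);  fp1.set(17); fp1.set(42);
fp2.set(17); fp2.set(42); fp2.set(99);
// 2 bits in common, 4 bits in the union -> expected index 0.5
float index = calculate(fp1, fp2);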



It appears that with a PubChem-type fingerprint (approx. 800 bits) it is slightly quicker to use my method, but with 160-bit fingerprints there is very little difference. However, in this case the test executed first is always a bit slower, probably because the JIT compiler only compiles the frequently used code paths (hotspots) after they have been executed a number of times, so the first test pays the warm-up cost.
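The usual workaround is to run a warm-up phase before timing anything. A minimal sketch (the method name and iteration counts are arbitrary):

static long averageNanosPerCall(Runnable testBody) {
    for (int i = 0; i < 10000; i++) {   // warm-up iterations, results discarded
        testBody.run();
    }
    long start = System.nanoTime();
    int measured = 1000;
    for (int i = 0; i < measured; i++) {
        testBody.run();
    }
    return (System.nanoTime() - start) / measured;
}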

This is just to show that if you run performance tests one after the other in the same JVM, there is always the possibility that they are not run under the same conditions, so the comparison is not likely to be 'fair'.

Regards,

Paul

========================================
Message Received: Feb 11 2009, 08:38 AM
From: "Miguel Rojas Cherto"
To: cdk-devel@lists.sourceforge.net
Cc:
Subject: [Cdk-devel] Using cheminfbenchmark functionality


Hi all,

I was experimenting with the new benchmark toolkit for chemoinformatics,
http://github.com/egonw/cheminfbenchmark/tree/master. One of my surprises
came when I compared a module using two identical cdk.jar's. I would
guess that the results should be practically identical, but in reality the
last test run always takes longer.

Is that normal? Should we not expect identical results? Am I doing something
wrong?

Best regards,
Miquel

