From: Rajarshi G. <rg...@in...> - 2008-09-11 14:50:35

Hi,

while doing some performance testing on the following program:

    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.util.ArrayList;
    import java.util.List;

    import org.openscience.cdk.DefaultChemObjectBuilder;
    import org.openscience.cdk.interfaces.IAtom;
    import org.openscience.cdk.interfaces.IAtomContainer;
    import org.openscience.cdk.interfaces.IBond;
    import org.openscience.cdk.io.iterator.IteratingMDLReader;

    public class StorageClassSizes {
        public static void main(String[] args)
                throws FileNotFoundException, InterruptedException {
            String fileName = "/Users/rguha/src/java/cdk-qa/trunk/projects/zinc-structures/ZINC_subset3_3D_charged_wH_maxmin1000.sdf";

            // Pause before reading (presumably to leave time to attach the profiler).
            Thread.sleep(30000);

            for (int i = 0; i < 1; i++) {
                // Keep every atom and bond referenced so they stay on the heap.
                List<IAtom> atomSizes = new ArrayList<IAtom>();
                List<IBond> bondSizes = new ArrayList<IBond>();

                IteratingMDLReader reader = new IteratingMDLReader(
                        new FileReader(new File(fileName)),
                        DefaultChemObjectBuilder.getInstance());
                while (reader.hasNext()) {
                    IAtomContainer molecule = (IAtomContainer) reader.next();
                    for (IAtom atom : molecule.atoms()) {
                        atomSizes.add(atom);
                    }
                    for (IBond bond : molecule.bonds()) {
                        bondSizes.add(bond);
                    }
                }
                System.out.println("atomSizes.size() = " + atomSizes.size());
                System.out.println("bondSizes.size() = " + bondSizes.size());
            }
        }
    }

So basically, we're just reading in 1000 molecules and storing the
atoms and bonds.

What I was interested in was the memory usage for individual Atom and
Bond objects. The shallow sizes for them are as follows:

    Object      Num Objects   Size (bytes)   Avg Size (bytes)
    Atom        13,923        1,447,992      104
    Bond        14,335          688,080       48
    boolean[]   31,810          764,800       24.04

I've included the entry for boolean[] just out of interest. I assume
these correspond to the flags for each atom and bond (as well as
IAtomContainer), so that explains the large number of them. But as
expected the size is quite small.

What is a little worrisome is the size of Atom objects!

-------------------------------------------------------------------
Rajarshi Guha <rg...@in...>
GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84
-------------------------------------------------------------------
So the Zen master asked the hot-dog vendor, "Can you make me one with
everything?"
- TauZero on Slashdot

From: Egon W. <ego...@gm...> - 2008-09-11 16:10:16

Rajarshi,

thanks for posting this!

On Thu, Sep 11, 2008 at 4:50 PM, Rajarshi Guha <rg...@in...> wrote:
> What I was interested in was the memory usage for individual Atom and
> Bond objects. The shallow sizes for them are as follows:
>
> Object      Num Objects   Size (bytes)   Avg Size (bytes)
> Atom        13,923        1,447,992      104
> Bond        14,335          688,080       48
> boolean[]   31,810          764,800       24.04

I have had similar observations...

> I've included the entry for boolean[] just out of interest - I assume
> these correspond to the flags for each atom, bond (as well as
> IAtomContainer), so that explains the large number of them. But as
> expected the size is quite small.
>
> What is a little worrisome is the size of Atom objects!

Same here... this is why I would like to see IAtomType, IElement and
IIsotope as fields, instead of using a hierarchical structure. But I'm
sure there are other places where we can optimize, e.g. using enums
for the common CDKConstants, short instead of integer for the atomic
number (!), dropping the use of Symbol as primary identifier and
relying on the atomic number (and using IPseudoAtom for every symbol
for which there is no atomic number, which we expect people to do
anyway), and using float for coordinates instead of double.

Egon

-- 
----
http://chem-bla-ics.blogspot.com/

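To put rough numbers on the narrower-primitive suggestions, here is a minimal sketch that compares the raw field bytes of three coordinates plus an atomic number stored as double/int versus float/short. The class and method names are hypothetical, and the sketch assumes the values are held as primitive fields directly in the object. If they are boxed or live in a separate coordinate object, only a reference counts toward the shallow size, and the JVM's object alignment may swallow part of any saving.

```java
// Hypothetical sketch: raw field bytes only, ignoring the object header and alignment.
final class FieldWidthSketch {

    /** Bytes for three double coordinates plus an int atomic number. */
    static int wideFieldBytes() {
        return 3 * Double.BYTES + Integer.BYTES;
    }

    /** Bytes for the same data as three float coordinates plus a short atomic number. */
    static int narrowFieldBytes() {
        return 3 * Float.BYTES + Short.BYTES;
    }

    public static void main(String[] args) {
        int atoms = 13_923; // atom count reported in the profile earlier in the thread
        System.out.println("wide fields per atom   = " + wideFieldBytes() + " bytes");
        System.out.println("narrow fields per atom = " + narrowFieldBytes() + " bytes");
        System.out.println("upper bound on savings = "
                + (long) atoms * (wideFieldBytes() - narrowFieldBytes()) + " bytes");
    }
}
```

The printed saving is an upper bound under those assumptions; a profiler run on the real classes is the only way to confirm what actually changes.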
From: Rajarshi G. <rg...@in...> - 2008-09-11 16:21:23

On Sep 11, 2008, at 12:10 PM, Egon Willighagen wrote:
>> I've included the entry for boolean[] just out of interest - I assume
>> these correspond to the flags for each atom, bond (as well as
>> IAtomContainer), so that explains the large number of them. But as
>> expected the size is quite small.
>>
>> What is a little worrisome is the size of Atom objects!
>
> Same here... this is why I like to see IAtomType, IElement, IIsotope
> as fields, instead of using hierarchical structure.

Hmm, I'm not sure that this will make a difference. The sizes reported
were 'shallow' sizes, so they don't include the size of a
non-primitive (beyond the size of the ref to that object).

So, if I understand correctly, the size of the Atom objects does not
include the IAtomType object, just its reference.

From: Egon W. <ego...@gm...> - 2008-09-12 04:10:05

On Thu, Sep 11, 2008 at 6:21 PM, Rajarshi Guha <rg...@in...> wrote:
> On Sep 11, 2008, at 12:10 PM, Egon Willighagen wrote:
>>> What is a little worrisome is the size of Atom objects!
>>
>> Same here... this is why I like to see IAtomType, IElement, IIsotope
>> as fields, instead of using hierarchical structure.
>
> Hmm, I'm not sure that this will make a difference. The size reported were
> 'shallow' sizes - so they don't include the size of a non-primitive (beyond
> the size of the ref to that object).

OK, that indeed has changed... I still had in mind that IAtomType
added natives, etc., but they are now indeed all just pointers, null
by default...

So, the shallow size should just be determined by what's in
IChemObject, but then I do not get the difference with IBond...

> So, if I understand correctly, the size of the Atom objects is not including
> the a IAtomType object, but just its reference

Interesting suggestion... that would mean that internally Java
represents hierarchy (IAtom extends IAtomType) with references? I
would have guessed it creates a new IAtom object which copies all
IAtomType fields/methods...

Egon

From: Rajarshi G. <rg...@in...> - 2008-09-12 04:15:01

On Sep 12, 2008, at 12:10 AM, Egon Willighagen wrote:
> On Thu, Sep 11, 2008 at 6:21 PM, Rajarshi Guha <rg...@in...> wrote:
>> On Sep 11, 2008, at 12:10 PM, Egon Willighagen wrote:
>>>> What is a little worrisome is the size of Atom objects!
>>>
>>> Same here... this is why I like to see IAtomType, IElement, IIsotope
>>> as fields, instead of using hierarchical structure.
>>
>> Hmm, I'm not sure that this will make a difference. The size
>> reported were 'shallow' sizes - so they don't include the size of a
>> non-primitive (beyond the size of the ref to that object).
>
> So, the shallow size should just be determined by what's in
> IChemObject, but then I do not get the difference with IBond...

True

>> So, if I understand correctly, the size of the Atom objects is not
>> including the a IAtomType object, but just its reference
>
> Interesting suggestion... that would mean that internally Java
> represent hierarchy (IAtom extends IAtomType) with references? I would
> have guessed it create a new IAtom Object which copied all IAtomType
> fields/methods...

I must say I'm guessing :)

Java may actually have the whole object in the hierarchy; the profiler
probably chooses not to descend fully through the hierarchy (cf.
shallow copy).

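On the question being guessed at here: in Java, an instance of a subclass is a single heap object that carries the instance fields declared by every class in its superclass chain, and a shallow size counts all of those fields while counting only the reference slot for each non-primitive. The following minimal sketch lists such fields by reflection. The `ElementLike`, `AtomTypeLike` and `AtomLike` classes are hypothetical stand-ins with made-up fields, not the real CDK classes.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

final class InheritedFieldsSketch {

    // Hypothetical stand-ins for an inheritance chain; not the real CDK classes.
    static class ElementLike { Integer atomicNumber; String symbol; }
    static class AtomTypeLike extends ElementLike { String atomTypeName; Integer formalCharge; }
    static class AtomLike extends AtomTypeLike { Object point3d; Double charge; }

    /** Collects the instance fields declared by a class and by all of its superclasses. */
    static List<String> instanceFields(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                // Static and compiler-generated fields are not part of each instance.
                if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                    names.add(c.getSimpleName() + "." + f.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Every field printed here occupies a slot in each AtomLike instance.
        for (String name : instanceFields(AtomLike.class)) {
            System.out.println(name);
        }
    }
}
```

Pointing `instanceFields` at the real Atom and Bond implementation classes would be one way to see where the difference between the two shallow sizes comes from.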
From: Egon W. <ego...@gm...> - 2008-09-12 04:20:05

On Fri, Sep 12, 2008 at 6:14 AM, Rajarshi Guha <rg...@in...> wrote:
> On Sep 12, 2008, at 12:10 AM, Egon Willighagen wrote:
>> Interesting suggestion... that would mean that internally Java
>> represent hierarchy (IAtom extends IAtomType) with references? I would
>> have guessed it create a new IAtom Object which copied all IAtomType
>> fields/methods...
>
> I must say I'm guessing :)
>
> Java may actually have the whole object in the hierarchy, the profiler
> probably chooses not to descend fully through the hierarchy (cf shallow
> copy)

Questions like this are actually why I introduced the interfaces... we
can now easily make an implementation that does not do "Atom extends
AtomType" and see if that helps performance...

Egon

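As an illustration of what such an alternative implementation could look like, here is a minimal sketch in which the atom holds one reference to shared atom-type data and delegates to it, instead of inheriting the fields. The interfaces `AtomTypeView` and `AtomView` and both classes are hypothetical and heavily cut down; they are not the CDK interfaces, and the "C.sp3" type name is used only as sample data.

```java
final class CompositionSketch {

    /** Hypothetical, cut-down stand-in for an atom type interface. */
    interface AtomTypeView {
        String getAtomTypeName();
        Integer getFormalCharge();
    }

    /** Hypothetical atom interface that still exposes the atom type methods. */
    interface AtomView extends AtomTypeView {
        Double getCharge();
    }

    /** Immutable type data that many atoms can share. */
    static final class SharedAtomType implements AtomTypeView {
        private final String atomTypeName;
        private final Integer formalCharge;

        SharedAtomType(String atomTypeName, Integer formalCharge) {
            this.atomTypeName = atomTypeName;
            this.formalCharge = formalCharge;
        }

        @Override public String getAtomTypeName() { return atomTypeName; }
        @Override public Integer getFormalCharge() { return formalCharge; }
    }

    /** Atom that keeps one reference to its type instead of inheriting the type's fields. */
    static final class ComposedAtom implements AtomView {
        private final AtomTypeView type;
        private final Double charge;

        ComposedAtom(AtomTypeView type, Double charge) {
            this.type = type;
            this.charge = charge;
        }

        AtomTypeView getType() { return type; }

        // Type-related calls are delegated to the shared object.
        @Override public String getAtomTypeName() { return type.getAtomTypeName(); }
        @Override public Integer getFormalCharge() { return type.getFormalCharge(); }
        @Override public Double getCharge() { return charge; }
    }

    public static void main(String[] args) {
        AtomTypeView sp3Carbon = new SharedAtomType("C.sp3", 0);
        AtomView first = new ComposedAtom(sp3Carbon, -0.12);
        AtomView second = new ComposedAtom(sp3Carbon, 0.05);
        System.out.println(first.getAtomTypeName() + " " + first.getCharge());
        System.out.println(second.getAtomTypeName() + " " + second.getCharge());
    }
}
```

The sharing only works if the type object is immutable, since otherwise changing one atom's type would change it for every atom that points at the same instance. Whether this actually reduces memory or helps performance would still have to be measured with the profiler.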