On Wed, May 14, 2008 at 4:27 PM, Stefan Kuhn <email@example.com> wrote:
> > I am very new with all this chemoinformatic specialities being a
> > bioinformatician myself but the sdf file is a concatenation of mol
> > files right? When I have read a molecule from it with cdk can I not
> > simply generate the corresponding mol file and check the size of
> > that text and by a simple comparison of the total size of the sdf
> > file get a very good approximation of how much I have read without
> > doing very much expensive IO?

Well, you can do so (btw. no need to write the mol file, mol file size is
number of atoms + number of bonds + 4, apart from unusual configurations). But
you only have 1 file then, and this might be atypically large or small. So you
need to sample more than 1, and this comes down to what was suggested.
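
A rough sketch of that sampling idea in Python, in case it helps. It assumes the SDF records are terminated by a line containing only $$$$, and that the first records are reasonably representative of the rest. The function name and the sample_records parameter are made up for illustration; this is not CDK code.

```python
import io


def estimate_record_count(stream, total_bytes, sample_records=100):
    """Estimate how many records an SDF file holds from its first few records.

    stream yields lines as bytes; total_bytes is the size of the whole file.
    """
    sampled_bytes = 0  # bytes belonging to completed sample records
    pending = 0        # bytes of the record currently being scanned
    records = 0
    for line in stream:
        pending += len(line)
        if line.strip() == b"$$$$":  # assumed record terminator
            records += 1
            sampled_bytes += pending
            pending = 0
            if records >= sample_records:
                break
    if records == 0:
        return 0
    average_size = sampled_bytes / records
    return round(total_bytes / average_size)


# Tiny self-check with ten identical, hand-made records.
record = b"name\n  prog\n\n  0  0\nM  END\n$$$$\n"
data = record * 10
print(estimate_record_count(io.BytesIO(data), len(data), sample_records=3))
```

With a real file you would open it in binary mode and pass os.path.getsize(path) as total_bytes. The estimate is only as good as the sample, so a larger sample_records should help when molecule sizes vary a lot.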
Just in case you didn't have the idea yourself: to count the number of
molecules in, e.g., 1000 lines, you do not need to read the molecules. It
should be enough to count the $$$$ or M END lines, which should be fairly