On Wed, May 14, 2008 at 4:27 PM, Stefan Kuhn <stefhk3@web.de> wrote:
> > I am very new to all these cheminformatics specialities, being a
> > bioinformatician myself, but the SDF file is a concatenation of mol
> > files, right? Once I have read a molecule from it with CDK, can I not
> > simply generate the corresponding mol file, check the size of that
> > text, and compare it with the total size of the SDF file to get a
> > very good approximation of how much I have read, without doing much
> > expensive I/O?
Well, you can do so (btw., no need to write the mol file: its size in lines
is about number of atoms + number of bonds + 4, apart from unusual
configurations). But then you have only one mol file, and it might be
atypically large or small. So you need to sample more than one, and this
comes down to what was suggested.
Just in case you didn't have the idea yourself: to count the number of
molecules in, e.g., 1000 lines, you do not need to read the molecules. It
should be enough to count the $$$$ or M  END lines, which should be fairly
fast.

I got some help from Egon and am now going with the "count the number of $$$$" approach.
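
For the archives, a minimal sketch of that idea in Python. This is not CDK
code and not necessarily what I will end up with; the function names are made
up for illustration, and it assumes every record in the file ends with a
delimiter line starting with $$$$:

```python
import io


def count_sdf_records(lines):
    """Count records by counting delimiter lines that start with '$$$$'.

    `lines` is any iterable of text lines, e.g. an open file handle.
    No molecule is parsed, so this is a single cheap pass over the text.
    """
    return sum(1 for line in lines if line.startswith("$$$$"))


def count_sdf_records_in_file(path):
    """Hypothetical convenience wrapper: count the records in the file at `path`."""
    # latin-1 maps every byte to a character, so odd bytes in data
    # fields cannot make the scan fail with a decoding error.
    with open(path, "r", encoding="latin-1") as handle:
        return count_sdf_records(handle)


# Tiny stand-in for an SD file with two records (the atom and bond
# blocks are left out, since only the delimiter lines matter here).
SAMPLE = (
    "molecule A\n  header\n\nM  END\n$$$$\n"
    "molecule B\n  header\n\nM  END\n$$$$\n"
)

record_count = count_sdf_records(io.StringIO(SAMPLE))
```

With the total known up front, progress is just the number of molecules read
so far divided by that count.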

--
// Jonathan