AFAIK, you'd have to iterate over the file once to count the number of entries, and then iterate over it again to actually read them.
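Here is a minimal sketch of that cheap first pass in plain Java. It makes no CDK calls, and the class and method names are hypothetical. Each record in an SD file ends with a `$$$$` line, so counting those lines counts the entries without building any molecule objects. You still pay for reading the whole file once.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical helper (not part of CDK): counts SD file records without parsing them. */
public final class SdfEntryCounter {
    private SdfEntryCounter() {}

    /**
     * Counts records by counting "$$$$" delimiter lines. Only raw line text is
     * inspected, which is far cheaper than building molecules, but the whole
     * file is still read once. A final record missing its "$$$$" is not counted.
     */
    public static int countEntries(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            // Every SD record is terminated by a line starting with "$$$$".
            if (line.startsWith("$$$$")) {
                count++;
            }
        }
        return count;
    }
}
```

The count from this pass then becomes the maximum of the progress bar for the second pass, which does the real parsing.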
On May 9, 2008, at 8:00 AM, Jonathan Alvarsson wrote:
I am working on a structure database for Bioclipse, and when importing a very big SDF file (which takes time...), some form of status bar would be HIGHLY appreciated. Is there a quick way in CDK to just get the number of entries in an SDF file?
Aouch, I don't wanna do that.
For examples see
If not, is it possible to quickly get the number of lines in a file? If so, maybe we could find a way to keep track of the number of slurped rows while iterating over it.
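A sketch of the line-based idea, again in plain Java with hypothetical names. A quick pass counts lines without interpreting them. During the real read, the parser is fed through a `java.io.LineNumberReader`, whose `getLineNumber()` reports how many lines have passed through it so far. Assume the parser buffers ahead, as most readers do. In that case the line number runs slightly ahead of the molecule actually returned, which is fine for a status bar.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;

/** Hypothetical helper (not part of CDK): line-based progress for a text file. */
public final class LineProgress {
    private LineProgress() {}

    /** Quick first pass: counts lines without interpreting their content. */
    public static long countLines(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        long lines = 0;
        while (reader.readLine() != null) {
            lines++;
        }
        return lines;
    }

    /**
     * Fraction of lines that have passed through the reader, clamped to [0, 1].
     * Approximate if the consumer reads ahead into its own buffer.
     */
    public static double fraction(LineNumberReader reader, long totalLines) {
        if (totalLines <= 0) {
            return 1.0;
        }
        return Math.min(1.0, (double) reader.getLineNumber() / totalLines);
    }
}
```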
Since this is for visual feedback rather than accuracy, you could do a first pass over the file that reads at most 1000 molecules. Then evaluate the average size of a molecule entry (its line count, or directly its length in bytes) and divide the file size by it to get a (very) approximate count of the molecules in the whole file.
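Here is a rough sketch of that sampling estimate in plain Java. The class name and parameters are hypothetical. One assumption needs flagging: the file size is in bytes, so the sketch measures the average record size in characters rather than lines. It also treats one character as one byte and each line terminator as a single `\n`, which holds for plain ASCII SD files with Unix line endings.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical helper (not part of CDK): estimates record count from a sample. */
public final class SdfCountEstimator {
    private SdfCountEstimator() {}

    /**
     * Reads at most sampleSize records from the start of the file and returns
     * fileSizeBytes divided by the average record size. Assumes ASCII text and
     * one-character line terminators; returns 0 if no complete record is seen.
     */
    public static long estimate(Reader in, long fileSizeBytes, int sampleSize) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        long charsRead = 0;
        int records = 0;
        String line;
        // Stop right after the sampleSize-th "$$$$", so charsRead covers whole records.
        while (records < sampleSize && (line = reader.readLine()) != null) {
            charsRead += line.length() + 1; // +1 for the '\n' that readLine strips
            if (line.startsWith("$$$$")) {
                records++;
            }
        }
        if (records == 0) {
            return 0;
        }
        double avgRecordSize = (double) charsRead / records;
        return Math.round(fileSizeBytes / avgRecordSize);
    }
}
```

In use, `fileSizeBytes` would come from `java.io.File.length()`. Records near the start of a file are not necessarily typical, so the estimate can be well off. The "(very) approximate" caveat stands.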
That's not great either.
Being a bioinformatician myself, I am very new to all these chemoinformatics specialities, but an SDF file is a concatenation of mol files, right? When I have read a molecule from it with CDK, can I not simply generate the corresponding mol file, check the size of that text, and compare the running total with the size of the whole SDF file? That should give a very good approximation of how much I have read without doing much expensive IO.
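This sketch swaps in a variant of that idea. Regenerated mol text may not match the original record byte for byte, because SD data fields and whitespace can differ. So instead of regenerating anything, it counts the characters the parser actually consumes from the file and compares that with the file size. It needs no extra pass and no extra IO. `CountingReader` is a hypothetical class built on the standard `java.io.FilterReader`, not a CDK API.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical wrapper (not part of CDK): counts characters passing through it,
 * so progress can be reported as "consumed so far" versus total file size.
 * mark/reset is not accounted for.
 */
public class CountingReader extends FilterReader {
    private long count = 0;

    public CountingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c != -1) {
            count++;
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count += skipped;
        return skipped;
    }

    /** Characters consumed so far. */
    public long getCount() {
        return count;
    }

    /** Fraction of totalChars consumed, clamped to [0, 1]. */
    public double fraction(long totalChars) {
        if (totalChars <= 0) {
            return 1.0;
        }
        return Math.min(1.0, (double) count / totalChars);
    }
}
```

To use it, wrap the `FileReader` in a `CountingReader`, hand it to whichever iterating SDF reader you use, and after each molecule compare `getCount()` with `file.length()`. This treats characters as bytes, which holds for ASCII files. The reader buffers ahead, so the fraction runs slightly ahead of the molecule just returned, which is harmless for a status bar.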