So the IteratingSDFReader isn't as slow as other parts of the CDK but it could certainly be improved. 

Here are a couple of general thoughts:

In terms of reading sections of a file: if it's uncompressed, it would be nice to have a utility that does something with memory mapping (http://javarevisited.blogspot.co.uk/2012/01/memorymapped-file-and-io-in-java.html).
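
A minimal sketch of what such a utility might look like for an uncompressed file. It memory-maps the file and records where each record starts, i.e. the byte after every `$$$$` line. This is not existing CDK code: the class `SdfRecordIndexer`, the method `recordOffsets` and the 256 MB window size are hypothetical choices. The file is mapped in windows because a single `MappedByteBuffer` cannot exceed 2 GB, and a million-structure file may be larger than that.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

/** Hypothetical helper (not part of CDK): byte offsets of SD file records. */
public final class SdfRecordIndexer {

    // Map the file in windows; one MappedByteBuffer is limited to Integer.MAX_VALUE bytes.
    private static final long WINDOW = 256L * 1024 * 1024;

    public static long[] recordOffsets(Path sdf) throws IOException {
        try (FileChannel ch = FileChannel.open(sdf, StandardOpenOption.READ)) {
            long size = ch.size();
            long[] starts = new long[1024];
            int n = 0;
            if (size > 0) starts[n++] = 0;     // the first record starts at byte 0
            int lineLen = 0;                   // bytes on the current line, excluding '\r'
            boolean allDollars = true;         // has the current line been only '$' so far?
            for (long base = 0; base < size; base += WINDOW) {
                MappedByteBuffer buf =
                        ch.map(FileChannel.MapMode.READ_ONLY, base, Math.min(WINDOW, size - base));
                for (int i = 0; i < buf.limit(); i++) {
                    byte b = buf.get(i);
                    if (b == '\n') {
                        long next = base + i + 1;  // offset of the byte after this newline
                        // a "$$$$" line ends a record; skip the phantom record at EOF
                        if (lineLen == 4 && allDollars && next < size) {
                            if (n == starts.length) starts = Arrays.copyOf(starts, n * 2);
                            starts[n++] = next;
                        }
                        lineLen = 0;
                        allDollars = true;
                    } else if (b != '\r') {        // ignore '\r' so CRLF files also work
                        lineLen++;
                        if (b != '$') allDollars = false;
                    }
                }
            }
            return Arrays.copyOf(starts, n);
        }
    }
}
```

With 0-based indexing, record `k` then begins at `recordOffsets(path)[k]`, and the scan itself only does a single pass over the mapped bytes.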

As for a faster basic reader, I've been hacking on and off at a reimplementation for the last year or so. I don't have time at the moment, though, and I really want to get other things finished so we can release 1.6. There may be a faster implementation in future versions, but as Joos says, this requires significant effort.

If you can guarantee the format is correct (i.e. the fields are padded correctly to their fixed columns), you can write a very fast parser for the atom block lines.
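
To illustrate the idea (a hypothetical sketch, not the code in the linked file): with trusted padding, a V2000 atom line can be sliced by fixed columns, and the coordinates can be parsed with integer arithmetic and a single division instead of tokenizing and calling `Double.parseDouble`. `FastAtomLineParser` and its methods are made-up names, and only the coordinates, element symbol and charge code are handled.

```java
/** Hypothetical sketch: fixed-column parsing of an MDL V2000 atom line. */
public final class FastAtomLineParser {

    /** Minimal holder for the fields this sketch extracts. */
    public static final class Atom {
        public final String symbol;
        public final double x, y, z;
        public final int charge;

        Atom(String symbol, double x, double y, double z, int charge) {
            this.symbol = symbol;
            this.x = x;
            this.y = y;
            this.z = z;
            this.charge = charge;
        }
    }

    private static final double[] POW10 = {1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9};

    /** Parses a right-aligned decimal such as "   -1.2990" in line[from, to). */
    static double parseDecimal(String line, int from, int to) {
        int i = from;
        while (i < to && line.charAt(i) == ' ') i++;    // skip left padding
        boolean negative = i < to && line.charAt(i) == '-';
        if (negative) i++;
        long digits = 0;          // all digits, ignoring the decimal point
        int fractionDigits = 0;   // how many of them came after the point
        boolean afterPoint = false;
        for (; i < to; i++) {
            char c = line.charAt(i);
            if (c == '.') {
                afterPoint = true;
            } else {
                digits = digits * 10 + (c - '0');       // no validation: the format is trusted
                if (afterPoint) fractionDigits++;
            }
        }
        double value = digits / POW10[fractionDigits];  // one correctly rounded division
        return negative ? -value : value;
    }

    /** Parses a right-aligned non-negative integer such as "  3" in line[from, to). */
    static int parseUnsigned(String line, int from, int to) {
        int value = 0;
        for (int i = from; i < to; i++) {
            char c = line.charAt(i);
            if (c != ' ') value = value * 10 + (c - '0');
        }
        return value;
    }

    /** 0-based, end-exclusive columns: x [0,10), y [10,20), z [20,30), symbol [31,34), charge [36,39). */
    public static Atom parseAtomLine(String line) {
        double x = parseDecimal(line, 0, 10);
        double y = parseDecimal(line, 10, 20);
        double z = parseDecimal(line, 20, 30);
        String symbol = line.substring(31, 34).trim();
        int code = parseUnsigned(line, 36, 39);
        // V2000 charge codes: 1=+3, 2=+2, 3=+1, 4=doublet radical (no charge), 5=-1, 6=-2, 7=-3
        int charge = (code == 0 || code == 4) ? 0 : 4 - code;
        return new Atom(symbol, x, y, z, charge);
    }
}
```

Nothing is validated, so a malformed line silently produces garbage; that is the trade-off the format guarantee buys you.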

https://github.com/johnmay/cdk/blob/82507e981b8acb5ac3e7b94b829eb7f242b38d48/src/main/org/openscience/cdk/io/MDLV2000AtomBlock.java

I think, as it stands, this will read most of what the current reader does. Other parts of the parser need adapting, but speeding up the atom block parsing is certainly a start.

J

On 18 Sep 2013, at 19:34, Joos Kiener <joos@sunrise.ch> wrote:

Hi Lochana,

I think you will surely need multi-threading to keep the UI responsive, and also some form of cache: if you display 10 compounds in the "viewport" of the table, keep a lot more in memory, maybe 100 above and below, and adjust the cache as the user scrolls. That way the scrolling will be fluid and fast.
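
A sketch of such a cache, assuming the caller supplies a function that loads record i (the class `WindowCache` and its parameters are hypothetical). It reloads only when a requested row falls outside the cached window, and it reuses the records that overlap with the old window. It is not thread-safe, and as written it loads synchronously inside `get`, so in a real UI the reload would have to happen on a worker thread.

```java
import java.util.function.IntFunction;

/** Hypothetical viewport cache: a window of records around the requested row. */
public final class WindowCache<T> {
    private final IntFunction<T> loader;  // loads record i from the file (caller-supplied)
    private final int total;              // total number of records
    private final int margin;             // records kept above and below the requested row
    private Object[] window = new Object[0];
    private int lo = 0;                   // record index stored in window[0]
    private int loads = 0;                // number of loader calls so far

    public WindowCache(IntFunction<T> loader, int total, int margin) {
        this.loader = loader;
        this.total = total;
        this.margin = margin;
    }

    @SuppressWarnings("unchecked")
    public T get(int index) {
        if (index < 0 || index >= total) throw new IndexOutOfBoundsException("record " + index);
        // only move the window when the row is not already cached
        if (index < lo || index >= lo + window.length) recenter(index);
        return (T) window[index - lo];
    }

    private void recenter(int index) {
        int newLo = Math.max(0, index - margin);
        int newHi = Math.min(total, index + margin + 1);
        Object[] next = new Object[newHi - newLo];
        for (int i = newLo; i < newHi; i++) {
            boolean cached = i >= lo && i < lo + window.length;
            if (cached) {
                next[i - newLo] = window[i - lo];     // reuse what we already have
            } else {
                next[i - newLo] = loader.apply(i);    // load only the new records
                loads++;
            }
        }
        window = next;
        lo = newLo;
    }

    public int loads() {
        return loads;
    }
}
```

For example, `new WindowCache<>(i -> readRecord(i), 1_000_000, 100)` (with `readRecord` standing in for whatever actually reads a record) keeps up to 201 records around the visible rows. A larger margin trades memory for fewer reloads.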

Use this with the RandomAccessSDFReader. I don't know its performance, and I guess the indexing phase can take quite long, so that should run in the background. Maybe you can extend it to use the cache mentioned above, because it still uses file access, and that is always very, very slow compared to memory.
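
One way to run such a slow step off the UI thread, sketched with `CompletableFuture`. The helper `indexInBackground` is hypothetical, and the indexer is whatever slow step you have (for instance building the reader's index). The work runs on a worker executor and the result is handed to a UI executor; in Swing that executor can be `SwingUtilities::invokeLater`, so the callback runs on the event dispatch thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper: slow work on a worker thread, result delivered on the UI thread. */
public final class BackgroundIndexing {
    public static <T> CompletableFuture<Void> indexInBackground(
            Supplier<T> indexer, Executor worker, Executor ui, Consumer<T> onReady) {
        return CompletableFuture
                .supplyAsync(indexer, worker)      // e.g. scan the file for record offsets
                .thenAcceptAsync(onReady, ui);     // e.g. update the table model
    }
}
```

In a Swing app the call might look like `indexInBackground(() -> buildIndex(file), pool, SwingUtilities::invokeLater, model::setOffsets)`, where `buildIndex`, `pool` and `model` are placeholders. The table can show a "loading" state until the callback fires.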

Or create your own reader that can quickly jump to the desired record but does not need a full indexing pass. For example, if you display records 100 to 110, have records 50 to 150 in the cache, and the user scrolls up one page, read records 40-49 from the file into the cache. Hint: record 40 (assuming a 0-based index) starts right after the 40th occurrence of $$$$. So maybe just indexing all $$$$ positions would suffice (no idea how fast that is for a large file). With this, I would probably cache something more like 1000 records and not adjust the cache on every single page change.
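
Once the `$$$$` positions are known, reading a page of records is just a seek and a read. A minimal sketch, assuming `starts[k]` holds the byte offset where record k begins (the class `SdfRangeReader` is hypothetical). It returns the raw text of records [from, to), which could then be handed to a molfile parser, for example CDK's MDLV2000Reader wrapped around a `StringReader`.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: raw text of records [from, to) given record start offsets. */
public final class SdfRangeReader {
    public static List<String> readRecords(String path, long[] starts, int from, int to)
            throws IOException {
        List<String> records = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            for (int r = from; r < to; r++) {
                long begin = starts[r];
                // a record ends where the next one starts, or at end of file
                long end = (r + 1 < starts.length) ? starts[r + 1] : raf.length();
                byte[] buf = new byte[(int) (end - begin)];
                raf.seek(begin);                   // jump straight to the record
                raf.readFully(buf);
                // ISO-8859-1 maps bytes 1:1, which is safe for ASCII SD files
                records.add(new String(buf, StandardCharsets.ISO_8859_1));
            }
        }
        return records;
    }
}
```

Only the requested records are read, so the cost of filling the cache depends on the page size rather than on the size of the file.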

Anyway, I don't think this can be made fast without significant effort.

Best Regards,

Joos

On 18.09.2013 19:46, lochana menikarachchi wrote:
Hi Everyone,

I need to quickly load 5-10 molecules into a JTable from a large SD file (say, 1 million structures). The table needs to be updated by loading only 5-10 structures at a time as the user scrolls down. I tried various SDF readers in CDK (the iterating reader, RandomAccessSDFReader), but they are extremely slow compared to MarvinView (written in Java), which can load 5-10 structures from extremely large files in a few seconds. I wonder how Marvin does this. Any suggestions for replicating this functionality with CDK?

Thanks.

Lochana

_______________________________________________
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user