Diversity assessments and comparisons of large compound databases require calculating similarities between millions of compounds in a reasonable time. ParaSim addresses this challenge by distributing the calculations across the computing cores available on a single machine. It is optimized for throughput when searching very large numbers of query structures against very large numbers of reference structures. As a special feature, ParaSim can store frequently queried datasets as persistent objects in memory and access them from there for short response times.
ParaSim calculates chemical similarities based on binary structural fingerprints. It does not compute fingerprints itself but relies on third-party software to do so. In principle, any type of structural fingerprint that can be stored in an array of bits (a bitset) can be used with ParaSim.
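To illustrate what a similarity calculation on bitsets looks like, here is a minimal Python sketch. It assumes the Tanimoto coefficient as the similarity measure, which the text above does not specify, and it stores each fingerprint as a Python integer. The function name `tanimoto` and the example bit patterns are made up for illustration and are not part of ParaSim.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two bitsets stored as Python integers.

    Assumption: Tanimoto is used only as a common example of a
    similarity coefficient for binary fingerprints.
    """
    common = bin(a & b).count("1")  # bits set in both fingerprints
    total = bin(a | b).count("1")   # bits set in at least one fingerprint
    return common / total if total else 0.0


# Two hypothetical 8-bit fingerprints
query = 0b10110010
reference = 0b10010011

similarity = tanimoto(query, reference)
print(similarity)  # 3 shared bits out of 5 set bits in total
```

Real fingerprints are much longer (often hundreds to thousands of bits), but the calculation stays the same: one bitwise AND, one bitwise OR and two bit counts per pair of compounds.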
See the Wiki (https://sourceforge.net/p/parasim/wiki/Documentation/) for detailed documentation.
FEEDBACK HIGHLY APPRECIATED!
Features
- Makes use of multiple computing cores
- Can store and access fingerprints persistently in memory
- Results reported: query molecule ID, reference molecule ID, similarity coefficient and, optionally, additional data such as SMILES, InChIKey or Chime strings
- Allows user-defined number of hits per query and similarity thresholds
- Allows different fingerprint bitset sizes
- Provides detailed progress information on request
- No installation required
- Reads fingerprints as Base64-encoded bitsets from .txt or .txt.gz files
- Demo applications for RDKit, PipelinePilot(TM) and KNIME included
- Scripts included to search directly from SDF or SMILES files
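
The following Python sketch shows how several of the features listed above could fit together: reading Base64-encoded bitsets from a .txt.gz file, applying a similarity threshold and limiting the number of hits per query. It is not ParaSim's implementation. The file layout (one record per line, molecule ID and Base64 string separated by a tab), the Tanimoto coefficient and all function names are assumptions made for this example; see the Wiki for the actual input format.

```python
import base64
import gzip
import heapq
import os
import tempfile


def decode_fingerprint(b64: str) -> int:
    """Decode a Base64-encoded bitset into a Python integer."""
    return int.from_bytes(base64.b64decode(b64), "big")


def read_fingerprints(path: str) -> dict:
    """Read 'ID<TAB>Base64' records (assumed layout) from .txt or .txt.gz."""
    opener = gzip.open if path.endswith(".gz") else open
    fingerprints = {}
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            mol_id, b64 = line.split("\t")
            fingerprints[mol_id] = decode_fingerprint(b64)
    return fingerprints


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient (assumed measure) of two integer bitsets."""
    total = bin(a | b).count("1")
    return bin(a & b).count("1") / total if total else 0.0


def search(query_fp: int, references: dict, threshold: float, max_hits: int):
    """Return up to max_hits (similarity, reference ID) pairs >= threshold."""
    scored = ((tanimoto(query_fp, fp), mol_id) for mol_id, fp in references.items())
    hits = [(score, mol_id) for score, mol_id in scored if score >= threshold]
    return heapq.nlargest(max_hits, hits)


# Write three hypothetical 8-bit reference fingerprints to a temporary file.
records = {"ref1": b"\xb2", "ref2": b"\x93", "ref3": b"\x0c"}
path = os.path.join(tempfile.mkdtemp(), "references.txt.gz")
with gzip.open(path, "wt") as handle:
    for mol_id, raw in records.items():
        handle.write(f"{mol_id}\t{base64.b64encode(raw).decode('ascii')}\n")

references = read_fingerprints(path)
hits = search(0xB2, references, threshold=0.5, max_hits=2)
for score, mol_id in hits:
    print(mol_id, round(score, 3))
```

Because each query is scored independently of the others, a search of this kind can be split across computing cores by giving each worker its own share of the queries.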
