1. Load the file pubchem-test-featmorgan3.txt into a persistent memory object with key 0:
    perl fp2mem.pl -create 0 -file data/pubchem-test-featmorgan3.txt
    Reading Reference fingerprints from base64...
    File data/pubchem-test-featmorgan3.txt:
      Number of columns: 3, has bitcounts
      ID: CID
      ADDITIONAL DATA: -no-
      FP type: FEATMORGAN_3
      FP length: 1024
      Fingerprints read from file: 10
    Reference: 10 fingerprints read in total
    Required memory size : 1'450 bytes
    Maximum segment size : 18'446'744'073'709'551'615 bytes
    Applied segment size : 65'536 bytes
    Created key 0 from file data/pubchem-test-featmorgan3.txt:
    KEY             : 0
    RECORDS         : 10
    FILE            : <your_parasim_path>/data/pubchem-test-featmorgan3.txt
    ID FIELD        : CID
    FP TYPE         : FEATMORGAN_3
    FP LENGTH       : 1024
    ADDITIONAL DATA : -no-
    DATE            : Fri Aug 30 15:02:33 2013
    CREATOR         : m02853
    PERMISSIONS     : 660
    SEGMENT COUNT   : 1
    SEGMENT SIZE    : 65'536
    BYTES NET       : 1'450
    BYTES GROSS     : 65'536
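2. For each fingerprint in pubchem-test-featmorgan3.txt, find its two nearest neighbours within the same data set (loaded as memory object 0), using the Dice similarity coefficient. Since query and reference sets are identical, each compound's best hit is itself, with a similarity of 1: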
    perl parasim.pl -n 2 -c dice -q data/pubchem-test-featmorgan3.txt -r mem:0
    QUERY  REFERENCE  DICE
    71923  68664      0.285714285714286
    71923  71923      1.000000000000000
    68664  68664      1.000000000000000
    68664  71542      0.348623853211009
    68938  71917      0.387096774193548
    68938  68938      1.000000000000000
    71360  71360      1.000000000000000
    71360  68938      0.347107438016529
    71696  71696      1.000000000000000
    71696  71917      0.380000000000000
    71917  71917      1.000000000000000
    71917  68938      0.387096774193548
    71542  68664      0.348623853211009
    71542  71542      1.000000000000000
    71107  71107      1.000000000000000
    71107  68938      0.321428571428571
    71227  71227      1.000000000000000
    71227  71360      0.291970802919708
    71767  71767      1.000000000000000
    71767  71360      0.333333333333333
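3. Same query, but reading the structures directly from the SDF file (fingerprints are computed on the fly, with CID as the ID field) and reporting only Dice similarities between 0.35 and 0.99. The upper limit excludes the self-matches: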
    perl simsearch.pl -n 2 -c dice -min 0.35 -max 0.99 -q data/pubchem-test.sdf -id CID -fp featmorgan_3 -r mem:0
    QUERY  REFERENCE  DICE
    71696  68938      0.365217391304348
    71696  71917      0.380000000000000
    71917  68938      0.387096774193548
    71917  71696      0.380000000000000
    68938  71696      0.365217391304348
    68938  71917      0.387096774193548
4. Search a SMILES string directly against the fingerprints of pubchem-test-featmorgan3.txt stored in memory object 0:
    perl simsearch.pl -fp featmorgan_3 -r mem:0 -q 'o1c2c(cccc2)cc1C(=O)N3CCNCC3'
    QUERY  REFERENCE  TANIMOTO           AVG_TANIMOTO
    1      68664      0.542372881355932  0.174922486279414
As the SMILES string carries no identifier, the query ID (1) was generated at runtime.
5. Search pubchem-test-fcfp6.txt against zinc-test-fcfp6-smiles.txt, printing the SMILES of each reference hit directly in the output (shortened with [...] here):
    perl parasim.pl -q data/pubchem-test-fcfp6.txt -r data/zinc-test-fcfp6-smiles.txt
    QUERY  REFERENCE  TANIMOTO           AVG_TANIMOTO       SMILES
    71923  2          0.154761904761905  0.117569042869504  CC1c2c(CCC1C [...]
    71542  10         0.185185185185185  0.107759423159295  CN(C(Cc1cccc [...]
    68664  10         0.198019801980198  0.104496307506587  CN(C(Cc1cccc [...]
    68938  3          0.160377358490566  0.122158970101436  COCCOCC(C(Oc [...]
    71227  3          0.181818181818182  0.129684949182247  COCCOCC(C(Oc [...]
    71767  7          0.174418604651163  0.122120643622887  C1=C(\C=C\Br [...]
    71360  5          0.133333333333333  0.103979492391050  CC12OC(n3c(c [...]
    71696  3          0.163636363636364  0.118017086925888  COCCOCC(C(Oc [...]
    71917  3          0.147368421052632  0.102165139370256  COCCOCC(C(Oc [...]
    71107  3          0.173076923076923  0.128406853662191  COCCOCC(C(Oc [...]
6. Destroy the memory object with key 0:
    perl fp2mem.pl -destroy 0
    WARNING: Key 0 is already present!
    The next action will destroy all existing data!
    Continue (y/n): y
    Killed memory object with key 0 and all attached data.
7. Generate histogram data for the nearest-neighbour Tanimoto similarities between pubchem-test-fcfp6.txt and zinc-test-fcfp6.txt, rounded to two decimal places:
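    perl parasim.pl -q data/pubchem-test-fcfp6.txt -r data/zinc-test-fcfp6.txt | awk '{printf("%.2f\n",$3)}' | sort -n | uniq -c
          1 0.00
          1 0.13
          2 0.15
          2 0.16
          2 0.17
          1 0.18
          1 0.19
          1 0.20
The single 0.00 entry is produced by the header line: awk evaluates the non-numeric TANIMOTO field as 0. To skip the header, restrict the awk program to the data lines with awk 'NR > 1 {printf("%.2f\n",$3)}'.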
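Steps 2 and 3 score hits with the Dice coefficient (-c dice), while the other searches report Tanimoto. The following minimal Python sketch shows both coefficients on binary fingerprints, modelled here as plain ints used as bit sets; this representation is an illustrative assumption, not parasim's internal format. For a fixed query, Dice is a monotone function of Tanimoto, D = 2T/(1 + T), so switching between them changes the scores but not the ranking of neighbours.

```python
"""Dice and Tanimoto similarity on binary fingerprints (illustrative sketch).

Fingerprints are modelled as Python ints used as bit sets; this is an
assumption for illustration, not the storage format used by parasim.
"""


def popcount(x: int) -> int:
    """Number of set bits in a non-negative int."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto: c / (|A| + |B| - c), where c = bits set in both."""
    c = popcount(a & b)
    union = popcount(a) + popcount(b) - c
    return c / union if union else 0.0  # two empty fingerprints -> 0.0 here


def dice(a: int, b: int) -> float:
    """Dice: 2c / (|A| + |B|)."""
    total = popcount(a) + popcount(b)
    return 2 * popcount(a & b) / total if total else 0.0


if __name__ == "__main__":
    # Two hypothetical 8-bit fingerprints (not taken from the test data)
    fp1 = 0b1011_0110  # 5 bits set
    fp2 = 0b1001_1100  # 4 bits set, 3 of them shared with fp1
    t, d = tanimoto(fp1, fp2), dice(fp1, fp2)
    print(f"Tanimoto = {t:.4f}, Dice = {d:.4f}")  # Tanimoto = 0.5000, Dice = 0.6667
    # Dice recovered from Tanimoto via D = 2T / (1 + T)
    print(f"2T/(1+T) = {2 * t / (1 + t):.4f}")  # 2T/(1+T) = 0.6667
```

Returning 0.0 for two empty fingerprints is a convention chosen for this sketch only.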
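Step 3 combines -n 2 with -min/-max. Its output is consistent with the similarity window being applied before the top n hits are picked: the self-matches (1.0) vanish and the next-best hit in range takes their place, while compounds whose best non-self hit in step 2 is below 0.35, such as 71360 (0.347), drop out entirely. The brute-force Python sketch below illustrates that filtering logic with toy fingerprints and a hypothetical top_n helper; it is an assumption inferred from the example output, not a description of how parasim actually searches.

```python
import heapq


def dice(a: int, b: int) -> float:
    """Dice coefficient 2c / (|A| + |B|) on int bit sets."""
    total = bin(a).count("1") + bin(b).count("1")
    return 2 * bin(a & b).count("1") / total if total else 0.0


def top_n(query, references, n=2, min_sim=0.0, max_sim=1.0):
    """Hypothetical helper: return up to n (ref_id, score) pairs, best first.

    Hits outside [min_sim, max_sim] are dropped before the top n are
    chosen (an assumption inferred from the example output).
    """
    hits = []
    for ref_id, fp in references.items():
        score = dice(query, fp)
        if min_sim <= score <= max_sim:
            hits.append((ref_id, score))
    return heapq.nlargest(n, hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Hypothetical 8-bit fingerprints; A also serves as the query
    refs = {
        "A": 0b1111_0000,
        "B": 0b1110_0000,
        "C": 0b1100_0011,
        "D": 0b0000_1111,
    }
    query = refs["A"]
    print(top_n(query, refs, n=2))  # A itself (1.0), then B
    print(top_n(query, refs, n=2, min_sim=0.35, max_sim=0.99))  # B, then C
```

Without a window, the query's own fingerprint is the best hit, just as in step 2; with the window, B (about 0.857) and C (0.5) are returned and D (0.0) is never a candidate.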
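The histogram pipeline of step 7 can also be sketched in Python. This version skips the header line explicitly and, by default, runs on the QUERY, REFERENCE, TANIMOTO and AVG_TANIMOTO columns copied from the step 5 output; passing - as the only argument reads parasim.pl output from standard input instead. The script name hist.py used below is only an example.

```python
import sys
from collections import Counter

# Columns copied from the step 5 output (SMILES column omitted)
SAMPLE = """QUERY REFERENCE TANIMOTO AVG_TANIMOTO
71923 2 0.154761904761905 0.117569042869504
71542 10 0.185185185185185 0.107759423159295
68664 10 0.198019801980198 0.104496307506587
68938 3 0.160377358490566 0.122158970101436
71227 3 0.181818181818182 0.129684949182247
71767 7 0.174418604651163 0.122120643622887
71360 5 0.133333333333333 0.103979492391050
71696 3 0.163636363636364 0.118017086925888
71917 3 0.147368421052632 0.102165139370256
71107 3 0.173076923076923 0.128406853662191
"""


def similarity_histogram(lines):
    """Count third-column similarities rounded to two decimal places.

    Mirrors the awk | sort -n | uniq -c pipeline, but ignores the header.
    """
    counts = Counter()
    for line_no, line in enumerate(lines):
        fields = line.split()
        if line_no == 0 or len(fields) < 3:  # skip header and blank lines
            continue
        counts[f"{float(fields[2]):.2f}"] += 1
    # Numeric ordering, like `sort -n`
    return sorted(counts.items(), key=lambda item: float(item[0]))


if __name__ == "__main__":
    # "-" reads parasim.pl output from stdin; otherwise use SAMPLE
    source = sys.stdin if sys.argv[1:] == ["-"] else SAMPLE.splitlines()
    for value, count in similarity_histogram(source):
        print(f"{count:7d} {value}")  # right-aligned count, similar to uniq -c
```

On the embedded values it reports the same counts as step 7 for 0.13 to 0.20, without the 0.00 entry. A live run could look like: perl parasim.pl -q data/pubchem-test-fcfp6.txt -r data/zinc-test-fcfp6.txt | python3 hist.py -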