1. Load file pubchem-test-featmorgan3.txt into persistent memory object with key 0:
perl fp2mem.pl -create 0 -file data/pubchem-test-featmorgan3.txt
Reading Reference fingerprints from base64...
File data/pubchem-test-featmorgan3.txt:
Number of columns: 3, has bitcounts
ID: CID
ADDITIONAL DATA: -no-
FP type: FEATMORGAN_3
FP length: 1024
Fingerprints read from file: 10
Reference: 10 fingerprints read in total
Required memory size : 1'450 bytes
Maximum segment size : 18'446'744'073'709'551'615 bytes
Applied segment size : 65'536 bytes
Created key 0 from file data/pubchem-test-featmorgan3.txt:
KEY : 0
RECORDS : 10
FILE : <your_parasim_path>/data/pubchem-test-featmorgan3.txt
ID FIELD : CID
FP TYPE : FEATMORGAN_3
FP LENGTH : 1024
ADDITIONAL DATA : -no-
DATE : Fri Aug 30 15:02:33 2013
CREATOR : m02853
PERMISSIONS : 660
SEGMENT COUNT : 1
SEGMENT SIZE : 65'536
BYTES NET : 1'450
BYTES GROSS : 65'536
2. For each structure in pubchem-test-featmorgan3.txt, find its two nearest neighbours within the same file (now held in memory object 0), using the Dice similarity coefficient:
perl parasim.pl -n 2 -c dice -q data/pubchem-test-featmorgan3.txt -r mem:0
QUERY REFERENCE DICE
71923 68664 0.285714285714286
71923 71923 1.000000000000000
68664 68664 1.000000000000000
68664 71542 0.348623853211009
68938 71917 0.387096774193548
68938 68938 1.000000000000000
71360 71360 1.000000000000000
71360 68938 0.347107438016529
71696 71696 1.000000000000000
71696 71917 0.380000000000000
71917 71917 1.000000000000000
71917 68938 0.387096774193548
71542 68664 0.348623853211009
71542 71542 1.000000000000000
71107 71107 1.000000000000000
71107 68938 0.321428571428571
71227 71227 1.000000000000000
71227 71360 0.291970802919708
71767 71767 1.000000000000000
71767 71360 0.333333333333333
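For orientation, the Dice coefficient of two bit fingerprints is commonly defined as 2c / (a + b), where a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both. The sketch below assumes that definition; the helper name dice and the bit counts 16, 15 and 6 are made up for illustration and are not taken from parasim or from the test data.

```shell
# dice A B C: Dice coefficient for two fingerprints with A and B bits set,
# of which C bits are set in both. Hypothetical helper, not part of parasim.
# Assumes at least one bit is set (A + B > 0).
dice() {
    awk -v a="$1" -v b="$2" -v c="$3" \
        'BEGIN { printf "%.15f\n", 2 * c / (a + b) }'
}

# Illustrative counts only: 16 and 15 bits set, 6 bits in common.
dice 16 15 6    # prints 0.387096774193548
```

A fingerprint compared with itself has a = b = c and therefore scores 1.0, which is why every query above lists itself as one of its two nearest neighbours.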
3. Same query, but read directly from the SDF file and reporting only Dice similarities between 0.35 and 0.99:
perl simsearch.pl -n 2 -c dice -min 0.35 -max 0.99 -q data/pubchem-test.sdf -id CID -fp featmorgan_3 -r mem:0
QUERY REFERENCE DICE
71696 68938 0.365217391304348
71696 71917 0.380000000000000
71917 68938 0.387096774193548
71917 71696 0.380000000000000
68938 71696 0.365217391304348
68938 71917 0.387096774193548
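Comparing this output with step 2 suggests that the -min/-max window is applied before the two nearest neighbours are selected: the self-hits (Dice 1.0) are excluded, so the next-best reference moves up. Filtering the step 2 output afterwards is therefore not equivalent. A minimal sketch of such a post-filter, using a hypothetical helper filter_window (not part of parasim) and three queries copied from the step 2 output:

```shell
# filter_window MIN MAX: keep the header line plus every row whose third
# column lies within [MIN, MAX]. Hypothetical helper, reads from stdin.
filter_window() {
    awk -v lo="$1" -v hi="$2" \
        'NR == 1 || ($3 + 0 >= lo + 0 && $3 + 0 <= hi + 0)'
}

# Rows for queries 68938, 71696 and 71917, copied from step 2.
filter_window 0.35 0.99 <<'EOF'
QUERY REFERENCE DICE
68938 71917 0.387096774193548
68938 68938 1.000000000000000
71696 71696 1.000000000000000
71696 71917 0.380000000000000
71917 71917 1.000000000000000
71917 68938 0.387096774193548
EOF
```

Only one hit per query survives the post-filter, whereas the search in step 3 reports two per query.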
4. Search a SMILES string directly against the fingerprints from pubchem-test-featmorgan3.txt held in memory object 0:
perl simsearch.pl -fp featmorgan_3 -r mem:0 -q 'o1c2c(cccc2)cc1C(=O)N3CCNCC3'
QUERY REFERENCE TANIMOTO AVG_TANIMOTO
1 68664 0.542372881355932 0.174922486279414
In this case, the query structure has no ID of its own, so the ID (1) was generated at runtime.
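This search reports Tanimoto values, while steps 2 and 3 used Dice. For bit fingerprints the two coefficients are related by Dice = 2T / (1 + T), so results can be put on a common scale. A small sketch under that standard relation; tanimoto_to_dice is a hypothetical helper and not a parasim option:

```shell
# tanimoto_to_dice T: convert a Tanimoto value to the equivalent Dice value
# using Dice = 2T / (1 + T). Hypothetical helper, not part of parasim.
tanimoto_to_dice() {
    awk -v t="$1" 'BEGIN { printf "%.15f\n", 2 * t / (1 + t) }'
}

# Convert the hit reported above for reference 68664.
tanimoto_to_dice 0.542372881355932
```

Because the conversion is monotonic, the ranking of the nearest neighbours is the same under both coefficients; only the reported values differ.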
5. Search pubchem-test-fcfp6.txt versus zinc-test-fcfp6-smiles.txt with direct SMILES output:
perl parasim.pl -q data/pubchem-test-fcfp6.txt -r data/zinc-test-fcfp6-smiles.txt
QUERY REFERENCE TANIMOTO AVG_TANIMOTO SMILES
71923 2 0.154761904761905 0.117569042869504 CC1c2c(CCC1C [...]
71542 10 0.185185185185185 0.107759423159295 CN(C(Cc1cccc [...]
68664 10 0.198019801980198 0.104496307506587 CN(C(Cc1cccc [...]
68938 3 0.160377358490566 0.122158970101436 COCCOCC(C(Oc [...]
71227 3 0.181818181818182 0.129684949182247 COCCOCC(C(Oc [...]
71767 7 0.174418604651163 0.122120643622887 C1=C(\C=C\Br [...]
71360 5 0.133333333333333 0.103979492391050 CC12OC(n3c(c [...]
71696 3 0.163636363636364 0.118017086925888 COCCOCC(C(Oc [...]
71917 3 0.147368421052632 0.102165139370256 COCCOCC(C(Oc [...]
71107 3 0.173076923076923 0.128406853662191 COCCOCC(C(Oc [...]
6. Destroy memory object with key 0:
perl fp2mem.pl -destroy 0
WARNING: Key 0 is already present! The next action will destroy all existing data! Continue (y/n): y
Killed memory object with key 0 and all attached data.
7. Generate histogram data for the occurrence of nearest-neighbour similarities between pubchem-test-fcfp6.txt and zinc-test-fcfp6.txt, rounded to two decimal places:
perl parasim.pl -q data/pubchem-test-fcfp6.txt -r data/zinc-test-fcfp6.txt | awk 'NR > 1 {printf("%.2f\n",$3)}' | sort -n | uniq -c
1 0.13
2 0.15
2 0.16
2 0.17
1 0.18
1 0.19
1 0.20
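The binning can be tried without running a search by feeding it the ten Tanimoto values from step 5. The header line has to be skipped, otherwise awk formats the word TANIMOTO as 0.00 and it shows up as an extra bin. The sketch below wraps the pipeline in a hypothetical helper, sim_histogram, which is not part of parasim:

```shell
# sim_histogram: read parasim-style output from stdin, skip the header line,
# round column 3 to two decimals and count how often each value occurs.
# Hypothetical helper, not part of parasim.
sim_histogram() {
    awk 'NR > 1 { printf("%.2f\n", $3) }' | sort -n | uniq -c
}

# First four columns of the step 5 output.
sim_histogram <<'EOF'
QUERY REFERENCE TANIMOTO AVG_TANIMOTO
71923 2 0.154761904761905 0.117569042869504
71542 10 0.185185185185185 0.107759423159295
68664 10 0.198019801980198 0.104496307506587
68938 3 0.160377358490566 0.122158970101436
71227 3 0.181818181818182 0.129684949182247
71767 7 0.174418604651163 0.122120643622887
71360 5 0.133333333333333 0.103979492391050
71696 3 0.163636363636364 0.118017086925888
71917 3 0.147368421052632 0.102165139370256
71107 3 0.173076923076923 0.128406853662191
EOF
```

The counts add up to ten, one per query structure.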