
5. Application Examples

cherhaus

1. Load the file pubchem-test-featmorgan3.txt into a persistent memory object with key 0:

perl fp2mem.pl -create 0 -file data/pubchem-test-featmorgan3.txt

Reading Reference fingerprints from base64...

  File data/pubchem-test-featmorgan3.txt:
    Number of columns: 3, has bitcounts
    ID: CID
    ADDITIONAL DATA: -no-
    FP type: FEATMORGAN_3
    FP length: 1024
    Fingerprints read from file: 10

Reference: 10 fingerprints read in total

Required memory size : 1'450 bytes
Maximum segment size : 18'446'744'073'709'551'615 bytes
Applied segment size : 65'536 bytes

Created key 0 from file data/pubchem-test-featmorgan3.txt:

KEY             : 0
RECORDS         : 10
FILE            : <your_parasim_path>/data/pubchem-test-featmorgan3.txt
ID FIELD        : CID
FP TYPE         : FEATMORGAN_3
FP LENGTH       : 1024
ADDITIONAL DATA : -no-
DATE            : Fri Aug 30 15:02:33 2013
CREATOR         : m02853
PERMISSIONS     : 660
SEGMENT COUNT   : 1
SEGMENT SIZE    : 65'536
BYTES NET       : 1'450
BYTES GROSS     : 65'536
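
The size figures in the report above are consistent with simple segment arithmetic. The following is a minimal Python sketch, not the tool's actual code. It assumes that the segment count is the required size divided by the applied segment size, rounded up, and that the gross size is the segment count times the segment size. `parse_size` and `segment_layout` are hypothetical helper names.

```python
import math

def parse_size(text):
    """Convert a size printed with apostrophe separators, e.g. "65'536", to an int."""
    return int(text.replace("'", ""))

def segment_layout(required, segment_size):
    """Return (segment_count, gross_bytes), assuming sizes are rounded up to whole segments."""
    count = max(1, math.ceil(required / segment_size))
    return count, count * segment_size

net = parse_size("1'450")      # BYTES NET from the report
seg = parse_size("65'536")     # SEGMENT SIZE from the report
count, gross = segment_layout(net, seg)
print(count, gross)
```

With the values from the report, this yields one segment and a gross size equal to the segment size, matching SEGMENT COUNT and BYTES GROSS.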


2. Search pubchem-test-featmorgan3.txt against itself (via memory object 0) and report the two nearest neighbours of each fingerprint, applying the Dice similarity coefficient:

perl parasim.pl -n 2 -c dice -q data/pubchem-test-featmorgan3.txt -r mem:0

QUERY   REFERENCE       DICE
71923   68664   0.285714285714286
71923   71923   1.000000000000000
68664   68664   1.000000000000000
68664   71542   0.348623853211009
68938   71917   0.387096774193548
68938   68938   1.000000000000000
71360   71360   1.000000000000000
71360   68938   0.347107438016529
71696   71696   1.000000000000000
71696   71917   0.380000000000000
71917   71917   1.000000000000000
71917   68938   0.387096774193548
71542   68664   0.348623853211009
71542   71542   1.000000000000000
71107   71107   1.000000000000000
71107   68938   0.321428571428571
71227   71227   1.000000000000000
71227   71360   0.291970802919708
71767   71767   1.000000000000000
71767   71360   0.333333333333333
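
For reference, the Dice coefficient of two fingerprints A and B is 2|A∩B| / (|A| + |B|), which is why every fingerprint matches itself with a score of 1. The sketch below illustrates the idea in Python on fingerprints represented as sets of on-bit positions. It is an illustration only, not how parasim.pl is implemented; the function names and the toy fingerprints are made up.

```python
def dice(a, b):
    """Dice coefficient of two fingerprints given as sets of on-bit positions."""
    if not a and not b:
        return 0.0  # assumption: two empty fingerprints score 0
    return 2 * len(a & b) / (len(a) + len(b))

def nearest(query, refs, n=2):
    """Return the n (id, score) pairs with the highest Dice score."""
    scored = [(ref_id, dice(query, bits)) for ref_id, bits in refs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]

# Toy fingerprints (invented for illustration, not taken from the test file).
refs = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {9}}
print(nearest(refs["A"], refs))
```

Searching "A" against a set that contains "A" returns "A" itself first, just as each query above lists itself with 1.000000000000000.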


3. Same query, but reading the structures directly from the SDF file and including only Dice similarities between 0.35 and 0.99:

perl simsearch.pl -n 2 -c dice -min 0.35 -max 0.99 -q data/pubchem-test.sdf -id CID -fp featmorgan_3 -r mem:0

QUERY   REFERENCE       DICE
71696   68938   0.365217391304348
71696   71917   0.380000000000000
71917   68938   0.387096774193548
71917   71696   0.380000000000000
68938   71696   0.365217391304348
68938   71917   0.387096774193548
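
Compared with example 2, the self matches (score 1.0) have dropped out and a lower-ranked hit has moved up, which suggests that the similarity range is applied before the n best hits are selected. A minimal Python sketch of that order of operations follows. Inclusive bounds and descending sort order are assumptions, and `select_hits` is a hypothetical name.

```python
def select_hits(hits, n, lo, hi):
    """Keep hits with lo <= score <= hi, then return the n best of them."""
    in_range = [(ref, score) for ref, score in hits if lo <= score <= hi]
    return sorted(in_range, key=lambda pair: pair[1], reverse=True)[:n]

# Scores for query 71696 as they appear in examples 2 and 3.
hits_71696 = [
    ("71696", 1.000000000000000),
    ("71917", 0.380000000000000),
    ("68938", 0.365217391304348),
]
print(select_hits(hits_71696, n=2, lo=0.35, hi=0.99))
```

For query 71696 this keeps references 71917 and 68938, the same pair reported above.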


4. Search a SMILES string directly against pubchem-test-featmorgan3.txt, which is stored in memory:

perl simsearch.pl -fp featmorgan_3 -r mem:0 -q 'o1c2c(cccc2)cc1C(=O)N3CCNCC3'

QUERY   REFERENCE       TANIMOTO        AVG_TANIMOTO
1       68664   0.542372881355932       0.174922486279414

In this case, the structure ID (1) was generated at runtime.
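
When no coefficient is given, the output columns are TANIMOTO and AVG_TANIMOTO. The Tanimoto coefficient is |A∩B| / |A∪B|. The sketch below shows it in Python, together with one plausible reading of AVG_TANIMOTO as the mean score of the query over all reference fingerprints. That reading is an assumption, not something the output confirms, and the names and toy data are invented.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def best_and_average(query, refs):
    """Return (best_id, best_score, mean_score) for one query against all references."""
    scores = {ref_id: tanimoto(query, bits) for ref_id, bits in refs.items()}
    best_id = max(scores, key=scores.get)
    # Assumption: the average is taken over every reference fingerprint.
    return best_id, scores[best_id], sum(scores.values()) / len(scores)

# Toy data (invented for illustration).
refs = {"r1": {1, 2, 3}, "r2": {2, 3, 4, 5}, "r3": {7, 8}}
query = {1, 2, 3, 4}
print(best_and_average(query, refs))
```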

5. Search pubchem-test-fcfp6.txt versus zinc-test-fcfp6-smiles.txt with direct SMILES output:

perl parasim.pl -q data/pubchem-test-fcfp6.txt -r data/zinc-test-fcfp6-smiles.txt

QUERY   REFERENCE       TANIMOTO        AVG_TANIMOTO    SMILES
71923   2       0.154761904761905       0.117569042869504       CC1c2c(CCC1C [...]
71542   10      0.185185185185185       0.107759423159295       CN(C(Cc1cccc [...]
68664   10      0.198019801980198       0.104496307506587       CN(C(Cc1cccc [...]
68938   3       0.160377358490566       0.122158970101436       COCCOCC(C(Oc [...]
71227   3       0.181818181818182       0.129684949182247       COCCOCC(C(Oc [...]
71767   7       0.174418604651163       0.122120643622887       C1=C(\C=C\Br [...]
71360   5       0.133333333333333       0.103979492391050       CC12OC(n3c(c [...]
71696   3       0.163636363636364       0.118017086925888       COCCOCC(C(Oc [...]
71917   3       0.147368421052632       0.102165139370256       COCCOCC(C(Oc [...]
71107   3       0.173076923076923       0.128406853662191       COCCOCC(C(Oc [...]


6. Destroy the memory object with key 0:

perl fp2mem.pl -destroy 0

WARNING: Key 0 is already present! The next action will destroy all existing data! Continue (y/n): y

Killed memory object with key 0 and all attached data.


7. Generate histogram data for the occurrence of nearest-neighbour similarity values between pubchem-test-fcfp6.txt and zinc-test-fcfp6.txt, rounded to two decimal places (the single 0.00 entry is the header line, which awk formats as zero):

perl parasim.pl -q data/pubchem-test-fcfp6.txt -r data/zinc-test-fcfp6.txt | awk '{printf("%.2f\n",$3)}' | sort -n | uniq -c
      1 0.00
      1 0.13
      2 0.15
      2 0.16
      2 0.17
      1 0.18
      1 0.19
      1 0.20
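
The same binning can be sketched in Python, which makes it easy to skip the header line instead of counting it as 0.00. This is an illustrative equivalent of the awk pipeline, not part of the tool. `histogram` is a hypothetical name, and the sample rows are the first three result rows of example 5 with the SMILES column omitted.

```python
from collections import Counter

def histogram(lines, column=2, digits=2):
    """Count values in a whitespace-separated column after rounding to `digits` places."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        try:
            value = float(fields[column])
        except (IndexError, ValueError):
            continue  # skips the header and blank or malformed lines
        counts[f"{value:.{digits}f}"] += 1
    # Fixed-width formatting makes string order equal numeric order for values in [0, 1].
    return sorted(counts.items())

sample = """QUERY   REFERENCE       TANIMOTO        AVG_TANIMOTO
71923   2       0.154761904761905       0.117569042869504
71542   10      0.185185185185185       0.107759423159295
68664   10      0.198019801980198       0.104496307506587
""".splitlines()

for bucket, count in histogram(sample):
    print(f"{count:7d} {bucket}")
```

In practice the lines would come from the command's standard output, for example by piping it into a script that passes `sys.stdin` to `histogram`.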



