[OpenBabel-scripting] Creating a submolecule

Hello!

I'm writing a script that takes a few connected atoms from one molecule 
and looks for the same connected atoms (a submolecule) in another 
molecule. The script first creates a one-atom submolecule. If this atom 
is part of the other molecule, then the script adds another atom to the 
submolecule. The added atom is connected to the first atom of the 
submolecule. Then the script tests whether this bigger submolecule is 
part of the other molecule. This continues as long as the submolecule is 
still part of the other molecule.
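
The growth loop described above can be sketched in plain Python as 
follows. This is only an illustration, not the actual script: the 
adjacency dictionary `neighbors`, the function name `grow_submolecule` 
and the callback `still_matches` are hypothetical names, and the 
callback stands in for the real substructure test. Stopping at the 
first failed test is also an assumption.

```python
from collections import deque


def grow_submolecule(neighbors, start, still_matches):
    """Grow a connected atom list from `start`, one atom at a time.

    neighbors     -- dict: atom index -> list of bonded atom indices
    still_matches -- hypothetical callback; returns True while the
                     candidate atom list is still found in the target
    """
    selected = [start]
    if not still_matches(selected):
        return []
    frontier = deque(neighbors[start])
    while frontier:
        atom = frontier.popleft()
        if atom in selected:
            continue
        candidate = selected + [atom]
        if not still_matches(candidate):
            break  # assumption: stop at the first failed test
        selected = candidate
        # queue the atoms bonded to the atom that was just added
        frontier.extend(n for n in neighbors[atom] if n not in selected)
    return selected


# Toy example: a chain 0-1-2-3 where at most three atoms "match".
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(grow_submolecule(chain, 0, lambda atoms: len(atoms) <= 3))
```

In the real script the callback would build the submolecule and run 
the substructure search; here it is a simple size limit so that the 
example runs without Open Babel.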

I have a problem when I create a "submolecule". The structure of this 
part of the script is:

-> First, I take the atoms and the bonds that connect the atoms.
-> Then I create the Open Babel molecule from these atoms and bonds. If 
necessary, I set the aromatic flags of the atoms and bonds.
-> Finally, I use the findall() function from pybel to find the 
submolecule in another molecule.
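
The first step (collecting the chosen atoms and the bonds between 
them) can be sketched in plain Python like this. It is a minimal 
sketch under assumptions: the bond tuples `(atom_a, atom_b, order, 
is_aromatic)` and the function name `induced_fragment` are hypothetical 
and are not Open Babel data structures. Building the Open Babel 
molecule from the result is not shown.

```python
def induced_fragment(bonds, selected):
    """Return (index_map, fragment_bonds) for the chosen atoms.

    bonds    -- list of (atom_a, atom_b, order, is_aromatic) tuples
                describing the parent molecule (hypothetical layout)
    selected -- parent atom indices, in the order they were picked
    """
    # renumber the chosen atoms 0, 1, 2, ... for the new molecule
    index_map = {old: new for new, old in enumerate(selected)}
    # keep only bonds whose two ends were both chosen
    fragment_bonds = [
        (index_map[a], index_map[b], order, aromatic)
        for a, b, order, aromatic in bonds
        if a in index_map and b in index_map
    ]
    return index_map, fragment_bonds


# Toy example: three neighbouring atoms cut out of a six-membered
# aromatic ring (atoms 0..5, every bond flagged aromatic).
ring = [(i, (i + 1) % 6, 1, True) for i in range(6)]
index_map, fragment_bonds = induced_fragment(ring, [2, 3, 4])
print(index_map)
print(fragment_bonds)
```

Note that the copied bonds keep their aromatic flag even though the 
fragment no longer contains a closed ring, which is the situation in 
which the problem below shows up.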

If I take a submolecule from a molecule and then try to find it in that 
same molecule, the script doesn’t find it. This problem occurs when 
there are aromatic rings. I tried three different modifications of the 
code that builds the submolecule from COP(=S)(OC)SCn1nnc2ccccc2c1=O:

FIRST:
I call BeginModify() before adding the atoms to the molecule and 
EndModify() after adding them.
-> The atoms are written correctly only once the aromatic ring is 
closed; before that they appear as capital instead of lower-case 
letters:

0. N
1. NC
2. NCC
3. NCCC
4. NCC(C)C
5. NCC(CC)C
6. NCC(CC)CC
7. NCc1ccccc1
8. [N](Cc1ccccc1)[NH]
9. n1cc2c(cccc2)nn1

Submolecules 0. to 8. are not written correctly, but the ninth is. The 
findall() function also finds the ninth submolecule in 
COP(=S)(OC)SCn1nnc2ccccc2c1=O.
So once the aromatic ring is closed, the carbon and nitrogen atoms are 
written correctly.

SECOND:
Without the BeginModify() and EndModify() functions.
-> That's not good, because the atoms are never written correctly:

1. N
2. NC
3. NCC
4. NCCC
5. NCC(C)C
6. NCC(CC)C
7. NCC(CC)CC
8. NCC1=CC=CC=C1
9. [N](CC1=CC=CC=C1)[NH]
10. N1=CC2=C(C=CC=C2)N=N1
11. [N]1(=CC2=C(C=CC=C2)N=N1)C

THIRD:
Every time I add an atom to the submolecule, I print out the 
submolecule. That's done by adding the lines
pyMolecule = py.Molecule(mol)
print(pyMolecule.write())
at the end of the loop which creates the submolecule. 
BeginModify() and EndModify() are not used.

-> The atoms are written correctly before the ring is closed:

1. N
2. nc
3. ncc
4. nccc
5. ncc(c)c
6. ncc(cc)c
7. ncc(cc)cc
8. ncc1ccccc1
9. [nH](cc1ccccc1)[n]
10. [nH]1cc2c(cccc2)[n][n]1
11. n1(cc2c(cccc2)[n][n]1)C
12. n1(cc2c(cccc2)[n][n]1)C[S]
13. n1(cc2c(cccc2)[n][n]1)C[S]P

The problem is that submolecules 1., 8. and 9. are not written 
correctly. The findall() function does not find them in 
COP(=S)(OC)SCn1nnc2ccccc2c1=O. The others are OK.

I would like to have all the submolecules written correctly.

Thank you for your help,
Matic