This problem arose with OpenBabel 2.2.3 on Ubuntu 10.04. When presented with a chiral SMILES whose chiral center is bonded to more than four neighbors, src/formats/smilesformat.cpp may overrun memory. A patch is attached.
I'm not a chemist, so please pardon my lack of clarity :)
I encountered the problem while using babel to convert a proprietary SMILES in which a chiral center was bonded to more than four other atoms. Here is a non-proprietary sample SMILES which helps demonstrate the problem:
[N@@]1234([C@@]9([C@H]1[C@H]3[C@@H]4[C@@H]29))
If this is saved to a file named problem.smi, and if babel is run under valgrind to convert the smiles to another format, valgrind will detect the memory overrun:
$ valgrind babel -i smi problem.smi -o sdf problem.sdf
(See below for full valgrind output.)
The attached patch for src/formats/smilesformat.cpp addresses the problem by resizing (ChiralSearch->second)->refs as needed.
{{{
$ valgrind babel -i smi problem.smi -o sdf problem.sdf
==4287== Memcheck, a memory error detector
==4287== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==4287== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==4287== Command: babel -i smi problem.smi -o sdf problem.sdf
==4287==
==4287== Invalid write of size 4
==4287== at 0xC061A37: OpenBabel::OBSmilesParser::ParseComplex(OpenBabel::OBMol&) (smilesformat.cpp:2398)
==4287== by 0xC069DFA: OpenBabel::OBSmilesParser::ParseSmiles(OpenBabel::OBMol&) (smilesformat.cpp:988)
==4287== by 0xC06A3AB: OpenBabel::OBSmilesParser::SmiToMol(OpenBabel::OBMol&, std::string const&) (smilesformat.cpp:938)
==4287== by 0xC06AD27: OpenBabel::SMIBaseFormat::ReadMolecule(OpenBabel::OBBase, OpenBabel::OBConversion) (smilesformat.cpp:890)
==4287== by 0x4F0C864: OpenBabel::OBMoleculeFormat::ReadChemObjectImpl(OpenBabel::OBConversion, OpenBabel::OBFormat) (obmolecformat.cpp:93)
==4287== by 0x4EF5E67: OpenBabel::OBConversion::Convert() (obconversion.cpp:481)
==4287== by 0x4EF650E: OpenBabel::OBConversion::Convert(std::istream, std::ostream) (obconversion.cpp:420)
==4287== by 0x4EFBDB8: OpenBabel::OBConversion::FullConvert(std::vector<std::string, std::allocator<std::string=""> >&, std::string&, std::vector<std::string, std::allocator<std::string=""> >&) (obconversion.cpp:1321)
==4287== by 0x4040DE: main (babel.cpp:340)
==4287== Address 0x6250060 is 0 bytes after a block of size 16 alloc'd
==4287== at 0x4C28CC1: operator new(unsigned long) (vg_replace_malloc.c:261)
==4287== by 0x4E89E36: std::vector<unsigned int,="" std::allocator<unsigned="" int=""> >::operator=(std::vector<unsigned int,="" std::allocator<unsigned="" int=""> > const&) (new_allocator.h:89)
==4287== by 0xC06047E: OpenBabel::OBSmilesParser::ParseComplex(OpenBabel::OBMol&) (smilesformat.cpp:2276)
==4287== by 0xC069DFA: OpenBabel::OBSmilesParser::ParseSmiles(OpenBabel::OBMol&) (smilesformat.cpp:988)
==4287== by 0xC06A3AB: OpenBabel::OBSmilesParser::SmiToMol(OpenBabel::OBMol&, std::string const&) (smilesformat.cpp:938)
==4287== by 0xC06AD27: OpenBabel::SMIBaseFormat::ReadMolecule(OpenBabel::OBBase, OpenBabel::OBConversion) (smilesformat.cpp:890)
==4287== by 0x4F0C864: OpenBabel::OBMoleculeFormat::ReadChemObjectImpl(OpenBabel::OBConversion, OpenBabel::OBFormat) (obmolecformat.cpp:93)
==4287== by 0x4EF5E67: OpenBabel::OBConversion::Convert() (obconversion.cpp:481)
==4287== by 0x4EF650E: OpenBabel::OBConversion::Convert(std::istream, std::ostream) (obconversion.cpp:420)
==4287== by 0x4EFBDB8: OpenBabel::OBConversion::FullConvert(std::vector<std::string, std::allocator<std::string=""> >&, std::string&, std::vector<std::string, std::allocator<std::string=""> >&) (obconversion.cpp:1321)
==4287== by 0x4040DE: main (babel.cpp:340)
==4287==
}}}
Patch for src/formats/smilesformat.cpp
Thanks for reporting the problem, but the patch causes Refs to be greater than 4. OpenBabel currently does not support chirality involving more than 4 stereo atoms (nor indeed does the SMILES spec - this is one of the limitations). We can probably only handle this by ignoring the stereochemistry at that atom. (Is this structure real?)
Thanks for the quick response.
I failed to mention that we discovered the memory write error after OBConversion segfaulted on the proprietary SMILES. So ignoring stereochemistry would definitely be helpful. Would this also be accompanied by a warning or error message?
I'm told the proprietary structure is real. The sample SMILES is definitely not :)
In case it's useful -- admittedly unlikely, since I can't provide the proprietary structure: the original structure was delivered in an OpenBabel-generated SD file. The SMILES which caused the memory error was generated from the SD representation, using babel.
I just checked the current development code. It doesn't segfault - instead you get:
>>> conv.ReadString(mol, "[N@@]1234([C@@]9([C@H]1[C@H]3[C@@H]4[C@@H]29))")
*** Open Babel Error in OpenBabel::OBSmilesParser::InsertTetrahedralRef
Error: Overwriting previously set reference id.
Since it doesn't segfault, I'm marking this bug as closed.