Loading a PubChem XML document that has a .xml extension causes a segfault.
I am using Open Babel 2.3.2 from this Debian package.
Steps to reproduce:
1. Run obprop CID_2519.xml
2. Observe that it segfaults with a null pointer dereference.
The attached file, CID_2519.xml, was downloaded from the PubChem page for caffeine using the "2D XML: Save" option. If you rename the file with a .pc extension, which is what Open Babel expects for PubChem XML, it is parsed correctly and there is no segfault.
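You can check the behavior above on a given installation with a small harness. It runs obprop on the original file and on a copy with a .pc extension, then reports whether either run was killed by a segfault. This is only a sketch under some assumptions:

- It targets a POSIX system, where Python's subprocess module reports a child killed by a signal as a negative return code (SIGSEGV is signal 11).
- obprop is on the PATH.
- The helper names run_and_classify and compare_extensions are hypothetical.

```python
import shutil
import signal
import subprocess
from pathlib import Path


def run_and_classify(cmd):
    """Run cmd and return 'segfault', 'ok', or 'error (<code>)'."""
    # On POSIX, a child killed by a signal has returncode == -signum.
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    if result.returncode == -signal.SIGSEGV:
        return "segfault"
    if result.returncode == 0:
        return "ok"
    return f"error ({result.returncode})"


def compare_extensions(xml_path, tool="obprop"):
    """Run `tool` on the .xml file and on a .pc copy of the same file."""
    xml_path = Path(xml_path)
    pc_path = xml_path.with_suffix(".pc")
    # Copy rather than rename, so the original .xml file is left in place.
    shutil.copyfile(xml_path, pc_path)
    return {
        xml_path.name: run_and_classify([tool, str(xml_path)]),
        pc_path.name: run_and_classify([tool, str(pc_path)]),
    }
```

For example, on an installation affected by this bug, compare_extensions("CID_2519.xml") would be expected to report "segfault" for CID_2519.xml and "ok" for CID_2519.pc.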
This issue is moderately inconvenient because it means that Avogadro cannot directly open .xml files downloaded from PubChem.
Stack trace:
#0 OpenBabel::XMLFormat::ReadMolecule(OpenBabel::OBBase*, OpenBabel::OBConversion*) at 0x7fffd09bbf3a in /usr/lib/openbabel/2.3.2/xmlformat.so
#1 OpenBabel::OBConversion::Read(OpenBabel::OBBase*, std::istream*) at 0x7ffff7727f6a in /usr/lib/libopenbabel.so.4.0.2
#2 None at 0x7ffff5cc3303 in /usr/lib/libavogadro.so.1.0.3
#3 None at 0x7ffff61f42ef in /usr/lib/x86_64-linux-gnu/libQtCore.so.4.8.6
#4 start_thread at 0x7ffff48ebe0e in /lib/x86_64-linux-gnu/libpthread-2.17.so
#5 clone at 0x7ffff7b180fd in /lib/x86_64-linux-gnu/libc-2.17.so
(BL)
I can reproduce this behavior on CentOS 6.7 using OpenBabel 2.3.2 and a build of the latest (2.4.1) development code.
This trivial patch (against the OpenBabel 2.3.2 source) prevents the segmentation fault and produces what was probably the intended error message all along. It ought to apply to later versions and to the Debian Jessie version, although the exact line numbers may differ.
If the .xml documents retrieved from PubChem are all kept in a dedicated folder, running this step before processing them should make everything work.
It would, of course, be more convenient if the .xml document were opened and its format determined automatically, since any number of heuristics could detect that it follows the PubChem schema. The code appears to attempt something like this based on the namespace URI. However, at the point where obprop calls ReadMolecule(), the OutFormat is always NULL.
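You can combine the folder-rename workaround with the kind of detection heuristic described above. Instead of renaming every .xml file blindly, the script reads only the root element and renames a file to .pc only if that element looks like PubChem XML. This is a hedged sketch, not what Open Babel itself does. It rests on these assumptions:

- PubChem compound downloads have a root element named PC-Compounds (or PC-Compound) in the namespace "http://www.ncbi.nlm.nih.gov". Adjust the constants if your files differ.
- The names looks_like_pubchem and rename_pubchem_xml are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumption: namespace URI and root element names used by PubChem XML downloads.
PUBCHEM_NS = "http://www.ncbi.nlm.nih.gov"
PUBCHEM_ROOTS = {"PC-Compounds", "PC-Compound"}


def looks_like_pubchem(path):
    """Return True if the file's root element is a PubChem element in the PubChem namespace."""
    try:
        with open(path, "rb") as fh:
            # Stop at the first 'start' event (the root element) instead of parsing the whole file.
            for _event, elem in ET.iterparse(fh, events=("start",)):
                tag = elem.tag
                break
            else:
                return False
    except ET.ParseError:
        return False  # not well-formed XML, so leave it alone
    if not tag.startswith("{"):
        return False  # root element has no namespace
    ns, _, local = tag[1:].partition("}")
    return ns == PUBCHEM_NS and local in PUBCHEM_ROOTS


def rename_pubchem_xml(folder):
    """Rename each PubChem-looking *.xml file in folder to *.pc and return the new paths."""
    renamed = []
    for xml_file in sorted(Path(folder).glob("*.xml")):
        if looks_like_pubchem(xml_file):
            target = xml_file.with_suffix(".pc")
            xml_file.rename(target)
            renamed.append(target)
    return renamed
```

Only the root element is checked, so the cost per file stays small even for large downloads. Any .xml file that does not match the expected namespace is left untouched.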
Last edit: Derek Harmon 2017-01-12