I am trying to convert a list of SMILES strings generated by OSCAR 3 into molecules and then into fingerprints.
Surprisingly, the CDK seems to have problems parsing two-character atom symbols such as Se, Ce and Eu. This happens for both organic and inorganic structures. The attached file contains several examples, along with other molecules that also fail to parse.
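One thing worth ruling out is whether the failing inputs are valid SMILES at all. In SMILES, only the organic subset (B, C, N, O, P, S, F, Cl, Br, I, plus aromatic b, c, n, o, p, s) may be written without square brackets. Every other element, including selenium, cerium and europium, must appear as a bracket atom such as [Se], [Ce] or [Eu]. The class below is a hypothetical standalone helper (`SmilesBracketCheck` is not part of the CDK). It is a minimal sketch that checks only this one rule and lists the letters outside brackets that cannot start an organic-subset atom.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not a CDK class: checks only the bracket rule,
// ignoring the rest of the SMILES grammar.
class SmilesBracketCheck {
    // Symbols SMILES allows without square brackets (organic subset).
    private static final Set<String> ORGANIC =
            Set.of("B", "C", "N", "O", "P", "S", "F", "Cl", "Br", "I");
    // Aromatic organic-subset atoms allowed without brackets.
    private static final Set<Character> AROMATIC =
            Set.of('b', 'c', 'n', 'o', 'p', 's');

    /** Describes each letter outside [...] that cannot start an organic-subset atom. */
    static List<String> findUnbracketedProblems(String smiles) {
        List<String> problems = new ArrayList<>();
        boolean inBracket = false; // bracket atoms do not nest
        int i = 0;
        while (i < smiles.length()) {
            char ch = smiles.charAt(i);
            if (ch == '[') { inBracket = true; i++; continue; }
            if (ch == ']') { inBracket = false; i++; continue; }
            // Anything inside a bracket atom, and non-letters (bonds, ring digits), is skipped.
            if (inBracket || !Character.isLetter(ch)) { i++; continue; }
            if (i + 1 < smiles.length() && ORGANIC.contains(smiles.substring(i, i + 2))) {
                i += 2; // Cl or Br: two-letter organic symbols are matched first
            } else if (ORGANIC.contains(String.valueOf(ch)) || AROMATIC.contains(ch)) {
                i++;    // single-letter organic or aromatic atom
            } else {
                problems.add("'" + ch + "' at index " + i);
                i++;
            }
        }
        return problems;
    }
}
```

For example, `findUnbracketedProblems("CSeC")` returns `['e' at index 2]`, because the grammar reads S as sulfur and is then left with a stray e. `findUnbracketedProblems("C[Se]C")` returns an empty list. Strings that this check flags would be rejected by any conforming SMILES parser, not just the CDK.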
I am using the SMILES parser like this:
    final SmilesParser sparser = new SmilesParser(DefaultChemObjectBuilder.getInstance());
    final Fingerprinter fingerprinter = new Fingerprinter();
    final IMolecule mol = sparser.parseSmiles(smiles);
    final BitSet fingerprint = fingerprinter.getFingerprint(mol);
I believe I am using the API correctly, so my question is: is this a bug, or am I doing something wrong? If it is the latter, where is the usage error?