Hi Nick,

So it is hard coded in Beam - we can change it there but I don’t think it’s needed. Firstly, the new parser already accepts ‘$’ as the quadruple bond, as per the OpenSMILES specification.

The most portable way of encoding extra semantics with SMILES is to use an auxiliary suffix. This is basically what InChI does with AuxInfo and what ChemAxon does with extended SMILES (CXSMILES). It’s a good way of layering information on top of the minimal structural skeleton while keeping the parsing fast. The SMILES input order is guaranteed and known at output, so you can refer to atoms by index or explicitly use the atom classes / maps. Here’s an example of using atom maps to indicate that a bond is broken:

C[CH:1]=[CH:2]C 1,2=brk
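
As a rough sketch of how such a line could be handled on your side, the plain Java below splits the suffix off at the first space and reads the two atom map numbers. It does not touch CDK or Beam at all; the class name, the method names and the "brk" tag handling are just assumptions made for illustration, and you would pass the first part to your usual SMILES parser yourself.

```java
import java.util.Arrays;

public class BondAnnotation {

    /** Splits a line at the first space: [0] is the SMILES, [1] the suffix (empty if none). */
    static String[] splitSuffix(String line) {
        String trimmed = line.trim();
        int space = trimmed.indexOf(' ');
        if (space < 0) {
            return new String[] {trimmed, ""};
        }
        return new String[] {trimmed.substring(0, space), trimmed.substring(space + 1).trim()};
    }

    /**
     * Reads a suffix of the (hypothetical) form "1,2=brk" and returns the two
     * atom map numbers, or null if the suffix does not have that form.
     */
    static int[] brokenBond(String suffix) {
        String[] parts = suffix.split("=", 2);
        if (parts.length != 2 || !parts[1].equals("brk")) {
            return null;
        }
        String[] maps = parts[0].split(",");
        if (maps.length != 2) {
            return null;
        }
        return new int[] {Integer.parseInt(maps[0].trim()), Integer.parseInt(maps[1].trim())};
    }

    public static void main(String[] args) {
        String[] parts = splitSuffix("C[CH:1]=[CH:2]C 1,2=brk");
        int[] maps = brokenBond(parts[1]);
        // parts[0] is what you would hand to the SMILES parser
        System.out.println(parts[0] + " : broken bond between atom maps " + Arrays.toString(maps));
    }
}
```

Once the structure is parsed you can look up the two atoms carrying those map numbers and apply whatever "broken" means in your own model.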

Another example encodes an aromatic fragment in a form that still parses correctly (‘ccc’ on its own is not valid SMILES):

o1cccc1 fragment=1,2,3
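
The index list can be read back in the same spirit. This is only a minimal sketch in plain Java with made-up class and method names; it returns the indices exactly as written and leaves it to you to decide how they map onto the atom order of the parsed structure.

```java
import java.util.ArrayList;
import java.util.List;

public class FragmentAnnotation {

    /**
     * Returns the atom indices listed in a (hypothetical) "fragment=i,j,k"
     * suffix, or an empty list if the line has no such suffix.
     */
    static List<Integer> fragmentAtoms(String line) {
        List<Integer> indices = new ArrayList<>();
        String trimmed = line.trim();
        int space = trimmed.indexOf(' ');
        if (space < 0) {
            return indices; // no suffix at all
        }
        String suffix = trimmed.substring(space + 1).trim();
        String key = "fragment=";
        if (!suffix.startsWith(key)) {
            return indices; // some other annotation
        }
        for (String token : suffix.substring(key.length()).split(",")) {
            if (token.trim().isEmpty()) {
                continue; // tolerate an empty list or stray commas
            }
            indices.add(Integer.parseInt(token.trim()));
        }
        return indices;
    }

    public static void main(String[] args) {
        System.out.println(fragmentAtoms("o1cccc1 fragment=1,2,3"));
    }
}
```

Because the atom order of the input is preserved, the returned indices can be used directly to pick the fragment atoms out of the full ring after parsing.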

There is an example in the 1.5.4 release notes of how to add coordinates, and the 1.5.5 release notes have an example with atom classes / maps.


This way you can define your own encoding, and extend it later, without having to modify the parser.

Hope that helps,

On 4 Feb 2014, at 10:37, Nick Vandewiele <Nick.Vandewiele@UGent.be> wrote:

I am in the process of migrating some of my own code to CDK 1.5.4 from version 1.4.5.
In the past, I modified SMILESParser by allowing the parsing of my custom symbols (eg ‘$’) that were not part of the original syntax of SMILES. I used them to represent transitioning bond orders in reactions for example.
That was easy in v1.4.5 because you could just hard code these extra rules at the location where other bond order symbols such as ‘=’ and ‘#’ were parsed.
I see that in v1.5.4 the job is taken over by BEAM, and this hard coded part of symbol parsing is not part of CDK anymore.
Do you see how I can extend the SMILESParser with my own custom symbols?
Cdk-user mailing list