From: Andrew D. <da...@da...> - 2013-09-10 10:26:56
On Sep 10, 2013, at 5:05 AM, Richard Apodaca wrote:

> What are the chances that a grammar could be developed that constrains
> ring closures, like parentheses, to be paired?

Such a grammar is theoretically possible, because there are only 100
ring-closure labels (0 through 99). However, the grammar would be huge,
because it would need to keep track of the parity of each ring-closure
label. For example, with only 3 ring closures, the state diagram looks
like:

  0 (start): <even number of every ring closure>
     if ring closure '1', go to state 1
     if ring closure '2', go to state 2
     if ring closure '3', go to state 4
  1: <odd number of '1's, even number of '2's and '3's>
     if ring closure '1', go to state 0
     if ring closure '2', go to state 3
     if ring closure '3', go to state 5
  2: <odd number of '2's, even number of '1's and '3's>
     if ring closure '1', go to state 3
     if ring closure '2', go to state 0
     if ring closure '3', go to state 6
  3: <odd number of '1's and '2's, even number of '3's>
     ....

Each state number is a bitmask: bit 0 holds the parity of '1', bit 1 the
parity of '2', and bit 2 the parity of '3'. A string has properly paired
ring closures only if it ends in state 0.

The corresponding grammar, much simplified, would be:

  pattern ::= ('1' + state1_pattern) | ('2' + state2_pattern) | ('3' + state4_pattern);
  state1_pattern ::= ('1' + pattern) | ('2' + state3_pattern) | ('3' + state5_pattern);
  state2_pattern ::= ('1' + state3_pattern) | ('2' + pattern) | ('3' + state6_pattern);
  state3_pattern ::= ....

In total, the grammar would need about 2**100 times as many states as it
has now, one for every combination of open and closed ring-closure labels.
In real-world use it's much simpler to keep track of 100 boolean flags and
add a post-processing check that they are all 0 at the end of the string.

To make matters worse, C=2.C#2 is also not valid SMILES, because ring
closure 2 is opened with a double bond and closed with a triple bond, yet
the grammar accepts it. Again, a theoretical grammar could prohibit that,
but it would require some 5**100 times more states.

As a historical note, one of the differences between SMILES and OpenSMILES
is that [C+H] is allowed under the SMILES grammar but not under OpenSMILES.
That is, SMILES has something like:

  bracket_atom ::= '[' isotope? symbol (chiral | hcount | charge | class)* ']'

while OpenSMILES requires the more restrictive ordering:

  bracket_atom ::= '[' isotope? symbol chiral? hcount? charge? class? ']'

I proposed this restriction because we could find no real-world SMILES
strings that used a different ordering, and allowing arbitrary ordering
leads to divergent interpretations; for example, Open Babel interpreted
[C+5-4] as [C+1].

It is possible to define "0 or 1 of each, in arbitrary order" at the
grammar level, but the result is an ugly grammar with 16 states, one for
each subset of {chiral, hcount, charge, class} already seen (2**4 = 16).
It starts:

  bracket_atom ::= '[' isotope? symbol after_symbol;
  after_symbol ::= chiral after_chiral | hcount after_hcount | charge after_charge | class after_class | ']';
  after_chiral ::= hcount after_chiral_hcount | charge after_chiral_charge | class after_chiral_class | ']';
  after_hcount ::= chiral after_chiral_hcount | charge after_hcount_charge | class after_hcount_class | ']';
  after_charge ::= chiral after_chiral_charge | hcount after_hcount_charge | class after_charge_class | ']';
  .... and so on ...

I don't think this is worth the effort, and I'm glad that in this case we
could restrict the grammar and so enforce the ordering at the syntax level.
With ring closures we cannot, so we leave that check to a post-processing
stage.

Cheers,

Andrew
da...@da...
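The post-processing approach described above for ring closures can be sketched in Python. This is an illustration under simplifying assumptions, not code from any toolkit: it assumes the rest of the string is already syntactically valid, recognizes only single-digit and %nn ring-closure labels, and checks bond agreement only for the bond symbols - = # $ : (it ignores the directional bonds / and \). The name check_ring_closures is hypothetical. The integer open_mask stands in for the 100 boolean flags and carries the same parity information as the automaton's state.

```python
DIGITS = "0123456789"
ORDER_BONDS = "-=#$:"  # bond symbols whose type must agree at both ends of a ring


def check_ring_closures(smiles: str) -> list[str]:
    """Return ring-closure problems in `smiles`; an empty list means none were found.

    Hypothetical sketch: assumes the rest of the string is valid SMILES, handles
    single-digit and %nn labels, and ignores directional bonds '/' and '\\'.
    Raises ValueError on an unterminated bracket atom or a malformed %nn label.
    """
    open_mask = 0   # bit k set <=> label k seen an odd number of times (100 flags in one int)
    open_bond = {}  # label -> bond symbol written where that ring was opened, or None
    problems = []
    i, n = 0, len(smiles)
    while i < n:
        c = smiles[i]
        if c == "[":
            # Digits inside brackets are isotope/hcount/charge/class, not ring closures.
            i = smiles.index("]", i) + 1
            continue
        bond = None
        if c in ORDER_BONDS and i + 1 < n and smiles[i + 1] in DIGITS + "%":
            bond, i = c, i + 1  # explicit bond symbol attached to the ring closure
            c = smiles[i]
        if c in DIGITS:
            label, i = int(c), i + 1
        elif c == "%":
            digits = smiles[i + 1:i + 3]
            if len(digits) != 2 or any(d not in DIGITS for d in digits):
                raise ValueError(f"malformed %nn ring closure at position {i}")
            label, i = int(digits), i + 3
        else:
            i += 1  # atoms, branches, ordinary bonds, dots: irrelevant to this check
            continue

        open_mask ^= 1 << label  # parity toggle: the automaton's transition
        if (open_mask >> label) & 1:
            open_bond[label] = bond  # odd count: ring opened here
        else:
            first = open_bond.pop(label)  # even count: ring closed here
            if first and bond and first != bond:
                problems.append(f"ring {label}: bond '{first}' conflicts with '{bond}'")

    # The "all flags must be 0" check at the end of the string.
    problems += [f"ring {k} never closed" for k in range(100) if (open_mask >> k) & 1]
    return problems
```

For example, check_ring_closures("C=2.C#2") reports the double/triple bond conflict on ring 2, and check_ring_closures("C1CC") reports that ring 1 was never closed, while "C1CCCCC1" passes.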
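By contrast, the restricted OpenSMILES bracket-atom ordering quoted above is simple enough to express as one regular expression, since each optional field may appear at most once and only in a fixed position. The sketch below is an approximation, not the OpenSMILES reference grammar: the element-symbol alternative also accepts non-elements such as "Xx", the chirality ranges are loosened to one or two digits, and the names BRACKET_ATOM and is_opensmiles_bracket_atom are hypothetical.

```python
import re

# Hypothetical approximation of the OpenSMILES rule
#   bracket_atom ::= '[' isotope? symbol chiral? hcount? charge? class? ']'
# Each optional field may appear at most once, and only in this fixed order.
BRACKET_ATOM = re.compile(r"""
    \[
    (?P<isotope>\d+)?
    (?P<symbol>[A-Z][a-z]?|se|as|[bcnops]|\*)                      # approximate element list
    (?P<chiral>@(?:@|TH[12]|AL[12]|SP[123]|TB\d{1,2}|OH\d{1,2})?)?  # approximate ranges
    (?P<hcount>H\d?)?
    (?P<charge>[+-]\d{1,2}|\+\+?|--?)?
    (?P<atom_class>:\d+)?
    \]
""", re.VERBOSE)


def is_opensmiles_bracket_atom(text: str) -> bool:
    """True if `text` is exactly one bracket atom with its fields in OpenSMILES order."""
    return BRACKET_ATOM.fullmatch(text) is not None
```

With this pattern [CH+] and [13C@@H2+:1] are accepted, while [C+H] and [C+5-4] are rejected: once the charge field has been consumed, nothing later in the pattern can match an H or a second charge.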