This branch contains various fixes for regressions (though some arguably) of the new SMILES stack:
https://github.com/egonw/cdk/compare/476-m-fixes
With these patches on top of master I have:
total: run=20278 failed=41 error=4 skipped=65 time=568.2530000000002 secs
Looks good but the the new SMILES gives bake kekule molecules... need to set setPreserveAromaticity(true) somewhere. Otherwise it's not testing anything :-).
Perhaps it's counter intuitive to give back a kekule structure by default but we can discuss Friday on Skype. Might be one for a vote on the mailing list.
Added some code yesterday to beam which will verify if a structure could be assigned without actually assigning bonds. The times on for ChEMBL: 3 seconds vs 10 seconds. I still think though giving back a structure without aromatic bonds if more friendly to the rest of the CDK.
Well, your patch seems to define preserveAromaticity as the opposite as doing kekulisation...
But you can do both... you can keep the aromaticity labels from the input and do the kekulization at the same time...
In fact, I think that that is what most SMILES users would expect. In fact, I had feedback in Mainz from someone who said that the rings in benzene rings is what all tools must do; I personally, like to be able to switch between them... and by reading them both, or by doing aromaticity perception afterwards, we get both...
So, my point is, that the SMILES parser should not be either/or...
Less tired now... just left some comments but all looks good. Will check it compiles/tests and push.
Might be good to give delocalised molecules to the ATASC though but that can be a separate commit.
Urgh slow connection and I can't pull in changes from master...
Yep but it's overkill, with a fast parser it's a shame to add these overheads. From some 'competitive' software - obviously doesn't mean we have to go the same way.
Daylight
String -> Parser -> Kekulise -> Aromatize -> Give Molecule
OpenEye
String -> Parser -> Verify
ChemAxon
String -> Parser -> Verify (optional)
Not sure I understand here.. let's discuss tomorrow.
On 3 Oct 2013, at 10:10, Egon Willighagen egonw@users.sf.net wrote:
Related
Patches:
#687Applied and pushed - interestingly I don't get the timeout on smiles9. It will take a long time though because it has an aromatic part and so the whole molecule is checked for aromatic parts. Looking at the error it actually times out on the SpanningTree. We should test just the parsing of this molecule in reasonable time which is defiantly possible (MarvinSketch timeouts).