The INChI reader was written in such a way that it:
1) needed an additional line after the INChI= line, which was never read but had to be present to avoid NPEs;
2) could only process one branch per level;
3) required the line to start with INChI=, whereas newer versions require InChI=.
All of this has been fixed.
The code is in the EBI CDK git repository:
commits 90701addfbb635af96ddb7f179a90480dfad639c and e4a49566fea3eb905285e672f94f3e4c0cd9d766.
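Point 2 concerns the connection layer. In standard InChI, a parenthesized group can hold several comma-separated branches that all start at the same atom. Neopentane, for example, is written /c1-5(2,3)4. The following is a minimal sketch of a stack-based parser that handles this case. It is not the code in the CDK patch. It assumes the standard InChI connection-layer syntax, a single component, and input without the leading "c". Whether INChI 1.12beta used exactly the same syntax is assumed, not verified. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: turns a single-component connection layer such as "1-5(2,3)4" into bonds.
public final class ConnectionLayerSketch {

    /** Returns bonds as {a, b} pairs, using the 1-based atom numbers written in the layer. */
    public static List<int[]> parseBonds(String layer) {
        List<int[]> bonds = new ArrayList<>();
        Deque<Integer> branchRoots = new ArrayDeque<>(); // atoms that open a parenthesized group
        int prev = -1;                                    // -1 means: no previous atom yet
        int i = 0;
        while (i < layer.length()) {
            char c = layer.charAt(i);
            if (Character.isDigit(c)) {
                int start = i;
                while (i < layer.length() && Character.isDigit(layer.charAt(i))) i++;
                int atom = Integer.parseInt(layer.substring(start, i));
                if (prev != -1) bonds.add(new int[] {prev, atom}); // consecutive atoms are bonded
                prev = atom;
                continue;
            }
            switch (c) {
                case '(': // a branch starts at the current atom
                    branchRoots.push(prev);
                    break;
                case ',': // the next sibling branch starts at the same root atom
                    if (branchRoots.isEmpty()) throw new IllegalArgumentException("',' outside branch at " + i);
                    prev = branchRoots.peek();
                    break;
                case ')': // back to the atom that opened the group
                    if (branchRoots.isEmpty()) throw new IllegalArgumentException("unbalanced ')' at " + i);
                    prev = branchRoots.pop();
                    break;
                case '-': // plain separator between bonded atoms
                    break;
                default:
                    throw new IllegalArgumentException("unsupported character '" + c + "' at " + i);
            }
            i++;
        }
        return bonds;
    }
}
```

For example, parseBonds("1-4(2)3") yields the bonds 1-4, 4-2 and 4-3, and a repeated atom number such as the final 1 in "1-2-4-6-5-3-1" closes a ring.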
Patch looks good; applied to CDK master and pushed.
Rajarshi,
these are now c7c92dfcf290c15e02021561bb59b36fce76d791 and cb5486c6fa726dbb51c85995eaccbfd033912d4a in CDK git?
The INChI= prefix is used for a different algorithm than InChI=. However, I do not see any changes to the code that parses the string content, nor any JavaDoc update explaining that two flavors of InChI are now supported.
Can this be clarified?
Yes, I missed the Javadocs. Regarding the parsing code, I thought that InChiContentProcessingTool handled that.
Exactly. That parsing code only supports INChI 1.12beta, which is still an IUPAC/NIST Chemical Identifier, not an IUPAC International Chemical Identifier; hence the capital N in the prefix.
I was always told that the algorithm that creates the identifier changed after that beta. That parser was not written for the final InChI, and I am not convinced that the output stayed the same when the algorithm changed.
The patch now seems to let newer InChIs be parsed by a parser written for the old INChI flavor.
Is that safe?
Hmm, not sure about this. Stefan, can you comment on this?
My understanding was that the principle of the connection layer did not change. But to be absolutely sure, we would need to check this for each version (the reader should then check the version, not just the spelling of "inchi", which may be identical across several versions). In any case, the problems with the line break and the branches also affected 1.12beta, making it impossible to read anything relevant. This should go in anyway.
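The version check suggested here could look roughly like the following sketch. It is hypothetical and is not CDK's API. It reads the flavor from the case-sensitive prefix (INChI= vs InChI=) and takes the version token that sits between '=' and the first '/' (for example 1S in standard InChI). It hands a line to the legacy parser only if that exact version has been verified against it. The class and method names, the null-for-unknown convention, and the set of verified versions are all assumptions. The token "1.12Beta" in the usage is an illustrative guess at the beta version string.

```java
import java.util.Set;

// Hypothetical sketch: decide how to handle a line by prefix flavor and explicit version.
public final class InchiHeaderSketch {

    /** Flavor implied by the prefix spelling. */
    public enum Flavor { INCHI_BETA, INCHI }

    /** Flavor plus the version token between '=' and the first '/'. */
    public static final class Header {
        public final Flavor flavor;
        public final String version;

        Header(Flavor flavor, String version) {
            this.flavor = flavor;
            this.version = version;
        }
    }

    /** Returns null when the line carries neither prefix (startsWith is case-sensitive). */
    public static Header parseHeader(String line) {
        String s = line.trim();
        Flavor flavor;
        if (s.startsWith("INChI=")) {
            flavor = Flavor.INCHI_BETA;
        } else if (s.startsWith("InChI=")) {
            flavor = Flavor.INCHI;
        } else {
            return null;
        }
        String rest = s.substring("InChI=".length()); // both prefixes are six characters long
        int slash = rest.indexOf('/');
        return new Header(flavor, slash >= 0 ? rest.substring(0, slash) : rest);
    }

    /** Hypothetical policy: accept only versions someone has checked against the legacy parser. */
    public static boolean acceptForLegacyParser(Header h, Set<String> verifiedVersions) {
        return h != null && verifiedVersions.contains(h.version);
    }
}
```

With Set.of("1.12Beta") as the verified set, a standard InChI=1S/... line would be rejected until someone confirms that its connection layer parses the same way.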