#188 SMILESParser incorrectly converts c1ccccc1 to an aromatic

Status: closed
Priority: 8
Updated: 2012-10-08
Created: 2004-04-01
Private: No

SMILESParser incorrectly converts c1ccccc1 to an
aromatic structure. Lower-case element symbols actually
represent sp2 hybridization, not aromaticity,
though the two often coincide. In practice, lower-case 'c'
is mostly used (incorrectly) to indicate
aromaticity. See "The full SMILES language allows
sp2-hybridized atoms to be indicated by writing the
atomic symbol in lower case." from
http://www.daylight.com/smiles/f_smiles.html.

Discussion

  • Egon Willighagen

    Logged In: YES
    user_id=25678

    To clarify/complicate this further :), note that
    aromaticity is always perceived by the SMILES parser, but
    only in a second step. Consider c1ccccc1: the parser 'knows'
    that an sp2 C can only have three neighbors, so two carbons
    and one hydrogen (which is added because 'c' is from the
    organic subset). Thus, each carbon must form one more bond,
    which must be in the ring. Hence, benzene. Note that this
    SMILES is valid too: c1=cc=cc=c1, still benzene.

     
  • Nobody/Anonymous

    Logged In: NO

    I agree with this analysis, and that aromaticity detection is
    separable from "hybridisation". However, the SMILES use of
    "hybridisation" is unfortunate: as far as I can see it
    means something like "c or n atom has at least one formal
    double bond to it unless it is nitrogen in pyrrole or
    similar compounds".

     
  • Christoph Steinbeck

    Logged In: YES
    user_id=54358

    I agree with your proposal, Egon.
    The solution to the problem is related to an old problem of
    ours (see for example RFE #815253 "An utility to convert
    aromatic bonds into Kekule structure").
    So one would probably first create the chemical graph, then
    add missing H's, then insert the missing double bonds (note
    that this is not only about benzene: the ring system can be
    quite extended).
    Things like pyrrole are a problem. Not sure how to treat
    five-membered rings with one or more N's consistently.

     
  • Egon Willighagen

    Logged In: YES
    user_id=25678

    The SMILES parser is adapted, but there are now six JUnit failures, and two
    SMILES strings which can no longer be parsed. The latter is caused by
    the aromaticity detection and will be filed later as separate bugs when I
    know a bit more about the cause.
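
The steps discussed in this thread (build the graph, add the missing H's, then insert the missing double bonds) can be illustrated with a small sketch. This is not the CDK implementation and uses no CDK API: the names ring, implicit_hydrogens and assign_double_bonds are hypothetical, atoms are plain integers, and the double-bond placement is a simple backtracking search chosen here only for illustration. It assumes that every lower-case atom needs exactly one double bond unless it is explicitly excluded, as for the nitrogen in pyrrole written as [nH].

```python
from typing import Dict, List, Optional, Set, Tuple

Bond = Tuple[int, int]


def ring(size: int) -> Dict[int, List[int]]:
    """Adjacency list for a simple ring with atoms numbered 0..size-1."""
    return {i: [(i - 1) % size, (i + 1) % size] for i in range(size)}


def implicit_hydrogens(heavy_neighbors: int) -> int:
    """An sp2 carbon has three neighbors; those not written are hydrogens."""
    return 3 - heavy_neighbors


def assign_double_bonds(
    adjacency: Dict[int, List[int]], needs_double: Set[int]
) -> Optional[Set[Bond]]:
    """Pair up every atom in needs_double with one neighbor (backtracking).

    Returns the set of double bonds as (low, high) atom pairs, or None if
    no assignment gives each listed atom exactly one double bond.
    """

    def solve(remaining: Tuple[int, ...]) -> Optional[Set[Bond]]:
        if not remaining:
            return set()
        atom, others = remaining[0], remaining[1:]
        for neighbor in adjacency[atom]:
            if neighbor in others:
                rest = tuple(a for a in others if a != neighbor)
                result = solve(rest)
                if result is not None:
                    # remaining is sorted, so atom < neighbor here
                    return result | {(atom, neighbor)}
        return None  # this atom cannot be paired: dead end

    return solve(tuple(sorted(needs_double)))


if __name__ == "__main__":
    # c1ccccc1: six sp2 carbons, each with two ring neighbors
    print(implicit_hydrogens(2))
    print(assign_double_bonds(ring(6), set(range(6))))
    # c1ccc[nH]1: atoms 0-3 are carbon, atom 4 is the pyrrole nitrogen
    print(assign_double_bonds(ring(5), {0, 1, 2, 3}))
    # Same ring, but treating the nitrogen like the carbons
    print(assign_double_bonds(ring(5), {0, 1, 2, 3, 4}))
```

The last call shows why five-membered rings with N are awkward: if the nitrogen is treated like the carbons, an odd number of atoms each demands a double bond and no assignment exists, so the input has to say which atoms are exempt.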

     
