#1272 single spaces in SMILES shouldn't trigger syntax error

Milestone: cdk-1.4.x
Status: closed
Owner: nobody
Labels: None
Priority: 5
Updated: 2012-11-03
Created: 2012-10-31
Private: No

It's the bazillionth time I've cut and pasted a multiline SMILES into a program to parse it: the newlines get converted to spaces, and those trigger a syntax error. Then I go hunting for the spaces in the overly long line. PLEASE remove this feature. My guess is that half of the people who gave up on chemistry software quit because of this silly inflexibility.

Discussion

  • Point taken...

    The Open SMILES specification writes, however:

    "A SMILES string is terminated by a whitespace terminator character (space, tab, newline, carriage-return), or by the end of the string."

    Not sure what to do here...
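
To make the quoted rule concrete, here is a minimal sketch of what "terminated by whitespace" means for a pasted multiline string. The class and method names (`SmilesTerminator`, `leadingSmiles`) are hypothetical and are not part of CDK; the code only illustrates the sentence quoted from the specification, not how the CDK parser is implemented.

```java
final class SmilesTerminator {
    // Hypothetical helper (not a CDK API): returns the part of the input
    // that precedes the first whitespace terminator named in the quote
    // (space, tab, newline, carriage-return), or the whole input if none.
    static String leadingSmiles(String input) {
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                return input.substring(0, i);
            }
        }
        return input;
    }

    public static void main(String[] args) {
        // A name after the SMILES is cut off, as the rule intends.
        System.out.println(leadingSmiles("CCO ethanol")); // CCO
        // A SMILES broken across two lines is cut off as well.
        System.out.println(leadingSmiles("c1ccc\ncc1"));  // c1ccc
    }
}
```

Read this way, a SMILES that was wrapped over several lines ends at the first line break, which is why whitespace inside the string cannot simply be treated as part of it.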

     
  • John May
    2012-10-31

    I do not think the trigger should be removed from the main SMILES parser. Whitespace should, however, be accepted in JChemPaint (e.g. when you paste a SMILES string).

    If you get the SMILES from the clipboard, just run it through a regex to remove the newlines/spaces.

    J

    On 31 Oct 2012, at 18:17, Egon Willighagen egonw@users.sf.net wrote:

    Point taken...

    The Open SMILES specification writes, however:

    "A SMILES string is terminated by a whitespace terminator character (space, tab, newline, carriage-return), or by the end of the string."

    Not sure what to do here...

    bugs:1272 single spaces in SMILES shouldn't trigger syntax error

    Status: open Created: Wed Oct 31, 2012 06:06 PM UTC by Ralf Stephan Last Updated: Wed Oct 31, 2012 06:11 PM UTC Owner: nobody



     
  • John May
    2012-11-01

    Okay, will do. Just a note: compiling the regex once into a static field makes it faster.

    ~~~~~:::java
    private static final Pattern CLEAN_SMILES = Pattern.compile("\\s+");
    ~~~~~

    then you do this...

    ~~~~~:::java
    CLEAN_SMILES.matcher("<-smiles to clean->").replaceAll("");
    ~~~~~

    Also, this handy regex tells you if something looks like SMILES (well, it tells you whether you only have SMILES characters).

    [^J][0-9BCOHNSOPrIFla@+\\-\\[\\]\\(\\)\\\\/%=#$]+
    

    You can negate this to remove invalid characters. The pattern then removes anything (including spaces) that is not one of the SMILES characters listed in the class.

    ~~~~~:::java
    private static final Pattern CLEAN_SMILES = Pattern.compile("[^0-9BCOHNSOPrIFla@+\\-\\[\\]\\(\\)\\\\/%=#$]+");
    ~~~~~
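
For anyone who wants to try the whitespace approach in isolation, the following is a minimal, self-contained sketch. The class name `SmilesWhitespace`, the method `strip`, and the example string are assumptions made for illustration; only the pattern and the `replaceAll("")` call come from this comment.

```java
import java.util.regex.Pattern;

final class SmilesWhitespace {
    // Compiled once and reused, as suggested in the comment above.
    private static final Pattern CLEAN_SMILES = Pattern.compile("\\s+");

    // Hypothetical helper: removes every run of whitespace
    // (spaces, tabs, newlines, carriage returns) from pasted text.
    static String strip(String pasted) {
        return CLEAN_SMILES.matcher(pasted).replaceAll("");
    }

    public static void main(String[] args) {
        // A SMILES that was wrapped across two lines when copied.
        String pasted = "CC(C)Cc1ccc(cc1)\n C(C)C(=O)O";
        System.out.println(strip(pasted)); // CC(C)Cc1ccc(cc1)C(C)C(=O)O
    }
}
```

This is meant for the paste path only (e.g. clipboard input), so the parser itself can keep treating whitespace as a terminator.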

    Hope it helps.

     
    Last edit: John May 2012-11-01
  • John May
    2012-11-01

    You might need to tweak that regex, as it doesn't handle aromatic carbons, for example.
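
One possible tweak, sketched under an assumption: the lower-case aromatic symbols `b`, `c`, `n`, `o`, `p` and `s` are added to the character class from the earlier comment. The class name `SmilesCharFilter` and the `clean` helper are hypothetical, and the extended class is still not a complete SMILES alphabet (for example, `.` and `:` are not included).

```java
import java.util.regex.Pattern;

final class SmilesCharFilter {
    // Negated character class as posted earlier in this thread.
    static final Pattern POSTED =
        Pattern.compile("[^0-9BCOHNSOPrIFla@+\\-\\[\\]\\(\\)\\\\/%=#$]+");

    // Assumption: same class with the aromatic symbols b, c, n, o, p, s added.
    static final Pattern WITH_AROMATIC =
        Pattern.compile("[^0-9BCOHNSOPrIFlabcnops@+\\-\\[\\]\\(\\)\\\\/%=#$]+");

    // Hypothetical helper: deletes every character the pattern matches.
    static String clean(Pattern invalid, String smiles) {
        return invalid.matcher(smiles).replaceAll("");
    }

    public static void main(String[] args) {
        String pasted = "c1ccc cc1"; // benzene with a stray space
        System.out.println(clean(POSTED, pasted));        // 11
        System.out.println(clean(WITH_AROMATIC, pasted)); // c1ccccc1
    }
}
```

The first result shows the problem mentioned above: with the posted class, the aromatic carbons are deleted along with the space.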

     
  • John May
    2012-11-01

    • status: open --> closed
     
