
#69 SmartsHelper toSmarts method incomplete?

ambit-3.0.3
open
None
5
2016-06-24
2013-08-01
Duece99
No

Hi, I'm going to assume this is a bug rather than a missing feature, as I'm unsure what this class is truly used for.

I see a "toSmarts" method in the SmartsHelper class, which I wish to make use of as I want to obtain the SMARTS from a QueryAtomContainer object. I notice however that it doesn't report SMARTS bond information and has limited atom support, as shown by these methods:

static public String atomToString(IAtom a)
    {
        if (a instanceof SmartsAtomExpression)
            return(a.toString());       
        if (a instanceof AliphaticSymbolQueryAtom)
            return(a.getSymbol());
        if (a instanceof AromaticSymbolQueryAtom)
            return("Ar-"+a.getSymbol());

        return(a.getSymbol());      
    }

static public String bondToString(IBond b)
    {
        //TODO - to improve it ???

        if (b instanceof SmartsBondExpression)
            return(b.toString());       
        if (b instanceof SingleOrAromaticBond)
            return("");

        if (b.getOrder() == IBond.Order.SINGLE)
            return("-");
        if (b.getOrder() == IBond.Order.DOUBLE)
            return("=");
        if (b.getOrder() == IBond.Order.TRIPLE)
            return("#");

        return("-");
    }

AFAIK the top method is actually alright, though the bottom one is missing functionality.

I have also noticed that it writes an explicit bond symbol between ALL atoms, even when you don't want one (such as when the bond should match either aromatic or single, which is :,- in SMARTS and is also what an omitted bond symbol means). As an example, this SMARTS query

C1C~C[#7][#7](CO)C1

becomes

C1-C-C-[#7]-[#7](-C-O)-C-1
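The difference between the two strings comes down to which symbol is written for each bond. In SMARTS, an omitted bond symbol means "single or aromatic", `~` means any bond, and `:` means aromatic. A minimal sketch of that mapping follows; `BondKind`, `symbol` and `chain` are hypothetical names invented for illustration and are not AMBIT or CDK types.

```java
import java.util.Arrays;
import java.util.List;

public class SmartsBondSketch {
    // Hypothetical bond kinds for illustration only (not AMBIT or CDK classes).
    enum BondKind { SINGLE_OR_AROMATIC, SINGLE, DOUBLE, TRIPLE, AROMATIC, ANY }

    // Returns the SMARTS symbol for a bond kind.
    // The default single-or-aromatic bond is written as an empty string.
    static String symbol(BondKind kind) {
        switch (kind) {
            case SINGLE:   return "-";
            case DOUBLE:   return "=";
            case TRIPLE:   return "#";
            case AROMATIC: return ":";
            case ANY:      return "~";
            default:       return "";
        }
    }

    // Writes an unbranched chain; atoms.size() must equal bonds.size() + 1.
    static String chain(List<String> atoms, List<BondKind> bonds) {
        StringBuilder sb = new StringBuilder(atoms.get(0));
        for (int i = 0; i < bonds.size(); i++) {
            sb.append(symbol(bonds.get(i))).append(atoms.get(i + 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints CC~C[#7]: only the "any" bond gets an explicit symbol.
        System.out.println(chain(
            Arrays.asList("C", "C", "C", "[#7]"),
            Arrays.asList(BondKind.SINGLE_OR_AROMATIC, BondKind.ANY,
                          BondKind.SINGLE_OR_AROMATIC)));
    }
}
```

A writer that falls back to "-" for every unrecognised bond would instead narrow both the default bonds and the `~` bond to single bonds, which matches the output shown above.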

As I said I don't know what this class is used for but I assume there's a reason why this method is here.

Ed.

Related

Bugs: #7

Discussion

  • Nina Jeliazkova

    Nina Jeliazkova - 2013-08-01
    • assigned_to: ntk
     
  • Nikolay Kochev

    Nikolay Kochev - 2013-08-02

    Hi Ed,
    Thank you for your interest in Ambit-SMARTS software. We would be glad if it is useful for your research and work.

    The code you refer to in this ticket is actually rather old, and it is indeed incomplete (not buggy :-) ). It was used in the early stages of development of the SMARTS parser to inspect the internal representation as Java objects. The goal was not to output complete SMARTS, only some indicative information. Outputting complete SMARTS is a good idea, though, and we will implement it in the future. Currently the main way of getting information into a QueryAtomContainer is via SMARTS linear notation, so writing the same information back out from a QueryAtomContainer to a SMARTS string was not crucial for us. It would be very useful if there were another tool for creating QueryAtomContainer objects (e.g. a graphical editor or another protocol).

    You mentioned that you wish to use this method (toSmarts()). What is your purpose?

    With best regards
    Nick

     
  • Nina Jeliazkova

    Nina Jeliazkova - 2013-08-02

    The toSmarts() code is now updated (thanks to Nick) and generates SMARTS from QueryAtomContainer (since r.4931)
    The 2.4.13-SNAPSHOT http://ambit.uni-plovdiv.bg:8083/nexus/index.html#nexus-search;gav~~ambit2-smarts~2.4.13-SNAPSHOT~~ is also up to date.

    Maybe now it's time to open a feature request for a nice JChemPaint extension to draw SMARTS queries :)

     
  • Duece99

    Duece99 - 2013-08-02

    Nina and Nick,

    Ah OK, so it was intended for testing purposes. Currently ChemAxon's Marvin tools (notably Marvin Exporter) are the only API I know of that supports producing SMARTS. CDK currently has no such feature (though this class is the nearest thing to it that I know of).

    I'm producing SMARTS as I'm producing hyperstructures - the resulting supergraph of overlapping several molecules, thus compressing them into one molecule. I want support for degenerate bond matching, hence SMARTS (though I recently found out that MDLV2000 also supports degenerate bonds to an extent).

    Thanks for the update, though it would be great if the source code for the updated class were provided as well :-) . I haven't seen the fixed methods in the SVN repository.

    Thanks,
    Ed.

     
  • Duece99

    Duece99 - 2013-08-02

    Hi,

    Thanks for the update. The bonds seem to render properly, but rings do not, as shown with this example:

    N~C19(N23%%C-1~CC%(C(C2)(~O-3)-O)-O%-C6%%C%C4%%(C%%-%-C7%(=O)~C5C8~C-4~C%N5%C-%(C6~%(~O)-N~%-%C-7~%(-N)-C-8~9)(-N:C(C-%~%)~CC~%)(~O-S-%-%(-O)-O)-C(C~%)~C-%~C~%)(-F)(-F)-F)~N

    Ed.

     
  • Nikolay Kochev

    Nikolay Kochev - 2013-08-02

    Hi Ed,
    Please clarify how you got this SMARTS. It is obviously not correct SMARTS. Is it output from the Ambit-SMARTS code, or was it used as input SMARTS?

    Nick

     
  • Nina Jeliazkova

    Nina Jeliazkova - 2013-08-02

    Let me guess - he's creating the QueryAtomContainer with some other code (extracting those hyperstructures) - great idea btw - and then perhaps the QueryAtomContainer could contain something that is not expected by the SMARTS writing code, or will not normally be filled in the same way by the parser. Am I close?

     
  • Nikolay Kochev

    Nikolay Kochev - 2013-08-02

    I did a quick test and, yes, it appears that there is a problem when there are more than 9 rings in the molecule. I will look into this bug.

    Otherwise, if the molecules are simpler (fewer than 10 rings), the code should work.

    Nick

     
  • Duece99

    Duece99 - 2013-08-02

    Nina and Nick,

    Yes you're correct - I'm creating the said QueryAtomContainers using a combination of ChemAxon methods and "in-house" methods, though ultimately I'm using the AMBIT methodology to parse the things as SMARTS.

    Unhelpfully I'm also using some of the CDK SMARTS atom and bond types (which really I probably shouldn't use now that the parser and helper classes here are more usable). I've been using them until recently as placeholders (such as AromaticOrSingleQueryBond in CDK, which isn't supported in the AMBIT parser). Your parser/helper however works a lot quicker than the CDK ones (I think it has something to do with ring perception as my hyperstructures often have several rings as you can see).

    Ed.

     

    Last edit: Duece99 2013-08-02
  • Nikolay Kochev

    Nikolay Kochev - 2013-08-02

    You are right, the Ambit-SMARTS parser and substructure searching algorithm are quite efficient. We published them in:

    N. Jeliazkova, N. Kochev, AMBIT-SMARTS: Efficient Searching of Chemical Structures and Fragments, Mol. Inf., 30: 707–720, 2011

    If you are creating the QueryAtomContainer objects and then converting them to SMARTS strings with ambit2 code, please use only the atom and bond classes that are handled in the functions atomToString() and bondToString().

    Those are enough to write anything within the SMARTS standard. Some of these classes are from CDK and some are new ones developed in Ambit-SMARTS.

    Other classes from the CDK package (there may be a few) may be written incorrectly into the SMARTS string, because they fall through to the "default exit" that returns generic atom/bond info.

    Although the latter could cause problems if used, the last reported problem is not due to them (as far as I can tell from a separate test). I am looking into it.

    Nick

     
  • Nikolay Kochev

    Nikolay Kochev - 2013-08-02

    Hi Ed,
    I have good news. The problem is solved. Now the SMARTS writer should work properly with structures that contain more than 9 rings. As it appears you are working with quite large molecules (probably bio-molecules) and indeed these are good tests for the code :-)

    Please find the latest version of class SmartsHelper in SVN repository as revision 4935.

    Here is a link as well to the package:
    https://sourceforge.net/p/ambit/code/4935/tree/trunk/ambit2-all/ambit2-smarts/src/main/java/ambit2/smarts/

    and directly to the file:
    https://sourceforge.net/p/ambit/code/4935/tree/trunk/ambit2-all/ambit2-smarts/src/main/java/ambit2/smarts/SmartsHelper.java

    Additionally, for anyone curious, the cause of the bug: in Java, the "? :" operator has lower precedence than the "+" operator, which broke the following line:

    It was:
    String ind = (curIndex>9)?"%":"" + curIndex;

    But now it is with additional brackets:

    String ind = ((curIndex>9)?"%":"") + curIndex;
    

    When I wrote that code (3 years ago) I assumed that "? :" had higher precedence and that the additional brackets were not needed. As a result, for ring indices larger than 9 only '%' was written and the index itself was omitted.
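The precedence issue can be reproduced in isolation. The sketch below wraps the two expressions quoted above in small methods; the class and method names (`RingIndexPrecedence`, `buggy`, `fixed`) are made up for this illustration and do not appear in SmartsHelper.

```java
public class RingIndexPrecedence {
    // Original expression: '+' binds tighter than '?:', so this parses as
    // (curIndex > 9) ? "%" : ("" + curIndex)
    static String buggy(int curIndex) {
        return (curIndex > 9) ? "%" : "" + curIndex;
    }

    // Corrected expression: the conditional is bracketed, so the index
    // is always appended after the optional "%" prefix.
    static String fixed(int curIndex) {
        return ((curIndex > 9) ? "%" : "") + curIndex;
    }

    public static void main(String[] args) {
        System.out.println(buggy(3) + " " + fixed(3));   // 3 3
        System.out.println(buggy(12) + " " + fixed(12)); // % %12
    }
}
```

For single-digit indices both forms agree, which is presumably why the problem only showed up in queries with ten or more rings.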

    Thanks again, Ed. Your tests were useful in tracking down this bug, because it would not have been found unless the test queries contained at least 10 rings.

    Best regards
    Nick

     
  • Duece99

    Duece99 - 2013-08-06

    No worries, once again I'll let you know if I see any future complications. I'm testing for contradictions between this and the ChemAxon SMARTS generator at various points so if something crops up I'll report it.

    I wanted the source code because I've been using some of the CDK classes and thus needed to program these into it. Though TBH I'll probably just excise those out for clarity.

    Also, why are you using the QueryAtomContainer class, instead of the IQueryAtomContainer interface? Doesn't cause me any problems, but if anyone makes a parallel implementation through that interface, then it might.

    Ed.

     
    • Nina Jeliazkova

      Nina Jeliazkova - 2013-08-06

      Thanks, comparison with other SMARTS implementations will be quite useful!

      About the IQueryAtomContainer interface: nicely spotted. The answer is simply historical. The Ambit SMARTS implementation was started quite a while ago, when the CDK had not yet switched entirely to interfaces. There was a large refactoring at the time to replace the Molecule and AtomContainer classes in Ambit with the corresponding interfaces, but obviously there are leftovers.

       

      Last edit: Nina Jeliazkova 2013-08-06
  • Nina Jeliazkova

    Nina Jeliazkova - 2013-08-07

    SMARTS package (and dependent classes) refactored to use IQueryAtomContainer since r4945
    https://sourceforge.net/p/ambit/code/4944

     
  • Nina Jeliazkova

    Nina Jeliazkova - 2015-11-23
    • Group: ambit2-www-2.7.5 --> ambit-3.0.1
     
