Menu

#581 add handling of disconnected SMILES

Needs_Review
closed
nobody
SMILES (2)
cdk-1.4.x
1
2013-09-23
2012-11-15
No

This adds two functions to the SmilesParser API that let users poll disconnected molecules from the formula. The main API only returns a molecule which by def. is connected. Thus, those about ten tests with disconnected input were broken from the start and correctly fail with this code. Updates for tests will follow.

Took me some time to figure out my problem was that individual SmilesParserTests do not create a SmilesParser object (which is unexpected).

1 Attachments

Discussion

  • John May

    John May - 2012-11-15

    Egon will have to confirm - but I don't think it should use IMolecule. If I understand correctly IMolecule is going to be completely removed and was the point of master in the first place.

     
  • John May

    John May - 2012-11-15
    • branch: master --> cdk-1.4.x
     
    • John May

      John May - 2012-11-15

      did not apply to master, patch won't apply

       
  • Ralf Stephan

    Ralf Stephan - 2012-11-16

    sorry John for mislabeling. attached are further fixes and the fixed tests which should complete the patch now. Note one test more now fails due to an unrelated NPE (testnonBond).

     
  • John May

    John May - 2012-11-16

    Okay thanks. This will probably need two pairs of eyes but here's some minor style points I noticed.
    * rename variables in isolation, there are changes of molecule -> mol and make it harder to find the actual behaviour changes
    * 0001 this line is bad form: if (smiles.charAt(position-2) < '0' || smiles.charAt(position-2) > '9'). Characters are not numbers.
    * you can avoid prefix increment unless you actually need it - ++molSetIndex. ++i != i++
    * 0002 has a yoda condition: return molSet != null && molSet.getMoleculeCount() > molSetIndex+1
    * 0003 has several do,while loop without braces
    * I'm not sure if 'changing' the test assertion counts as fixing.

    Also, I'm still not convinced on the benefit of getting an AtomContainerSet. I've spoken to a couple of people and all of them agree that the 'dot' should not split a molecule apart. [Na+].[Cl-] should be read as a single container. I said in the other thread but making an API change as it helps a single use case is not wise. Especially when the use case can already be possible.

    If you look as Daylight SMILES tutorial

    Dot can be used to delineate mixtures. Each dot-disconnected SMILES is considered a "component" of the overall molecule or mixture.

    I would interpet that as it should be read as one.

     

    Last edit: John May 2012-11-16
  • Egon Willighagen

    John, Ralf, it seems to me that the discussion on the expected behavior of a SMILES parser is outside the scope of patch reviewing...

    Can you please discuss on the cdk-user mailing list how the parser should handle this? So that other users of the code can tune in?

    Personally, I am not against the SMILES parser returning more specific IChemObject's matching the input... if the SMILES has disconnected structures, an IAtomContainerSet makes a lot of sense to me... note that we also have a method to read reaction SMILES...

     
  • John May

    John May - 2012-11-16

    Agreed

    On 16 Nov 2012, at 09:58, Egon Willighagen egonw@users.sf.net wrote:

    John, Ralf, it seems to me that the discussion on the expected behavior of a SMILES parser is outside the scope of patch reviewing...

    Can you please discuss on the cdk-user mailing list how the parser should handle this? So that other users of the code can tune in?

    Personally, I am not against the SMILES parser returning more specific IChemObject's matching the input... if the SMILES has disconnected structures, an IAtomContainerSet makes a lot of sense to me... note that we also have a method to read reaction SMILES...

    patches:581 add handling of disconnected SMILES

    Status: open
    Labels: SMILES
    Created: Thu Nov 15, 2012 05:17 PM UTC by Ralf Stephan
    Last Updated: Fri Nov 16, 2012 09:28 AM UTC
    Owner: nobody

    This adds two functions to the SmilesParser API that let users poll disconnected molecules from the formula. The main API only returns a molecule which by def. is connected. Thus, those about ten tests with disconnected input were broken from the start and correctly fail with this code. Updates for tests will follow.

    Took me some time to figure out my problem was that individual SmilesParserTests do not create a SmilesParser object (which is unexpected).

    Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/cdk/patches/581/

    To unsubscribe from further messages, please visit https://sourceforge.net/auth/prefs/


    Monitor your physical, virtual and cloud infrastructure from a single
    web console. Get in-depth insight into apps, servers, databases, vmware,
    SAP, cloud infrastructure, etc. Download 30-day Free Trial.
    Pricing starts from $795 for 25 servers or applications!
    http://p.sf.net/sfu/zoho_dev2dev_nov_______
    Cdk-patches mailing list
    Cdk-patches@lists.sourceforge.net
    https://lists.sourceforge.net/lists/listinfo/cdk-patches

     
  • Ralf Stephan

    Ralf Stephan - 2012-11-16

    I can agree to only add a getter for an ACSet to SmilesParser. No need for public discussion of any real API change.

     
  • John May

    John May - 2012-11-16

    AtomContainerSet would be a lot better. I also just realised the iterator also means you can't use the parse in multiple threads (although you probably couldn't before).

    Here's a front end of example of what would actually change: https://gist.github.com/4086209

    Can still post to users/dev if you like?

    On 16 Nov 2012, at 10:31, "Ralf Stephan" rwst@users.sf.net wrote:

    I can agree to only add a getter for an ACSet to SmilesParser. No need for public discussion of any real API change.

    patches:581 add handling of disconnected SMILES

    Status: open
    Labels: SMILES
    Created: Thu Nov 15, 2012 05:17 PM UTC by Ralf Stephan
    Last Updated: Fri Nov 16, 2012 09:58 AM UTC
    Owner: nobody

    This adds two functions to the SmilesParser API that let users poll disconnected molecules from the formula. The main API only returns a molecule which by def. is connected. Thus, those about ten tests with disconnected input were broken from the start and correctly fail with this code. Updates for tests will follow.

    Took me some time to figure out my problem was that individual SmilesParserTests do not create a SmilesParser object (which is unexpected).

    Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/cdk/patches/581/

    To unsubscribe from further messages, please visit https://sourceforge.net/auth/prefs/

     
  • John May

    John May - 2013-02-08

    Marking for closure

     
  • John May

    John May - 2013-09-23
    • status: open --> closed
     
  • John May

    John May - 2013-09-23

    Marked for closure - no response. With a decent graph implementation getting the connected components is linear time and very fast. The API works better reading the SMILES and then disconnected if needed. I think the real issue here was the SDG didn't warn (which it now does).

     

Log in to post a comment.