This adds two functions to the SmilesParser API that let users poll disconnected molecules from the formula. The main API only returns a molecule which by def. is connected. Thus, those about ten tests with disconnected input were broken from the start and correctly fail with this code. Updates for tests will follow.
Took me some time to figure out my problem was that individual SmilesParserTests do not create a SmilesParser object (which is unexpected).
fixes an error in this patch
Egon will have to confirm - but I don't think it should use IMolecule. If I understand correctly IMolecule is going to be completely removed and was the point of master in the first place.
did not apply to master, patch won't apply
sorry John for mislabeling. attached are further fixes and the fixed tests which should complete the patch now. Note one test more now fails due to an unrelated NPE (testnonBond).
cant attach two patches grmbl
Okay thanks. This will probably need two pairs of eyes but here's some minor style points I noticed.
* rename variables in isolation, there are changes of molecule -> mol and make it harder to find the actual behaviour changes
* 0001 this line is bad form: if (smiles.charAt(position-2) < '0' || smiles.charAt(position-2) > '9'). Characters are not numbers.
* you can avoid prefix increment unless you actually need it - ++molSetIndex. ++i != i++
* 0002 has a yoda condition: return molSet != null && molSet.getMoleculeCount() > molSetIndex+1
* 0003 has several do,while loop without braces
* I'm not sure if 'changing' the test assertion counts as fixing.
Also, I'm still not convinced on the benefit of getting an AtomContainerSet. I've spoken to a couple of people and all of them agree that the 'dot' should not split a molecule apart. [Na+].[Cl-] should be read as a single container. I said in the other thread but making an API change as it helps a single use case is not wise. Especially when the use case can already be possible.
If you look as Daylight SMILES tutorial
I would interpet that as it should be read as one.
Last edit: John May 2012-11-16
John, Ralf, it seems to me that the discussion on the expected behavior of a SMILES parser is outside the scope of patch reviewing...
Can you please discuss on the cdk-user mailing list how the parser should handle this? So that other users of the code can tune in?
Personally, I am not against the SMILES parser returning more specific IChemObject's matching the input... if the SMILES has disconnected structures, an IAtomContainerSet makes a lot of sense to me... note that we also have a method to read reaction SMILES...
Agreed
On 16 Nov 2012, at 09:58, Egon Willighagen egonw@users.sf.net wrote:
I can agree to only add a getter for an ACSet to SmilesParser. No need for public discussion of any real API change.
AtomContainerSet would be a lot better. I also just realised the iterator also means you can't use the parse in multiple threads (although you probably couldn't before).
Here's a front end of example of what would actually change: https://gist.github.com/4086209
Can still post to users/dev if you like?
On 16 Nov 2012, at 10:31, "Ralf Stephan" rwst@users.sf.net wrote:
Marking for closure
Marked for closure - no response. With a decent graph implementation getting the connected components is linear time and very fast. The API works better reading the SMILES and then disconnected if needed. I think the real issue here was the SDG didn't warn (which it now does).