From: SourceForge.net <no...@so...> - 2011-09-25 11:47:27
|
Patches item #3412340, was opened at 2011-09-21 17:07 Message generated for change (Comment added) made by egonw You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3412340&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: cdk-1.4.x Group: Needs Revision Status: Open Resolution: None Priority: 5 Private: No Submitted By: John May (jwmay) Assigned to: Rajarshi Guha (rajarshi) Summary: Fix for non-read of pseudo atoms (MDLV2000) Initial Comment: MDLV2000Reader would only read pseudo atom numbers that are in single digits. In some cases (a few from Brenda) the pseudo atom will be larger then a single digit. For example the BRENDA entry for (R)-3-Hydroxyacyl-CoA (http://www.brenda-enzymes.org/php/ligand_flatfile.php4?brenda_ligand_id=18180) (download mol: http://www.brenda-enzymes.org/brenda_ligand/download_ligand_molfile.php4?LigandID=18180) the pseudo atom is assigned to 56. The pseudo atom wasn't being assigned correctly as regex was "matches("^A \\d+")". When the number exceeds one digit the spaces are swallowed infront of the digit and not after as it is right aligned; "A\s\s\s\s1" "A\s\s\s11" "A\s\s111" I have changed the two regex's to allow varied amount of spaces and added have a JUnit test (testRGroupHighAtomNumber) with sample mol file in patch. Note: This patch causes a JUnit test fail for Mol2WriterTest : testMissingAtomType() however the javadoc for this test informs "This test just ensures that Mol2Writer does not throw an NPE. It does not test whether the output is correct or not.". The test fails as there is an assertTrue on a content index. I'm not entirely sure what the assertTrue is trying to accomplish but I think it may be failing because the file is being read properly now. John With thanks to Pablo Moreno for finding this bug. ---------------------------------------------------------------------- >Comment By: Egon Willighagen (egonw) Date: 2011-09-25 13:47 Message: Ah, I did not know that of matches(). Excellent! Then we only need confirmation from Rajarshi now that my patch for his test in Mol2WriterTest is OK, and the patches are good to go. ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2011-09-25 13:27 Message: Hi Egon, Yes I did... but it makes no differences to how it matches as the 'matches()' method tries to match the entire regular expression from start to finish of the string (http://download.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#matches()). It effectively wraps the pattern in '^' and '$' and explicitly adding the '^' actually slows the query down (only slightly though). Incidentally I think it might be possible to speed up the reader by compiling the regular expressions. Will give it a try on a branch next week. John ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-09-24 12:11 Message: Tasks: 1. John, please comment on the '^' question 2. Rajarshi, please comment on the patch for the Mol2Writer unit test you wrote ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2011-09-24 12:09 Message: John, did you remove the ^ as first character of the regexp deliberately? What was the reason? I would probably prefer it there, to reduce false hits... Regarding the failing unit tests, the atom order seems to change due to your patch, but I am puzzled as to why... I do note that the SD file read in that unit test in fact too has A numbers with two digits, and that the reading of the input therefore got changed... Rajarshi, does that explain who atom 24 in the unit test becomes atom 2 after the patch? This change updates the unit test: - Assert.assertTrue(mol2file.indexOf("24 R24 -1.209 -18.043 49.44 X") > 0); + Assert.assertTrue(mol2file.contains("-1.209 -18.043 49.44 X")); (patch attached) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3412340&group_id=20024 |