#555 Improved RGP support in MDL Mol V2000 and added alais handling

Accepted
closed
nobody
io (6)
master
1
2012-12-02
2012-10-19
John May
No

This patch fixes some long standing issues with the pseudo atoms (bugs/1263) in MDLV2000Writer. It's actually relatively small but always good to give a full synopsis :-).

Added more unit tests for the MDLV2000Writer - including corner cases such as multiline RGPs, atom aliases, truncated atom aliases (when longer then 70 characters) and pseudo atoms with null labels. The MDLV2000Writer can now handle all of these cases correctly with the following modifications additions.
- improved RGP conditional - originally the RGP number (i.e. R1, R2, R3) was extracted from pseudo labels which would cause errors where it wasn't actually an R group bug a label for something else (e.g. a protein). The new test is done with a regular expression.
- changed RGP to be in a map instead of a list - this makes a lot more sense
- normalised pseudo label to an empty string if unset (null)
- fixed error in RGP line where there were up to 16 RGP's being printed out - this was far too long and should be 8 (see. below).
- atom aliases are now handled correctly. If the label is short enough it is written to the atom block otherwise it is stored in a map an written after the RGPs as an atom alais.
- add truncation for very long alias labels

One thing I changed as it was definitely wrong was the RGP ling allowed up to 16 entries. Writing the unit test I was able to observe that for the fixed width MDL this was too long.

...
    0.0000    0.0000    0.0000 R#  0  0  0  0  0  0  0  0  0  0  0  0
M  RGP 16   1   1   2   2   3   3   4   4   5   5   6   6   7   7   8   8   9   9  10  10  11  11  12  12  13  13  14  14  15  15  16  16
M  RGP  3  17  17  18  18  19  19
M  END

Testing on other software and common sense says which should be a maximum of 8. As the values are pairs I think this was how the value was mistakenly chosen (8 x 2 = 16).

...
    6.7179   -4.9500    0.0000 R#  0  0  0  0  0  0  0  0  0  0  0  0
M  RGP  8   1   1   2   2   3   3   4   4   5   5   6   6   7   7   8   8
M  RGP  8   9   9  10  10  11  11  12  12  13  13  14  14  15  15  16  16
M  RGP  3  17  17  18  18  19  19
M  END

This now fits within the 80* character limit of MDL (from Fortran).

Please could someone review and confirm the tests pass correctly.

Thanks,
J

*actually looking around I think the limit is 72 - never used Fortran but 16 is definitely too many

1 Attachments

Discussion

  • John May

    John May - 2012-10-28
    • milestone: Accepted --> Needs_Review
     
    • Anonymous - 2012-11-22

      I have reviewed the patch and found no errors. The unit tests run fine and the modifications comply to the V2000 format. Running the code on a larger data set with various R groups / types resulted in no erroneous behavior.

       
  • John May

    John May - 2012-10-28

    grr... sourceforge

     
  • John May

    John May - 2012-11-23

    Could you sign off the commits in a *.patch or on a branch.

    Thanks,
    J

    On 22 Nov 2012, at 23:10, SBeisken beisken@users.sf.net wrote:

    I have reviewed the patch and found no errors. The unit tests run fine and the modifications comply to the V2000 format. Running the code on a larger data set with various R groups / types resulted in no erroneous behavior.

    patches:555 Improved RGP support in MDL Mol V2000 and added alais handling

    Status: open
    Created: Fri Oct 19, 2012 04:19 PM UTC by John May
    Last Updated: Sun Oct 28, 2012 07:06 PM UTC
    Owner: nobody

    This patch fixes some long standing issues with the pseudo atoms (bugs/1263) in MDLV2000Writer. It's actually relatively small but always good to give a full synopsis :-).

    Added more unit tests for the MDLV2000Writer - including corner cases such as multiline RGPs, atom aliases, truncated atom aliases (when longer then 70 characters) and pseudo atoms with null labels. The MDLV2000Writer can now handle all of these cases correctly with the following modifications additions.
    - improved RGP conditional - originally the RGP number (i.e. R1, R2, R3) was extracted from pseudo labels which would cause errors where it wasn't actually an R group bug a label for something else (e.g. a protein). The new test is done with a regular expression.
    - changed RGP to be in a map instead of a list - this makes a lot more sense
    - normalised pseudo label to an empty string if unset (null)
    - fixed error in RGP line where there were up to 16 RGP's being printed out - this was far too long and should be 8 (see. below).
    - atom aliases are now handled correctly. If the label is short enough it is written to the atom block otherwise it is stored in a map an written after the RGPs as an atom alais.
    - add truncation for very long alias labels

    One thing I changed as it was definitely wrong was the RGP ling allowed up to 16 entries. Writing the unit test I was able to observe that for the fixed width MDL this was too long.

    ...
    0.0000 0.0000 0.0000 R# 0 0 0 0 0 0 0 0 0 0 0 0
    M RGP 16 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 12 13 13 14 14 15 15 16 16
    M RGP 3 17 17 18 18 19 19
    M END
    Testing on other software and common sense says which should be a maximum of 8. As the values are pairs I think this was how the value was mistakenly chosen (8 x 2 = 16).

    ...
    6.7179 -4.9500 0.0000 R# 0 0 0 0 0 0 0 0 0 0 0 0
    M RGP 8 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8
    M RGP 8 9 9 10 10 11 11 12 12 13 13 14 14 15 15 16 16
    M RGP 3 17 17 18 18 19 19
    M END
    This now fits within the 80* character limit of MDL (from Fortran).

    Please could someone review and confirm the tests pass correctly.

    Thanks,
    J

    *actually looking around I think the limit is 72 - never used Fortran but 16 is definitely too many

    Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/cdk/patches/555/

    To unsubscribe from further messages, please visit https://sourceforge.net/auth/prefs/

     
    • Anonymous - 2012-11-23

      Done. File attached.

      Cheers,
      Stephan

       
  • Anonymous - 2012-11-23

    SF attachment fail. Does it not work in message replies?

     
  • John May

    John May - 2012-11-23

    Apparently not :/

     
  • John May

    John May - 2012-11-23
    • status: open --> accepted
    • milestone: Needs_Review --> Accepted
     
  • Egon Willighagen

    • labels: --> io
    • status: accepted --> closed
     
  • Egon Willighagen

    Applied and pushed.

     

Log in to post a comment.

Get latest updates about Open Source Projects, Conferences and News.

Sign up for the SourceForge newsletter:

JavaScript is required for this form.





No, thanks