Menu

Code Merge Request #1: New Calculation Plugin Implementation (open)

Merging...

Merged

Something went wrong. Please, merge manually

Checking if merge is possible...

Something went wrong. Please, merge manually

dim8 wants to merge 16 commits from /u/dim8/scaffoldhunter/ to master, 2018-07-26

Within the branch feature-107 we have written a data calculation plug-in that processes a set of molecular descriptors from the CDK Library. The list of descriptors includes:

  • Molecular Weight
  • Hydrogen Bond Acceptor Count
  • Hydrogen Bond Donor Count
  • Topological Polar Surface Area
  • Chi Chain
  • Chi Cluster
  • Chi Path
  • Chi Path Cluster
  • Kappa Shape indices

and all in all adds 47 new separate table values. The full source code is to be found in the
edu.udo.scaffoldhunter.plugins.datacalculation.impl.konplugin and its respective sub-package fingerprints.

We have tested it on some samples and to the best of our knowledge there aren't any apparent issues. Please, review and give us your feedback.

Commit Date  
[66a854] by Dimitar Garkov Dimitar Garkov

Safety value & refactoring

2018-02-02 00:36:58 Tree
[0350f9] by Dimitar Garkov Dimitar Garkov

fixed description overflow, matched indices to description, added chi
path orders description

2018-01-25 20:22:53 Tree
[21515f] by Dimitar Garkov Dimitar Garkov

improved textual description & styling

2018-01-25 15:40:21 Tree
[54cf54] by yingzhang yingzhang

Add description for descriptors

2018-01-24 15:06:34 Tree
[adcd73] by yingzhang yingzhang

Add description for descriptors.
Signed-off-by: yingzhang <yingzhang@...>

2018-01-24 11:53:50 Tree
[f3df65] by Dimitar Garkov Dimitar Garkov

documentation

2018-01-22 10:31:20 Tree
[40b212] by Dimitar Garkov Dimitar Garkov

fingerprint descriptors implementation

2018-01-19 02:57:54 Tree
[b9f9ff] by Dimitar Garkov Dimitar Garkov

refactoring & package javadoc files

2018-01-18 23:12:20 Tree
[ebafb2] by Dimitar Garkov Dimitar Garkov

abstract fingerprint descriptor

2018-01-18 20:58:13 Tree
[a74397] by Dimitar Garkov Dimitar Garkov

output formatting, documenting, refactoring

2018-01-18 17:57:57 Tree
[4376a9] by Dimitar Garkov Dimitar Garkov

fix output of new descriptors

2018-01-18 16:11:04 Tree
[9d5d4c] by Ying Ying

typo correction

2018-01-16 15:48:19 Tree
[58f4f8] by Ying Ying

Extended plugin with new descriptors

2018-01-16 15:23:39 Tree
[d9d98d] by Ying Ying

extended plugin with new descriptors

2018-01-16 15:20:15 Tree
[682e09] by Dimitar Garkov Dimitar Garkov

New Plugin Descriptor Calculations:

molecular weight, chi descriptors, kappa shape indices

2017-12-14 19:20:39 Tree
[e697b7] by Dimitar Garkov Dimitar Garkov

initial plugin commit

2017-11-30 23:18:58 Tree

Discussion

  • Till Schäfer

    Till Schäfer - 2018-02-02

    First of all: Thank you for the contribution.

    Now: Let's get to the technical bits :-)


    I have some genral Questions before we go into source code details:

    • Why is the plugin named KonPlugin?
    • Why are all the differnt descriptors calculated in a single plugin? Typically, we follow the paradigm to do one thing per plugin. That enables the user to select exactly which properties he wants to calculate.
    • Are there any parameters of the descriptors that should be exposed to the user?

    Code-Style:

    • Use the eclipse code formatter on all files. We have configured it to comply with out coding rules in the project settings.
    • remove the non-Javadoc comments for overriden methods.

    Code:

    • KonPluginTransformFunction#apply: Why do you use a copy of structure? Is structure somhow modified by the calculations? If so, why can you re-use the copy for all the calculations.
    • KonPluginTransformFunction#apply: Are there any possible expections here? Please catch them as shown in EStateBitStringCalcPluginTransformFunction.
    • please avoid (re)implementation of standard-functionality as in KonPluginUtils.concatArrays: This will just blow up the code and make it harder to maintain. Thus, just use ArrayLists and return with #toArray(T[] a).
    • small typo in the html: Evaluatessimple
    • why is the html file called scaffold_hunter_konplugin.html? please choose a more describptive name.

    For the feature:

    • Please dont mix differnt purposes together in one merge request (e.g. fixing the javadoc of other files, removing warning). Since this are minimal changes outside the plugin, I am not requesting you to split it up this time.

    Any further questions? Feel free to ask and diskuss them. (for general question not related to this pull request, please use our developer mailing list)

     

    Last edit: Till Schäfer 2018-02-02
    • dim8

      dim8 - 2018-02-06

      Thanks for the quick feedback!


      Why is the plugin named KonPlugin?

      The name is a portmanteau derived from Konstanz plus Plugin. It was thought as a small gesture to the university of the same name, where the plugin has been developed. Frankly, we are open to suggestions. A content-based derived modification from the title (Molecular Descriptors Calculator) of the plug-in, e.g. CDKMolPlugin might, however, be more suitable.

      Why are all the differnt descriptors calculated in a single plugin? Typically, we follow the paradigm to do one thing per plugin. That enables the user to select exactly which properties he wants to calculate.

      Because, the group of chemists, who sought to use Scaffold Hunter in their research had requested many values and separating them in that many plug-ins wasn't feasible (considering each value - 47). So, instead of implementing 9 separate ones, we went for the more straightforward solution. If a user needs only a subset of them, s/he could still take only those for the analysis part.

      Are there any parameters of the descriptors that should be exposed to the user?

      Not really. There are parameters that could be set only for the H-Bond Acceptor/Donor counts and the TPSA descriptor, all in terms of aromaticity. There we opted out for the default option, which is false. About the Weight descriptor, there the molecular weight is calculated by design, and thus the parameter is set internally. One hindrance in general towards any exposure of parameters as settings turned out to be the limited place for plugin details. Because the description is on all accounts pretty lengthy and the panel is in fact not scrollable, an addition of settings panel would have gone hidden. As we weren't sure whether this was by design, this wasn't put in any ticket. Is it in fact so?


      Use the eclipse code formatter on all files. We have configured it to comply with out coding rules in the project settings.

      We have, do you have anything specific in mind?

      remove the non-Javadoc comments for overriden methods.

      I'm not sure that I understand you correctly. Could you give me an example? Do you mean e.g.

      /* (non-Javadoc)
      * @see com.google.common.base.Function#apply(java.lang.Object)
      */
      @Override
      public IAtomContainer apply(IAtomContainer structure)
      

      KonPluginTransformFunction#apply: Why do you use a copy of structure? Is structure somhow modified by the calculations? If so, why can you re-use the copy for all the calculations.

      The structure itself shouldn't be modified. The reason was that we considered the AtomCountPlugin as an example at the start. This will be fixed in the next commit.

      KonPluginTransformFunction#apply: Are there any possible expections here? Please catch them as shown in EStateBitStringCalcPluginTransformFunction.

      Actually, any possible exceptions are caught. See the getDescriptorValue() method, which is called on each descriptor. That way, if one descriptor calculation fails. the following could proceed.

      please avoid (re)implementation of standard-functionality as in KonPluginUtils.concatArrays: This will just blow up the code and make it harder to maintain. Thus, just use ArrayLists and return with #toArray(T[] a).

      I understand. Will see, whether that is possible under the newest JDK and, if possible would update it. Otherwise, I would restrict the class and its methods visibility to package.

      small typo in the html: Evaluatessimple

      Thanks, will fix it.

      why is the html file called scaffold_hunter_konplugin.html? please choose a more describptive name.

      This goes back to your very first question and will be updated synchronously. Because this is a direct description of the plugin, <name>.html is the most suited and minimal enough for this purpose.


      Please dont mix differnt purposes together in one merge request (e.g. fixing the javadoc of other files, removing warning). Since this are minimal changes outside the plugin, I am not requesting you to split it up this time.

      We are aware of SH's code hygiene, and it is admirable. Exactly in the name of keeping it clean, we decided to do those changes, but keep them minimal. Because of their inhomogeneity and their lack of great importance, it would have been too much unnecessary effort for both sides, if we had acted on each one on its own.

      About the mailing list, I looked for sort of "Submit" option in it, when I wanted to report the two issues (295 and 296), but could find any. To add, I could find the respective address, too.

       
      • Till Schäfer

        Till Schäfer - 2018-02-12

        Why is the plugin named KonPlugin?

        The name is a portmanteau derived from Konstanz plus Plugin. It was thought as a small gesture to the university of the same name, where the plugin has been developed. Frankly, we are open to suggestions. A content-based derived modification from the title (Molecular Descriptors Calculator) of the plug-in, e.g. CDKMolPlugin might, however, be more suitable.

        Why are all the differnt descriptors calculated in a single plugin? Typically, we follow the paradigm to do one thing per plugin. That enables the user to select exactly which properties he wants to calculate.

        Because, the group of chemists, who sought to use Scaffold Hunter in their research had requested many values and separating them in that many plug-ins wasn't feasible (considering each value - 47). So, instead of implementing 9 separate ones, we went for the more straightforward solution. If a user needs only a subset of them, s/he could still take only those for the analysis part.

        This is a combined thought on both aspects above: If something is bundled, it should be clear to the user and developer, where to search for some functionality. The displayed plugin name already summarizes the topic. We might just describe it a little bit more specific and call it "QSAR Molecular Descriptors Calculator". Then, the use is aware, what he can expect by the plugin.

        The package name and the classes also should be named accordingly to avoid confusion.
        Of course, you are free to honor the origin of delopment. Thus, feel free to add a notice in the JavaDoc of the plugin class.

        Regarding the Descriptors it would be very nice if they can be selected/toggled in the plugin, e.g. by a list with checkboxes. We already think about extending the functionality of this plugin to resepect all parameter-free IMolecularDescriptors from cdk. If you are interested, you can do this extension to all parameter-free IMolecularDescriptors by yourself, but only the ability to select the current descriptors is required to merge this request.

        Are there any parameters of the descriptors that should be exposed to the user?

        Not really. There are parameters that could be set only for the H-Bond Acceptor/Donor counts and the TPSA descriptor, all in terms of aromaticity. There we opted out for the default option, which is false. About the Weight descriptor, there the molecular weight is calculated by design, and thus the parameter is set internally. One hindrance in general towards any exposure of parameters as settings turned out to be the limited place for plugin details. Because the description is on all accounts pretty lengthy and the panel is in fact not scrollable, an addition of settings panel would have gone hidden. As we weren't sure whether this was by design, this wasn't put in any ticket. Is it in fact so?

        Have you added a ScrollPane?


        Use the eclipse code formatter on all files. We have configured it to comply with out coding rules in the project settings.

        We have, do you have anything specific in mind?

        hmm, maybe you just used another version of the formatter, but here the javadocs are formatted different (e.g. @param is allwas followed by a newline) and the code is formatted differently in some places. Have you applied the formmatter on the whole class? If so, just keep it the current way. Otherwise, please do so.

        remove the non-Javadoc comments for overriden methods.

        I'm not sure that I understand you correctly. Could you give me an example? Do you mean e.g.

        ~~~
        / (non-Javadoc)
        @see com.google.common.base.Function#apply(java.lang.Object)
        */
        @Override
        public IAtomContainer apply(IAtomContainer structure)
        ~~~

        Yes, that are the comments I am talking about. We also have them in some places (its a legacy), but they are completely useless for modern IDEs, which can give you much bettern descriptions about the superclass methods. These comments also tend to get outdated if some super-class gets renamed.


        KonPluginTransformFunction#apply: Are there any possible expections here? Please catch them as shown in EStateBitStringCalcPluginTransformFunction.

        Actually, any possible exceptions are caught. See the getDescriptorValue() method, which is called on each descriptor. That way, if one descriptor calculation fails. the following could proceed.

        Oh, I have not seen them. However, the current implementation will still fail, since #getDescriptorValue() will return null in the case of an Exception. Afterwards you pass the null value to #addFingerprintValues(), which calls toString on the null-result. This will raise another uncatched exception in #apply. Long story should, please make sure to not insert null values (just do not add the property in this case).

        please avoid (re)implementation of standard-functionality as in KonPluginUtils.concatArrays: This will just blow up the code and make it harder to maintain. Thus, just use ArrayLists and return with #toArray(T[] a).

        I understand. Will see, whether that is possible under the newest JDK and, if possible would update it. Otherwise, I would restrict the class and its methods visibility to package.

        Our code should be compatible to Java 7 at this point. The ArrayList#toArray(T[] a) method is available for this version.

        why is the html file called scaffold_hunter_konplugin.html? please choose a more describptive name.

        This goes back to your very first question and will be updated synchronously. Because this is a direct description of the plugin, <name>.html is the most suited and minimal enough for this purpose.

        I was referring to the "scaffold_hunter" part here. Its clear that is has something to do with the scaffold hunter, thus thats just noise. On the other hand the file name does not give any hint about the actual content. Thus, I could think about something like descriptor_help.html, descriptor_description.html ;-), or something like that.


        Please dont mix differnt purposes together in one merge request (e.g. fixing the javadoc of other files, removing warning). Since this are minimal changes outside the plugin, I am not requesting you to split it up this time.

        We are aware of SH's code hygiene, and it is admirable. Exactly in the name of keeping it clean, we decided to do those changes, but keep them minimal. Because of their inhomogeneity and their lack of great importance, it would have been too much unnecessary effort for both sides, if we had acted on each one on its own.

        About the mailing list, I looked for sort of "Submit" option in it, when I wanted to report the two issues (295 and 296), but could find any. To add, I could find the respective address, too.

        You can subscribe to the mailinglist here:
        https://sourceforge.net/p/scaffoldhunter/mailman/?source=navbar
        Afterwards, you can just send regular emails to that list.

         

        Last edit: Till Schäfer 2018-02-12
  • dim8

    dim8 - 2018-02-22

    We have updated the state of the plug-in with the following:

    • New name: QSAR Molecular Descriptors Calculator. The package name is now qsarmol.
    • Formatted all Java files properly and removed the legacy non-Javadoc comments
    • Made sure there is no null descriptor value that is passed to the ScaffoldHunter from the CDK through the plug-in.
    • Removed the additional utility class, while refactored and reverted to built-in Java functionality.
    • Updated the name of the descriptor HTML description file.

    About the ability to select single descriptors for calculation, we haven't added it, because there isn't enough space on the side pane to show it. The settings panel itself could have scrollable pane, but that won't fix the issue. The description panel is the one that has to be scrollable or alternatively the whole side details pane. But changing this is out of scope for the feature, and we can't do it only for this plug-in, since the description has to be given as a bare String.

     
  • Till Schäfer

    Till Schäfer - 2018-07-26

    Sorry for the late feedback. The current verson looks fine, but one small thing is missing:

    The current properties are named in a confusing way:

    All descriptors form the same type are named the same, but have a number atached. One needs to read the (very good) description of the property to understand the meaning. That description is not always available to the user (table view shows it by hovering the proprty, other views (especially drop down lists) do not do so). Thus, it would be much better to just name the proerty accordingly (please keep the description anyway).

    Example: "Chi Chain Descriptor 1" -> "Chi Chain Descriptor SCH-3"

     

Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.