#73 Infobar and categorical string properties

next release
closed
#194
5
2017-08-18
2015-12-18
Nils Kriege
No

String properties may be used to assign one of a small set of categorical values, such as {red, green, blue}, to a molecule, or they may represent unique identifiers such as SMILES. In the first case it would be nice to show the distribution of molecules over the values in the infobar. However, this does not make sense in the latter case.

Since we do not distinguish these different types of string properties, this is currently not supported. There are three solutions to add support:

  1. Distinguish these types of properties, which would require storing the type in the database and allowing the user to select the type when importing data.

  2. Count the number of different values a string property takes and decide the type heuristically, e.g., properties with fewer than 10 different values are considered categorical.

  3. Allow the user to select all string properties and show a warning message for string properties with many different values, which explains the problem and lets the user confirm the choice.

The first solution is cleaner, but the other solutions should be sufficient for most use cases and do not require a more complicated import process. I would propose implementing the third solution. What do you think? Are there other parts of the program that would benefit from the first solution?
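
To make solutions 2 and 3 concrete, here is a minimal sketch of the distinct-value check they both rely on. `StringPropertyClassifier` and its methods are hypothetical names for illustration and are not part of Scaffold Hunter; the cutoff of 10 is just the example value from above.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the Scaffold Hunter code base. */
final class StringPropertyClassifier {

    private StringPropertyClassifier() {}

    /**
     * Counts distinct non-null values but stops as soon as {@code cap}
     * distinct values have been seen, so the result is min(distinct, cap).
     * Assumes {@code cap} is positive.
     */
    static int countDistinctCapped(Collection<String> values, int cap) {
        Set<String> seen = new HashSet<>();
        for (String v : values) {
            if (v != null && seen.add(v) && seen.size() >= cap) {
                break;
            }
        }
        return seen.size();
    }

    /** Solution 2: categorical if fewer than {@code maxCategories} distinct values occur. */
    static boolean looksCategorical(Collection<String> values, int maxCategories) {
        return countDistinctCapped(values, maxCategories) < maxCategories;
    }

    /** Solution 3: the property stays selectable, but a warning should be shown. */
    static boolean needsWarning(Collection<String> values, int maxCategories) {
        return !looksCategorical(values, maxCategories);
    }
}
```

With solution 2 the result of `looksCategorical` would decide whether the property is offered at all; with solution 3 the same check would only trigger the confirmation dialog.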

Discussion

  • Till Schäfer

    Till Schäfer - 2015-12-18

    Solution 1 is clean, but not very flexible. The type must be selected during import, which is not very practical from a user's point of view and adds another layer of complexity. Solution 2 seems somewhat opaque and might be interpreted as a bug by the user (why is property XY not shown?). Furthermore, there might be a few scaffold subtrees that contain only a few distinct category values, so a mapping still makes sense there even if the total number is too high.

    => I would go for solution 3 or solution 4 (see below).

    Solution 4: Show a dynamic categorical infobar that puts all categories below a frequency threshold into a category "the rest". This category might have a special visual appearance to distinguish it from the other categories (e.g., a striped filling or something like that).
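
A rough sketch of how the "the rest" bucket in solution 4 could be computed from per-category counts. The class name, the label constant, and the `minCount` parameter are assumptions made for this example, not existing Scaffold Hunter code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the Scaffold Hunter code base. */
final class RestBucket {

    /** Assumed display label for the collapsed categories. */
    static final String REST_LABEL = "the rest";

    private RestBucket() {}

    /**
     * Keeps every category whose count is at least {@code minCount} and sums
     * all remaining categories into a single REST_LABEL entry.
     * Insertion order of the kept categories is preserved.
     */
    static Map<String, Integer> collapseRare(Map<String, Integer> counts, int minCount) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int rest = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                result.put(e.getKey(), e.getValue());
            } else {
                rest += e.getValue();
            }
        }
        if (rest > 0) {
            // merge() also covers the unlikely case of a real category named "the rest"
            result.merge(REST_LABEL, rest, Integer::sum);
        }
        return result;
    }
}
```

Because the threshold is applied to whatever counts are passed in, the same call would work per scaffold subtree, which is what makes the infobar "dynamic".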

     
  • Nils Kriege

    Nils Kriege - 2016-11-10
    • assigned_to: Philipp Mewes
     
  • Philipp Mewes

    Philipp Mewes - 2017-01-13
    • status: open --> in-progress
     
  • Philipp Mewes

    Philipp Mewes - 2017-01-13

    Some code in the IntervalPanel already implements a mapping for string properties. It could be used by editing only a few lines of code. However, it looks like there are still some issues. Perhaps the interval comboboxes are not updated if another string property is selected. [6feefc] enables the support for string properties so far.

     

    Related

    Commit: [6feefc]

  • Philipp Mewes

    Philipp Mewes - 2017-01-19

    Fixed the remaining problems with [841d1e] and [4a395d]. I think the feature is implemented now. Please have a look and check whether it looks okay.

     

    Related

    Commit: [4a395d]
    Commit: [841d1e]

  • Philipp Mewes

    Philipp Mewes - 2017-01-19
    • status: in-progress --> needs-review
     
  • Till Schäfer

    Till Schäfer - 2017-01-24

    Subtree accumulation does not work (see screenshot: the two-ring node in the center of the image should have some blue in it).
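
For reference, the expected behaviour could be sketched roughly like this, assuming that subtree accumulation means a node shows its own category counts plus those of all its descendants. `TreeNodeSketch` is a made-up class for illustration and does not correspond to the actual scaffold tree classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical tree node for illustration; not the actual Scaffold Hunter scaffold class. */
final class TreeNodeSketch {

    /** Category counts of the molecules attached directly to this node. */
    final Map<String, Integer> ownCounts = new HashMap<>();
    final List<TreeNodeSketch> children = new ArrayList<>();

    TreeNodeSketch add(String category, int count) {
        ownCounts.merge(category, count, Integer::sum);
        return this;
    }

    TreeNodeSketch child(TreeNodeSketch c) {
        children.add(c);
        return this;
    }

    /** Returns this node's counts plus the accumulated counts of all descendants. */
    Map<String, Integer> accumulated() {
        Map<String, Integer> total = new HashMap<>(ownCounts);
        for (TreeNodeSketch c : children) {
            c.accumulated().forEach((k, v) -> total.merge(k, v, Integer::sum));
        }
        return total;
    }
}
```

In this sketch, a parent with only "red" molecules of its own and a child with "blue" molecules would report both colors from `accumulated()`, while `ownCounts` stays unchanged.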

     
  • Till Schäfer

    Till Schäfer - 2017-01-24

    For some string properties the numerical interval panel is shown. Example: Tutorial Dataset / PUBCHEM_MOLECULAR_FORMULA (see screenshots).

     
  • Till Schäfer

    Till Schäfer - 2017-01-24

    Screenshot: PUBCHEM_MOLECULAR_FORMULA in the table view

     
  • Till Schäfer

    Till Schäfer - 2017-01-24

    The last bug fails silently. Can you please have a look through the code and check whether exceptions are thrown on errors?

     
  • Till Schäfer

    Till Schäfer - 2017-01-24
    • status: needs-review --> re-opened
     
  • Till Schäfer

    Till Schäfer - 2017-01-24
    • labels: --> scaffold tree view, property mapping
     
  • Philipp Mewes

    Philipp Mewes - 2017-01-26
    • status: re-opened --> in-progress
     
  • Philipp Mewes

    Philipp Mewes - 2017-01-31

    The number of distinct string values is limited to 10 (see SinglePropertyPanel, l.427), but PUBCHEM_MOLECULAR_FORMULA contains more than 500 distinct values. I think this limitation is reasonable, but how should we handle properties that exceed it? Just remove them from the dropdown list?

     
    • Till Schäfer

      Till Schäfer - 2017-02-17

      I think that it should always be possible to select every value, but the default color mapping should only include the most frequent values (let's say with a limit of 10) and only values that have a frequency larger than some X (e.g., no singleton values).

       

      Last edit: Till Schäfer 2017-02-17
      • Philipp Mewes

        Philipp Mewes - 2017-02-20

        Misunderstood your proposal at first, sorry. Sounds good. How should we handle the case where a property consists only of singleton values? There must be at least one string interval to select.

         

        Last edit: Philipp Mewes 2017-03-01
        • Nils Kriege

          Nils Kriege - 2017-03-02

          I think there is no need to handle this case differently. I would propose to sort the strings with the same frequency (one in case of singletons) lexicographically and just show the top ten.
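
A small sketch of the ordering proposed here: sort by frequency in descending order, break ties lexicographically, and keep the first ten. The class and method names are invented for this example; the frequency map is assumed to be supplied by whatever produces the per-value counts.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration; not part of the Scaffold Hunter code base. */
final class DefaultValueSelection {

    private DefaultValueSelection() {}

    /**
     * Returns up to {@code limit} values, most frequent first.
     * Values with equal frequency (e.g. all singletons) are ordered lexicographically.
     */
    static List<String> topValues(Map<String, Long> frequencies, int limit) {
        Comparator<Map.Entry<String, Long>> byFrequencyDesc =
                (a, b) -> Long.compare(b.getValue(), a.getValue());
        Comparator<Map.Entry<String, Long>> order =
                byFrequencyDesc.thenComparing(Map.Entry::getKey);

        return frequencies.entrySet().stream()
                .sorted(order)
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

For a property that consists only of singletons, every frequency is 1, so the result is simply the first `limit` values in lexicographic order and no special case is needed.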

           
  • Philipp Mewes

    Philipp Mewes - 2017-02-09

    After some work I was able to implement the subtree accumulation. See [985f41] for details.

     

    Related

    Commit: [985f41]

  • Philipp Mewes

    Philipp Mewes - 2017-03-06

    The DbManager now supports requesting the frequency of distinct string values ([836e4a]).

    Implemented the feature in the view ([c85279]). At most 10 (non-singleton) values are displayed now. This also improves performance for properties with many distinct values, since at most 10 intervals have to be added to the respective panel (see PUBCHEM_MOLECULAR_FORMULA).

    Handled the special case of singleton values as proposed ([2d0092] and [2579df]).

     

    Related

    Commit: [2579df]
    Commit: [2d0092]
    Commit: [836e4a]
    Commit: [c85279]

  • Philipp Mewes

    Philipp Mewes - 2017-03-06

    Should the manual also be checked for updates related to this feature?

     
    • Nils Kriege

      Nils Kriege - 2017-03-06

      Yes, please.

       
  • Philipp Mewes

    Philipp Mewes - 2017-03-08

    Updated the manual ([bf9f33]).

     

    Related

    Commit: [bf9f33]

  • Philipp Mewes

    Philipp Mewes - 2017-03-08
    • status: in-progress --> needs-review
     
  • Nils Kriege

    Nils Kriege - 2017-03-08
    • status: needs-review --> re-opened
     
  • Nils Kriege

    Nils Kriege - 2017-03-08

    Thank you for implementing this feature. Some minor issues still need to be fixed:

    • The size of the combo boxes depends on the length of the longest possible string value. Could you try to use edu.udo.scaffoldhunter.gui.util.SteppedComboBox instead?
    • It would be nice if the dialog were resizable.
    • The checkbox 'Fit interval borders to current subset' does not make sense for string properties. Could you replace it with 'Restrict to values of current subset' when a string property is selected and implement the functionality (show only values of the current subset and determine frequencies based on the current subset)?
    • Loading the panel for string properties takes a long time, which I suppose is caused by the database queries for finding the distinct string values and counting their frequencies. Could you check whether this could be sped up by adding a database index? If so, the index should be added to the Hibernate XML files.
    • The limitation to ten entries does not work (after removing an element, it is possible to add an arbitrary number of new elements). Actually, there is no need for this restriction. Just show up to ten strings by default and let the user decide whether to add more. This should also be clarified in the manual.
     

    Last edit: Nils Kriege 2017-03-08
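
To illustrate the 'Restrict to values of current subset' option, here is a minimal in-memory sketch that determines the frequencies from the current subset only. The class, the method, and the use of integer molecule ids are assumptions for this example; the real implementation would presumably query the database through the DbManager instead.

```java
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not part of the Scaffold Hunter code base. */
final class SubsetFrequencies {

    private SubsetFrequencies() {}

    /**
     * Counts how often each string value occurs among the molecules of the
     * given subset. Molecules without a value for the property are skipped,
     * and values that only occur outside the subset do not appear at all.
     */
    static Map<String, Integer> countWithin(Map<Integer, String> valueByMolecule,
                                            Collection<Integer> subsetIds) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Integer id : subsetIds) {
            String value = valueByMolecule.get(id);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

The keys of the returned map are the values that would be offered in the combo boxes when the checkbox is ticked, and the counts would drive the default selection of up to ten strings.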