Bugs in hierarchical trees?

Help
2009-06-01
2013-05-20
  • Markus Ringnér

    Markus Ringnér - 2009-06-01

    I have been having trouble understanding how hierarchical trees are generated in the hierarchical clustering application.

    To illustrate the problems one can use the following toy data:

            S1      S2     S3     S4      S5
    G1      2       3       4       -2       -4
    G2      3       2       1       -3       -1
    G3      1       1       1        1         1

    and cluster it using Euclidean distance and Complete linkage. If I do that in MeV and right-click on the resulting clustering image and select 'Save Sample Node Heights', I get the following node heights:

    Node_0  Exp_1     Exp_2    1.4142135
    Node_1  Node_0  Exp_3    2.828427
    Node_2  Exp_4     Exp_5    2.828427
    Node_3  Node_1  Node_2  8.246211
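
The node heights above can be checked independently of MeV. What follows is a minimal Python sketch of complete-linkage clustering on the sample columns of the toy data, using only the standard library. The function name `complete_linkage` and the `Node_n` labels are illustrative choices, and ties between equally distant pairs are broken by iteration order, which may differ from what MeV does.

```python
import math
from itertools import combinations

# Toy data from the post: one (G1, G2, G3) vector per sample column.
samples = {
    "S1": (2, 3, 1),
    "S2": (3, 2, 1),
    "S3": (4, 1, 1),
    "S4": (-2, -3, 1),
    "S5": (-4, -1, 1),
}

def complete_linkage(points):
    """Return merges as (node, left, right, height), in merge order."""
    clusters = {name: [name] for name in points}  # label -> member samples
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Complete linkage: Euclidean distance between the two
            # farthest members of the candidate clusters.
            d = max(math.dist(points[p], points[q])
                    for p in clusters[a] for q in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        node = f"Node_{len(merges)}"
        clusters[node] = clusters.pop(a) + clusters.pop(b)
        merges.append((node, a, b, d))
    return merges

merges = complete_linkage(samples)
for node, left, right, height in merges:
    print(node, left, right, round(height, 6))
```

The four heights it reports are 1.414214, 2.828427, 2.828427 and 8.246211, in agreement with the values saved from MeV.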

    These node heights are all correct; however, the tree is not drawn correctly in the images.

    1) Using the 'Sample Tree properties' dialog (by right-clicking the image) one can select 'use true branch length structure'. When this is selected, the terminal branches for three samples (S3, S4, S5) no longer reach the heat-map (best visualized by increasing the maximal pixel height, e.g. to 100). This is in agreement with the documentation on page 85, but it is not the correct behavior. The node heights (above) describe an ultrametric tree, and there is no reason for a terminal node not to reach the heat-map. In the generated tree the nodes are correctly placed, but the terminal lines should be extended to the heat-map.

    2) On the other hand, if 'use true branch length' is not selected the terminal nodes reach the heat map but the nodes are not correctly placed. In this case Node_1 and Node_2 do not appear on the same height in the tree. Of course one could argue that the generated tree is compatible with "not using true branch lengths" and it is in agreement with the documentation, but this behavior is different from all other clustering software I have tested: when actual distances are ignored, height steps are used such that nodes with the same true height still appear at the same height in the tree (whereas relative relations between different true node heights are lost). 
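
The "height steps" behavior described in point 2 can be sketched as a ranking of the distinct node heights, so that nodes with equal true height share a step. This is only an illustration of the idea in Python, not the code of any particular clustering package; the name `height_steps` is hypothetical, and exact float equality is assumed to be good enough for the saved values.

```python
# Node heights as saved from MeV in the post.
node_heights = {
    "Node_0": 1.4142135,
    "Node_1": 2.828427,
    "Node_2": 2.828427,
    "Node_3": 8.246211,
}

def height_steps(heights):
    """Map each node to a 1-based step; equal heights share a step."""
    distinct = sorted(set(heights.values()))
    step_of = {h: i + 1 for i, h in enumerate(distinct)}
    return {node: step_of[h] for node, h in heights.items()}

steps = height_steps(node_heights)
print(steps)  # {'Node_0': 1, 'Node_1': 2, 'Node_2': 2, 'Node_3': 3}
```

Node_1 and Node_2 end up on the same step, while the relative sizes of the true heights are discarded.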

    3) Problems 1 and 2 may be related to the fact that MeV generates incorrect Newick files. For this example I get the following Newick file from MeV:

    (((S1:0.70710677,S2:0.70710677):1.4142135,S3:1.4142135):4.1231055,(S4:1.4142135,S5:1.4142135):4.1231055):0.0;

    whereas it should be:

    (((S1:1.4142135,S2:1.4142135):1.4142135,S3:2.828427):5.417784,(S4:2.828427,S5:2.828427):5.417784):0.0;

    Newick files are very convenient when using for example R to improve on the bitmapped graphics generated by MeV, so it would be valuable if correct Newick files were generated.
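
The corrected file in point 3 follows from one rule: the length of each branch is the node height of the parent minus the node height of the child, with the samples themselves at height 0. A minimal Python sketch of that rule, using the merge table saved from MeV, is given below. The helper names are illustrative, and branch lengths are printed with five decimals purely to keep the output stable.

```python
# Merge table from the post: node -> (left child, right child, node height).
merges = {
    "Node_0": ("S1", "S2", 1.4142135),
    "Node_1": ("Node_0", "S3", 2.828427),
    "Node_2": ("S4", "S5", 2.828427),
    "Node_3": ("Node_1", "Node_2", 8.246211),
}

def height(name):
    # Samples (leaves) sit on the heat map, at height 0.
    return merges[name][2] if name in merges else 0.0

def subtree(name, parent_height):
    """Newick text for `name`; branch length = parent height - own height."""
    branch = parent_height - height(name)
    if name not in merges:
        return f"{name}:{branch:.5f}"
    left, right, h = merges[name]
    return f"({subtree(left, h)},{subtree(right, h)}):{branch:.5f}"

def to_newick(root):
    # The root has no parent, so its own branch length is 0.
    return subtree(root, height(root)) + ";"

newick = to_newick("Node_3")
print(newick)
# (((S1:1.41421,S2:1.41421):1.41421,S3:2.82843):5.41778,(S4:2.82843,S5:2.82843):5.41778):0.00000;
```

Up to rounding, this is the tree given above as the expected output.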

    4) I think there is a bug in how the 'Maximal pixel height' parameter in the 'Sample Tree properties' dialog is used. In the parameter information it is written "Maximum Pixel Distance (integer) is the maximum distance that any node can have. Nodes which are distant and would ordinarily have a node height greater than this value are constrained to appear this number of pixels above the lower level node." However, this parameter does not appear to introduce a constraint: with increasing parameter values, higher and higher trees are observed, as if all nodes always had a node height greater than this parameter value. I would expect the tree to remain identical for parameter values above some threshold.

    Best Regards,

    Markus

     
    • Daniel Schlauch

      Daniel Schlauch - 2009-06-01

      Hi Markus,

      Thanks for this report.

      You are right that the behavior you expect to see does not occur in MeV at this time, but it should. 
      Here is an explanation of what you are seeing:
      Node heights are displayed as the node height + the node height of its children.  In other words, the height is not a distance from the heatmap, but a distance from the node's child.
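
One possible reading of this description can be sketched in Python as follows. It is an interpretation, not MeV's drawing code: the assumption that a node is stacked on top of its taller child, and the name `drawn_position`, are both hypothetical.

```python
# Merge table from the original post: node -> (left, right, node height).
merges = {
    "Node_0": ("S1", "S2", 1.4142135),
    "Node_1": ("Node_0", "S3", 2.828427),
    "Node_2": ("S4", "S5", 2.828427),
    "Node_3": ("Node_1", "Node_2", 8.246211),
}

def drawn_position(name):
    """Distance from the heat map if each height is measured from a child."""
    if name not in merges:
        return 0.0  # samples sit on the heat map
    left, right, h = merges[name]
    # Assumption: the node is drawn its own height above the taller child.
    return h + max(drawn_position(left), drawn_position(right))

positions = {node: drawn_position(node) for node in merges}
for node, pos in positions.items():
    print(node, round(pos, 4))
```

Under this reading Node_1 would be drawn at about 4.243 and Node_2 at about 2.828, although both have a true node height of 2.828427.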

      In regards to your other issues:
      3.) We will look into the Newick file generation bug.  Thanks for reporting.
      4.) The "maximum pixel height" parameter sets the greatest node height equal to that height, in pixels.  All other heights are then set to be proportional to that pixel and node height.  Most likely, all heights appear to be equal to the maximum because they are not significantly less than the maximum node height.
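
The proportional scaling described in point 4 can be sketched in Python as below. This is a simplified illustration under the assumption of plain linear scaling with rounding to whole pixels; `pixel_heights` is a hypothetical name and the code is not taken from MeV.

```python
# Node heights from the original post.
node_heights = {
    "Node_0": 1.4142135,
    "Node_1": 2.828427,
    "Node_2": 2.828427,
    "Node_3": 8.246211,
}

def pixel_heights(heights, max_pixel_height):
    """Scale heights so the tallest node gets exactly max_pixel_height pixels."""
    tallest = max(heights.values())
    return {node: round(h / tallest * max_pixel_height)
            for node, h in heights.items()}

at_100 = pixel_heights(node_heights, 100)
at_200 = pixel_heights(node_heights, 200)
print(at_100)  # {'Node_0': 17, 'Node_1': 34, 'Node_2': 34, 'Node_3': 100}
print(at_200)  # {'Node_0': 34, 'Node_1': 69, 'Node_2': 69, 'Node_3': 200}
```

With this kind of scaling the whole tree grows with the parameter, so the value acts as a scale factor rather than as a cap on individual nodes.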

      We will make the changes to the tree display as soon as possible.  Thanks!

      Dan

       
    • Markus Ringnér

      Markus Ringnér - 2009-06-04

      Hi Dan,

      Thanks for your quick and clear response.

      > You are right that the behavior you expect to see does not occur in MeV at this time, but it should. 
      > Here is an explanation of what you are seeing:
      > Node heights are displayed as the node height + the node height of its children. In other words,
      > the height is not a distance from the
      > heatmap, but a distance from the node's child.
      >
      > In regards to your other issues:
      > 3.) We will look into the Newick file generation bug. Thanks for reporting.
      > 4.) The "maximum pixel height" parameter sets the greatest node height equal to that height, in pixels.
      > All other heights are then set to be proportional to that pixel and node height. Most likely,
      > all heights appear to be equal to the maximum because they are not significantly
      > less than the maximum node height.
      >

      OK, I understand. Your explanation is very clear. Perhaps you could use it to replace the incorrect information about
      "maximum pixel height" that is obtained by clicking on the "i"-button in the tree properties dialog's "Tree configuration".

      > We will make the changes to the tree display as soon as possible.

      Perfect, it would be great if one could use MeV to draw trees with actual node heights measured from the heatmap. Perhaps you could consider adding a scale to the trees? It would then be simpler to understand how the trees are drawn under the various parameter settings.

      Markus

       
