I have had trouble understanding how hierarchical trees are generated in the hierarchical clustering application.
To illustrate the problems, one can use the following toy data (genes in rows, samples in columns):
S1 S2 S3 S4 S5
G1 2 3 4 -2 -4
G2 3 2 1 -3 -1
G3 1 1 1 1 1
and cluster it using Euclidean distance and complete linkage. If I do this in MeV, right-click the resulting clustering image and select 'Save Sample Node Heights', I get the following node heights:
Node_0 Exp_1 Exp_2 1.4142135
Node_1 Node_0 Exp_3 2.828427
Node_2 Exp_4 Exp_5 2.828427
Node_3 Node_1 Node_2 8.246211
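To double-check these numbers independently of MeV, here is a minimal from-scratch Python sketch (not MeV's code; the function names are illustrative) that clusters the five samples with Euclidean distance and complete linkage. Ties are broken by list order, so the child order within a merge may differ from MeV's file, but the heights should agree: √2 ≈ 1.414 for Node_0, √8 ≈ 2.828 for both Node_1 and Node_2, and √68 ≈ 8.246 for the root.

```python
import math
from itertools import combinations

# Toy data from the post: rows are genes G1..G3, columns are samples S1..S5.
DATA = [
    [2, 3, 4, -2, -4],  # G1
    [3, 2, 1, -3, -1],  # G2
    [1, 1, 1, 1, 1],    # G3
]


def sample_vectors(data):
    """Return one vector per sample (column), since the samples are clustered."""
    return [list(col) for col in zip(*data)]


def complete_linkage(vectors):
    """Naive agglomerative clustering, Euclidean distance, complete linkage.

    Returns merges as (left, right, height); leaves are named Exp_1..Exp_n
    and internal nodes Node_0, Node_1, ... in merge order. Ties are broken
    by taking the first minimal pair in list order.
    """
    clusters = [(f"Exp_{i + 1}", [i]) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: the cluster distance is the largest
            # pairwise distance between members of the two clusters.
            d = max(math.dist(vectors[i], vectors[j])
                    for i in clusters[a][1] for j in clusters[b][1])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        name = f"Node_{len(merges)}"
        merges.append((clusters[a][0], clusters[b][0], d))
        merged = (name, clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


for left, right, height in complete_linkage(sample_vectors(DATA)):
    print(left, right, round(height, 6))
```

The tie between Node_1 and Node_2 (both √8) is exactly what makes this data set a good test of whether equal heights are drawn at equal levels.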
These node heights are all correct; however, the tree is not drawn correctly in the images.
1) Using the 'Sample Tree properties' dialog (opened by right-clicking the image), one can select 'use true branch length structure'. When this is done, the terminal branches for three samples (S3, S4, S5) no longer reach the heat map (best seen by increasing the maximal pixel height, e.g. to 100). This is in agreement with the documentation on page 85, but it is not the correct behavior. The node heights above describe an ultrametric tree, so there is no reason for a terminal branch not to reach the heat map. In the generated tree the nodes are correctly placed, but the terminal lines should be extended to the heat map.
2) On the other hand, if 'use true branch length' is not selected, the terminal branches reach the heat map but the internal nodes are not correctly placed: Node_1 and Node_2 do not appear at the same height in the tree. Of course, one could argue that the generated tree is compatible with "not using true branch lengths", and it is in agreement with the documentation, but this behavior differs from that of all other clustering software I have tested. There, when actual distances are ignored, height steps are used such that nodes with the same true height still appear at the same height in the tree (while the relative relations between different true node heights are lost).
3) Problems 1 and 2 may be related to the fact that MeV generates incorrect Newick files. For this example I get the following Newick file from MeV:
whereas it should be:
Newick files are very convenient when using, for example, R to improve on the bitmap graphics generated by MeV, so it would be valuable if MeV generated correct Newick files.
4) I think there is a bug in how the 'Maximal pixel height' parameter in the 'Sample Tree properties' dialog is used. The parameter information says: "Maximum Pixel Distance (integer) is the maximum distance that any node can have. Nodes which are distant and would ordinarily have a node height greater than this value are constrained to appear this number of pixels above the lower level node." However, this parameter does not appear to introduce a constraint: with increasing parameter values, taller and taller trees are observed, as if all nodes always had a node height greater than the parameter value. I would expect the tree to remain identical for all parameter values above some threshold.
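Regarding point 3, here is a minimal Python sketch (not MeV's exporter; `to_newick` and the tuple layout are illustrative assumptions) of how an ultrametric tree can be written as Newick from the node-height file. It assumes each leaf sits at height 0 and each branch length is the parent's height minus the child's height, so every root-to-leaf path sums to the root height (8.246211). Internal nodes are left unlabeled, which Newick permits.

```python
# Merges from the node-height file: (node, left child, right child, height).
MERGES = [
    ("Node_0", "Exp_1", "Exp_2", 1.4142135),
    ("Node_1", "Node_0", "Exp_3", 2.828427),
    ("Node_2", "Exp_4", "Exp_5", 2.828427),
    ("Node_3", "Node_1", "Node_2", 8.246211),
]


def to_newick(merges):
    """Write an ultrametric tree as Newick. Assumes leaves sit at height 0
    and each branch length is parent height minus child height."""
    children = {node: (left, right) for node, left, right, _ in merges}
    height = {node: h for node, _, _, h in merges}

    def subtree(name, parent_height):
        h = height.get(name, 0.0)  # leaves are not in `height`: height 0
        length = parent_height - h
        if name in children:
            left, right = children[name]
            inner = f"({subtree(left, h)},{subtree(right, h)})"
        else:
            inner = name
        return f"{inner}:{length:.6f}"

    root = merges[-1][0]  # the last merge is the root
    left, right = children[root]
    return f"({subtree(left, height[root])},{subtree(right, height[root])});"


print(to_newick(MERGES))
```

With this convention the terminal branches for Exp_3, Exp_4 and Exp_5 are 2.828427 long, i.e. they run all the way down to the leaf level, which is the behavior asked for in point 1.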
Thanks for this report.
You are right that the behavior you expect to see does not occur in MeV at this time, but it should.
Here is an explanation of what you are seeing:
Node heights are displayed as the node's own height plus the node height of its children. In other words, the drawn height is not a distance from the heatmap, but a distance from the node's child.
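To make the difference concrete, here is a minimal Python sketch of three ways to turn the node heights from the report into vertical positions. `anchored_layout` measures from the heatmap (leaves at 0, so every terminal branch reaches it), which is what the report expects with true branch lengths; `step_layout` replaces heights by their rank so that equal heights share a level, as the report describes for other software; `depth_layout` stacks one level per merge on top of the deeper child. All function names are hypothetical, and whether MeV's layout behaves like `depth_layout` is an assumption, not something confirmed in this thread.

```python
# Merges from the node-height file: (node, left child, right child, height).
MERGES = [
    ("Node_0", "Exp_1", "Exp_2", 1.4142135),
    ("Node_1", "Node_0", "Exp_3", 2.828427),
    ("Node_2", "Exp_4", "Exp_5", 2.828427),
    ("Node_3", "Node_1", "Node_2", 8.246211),
]


def anchored_layout(merges):
    """Distance from the heatmap: each node sits at its true height and
    leaves at 0, so every terminal branch reaches the heatmap."""
    return {node: h for node, _, _, h in merges}


def step_layout(merges):
    """Ignore actual distances but keep ties: distinct heights are replaced
    by their rank (1, 2, 3, ...), so equal heights share a level."""
    levels = sorted({h for _, _, _, h in merges})
    rank = {h: i + 1 for i, h in enumerate(levels)}
    return {node: rank[h] for node, _, _, h in merges}


def depth_layout(merges):
    """For contrast (hypothetical, not confirmed as MeV's method): one level
    per merge above the deeper child, leaves at 0. Merges must be listed
    children-first. Equal true heights can land on different levels."""
    y = {}
    for node, left, right, _ in merges:
        y[node] = 1 + max(y.get(left, 0), y.get(right, 0))
    return y


for name, layout in [("anchored", anchored_layout),
                     ("steps", step_layout),
                     ("depth", depth_layout)]:
    drawn = layout(MERGES)
    print(name, {k: round(v, 3) for k, v in drawn.items()})
```

In the step layout Node_1 and Node_2 both land on level 2, whereas the depth layout puts Node_1 one level above Node_2 because it sits on top of Node_0.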
With regard to your other issues:
3.) We will look into the Newick file generation bug. Thanks for reporting.
4.) The "maximum pixel height" parameter draws the greatest node height at that height, in pixels. All other heights are then scaled in proportion to it. Most likely, all heights appear to be equal to the maximum because they are not significantly less than the maximum node height.
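The proportional scaling described in 4.) can be sketched in a few lines of Python. This is an illustration, not MeV code: `to_pixels` is a hypothetical name, and rounding to whole pixels is an assumption. Because the largest node height is always mapped to the maximum, every other node scales with it, so the whole tree grows as the parameter increases, which matches what point 4 of the report observed.

```python
# Internal-node heights from the node-height file.
MERGE_HEIGHTS = {"Node_0": 1.4142135, "Node_1": 2.828427,
                 "Node_2": 2.828427, "Node_3": 8.246211}


def to_pixels(heights, max_pixel_height):
    """Scale node heights so the greatest one is drawn max_pixel_height
    pixels tall and all others proportionally (assumed: whole pixels)."""
    top = max(heights.values())
    return {node: round(h / top * max_pixel_height)
            for node, h in heights.items()}


for max_px in (20, 100):
    print(max_px, to_pixels(MERGE_HEIGHTS, max_px))
```

Under this scheme the parameter acts as a scale factor rather than a cap, so no value exists above which the tree stops changing.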
We will make the changes to the tree display as soon as possible. Thanks!
Thanks for your quick and clear response.
> 4.) The "maximum pixel height" parameter sets the greatest node height equal to that height, in pixels.
> All other heights are then set to be proportional to that pixel and node height. Most likely,
> all heights appear to be equal to the maximum because they are not significantly
> less than the maximum node height.
OK, I understand. Your explanation is very clear. Perhaps you could use it to replace the incorrect information about "maximum pixel height" that is shown when clicking the "i" button in the "Tree configuration" section of the tree properties dialog.
> We will make the changes to the tree display as soon as possible.
Perfect. It would be great if MeV could draw trees with actual node heights measured from the heatmap. Perhaps you could also consider adding a scale to the trees; that would make it simpler to understand how the trees are drawn under the various parameter settings.