I'm new to this forum & also quite a beginner with the program MeV, so my question may seem a bit ridiculous to some of you, but I'll ask it anyway, as I couldn't figure it out on my own :).
I'm doing my PhD in plant physiology, so let me give some background: I have a gene expression data set of around 1100 genes, already narrowed down from 25000 genes, that are specific to a particular organelle in the cell. The data set is also a time course, & sampling was done in the AM & PM. I played around with MeV & did some HCL, SOMs & KMCs to bring some order into the data. I have already decided to analyze AM & PM separately, because that makes things clearer. I also chose Euclidean distance, after trying Pearson & all the others.
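The choice between Euclidean and Pearson distance matters because they measure different things: Euclidean distance compares absolute expression levels, while a Pearson-based distance compares only the shape of the profiles. The minimal sketch below assumes 1 - r as the Pearson-based distance (a common convention; MeV's exact formula may differ), and the three profiles are hypothetical, not real data.

```python
import math

def euclidean(x, y):
    """Straight-line distance between two expression profiles (sensitive to magnitude)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_distance(x, y):
    """1 - Pearson correlation: near 0 for same-shaped profiles, 2 for opposite trends."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1 - cov / (sx * sy)

# Hypothetical time-course profiles.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]  # same shape as gene_a, ten times the level
gene_c = [4.0, 3.0, 2.0, 1.0]      # similar level to gene_a, opposite trend

print(euclidean(gene_a, gene_b) > euclidean(gene_a, gene_c))                # True
print(pearson_distance(gene_a, gene_b) < pearson_distance(gene_a, gene_c))  # True
```

With Euclidean distance, gene_a looks closest to gene_c; with the Pearson-based distance, it looks closest to gene_b. That is why the clustering can change so much when the metric changes.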
My problem, or rather my question, is: what exactly is "Gene leaf order optimization"? What do I need it for (or not need it for), or better, what exactly does it do to my data? Depending on whether I use the optimization or not, I get a different HCL and also different clusters in the KMC analyses. When I classify my 20 chosen clusters by UP or DOWN regulation of specific genes, I end up with different numbers of genes being up- or down-regulated.
So what does the gene leaf order optimization do to my dataset?
I hope some of you understand what I mean; if not, I can try to explain it in more detail.
Thanks a lot anyway!
A dendrogram from a hierarchical clustering analysis has a fixed tree structure but, by default, an arbitrary leaf ordering. In other words, the branches of the tree are set, but the two branches at any node can be swapped without changing the underlying structure. Think of it as rotating the arms of a node around its stem. Because of this, the leaf order is arbitrary, so the relationship between two adjacent leaves may have no meaning at all if they don't belong to the same cluster.
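To make the "rotating the arms" idea concrete, here is a minimal sketch that represents a tree as nested Python tuples (an assumption for illustration; this is not how MeV stores trees, and the gene names are hypothetical). Swapping the two branches at a node changes the left-to-right leaf order, but the groups of leaves under each node, which are what define the clusters, stay exactly the same.

```python
def leaf_order(tree):
    """Left-to-right leaf order of a binary tree given as nested 2-tuples."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaf_order(left) + leaf_order(right)

def flip(tree):
    """Swap the two branches at the root node (a 'rotation' of that node)."""
    left, right = tree
    return (right, left)

def clusters(tree):
    """Set of leaf groups under every internal node, i.e. the tree structure itself."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    return {frozenset(leaf_order(tree))} | clusters(left) | clusters(right)

tree = (("g1", "g2"), ("g3", ("g4", "g5")))
rotated = flip(tree)

print(leaf_order(tree))                     # ['g1', 'g2', 'g3', 'g4', 'g5']
print(leaf_order(rotated))                  # ['g3', 'g4', 'g5', 'g1', 'g2']
print(clusters(tree) == clusters(rotated))  # True
```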
What optimal ordering does is find the best orientation for each of the n-1 internal nodes of a tree with n leaves (each node can be flipped or not, which gives 2^(n-1) possible leaf orders) such that the sum of the distances between all adjacent leaves is minimized. The algorithm used in this feature is described in this paper, along with a number of potentially useful applications.
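The objective can be illustrated with a brute-force sketch: try every combination of node flips and keep the leaf order with the smallest sum of adjacent distances. This is only an illustration of what is being minimized, not the algorithm from the paper. The nested-tuple tree, the gene names, and the 1-D positions standing in for a real distance matrix are all hypothetical.

```python
from itertools import product

def leaf_order(tree):
    """Left-to-right leaf order of a binary tree given as nested 2-tuples."""
    if isinstance(tree, str):
        return [tree]
    return leaf_order(tree[0]) + leaf_order(tree[1])

def internal_nodes(tree, path=()):
    """Yield the path (tuple of 0/1 child indices) to every internal node."""
    if isinstance(tree, str):
        return
    yield path
    yield from internal_nodes(tree[0], path + (0,))
    yield from internal_nodes(tree[1], path + (1,))

def apply_flips(tree, flips, path=()):
    """Rebuild the tree, swapping the branches of every node whose path is in flips."""
    if isinstance(tree, str):
        return tree
    left = apply_flips(tree[0], flips, path + (0,))
    right = apply_flips(tree[1], flips, path + (1,))
    return (right, left) if path in flips else (left, right)

def adjacent_cost(order, dist):
    """Sum of distances between neighbouring leaves: the quantity being minimized."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))

def brute_force_optimal_order(tree, dist):
    """Try all 2**(n-1) flip combinations; feasible only for tiny trees."""
    nodes = list(internal_nodes(tree))
    best = None
    for choice in product([False, True], repeat=len(nodes)):
        flips = {p for p, f in zip(nodes, choice) if f}
        order = leaf_order(apply_flips(tree, flips))
        cost = adjacent_cost(order, dist)
        if best is None or cost < best[0]:
            best = (cost, order)
    return best

# Hypothetical genes placed on a line; distance = absolute difference.
pos = {"g1": 1.0, "g2": 0.0, "g3": 3.0, "g4": 2.0}
dist = {a: {b: abs(pos[a] - pos[b]) for b in pos} for a in pos}
tree = (("g1", "g2"), ("g3", "g4"))

print(adjacent_cost(leaf_order(tree), dist))  # 5.0
print(brute_force_optimal_order(tree, dist))  # (3.0, ['g2', 'g1', 'g4', 'g3'])
```

For 4 leaves there are only 2^3 = 8 orderings, but for around 1100 genes there would be 2^1099, so enumeration is hopeless. Practical methods for this problem use dynamic programming instead, and even those are expensive for large trees.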
Optimal leaf ordering is computationally intensive, so it is not enabled by default.
Changing the leaf order does not affect the underlying tree structure, so you should get the same clusters with or without this feature; the HCL display looks different only because the leaves are drawn in a different order. If you are getting different clusters with KMC, the most likely reason is that k-means is not deterministic: it typically starts from a random initialization, so it can return a different result on each run.
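The non-determinism of k-means can be reproduced with a tiny sketch. Below is a plain Lloyd-style k-means on hypothetical 1-D values (this is a generic implementation, not MeV's KMC code). Starting from two different sets of initial centroids, the same data converges to two different partitions, each of which is stable. A random initialization simply picks one of these starting points at random.

```python
import random

def kmeans_1d(points, centroids, max_iter=100):
    """Plain Lloyd-style k-means on 1-D values, starting from the given centroids."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return labels

def partition(points, labels):
    """Clusters as sorted lists, so the arbitrary cluster numbering does not matter."""
    return sorted(sorted(p for p, lab in zip(points, labels) if lab == j) for j in set(labels))

def random_start_kmeans(points, k, seed=None):
    """Use k randomly chosen data points as starting centroids (one common initialization)."""
    rng = random.Random(seed)
    return kmeans_1d(points, rng.sample(points, k))

points = [0, 1, 5, 6, 10, 11]  # hypothetical 1-D "expression values"
print(partition(points, kmeans_1d(points, [0, 1])))    # [[0, 1], [5, 6, 10, 11]]
print(partition(points, kmeans_1d(points, [10, 11])))  # [[0, 1, 5, 6], [10, 11]]
```

Calling `random_start_kmeans(points, 2)` several times can return different partitions, depending on which points are drawn as starting centroids. Running k-means repeatedly and comparing the results is one way to see how stable the clusters are.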
Thank you very much for the fast answer, & especially for pointing me to the publication! Even though I don't know much about these algorithms & have little experience with bioinformatics & programming, it was really helpful. I'm a very visual person, so the graphs & analyses in the paper helped me understand it much better than before!