SOM centroid graph

Jeffrey Ku
  • Jeffrey Ku

    Jeffrey Ku - 2010-11-10

    In the centroid graph of SOM, there are two lines, red and blue.
    The red line is the average value, but what does the blue line mean?


  • John Braisted

    John Braisted - 2010-11-12

    Hi Jeffrey,

    One line is indeed the centroid, or average expression vector, of all genes assigned to the cluster at the end of the process. You've identified that one.

    The other is an SOM node vector. Each SOM node you selected to apply to the data holds a representative expression vector, and during the process the genes influence these vectors. The gene expression vectors are like coordinates in n-dimensional space, and the genes closest to a particular node vector have the greatest influence on that node vector's value. The node vectors move through this space to try to 'fit' the genes, and as a node vector moves (that is, as its values change), it ideally comes to represent a collection of genes. Training runs for a number of iterations, and on each iteration one gene is used to train the system (move the node vectors). Once the nodes stop moving, each gene is assigned to its closest node, and these assignments form the clusters (one per node).
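    A minimal Python sketch of the training loop described above, for illustration only. It uses a winner-only update (equivalent to the bubble neighborhood with radius below 1 recommended further down). The initialisation from randomly chosen genes, the linear learning-rate decay, and the names train_som, alpha0 and seed are assumptions, not MeV's actual implementation.

```python
import random
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_node(nodes: List[Vector], gene: Vector) -> int:
    """Index of the node vector nearest to this gene."""
    return min(range(len(nodes)), key=lambda k: sq_dist(nodes[k], gene))


def train_som(genes: List[Vector], n_nodes: int, iterations: int,
              alpha0: float = 0.05, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Online SOM training with a winner-only update, then nearest-node assignment."""
    rng = random.Random(seed)
    # Assumed initialisation: copy randomly chosen genes into the node vectors.
    nodes = [list(g) for g in rng.sample(genes, n_nodes)]
    for t in range(iterations):
        gene = rng.choice(genes)                 # one gene trains the system per iteration
        alpha = alpha0 * (1.0 - t / iterations)  # assumed linear decay of the learning rate
        w = closest_node(nodes, gene)            # the winning (closest) node
        # Move the winner part of the way toward the training gene.
        nodes[w] = [v + alpha * (g - v) for v, g in zip(nodes[w], gene)]
    # Training done: each gene joins the cluster of its closest node (one cluster per node).
    assignment = [closest_node(nodes, g) for g in genes]
    return nodes, assignment
```

    With two well-separated groups of genes and two nodes, each node should settle near one group, and the final assignment splits the genes into those two clusters.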

    That node vector is the other line you see. What's important to note is how well it represents the cluster: does it overlay the mean (centroid) vector? If it's way off for one or more clusters (some will naturally fit better than others), then you probably need to adjust the input parameters.
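    The "does it overlay the centroid?" check can also be made numeric. The sketch below computes each cluster's centroid (the red line) and its Euclidean distance from the cluster's node vector (the blue line), then flags clusters beyond a cutoff. The function names and the threshold are hypothetical; a sensible cutoff depends on the scale of your expression data.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(members: Sequence[Vector]) -> List[float]:
    """Average expression vector of the genes in one cluster."""
    n = len(members)
    return [sum(column) / n for column in zip(*members)]


def node_centroid_gaps(nodes: Sequence[Vector],
                       clusters: Sequence[Sequence[Vector]]) -> List[float]:
    """Euclidean distance between each node vector and the centroid of its cluster."""
    gaps = []
    for node, members in zip(nodes, clusters):
        if not members:
            gaps.append(math.nan)  # empty cluster: no centroid to compare against
        else:
            gaps.append(math.dist(node, centroid(members)))
    return gaps


def poorly_fitting(gaps: Sequence[float], threshold: float) -> List[int]:
    """Indices of clusters whose node vector sits far from the centroid (or that are empty)."""
    return [k for k, gap in enumerate(gaps) if math.isnan(gap) or gap > threshold]
```

    A node that lies exactly on its cluster mean gives a gap of 0; large gaps, or empty clusters, suggest revisiting the node count, iterations or neighborhood settings.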

    Number of nodes: this determines the number of clusters. You can try FOM (figure of merit) or just experiment with SOM to find a good fit.
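    For a rough idea of what a figure of merit measures, here is a sketch of the leave-one-condition-out version: cluster on all conditions but one, then measure how tightly the held-out condition groups within those clusters, and sum over conditions. This follows the published FOM definition as I understand it; MeV's FOM module may differ in details. The cluster_fn callback is a hypothetical stand-in for whatever clustering you are evaluating (for example SOM run with a given number of nodes).

```python
import math
from typing import Callable, Dict, List

Matrix = List[List[float]]
# A clustering function takes a gene-by-condition matrix and returns one label per gene.
ClusterFn = Callable[[Matrix], List[int]]


def figure_of_merit(data: Matrix, cluster_fn: ClusterFn) -> float:
    """Aggregate leave-one-condition-out figure of merit (lower means tighter clusters)."""
    n_genes = len(data)
    n_conditions = len(data[0])
    total = 0.0
    for e in range(n_conditions):
        # Cluster using every condition except e ...
        reduced = [row[:e] + row[e + 1:] for row in data]
        labels = cluster_fn(reduced)
        # ... then collect the held-out condition's values per cluster.
        members: Dict[int, List[float]] = {}
        for gene, label in enumerate(labels):
            members.setdefault(label, []).append(data[gene][e])
        # Squared deviation of the held-out values from their cluster means.
        sq_err = 0.0
        for values in members.values():
            mean = sum(values) / len(values)
            sq_err += sum((v - mean) ** 2 for v in values)
        total += math.sqrt(sq_err / n_genes)
    return total
```

    You would compute this for several node counts and compare. Because the FOM tends to fall as clusters are added, it is usually more informative to look for where the curve stops dropping steeply than for the absolute minimum.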

    Number of iterations: The default is too low. On each iteration one gene is selected to train the system of nodes. You want full participation (every gene should get a chance), so I would set this to 10 times your gene count.
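    Why 10 times the gene count gives "full participation" can be checked with a little arithmetic, assuming each iteration picks one gene uniformly at random with replacement (an assumption; MeV's selection scheme may differ). The chance that a given gene is never picked in T iterations is (1 - 1/n)^T. The helper names below are hypothetical.

```python
import random


def recommended_iterations(n_genes: int, factor: int = 10) -> int:
    """Rule of thumb from the post: iterations = 10 x number of genes."""
    return factor * n_genes


def prob_never_selected(n_genes: int, iterations: int) -> float:
    """Chance a given gene is never drawn, with uniform draws with replacement."""
    return (1.0 - 1.0 / n_genes) ** iterations


def simulate_unused(n_genes: int, iterations: int, seed: int = 0) -> int:
    """Count genes that never get a turn to train the nodes in one simulated run."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(iterations):
        seen.add(rng.randrange(n_genes))
    return n_genes - len(seen)
```

    For 1000 genes, running only 1000 iterations leaves each gene unused with probability of about e^-1, roughly 37%, so about a third of the genes never train the nodes. With 10000 iterations that probability drops to about e^-10 (around 0.00005), so in practice every gene participates.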

    Neighborhood options: I use bubble and keep the radius below 1. This is a very simple approach where only the node closest to the gene that's training the system moves at all. In my experience, if the neighborhood is too large (large radius), all the nodes just slide around and you get more noise in your clusters.
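    To see why a bubble radius below 1 means only the winning node moves, here is a sketch of a bubble neighborhood. Every node within the radius of the winner on the SOM grid gets the same update, and nodes outside get none. It assumes a rectangular grid with Euclidean distance between grid positions; MeV may use a different topology (for example hexagonal) or grid distance, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Position = Tuple[int, int]


def bubble_neighbors(winner: Position, grid_shape: Tuple[int, int],
                     radius: float) -> List[Position]:
    """All grid positions within `radius` of the winning node (the winner itself is at distance 0)."""
    rows, cols = grid_shape
    wr, wc = winner
    return [(r, c) for r in range(rows) for c in range(cols)
            if math.hypot(r - wr, c - wc) <= radius]


def bubble_update(nodes: Dict[Position, Vector], winner: Position,
                  grid_shape: Tuple[int, int], radius: float,
                  gene: Vector, alpha: float) -> None:
    """Move every node inside the bubble toward the gene by the same fraction alpha."""
    for pos in bubble_neighbors(winner, grid_shape, radius):
        nodes[pos] = [v + alpha * (g - v) for v, g in zip(nodes[pos], gene)]
```

    On a 3x3 grid with the winner in the centre, a radius of 0.5 selects only the winner, a radius of 1.0 adds its four direct neighbors, and a radius of 1.5 covers all nine nodes, which is the "everything slides around" situation described above.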

    Hope this helps.

    John B.

  • Jeffrey Ku

    Jeffrey Ku - 2010-11-15

    Hi John,

    This is very helpful.
    Thank you.


