The Map-Reduce paradigm[<http://cacm.acm.org/magazines/2010/1/.../fulltext>] was explored for knowledge discovery from nuclear reactor simulation data, since this data is expected to grow to a large scale quickly. Hadoop[<http://hadoop.apache.org/>], an open-source implementation of Map-Reduce, was used for this study.
For a preliminary investigation, k-means clustering[<http://nlp.stanford.edu/IR-book/html/.../k-means-1.html>], available from Mahout[<http://mahout.apache.org/>], was employed. The results were similar to those we published in the paper "Knowledge Discovery from Nuclear Reactor Simulation Data".
Help for the various command-line options for k-means can be found at: https://cwiki.apache.org/MAHOUT/k-means-commandline.html
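To make the connection between k-means and Map-Reduce concrete, the following is a minimal, single-process Python sketch of how one k-means iteration can be phrased as a map step (assign each point to its nearest centroid) followed by a reduce step (average the points in each group). It is an illustration only and is not Mahout's implementation or the code used in this study; the function names (`map_point`, `reduce_cluster`, `kmeans_iteration`, `kmeans`) and the parameters `max_iter` and `tol` are hypothetical choices made for this example.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_point(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reduce_cluster(points):
    """Reduce step: average all points that share one centroid index."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """One k-means iteration as map, group-by-key (shuffle), reduce."""
    groups = defaultdict(list)
    for point in points:
        key, value = map_point(point, centroids)
        groups[key].append(value)
    # Assumption: a centroid with no assigned points keeps its old position.
    return [
        reduce_cluster(groups[i]) if i in groups else tuple(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Repeat iterations until no centroid moves more than tol, or max_iter is hit."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if all(dist(a, b) <= tol for a, b in zip(updated, centroids)):
            return updated
        centroids = updated
    return centroids
```

For example, `kmeans([(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)], [(0.0, 0.0), (10.0, 10.0)])` groups the first two and the last two points. In a real Hadoop job the map and reduce functions would run on separate workers and each iteration would be a separate job; here everything runs in one process purely to show the data flow.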
For more information on Hadoop, see the Hadoop website, which contains many useful resources, including a tutorial on how to use Hadoop.
More details on Mahout and its uses can be found at the Apache Mahout website. The developers keep the website very well maintained, and you can find a great introduction to k-Means clustering, in addition to information on many other methods.
Some helpful materials on MapReduce include: