The Map-Reduce paradigm [<http://cacm.acm.org/magazines/2010/1/.../fulltext>] was explored for knowledge discovery from nuclear reactor simulation data, because this data is expected to grow quickly to a large scale. Hadoop [<http://hadoop.apache.org/>], an open-source implementation of Map-Reduce, was used for this study.
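As a rough illustration of the paradigm, the sketch below runs the map, shuffle, and reduce phases in a single Python process. It does not use Hadoop or its API; the record layout (a variable name paired with a sampled value) and the function names `run_job`, `mapper`, and `reducer` are assumptions made only for this example.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group all values by key, as a Map-Reduce framework does between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to the list of values grouped under it."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    """Chain the three phases in-process (a real cluster distributes them)."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Hypothetical simulation records: (variable name, sampled value).
records = [
    ("pressure", 2.0),
    ("temperature", 300.0),
    ("pressure", 4.0),
    ("temperature", 310.0),
]


def mapper(record):
    """Emit one (key, value) pair per record, keyed by variable name."""
    name, value = record
    return [(name, value)]


def reducer(key, values):
    """Average all values that share a key."""
    return sum(values) / len(values)


result = run_job(records, mapper, reducer)
print(result)  # {'pressure': 3.0, 'temperature': 305.0}
```

Because the mapper sees one record at a time and the reducer sees one key at a time, both phases can be spread across many machines, which is what makes the approach attractive for large simulation outputs.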
For a preliminary investigation, k-means clustering [<http://nlp.stanford.edu/IR-book/html/.../k-means-1.html>], available from Mahout [<http://mahout.apache.org/>], was employed. The results were similar to those we published in the paper "Knowledge Discovery from Nuclear Reactor Simulation Data".
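For readers unfamiliar with k-means, here is a minimal sketch of the standard iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is plain Python, not Mahout's implementation; the sample points, the initial centroids, and the names `kmeans` and `nearest` are assumptions for illustration only. It needs Python 3.8 or later for `math.dist`.

```python
import math


def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, max_iter=100):
    """Run the assign/update loop until the centroids stop moving or max_iter is hit."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)

        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                mean = tuple(sum(coords) / len(members) for coords in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(old)

        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids

    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical 2-D points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, labels = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)  # [(0.0, 0.5), (10.0, 10.5)]
print(labels)     # [0, 0, 1, 1]
```

The assignment step works on each point independently and the update step works on each cluster independently, which is why the algorithm maps naturally onto a map phase and a reduce phase.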
A few important resources to learn Map-Reduce are:
A few important resources for Hadoop are:
Some resources for learning Mahout are:
Here are the steps we used for working with Hadoop:
Help for the various command-line options for k-means can be found at: https://cwiki.apache.org/MAHOUT/k-means-commandline.html....