Clustering_with_Hadoop

Neeti Pokhriyal Dasha

Help for the various kmeans command-line options can be found at: https://cwiki.apache.org/MAHOUT/k-means-commandline.html

    • Copy the clusteredPoints directory from HDFS to a local directory.
    • Run mahout seqdumper to write its contents out as text. An example is: mahout seqdumper -i /data/diff_unc_trans_kmeans_op/clusteredPoints/part-m-00000 -o ~/Data_from_Andrew/diff_unc_trans_hadoop_op/.
    • The dumped output can then be processed further for analysis.
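
As one possible form of the further processing mentioned in the last step, the following is a minimal Python sketch that counts how many points were assigned to each cluster. It assumes that each record in the seqdumper text output is a single line of the form "Key: <cluster id>: Value: <point>"; that layout is an assumption and may differ between Mahout versions, so check it against your own dump first. The function name count_points_per_cluster is hypothetical and not part of Mahout.

```python
import re
from collections import Counter

# Assumed record layout in the seqdumper text dump:
#   Key: <cluster id>: Value: <point>
# This is an assumption; inspect a few lines of your dump and adjust the pattern.
KEY_VALUE_LINE = re.compile(r"^Key: (\S+): Value: (.*)$")


def count_points_per_cluster(lines):
    """Count dumped records per cluster id (hypothetical helper, not a Mahout API)."""
    counts = Counter()
    for line in lines:
        match = KEY_VALUE_LINE.match(line.strip())
        if match:  # lines that do not look like records (headers, totals) are skipped
            counts[match.group(1)] += 1
    return counts
```

To use it, open the text file written by seqdumper and pass the file object to count_points_per_cluster; the returned Counter maps each cluster id (as a string) to its number of points.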