Kmeans algorithm:
Clustering refers to finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups.
Kmeans is a flat clustering algorithm (http://nlp.stanford.edu/IR-book/html/.../k-means-1.html#sec:kmeans). It is iterative: the first step selects k initial centroids (also called seeds), which are randomly chosen data points. In the second step, k clusters are formed by assigning each point to its closest centroid, and the centroid of each cluster is then recomputed. The second step is repeated until a stopping criterion is met (for example, the centroids no longer change).
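The two steps described above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the implementation used for our results: the function names, the max_iter cap, the fixed random seed, and the handling of empty clusters are all assumptions made for the example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, labels) for a list of numeric tuples.

    max_iter and seed are illustrative parameters, not part of the
    algorithm description above.
    """
    rng = random.Random(seed)
    # Step 1: pick k data points at random as the initial centroids (seeds).
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2a: assign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Stopping criterion: the assignments, and hence the centroids,
        # did not change since the previous iteration.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 2b: recompute each centroid as the mean of its cluster.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

On this toy data the three points near the origin end up in one cluster and the three points near (10, 10) in the other. Which cluster receives label 0 depends on the random seeds.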
The "closeness" can be measured with various distance measures, similarity measures, or correlation. We experimented with several distance measures on our data and they gave similar results, so the results reported here use Euclidean distance. Another parameter of the kmeans algorithm is the number of clusters. Since our data set is small (a matrix of size 39 x 49), two was selected as the number of clusters. The output of the kmeans clustering was validated against domain knowledge and was found to give meaningful results.
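To illustrate how the distance measure enters the algorithm, the sketch below passes it as a parameter to the assignment step. Manhattan and cosine distance are chosen here only as examples of alternatives to Euclidean distance. They are assumptions for the illustration, as are the toy points and centroids, and are not a record of the measures actually tried on our data.

```python
import math


def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 minus cosine similarity; undefined for all-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def assign(points, centroids, dist):
    """Index of the closest centroid for each point, under `dist`."""
    return [
        min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
        for p in points
    ]


# Hypothetical toy data: two points near each of two centroids.
points = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.8)]
centroids = [(1.0, 0.0), (0.0, 1.0)]

assignments = {
    d.__name__: assign(points, centroids, d)
    for d in (euclidean, manhattan, cosine_distance)
}
```

On well-separated data such as this, all three measures produce the same assignment, which is the kind of agreement reported above. On other data they can disagree, so the comparison is worth repeating whenever the data changes.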
"Anomaly Detection"
A typical problem definition of anomaly detection is: Given a dataset D, find all the data points x belonging to D having the top-n largest anomaly scores calculated by a predetermined function f(x). It is associated with numerous challenges, like:
Application areas include credit card fraud detection, telecommunication fraud detection, network intrusion detection, and fault detection.
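The problem definition above (score every point of D with f(x), then report the top-n) can be sketched as follows. The definition leaves f unspecified, so the score used here, the distance from a point to its k-th nearest neighbour, is only an assumed example of a distance-based f. The function names and the toy data are likewise hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kth_neighbor_distance(i, dataset, k=2):
    """Assumed f(x): distance from dataset[i] to its k-th nearest other point."""
    dists = sorted(
        euclidean(dataset[i], y) for j, y in enumerate(dataset) if j != i
    )
    return dists[k - 1]


def top_n_anomalies(dataset, n, score=kth_neighbor_distance):
    """Return the n (point, score) pairs with the largest anomaly scores."""
    scores = [score(i, dataset) for i in range(len(dataset))]
    ranked = sorted(range(len(dataset)), key=lambda i: scores[i], reverse=True)
    return [(dataset[i], scores[i]) for i in ranked[:n]]


# Hypothetical toy data: four points in a tight group and one far away.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
result = top_n_anomalies(data, n=1)
```

Here the isolated point (8, 8) receives the largest score and is returned as the single top anomaly. Any other scoring function with the same signature can be passed as score.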
The general steps for any anomaly detection algorithm include:
Types of anomaly detection schemes: graphical and statistical-based; distance-based; model-based.