This tool combines Random Forest (RF) and Partitioning Around Medoids (PAM) to cluster observations and to calculate the dissimilarity between them. It supports on-line prediction for new observations without retraining, and it handles datasets that contain both continuous (e.g. CPU load) and categorical (e.g. VM instance type) features. Specifically, we use an unsupervised formulation of the Random Forest algorithm to calculate the similarities between observations and pass them as input to a clustering algorithm. To stay efficient and to meet the dynamism requirement of autonomic clouds, our methodology consists of two steps: (i) off-line clustering and (ii) on-line prediction.
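As a rough illustration of how forest-based similarities can be turned into dissimilarities, the sketch below takes the leaf that each observation reaches in each tree and counts how often two observations share a leaf. This is not the tool's own code: the function name `rf_dissimilarity`, the sample leaf ids, and the `sqrt(1 - proximity)` transform are assumptions chosen for illustration. In practice the leaf ids would come from a trained forest, for example from the `apply` method of a scikit-learn forest.

```python
import math


def rf_dissimilarity(leaves):
    """Build a dissimilarity matrix from per-tree leaf assignments.

    leaves[i][t] is the id of the leaf reached by observation i in tree t.
    Proximity is the fraction of trees in which two observations share a
    leaf; the sqrt(1 - proximity) transform is an assumed, common choice.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            same = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
            proximity = same / n_trees
            d[i][j] = d[j][i] = math.sqrt(1.0 - proximity)
    return d


# Hypothetical leaf ids for 3 observations in a forest of 4 trees.
leaves = [
    [0, 2, 1, 3],
    [0, 2, 1, 0],
    [5, 4, 6, 7],
]
D = rf_dissimilarity(leaves)
```

Because only leaf membership is compared, the same computation applies whether the underlying features are continuous, categorical, or a mix of both.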
RF+PAM can:
Cluster observations (Unsupervised Learning)
Calculate the dissimilarity between two or more observations (how different they are)
Unsupervised Random Forest
On-line Unsupervised Random Forest
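The clustering and on-line capabilities listed above can be sketched as follows. This is a minimal, simplified PAM (greedy build followed by improving swaps) over a precomputed dissimilarity matrix, plus an on-line step that assigns a new observation to its nearest medoid. It is an assumption about how the two steps fit together, not the tool's implementation; the names `pam`, `total_cost`, and `assign_online` and the sample matrix are hypothetical.

```python
def total_cost(D, medoids):
    """Sum of each observation's dissimilarity to its closest medoid."""
    return sum(min(D[i][m] for m in medoids) for i in range(len(D)))


def pam(D, k):
    """Simplified PAM on a symmetric dissimilarity matrix D."""
    n = len(D)
    # BUILD: greedily add the observation that lowers the total cost most.
    medoids = []
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        best = min(candidates, key=lambda c: total_cost(D, medoids + [c]))
        medoids.append(best)
    # SWAP: replace a medoid with a non-medoid while that lowers the cost.
    improved = True
    while improved:
        improved = False
        best_cost = total_cost(D, medoids)
        for pos in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:pos] + [c] + medoids[pos + 1:]
                cost = total_cost(D, trial)
                if cost < best_cost:
                    medoids, best_cost, improved = trial, cost, True
    # Label each observation with the position of its closest medoid.
    labels = [min(range(k), key=lambda j: D[i][medoids[j]]) for i in range(n)]
    return medoids, labels


def assign_online(d_to_medoids):
    """On-line step: pick the cluster whose medoid is least dissimilar."""
    return min(range(len(d_to_medoids)), key=lambda j: d_to_medoids[j])


# Hypothetical dissimilarities for 4 observations forming two groups.
D = [
    [0.0, 0.1, 0.9, 1.0],
    [0.1, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.2],
    [1.0, 0.9, 0.2, 0.0],
]
medoids, labels = pam(D, 2)

# A new observation only needs its dissimilarity to each medoid.
new_cluster = assign_online([0.7, 0.2])
```

The off-line step runs once over the full matrix, while the on-line step compares a new observation against the k medoids only, which is why no retraining is needed.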
Status: Beta
Brought to you by:
uriarte
Linux