Name | Modified | Size | Downloads / Week
---|---|---|---
ReadMe.txt | 2015-12-15 | 2.8 kB | |
DenseBlocks_0.1.0_DLL.zip | 2015-12-12 | 20.9 kB | |
DenseBlocks_0.1.0_SourceCode.zip | 2015-12-12 | 90.3 kB | |
Totals: 3 Items | | 114.0 kB | 1
DenseBlocks 0.1.0
Copyright (C) 2015 Philipp Gnoyke

######### Changelog #########

### Version 0.1.0 ###

# DenseBlocks is an unsupervised classification method for n-dimensional data, in the current version restricted to integers.
# It works by dividing each dimension of feature space across the training data range into an equal number of intervals, resulting in a "gridded" feature space with a total number of cells equal to nIntervals ^ nDimensions.
# A number of such grids is created with varying numbers of intervals, from a specified maximum down to a grid consisting of only one cell across all dimensions.
# For each resolution the presence or absence of training data points is recorded. For one dimension it could look like this:
    nResolutions = 3; maxIntervals = 3
    Res 3: 0 0 1
    Res 2: 0 1
    Res 1: 1
# With the help of those grids a density count is created, e.g.:
    Den 3: 1 1 3
  For each cell at the highest resolution a cell center is calculated. Then every resolution is checked, and the density count is incremented whenever the covering, lower-resolution cell that the center falls into contains training data. (See the first sketch at the end of this ReadMe.)
# A specified cutoff parameter is then used to calculate a threshold density value, which should ideally separate the different clusters from each other. The respective cells are labelled accordingly.
# The remaining unlabelled cells constitute the clusters; each cluster is iteratively "flooded" to assign a common class ID to all of its cells. (See the second sketch below.)
# After all clusters have been classified, the border cells are assigned according to their Euclidean distance to the cluster centers and the relative importance (size) of each cluster, so that larger classes are preferred over very small ones. (See the third sketch below.)
# The resulting multidimensional blocked feature space object can then be used to classify any related actual data.
# Runtime is comparatively fast: clustering 1000 training points in 3 dimensions on a mid-range machine should not take more than 1-2 seconds. No kind of parallelization is implemented.
# The algorithm is not very sensitive to the relative frequencies of points belonging to different clusters, especially because it records presence rather than abundance.
# It is, however, sensitive to sample size: the more training points you use, the higher the number of resolutions, the maximum number of intervals at the finest resolution (maxIntervals), and the cutoff parameter should be.
# Accordingly, arriving at the desired number of clusters needs some tuning. With well separated classes, however, this can be done rather quickly.
# The library also contains sampling, conversion and classifier classes to cluster Bitmaps.
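
### Illustrative sketches ###

# Below is a minimal 1-D sketch in C# of the presence grids and the density count described above. All identifiers (GridDemo, BuildPresenceGrids, DensityCount) are made up for illustration and are not taken from the DenseBlocks API; the library itself works on n dimensions, where each grid has nIntervals ^ nDimensions cells.

```csharp
using System;
using System.Linq;

// Hypothetical names throughout; not the DenseBlocks API.
class GridDemo
{
    // One presence grid per resolution; resolution r uses r intervals
    // here, i.e. maxIntervals equals nResolutions for simplicity.
    static bool[][] BuildPresenceGrids(int[] data, int nResolutions)
    {
        int min = data.Min(), max = data.Max();
        double range = Math.Max(1, max - min);      // guard against a zero-width range
        var grids = new bool[nResolutions][];
        for (int r = 1; r <= nResolutions; r++)
        {
            grids[r - 1] = new bool[r];
            foreach (int x in data)
            {
                int cell = Math.Min(r - 1, (int)((x - min) / range * r));
                grids[r - 1][cell] = true;          // record presence, not abundance
            }
        }
        return grids;
    }

    // For each cell of the finest grid, count the resolutions whose
    // covering cell (the one the cell center falls into) holds data.
    static int[] DensityCount(bool[][] grids)
    {
        int n = grids[grids.Length - 1].Length;
        var density = new int[n];
        for (int c = 0; c < n; c++)
        {
            double center = (c + 0.5) / n;          // cell center mapped to [0, 1)
            foreach (var grid in grids)
                if (grid[(int)(center * grid.Length)]) density[c]++;
        }
        return density;
    }

    static void Main()
    {
        int[] training = { 1, 2, 2, 9, 10 };
        bool[][] grids = BuildPresenceGrids(training, 3);
        for (int r = grids.Length; r >= 1; r--)
            Console.WriteLine($"Res {r}: " +
                string.Join(" ", grids[r - 1].Select(b => b ? 1 : 0)));
        Console.WriteLine($"Den {grids.Length}: " +
            string.Join(" ", DensityCount(grids)));
    }
}
```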
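
# Next, a sketch of the thresholding and flooding steps on a 2-D density grid. The 4-neighbor connectivity and the label conventions are assumptions; the ReadMe does not specify how DenseBlocks defines adjacency.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names; not the DenseBlocks API.
class FloodDemo
{
    const int Border = 0;       // cells below the density threshold
    const int Unlabelled = -1;  // dense cells not yet assigned to a cluster

    // Label all cells: below-threshold cells become borders, and every
    // remaining connected region is flooded with a common class ID.
    static int[,] FloodClusters(int[,] density, int threshold)
    {
        int h = density.GetLength(0), w = density.GetLength(1);
        var labels = new int[h, w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                labels[y, x] = density[y, x] >= threshold ? Unlabelled : Border;

        int nextId = 1;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (labels[y, x] == Unlabelled)
                    Flood(labels, y, x, nextId++);
        return labels;
    }

    // Iterative 4-neighbor flood fill; a stack avoids deep recursion.
    static void Flood(int[,] labels, int y0, int x0, int id)
    {
        var stack = new Stack<(int y, int x)>();
        stack.Push((y0, x0));
        while (stack.Count > 0)
        {
            var (y, x) = stack.Pop();
            if (y < 0 || y >= labels.GetLength(0) ||
                x < 0 || x >= labels.GetLength(1) ||
                labels[y, x] != Unlabelled)
                continue;
            labels[y, x] = id;
            stack.Push((y + 1, x)); stack.Push((y - 1, x));
            stack.Push((y, x + 1)); stack.Push((y, x - 1));
        }
    }

    static void Main()
    {
        int[,] density =
        {
            { 3, 3, 1, 0, 0 },
            { 3, 2, 0, 0, 2 },
            { 1, 0, 0, 2, 3 },
        };
        int[,] labels = FloodClusters(density, threshold: 2);
        for (int y = 0; y < labels.GetLength(0); y++)
        {
            for (int x = 0; x < labels.GetLength(1); x++)
                Console.Write(labels[y, x] + " ");
            Console.WriteLine();
        }
    }
}
```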
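
# Finally, a sketch of the border assignment. The weighting shown here (Euclidean distance divided by cluster size) is only one plausible reading; the ReadMe states merely that distance and cluster size are combined so that larger classes are preferred.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical names; not the DenseBlocks API. Border cells carry
// label 0 and clusters carry positive class IDs, as in the sketch above.
class BorderDemo
{
    // Assign each border cell to the cluster whose center is nearest in
    // Euclidean distance, scaled by cluster size so larger classes win.
    static void AssignBorders(int[,] labels)
    {
        int h = labels.GetLength(0), w = labels.GetLength(1);

        // Collect the member cells of every cluster.
        var cells = new Dictionary<int, List<(int y, int x)>>();
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (labels[y, x] > 0)
                {
                    if (!cells.TryGetValue(labels[y, x], out var list))
                        cells[labels[y, x]] = list = new List<(int y, int x)>();
                    list.Add((y, x));
                }

        // Cluster centers and sizes, computed once before reassignment.
        var centers = cells.ToDictionary(
            kv => kv.Key,
            kv => (y: kv.Value.Average(c => (double)c.y),
                   x: kv.Value.Average(c => (double)c.x),
                   size: kv.Value.Count));

        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (labels[y, x] == 0)
                    labels[y, x] = centers.OrderBy(kv =>
                        Math.Sqrt((y - kv.Value.y) * (y - kv.Value.y) +
                                  (x - kv.Value.x) * (x - kv.Value.x))
                        / kv.Value.size)                 // assumed size weighting
                        .First().Key;
    }

    static void Main()
    {
        int[,] labels =
        {
            { 1, 1, 0, 0, 0 },
            { 1, 1, 0, 0, 2 },
            { 0, 0, 0, 2, 2 },
        };
        AssignBorders(labels);
        for (int y = 0; y < labels.GetLength(0); y++)
        {
            for (int x = 0; x < labels.GetLength(1); x++)
                Console.Write(labels[y, x] + " ");
            Console.WriteLine();
        }
    }
}
```

# Classifying new data afterwards presumably reduces to computing a point's cell index at the finest resolution and reading off the stored class ID.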