Name | Modified | Size | Downloads / Week
---|---|---|---
ReadMe.txt | 2015-12-15 | 2.8 kB | |
DenseBlocks_0.1.0_DLL.zip | 2015-12-12 | 20.9 kB | |
DenseBlocks_0.1.0_SourceCode.zip | 2015-12-12 | 90.3 kB | |
Totals: 3 Items | | 114.0 kB | 1
DenseBlocks 0.1.0
Copyright (C) 2015 Philipp Gnoyke

######### Changelog #########

### Version 0.1.0 ###

# DenseBlocks is an unsupervised classification method for n-dimensional data, in the current version restricted to integers.
# It works by dividing each dimension of feature space across the training data range into an equal number of intervals, resulting in a "gridded" feature space with a total number of cells equal to nIntervals ^ nDimensions.
# A number of such grids is created with varying numbers of intervals, from a specified maximum down to a grid consisting of only one cell across all dimensions.
# For each resolution the presence or absence of training data points is recorded. For one dimension it could look like this:
    nResolutions = 3; maxIntervals = 3
    Res 3: 0 0 1
    Res 2: 0 1
    Res 1: 1
# With the help of those grids a density count is created, e.g.:
    Den 3: 1 1 3
  For each cell at the highest resolution a cell center is calculated. Then every resolution is checked, and the density count is incremented whenever the covering, lower-resolution cell that the center falls into contains training data. (See the first sketch at the end of this ReadMe.)
# A specified cutoff parameter is then used to calculate a threshold density value, which should ideally separate the different clusters from each other. The respective cells are labelled accordingly.
# The remaining unlabelled cells constitute the clusters; each cluster is iteratively "flooded" to assign a common class ID to all of its cells. (See the second sketch below.)
# After all clusters have been classified, the border cells are assigned according to their Euclidean distance to the cluster centers and the relative importance (size) of each cluster, so that larger classes are preferred over very small ones. (See the third sketch below.)
# The resulting multidimensional blocked feature space object can then be used to classify any related actual data.
# Runtime is comparatively fast: clustering 1000 training points in 3 dimensions on a mid-range machine should not take more than 1-2 seconds. No kind of parallelization is implemented.
# The algorithm is not very sensitive to the relative frequencies of points belonging to different clusters, especially because it records presence rather than abundance.
# It is, however, sensitive to sample size: the more training points you use, the higher the number of resolutions, the maximum number of intervals at the finest resolution (maxIntervals), and the cutoff parameter should be.
# Accordingly, arriving at the desired number of clusters needs some tuning. With well separated classes, however, this can be done rather quickly.
# The library also contains sampling, conversion and classifier classes to cluster Bitmaps.
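
### Illustrative sketches ###

# Below is a minimal 1-D sketch in C# of the presence grids and the density count described above. All identifiers (GridDemo, BuildPresenceGrids, DensityCount) are made up for illustration and are not taken from the DenseBlocks API; the library itself works on n dimensions, where each grid has nIntervals ^ nDimensions cells.

```csharp
using System;
using System.Linq;

// Hypothetical names throughout; not the DenseBlocks API.
class GridDemo
{
    // One presence grid per resolution; resolution r uses r intervals
    // here, i.e. maxIntervals equals nResolutions for simplicity.
    static bool[][] BuildPresenceGrids(int[] data, int nResolutions)
    {
        int min = data.Min(), max = data.Max();
        double range = Math.Max(1, max - min);      // guard against a zero-width range
        var grids = new bool[nResolutions][];
        for (int r = 1; r <= nResolutions; r++)
        {
            grids[r - 1] = new bool[r];
            foreach (int x in data)
            {
                int cell = Math.Min(r - 1, (int)((x - min) / range * r));
                grids[r - 1][cell] = true;          // record presence, not abundance
            }
        }
        return grids;
    }

    // For each cell of the finest grid, count the resolutions whose
    // covering cell (the one the cell center falls into) holds data.
    static int[] DensityCount(bool[][] grids)
    {
        int n = grids[grids.Length - 1].Length;
        var density = new int[n];
        for (int c = 0; c < n; c++)
        {
            double center = (c + 0.5) / n;          // cell center mapped to [0, 1)
            foreach (var grid in grids)
                if (grid[(int)(center * grid.Length)]) density[c]++;
        }
        return density;
    }

    static void Main()
    {
        int[] training = { 1, 2, 2, 9, 10 };
        bool[][] grids = BuildPresenceGrids(training, 3);
        for (int r = grids.Length; r >= 1; r--)
            Console.WriteLine($"Res {r}: " +
                string.Join(" ", grids[r - 1].Select(b => b ? 1 : 0)));
        Console.WriteLine($"Den {grids.Length}: " +
            string.Join(" ", DensityCount(grids)));
    }
}
```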
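
# Next, a sketch of the thresholding and flooding steps on a 2-D density grid. The 4-neighbor connectivity and the label conventions are assumptions; the ReadMe does not specify how DenseBlocks defines adjacency.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names; not the DenseBlocks API.
class FloodDemo
{
    const int Border = 0;       // cells below the density threshold
    const int Unlabelled = -1;  // dense cells not yet assigned to a cluster

    // Label all cells: below-threshold cells become borders, and every
    // remaining connected region is flooded with a common class ID.
    static int[,] FloodClusters(int[,] density, int threshold)
    {
        int h = density.GetLength(0), w = density.GetLength(1);
        var labels = new int[h, w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                labels[y, x] = density[y, x] >= threshold ? Unlabelled : Border;

        int nextId = 1;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (labels[y, x] == Unlabelled)
                    Flood(labels, y, x, nextId++);
        return labels;
    }

    // Iterative 4-neighbor flood fill; a stack avoids deep recursion.
    static void Flood(int[,] labels, int y0, int x0, int id)
    {
        var stack = new Stack<(int y, int x)>();
        stack.Push((y0, x0));
        while (stack.Count > 0)
        {
            var (y, x) = stack.Pop();
            if (y < 0 || y >= labels.GetLength(0) ||
                x < 0 || x >= labels.GetLength(1) ||
                labels[y, x] != Unlabelled)
                continue;
            labels[y, x] = id;
            stack.Push((y + 1, x)); stack.Push((y - 1, x));
            stack.Push((y, x + 1)); stack.Push((y, x - 1));
        }
    }

    static void Main()
    {
        int[,] density =
        {
            { 3, 3, 1, 0, 0 },
            { 3, 2, 0, 0, 2 },
            { 1, 0, 0, 2, 3 },
        };
        int[,] labels = FloodClusters(density, threshold: 2);
        for (int y = 0; y < labels.GetLength(0); y++)
        {
            for (int x = 0; x < labels.GetLength(1); x++)
                Console.Write(labels[y, x] + " ");
            Console.WriteLine();
        }
    }
}
```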
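
# Finally, a sketch of the border assignment. The weighting shown here (Euclidean distance divided by cluster size) is only one plausible reading; the ReadMe states merely that distance and cluster size are combined so that larger classes are preferred.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical names; not the DenseBlocks API. Border cells carry
// label 0 and clusters carry positive class IDs, as in the sketch above.
class BorderDemo
{
    // Assign each border cell to the cluster whose center is nearest in
    // Euclidean distance, scaled by cluster size so larger classes win.
    static void AssignBorders(int[,] labels)
    {
        int h = labels.GetLength(0), w = labels.GetLength(1);

        // Collect the member cells of every cluster.
        var cells = new Dictionary<int, List<(int y, int x)>>();
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (labels[y, x] > 0)
                {
                    if (!cells.TryGetValue(labels[y, x], out var list))
                        cells[labels[y, x]] = list = new List<(int y, int x)>();
                    list.Add((y, x));
                }

        // Cluster centers and sizes, computed once before reassignment.
        var centers = cells.ToDictionary(
            kv => kv.Key,
            kv => (y: kv.Value.Average(c => (double)c.y),
                   x: kv.Value.Average(c => (double)c.x),
                   size: kv.Value.Count));

        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (labels[y, x] == 0)
                    labels[y, x] = centers.OrderBy(kv =>
                        Math.Sqrt((y - kv.Value.y) * (y - kv.Value.y) +
                                  (x - kv.Value.x) * (x - kv.Value.x))
                        / kv.Value.size)                 // assumed size weighting
                        .First().Key;
    }

    static void Main()
    {
        int[,] labels =
        {
            { 1, 1, 0, 0, 0 },
            { 1, 1, 0, 0, 2 },
            { 0, 0, 0, 2, 2 },
        };
        AssignBorders(labels);
        for (int y = 0; y < labels.GetLength(0); y++)
        {
            for (int x = 0; x < labels.GetLength(1); x++)
                Console.Write(labels[y, x] + " ");
            Console.WriteLine();
        }
    }
}
```

# Classifying new data afterwards presumably reduces to computing a point's cell index at the finest resolution and reading off the stored class ID.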