Name                              Modified    Size
ReadMe.txt                        2015-12-15  2.8 kB
DenseBlocks_0.1.0_DLL.zip         2015-12-12  20.9 kB
DenseBlocks_0.1.0_SourceCode.zip  2015-12-12  90.3 kB
Totals: 3 items                               114.0 kB
DenseBlocks 0.1.0 Copyright (C) 2015  Philipp Gnoyke


#########
Changelog
#########




### Version 0.1.0 ###

# DenseBlocks is an unsupervised classification (clustering) method for n-dimensional data; the 
  current version is restricted to integer data.

# It works by dividing each dimension of feature space, across the training data range, into an 
  equal number of intervals. This yields a "gridded" feature space with a total number of cells 
  equal to nIntervals ^ nDimensions.
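A minimal sketch of this gridding step in Python (the function name and signature are illustrative, not part of the DenseBlocks API):

```python
def cell_index(point, mins, maxs, n_intervals):
    """Map an n-dimensional point to its cell indices in a grid that
    splits each dimension's training range into n_intervals equal
    intervals. The full grid has n_intervals ** len(point) cells."""
    idx = []
    for x, lo, hi in zip(point, mins, maxs):
        width = (hi - lo) / n_intervals
        i = int((x - lo) / width)
        # points sitting exactly on the upper range edge go to the last cell
        idx.append(min(i, n_intervals - 1))
    return tuple(idx)
```

For example, with ranges 0..10 and 2 intervals per dimension, the point (5,) falls into the upper of the two cells, and a point at the upper edge (10,) is clamped into the same cell.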

# A number of such "grids" are created with varying amounts of intervals, from a specified maximum 
  number to a "grid" consisting only of one cell across all dimensions.

# For each resolution, the presence or absence of training data points is recorded. For one dimension 
  it could look like this:

  nResolutions = 3; maxIntervals = 3	

  Res 3:	0	0	1
  Res 2:	    0	    1
  Res 1:		1 	
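The multi-resolution presence recording can be sketched as follows (1-D only; the function is illustrative, not the library's code). With a single training point at 9 in the range 0-9 it reproduces the diagram above:

```python
def presence_grids(points, lo, hi, max_intervals):
    """Record presence (1) or absence (0) of training points at every
    resolution, from max_intervals intervals down to a single cell."""
    grids = {}
    for n in range(max_intervals, 0, -1):
        width = (hi - lo) / n
        grid = [0] * n
        for x in points:
            # clamp points on the upper range edge into the last cell
            i = min(int((x - lo) / width), n - 1)
            grid[i] = 1
        grids[n] = grid
    return grids
```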

# With the help of those grids, a density count is created as follows:

  Den 3:	1	1	3

  For each cell at the highest resolution, a cell center is calculated. The density count of that 
  cell is then incremented once for every resolution whose covering (lower-resolution) cell at the 
  center's position is occupied.
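A sketch of the density count for the 1-D example. The convention that a center lying exactly on a cell boundary belongs to the left cell is an assumption here, needed to reproduce the "Den 3: 1 1 3" example above:

```python
def density_counts(grids, lo, hi):
    """Density per highest-resolution cell: for that cell's center,
    count how many occupied covering cells (one per resolution)
    contain it. grids maps interval count -> presence list."""
    n_max = max(grids)
    width = (hi - lo) / n_max
    densities = []
    for c in range(n_max):
        center = lo + (c + 0.5) * width
        count = 0
        for n, grid in grids.items():
            w = (hi - lo) / n
            t = (center - lo) / w
            i = int(t)
            if i > 0 and t == i:
                i -= 1  # assumed: boundary centers belong to the left cell
            count += grid[min(i, n - 1)]
        densities.append(count)
    return densities
```

With the presence grids from the diagram (one point in the last third of the range), this yields the density count 1, 1, 3.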

# A specified cutoff parameter is then used to calculate a threshold density value, which ideally 
  should separate different clusters from each other. The respective (low-density) cells are 
  labelled accordingly.
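How the cutoff parameter maps to the threshold is not specified in this ReadMe; the sketch below assumes it is a fraction of the maximum density count, which is only one plausible choice:

```python
def split_by_threshold(densities, cutoff):
    """Split cell indices into dense (cluster) cells and low-density
    (border) cells. threshold = cutoff * max density is an assumption,
    not the documented DenseBlocks behaviour."""
    threshold = cutoff * max(densities)
    dense = [i for i, d in enumerate(densities) if d >= threshold]
    border = [i for i, d in enumerate(densities) if d < threshold]
    return dense, border
```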

# Remaining unlabelled cells constitute clusters; each cluster is iteratively "flooded" to assign a 
  common class ID to all of its cells.
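The flooding step is a standard connected-component labelling; a 2-D sketch with 4-connectivity (the connectivity used by DenseBlocks is not stated, so this is an assumption):

```python
from collections import deque

def flood_label(dense_cells):
    """Assign a common class ID to each connected component of the
    set of dense (above-threshold) cells, via breadth-first flooding.
    dense_cells is a set of (row, col) tuples."""
    labels = {}
    next_id = 0
    for start in dense_cells:
        if start in labels:
            continue
        next_id += 1
        labels[start] = next_id
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb in dense_cells and nb not in labels:
                    labels[nb] = next_id
                    queue.append(nb)
    return labels
```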

# After all clusters have been classified, the border cells are assigned based on the Euclidean 
  distance to the cluster centers and on the relative importance (size) of each cluster, so that 
  larger clusters are preferred over very small ones.
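The ReadMe does not specify how distance and cluster size are combined; the sketch below assumes distance divided by cluster size, which realises the stated preference for larger clusters:

```python
import math

def assign_border(cell_center, clusters):
    """Assign a border cell to the cluster minimizing Euclidean
    distance scaled by cluster size. clusters maps class ID ->
    (centroid, size); the distance/size scoring is an assumption."""
    best_id, best_score = None, float("inf")
    for cid, (centroid, size) in clusters.items():
        score = math.dist(cell_center, centroid) / size
        if score < best_score:
            best_id, best_score = cid, score
    return best_id
```

With a large cluster (size 10) at (0, 0) and a tiny one (size 1) at (4, 0), a border cell at (2.5, 0) is assigned to the large cluster even though the tiny one is nearer.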

# The resulting multidimensional blocked feature space object can then be used to classify any related actual data.
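Classifying new data then reduces to a cell lookup in the blocked feature space; a sketch (clamping out-of-range points to the nearest edge cell is an assumption):

```python
def classify(point, mins, maxs, n_intervals, cell_labels):
    """Classify a point by locating its cell in the blocked feature
    space and returning that cell's class ID. cell_labels maps
    cell-index tuples to class IDs (illustrative, not the DLL's API)."""
    idx = []
    for x, lo, hi in zip(point, mins, maxs):
        width = (hi - lo) / n_intervals
        i = int((x - lo) / width)
        # assumed: clamp points outside the training range to edge cells
        idx.append(min(max(i, 0), n_intervals - 1))
    return cell_labels.get(tuple(idx))
```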

# Runtime is comparatively fast: clustering 1000 training points in 3 dimensions on a modest 
  machine should take no more than 1-2 seconds. No kind of parallelization is implemented.

# The algorithm is not very sensitive to the relative frequency of points belonging to different 
  clusters, mainly because it records presence rather than abundance.

# It is, however, sensitive to sample size: the more training points you use, the higher the 
  number of resolutions, the maximum number of intervals at the finest resolution (maxIntervals), 
  and the cutoff parameter should be.

# Accordingly, arriving at the desired number of clusters requires some tuning. With well 
  separated classes, however, this can be done rather quickly.
  
# The library also contains sampling, conversion, and classifier classes for clustering Bitmaps.


	
Source: ReadMe.txt, updated 2015-12-15