I was looking at the LongOpenHashSet code and saw that in the implementation you have an array that has the values and a byte array to hold the status of each item. This byte array could easily be eliminated reducing the memory usage by 11% for LongOpenHashSet, 20% for IntOpenHashSet, and 33% for CharOpenHashSet and ShortOpenHashSet.
A cell can have one of three state empty, removed, or occupied.
Instead of using a separate array to hold this status you can just use 2 special values in the array. Most likely you would use 0 for empty and 1 for removed. The question then is what to do about someone storing those values.
The answer is just handle them separately. You store their status outside of the array:
private final boolean[] specialValueStatus = new boolean[ 2 ];
Then the add function from IntOpenHashSet would look something like this:
private static final int EMPTY = 0;
private static final int REMOVED = 1;
public boolean add(int v)
{
if(v != EMPTY && v != REMOVED)
{
ensureCapacity(used+1);
// first hash
int h = Math.abs(keyhash.hash(v));
int i = h % data.length;
if(data[i] != EMPTY && data[i] != REMOVED)
if (data[i] == v)
return false;
// second hash
int c = 1 + (h % (data.length - 2));
for (;;) {
i -= c;
if (i < 0)
i += data.length;
// Removed entries are re-used
if(data[i] == EMPTY || data[i] == REMOVED)
break;
if(data[i] == v)
return false;
}
}
if (data[i] == EMPTY)
used++;
data[i] = v;
size++;
return true;
}
else
{
size++;
if( specialValueStatus[ v ] )
{
return false;
}
else
{
specialValueStatus[ v ] = true;
return true;
}
}
}