Create a highly scalable counter class, called Counter. For a single thread, Counter is somewhat slower than raw Unsafe or Java 5's 'lock' API. It is about the same speed as a well-striped Unsafe counter up to ~32 CPUs, but it continues to scale linearly to 768 CPUs, where other approaches top out much sooner.
Change the NonBlockingXXX classes to use this Counter class. Note that they already used the same infrastructure before, just under a less convenient name.
Counter supports the obvious add, get, set, increment & decrement calls. 'get' cannot be atomic with 'add' because of the internal striped counters, but it is guaranteed to see 'adds' made by the same thread doing the 'get'. There is a fast approximate 'estimate_get' call as well.
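To illustrate why a striped 'get' cannot be atomic with 'add' yet still sees the calling thread's own adds, here is a minimal sketch of the general striping idea. It is not the actual Counter source: the class name StripedCounterSketch, the fixed stripe count, the padding, and the thread-id hash are all assumptions made for illustration, and 'set' and 'estimate_get' are left out.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch of a striped counter; NOT the real Counter implementation.
final class StripedCounterSketch {
    // Assumed fixed stripe count (power of two); a real implementation may resize.
    private static final int STRIPES = 16;
    // Assumed spacing of 8 longs (64 bytes) between stripes to reduce false sharing.
    private static final int PAD = 8;

    private final AtomicLongArray cells = new AtomicLongArray(STRIPES * PAD);

    // Pick a stripe from the current thread's id, so a thread always hits the same cell.
    private static int index() {
        long id = Thread.currentThread().getId();
        int h = (int) (id ^ (id >>> 32));
        h ^= h >>> 16;
        return (h & (STRIPES - 1)) * PAD;
    }

    // Each add touches only one stripe, so threads on different stripes do not contend.
    void add(long delta) { cells.addAndGet(index(), delta); }
    void increment() { add(1); }
    void decrement() { add(-1); }

    // Sums the stripes one at a time. Other threads may add while the loop runs,
    // so the result is not an atomic snapshot. The caller's own earlier adds are
    // always included, because they completed before this loop started.
    long get() {
        long sum = 0;
        for (int i = 0; i < STRIPES; i++) {
            sum += cells.get(i * PAD);
        }
        return sum;
    }
}
```

A single thread that calls increment(), add(5), and decrement() and then calls get() reads 5, since all of its own updates landed before the sum began.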
Cliff