Re: [GD-General] cross platform 64bit lock free FIFO?

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Andras Balogh wrote:
> Yes, I only need one writer and one reader thread, but I'm not sure that I  
> understand the solution you propose. Could you elaborate a bit? Thanks,
>   

Here is the implementation, an all its simplicity:

// Lock-free FIFO by Jon Watte.
// Size must be power of 2.
// This FIFO is correct on systems with strongly ordered caches (such as 
Intel
// and desktop PPC). It may require write barriers on systems with 
weakly ordered
// caches, at the points marked /* 1 */ and /* 2 */.
template<typename Stored, size_t Size>
class LockFreeFifo {
  public:
    LockFreeFifo() { head = 0; tail = 0; }

    /* get an item from the FIFO, return true if an item was gotten, 
else return false */
    bool get(Stored & out) {
      if (head - tail > 0) {
        out = fifo_[tail & (Size-1)];
        ++tail; /* 1 */
        return true;
      }
      return false;
    }

    /* put an item into the FIFO, return true if it fit, false if it's 
full */
    bool put(Stored & in) {
      if (head - tail < Size) {
        fifo_[head & (Size-1)] = in;
        ++head; /* 2 */
        return true;
      }
      return false;
    }
  private:
    unsigned int head;
    unsigned int tail;
    Stored fifo_[Size];
};

Because it's non-blocking, it returns FALSE when there's nothing to do 
(FIFO full or empty). Because there's a single reader and a single 
writer, there is no thread contention on "head" vs "tail". Because it's 
using unsigned math, and the Size is a power of two (important!), the 
two's complement wrapping arithmetic makes for no special cases. (Note 
the comparison of head-tail > 0 !)

On Intel, and other machines with strongly ordered caches, this doesn't 
need any kind of memory barrier or interlocked operation at all.

On some more exotic systems with weakly ordered caches (where you may 
get "old" data when reading), there is a chance of get() returning false 
when there is actually one item in the cache; similarly, there's a 
chance of put() returning false when there is actually space. When the 
"head" and "tail" share a cache line, the maximum difference can be one 
item. However, for absolute correctness in these cases, you want to 
insert a cheap memory barrier at the points marked /* 1 */ and /* 2 */ 
-- or, alternately, use the platform's atomic intrinsics (atomic_add(), 
InterlockedIncrement(), etc). On PPC, the memory barrier is 
traditionally the EIEIO instruction, on Intel (where it's not needed, 
unless you're writing to un-cached memory) it's the SFENCE instruction. 
I don't know if the new console PPC cores have a cheaper instruction 
than EIEIO.

Cheers,

                       / h+