Re: [GD-General] cross platform 64bit lock free FIFO?
Brought to you by:
vexxed72
From: Jon W. <hp...@mi...> - 2006-08-09 16:26:31
|
Andras Balogh wrote: > Yes, I only need one writer and one reader thread, but I'm not sure that I > understand the solution you propose. Could you elaborate a bit? Thanks, > Here is the implementation, an all its simplicity: // Lock-free FIFO by Jon Watte. // Size must be power of 2. // This FIFO is correct on systems with strongly ordered caches (such as Intel // and desktop PPC). It may require write barriers on systems with weakly ordered // caches, at the points marked /* 1 */ and /* 2 */. template<typename Stored, size_t Size> class LockFreeFifo { public: LockFreeFifo() { head = 0; tail = 0; } /* get an item from the FIFO, return true if an item was gotten, else return false */ bool get(Stored & out) { if (head - tail > 0) { out = fifo_[tail & (Size-1)]; ++tail; /* 1 */ return true; } return false; } /* put an item into the FIFO, return true if it fit, false if it's full */ bool put(Stored & in) { if (head - tail < Size) { fifo_[head & (Size-1)] = in; ++head; /* 2 */ return true; } return false; } private: unsigned int head; unsigned int tail; Stored fifo_[Size]; }; Because it's non-blocking, it returns FALSE when there's nothing to do (FIFO full or empty). Because there's a single reader and a single writer, there is no thread contention on "head" vs "tail". Because it's using unsigned math, and the Size is a power of two (important!), the two's complement wrapping arithmetic makes for no special cases. (Note the comparison of head-tail > 0 !) On Intel, and other machines with strongly ordered caches, this doesn't need any kind of memory barrier or interlocked operation at all. On some more exotic systems with weakly ordered caches (where you may get "old" data when reading), there is a chance of get() returning false when there is actually one item in the cache; similarly, there's a chance of put() returning false when there is actually space. When the "head" and "tail" share a cache line, the maximum difference can be one item. However, for absolute correctness in these cases, you want to insert a cheap memory barrier at the points marked /* 1 */ and /* 2 */ -- or, alternately, use the platform's atomic intrinsics (atomic_add(), InterlockedIncrement(), etc). On PPC, the memory barrier is traditionally the EIEIO instruction, on Intel (where it's not needed, unless you're writing to un-cached memory) it's the SFENCE instruction. I don't know if the new console PPC cores have a cheaper instruction than EIEIO. Cheers, / h+ |