Re: [GD-General] cross platform 64bit lock free FIFO?
Brought to you by:
vexxed72
|
From: Jon W. <hp...@mi...> - 2006-08-09 16:26:31
|
Andras Balogh wrote:
> Yes, I only need one writer and one reader thread, but I'm not sure that I
> understand the solution you propose. Could you elaborate a bit? Thanks,
>
Here is the implementation, an all its simplicity:
// Lock-free FIFO by Jon Watte.
// Size must be power of 2.
// This FIFO is correct on systems with strongly ordered caches (such as
Intel
// and desktop PPC). It may require write barriers on systems with
weakly ordered
// caches, at the points marked /* 1 */ and /* 2 */.
template<typename Stored, size_t Size>
class LockFreeFifo {
public:
LockFreeFifo() { head = 0; tail = 0; }
/* get an item from the FIFO, return true if an item was gotten,
else return false */
bool get(Stored & out) {
if (head - tail > 0) {
out = fifo_[tail & (Size-1)];
++tail; /* 1 */
return true;
}
return false;
}
/* put an item into the FIFO, return true if it fit, false if it's
full */
bool put(Stored & in) {
if (head - tail < Size) {
fifo_[head & (Size-1)] = in;
++head; /* 2 */
return true;
}
return false;
}
private:
unsigned int head;
unsigned int tail;
Stored fifo_[Size];
};
Because it's non-blocking, it returns FALSE when there's nothing to do
(FIFO full or empty). Because there's a single reader and a single
writer, there is no thread contention on "head" vs "tail". Because it's
using unsigned math, and the Size is a power of two (important!), the
two's complement wrapping arithmetic makes for no special cases. (Note
the comparison of head-tail > 0 !)
On Intel, and other machines with strongly ordered caches, this doesn't
need any kind of memory barrier or interlocked operation at all.
On some more exotic systems with weakly ordered caches (where you may
get "old" data when reading), there is a chance of get() returning false
when there is actually one item in the cache; similarly, there's a
chance of put() returning false when there is actually space. When the
"head" and "tail" share a cache line, the maximum difference can be one
item. However, for absolute correctness in these cases, you want to
insert a cheap memory barrier at the points marked /* 1 */ and /* 2 */
-- or, alternately, use the platform's atomic intrinsics (atomic_add(),
InterlockedIncrement(), etc). On PPC, the memory barrier is
traditionally the EIEIO instruction, on Intel (where it's not needed,
unless you're writing to un-cached memory) it's the SFENCE instruction.
I don't know if the new console PPC cores have a cheaper instruction
than EIEIO.
Cheers,
/ h+
|