Re: [Algorithms] pattern search for large integer
From: Jason H. <jas...@di...> - 2007-03-26 05:28:50
Drakie Awita wrote:
> I have a large list of integers; each integer is 128 bits in size.
> How can I answer this query efficiently:
>
> Find the integers whose bit X equals 1, bit Y equals 0, and bit Z
> equals 1 (0 <= X < Y < Z <= 127); i.e., find the integers in a large
> list matching the bit pattern "*1*0*1*", where the '1', '0', and '1'
> appear at fixed bit positions X, Y, and Z. The values of X, Y, and Z
> are arbitrary within [0, 127] and show no observable pattern.
>
> The brute-force method of iterating through the list with a bit-mask
> check works, but it is O(n), which is not practical when the list
> holds millions of integers and there are multiple bit patterns to
> match against the same list.
>
> Is there a faster algorithm?

You do not specify much about the parameters of the problem: whether
the bit masks are chosen dynamically or come from a finite (and
precomputable) set of possibilities, or whether there is a meaningful
way to store only the elements that matter to you during these
searches.

Assuming the worst, that the integers are generated dynamically and
the masks are composed dynamically at query time, you can probably
still do a little better. Consider storing the data in a swizzled
(bit-sliced) format, where all the (integer & 1) bits are stored in
one packed buffer whose bit indices correspond to the indices of the
original integers. Do the same for (integer & 2), and so on, leaving
you with 128 buffers of bits; a rough sketch follows at the end of
this message. This means, however, that for 128-bit integers each new
value must be masked apart and pushed onto 128 different buffers. That
may or may not be acceptable, depending on the generation rate versus
the query rate of your data.

A query then reduces to combining three of those buffers word by word
(second sketch below). This is a lot faster than the naive scan, even
though it is still O(n) in the worst case: on large sparse sets you
can trivially reject a wholly zero word, and the cache coherency of
the packed buffers will easily make it an order of magnitude faster,
even if you are reduced to O(n) barrel shifting.

If you can preprocess the data further and know there is a finite set
of queries that matter more than the others, build a Markov-like
chain: any X that is 1 keeps a list of buckets whose Y is 0 and whose
Z is 1. Pre-cache only the patterns that cover the largest amount of
valid data, and remove their members from the list that the remaining
queries search, effectively shrinking the n in the O(n) scans (third
sketch below).

There are many approaches. It all depends on your usage pattern, and
on how critical a path this is.

Thanks,
JH
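
P.S. To make the swizzled layout concrete, here is a minimal C++
sketch. All of the names (BitSliceSet, plane, push) are my own
invention, and it assumes 64-bit words; treat it as an illustration of
the layout rather than a tuned implementation.

#include <cstdint>
#include <vector>

// Bit-sliced ("swizzled") storage: plane[b] packs bit b of every
// integer, 64 per uint64_t.  Integer i's bit b lives in
// plane[b][i / 64] at bit position (i % 64).
struct BitSliceSet {
    std::vector<uint64_t> plane[128]; // one packed buffer per bit position
    std::size_t count = 0;            // number of integers stored

    // Append one 128-bit value given as low/high 64-bit halves.
    // As noted above, every insertion touches all 128 planes.
    void push(uint64_t lo, uint64_t hi) {
        std::size_t word = count / 64, bit = count % 64;
        for (int b = 0; b < 128; ++b) {
            if (word >= plane[b].size())
                plane[b].push_back(0);
            uint64_t v = (b < 64) ? (lo >> b) : (hi >> (b - 64));
            plane[b][word] |= (v & 1) << bit;
        }
        ++count;
    }
};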
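
Querying the structure above for the "*1*0*1*" pattern is then one
AND / AND-NOT pass over three planes, with the trivial rejection of
zero words mentioned earlier. This sketch builds on BitSliceSet above
and assumes C++20 for std::countr_zero; on older compilers something
like GCC's __builtin_ctzll does the same job.

#include <bit> // C++20, for std::countr_zero

// Matches are exactly the positions where plane[X] has a 1,
// plane[Y] a 0, and plane[Z] a 1, so one masking operation covers
// 64 integers at a time.
std::vector<std::size_t> query(const BitSliceSet& s, int X, int Y, int Z) {
    std::vector<std::size_t> hits;
    std::size_t words = (s.count + 63) / 64;
    for (std::size_t w = 0; w < words; ++w) {
        uint64_t m = s.plane[X][w] & ~s.plane[Y][w] & s.plane[Z][w];
        if (m == 0) continue;          // trivially reject a wholly zero word
        while (m) {                    // enumerate the surviving indices
            hits.push_back(w * 64 + std::countr_zero(m));
            m &= m - 1;                // clear the lowest set bit
        }
    }
    return hits;
}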
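
And one simple reading of the pre-caching idea, as a memo cache over
hot (X, Y, Z) patterns. Again everything here is hypothetical, and the
step of removing cached members from the residual scan list is left
out for brevity.

#include <map>
#include <tuple>

// Hypothetical cache for the query patterns that matter most: compute
// the hit list once via the query() sketch above, then serve repeats
// of the same (X, Y, Z) pattern without rescanning the planes.
struct QueryCache {
    std::map<std::tuple<int, int, int>, std::vector<std::size_t>> cached;

    const std::vector<std::size_t>&
    get(const BitSliceSet& s, int X, int Y, int Z) {
        auto key = std::make_tuple(X, Y, Z);
        auto it = cached.find(key);
        if (it == cached.end())
            it = cached.emplace(key, query(s, X, Y, Z)).first;
        return it->second;
    }
};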