Thread: [Algorithms] pattern search for large integer
From: Drakie A. <dr...@gm...> - 2007-03-26 03:55:41
I have a large list of integers; each integer is 128 bits in size. How can I answer this question efficiently:

find the integers with bit X equal to 1, bit Y equal to 0, and bit Z equal to 1 (0 <= X < Y < Z <= 127). That is, find the integers in a large list that match the bit pattern "*1*0*1*", where the '1', '0', and '1' appear at fixed bit positions X, Y, and Z. The values of X, Y, and Z are arbitrary within [0, 127] and have no observable pattern.

The brute-force method of iterating through the list with a bit-mask check works, but it is O(n), which is not practical when the list size is in the millions and there are multiple bit patterns to be matched against the same integer list.

Any faster algorithm?
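For reference, here is what the brute-force mask check being described might look like, assuming the 128-bit values are stored as two 64-bit halves; the U128 and Pattern types and all other names are illustrative, not from the original post:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical 128-bit value stored as two 64-bit halves
    // (lo = bits 0..63, hi = bits 64..127).
    struct U128 { uint64_t lo, hi; };

    // mask has bits X, Y, Z set; target has bits X and Z set, so that
    // (value & mask) == target encodes "bit X == 1, bit Y == 0, bit Z == 1".
    struct Pattern { U128 mask, target; };

    Pattern makePattern(int x, int y, int z)
    {
        Pattern p = { { 0, 0 }, { 0, 0 } };
        auto setBit = [](U128 &v, int bit) {
            if (bit < 64) v.lo |= uint64_t(1) << bit;
            else          v.hi |= uint64_t(1) << (bit - 64);
        };
        setBit(p.mask, x); setBit(p.target, x);   // bit X must be 1
        setBit(p.mask, y);                        // bit Y must be 0
        setBit(p.mask, z); setBit(p.target, z);   // bit Z must be 1
        return p;
    }

    // O(n) scan: collect the indices of all values matching the pattern.
    std::vector<size_t> bruteForce(const std::vector<U128> &values, const Pattern &p)
    {
        std::vector<size_t> hits;
        for (size_t i = 0; i < values.size(); ++i) {
            const U128 &v = values[i];
            if ((v.lo & p.mask.lo) == p.target.lo &&
                (v.hi & p.mask.hi) == p.target.hi)
                hits.push_back(i);
        }
        return hits;
    }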
From: Jason H. <jas...@di...> - 2007-03-26 05:28:50
Drakie Awita wrote:
> I have a large list of integers; each integer is 128 bits in size.
> How can I answer this question efficiently:
>
> find the integers with bit X equal to 1, bit Y equal to 0, and bit Z
> equal to 1 (0 <= X < Y < Z <= 127). That is, find the integers in a
> large list that match the bit pattern "*1*0*1*", where the '1', '0',
> and '1' appear at fixed bit positions X, Y, and Z. The values of X, Y,
> and Z are arbitrary within [0, 127] and have no observable pattern.
>
> The brute-force method of iterating through the list with a bit-mask
> check works, but it is O(n), which is not practical when the list size
> is in the millions and there are multiple bit patterns to be matched
> against the same integer list.
>
> Any faster algorithm?

You do not specify much about the parameters of the algorithm: whether the bit masks are dynamically assigned or have a finite (and precomputable) number of possibilities, or whether there is a meaningful way to store only the elements that matter to you during these searches.

Assuming the worst, that the integers are dynamically generated and the masks are dynamically composed for querying, you can probably do a little better. Consider storing the data in swizzled format, where all the (integer & 1) bits are stored in a packed buffer, with their index corresponding to the index of the actual integer. Do the same for (integer & 2), and so on, leaving you with 128 buffers of bits. This does mean that for 128-bit integers, each new value must be bit-masked and pushed onto 128 different buffers, which may or may not be acceptable depending on the generation rate versus the query rate of the data. In practice it is a lot faster, because you can trivially reject a wholly zero word on large sparse sets of integers, and cache coherency will easily make it an order of magnitude faster even if you are reduced to O(n) barrel shifting.

If you can preprocess the data more, and you know there is a finite set of queries that matter more than others, simply create a Markov-like chain where any X that is 1 has a list of buckets whose Y is 0 and whose Z is 1. Only pre-cache the ones that hold the largest number of valid pieces of data, and remove them from the list that other queries will search, effectively reducing the 'n' in the O(n) scans.

There are many approaches. It just depends on your usage pattern, and how critical-path this is.

Thanks,
JH
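A minimal sketch of the bit-sliced ("swizzled") layout described above, under the assumption that the list is built up front and then queried many times; the BitSliced type and all names are illustrative:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // One packed bit-plane per bit position: plane[b] holds bit b of every
    // integer, 64 integers per uint64_t word.
    struct BitSliced {
        size_t count = 0;
        std::vector<uint64_t> plane[128];

        // Append one 128-bit value, given as two 64-bit halves.
        void push(uint64_t lo, uint64_t hi)
        {
            const size_t word = count >> 6, shift = count & 63;
            for (int b = 0; b < 128; ++b) {
                if (shift == 0) plane[b].push_back(0);
                const uint64_t bit = (b < 64) ? (lo >> b) & 1
                                              : (hi >> (b - 64)) & 1;
                plane[b][word] |= bit << shift;
            }
            ++count;
        }

        // Query "bit X == 1, bit Y == 0, bit Z == 1" by combining three
        // planes word by word; 64 candidates are kept or rejected at a time.
        std::vector<size_t> query(int x, int y, int z) const
        {
            std::vector<size_t> hits;
            const size_t words = (count + 63) >> 6;
            for (size_t w = 0; w < words; ++w) {
                uint64_t m = plane[x][w] & ~plane[y][w] & plane[z][w];
                while (m) {                              // decode surviving bits
                    const int bit = __builtin_ctzll(m);  // GCC/Clang builtin
                    hits.push_back((w << 6) + bit);
                    m &= m - 1;                          // clear lowest set bit
                }
            }
            return hits;
        }
    };

The per-query work is three loads and a couple of logical operations per 64 candidates, which is where the constant-factor win over checking each 128-bit value individually comes from.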
From: Niklas F. <ni...@gr...> - 2007-03-26 07:22:28
> I have a large list of integers; each integer is 128 bits in size.
> How can I answer this question efficiently:
>
> find the integers with bit X equal to 1, bit Y equal to 0, and bit Z
> equal to 1 (0 <= X < Y < Z <= 127). That is, find the integers in a
> large list that match the bit pattern "*1*0*1*", where the '1', '0',
> and '1' appear at fixed bit positions X, Y, and Z. The values of X, Y,
> and Z are arbitrary within [0, 127] and have no observable pattern.
>
> The brute-force method of iterating through the list with a bit-mask
> check works, but it is O(n), which is not practical when the list size
> is in the millions and there are multiple bit patterns to be matched
> against the same integer list.

Is brute force really too slow? Note that, as you have described the problem, no algorithm can be faster than roughly n/8, because on average you would expect about n/8 of the numbers to match the pattern, and merely enumerating the matches costs that much. So whatever algorithm you come up with, you can never be more than about eight times faster than brute force, and whatever complicated algorithm you come up with is likely to carry additional costs, such as using more memory, polluting the cache, and so on.

Also, if you do any kind of processing on the matching numbers, you may find the cost of that (n/8 * p, where p is the per-number processing cost) to be comparable to the O(n) cost of finding them in the first place.

You might be better off just using the brute-force algorithm and optimizing it as much as possible with SIMD instructions and prefetching of cache lines.

// Niklas
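Along those lines, here is a rough sketch of what the SIMD route might look like with SSE2 intrinsics; the intrinsics are standard SSE2, the mask/target encoding follows the brute-force sketch above, and everything else is illustrative. It assumes the values live in a contiguous, 16-byte-aligned array:

    #include <emmintrin.h>   // SSE2
    #include <xmmintrin.h>   // _mm_prefetch
    #include <cstddef>
    #include <vector>

    // values: n 128-bit integers, 16-byte aligned.
    // mask has bits X, Y, Z set; target has bits X and Z set.
    std::vector<size_t> bruteForceSSE2(const __m128i *values, size_t n,
                                       __m128i mask, __m128i target)
    {
        std::vector<size_t> hits;
        for (size_t i = 0; i < n; ++i) {
            // Hint the prefetcher a few cache lines ahead (a no-op past the end).
            _mm_prefetch(reinterpret_cast<const char *>(values + i) + 256,
                         _MM_HINT_T0);
            const __m128i masked = _mm_and_si128(_mm_load_si128(values + i), mask);
            const __m128i eq     = _mm_cmpeq_epi32(masked, target);
            if (_mm_movemask_epi8(eq) == 0xFFFF)   // all 16 bytes compared equal
                hits.push_back(i);
        }
        return hits;
    }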
From: Robert D. <bli...@go...> - 2007-03-26 07:53:39
> I have a large list of integers; each integer is 128 bits in size.
> How can I answer this question efficiently:
>
> find the integers with bit X equal to 1, bit Y equal to 0, and bit Z
> equal to 1 (0 <= X < Y < Z <= 127). That is, find the integers in a
> large list that match the bit pattern "*1*0*1*", where the '1', '0',
> and '1' appear at fixed bit positions X, Y, and Z. The values of X, Y,
> and Z are arbitrary within [0, 127] and have no observable pattern.
>
> The brute-force method of iterating through the list with a bit-mask
> check works, but it is O(n), which is not practical when the list size
> is in the millions and there are multiple bit patterns to be matched
> against the same integer list.
>
> Any faster algorithm?

If your input data were sorted, you could do much better simply by using binary searches. As others have mentioned, it really depends on what you plan to do with it:

Is this a static list of integers, one generated dynamically that is changing all the time, or one that is slowly being added to? Does the order of the data matter to you? Do you have limited storage for additional data?
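For what it's worth, sorting only buys a direct binary search when the queried bits are the most significant ones: with the list sorted by value, bit 127 splits it into two contiguous ranges, bit 126 splits each of those, and so on, while matches on a low-order bit are scattered throughout. A small sketch of that special case, reusing the illustrative U128 type from the earlier sketch:

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct U128 { uint64_t lo, hi; };   // compared as (hi, lo)

    inline bool lessU128(const U128 &a, const U128 &b)
    {
        return a.hi != b.hi ? a.hi < b.hi : a.lo < b.lo;
    }

    // With the list sorted by value, every integer whose top bit (bit 127) is 1
    // sits in one contiguous tail; one binary search finds where it starts.
    size_t firstWithTopBitSet(const std::vector<U128> &sorted)
    {
        const U128 boundary = { 0, uint64_t(1) << 63 };  // smallest value with bit 127 set
        return static_cast<size_t>(
            std::lower_bound(sorted.begin(), sorted.end(), boundary, lessU128)
            - sorted.begin());
    }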
From: Tony C. <to...@mi...> - 2007-03-26 17:14:59
If there is nothing special about the list (i.e. it's not sorted or indexed in a useful way), then you must visit every element in it at least once, so the best you can possibly hope for is O(n).

After that, all you can really do is try to improve the running time by a constant factor, which quickly gets into low-level concerns such as which kinds of prefetching and bit-twiddling happen to be efficient on your target architecture.

Touching the memory is likely to be the dominating factor in performance on a modern architecture, so proper prefetching will probably be your biggest win.

If you have multiple bit patterns to match on the same run, you are probably best off testing all of them in one pass (assuming the number of patterns is relatively small) to avoid fetching the memory multiple times. If you have lots of patterns to match (too many to handle in a single pass), break the data up into cache-sized blocks and do multiple passes over a block before moving on, again to avoid refetching.

All of these tricks are just hacking away at the constant factor, though.

- Tony
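Here is a sketch of the one-pass, multi-pattern idea described above, again using the illustrative U128/Pattern representation from the earlier brute-force sketch:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct U128    { uint64_t lo, hi; };
    struct Pattern { U128 mask, target; };   // as in the brute-force sketch

    // Test every pattern against each value while that value is hot in cache,
    // instead of re-reading the whole array once per pattern.
    std::vector<std::vector<size_t>> matchAll(const std::vector<U128> &values,
                                              const std::vector<Pattern> &patterns)
    {
        std::vector<std::vector<size_t>> hits(patterns.size());
        for (size_t i = 0; i < values.size(); ++i) {
            const U128 &v = values[i];
            for (size_t p = 0; p < patterns.size(); ++p) {
                const Pattern &pat = patterns[p];
                if ((v.lo & pat.mask.lo) == pat.target.lo &&
                    (v.hi & pat.mask.hi) == pat.target.hi)
                    hits[p].push_back(i);
            }
        }
        return hits;
    }

When the pattern count gets large, the same loop can be run over cache-sized blocks of values, with the pattern set split across several passes over each block, as the post describes.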