From: Jesse Guardiani <jesse@wi...>  20040630 15:19:16

Brian Hurt wrote: > On Wed, 30 Jun 2004, Jesse Guardiani wrote: > >> Are you saying that the following code (for int lists) is faster than the >> generic list intersect function I posted for large lists? > > Yes. Even faster would be to use a specialized compare function, like: > > module IntSet = Set.Make ( > struct > type t = int > let compare (x: int) y = > if (x = y) then 0 > else if (x < y) then 1 > else 1 > end > );; > >> let list_to_set = List.fold_left >> (fun x y > IntSet.add y x) >> IntSet.empty >> ;; > > This is O(N log N). > >> >> >> let inter_int_list il1 il2 = >> let t1 = list_to_set il1 in >> let t2 = list_to_set il2 in >> IntSet.elements (IntSet.inter t1 t2); >> ;; > > This is O(N log N) as well. Maybe O(N), I haven't looked. > > Converting the set back to a list is O(N) as well. > > How much do you know about bigO notation? Sounds like little or nothing. > Here's the core of the idea. Measure how long the algorithms take for > various lengths of time, and you'll come up with some formula given n, > the number of elements in the set, computing the intersection will take > a*n^2 + b*n + c seconds, for some constants a, b, and c, using your list > method. Using a set, the formula will instead look like x*n*log(n) + y*n > + z, for some constants x, y, and z. > > No matter what the constants are, there will be an n such that > a*n^2 + b*n + c > x*n*log(n) + y*n + z. At that point or larger, the set > based implementation starts being faster. This becomes more obvious if > you look at how the two functions scale. If n elements takes t time, then > how long does 2*n elements take? With the listbased algorithm, it'll > take 4*t time, more or less. With the set based implementation, it'll > take 2*t + c time, for some small c. The time the list based > implementation takes grows faster than the set based implementation, and > sooner or later it'll overtake and pass it. > > Worse yet, this generally happens sooner than you think. I'd predict that > the list based implementation and the set based implementation will take > the same amount of time for n somewhere between 3 and 100. Let's be > generous and assume the balance point is n64 (note: I haven't measured > this, so don't depend upon this number! I'm just picking a number to make > my math easier). At n=64 elements, both algorithms take the same time > 100 microseconds. How much time do they take at n=1024 elements? The > list based algorithm will take about 25,600 microseconds. The Set based > implementation, on the other hand, will take only about 2,700 microseconds > (I get this number the following way I know that k*64*log2(64) = 100 > usec, solving for k = 0.26, and then I plug that into k*1024*log2(1024)) . > At 64 elements they're the same speed, at 1024 elements, the set based > implementation is 10 times faster. At 2^20th elements, the set based > implementation is almost 5,000 times faster. > > This is why experienced programmers are much more interested in optimizing > algorithsm it's where the big payoffs are. Basically, you're not going > to see 10fold improvements in speed on the same machine. Maybe, just > maybe, badly interepreted code vr.s super optimized compiled code has a > 10fold speed difference. Not much more than that. And 10fold increases > in computer performance take the better part of a decade basically, > 350MHz PIIIs vr.s 3.5GHz P4s. How long ago were 350MHz PIIIs hot? Six > years? Seven years? At n=1024, the set based intersection on a 350MHz > PIII is about as fast as the list based intersection on a 3.5GHz P4. That's all incredibly interesting. I plan to print it out and study it, but do you "experienced programmers" ever actually write code that DOES something? Or do you just fiddle with algorithms all day? :) I'm going to use my simplified list intersect function for now (my lists are very small, usually less than 10 elements), and later when I have some time I'll benchmark the two algorithms and see where they intersect. If the intersection point is high enough, then it may well be worth it to provide two versions of the function designed for different sized lists. I just don't know how much time I can spare to benchmark algorithms when I should be writing code that does actual work. Sometimes you have to leave theory behind and get your hands dirty...  Jesse Guardiani, Systems Administrator WingNET Internet Services, P.O. Box 2605 // Cleveland, TN 373202605 423559LINK (v) 4235595145 (f) http://www.wingnet.net 