From: Bruno H. <br...@cl...> - 2007-12-11 11:31:59
|
Hi Sam, > I found the way compelem.d:N_N_equal is written (see cmp1.c, attached) to > be counterintuitive, ugly, and unreadable, so I rewrote it in the more > algebraic way, using a single return (see cmp1.c, attached). Both are equivalent. The older one emphasizes the order of execution. I agree that yours looks better. > Then I thought that you probably has a reason why you wrote it the way > you did, and the only reason I could think of was optimization Indeed. The reason is that versions 2.x and 3.x of gcc create really dumb code for boolean expressions inside conditional expressions. > Since I can hardly read assembly (I can guess what "jmp" does, but not > what "movzbl" does) You can find descriptions of all i386 instructions by googling. Really, one cannot judge the quality of the code without either doing benchmarking or by understanding the intructions. > all I did was compare the assembly length: > 165 378 2502 cmp1-O0.s > 163 386 2527 cmp1-O2.s > 181 417 2769 cmp2-O0.s > 163 382 2487 cmp2-O2.s The total code size does not tell much. Code like this testl %eax,%eax jnz L1 popl %ebx popl %ecx movl %ebp,%esp popl %ebp ret L1: movl $1,%eax popl %ebx popl %ecx movl %ebp,%esp popl %ebp ret is longer but better than testl %eax,%eax jnz L1 L2: popl %ebx popl %ecx movl %ebp,%esp popl %ebp ret L1: movl $1,%eax jmp L2 (It is better because the average number of jumps is 0.5 in the first snippet and 1.0 in the second snippet. Jumps are particularly expensive instructions in modern CPU designs.) Therefore to get an impression of the quality of the code, instead of looking at the size, look 1) whether you see some obviously unnecessary instructions, 2) at the number of jumps executed in each code path. You attached the .s code (O2 - it's pointless to look at O0 code when we talk about optimizations). So here's the differences that I see: - Around line 97, cmp1-O2.s contains a tail-recursive call to R_R_equal. In cmp2-O2.s it is a normal call, followed by 3 test-and-set instructions for %eax, followed by a jump. So it has 2 jumps more than cmp1-O2.s (the return from R_R_equal and the jmp.L8), plus a sequence of 3 unnecessary instructions. - Around line 119, the same again. I conclude that gcc's handling of conditional expressions returning boolean results is still suboptimal, and cmp1 will therefore execute faster than cmp2. Bruno |