On Wed, 29 Nov 2000, Erik Walthinsen wrote:
> line usage. It may be instructive to look at the generated assembly for
> this revision and the previous, to see if gcc is smart enough to do this
> for us anyway.
Wow. That *was* instructive:
        pushl %ebp              cothread_switch (cothread_state *thread) {
        movl %esp,%ebp
        subl $12,%esp
        cmpl $0,8(%ebp)           if (thread == NULL)
        jne .L24
        jmp .L25                    goto nothread;
        .p2align 4,,7
.L24:
        movl 8(%ebp),%eax
        movl (%eax),%edx
        movl %edx,-4(%ebp)        ctx = thread->ctx;
        cmpl $0,-4(%ebp)          if (ctx == NULL)
        jne .L26
        jmp .L27                    goto nocontext;
        .p2align 4,,7
.L26:
        . . .
Man, that's *pathetic*!
Then I realized that it was compiled with no optimization flags. Add -O6:
        pushl %ebp              cothread_switch (cothread_state *thread) {
        movl %esp,%ebp
        subl $4,%esp
        pushl %esi
        pushl %ebx
        cmpl $0,8(%ebp)           if (thread == NULL)
        je .L79                     goto nothread;
        movl 8(%ebp),%ecx
        movl (%ecx),%ecx          ctx = thread->ctx;
        movl %ecx,-4(%ebp)
        testl %ecx,%ecx           if (ctx == NULL)
        je .L81                     goto nocontext;
        . . .
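That's *much* better. Though it is interesting how those two tests are
implemented differently: one compares the value directly against $0, the
other uses testl to AND %ecx with itself, which sets the zero flag exactly
when %ecx is zero. My guess is that it comes down to code size rather than
alignment: testl %ecx,%ecx encodes in 2 bytes, while cmpl $0,8(%ebp) takes
4 (the $0 is stored as a sign-extended 8-bit immediate, not a full 32
bits). The testl trick needs the value in a register, though. For the
first test thread is still on the stack at 8(%ebp), while for the second
ctx has just been loaded into %ecx.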
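For reference, cmpl subtracts its operands and testl ANDs them; both throw the result away and keep only the flags, and je/jne look only at the zero flag (ZF). The small C model that follows is an illustration written for this note, not anything from gcc; the function names are made up and it ignores every flag except ZF. It checks that "testl %ecx,%ecx" and "cmpl $0,%ecx" always agree on whether a following je is taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ZF after "cmpl $imm, val" (AT&T operand order): the CPU computes
 * val - imm, discards the difference, and sets ZF if it was zero. */
static bool zf_after_cmpl(uint32_t imm, uint32_t val)
{
    return (uint32_t)(val - imm) == 0u;
}

/* ZF after "testl a, b": the CPU computes a & b, discards the result,
 * and sets ZF if no bits are left. */
static bool zf_after_testl(uint32_t a, uint32_t b)
{
    return (a & b) == 0u;
}

/* x & x is just x, so "testl x,x" sets ZF exactly when x == 0, which is
 * the same condition "cmpl $0,x" sets it for. Since je jumps on ZF, both
 * sequences branch the same way for every value of x. */
static bool test_matches_cmp_zero(uint32_t x)
{
    return zf_after_testl(x, x) == zf_after_cmpl(0u, x);
}
```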
That restores my faith in gcc. And now to compare with the previous
revision, where all the error handling is inline with the main code. I
won't even bother trying without -O6:
        pushl %ebp              cothread_switch (cothread_state *thread) {
        movl %esp,%ebp
        subl $4,%esp
        pushl %esi
        pushl %ebx
        cmpl $0,8(%ebp)           if (thread == NULL) {
        jne .L78
        pushl $.LC8
        call g_print                g_print("cothread: there's no thread...
        jmp .L77                    return; }
        .p2align 4,,7
.L78:
        movl 8(%ebp),%ecx
        movl (%ecx),%ecx
        movl %ecx,-4(%ebp)        ctx = thread->ctx;
        movl 68(%ecx),%esi
        leal 0(,%esi,4),%eax
        movl (%eax,%ecx),%edx     current = ctx->threads[ctx->current];
        testl %edx,%edx           if (current == NULL) {
        jne .L79
        pushl $.LC9
        call g_print                g_print("cothread: there's no current...
        pushl $2
        call exit                   exit(2); }
        .p2align 4,,7
.L79:
        . . .
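A rough C sketch of the previous revision, reconstructed from the annotations in the listing above. The struct layouts, HYPO_MAX_THREADS, the int return value (the original returns void) and fprintf standing in for GLib's g_print are all assumptions made so the sketch compiles on its own; the full message strings and the rest of the function are elided in the original and stay elided here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define HYPO_MAX_THREADS 16 /* hypothetical table size */

typedef struct cothread_state cothread_state;

/* Hypothetical layout: only the fields the listing touches. */
typedef struct {
    cothread_state *threads[HYPO_MAX_THREADS];
    int current;                /* index into threads[] */
} cothread_context;

struct cothread_state {
    cothread_context *ctx;      /* first member, as "movl (%ecx),%ecx" suggests */
};

/* Previous revision: error handling inline with the main code.
 * Returns 0 on success and -1 for a NULL thread (hypothetical int
 * return so the paths can be checked; the original is void). */
static int cothread_switch_inline(cothread_state *thread)
{
    cothread_context *ctx;
    cothread_state *current;

    if (thread == NULL) {
        /* fprintf stands in for g_print; the full message is elided. */
        fprintf(stderr, "cothread: there's no thread...\n");
        return -1;
    }

    ctx = thread->ctx;
    current = ctx->threads[ctx->current];
    if (current == NULL) {
        fprintf(stderr, "cothread: there's no current...\n");
        exit(2);
    }

    /* . . . the rest of the switch is elided in the original . . . */
    return 0;
}
```

The error-handling calls sit between the checks here, which is why the -O6 listing has the common path jump around them.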
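So, I was right. Moving the error handling to the end causes a noticeable
improvement in the generated asm, even with -O6: each check on the common
path is now a single je that normally isn't taken, whereas the inline
version has to jne around the g_print and exit calls, which then sit in
the middle of the hot code.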
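A minimal C sketch of the layout that produced the better code, reconstructed from the goto annotations in the first two listings. The struct layouts, HYPO_MAX_THREADS, the int return codes, the nocontext message and fprintf standing in for GLib's g_print are assumptions; only the two checks actually shown in the listings are included, and everything after them stays elided.

```c
#include <assert.h>
#include <stdio.h>

#define HYPO_MAX_THREADS 16 /* hypothetical table size */

typedef struct cothread_state cothread_state;

/* Hypothetical layout: only the fields the listings touch. */
typedef struct {
    cothread_state *threads[HYPO_MAX_THREADS];
    int current;
} cothread_context;

struct cothread_state {
    cothread_context *ctx;
};

/* New revision: the common path is just a series of checks, each of
 * which compiled to one normally-not-taken conditional jump in the -O6
 * listing. Error handling lives after the main code and is reached only
 * via goto. Return codes are hypothetical (the original is void). */
static int cothread_switch_goto(cothread_state *thread)
{
    cothread_context *ctx;

    if (thread == NULL)
        goto nothread;

    ctx = thread->ctx;
    if (ctx == NULL)
        goto nocontext;

    /* . . . the rest of the switch is elided in the original . . . */
    return 0;

nothread:
    /* fprintf stands in for g_print; the full message is elided. */
    fprintf(stderr, "cothread: there's no thread...\n");
    return -1;

nocontext:
    /* hypothetical message: the original's text is not shown */
    fprintf(stderr, "cothread: thread has no context\n");
    return -2;
}
```

Where the error blocks end up is gcc's decision, not something C guarantees; placing them after the final return simply made it easy for gcc to keep them off the fall-through path, as the -O6 listing shows it did here.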
This does say that we really shouldn't even bother compiling without
optimizations. I'll see if I can get that fixed.
Erik Walthinsen <omega@...> - Staff Programmer @ OGI
Quasar project - http://www.cse.ogi.edu/DISC/projects/quasar/
Video4Linux Two drivers and stuff - http://www.cse.ogi.edu/~omega/v4l2/
__
/ \ SEUL: Simple End-User Linux - http://www.seul.org/
| | M E G A Helping Linux become THE choice
_\ /_ for the home or office user
|