This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: a strange infelicity of register allocation
- To: law at cygnus dot com
- Subject: Re: a strange infelicity of register allocation
- From: Zack Weinberg <zack at rabi dot columbia dot edu>
- Date: Mon, 25 Jan 1999 16:30:54 -0500
- cc: Joern Rennecke <amylaar at cygnus dot co dot uk>, egcs at egcs dot cygnus dot com
On Mon, 25 Jan 1999 12:56:16 -0700, Jeffrey A Law wrote:
>
> In message <199901251946.OAA06903@blastula.phys.columbia.edu> you write:
>
> > The generated code is a little silly even when adjusted:
[...]
>The second sequence may actually be slower though. The use of %eax and %al
>may trigger an interlock. Someone would need to check.
>
>If we spilled, then we're probably going to lose regardless. That's why
>I'd prefer to see us work on avoiding the spills to start with by tightening
>up the x86 machine description.
The code is actually much worse than I thought, even when tweaked.
Input:
unsigned char *ip, *op;
for(;;)
{
    unsigned char c;
    c = *ip++;
    switch(c)
    {
    default: *op++ = c; break;
    case '\0': goto eof;
    case '\r': /* ... */ break;
    case '\n': /* ... */ break;
    case '?': /* ... */ break;
    }
}
eof:
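For anyone who wants to feed this loop to a compiler and look at the output, here is a minimal self-contained sketch of it. The function name `copy_filtered`, its signature, and the return value are hypothetical, and the three elided case bodies are replaced by placeholders that simply drop the byte; none of that comes from the original source file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper around the loop from the message.  Copies bytes
   from ip to op until a NUL is seen.  The bodies for '\r', '\n' and '?'
   are placeholders (the byte is dropped); the real code is elided in
   the message. */
size_t copy_filtered(const unsigned char *ip, unsigned char *op)
{
    unsigned char *start = op;

    assert(ip != NULL && op != NULL);
    for (;;)
    {
        unsigned char c;
        c = *ip++;
        switch (c)
        {
        default:   *op++ = c; break;
        case '\0': goto eof;
        case '\r': /* placeholder: drop */ break;
        case '\n': /* placeholder: drop */ break;
        case '?':  /* placeholder: drop */ break;
        }
    }
eof:
    return (size_t)(op - start);   /* number of bytes written */
}
```

Compiling this with something like `gcc -O2 -S` should show how a given compiler handles `c`; the exact instructions will differ from the snapshot output quoted in this message.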
Current snapshot, -O2:
switch:
        movb (%esi),%al
        movb %al,-4125(%ebp)
        incl %esi
        xorl %eax,%eax
        movb -4125(%ebp),%al
        cmpl $10,%eax
        je case_one
        jg case_two_or_three
        testl %eax,%eax
        je case_four
        jmp default
        .p2align 4,,7
case_two_or_three:
        cmpl $13,%eax
        je case_two
        cmpl $63,%eax
        je case_three
default:
        movb -4125(%ebp),%dl
        movb %dl,(%ebx)
        incl %ebx
        jmp switch
        .p2align 4,,7
...
-4125(%ebp) will be in cache, of course, but this is still pretty
disgusting.
There is questionable code elsewhere, such as:
        decl -4112(%ebp)
        movl -4112(%ebp),%ecx
If this were written the other way around, you'd avoid an address
generation and a read-modify-write to memory; note that this particular
location is unlikely to be in L1. The code would also be smaller.
This looks like the problem Marc Espie keeps complaining about.
.L331:
        cmpb $63,-4125(%ebp)
        jne .L349
        movb 1(%esi),%cl
        movb %cl,-4125(%ebp)
        testb %cl,%cl
        jne .L333
        cmpl $0,-4120(%ebp)
        jne .L333
        decl -4112(%ebp)
        movl -4112(%ebp),%eax
        movb $63,(%eax)
        decl %eax
        movl %eax,-4112(%ebp)
        movb $63,(%eax)
        jmp .L311
        .p2align 4,,7
.L333:
        movzbl -4125(%ebp),%edi
        cmpb $0,trigraph_table(%edi)
        jne .L334
Here the store to -4125(%ebp) would be dead if it weren't used in the
movzbl at .L333. I guess this is what Joern said about reloads not
inheriting down branches taken.
C:
        buf = xrealloc (buf, op - buf);
        fp->buf = buf;
        return op - buf;
becomes
        movl %ebx,%eax
        subl -4104(%ebp),%eax
        pushl %eax
        movl -4104(%ebp),%edx
        pushl %edx
        call xrealloc
        movl %eax,-4104(%ebp)
        movl 12(%ebp),%eax
        movl -4104(%ebp),%ecx
        movl %ecx,(%eax)
        subl %ecx,%ebx
        movl %ebx,%eax
        jmp epilogue
This could have been done as:
        movl -4104(%ebp), %eax
        subl %eax, %ebx
        pushl %ebx
        pushl %eax
        call xrealloc
        movl 12(%ebp), %ecx
        movl %eax, (%ecx)
        movl %ebx, %eax
        jmp epilogue
I think the key issue here is that the compiler doesn't seem to know it
can keep values live in call-saved registers across a function call.
It's somewhat silly to be micro-optimizing around a call to xrealloc,
but this sort of code is all over the place.
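
A source-level sketch of the same idea: computing the length into a local before the call gives the compiler a single value to keep live across `xrealloc`, ideally in a call-saved register such as %ebx. One caveat: in the C as quoted, the second `op - buf` uses the reassigned `buf`, so the hand-written sequence matches it only when `xrealloc` returns the pointer it was given. The sketch assumes the length computed before the call is what is wanted. `struct file_buf`, `finish`, and this definition of `xrealloc` are hypothetical stand-ins, not taken from the original source.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the structure behind fp in the message. */
struct file_buf { unsigned char *buf; };

/* Hypothetical xrealloc: realloc that aborts on failure. */
void *xrealloc(void *p, size_t n)
{
    void *q = realloc(p, n ? n : 1);   /* avoid a zero-size request */
    if (q == NULL)
        abort();
    return q;
}

/* Shrink buf to the bytes actually written (buf up to op), record the
   new pointer in fp, and return the length.  len is computed once,
   before the call, so nothing needs recomputing against the new buf. */
size_t finish(struct file_buf *fp, unsigned char *buf, unsigned char *op)
{
    size_t len = (size_t)(op - buf);

    buf = xrealloc(buf, len);
    fp->buf = buf;
    return len;
}
```

Whether a particular compiler then keeps `len` in a register across the call is still up to its allocator; the point is only that the source no longer forces a reload of `buf`.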
zw