This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Unnecessary sign- and zero-extensions in GCC?
- From: Nicholas Nethercote <njn at cs dot utexas dot edu>
- To: gcc at gcc dot gnu dot org
- Date: Mon, 18 Apr 2005 13:53:53 -0500 (CDT)
- Subject: Unnecessary sign- and zero-extensions in GCC?
Hi,
I've been looking at GCC's use of sign-extensions when dealing with
integers smaller than a machine word size. It looks like there is room
for improvement.
Consider this C function:
short g(short x)
{
short i;
for (i = 0; i < 10; i++) {
x += i;
}
return x;
}
On x86, using a GCC 4.0.0 20050130, with -O2 I get this code:
g:
pushl %ebp
xorl %edx, %edx
movl %esp, %ebp
movswl 8(%ebp),%ecx
.p2align 4,,15
.L2:
leal (%ecx,%edx), %eax
movswl %ax,%ecx # 1
leal 1(%edx), %eax
movzwl %ax, %eax # 2
cmpw $10, %ax
movswl %ax,%edx # 3
jne .L2
popl %ebp
movl %ecx, %eax
ret
.size g, .-g
.p2align 4,,15
The three extensions (#1, #2, #3) here are unnecessarily conservative.
This would be better:
g:
pushl %ebp
xorl %edx, %edx
movl %esp, %ebp
movswl 8(%ebp),%ecx
.p2align 4,,15
.L2:
leal (%ecx,%edx), %ecx # x += i
leal 1(%edx), %edx # i++
cmpw $10, %dx # i < 10 ?
jne .L2
popl %ebp
movswl %cx, %eax
ret
GCC's approach seems to be *eager*, in that sign-extensions are done
immediately after sub-word operations. This ensures that the high bits of
a register holding a sub-word value are always valid.
An alternative is to allow the high bits of registers holding sub-word
values to be "junk", and do sign-extensions *lazily*, only before
operations in which any "junk" high bits could adversely affect the
result. For example, if you do a right shift on a value with "junk" high
bits you have to sign/zero-extend it first, because high bits in the
operands can affect low bits in the result. The same is true of division.
In contrast, an addition of two 16-bit values with "junk" high bits is ok
if the result is also a 16-bit value. The same is true of subtraction,
multiplication and logical ops. The reason is that for these operations,
the low 16 bits of the result do not depend on the high 16 bits of the
operands.
Although you can construct examples where the eager approach gives better
code, in general I think the lazy approach results in better code, such as
in the above example. Is there a particular reason why GCC uses the eager
approach? Maybe it has to do with the form of GCC's intermediate
representation? Or are there are some subtleties to do with the lazy
approach that I have overlooked? Or maybe I've misunderstood GCC's
approach.
Any comments are appreciated. Thanks for your help.
Nick