mis-optimization: x+y vs. y+x
linux@horizon.com
Tue May 27 03:11:00 GMT 2003
I was playing with the msp430 gcc port and noticed it was sometimes
doing far more register-shuffling than necessary. So I tried the test
cases on another 2-operand machine, the x86, which is a less
experimental target.
Consider the following two functions:
unsigned foo(unsigned x, unsigned y)
{
    return x+y;
}

unsigned bar(unsigned x, unsigned y)
{
    return y+x;
}
compiled with -O3 -fomit-frame-pointer -mregparm=3.
gcc 2.95.4 20011002 (Debian prerelease):
foo:
addl %edx,%eax
ret
bar:
addl %edx,%eax
ret
gcc 3.0.4, 3.2.3, and 3.3 (Debian):
foo:
addl %edx, %eax
ret
bar:
leal (%eax,%edx), %eax
ret
Notice that bar() uses a longer encoding than necessary: the leal
form is 3 bytes where addl is 2.
Now, it could be claimed that the difference is trivial:
one byte, no cycles except some AGI issues. So let's try
a different commutative binary operation, XOR, for which the compiler
has no 3-operand instruction to fall back on.
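For reference, the XOR variant of the test case presumably looks like
the sketch below. The post does not show it, so the function bodies are
an assumption inferred from the xorl listings that follow, and the file
name in the comment is hypothetical.

```c
/* Assumed XOR variant of the test case (not shown in the original post).
 * The generated code can be inspected with something like:
 *   gcc -O3 -fomit-frame-pointer -mregparm=3 -S xor-test.c
 * (xor-test.c is a hypothetical file name; on an x86-64 host, -m32 is
 * also needed, since -mregparm applies only to 32-bit x86.)
 */
unsigned foo(unsigned x, unsigned y)
{
    return x ^ y;
}

unsigned bar(unsigned x, unsigned y)
{
    return y ^ x;
}
```

Since XOR is commutative, both functions return the same value for any
arguments, so ideally they would compile to identical code.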
Now, gcc 3.x adds an extra instruction. And, interestingly,
gcc 2.95.4 adds two:
gcc 2.95.4:
foo:
xorl %edx,%eax
ret
bar:
movl %eax,%ecx
movl %edx,%eax
xorl %ecx,%eax
ret
gcc 3.x:
foo:
xorl %edx, %eax
ret
bar:
xorl %eax, %edx
movl %edx, %eax
ret
I tried a few -march= options; they didn't seem to make any difference.
Anyway, I do appreciate your development efforts very much, even if
I mostly only speak up to complain.