mis-optimization: x+y vs. y+x
Tue May 27 03:11:00 GMT 2003
I was playing with the msp430 gcc port and noticed it was sometimes
doing far more register-shuffling than necessary. So I tried converting
the test cases to another 2-operand machine, the x86, that's a less
Consider the following two functions:
unsigned foo(unsigned x, unsigned y)
unsigned bar(unsigned x, unsigned y)
compiled with -O3 -fomit-frame-pointer -mregparm=3.
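For reference, a minimal sketch of such a pair, assuming (from the subject line) that the two functions differ only in the order of the operands to +. The bodies here are an assumption, not a quote:

```c
/* Assumed bodies, inferred from the subject line "x+y vs. y+x".
   Semantically the two functions are identical; only the operand
   order in the source differs. */
unsigned foo(unsigned x, unsigned y)
{
	return x + y;
}

unsigned bar(unsigned x, unsigned y)
{
	return y + x;
}
```

Since unsigned addition is commutative, any difference in the generated code comes from the compiler, not from the source semantics.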
gcc 2.95.4 20011002 (Debian prerelease):
gcc 3.0.4, 3.2.3, and 3.3 (Debian):
addl %edx, %eax
leal (%eax,%edx), %eax
Notice that bar() uses a longer opcode than necessary.
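To put a number on "longer", here is a sketch of the two encodings, assuming the usual 32-bit forms (opcode 01 /r for add, opcode 8D /r plus a SIB byte for lea). The array names are illustrative only:

```c
#include <stddef.h>

/* addl %edx, %eax : opcode 01, ModRM D0 (mod=11, reg=edx, rm=eax) */
static const unsigned char add_encoding[] = { 0x01, 0xD0 };

/* leal (%eax,%edx), %eax : opcode 8D, ModRM 04 (SIB follows),
   SIB 10 (scale=1, index=edx, base=eax) */
static const unsigned char lea_encoding[] = { 0x8D, 0x04, 0x10 };

/* Extra bytes the lea form costs over the plain add. */
static size_t extra_bytes(void)
{
	return sizeof lea_encoding - sizeof add_encoding;
}
```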
Now, it could be claimed that the difference is trivial -
one byte, and no extra cycles apart from possible AGI (address
generation interlock) stalls. So let's try a different commutative
binary operation, one that has no 3-operand form available to the
compiler (the way lea is for addition).
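The asm that follows uses xorl, so the operation is presumably XOR. A minimal sketch under that assumption; the names foo_xor and bar_xor are hypothetical, chosen only to keep this pair distinct from the addition pair:

```c
/* Assumed test case: same shape as the addition pair, with ^ in
   place of +.  Function names are hypothetical. */
unsigned foo_xor(unsigned x, unsigned y)
{
	return x ^ y;
}

unsigned bar_xor(unsigned x, unsigned y)
{
	return y ^ x;
}
```

As with addition, the two functions compute the same value for every input, so ideally both would compile to a single xorl.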
Now, gcc 3.x adds an extra instruction. And, interestingly,
gcc 2.95.4 adds two:
xorl %edx, %eax
xorl %eax, %edx
movl %edx, %eax
I tried a few -march= options; they didn't seem to make any difference.
Anyway, I do appreciate your development efforts very much, even if
I mostly only speak up to complain.