mis-optimization: x+y vs. y+x

linux@horizon.com
Tue May 27 03:11:00 GMT 2003


I was playing with the msp430 gcc port and noticed that it was sometimes
doing far more register shuffling than necessary, so I tried converting
the test cases to another 2-operand machine, the x86, which is a less
experimental target.

Consider the following two functions:

unsigned foo(unsigned x, unsigned y)
{
	return x+y;
}

unsigned bar(unsigned x, unsigned y)
{
	return y+x;
}

Both were compiled with -O3 -fomit-frame-pointer -mregparm=3, so x arrives
in %eax, y arrives in %edx, and the result is returned in %eax.

gcc 2.95.4 20011002 (Debian prerelease):
foo:
	addl %edx,%eax
	ret
bar:
	addl %edx,%eax
	ret

gcc 3.0.4, 3.2.3, and 3.3 (Debian):
foo:
	addl	%edx, %eax
	ret
bar:
	leal	(%eax,%edx), %eax
	ret

Notice that gcc 3.x compiles bar() to leal, a longer instruction than
the addl it uses for foo(), even though the two functions are equivalent.


Now, it could be claimed that the difference is trivial: one byte, and
no extra cycles apart from possible address-generation interlock (AGI)
stalls. So let's try a different commutative binary operation, one for
which the compiler has no 3-operand form to fall back on, such as XOR.

This time, gcc 3.x emits one extra instruction for bar().
Interestingly, gcc 2.95.4 emits two:

gcc 2.95.4:
foo:
	xorl %edx,%eax
	ret
bar:
	movl %eax,%ecx
	movl %edx,%eax
	xorl %ecx,%eax
	ret

gcc 3.x:
foo:
	xorl	%edx, %eax
	ret
bar:
	xorl	%eax, %edx
	movl	%edx, %eax
	ret


I tried a few -march= options; they didn't seem to make any difference.


Anyway, I do appreciate your development efforts very much, even if
I mostly only speak up to complain.


