[Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression
zippel at gcc dot gnu dot org
gcc-bugzilla@gcc.gnu.org
Thu Jul 19 18:27:00 GMT 2007
------- Comment #13 from zippel at gcc dot gnu dot org 2007-07-19 18:27 -------
The initial test case is part of the missed optimization. For example current
stable Debian gcc (4.1.2 20061115) produces code like this:
movl 4(%esp), %eax
movl 8(%esp), %edx
leal (%eax,%edx,4), %edx
movl 4(%edx), %ecx
movl 8(%edx), %eax
addl %ecx, %eax
movl 12(%edx), %ecx
addl %ecx, %eax
ret
This has some unnecessary moves, but it shows the basic idea. With the
moves eliminated it would be:
movl 4(%esp), %eax
movl 8(%esp), %edx
leal (%eax,%edx,4), %edx
movl 4(%edx), %eax
addl 8(%edx), %eax
addl 12(%edx), %eax
ret
In terms of code size this is identical to:
movl 4(%esp), %ecx
movl 8(%esp), %edx
movl 8(%ecx,%edx,4), %eax
addl 4(%ecx,%edx,4), %eax
addl 12(%ecx,%edx,4), %eax
ret
Which instruction sequence is better now depends on the target.
The problem with the new canonical form is that, AFAICT, it has become
very difficult in practice to generate the optimal sequence based on
instruction costs.
The older gcc produces this IL before RTL generation:
D.1283 = (int *) (i * 4) + p;
return *(D.1283 + 4B) + *(D.1283 + 8B) + *(D.1283 + 12B);
which produces far better RTL for the optimizers to work with.
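For reference, here is a minimal C sketch of what the two shapes look like at source level. It is reconstructed from the IL above and is an assumption, not the test case as filed; the names f_indexed and f_based are hypothetical. The second function mirrors the older IL, with one shared base (D.1283) and constant offsets.

```c
/* Hypothetical reconstruction inferred from the IL above; the function
   names are illustrative and do not come from the bug report. */

/* Each access is indexed independently, as one would write it in source. */
int f_indexed(int *p, int i)
{
    return p[i + 1] + p[i + 2] + p[i + 3];
}

/* One shared base plus constant offsets, matching the gcc 4.1 IL. */
int f_based(int *p, int i)
{
    int *base = p + i;              /* plays the role of D.1283 */
    return base[1] + base[2] + base[3];
}
```

Both functions return the same value for any valid p and i; they differ only in whether the common address computation is visible as a single value.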
BTW this problem is not limited to pointer expressions, since the lea
instruction is used in other expressions as well.
Let's take this example:
void f(unsigned int *p, unsigned int a)
{
p[0] = a * 4 + 4;
p[1] = a * 4 + 8;
p[2] = a * 4 + 12;
}
The gcc 4.1 mentioned above produces this:
D.1281 = a * 4;
*p = D.1281 + 4;
*(p + 4B) = D.1281 + 8;
*(p + 8B) = D.1281 + 12;

movl 8(%esp), %eax
movl 4(%esp), %ecx
sall $2, %eax
leal 4(%eax), %edx
movl %edx, (%ecx)
leal 8(%eax), %edx
addl $12, %eax
movl %edx, 4(%ecx)
movl %eax, 8(%ecx)
ret
gcc 4.2 produces this:
*p = (a + 1) * 4;
D.1545 = a * 4;
*(p + 4B) = D.1545 + 8;
*(p + 8B) = D.1545 + 12;

movl 8(%esp), %eax
movl 4(%esp), %ecx
leal 4(,%eax,4), %edx
sall $2, %eax
movl %edx, (%ecx)
leal 8(%eax), %edx
addl $12, %eax
movl %edx, 4(%ecx)
movl %eax, 8(%ecx)
ret
So 4.2 already produces slightly worse code.
Current gcc finally produces:
*p = (a + 1) * 4;
*(p + 4) = (a + 2) * 4;
*(p + 8) = (a + 3) * 4;

movl 8(%esp), %eax
movl 4(%esp), %ecx
leal 4(,%eax,4), %edx
movl %edx, (%ecx)
leal 8(,%eax,4), %edx
leal 12(,%eax,4), %eax
movl %edx, 4(%ecx)
movl %eax, 8(%ecx)
ret
This now has the largest code size of all versions.
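The IL variants all compute the same values, because unsigned arithmetic wraps, so a * 4 + 8 and (a + 2) * 4 always agree. A minimal sketch of the 4.1-style and current-style forms written back as C follows; the names f_shared and f_canonical are hypothetical and only serve the comparison.

```c
/* Illustrative only: the two IL shapes from above written as C.
   The function names are hypothetical. */

/* gcc 4.1-style: a * 4 is computed once and reused. */
void f_shared(unsigned int *p, unsigned int a)
{
    unsigned int t = a * 4;         /* plays the role of D.1281 */
    p[0] = t + 4;
    p[1] = t + 8;
    p[2] = t + 12;
}

/* Current canonical form: each store has its own (a + k) * 4. */
void f_canonical(unsigned int *p, unsigned int a)
{
    p[0] = (a + 1) * 4;
    p[1] = (a + 2) * 4;
    p[2] = (a + 3) * 4;
}
```

The results are identical for every a, including values where the multiplication wraps. What changes is only that the shared a * 4 is no longer exposed as a common subexpression.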
This new canonical form IMHO clearly conflicts with what is expected at RTL
level, so I don't understand why it's so important to use this one. Could you
maybe explain the reason behind this choice?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698