[Bug rtl-optimization/36223] New: bad interaction between PRE/register allocation/reload
sandra at codesourcery dot com
gcc-bugzilla@gcc.gnu.org
Mon May 12 18:59:00 GMT 2008
This is a missed-optimization bug. The following reduced test case illustrates
the problem. It doesn't do anything useful; just compile it with:

  mipsisa32r2-elfoabi-gcc -S -mtune=24kc -G4096 -O2 example4.c
#define N 511
#define M 9

long A[N];
long B[N];
long AA[N];
long BB[N];
long tA;
long tB;

void foo (unsigned iterations)
{
  unsigned loop_cnt;
  static long *aLow;
  static long *bLow;
  static long *aHi;
  static long *bHi;
  static long n1;
  static long n2;
  static long l;
  static long i;
  static long j;
  static long k;

  for (loop_cnt = 0; loop_cnt < iterations; loop_cnt++) {

    /* This is the loop we're interested in.  */
    for (i = 0; i < N; i++) {
      AA[i] = A[i];
      BB[i] = B[i];
    }

    /* The rest of this stuff is just here to add some context to the
       outer loop.  */
    for (k = 1; k <= M; k++) {
      n1 = 1 << k;
      n2 = n1 >> 1;
      for (j = 0; j < n2; j++) {
        for (i = j; i < N; i += n1) {
          l = i + n2;
          aLow = &A[l];
          bLow = &B[l];
          aHi = &A[i];
          bHi = &B[i];
          A[l] = *aHi - tA;
          B[l] = *bHi - tB;
          A[i] += tA;
          B[i] += tB;
        }
      }
    }
  }
}
The -G option forces the global variables to use GP-relative addressing, which
involves an extra addition. Thus the first nested loop should be optimized as
if it were written:
  {
    long *t1 = AA;
    long *t2 = A;
    long *t3 = BB;
    long *t4 = B;
    for (i = 0; i < N; i++) {
      *t1 = *t2;
      *t3 = *t4;
      t1++; t2++; t3++; t4++;
    }
  }
GCC 4.3.1, though, produces code that does the GP-relative addressing inside
the loop, so the loop body has 9 adds instead of 5. Mainline head does a
better job and at least pulls out the references to A and B (which also appear
in the second nested loop).
PRE is working fine: it pulls the invariant GP-relative address computations
for all four variables all the way out of the outer loop. However, this means
the lifetimes of the corresponding pseudo-registers span the entire outer loop,
and the register allocator is (correctly) giving priority to the more localized
pseudos in the more deeply nested loops that follow. Once the allocator has
failed to find a hardware register to span the entire lifetime of these
pseudos, reload stupidly re-inserts the previously hoisted GP-relative address
computation at the point of reference, inside the first nested loop.
I think the register allocator needs more smarts: if it can't get a register
for the entire outer loop, it should try allocating one just around the inner
loop before giving up. Any thoughts on where the best place for this to happen
would be? Can this be done entirely within the register allocator, or do we
need another pass to identify places where we can potentially shorten the
lifetimes of pseudos?
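To make the three placements concrete, here is a rough source-level model. It
is only a sketch, not what GCC emits: gp_addr() is a hypothetical stand-in for
one "gp + offset" addition, and addr_adds, the copy_* functions and
count_addr_adds() are names made up for this illustration. The model only
counts how often the address computation runs in each case.

```c
#include <assert.h>

#define N 511

long A[N], B[N], AA[N], BB[N];

/* Number of modelled "gp + offset" additions executed so far.  */
static unsigned long addr_adds;

/* Hypothetical stand-in for one GP-relative address computation.  */
long *gp_addr (long *symbol)
{
  addr_adds++;
  return symbol;
}

/* What PRE produces: addresses computed once, live across the whole
   outer loop.  */
void copy_hoisted (unsigned iterations)
{
  long *t1 = gp_addr (AA), *t2 = gp_addr (A);
  long *t3 = gp_addr (BB), *t4 = gp_addr (B);
  for (unsigned n = 0; n < iterations; n++)
    for (long i = 0; i < N; i++) {
      t1[i] = t2[i];
      t3[i] = t4[i];
    }
}

/* What reload falls back to: addresses recomputed at every reference.  */
void copy_rematerialized (unsigned iterations)
{
  for (unsigned n = 0; n < iterations; n++)
    for (long i = 0; i < N; i++) {
      gp_addr (AA)[i] = gp_addr (A)[i];
      gp_addr (BB)[i] = gp_addr (B)[i];
    }
}

/* The compromise: addresses live only across the inner loop.  */
void copy_split (unsigned iterations)
{
  for (unsigned n = 0; n < iterations; n++) {
    long *t1 = gp_addr (AA), *t2 = gp_addr (A);
    long *t3 = gp_addr (BB), *t4 = gp_addr (B);
    for (long i = 0; i < N; i++) {
      t1[i] = t2[i];
      t3[i] = t4[i];
    }
    /* t1..t4 are dead here; the rest of the outer loop body could
       reuse their registers.  */
  }
}

/* Run one variant and return how many address additions it performed.  */
unsigned long count_addr_adds (void (*variant) (unsigned), unsigned iterations)
{
  addr_adds = 0;
  variant (iterations);
  assert (AA[N - 1] == A[N - 1] && BB[N - 1] == B[N - 1]);
  return addr_adds;
}
```

For three outer iterations the model performs 4, 6132 and 12 address additions
for the hoisted, rematerialized and split variants respectively. The split
variant pays four extra adds per outer iteration compared with full hoisting,
but keeps them out of the inner loop and frees the registers for the loops
that follow.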
While this example is specific to MIPS with the GP-relative addressing, I can
see that the underlying PRE/register allocation conflict is a more general
problem that probably crops up in lots of other code with similar structure of
outer-loop-containing-multiple-inner-loops.
-Sandra
--
           Summary: bad interaction between PRE/register allocation/reload
           Product: gcc
           Version: 4.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: sandra at codesourcery dot com
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: mipsisa32r2-elfoabi
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36223