This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Problem with -ftree-ter and loop unrolling
Hello,
> > > So far OK, but with ter, this becomes
> > >
> > > sum1 = 0;
> > > sum2 = 0;
> > > for (i = 0; i < n; i+=4)
> > >   {
> > >     x_1 = a[i];
> > >     y_1 = b[i];
> > >     x_2 = a[i+1];
> > >     y_2 = b[i+1];
> > >     x_3 = a[i+2];
> > >     y_3 = b[i+2];
> > >     x_4 = a[i+3];
> > >     y_4 = b[i+3];
> > >     sum1 += x_1 * y_1 + x_2 * y_2 + x_3 * y_3 + x_4 * y_4;
> > >     sum2 += x_1 / y_1 + x_2 / y_2 + x_3 / y_3 + x_4 / y_4;
> > >   }
> > >
> > > Now we need some 11 registers for the loop, instead of the original 5
> > > (and the number of registers grows with the unroll factor).
> >
> > The TER hack we settled on for PR17549 was supposed to prevent this kind
> > of thing, but it was already obvious at the time that a better fix is
> > needed in the general case. You've found a pretty nasty one here.
>
> Why didn't it trigger? I can't reproduce it by a bit of simple hacking
> around, have you got a little testcase and options to turn on to produce
> this?
-O1 suffices. The (sum? + 1) is needed to work around the hack
introduced to fix PR17549 (and it is very close to what happens in
sixtrack, except that there the operation with the accumulated variable
is a bit more complicated).
Zdenek
int a[200], b[200];

/* External consumer of the two sums; declared so the file compiles on its own.  */
void bla (int, int);

void xxx(void)
{
  int i, sum1 = 0, sum2 = 0, x, y;

  for (i = 0; i < 200; i+=8)
    {
      x = a[i];
      y = b[i];
      sum1 = (sum1 + 1) + x * y;
      sum2 = (sum2 + 1) + x / y;
      x = a[i+1];
      y = b[i+1];
      sum1 = (sum1 + 1) + x * y;
      sum2 = (sum2 + 1) + x / y;
      x = a[i+2];
      y = b[i+2];
      sum1 = (sum1 + 1) + x * y;
      sum2 = (sum2 + 1) + x / y;
      x = a[i+3];
      y = b[i+3];
      sum1 = (sum1 + 1) + x * y;
      sum2 = (sum2 + 1) + x / y;
      x = a[i+4];
      y = b[i+4];
      sum1 = (sum1 + 1) + x * y;
      sum2 = (sum2 + 1) + x / y;
      x = a[i+5];
      y = b[i+5];
      sum1 = (sum1 + 1) + x * y;
      sum2 = (sum2 + 1) + x / y;
      x = a[i+6];
      y = b[i+6];
      sum1 = (sum1 + 1) + x * y;
      sum2 = (sum2 + 1) + x / y;
      x = a[i+7];
      y = b[i+7];
      sum1 = (sum1 + 1) + x * y;
      sum2 = (sum2 + 1) + x / y;
    }
  bla (sum1, sum2);
}
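
For anyone who wants to compare the two loop shapes from the quoted discussion side by side, here is a rough sketch at the source level. The function names (sums_interleaved, sums_substituted, check_forms_agree) are made up for illustration and are not part of the testcase above. The sketch assumes n is a multiple of 4 and that b contains no zeros. The second function only imitates by hand what the quoted post says TER produces; it is not compiler output.

```c
#include <assert.h>

/* Interleaved form: each x/y pair is consumed before the next pair is
   loaded, so only a few values are live at any point in the loop body.  */
void sums_interleaved(const int *a, const int *b, int n, int *out1, int *out2)
{
  int i, sum1 = 0, sum2 = 0, x, y;

  for (i = 0; i < n; i += 4)
    {
      x = a[i];     y = b[i];     sum1 += x * y; sum2 += x / y;
      x = a[i + 1]; y = b[i + 1]; sum1 += x * y; sum2 += x / y;
      x = a[i + 2]; y = b[i + 2]; sum1 += x * y; sum2 += x / y;
      x = a[i + 3]; y = b[i + 3]; sum1 += x * y; sum2 += x / y;
    }
  *out1 = sum1;
  *out2 = sum2;
}

/* Substituted form (hand-written imitation): all eight loads happen
   first and stay live until both long expressions have been evaluated.  */
void sums_substituted(const int *a, const int *b, int n, int *out1, int *out2)
{
  int i, sum1 = 0, sum2 = 0;

  for (i = 0; i < n; i += 4)
    {
      int x_1 = a[i],     y_1 = b[i];
      int x_2 = a[i + 1], y_2 = b[i + 1];
      int x_3 = a[i + 2], y_3 = b[i + 2];
      int x_4 = a[i + 3], y_4 = b[i + 3];

      sum1 += x_1 * y_1 + x_2 * y_2 + x_3 * y_3 + x_4 * y_4;
      sum2 += x_1 / y_1 + x_2 / y_2 + x_3 / y_3 + x_4 / y_4;
    }
  *out1 = sum1;
  *out2 = sum2;
}

/* Hypothetical helper: asserts that both forms compute the same sums.  */
void check_forms_agree(const int *a, const int *b, int n)
{
  int s1, s2, t1, t2;

  sums_interleaved(a, b, n, &s1, &s2);
  sums_substituted(a, b, n, &t1, &t2);
  assert(s1 == t1 && s2 == t2);
}
```

Both functions return the same sums, so any difference shows up only in the generated code. One way to look at it is to compile with -O1 -S, with and without -fno-tree-ter, and compare the loop bodies. What you actually see will depend on the GCC version and the target.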