This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Variable Expansion Optimization
- From: Ayal Zaks <ZAKS at il dot ibm dot com>
- To: Steven Bosscher <stevenb at suse dot de>
- Cc: gcc at gcc dot gnu dot org, Mircea Namolaru <NAMOLARU at il dot ibm dot com>, Revital Eres <ERES at il dot ibm dot com>
- Date: Sun, 15 Aug 2004 15:10:12 +0300
- Subject: Re: Variable Expansion Optimization
Steven Bosscher <stevenb@suse.de> wrote on 15/08/2004 14:38:59:
> On Sunday 15 August 2004 12:20, Revital Eres wrote:
> > Hello,
> >
> > I intend to implement variable expansion
> > in the body of the unrolled loop.
> >
> > For example, consider the following loop
> > which contains an accumulator:
> >
> > for (i = 0 ; i < n; i++)
> > sum += a[i];
> >
> > After the unrolling we get the following code:
> >
> > sum += a[i];
> > ....
> > i = i+1;
> > sum += a[i];
> > ....
> > i = i+1
> > sum += a[i];
> > ....
> >
> > The variable expansion optimization will expand the variable
> > sum into n separate copies, one for each copy of the loop body
> > and combine them at the loop exit:
> >
> > sum += a[i]
> > ....
> > i = i+1;
> > sum1 += a[i]
> > ....
> > i = i+1
> > sum2 += a[i];
> > ....
> >
> > This transformation decreases the number of dependences
> > in the loop, thus making instruction scheduling
> > more effective when applied on it.
>
> Hmm, I guess you're going to have carefully analyze what the
> effects of such a transformation are on register pressure.
Sure. We should consider register pressure and restrain the increase of
parallelism accordingly (like gcse, etc.). In the case below, it should be
an easy extension to break the accumulator into any number of accumulators
between 1 and 8, once we're able to break it into 8 (the unroll factor). No
need to reroll.
Ayal.
>
> Compare the code for the following cases on amd64 for example,
> at -O3 -funroll-loops. foo1 is unrolled 8 times so I manually
> unrolled it for foo2 and manually applied the transformation
> you want to implement.
>
> --------------------------------------------
> #define n 256
>
> int a[n];
>
> int foo1 (void)
> {
> int i, sum;
> sum = 0;
> for (i = 0 ; i < n; i++)
> sum += a[i];
> return sum;
> }
>
> int foo2 (void)
> {
> int i, sum;
> int i1, i2, i3, i4, i5, i6, i7, i8;
> int sum1, sum2, sum3, sum4, sum5, sum6, sum7, sum8;
>
> sum1 = sum2 = sum3 = sum4 = sum5 = sum6 = sum7 = sum8 = 0;
> for (i = 0 ; i < n / 8; i++)
> {
> i1 = i;
> sum1 += a[i1];
>
> i2 = i + 1;
> sum2 += a[i2];
>
> i3 = i + 2;
> sum3 += a[i3];
>
> i4 = i + 3;
> sum4 += a[i4];
>
> i5 = i + 4;
> sum5 += a[i5];
>
> i6 = i + 5;
> sum6 += a[i6];
>
> i7 = i + 6;
> sum7 += a[i7];
>
> i8 = i + 7;
> sum8 += a[i8];
> }
>
> sum = sum1 + sum2 + sum3 + sum4 + sum5 + sum6 + sum7 + sum8;
> return sum;
> }
> --------------------------------------------
>
> The code for foo2 is really bad. I guess you need to work with
> the unroller to figure out when you can profitably apply this
> transformation. Perhaps you even want to be able to partially
> reroll a loop?
>
> Gr.
> Steven
>
>