This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Variable Expansion Optimization
- From: Steven Bosscher <stevenb at suse dot de>
- To: Revital Eres <ERES at il dot ibm dot com>, gcc at gcc dot gnu dot org
- Cc: Ayal Zaks <ZAKS at il dot ibm dot com>,Mircea Namolaru <NAMOLARU at il dot ibm dot com>
- Date: Sun, 15 Aug 2004 13:38:59 +0200
- Subject: Re: Variable Expansion Optimization
- Organization: SUSE Labs
- References: <OF36228DA4.E6F932AC-ONC2256EF1.002536A1-C2256EF1.0038DB12@il.ibm.com>
On Sunday 15 August 2004 12:20, Revital Eres wrote:
> Hello,
>
> I intend to implement variable expansion
> in the body of the unrolled loop.
>
> For example, consider the following loop
> which contains an accumulator:
>
> for (i = 0 ; i < n; i++)
> sum += a[i];
>
> After the unrolling we get the following code:
>
> sum += a[i];
> ....
> i = i+1;
> sum += a[i];
> ....
> i = i+1
> sum += a[i];
> ....
>
> The variable expansion optimization will expand the variable
> sum into n separate copies, one for each copy of the loop body
> and combine them at the loop exit:
>
> sum += a[i]
> ....
> i = i+1;
> sum1 += a[i]
> ....
> i = i+1
> sum2 += a[i];
> ....
>
> This transformation decreases the number of dependences
> in the loop, thus making instruction scheduling
> more effective when applied on it.
Hmm, I guess you're going to have carefully analyze what the
effects of such a transformation are on register pressure.
Compare the code for the following cases on amd64 for example,
at -O3 -funroll-loops. foo1 is unrolled 8 times so I manually
unrolled it for foo2 and manually applied the transformation
you want to implement.
--------------------------------------------
#define n 256
int a[n];
int foo1 (void)
{
int i, sum;
sum = 0;
for (i = 0 ; i < n; i++)
sum += a[i];
return sum;
}
int foo2 (void)
{
int i, sum;
int i1, i2, i3, i4, i5, i6, i7, i8;
int sum1, sum2, sum3, sum4, sum5, sum6, sum7, sum8;
sum1 = sum2 = sum3 = sum4 = sum5 = sum6 = sum7 = sum8 = 0;
for (i = 0 ; i < n / 8; i++)
{
i1 = i;
sum1 += a[i1];
i2 = i + 1;
sum2 += a[i2];
i3 = i + 2;
sum3 += a[i3];
i4 = i + 3;
sum4 += a[i4];
i5 = i + 4;
sum5 += a[i5];
i6 = i + 5;
sum6 += a[i6];
i7 = i + 6;
sum7 += a[i7];
i8 = i + 7;
sum8 += a[i8];
}
sum = sum1 + sum2 + sum3 + sum4 + sum5 + sum6 + sum7 + sum8;
return sum;
}
--------------------------------------------
The code for foo2 is really bad. I guess you need to work with
the unroller to figure out when you can profitably apply this
transformation. Perhaps you even want to be able to partially
reroll a loop?
Gr.
Steven