This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: type consistency of gimple
- From: Nick Clifton <nickc at redhat dot com>
- To: Diego Novillo <dnovillo at redhat dot com>
- Cc: Daniel Berlin <dberlin at dberlin dot org>, Mark Mitchell <mark at codesourcery dot com>, Kenneth Zadeck <zadeck at naturalbridge dot com>, Richard Guenther <richard dot guenther at gmail dot com>, GCC <gcc at gcc dot gnu dot org>, "Hubicha, Jan" <jh at suse dot cz>, "Edelsohn, David" <dje at watson dot ibm dot com>, Andrew Pinski <pinskia at physics dot uc dot edu>
- Date: Tue, 15 Aug 2006 15:58:52 +0100
- Subject: Re: type consistency of gimple
- References: <44DCEAAA.9000306@naturalbridge.com> <84fc9c000608111344l3cc4b9f0w97f104da6ed5b7ad@mail.gmail.com> <44DCEDF1.9010902@naturalbridge.com> <84fc9c000608111356n6bd522d5gd349007fdcb0cd1b@mail.gmail.com> <44DCF0A0.1060005@naturalbridge.com> <44DF6702.4060901@codesourcery.com> <44E074F4.3050402@dberlin.org> <44E08E4C.4060902@redhat.com>
Hi Diego,
Jeff's point about our optimizers is also true. Nick, remember that
issue with MIPS optimizations you were discussing with Jeff a few days
ago? I didn't follow most of the details, but it involved ivopts and
sign issues. Could you send a summary?
Sure:
I was looking at how a gcc 4.1 based compiler optimized a fast
fourier transform algorithm for the MIPS target. What I found was the
it was producing much worse code than a similarly configured gcc 3.4
based compiler, and the culprit was the creation of the induction
variables used in the inner loop.
The nested loops look like this:
signed short i,j,k,DataSizeExponent,DataSize;
signed short RealBitRevData[MAX], ImagBitRevData[MAX];
signed short * CosineV, * SineV;
for (k = 1; k <= DataSizeExponent; k++)
{
signed short n1 = 1 << k;
signed short n2 = n1 >> 1;
signed short ArgIndex = 0;
signed short DeltaIndex = (DataSize >> 1) / n2;
for (j = 0; j < n2; j++)
{
signed int WReal = CosineV[ArgIndex];
signed int WImag = SineV[ArgIndex];
ArgIndex += DeltaIndex;
for (i = j; i < DataSize; i += n1)
{
signed short l = i + n2;
signed int tRealData = (WReal * RealBitRevData[l]) +
(WImag * ImagBitRevData[l]);
signed int tImagData = (WReal * ImagBitRevData[l]) -
(WImag * RealBitRevData[l]);
tRealData = tRealData >> BUTTERFLY_SCALE_FACTOR;
tImagData = tImagData >> BUTTERFLY_SCALE_FACTOR;
RealBitRevData[l] = RealBitRevData[i] - tRealData;
ImagBitRevData[l] = ImagBitRevData[i] - tImagData;
RealBitRevData[i] += tRealData;
ImagBitRevData[i] += tImagData;
}
}
}
and the inner loop was being transformed into:
short int D.2046, D.2043;
long int D.2038, D.2035;
int D.2033, D.2024, D.2020, D.2008, D.2001;
[...]
<L5>:;
D.2033 = (int) (signed short) ivtmp.67;
D.2035 = (long int) RealBitRevData[D.2033];
D.2038 = (long int) ImagBitRevData[D.2033];
tRealData = WReal * D.2035 + WImag * D.2038;
tImagData = WReal * D.2038 - WImag * D.2035;
D.2001 = (int) i;
D.2043 = (short int) (tRealData >> 15);
RealBitRevData[D.2033] = RealBitRevData[D.2001] - D.2043;
D.2046 = (short int) (tImagData >> 15);
ImagBitRevData[D.2033] = ImagBitRevData[D.2001] - D.2046;
RealBitRevData[D.2001] = D.2043 + RealBitRevData[D.2001];
ImagBitRevData[D.2001] = D.2046 + ImagBitRevData[D.2001];
i = (signed short) ((short unsigned int) i + ivtmp.78);
ivtmp.68 = ivtmp.68 + ivtmp.78;
ivtmp.67 = ivtmp.67 + ivtmp.78;
if (DataSize > (signed short) (ivtmp.68 - ivtmp.78)) goto <L5>; else
goto <L7>;
The net result of these induction variables, and especially the type
translations necessary to go between signed and unsigned and shorts and
ints was an inner loop consisting of 42 instructions, as opposed to an
inner loop of 32 instructions as produced by the gcc 3.4 compiler.
Cheers
Nick