This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug c/38856] New: loop iv detection failure, SSA autoincrement
- From: "sergei_lus at yahoo dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 15 Jan 2009 18:24:14 -0000
- Subject: [Bug c/38856] New: loop iv detection failure, SSA autoincrement
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
I apologize if this is a well-disguised feature, but I have to treat it as a
performance regression/bug.
In the following trivial example:
void
VecADD(
    long long *In1,
    long long *In2,
    long long *Out,
    unsigned int samples
){
    int i;
    for (i = 0; i < samples; i++) {
        Out[i] = In1[i] + In2[i];
    }
}
the C is slightly imprecise: the type of 'samples' is unsigned, while the type
of 'i' is signed, so the comparison 'i < samples' implicitly converts 'i' to
unsigned int.
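To make the implicit conversion concrete, here is a minimal sketch. The helper names (loop_test_as_written, loop_test_explicit, converted, conversion_selfcheck) are hypothetical and exist only for this illustration; they are not part of the original test case.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: the loop test exactly as written in VecADD.
   The usual arithmetic conversions turn 'i' into unsigned int
   before the comparison is performed. */
int loop_test_as_written(int i, unsigned int samples)
{
    return i < samples;
}

/* Hypothetical helper: the same test with the conversion spelled out.
   This corresponds to the "(unsigned int) i" casts in the SSA dump. */
int loop_test_explicit(int i, unsigned int samples)
{
    return (unsigned int) i < samples;
}

/* The value that 'i' takes inside the comparison. */
unsigned int converted(int i)
{
    return (unsigned int) i;
}

/* Usage: both forms agree, and a negative 'i' compares as a huge value. */
void conversion_selfcheck(void)
{
    assert(loop_test_as_written(0, 4u) == loop_test_explicit(0, 4u));
    assert(loop_test_as_written(-1, 4u) == loop_test_explicit(-1, 4u));
    assert(converted(-1) == UINT_MAX);
}
```

Compiling the original loop with -Wsign-compare (or -Wextra) makes GCC warn about this comparison.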
At a high level, the problem is that induction variable analysis fails for
this loop, which keeps the later tree-level loop optimizations (including
autoincrement) from working properly. In my port, performance is off by 50%
for this loop. GCC 3.4.6 handled this situation fine.
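For reference, the autoincrement-friendly shape of this loop can be written by hand roughly as follows. This is an illustration only, not GCC output; the names VecADD_ptr and 'end' are assumptions made for the sketch. Each pointer is its own induction variable that advances by one element per iteration, which is what a post-increment addressing mode can fold into the load or store.

```c
#include <assert.h>

/* Hand-written sketch (not compiler output): every pointer is stepped
   directly, so no separate index 'i' has to be scaled and added to a
   base address on each iteration. */
void VecADD_ptr(long long *In1, long long *In2, long long *Out,
                unsigned int samples)
{
    long long *end = Out + samples;   /* one past the last output element */

    while (Out != end) {
        *Out++ = *In1++ + *In2++;     /* load, load, add, store, step */
    }
}

/* Usage: element-wise sum of two small arrays. */
void VecADD_ptr_selfcheck(void)
{
    long long a[3] = {1, 2, 3};
    long long b[3] = {10, 20, 30};
    long long out[3] = {0, 0, 0};

    VecADD_ptr(a, b, out, 3u);
    assert(out[0] == 11 && out[1] == 22 && out[2] == 33);
}
```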
At the lowest level, I believe the problem is a non-minimal (or overly
restrictive) SSA representation right before the iv detection:
VecADD (In1, In2, Out, samples)
{
int i;
long long int D.1857;
long long int D.1856;
long long int * D.1855;
long long int D.1854;
long long int * D.1853;
long long int * D.1852;
unsigned int D.1851;
unsigned int i.0;
<bb 2>:
<bb 6>:
# i_10 = PHI <0(2)>
i.0_5 = (unsigned int) i_10;
if (i.0_5 < samples_4(D))
goto <bb 3>;
else
goto <bb 5>;
<bb 3>:
# i.0_9 = PHI <i.0_3(4), i.0_5(6)>
# i_14 = PHI <i_1(4), i_10(6)>
D.1851_6 = i.0_9 * 8;
D.1852_8 = Out_7(D) + D.1851_6;
D.1853_12 = In1_11(D) + D.1851_6;
D.1854_13 = *D.1853_12;
D.1855_17 = In2_16(D) + D.1851_6;
D.1856_18 = *D.1855_17;
D.1857_19 = D.1854_13 + D.1856_18;
*D.1852_8 = D.1857_19;
i_20 = i_14 + 1;
<bb 4>:
# i_1 = PHI <i_20(3)>
i.0_3 = (unsigned int) i_1;
if (i.0_3 < samples_4(D))
goto <bb 3>;
else
goto <bb 5>;
<bb 5>:
return;
}
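Read back as C, the dump above carries two values around the loop: the signed 'i' and its unsigned copy. The sketch below is a hand transliteration under stated simplifications, not something GCC emits: the name VecADD_as_dumped and the variable 'i0' (standing in for 'i.0') are assumptions, and plain array indexing replaces the dump's explicit byte offset 'i.0 * 8'.

```c
#include <assert.h>

/* Hand transliteration of the SSA dump (illustrative only). */
void VecADD_as_dumped(long long *In1, long long *In2, long long *Out,
                      unsigned int samples)
{
    int i = 0;                            /* i_10                        */
    unsigned int i0 = (unsigned int) i;   /* i.0_5 = (unsigned int) i_10 */

    if (!(i0 < samples))                  /* guard in <bb 6>             */
        return;

    do {
        /* <bb 3>: both 'i0' (i.0_9) and 'i' (i_14) are live here,
           one PHI node each. Only 'i0' is used for addressing. */
        Out[i0] = In1[i0] + In2[i0];
        i = i + 1;                        /* i_20 = i_14 + 1             */

        /* <bb 4>: the unsigned copy is recomputed from the signed one. */
        i0 = (unsigned int) i;            /* i.0_3 = (unsigned int) i_1  */
    } while (i0 < samples);
}

/* Usage: same results as the original loop for small inputs. */
void VecADD_as_dumped_selfcheck(void)
{
    long long a[3] = {1, 2, 3};
    long long b[3] = {10, 20, 30};
    long long out[3] = {0, 0, 0};

    VecADD_as_dumped(a, b, out, 3u);
    assert(out[0] == 11 && out[1] == 22 && out[2] == 33);
}
```

The point of the sketch is that the address computation depends on 'i0', whose next value is defined through a cast of 'i' rather than by a simple 'i0 + 1' step.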
The two PHI nodes at the beginning of BB3 break the iv detection. The same
example with matching types for 'i' and 'samples' is analyzed perfectly fine,
with the SSA at the same point looking like this:
VecADD (In1, In2, Out, samples)
{
int i;
long long int D.1857;
long long int D.1856;
long long int * D.1855;
long long int D.1854;
long long int * D.1853;
long long int * D.1852;
unsigned int D.1851;
unsigned int i.0;
<bb 2>:
<bb 6>:
# i_9 = PHI <0(2)>
if (i_9 < samples_3(D))
goto <bb 3>;
else
goto <bb 5>;
<bb 3>:
# i_13 = PHI <i_1(4), i_9(6)>
i.0_4 = (unsigned int) i_13;
D.1851_5 = i.0_4 * 8;
D.1852_7 = Out_6(D) + D.1851_5;
D.1853_11 = In1_10(D) + D.1851_5;
D.1854_12 = *D.1853_11;
D.1855_16 = In2_15(D) + D.1851_5;
D.1856_17 = *D.1855_16;
D.1857_18 = D.1854_12 + D.1856_17;
*D.1852_7 = D.1857_18;
i_19 = i_13 + 1;
<bb 4>:
# i_1 = PHI <i_19(3)>
if (i_1 < samples_3(D))
goto <bb 3>;
else
goto <bb 5>;
<bb 5>:
return;
}
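The report does not show the source for the matching-types variant. Since the dump above still declares 'int i' and compares 'i' with 'samples' directly, one plausible reading is that 'samples' was changed to 'int'. The sketch below assumes exactly that; the name VecADD_matched is hypothetical.

```c
#include <assert.h>

/* Assumed matching-types variant: 'samples' is int, like 'i', so the
   loop test needs no conversion and only one variable is carried
   around the loop. */
void VecADD_matched(long long *In1, long long *In2, long long *Out,
                    int samples)
{
    int i;

    for (i = 0; i < samples; i++) {
        Out[i] = In1[i] + In2[i];
    }
}

/* Usage: same element-wise sum as the original function. */
void VecADD_matched_selfcheck(void)
{
    long long a[3] = {1, 2, 3};
    long long b[3] = {10, 20, 30};
    long long out[3] = {0, 0, 0};

    VecADD_matched(a, b, out, 3);
    assert(out[0] == 11 && out[1] == 22 && out[2] == 33);
}
```

Note that this changes the interface: a negative 'samples' now yields zero iterations, whereas the unsigned parameter cannot represent a negative count at all.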
On one hand, I understand that the danger of signed/unsigned overflow at the
increment can force this kind of conservatism. On the other hand, this
situation was handled fine by GCC 3.4.6 and is handled with no issues by
another SSA-based compiler. If there is a way to relax GCC 4.3.2's strict
interpretation of the C rules, I would gladly learn about it, but my brief
flag-mining exercise yielded no results. Thank you.
--
Summary: loop iv detection failure, SSA autoincrement
Product: gcc
Version: 4.3.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: sergei_lus at yahoo dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38856