This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/69042] New: [6 regression] Missed optimization in ivopts
- From: "ienkovich at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 24 Dec 2015 14:00:24 +0000
- Subject: [Bug tree-optimization/69042] New: [6 regression] Missed optimization in ivopts
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69042
Bug ID: 69042
Summary: [6 regression] Missed optimization in ivopts
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ienkovich at gcc dot gnu.org
Target Milestone: ---
Created attachment 37127
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37127&action=edit
Reproducer
Here is a reduced loop:
  for (i = 1; i < 64; i++) {
    if (data[indexes[i]]) {
      j++;
    } else {
      if (p (j))
        return 0;
      j = 0;
    }
  }
'i' is used only for the indexes access. Therefore it may be optimized out and
replaced with a pointer induction variable over (indexes + i). That's what
GCC 5.3 actually does. Here is the GIMPLE for GCC 5.3 after ivopts:
  <bb 3>:
  # j_23 = PHI <j_2(7), 0(2)>
  # ivtmp.14_21 = PHI <ivtmp.14_20(7), ivtmp.14_19(2)>
  _6 = (void *) ivtmp.14_21;
  _9 = MEM[base: _6, offset: 4B];
  _10 = (unsigned int) _9;
  _11 = _10 * 2;
  _13 = data_12(D) + _11;
  _14 = *_13;
  if (_14 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  j_15 = j_23 + 1;
  goto <bb 6>;

  <bb 5>:
  _18 = p (j_23);
  if (_18 != 0)
    goto <bb 8>;
  else
    goto <bb 6>;

  <bb 6>:
  # j_2 = PHI <j_15(4), 0(5)>
  ivtmp.14_20 = ivtmp.14_21 + 4;
  if (ivtmp.14_20 != _1)
    goto <bb 7>;
  else
    goto <bb 8>;

  <bb 7>:
  goto <bb 3>;
But starting from r230647, GCC 6 no longer does this, resulting in an
additional address computation and increased register pressure. Here is the
GIMPLE for GCC 6 after ivopts:
  <bb 3>:
  # i_23 = PHI <i_16(7), 1(2)>
  # j_24 = PHI <j_2(7), 0(2)>
  _21 = (sizetype) i_23;
  _20 = _21 * 4;
  _9 = MEM[symbol: indexes, index: _20, offset: 0B];
  _10 = (unsigned int) _9;
  _11 = _10 * 2;
  _13 = data_12(D) + _11;
  _14 = *_13;
  if (_14 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  j_15 = j_24 + 1;
  goto <bb 6>;

  <bb 5>:
  _18 = p (j_24);
  if (_18 != 0)
    goto <bb 8>;
  else
    goto <bb 6>;

  <bb 6>:
  # j_2 = PHI <j_15(4), 0(5)>
  i_16 = i_23 + 1;
  if (i_16 != 64)
    goto <bb 7>;
  else
    goto <bb 8>;

  <bb 7>:
  goto <bb 3>;
The testcase was made on the basis of the EEMBC cjpegv2 benchmark, whose loop
shows a ~15% performance loss on Silvermont due to this issue.
In the attached testcase I put an __asm__ to emulate register pressure and
demonstrate the resulting loads in the regressed loop version for i386.
Here is how I build the test:
gcc -S -m32 -fPIE -pie -O2 test.c
Used compiler:
Target: x86_64-pc-linux-gnu
Configured with: /export/users/gnutester/stability/svn/trunk/configure
--with-arch=corei7 --with-cpu=corei7 --enable-clocale=gnu --with-system-zlib
--enable-shared --with-demangler-in-ld --enable-cloog-backend=isl
--with-fpmath=sse --with-pkgversion=Revision=231837
--prefix=/export/users/gnutester/stability/work/trunk/64/install
--enable-languages=c,c++,fortran,java,lto
Thread model: posix
gcc version 6.0.0 20151218 (experimental) (Revision=231837)