This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/71237] [7 regression] scev tests failing after pass reorganization
- From: "andre.simoesdiasvieira at arm dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 26 May 2016 17:40:35 +0000
- Subject: [Bug tree-optimization/71237] [7 regression] scev tests failing after pass reorganization
- Auto-submitted: auto-generated
- References: <bug-71237-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71237
--- Comment #1 from Andre Vieira <andre.simoesdiasvieira at arm dot com> ---
So yes, disabling LIM will make the tests "PASS". However, I couldn't find an
option to do this, so I disabled the pass by changing passes.def, which doesn't
sound like a good way to test SCCP.
It might also be good to point out that, at least for arm-none-eabi and
x86_64-pc-linux-gnu, these tests are no longer testing SCCP: SCCP does not
change this code. I looked at the dumps and compared the assembly at -O2 with
and without '-fno-tree-scev-cprop'.
On arm-none-eabi, it used to be IVOPTS that made the test pass: it would reuse
the same ivtmp for computing the address used by the memory dereference and the
a_p assignment. Now, due to the reordering of LIM, it no longer does this.
On x86_64 I see the following code coming out of the OPTIMIZED dump for the
scev-4.c case:
...
<bb 4>:
# ivtmp.10_14 = PHI <_24(3), ivtmp.10_25(4)>
i_11 = (int) ivtmp.10_14;
MEM[symbol: a, index: ivtmp.10_14, step: 8, offset: 4B] = 100;
ivtmp.10_25 = ivtmp.10_14 + _24;
i_22 = (int) ivtmp.10_25;
if (i_22 <= 999)
goto <bb 4>;
else
goto <bb 5>;
<bb 5>:
_2 = (sizetype) i_11;
_3 = _2 * 8;
_10 = _3 + 4;
_1 = &a + _10;
a_p = _1;
...
Now yes, the scan-times &a will pass, but that's because the MEM is using
symbol: a instead of base: &a. I'm not sure this can be qualified as a proper
PASS.
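For reference, the loop shape implied by the dump above (element step 8, field
offset 4, a store of 100, exit test i <= 999) can be sketched in C roughly as
follows. This is a reconstruction from the dump, not a copy of scev-4.c: the
names a and a_p come from the dump, while S, x, y, f and k are assumed for
illustration, and int is assumed to be 4 bytes.

```c
/* Sketch only: a loop whose optimized dump would plausibly look like the
   one above.  S, x, y, f and k are assumed names, not taken from the
   testsuite.  */
typedef struct {
    int x;
    int y;      /* offset 4 inside an 8-byte element, assuming 4-byte int */
} S;

int *a_p;
S a[1000];

void f(int k)
{
    int i;

    /* Only meaningful for k > 0: k == 0 never terminates and k < 0
       indexes outside the array.  */
    for (i = k; i < 1000; i += k) {
        a_p = &a[i].y;  /* address that stays live after the loop */
        *a_p = 100;     /* the store that shows up as MEM[...] = 100 */
    }
}
```

The test's scan for "&a" counts textual occurrences in the dump, so a MEM that
names the symbol directly does not contribute one.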
Disabling LIM here the same way I did before, that is, removing the pass_lim
after pass_laddress and before pass_split_crit_edges, generates the following
OPTIMIZED dump:
...
<bb 4>:
_16 = (sizetype) k_4(D);
_15 = _16 * 8;
_21 = _15 + 4;
_22 = &a + _21;
ivtmp.9_14 = (unsigned long) _22;
<bb 5>:
# i_11 = PHI <k_4(D)(4), i_8(5)>
# ivtmp.9_13 = PHI <ivtmp.9_14(4), ivtmp.9_17(5)>
_1 = (int *) ivtmp.9_13;
MEM[base: _1, offset: 0B] = 100;
i_8 = k_4(D) + i_11;
ivtmp.9_17 = ivtmp.9_13 + _15;
if (i_8 <= 999)
goto <bb 5>;
else
goto <bb 6>;
<bb 6>:
a_p = _1;
...
I prefer this output, since you lose the needless trailing address calculation.
I am not so sure the eventually generated assembly is better in this case,
though. I'll add both as attachments.
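To make the difference between the two dumps concrete, here is a hand-written C
approximation of each shape. It is only a sketch: the function names
store_indexed and store_pointer, the struct S and the k range guard are all
assumptions made for illustration, and neither function is compiler output.
The first recomputes the address for a_p after the loop; the second carries
the address as the induction variable, so a_p is just the last pointer used.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int x;
    int y;
} S;

S a[1000];
int *a_p;

/* Shape of the first dump: the store is indexed off the symbol, and the
   address for a_p is rebuilt from the last index after the loop.  */
void store_indexed(int k)
{
    if (k <= 0 || k > 999)      /* guard added for this sketch only */
        return;

    int i = k;
    int last;
    do {
        last = i;
        a[i].y = 100;
        i += k;
    } while (i <= 999);

    /* Trailing calculation: &a + last * sizeof (S) + offsetof (S, y).  */
    a_p = (int *) ((char *) a + (size_t) last * sizeof (S) + offsetof (S, y));
}

/* Shape of the second dump: the address itself is the induction variable,
   kept as an unsigned integer like ivtmp, and a_p reuses the last pointer.  */
void store_pointer(int k)
{
    if (k <= 0 || k > 999)      /* guard added for this sketch only */
        return;

    size_t step = (size_t) k * sizeof (S);
    uintptr_t iv = (uintptr_t) &a[k].y;
    int i = k;
    int *p;
    do {
        p = (int *) iv;
        *p = 100;
        i += k;
        iv += step;
    } while (i <= 999);

    a_p = p;                    /* no address arithmetic after the loop */
}
```

Both variants leave a_p pointing at the y field of the last element written;
they differ only in where the address arithmetic happens.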