This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/78496] New: Missed opportunities for jump threading
- From: "ysrumyan at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 23 Nov 2016 15:02:29 +0000
- Subject: [Bug tree-optimization/78496] New: Missed opportunities for jump threading
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78496
Bug ID: 78496
Summary: Missed opportunities for jump threading
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
Target Milestone: ---
Created attachment 40131
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40131&action=edit
test-case to reproduce, compile with -O3 option.
We noticed a huge performance drop on one important benchmark, which is caused
by hoisting and collecting the comparisons that feed conditional branches.
Here are the comments provided by Richard on it:
Note this is a general issue with PRE which tends to
see partial redundancies when it can compute an expression to a
constant on one edge. There is nothing wrong with that but the
particular example shows the lack of a cost model with respect
to register pressure (same applies to other GIMPLE optimization passes).
In this case we have a lot of expressions anticipated from the same
blocks where on one incoming edge their value is constant. Profitability
here really depends on the "distance" between the to-be-inserted PHI and
its use, I guess.
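
To illustrate the PRE behaviour described above, here is a hand-written C sketch. It is not taken from the attached test-case and is not actual GCC output. The function names, the int types and the single flag are assumptions made for the example; only the constant 899 is borrowed from the dump. The first function evaluates a comparison after a join where one incoming edge supplies a constant. The second shows roughly what a PRE-style transform produces: the comparison is folded on the constant edge, inserted on the other edge, and the merged result (the role of the inserted PHI) is carried to the use.

```c
#include <assert.h>

/* Before: x is 0 on one incoming edge, so "x <= 899" is a known
   constant on that edge, yet it is still evaluated after the join. */
int cmp_at_join(int flag, int a)
{
    int x;
    if (flag)
        x = 0;          /* constant on this edge */
    else
        x = a;
    /* ... imagine a long stretch of unrelated code here ... */
    return x <= 899;
}

/* After a PRE-style transform, written by hand for illustration. */
int cmp_hoisted(int flag, int a)
{
    int t;              /* plays the role of the inserted PHI */
    if (flag)
        t = 1;          /* 0 <= 899 folded to a constant */
    else
        t = a <= 899;   /* comparison inserted on the other edge */
    /* ... t now has to stay live across the same long stretch ... */
    return t;
}

/* Self-check: both forms agree on a few sample inputs. */
void check_pre_sketch(void)
{
    const int samples[] = { -1, 0, 899, 900, 5000 };
    for (int i = 0; i < 5; i++) {
        assert(cmp_at_join(0, samples[i]) == cmp_hoisted(0, samples[i]));
        assert(cmp_at_join(1, samples[i]) == cmp_hoisted(1, samples[i]));
    }
}
```

With a single expression the transform is harmless. The concern raised above is the case where many such values are created in the same block and each has to stay live until a distant use.
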
We're missing quite some jump-threading here as well:
<bb 16>:
# x1_197 = PHI <x1_261(15), x1_435(123), x1_435(105)>
# _407 = PHI <_16(15), _16(123), 0(105)>
# aa1_410 = PHI <aa1_185(15), aa1_185(123), aa1_216(105)>
# d1_413 = PHI <d1_191(15), d1_191(123), d1_432(105)>
# w1_416 = PHI <w1_260(15), w1_260(123), 0(105)>
# v1_377 = PHI <v1_558(15), v1_558(123), 0(105)>
# oo1_371 = PHI <oo1_567(15), oo1_567(123), oo1_194(105)>
# ss1_376 = PHI <ss1_576(15), ss1_576(123), ss1_192(105)>
# r1_609 = PHI <r1_585(15), r1_585(123), r1_190(105)>
# _612 = PHI <_596(15), _596(123), _188(105)>
# out_ind_lsm.82_322 = PHI <out_ind_lsm.82_321(15), out_ind_lsm.82_321(123), out_ind_lsm.82_532(105)>
_549 = w1_416 <= 899;
_548 = _407 > 839;
_541 = _548 & _549;
if (_541 != 0)
goto <bb 17>;
else
goto <bb 124>;
Here the path 105 -> 16 -> 124 (forwarder) -> 18 could be threaded, which would eventually
make PRE behave somewhat saner (avoiding the far distances).
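
As a rough check of why the edge from bb 105 is threadable, the condition at the end of bb 16 can be modelled in C. This is only a sketch: the function names are made up for the example and the operands are assumed to be plain ints, which the dump does not state. The two PHIs that feed the condition, _407 and w1_416, both select the constant 0 on the edge from bb 105, so the outcome of the branch is already known on that edge.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition computed at the end of bb 16 in the dump:
     _549 = w1_416 <= 899;
     _548 = _407 > 839;
     _541 = _548 & _549;
   Returns true for the "goto <bb 17>" arm and false for the
   "goto <bb 124>" arm.  Operand types are assumed to be int. */
bool bb16_goes_to_bb17(int v407, int w1_416)
{
    bool c549 = w1_416 <= 899;
    bool c548 = v407 > 839;
    return c548 & c549;
}

/* On the incoming edge from bb 105 both PHIs select 0
   (_407 = 0 and w1_416 = 0). */
bool bb16_goes_to_bb17_from_bb105(void)
{
    return bb16_goes_to_bb17(0, 0);
}

/* Self-check: from bb 105 the branch always takes the bb 124 arm. */
void check_bb105_edge(void)
{
    assert(!bb16_goes_to_bb17_from_bb105());
}
```

Since 0 > 839 is false, _541 is 0 whenever bb 16 is entered from bb 105, so control can go straight to bb 124 on that path without evaluating the two comparisons.
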
The case appears with phicprop1 (or rather DOM, itself missing
a followup transform with respect to folding a degenerate constant
PHI plus the followup secondary threading opportunities). The
backwards threader doesn't exploit the above opportunity though.
Our forward threaders (like DOM) do. Unfortunately it requires
quite a few iterations to get all opportunities exploited...
(inserting 9 DOM/phi-only-cprop pass pairs "helps")
I suggest opening a bug report for this. Jeff may want to look at
the threading issue (I believe the backward threader _does_ iterate).
I attach a test-case to reproduce the issue.