[Bug tree-optimization/32306] [6/7/8 Regression] redundant && || not eliminated

Fri Nov 24 16:23:00 GMT 2017

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32306

--- Comment #36 from Jeffrey A. Law <law at redhat dot com> ---
Just a couple notes.  I'm not currently looking at this, but this is probably
the best bug to track thoughts around how to try and capture secondary effects
of jump threading without re-running all of DOM.

-- Taken from another BZ so that it doesn't get lost --

I looked at ways to introduce iteration without the full DOM pass years ago. 
It was pretty obvious that the most interesting things happened as a result of
exposing degenerate PHIs.  But I wasn't ever able to make a leap from that to a
low overhead iterating jump threader.  What did come out of it was the phi-only
cprop pass which propagates away the degenerate PHIs.

--

And today's thought.  In addition to degenerate PHIs the other key property
which indicates an exposed secondary opportunity is a reduction in the number
of preds for a block.  Particularly when we can drop from N to 1 pred.

ssa--dom-simplify-1 is a good example, particularly if one disables the VRP
threader.   Prior to DOM we'll see:

;;   basic block 2, loop depth 0, count 1073741825 (estimated locally), maybe
hot
;;    prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED)
;;    pred:       ENTRY [always]  count:1073741826 (estimated locally)
(FALLTHRU,EXECUTABLE)
  if (x_3(D) > 3)
    goto <bb 3>; [33.00%]
  else
    goto <bb 4>; [67.00%]
;;    succ:       3 [33.0% (guessed)]  count:354334800 (estimated locally)
(TRUE_VALUE,EXECUTABLE)
;;                4 [67.0% (guessed)]  count:719407025 (estimated locally)
(FALSE_VALUE,EXECUTABLE)

;;   basic block 3, loop depth 0, count 354334802 (estimated locally), maybe
hot
;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
;;    pred:       2 [33.0% (guessed)]  count:354334800 (estimated locally)
(TRUE_VALUE,EXECUTABLE)
  frob (1);
;;    succ:       4 [always (guessed)]  count:354334802 (estimated locally)
(FALLTHRU,EXECUTABLE)

;;   basic block 4, loop depth 0, count 1073741825 (estimated locally), maybe
hot
;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;    pred:       3 [always (guessed)]  count:354334802 (estimated locally)
(FALLTHRU,EXECUTABLE)
;;                2 [67.0% (guessed)]  count:719407025 (estimated locally)
(FALSE_VALUE,EXECUTABLE)
  if (x_3(D) > 2)
    goto <bb 5>; [33.00%]
  else
    goto <bb 6>; [67.00%]
;;    succ:       5 [33.0% (guessed)]  count:354334800 (estimated locally)
(TRUE_VALUE,EXECUTABLE)
;;                6 [67.0% (guessed)]  count:719407025 (estimated locally)
(FALSE_VALUE,EXECUTABLE)

;;   basic block 5, loop depth 0, count 354334802 (estimated locally), maybe
hot
;;    prev block 4, next block 6, flags: (NEW, REACHABLE, VISITED)
;;    pred:       4 [33.0% (guessed)]  count:354334800 (estimated locally)
(TRUE_VALUE,EXECUTABLE)
  frob (x_3(D));
;;    succ:       6 [always (guessed)]  count:354334802 (estimated locally)
(FALLTHRU,EXECUTABLE)

;;   basic block 6, loop depth 0, count 1073741825 (estimated locally), maybe
hot
;;    prev block 5, next block 1, flags: (NEW, REACHABLE, VISITED)
;;    pred:       4 [67.0% (guessed)]  count:719407025 (estimated locally)
(FALSE_VALUE,EXECUTABLE)
;;                5 [always (guessed)]  count:354334802 (estimated locally)
(FALLTHRU,EXECUTABLE)
  return;
;;    succ:       EXIT [always (guessed)]  count:1073741825 (estimated locally)

}

DOM is going to discover the 3->4->5 jump thread easily.    But there are no
PHIs in BB4 that would trigger any kind of reanalysis.  But the # preds for BB4
drops from 2 to 1.  This is important as the remaining path into BB4 can be
further optimized after we realize the 3->4->5 jump thread.

It feels as if when we discover a degenerate PHI or the incoming preds drops to
1, then we ought to start a re-analysis.  The blocks involved would start at
the idom of the affected block, covering the dominance tree and the PHI nodes
as the dominance frontier.

I thought I explored that idom/dominance tree/dominance frontier idea years ago
and likely dismissed it as incomplete (consider how scrambled the dominance
tree can get after threading).  But while I could certainly conjure up
scenarios where it's incomplete, it might be "good enough" to catch the
secondary opportunities without a fully iterating DOM.

--

Of course I'm also very interested to evaluate if any of that is necessary with
Aldy's recent work on the backwards threader.