[Bug tree-optimization/45397] [5/6/7 Regression] Issues with integer narrowing conversions

law at redhat dot com gcc-bugzilla@gcc.gnu.org
Fri Feb 24 06:18:00 GMT 2017


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397

--- Comment #28 from Jeffrey A. Law <law at redhat dot com> ---
WRT c#26.  

Yes, I would agree that finding CSEs that way is rather gross, but it is
significantly less gross than what will be required to address this problem in
phi-opt.

Pattern matching this is going to be significantly more complex than the
ADD_OVERFLOW/SUB_OVERFLOW patterns.  I've looked at that code via pr79095, and
catching these saturating idioms is a much harder problem.

I prototyped the idea of having DOM do extra lookups for a widened version of X
OP Y.  It's a bit ugly relative to the match.pd approach, but possibly
manageable.

However, doing that in DOM exposes some lameness we'd have to address in VRP.

Prior to DOM2 we have:

;;   basic block 4, loop depth 0, count 0, freq 4665, maybe hot
;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;    pred:       3 [67.6%]  (TRUE_VALUE,EXECUTABLE)
  _6 = (unsigned char) val_12(D);
  _7 = _3 + _6;
  iftmp.1_13 = (int) _7;

We transform that into:

;;   basic block 4, loop depth 0, count 0, freq 4665, maybe hot
;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;    pred:       3 [67.6%]  (TRUE_VALUE,EXECUTABLE)
  iftmp.1_13 = _15 & 255;

That's value preserving and clearly an improvement.  Unfortunately we have to
wait until vrp2 to discover that the masking is redundant and simplify the
statement into:


  iftmp.1_13 = _15;

The problem is that we do not propagate that copy into the PHI node in BB4's
successor.  By the time we finally propagate the copy away, there are no
phi-opt passes left to turn the code into MIN/MAX.

The lack of copy propagation when VRP simplifies a statement like that is due
to using op_with_constant_singleton_value_range as the callback for
substitute_and_fold.  op_with_constant_singleton_value_range returns exactly
what its name suggests: constant singleton ranges, and nothing else.  Thus we
neither discover nor exploit the copy propagation opportunities created by
VRP's statement simplification.

Enhancing the callback to also return an SSA_NAME in cases where VRP simplifies
arithmetic/logical statements to copies allows the propagation step to
propagate the copy into the PHI node in BB4's successor.

That in turn allows phi-opt to do its job and by the .optimized dump we have:

;;   basic block 2, loop depth 0, count 0, freq 10000, maybe hot
;;    prev block 0, next block 1, flags: (NEW, REACHABLE, VISITED)
;;    pred:       ENTRY [100.0%]  (FALLTHRU,EXECUTABLE)
  _1 = (sizetype) i_9(D);
  _2 = tmp_10(D) + _1;
  _3 = *_2;
  _4 = (int) _3;
  _5 = _4 + val_12(D);
  _16 = MAX_EXPR <_5, 0>;
  iftmp.0_7 = MIN_EXPR <_16, 255>;
  return iftmp.0_7;
;;    succ:       EXIT [100.0%]


Which is what we want at this stage.  Transforming something like that into
saturating arithmetic for processors which support such insns is much easier
(but IMHO out of the scope of this BZ).

Anyway, I'm offline for the next few days and largely booked with non-technical
work for much of March.  I don't know whether I'll be able to push this further
over the next few weeks.

