This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug tree-optimization/80520] [7/8 Regression] Performance regression from missing if-conversion

From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Tue, 19 Dec 2017 10:31:20 +0000
Subject: [Bug tree-optimization/80520] [7/8 Regression] Performance regression from missing if-conversion
Auto-submitted: auto-generated
References: <bug-80520-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80520

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #9)
> So AFAICT there's two issues that need to be addressed.  PRE and split-paths.
> 
> First up is PRE.  Compile the sample code from c#5/c#6 with -O3
> -fno-split-paths
> 
> 
> Prior to PRE we have:
> 
>   if (_16 != 0)
>     goto <bb 5>; [50.00%]
>   else
>     goto <bb 4>; [50.00%]
> 
>   <bb 4> [local count: 531502203]:
> 
>   <bb 5> [local count: 1063004407]:
>   # iftmp.0_19 = PHI <2567483615(3), 0(4)>
>   _17 = _15 ^ iftmp.0_19;
> 
> That's actually reasonably good.  While it's not a conditional move in the
> gimple, it's in a form that will be easy for the RTL optimizers to handle and
> turn into a suitable cmov on x86_64 if we just left it alone.
> 
> 
> PRE (correctly) identifies that it can reduce the number of expression
> evaluations on the path traversing bb3->bb5 by hoisting the XOR with the
> non-zero constant into BB4 resulting in:
> 
>  if (_16 != 0)
>     goto <bb 4>; [50.00%]
>   else
>     goto <bb 5>; [50.00%]
> 
>   <bb 4> [local count: 531502203]:
>   _52 = _15 ^ 2567483615;
> 
>   <bb 5> [local count: 1063004407]:
>   # iftmp.0_19 = PHI <2567483615(4), 0(3)>
>   # prephitmp_53 = PHI <_52(4), _15(3)>
> 
> That's correct, but far from ideal.
> 
> 
> So the second issue is split-paths.  There's actually two problems to deal
> with in split-paths.
> 
> 
> 
> 
> As it stands today this is what we see in split-paths (as a result of the
> PRE de-optimization):
> 
> 
>   <bb 3>
>   [ ... ]
>   if (_20 != 0)
>     goto <bb 5>; [50.00%]
>   else
>     goto <bb 4>; [50.00%]
> 
>   <bb 4> [local count: 531502203]:
>   _18 = _25 ^ 2567483615;
> 
>   <bb 5> [local count: 1063004407]:
>   # prephitmp_49 = PHI <_25(3), _18(4)>
>   _2 = (void *) ivtmp.8_30;
>   MEM[base: _2, offset: 0B] = prephitmp_49;
>   ivtmp.8_29 = ivtmp.8_30 + 8;
>   if (ivtmp.8_29 != _6)
>     goto <bb 3>; [98.99%]
>   else
>     goto <bb 6>; [1.01%]
> 
> split-paths should try not to muck it up further.  Note that we can probably
> identify this half-diamond pretty easily.  bb3 dominates bb4.  bb4 has a
> single statement that feeds a PHI in bb5.  That's a very likely
> if-conversion candidate so split-paths ought to leave it alone.
> 
> If we were to fix PRE then split-paths would be presented with something
> like this:
> 
>  <bb3>
>  [  ... ]
>  if (_47 != 0)
>     goto <bb 4>; [50.00%]
>   else
>     goto <bb 5>; [50.00%]
> 
>   <bb 4> [local count: 531502203]:
> 
>   <bb 5> [local count: 1063004407]:
>   # iftmp.0_48 = PHI <2567483615(3), 0(4)>
>   _49 = _18 ^ iftmp.0_48;
> 
> ISTM that when either of the blocks in question (bb3, bb4) has *no*
> statements and a single pred that is the other block, then split-paths
> definitely should leave it alone as well.
> 
> So, to summarize.
> 
> 1. PRE mucks things up a bit.
> 2. split-paths makes it worse
> 
> I've got a prototype patch that implements the two improvements to keep
> split-paths from making things worse.  That will improve things, but to
> really do a good job we'll have to either do something about PRE or have a
> pass after PRE undo PRE's deoptimization.
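
The half-diamond test described in the quoted text (the head block is the only predecessor of a side block, and the side block is empty or holds a single statement feeding a PHI in the join block) can be sketched on a toy CFG. This is only an illustration under stated assumptions: `struct toy_bb`, `has_succ`, `is_half_diamond` and `check_half_diamond` are hypothetical names, and none of this is GCC's internal API or the prototype patch mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy CFG node (hypothetical; not GCC's basic_block). */
struct toy_bb {
    int n_stmts;               /* number of non-PHI statements */
    struct toy_bb *preds[2];
    int n_preds;
    struct toy_bb *succs[2];
    int n_succs;
};

/* True if s is one of b's successors. */
static bool has_succ(const struct toy_bb *b, const struct toy_bb *s)
{
    for (int i = 0; i < b->n_succs; i++)
        if (b->succs[i] == s)
            return true;
    return false;
}

/* True when head branches to both side and join, side has head as its
   only predecessor, falls through to join, and holds at most one
   statement.  Such a shape is a likely if-conversion candidate.  */
static bool is_half_diamond(const struct toy_bb *head,
                            const struct toy_bb *side,
                            const struct toy_bb *join)
{
    return head->n_succs == 2
        && has_succ(head, side) && has_succ(head, join)
        && side->n_preds == 1 && side->preds[0] == head
        && side->n_succs == 1 && side->succs[0] == join
        && side->n_stmts <= 1;
}

/* Builds the bb3/bb4/bb5 shape from the dump and checks the predicate. */
static void check_half_diamond(void)
{
    struct toy_bb bb3 = {0}, bb4 = {0}, bb5 = {0};

    bb3.n_stmts = 3;
    bb3.succs[0] = &bb5; bb3.succs[1] = &bb4; bb3.n_succs = 2;

    bb4.n_stmts = 1;                       /* _18 = _25 ^ 2567483615 */
    bb4.preds[0] = &bb3; bb4.n_preds = 1;
    bb4.succs[0] = &bb5; bb4.n_succs = 1;

    bb5.preds[0] = &bb3; bb5.preds[1] = &bb4; bb5.n_preds = 2;

    assert(is_half_diamond(&bb3, &bb4, &bb5));

    bb4.n_stmts = 0;                       /* the empty-block variant */
    assert(is_half_diamond(&bb3, &bb4, &bb5));

    bb4.n_stmts = 2;                       /* too much work to if-convert */
    assert(!is_half_diamond(&bb3, &bb4, &bb5));
}
```

A pass could consult a predicate of this kind before duplicating the join block; the threshold of one statement is taken from the description above and is not tuned.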

I don't think it's per se a PRE "deoptimization"; it's simply what PRE
is supposed to do.  The result should be quite optimal given we
should be able to coalesce _52 and _15 and thus require no edge copies
into bb 5.

 if (_16 != 0)
    goto <bb 4>; [50.00%]
  else
    goto <bb 5>; [50.00%]

  <bb 4> [local count: 531502203]:
  _52 = _15 ^ 2567483615;

  <bb 5> [local count: 1063004407]:
  # prephitmp_53 = PHI <_52(4), _15(3)>
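
In C terms, the two GIMPLE shapes under discussion compute the same value. The following is only a sketch for illustration: the function names, the 64-bit type and the `check_equivalence` helper are assumptions, and this is not the sample code from the bug report.

```c
#include <assert.h>
#include <stdint.h>

/* Shape before PRE: select a constant through the PHI, then always XOR. */
static uint64_t xor_select(uint64_t x, uint64_t flag)
{
    uint64_t k = flag != 0 ? UINT64_C(2567483615) : 0; /* iftmp = PHI <K, 0> */
    return x ^ k;
}

/* Shape after PRE: XOR only on the path where the flag is set. */
static uint64_t xor_hoisted(uint64_t x, uint64_t flag)
{
    uint64_t r = x;                        /* _15 reaches the PHI unchanged */
    if (flag != 0)
        r = x ^ UINT64_C(2567483615);      /* _52 = _15 ^ 2567483615 */
    return r;                              /* prephitmp = PHI <_52, _15> */
}

/* Checks that both shapes agree for a range of inputs and both flag values. */
static void check_equivalence(void)
{
    for (uint64_t x = 0; x < 1000; x++) {
        assert(xor_select(x, 0) == xor_hoisted(x, 0));
        assert(xor_select(x, 1) == xor_hoisted(x, 1));
    }
}
```

If _52 and _15 end up in the same register, the second shape needs no copy on either edge into the join block, which is the coalescing argument made above.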

With a "reasonable" CPU we'd end up with something like

   test r1, p1  // _16 != 0 into predicate reg p1
   xor r2, 2567483615, p1  // xor predicated with p1
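
On a target without predicated instructions, the effect of the predicated xor can be emulated in C with a mask. This is a hedged sketch only: `xor_predicated` and `check_predicated` are hypothetical names, and whether a compiler turns this, or the branchy form, into a cmov or a predicated instruction depends on the target and the optimization passes involved.

```c
#include <assert.h>
#include <stdint.h>

/* Branch-free stand-in for "xor predicated with p1": the mask is all
   ones when the flag is non-zero and zero otherwise.  */
static uint64_t xor_predicated(uint64_t x, uint64_t flag)
{
    uint64_t mask = -(uint64_t)(flag != 0);
    return x ^ (mask & UINT64_C(2567483615));
}

/* Compares the mask form against the plain conditional form. */
static void check_predicated(void)
{
    for (uint64_t x = 0; x < 1000; x++) {
        assert(xor_predicated(x, 0) == x);
        assert(xor_predicated(x, 5) == (x ^ UINT64_C(2567483615)));
    }
}
```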

Now of course path splitting ruins things here (I never ever liked that pass;
its very low-level transform is more suitable for RTL).  And if we'd
get conditional execution modeled "properly" in GIMPLE we could if-convert
there, preventing path-splitting from mucking up things here...
