Bug 64700 - Sink common code through PHI
Summary: Sink common code through PHI
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 5.0
: P3 enhancement
Target Milestone: ---
Assignee: Andrew Pinski
URL:
Keywords: missed-optimization
Depends on:
Blocks: 29144 94274 112324
  Show dependency treegraph
 
Reported: 2015-01-20 22:01 UTC by Jeffrey A. Law
Modified: 2023-12-28 00:00 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-09-03 00:00:00


Attachments
Testcode (263 bytes, text/plain)
2015-01-20 22:01 UTC, Jeffrey A. Law
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jeffrey A. Law 2015-01-20 22:01:02 UTC
Created attachment 34507 [details]
Testcode

Originally from BZ 64081....


We do miss some interesting kind of optimization opportunities like
transforming

  if (prephitmp_87 == 1)
    goto <bb 9>;
  else
    goto <bb 10>;

  <bb 9>:
  _24 = arr1.5_23 + _62;
  pos.6_25 = *_24;
  goto <bb 11>;

  <bb 10>:
  _28 = arr2.7_27 + _62;
  pos.8_29 = *_28;

  <bb 11>:
  # prephitmp_89 = PHI <pos.6_25(9), pos.8_29(10)>

to

  if (prephitmp_87 == 1)
    goto <bb 9>;
  else
    goto <bb 11>;

  <bb 9>:
  goto <bb 11>;

  <bb 11>:
  # _24 = PHI <arr1.5_23, arr2.7_27>
  _28 = _24 + _62;
  prephitmp_89 = *_28;

sinking common computations through a PHI.

With the followup optimization in out-of-SSA to coalesce arr1.5_23 and
arr2.7_27 which means we can drop the conditional entirely.
Comment 1 Richard Biener 2015-01-21 09:43:21 UTC
Thanks for splitting out ;)

Btw, there is also the corresponding PR23286 for code hoisting with a patch
implementing that ontop of PRE.  I'd say the sinking part should be part
of a partial dead code elimination pass (aka SSU-PRE).

Note that in some cases you have the choice of sinking or hoisting so pass
ordering will then determine what we do in that case.
Comment 2 Jeffrey A. Law 2015-01-21 20:42:18 UTC
I had a code hoisting pass on top of PRE a while back as well.  Can't remember why I abandoned it.  Oh yea, on top of PRE :-)


I've still got a global code motion pass here based on Click's work.  It handles both hoisting and sinking.  Basically you record the earliest possible block for each statement and a latest block for each statement.  The path through the dominator tree connecting those two blocks is the set of valid blocks for the statement.

Then you just choose the 'best' one in that path.  Most control dependent path within the shallowest loop nest.

It didn't handle sinking PHIs or hoisting/sinking through a dependent node.  Not sure if it could be changed to do that.

I never pushed on it simply because it never did significantly better than the other code motion code we already have.  It pointed out a few minor issues in tree-ssa-sink.c, but nothing that couldn't be easily fixed.


Too bad, I always found the basic algorithm to be rather elegant.

I'm certain we're missing all kinds of interesting code motions..
Comment 3 Andrew Pinski 2016-09-04 00:59:56 UTC
here is a much simpler example:
float f(int a, float b, float c, float d)
{
  if (a)
    return c + d;
  return b + d;
}

float f1(int a, float b, float c, float d)
{
  float e;
  if (a)
    e = c;
  else
    e = b;
  return e + d;
}

For aarch64 we get:
f:
        cbnz    w0, .L5
        fadd    s0, s2, s0
        ret
        .p2align 3
.L5:
        fadd    s0, s1, s2
        ret
...
f1:
        cmp     w0, 0
        fcsel   s0, s1, s0, ne
        fadd    s0, s0, s2
        ret

These two functions should produce the same code.
Comment 4 Andrew Pinski 2021-07-20 02:52:16 UTC
  <bb 7> [local count: 958878293]:
  if (dir_lsm.26_39 == 1)
    goto <bb 8>; [34.00%]
  else
    goto <bb 9>; [66.00%]

  <bb 8> [local count: 326018623]:
  _11 = arr1.6_10 + _5;
  _12 = *_11;
  goto <bb 10>; [100.00%]

  <bb 9> [local count: 632859670]:
  _14 = arr2.9_13 + _5;
  _15 = *_14;

  <bb 10> [local count: 958878293]:
  # cstore_45 = PHI <_12(8), _15(9)>

PHI-OPT improvements that I am working on might be able to handle this case too.
MEM_EXPR is a little harder than the normal expression as you need to check the access type for aliasing reasons.

Note PHI-OPT does handle the casting case already (but it is not listed here).
Comment 5 GCC Commits 2023-05-08 07:38:22 UTC
The trunk branch has been updated by Andrew Pinski <pinskia@gcc.gnu.org>:

https://gcc.gnu.org/g:6d6c17e45f62cfe0b7de502af299348fca548b01

commit r14-575-g6d6c17e45f62cfe0b7de502af299348fca548b01
Author: Andrew Pinski <apinski@marvell.com>
Date:   Thu Apr 27 12:21:54 2023 -0700

    PHIOPT: factor out unary operations instead of just conversions
    
    After using factor_out_conditional_conversion with diamond bb,
    we should be able do use it also for all normal unary gimple and not
    just conversions. This allows to optimize PR 59424 for an example.
    This is also a start to optimize PR 64700 and a few others.
    
    OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
    
    An example of this is:
    ```
    static inline unsigned long long g(int t)
    {
      unsigned t1 = t;
      return t1;
    }
    static int abs1(int a)
    {
      if (a < 0)
        a = -a;
      return a;
    }
    unsigned long long f(int c, int d, int e)
    {
      unsigned long long t;
      if (d > e)
        t = g(abs1(d));
      else
        t = g(abs1(e));
      return t;
    }
    ```
    
    Which should be optimized to:
      _9 = MAX_EXPR <d_5(D), e_6(D)>;
      _4 = ABS_EXPR <_9>;
      t_3 = (long long unsigned intD.16) _4;
    
    gcc/ChangeLog:
    
            * tree-ssa-phiopt.cc (factor_out_conditional_conversion): Rename to ...
            (factor_out_conditional_operation): This and add support for all unary
            operations.
            (pass_phiopt::execute): Update call to factor_out_conditional_conversion
            to call factor_out_conditional_operation instead.
    
            PR tree-optimization/109424
            PR tree-optimization/59424
    
    gcc/testsuite/ChangeLog:
    
            * gcc.dg/tree-ssa/abs-2.c: Update tree scan for
            details change in wording.
            * gcc.dg/tree-ssa/minmax-17.c: Likewise.
            * gcc.dg/tree-ssa/pr103771.c: Likewise.
            * gcc.dg/tree-ssa/minmax-18.c: New test.
            * gcc.dg/tree-ssa/minmax-19.c: New test.
Comment 6 Andrew Pinski 2023-05-08 07:40:10 UTC
Right now we just handle unary operations, binary and others will come next.
Comment 7 Andrew Pinski 2023-05-15 00:27:10 UTC
I should note store sinking is handled by the sinking pass by r11-408-g84935c9822183c .