Created attachment 34507 [details] Testcode Originally from BZ 64081.... We do miss some interesting kind of optimization opportunities like transforming if (prephitmp_87 == 1) goto <bb 9>; else goto <bb 10>; <bb 9>: _24 = arr1.5_23 + _62; pos.6_25 = *_24; goto <bb 11>; <bb 10>: _28 = arr2.7_27 + _62; pos.8_29 = *_28; <bb 11>: # prephitmp_89 = PHI <pos.6_25(9), pos.8_29(10)> to if (prephitmp_87 == 1) goto <bb 9>; else goto <bb 11>; <bb 9>: goto <bb 11>; <bb 11>: # _24 = PHI <arr1.5_23, arr2.7_27> _28 = _24 + _62; prephitmp_89 = *_28; sinking common computations through a PHI. With the followup optimization in out-of-SSA to coalesce arr1.5_23 and arr2.7_27 which means we can drop the conditional entirely.
Thanks for splitting out ;) Btw, there is also the corresponding PR23286 for code hoisting with a patch implementing that ontop of PRE. I'd say the sinking part should be part of a partial dead code elimination pass (aka SSU-PRE). Note that in some cases you have the choice of sinking or hoisting so pass ordering will then determine what we do in that case.
I had a code hoisting pass on top of PRE a while back as well. Can't remember why I abandoned it. Oh yea, on top of PRE :-) I've still got a global code motion pass here based on Click's work. It handles both hoisting and sinking. Basically you record the earliest possible block for each statement and a latest block for each statement. The path through the dominator tree connecting those two blocks is the set of valid blocks for the statement. Then you just choose the 'best' one in that path. Most control dependent path within the shallowest loop nest. It didn't handle sinking PHIs or hoisting/sinking through a dependent node. Not sure if it could be changed to do that. I never pushed on it simply because it never did significantly better than the other code motion code we already have. It pointed out a few minor issues in tree-ssa-sink.c, but nothing that couldn't be easily fixed. Too bad, I always found the basic algorithm to be rather elegant. I'm certain we're missing all kinds of interesting code motions..
here is a much simpler example: float f(int a, float b, float c, float d) { if (a) return c + d; return b + d; } float f1(int a, float b, float c, float d) { float e; if (a) e = c; else e = b; return e + d; } For aarch64 we get: f: cbnz w0, .L5 fadd s0, s2, s0 ret .p2align 3 .L5: fadd s0, s1, s2 ret ... f1: cmp w0, 0 fcsel s0, s1, s0, ne fadd s0, s0, s2 ret These two functions should produce the same code.
<bb 7> [local count: 958878293]: if (dir_lsm.26_39 == 1) goto <bb 8>; [34.00%] else goto <bb 9>; [66.00%] <bb 8> [local count: 326018623]: _11 = arr1.6_10 + _5; _12 = *_11; goto <bb 10>; [100.00%] <bb 9> [local count: 632859670]: _14 = arr2.9_13 + _5; _15 = *_14; <bb 10> [local count: 958878293]: # cstore_45 = PHI <_12(8), _15(9)> PHI-OPT improvements that I am working on might be able to handle this case too. MEM_EXPR is a little harder than the normal expression as you need to check the access type for aliasing reasons. Note PHI-OPT does handle the casting case already (but it is not listed here).
The trunk branch has been updated by Andrew Pinski <pinskia@gcc.gnu.org>: https://gcc.gnu.org/g:6d6c17e45f62cfe0b7de502af299348fca548b01 commit r14-575-g6d6c17e45f62cfe0b7de502af299348fca548b01 Author: Andrew Pinski <apinski@marvell.com> Date: Thu Apr 27 12:21:54 2023 -0700 PHIOPT: factor out unary operations instead of just conversions After using factor_out_conditional_conversion with diamond bb, we should be able do use it also for all normal unary gimple and not just conversions. This allows to optimize PR 59424 for an example. This is also a start to optimize PR 64700 and a few others. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. An example of this is: ``` static inline unsigned long long g(int t) { unsigned t1 = t; return t1; } static int abs1(int a) { if (a < 0) a = -a; return a; } unsigned long long f(int c, int d, int e) { unsigned long long t; if (d > e) t = g(abs1(d)); else t = g(abs1(e)); return t; } ``` Which should be optimized to: _9 = MAX_EXPR <d_5(D), e_6(D)>; _4 = ABS_EXPR <_9>; t_3 = (long long unsigned intD.16) _4; gcc/ChangeLog: * tree-ssa-phiopt.cc (factor_out_conditional_conversion): Rename to ... (factor_out_conditional_operation): This and add support for all unary operations. (pass_phiopt::execute): Update call to factor_out_conditional_conversion to call factor_out_conditional_operation instead. PR tree-optimization/109424 PR tree-optimization/59424 gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/abs-2.c: Update tree scan for details change in wording. * gcc.dg/tree-ssa/minmax-17.c: Likewise. * gcc.dg/tree-ssa/pr103771.c: Likewise. * gcc.dg/tree-ssa/minmax-18.c: New test. * gcc.dg/tree-ssa/minmax-19.c: New test.
Right now we just handle unary operations, binary and others will come next.
I should note store sinking is handled by the sinking pass by r11-408-g84935c9822183c .