For the following code (for -O2): int flag, hoist, y, z; void foo (void) { if (flag) y = hoist + 4; else flag = 888; z = hoist + 4; bark (); } ...PRE should be moving "hoist + 4" to the else arm, but it fails to do so. If you remove the call to bark(), [hoist + 4] gets moved appropriately. The bark() function call is in the same basic block as "z = hoist + 4". I wild guess is that "hoist" isn't anticipatable at the *end* of the BB beginning with "z = hoist + 4". Splitting BB's at function calls may improve PRE. Just a guess...
(In reply to comment #0) > The bark() function call is in the same basic block as "z = hoist + 4". I wild > guess is that "hoist" isn't anticipatable at the *end* of the BB beginning with > "z = hoist + 4". Splitting BB's at function calls may improve PRE. Just a > guess... What is anticipated at the end of BB is uninteresting. It is computed but not stored (only needed to compute ANTIC_IN, see compute_antic_aux). You can check the ANTIC sets and AVAIL_OUT set with -fdump-tree-pre-details. The value for the expression "hoist+4" should be in EXP_GEN of the basic block with the call, and in AVAIL_OUT of the basic block with "y=hoist+4". But this isn't the case here: * the value representative for "y=hoist+4" is y.2_3->"{plus_expr,hoist.1_2,4}" * the value representative for "z=hoist+4" is y.2_5->"{plus_expr,hoist.1_2,4}" (the name of the representative is y instead of z, probably due to copyrename) * SCCVN finds "Value numbers:(...)y.2_5=y.2_3, as expected * y.2_3 is in AVAIL_OUT of the then-block, as expected * {plus_expr,hoist.1_2,4} is in EXP_GEN of the block with the call, as expected * {plus_expr,hoist.1_2,4} is *not* in ANTIC_IN of the block with the call! This is strange because ANTIC_IN = ANTIC_OUT U EXP_GEN - TMP_GEN Function foo() at the .084.pre dump: foo () { int y.2; int hoist.1; int flag.0; <bb 2>: flag.0_1 = flag; if (flag.0_1 != 0) goto <bb 3>; else goto <bb 4>; <bb 3>: hoist.1_2 = hoist; y.2_3 = hoist.1_2 + 4; y = y.2_3; goto <bb 5>; <bb 4>: flag = 888; <bb 5>: hoist.1_4 = hoist; y.2_5 = hoist.1_4 + 4; z = y.2_5; bark (); return; } Value numbers: hoist.1_4 = hoist.1_2 y.2_5 = y.2_3 All the sets that are computed without iterations: exp_gen[0] := { } // BB0 is ENTRY_BLOCK phi_gen[0] := { } tmp_gen[0] := { } avail_out[0] := { } exp_gen[2] := { {mem_ref<0B>,addr_expr<&flag>}@.MEM_7(D) (0002) } phi_gen[2] := { } tmp_gen[2] := { flag.0_1 (0002) } avail_out[2] := { flag.0_1 (0002) } exp_gen[3] := { {mem_ref<0B>,addr_expr<&hoist>}@.MEM_7(D) (0003), {plus_expr,hoist.1_2,4} (0004) } phi_gen[3] := { } tmp_gen[3] := { hoist.1_2 (0003), y.2_3 (0004) } avail_out[3] := { flag.0_1 (0002), hoist.1_2 (0003), y.2_3 (0004) } exp_gen[4] := { } phi_gen[4] := { } tmp_gen[4] := { } avail_out[4] := { flag.0_1 (0002) } exp_gen[5] := { {mem_ref<0B>,addr_expr<&hoist>}@.MEM_7(D) (0003), {plus_expr,hoist.1_2,4} (0004) } phi_gen[5] := { } tmp_gen[5] := { hoist.1_4 (0003), y.2_5 (0004) } avail_out[5] := { flag.0_1 (0002), hoist.1_4 (0003), y.2_5 (0004) } exp_gen[1] := { } phi_gen[1] := { } tmp_gen[1] := { } avail_out[1] := { } // BB1 is EXIT_BLOCK Starting iteration 0 ANTIC_OUT[5] := { } ANTIC_IN[5] := { } S[5] := { } ANTIC_OUT[4] := { } ANTIC_IN[4] := { } S[4] := { } ANTIC_OUT[3] := { } ANTIC_IN[3] := { {mem_ref<0B>,addr_expr<&hoist>}@.MEM_7(D) (0003), {plus_expr,hoist.1_2,4} (0004) } S[3] := { } ANTIC_OUT[2] := { } ANTIC_IN[2] := { {mem_ref<0B>,addr_expr<&flag>}@.MEM_7(D) (0002) } S[2] := { } Starting iteration 1 ANTIC_OUT[3] := { } ANTIC_IN[3] := { {mem_ref<0B>,addr_expr<&hoist>}@.MEM_7(D) (0003), {plus_expr,hoist.1_2,4} (0004) } S[3] := { } ANTIC_OUT[2] := { } ANTIC_IN[2] := { {mem_ref<0B>,addr_expr<&flag>}@.MEM_7(D) (0002) } S[2] := { }
(In reply to comment #1) > (In reply to comment #0) > > > The bark() function call is in the same basic block as "z = hoist + 4". I wild > > guess is that "hoist" isn't anticipatable at the *end* of the BB beginning with > > "z = hoist + 4". Splitting BB's at function calls may improve PRE. Just a > > guess... > > What is anticipated at the end of BB is uninteresting. It is computed but not > stored (only needed to compute ANTIC_IN, see compute_antic_aux). > > You can check the ANTIC sets and AVAIL_OUT set with -fdump-tree-pre-details. > The value for the expression "hoist+4" should be in EXP_GEN of the basic block > with the call, and in AVAIL_OUT of the basic block with "y=hoist+4". But this > isn't the case here: > * the value representative for "y=hoist+4" is y.2_3->"{plus_expr,hoist.1_2,4}" > * the value representative for "z=hoist+4" is y.2_5->"{plus_expr,hoist.1_2,4}" > (the name of the representative is y instead of z, probably due to > copyrename) > * SCCVN finds "Value numbers:(...)y.2_5=y.2_3, as expected > * y.2_3 is in AVAIL_OUT of the then-block, as expected > * {plus_expr,hoist.1_2,4} is in EXP_GEN of the block with the call, as expected > * {plus_expr,hoist.1_2,4} is *not* in ANTIC_IN of the block with the call! > > This is strange because ANTIC_IN = ANTIC_OUT U EXP_GEN - TMP_GEN That's simply because ANTIC_IN in reality is clean (ANTIC_OUT U EXP_GEN - TMP_GEN) and clean () removes all values that are clobbered in that block (and EXP_GEN is at BB granularity, so inside-BB flow is ignored). Thus indeed, splitting blocks can improve things here, as would on-the-fly construction of EXP_GEN/TMP_GEN and doing clean () on its way when we compute ANTIC_IN. Of course that would be more expensive if we are iterating. Note that the way we clean () for memory PRE is ugly anyway, probably not really envisioned by the original paper (which I bet didn't deal with aliasing ...)
As discussed, EXP_GEN can be cleaned from invalid mems at generation time, mem-cleaning should happen separately from clean (), and applied to ANTIC_OUT instead. Mine for now, but don't hold your breath.
Created attachment 26940 [details] patch Patch. Not sure what to do about PA_IN - can we even end up with any invalid expressions in it, given we have a "clean" ANTIC_IN already? Well, I guess I have to understand what partial-antic computes ;)
Ah, indeed we can. But we should be able to prune from PA_OUT.
Author: rguenth Date: Thu Mar 22 13:14:54 2012 New Revision: 185691 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=185691 Log: 2012-03-22 Richard Guenther <rguenther@suse.de> PR tree-optimization/52548 * tree-ssa-pre.c (valid_in_sets): Remove handling of invalidation because of clobbers. (prune_clobbered_mems): New function. (compute_antic_aux): Use it to prune ANTIC_OUT. (compute_partial_antic_aux): Use it to prune PA_IN. (compute_avail): Only insert expressions into EXP_GEN that are not invalidated when translated up to the beginning of the block. * gcc.dg/tree-ssa/ssa-pre-29.c: New testcase. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-29.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-pre.c
Fixed.