Bug 52548 - missed PRE optimization when function call follows to-be hoisted variable
Summary: missed PRE optimization when function call follows to-be hoisted variable
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.7.0
: P3 normal
Target Milestone: ---
Assignee: Richard Biener
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2012-03-09 21:09 UTC by Aldy Hernandez
Modified: 2012-03-22 13:16 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-03-10 00:00:00


Attachments
patch (2.07 KB, text/plain)
2012-03-21 11:33 UTC, Richard Biener
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Aldy Hernandez 2012-03-09 21:09:04 UTC
For the following code (for -O2):

int flag, hoist, y, z;

void
foo (void)
{
  if (flag)
    y = hoist + 4;
  else
    flag = 888;
  z = hoist + 4;
  bark ();
}

...PRE should be moving "hoist + 4" to the else arm, but it fails to do so.

If you remove the call to bark(), [hoist + 4] gets moved appropriately.

The bark() function call is in the same basic block as "z = hoist + 4".  I wild guess is that "hoist" isn't anticipatable at the *end* of the BB beginning with "z = hoist + 4".  Splitting BB's at function calls may improve PRE.  Just a guess...
Comment 1 Steven Bosscher 2012-03-10 00:15:12 UTC
(In reply to comment #0)

> The bark() function call is in the same basic block as "z = hoist + 4".  I wild
> guess is that "hoist" isn't anticipatable at the *end* of the BB beginning with
> "z = hoist + 4".  Splitting BB's at function calls may improve PRE.  Just a
> guess...

What is anticipated at the end of BB is uninteresting. It is computed but not stored (only needed to compute ANTIC_IN, see compute_antic_aux).

You can check the ANTIC sets and AVAIL_OUT set with -fdump-tree-pre-details. The value for the expression "hoist+4" should be in EXP_GEN of the basic block with the call, and in AVAIL_OUT of the basic block with "y=hoist+4". But this isn't the case here:
* the value representative for "y=hoist+4" is y.2_3->"{plus_expr,hoist.1_2,4}"
* the value representative for "z=hoist+4" is y.2_5->"{plus_expr,hoist.1_2,4}"
  (the name of the representative is y instead of z, probably due to copyrename)
* SCCVN finds "Value numbers:(...)y.2_5=y.2_3, as expected
* y.2_3 is in AVAIL_OUT of the then-block, as expected
* {plus_expr,hoist.1_2,4} is in EXP_GEN of the block with the call, as expected
* {plus_expr,hoist.1_2,4} is *not* in ANTIC_IN of the block with the call!

This is strange because ANTIC_IN = ANTIC_OUT U EXP_GEN - TMP_GEN

Function foo() at the .084.pre dump:

foo ()
{
  int y.2;
  int hoist.1;
  int flag.0;

<bb 2>:
  flag.0_1 = flag;
  if (flag.0_1 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 3>:
  hoist.1_2 = hoist;
  y.2_3 = hoist.1_2 + 4;
  y = y.2_3;
  goto <bb 5>;

<bb 4>:
  flag = 888;

<bb 5>:
  hoist.1_4 = hoist;
  y.2_5 = hoist.1_4 + 4;
  z = y.2_5;
  bark ();
  return;
}

Value numbers:
hoist.1_4 = hoist.1_2
y.2_5 = y.2_3

All the sets that are computed without iterations:
exp_gen[0] := {  } // BB0 is ENTRY_BLOCK
phi_gen[0] := {  }
tmp_gen[0] := {  }
avail_out[0] := {  }
exp_gen[2] := { {mem_ref<0B>,addr_expr<&flag>}@.MEM_7(D) (0002) }
phi_gen[2] := {  }
tmp_gen[2] := { flag.0_1 (0002) }
avail_out[2] := { flag.0_1 (0002) }
exp_gen[3] := { {mem_ref<0B>,addr_expr<&hoist>}@.MEM_7(D) (0003), {plus_expr,hoist.1_2,4} (0004) }
phi_gen[3] := {  }
tmp_gen[3] := { hoist.1_2 (0003), y.2_3 (0004) }
avail_out[3] := { flag.0_1 (0002), hoist.1_2 (0003), y.2_3 (0004) }
exp_gen[4] := {  }
phi_gen[4] := {  }
tmp_gen[4] := {  }
avail_out[4] := { flag.0_1 (0002) }
exp_gen[5] := { {mem_ref<0B>,addr_expr<&hoist>}@.MEM_7(D) (0003), {plus_expr,hoist.1_2,4} (0004) }
phi_gen[5] := {  }
tmp_gen[5] := { hoist.1_4 (0003), y.2_5 (0004) }
avail_out[5] := { flag.0_1 (0002), hoist.1_4 (0003), y.2_5 (0004) }
exp_gen[1] := {  }
phi_gen[1] := {  }
tmp_gen[1] := {  }
avail_out[1] := {  } // BB1 is EXIT_BLOCK

Starting iteration 0
ANTIC_OUT[5] := {  }
ANTIC_IN[5] := {  }
S[5] := {  }
ANTIC_OUT[4] := {  }
ANTIC_IN[4] := {  }
S[4] := {  }
ANTIC_OUT[3] := {  }
ANTIC_IN[3] := { {mem_ref<0B>,addr_expr<&hoist>}@.MEM_7(D) (0003), {plus_expr,hoist.1_2,4} (0004) }
S[3] := {  }
ANTIC_OUT[2] := {  }
ANTIC_IN[2] := { {mem_ref<0B>,addr_expr<&flag>}@.MEM_7(D) (0002) }
S[2] := {  }
Starting iteration 1
ANTIC_OUT[3] := {  }
ANTIC_IN[3] := { {mem_ref<0B>,addr_expr<&hoist>}@.MEM_7(D) (0003), {plus_expr,hoist.1_2,4} (0004) }
S[3] := {  }
ANTIC_OUT[2] := {  }
ANTIC_IN[2] := { {mem_ref<0B>,addr_expr<&flag>}@.MEM_7(D) (0002) }
S[2] := {  }
Comment 2 Richard Biener 2012-03-12 09:50:05 UTC
(In reply to comment #1)
> (In reply to comment #0)
> 
> > The bark() function call is in the same basic block as "z = hoist + 4".  I wild
> > guess is that "hoist" isn't anticipatable at the *end* of the BB beginning with
> > "z = hoist + 4".  Splitting BB's at function calls may improve PRE.  Just a
> > guess...
> 
> What is anticipated at the end of BB is uninteresting. It is computed but not
> stored (only needed to compute ANTIC_IN, see compute_antic_aux).
> 
> You can check the ANTIC sets and AVAIL_OUT set with -fdump-tree-pre-details.
> The value for the expression "hoist+4" should be in EXP_GEN of the basic block
> with the call, and in AVAIL_OUT of the basic block with "y=hoist+4". But this
> isn't the case here:
> * the value representative for "y=hoist+4" is y.2_3->"{plus_expr,hoist.1_2,4}"
> * the value representative for "z=hoist+4" is y.2_5->"{plus_expr,hoist.1_2,4}"
>   (the name of the representative is y instead of z, probably due to
> copyrename)
> * SCCVN finds "Value numbers:(...)y.2_5=y.2_3, as expected
> * y.2_3 is in AVAIL_OUT of the then-block, as expected
> * {plus_expr,hoist.1_2,4} is in EXP_GEN of the block with the call, as expected
> * {plus_expr,hoist.1_2,4} is *not* in ANTIC_IN of the block with the call!
> 
> This is strange because ANTIC_IN = ANTIC_OUT U EXP_GEN - TMP_GEN

That's simply because ANTIC_IN in reality is clean (ANTIC_OUT U EXP_GEN - TMP_GEN) and clean () removes all values that are clobbered in that block
(and EXP_GEN is at BB granularity, so inside-BB flow is ignored).

Thus indeed, splitting blocks can improve things here, as would
on-the-fly construction of EXP_GEN/TMP_GEN and doing clean () on its
way when we compute ANTIC_IN.  Of course that would be more expensive
if we are iterating.

Note that the way we clean () for memory PRE is ugly anyway, probably
not really envisioned by the original paper (which I bet didn't deal
with aliasing ...)
Comment 3 Richard Biener 2012-03-12 14:02:59 UTC
As discussed, EXP_GEN can be cleaned from invalid mems at generation time,
mem-cleaning should happen separately from clean (), and applied to ANTIC_OUT
instead.

Mine for now, but don't hold your breath.
Comment 4 Richard Biener 2012-03-21 11:33:25 UTC
Created attachment 26940 [details]
patch

Patch.  Not sure what to do about PA_IN - can we even end up with any invalid
expressions in it, given we have a "clean" ANTIC_IN already?  Well, I guess
I have to understand what partial-antic computes ;)
Comment 5 Richard Biener 2012-03-21 11:40:49 UTC
Ah, indeed we can.  But we should be able to prune from PA_OUT.
Comment 6 Richard Biener 2012-03-22 13:15:04 UTC
Author: rguenth
Date: Thu Mar 22 13:14:54 2012
New Revision: 185691

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=185691
Log:
2012-03-22  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/52548
	* tree-ssa-pre.c (valid_in_sets): Remove handling of invalidation
	because of clobbers.
	(prune_clobbered_mems): New function.
	(compute_antic_aux): Use it to prune ANTIC_OUT.
	(compute_partial_antic_aux): Use it to prune PA_IN.
	(compute_avail): Only insert expressions into EXP_GEN that
	are not invalidated when translated up to the beginning of
	the block.

	* gcc.dg/tree-ssa/ssa-pre-29.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-29.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-pre.c
Comment 7 Richard Biener 2012-03-22 13:16:40 UTC
Fixed.