This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
[PATCH] Tune PRE insertion, make -Os -ftree-pre actually do something
- From: Richard Guenther <rguenther at suse dot de>
- To: gcc-patches at gcc dot gnu dot org
- Cc: Daniel Berlin <dberlin at dberlin dot org>
- Date: Tue, 14 Jul 2009 17:01:09 +0200 (CEST)
- Subject: [PATCH] Tune PRE insertion, make -Os -ftree-pre actually do something
This tries to get rid of the sledge-hammer that disables PRE if
the current function is not optimized for speed (which is always
true if optimize_size is set ...).
The idea is to allow regular and phi-translation triggered full redundancy
elimination to be performed even if a path is to be optimized for size.
Thus, we limit ourselves to performing insertions only when they remove a full
redundancy on a path in the CFG that we want to optimize for speed.
The effect of this patch is that PRE is now enabled at -Os but performs
only full redundancy elimination (thus, as if PRE ran but never
inserted anything). This should reduce code size and remove the odd
behavior that a VN run before the loop optimizations is missing at -Os.
I wonder whether we want to specialize the case where the value is
available in all predecessors but is not the same value (so we would
only need to insert a PHI node). For blocks with a low indegree this
might result in smaller code as well.
Bootstrap and regtest running on x86_64-unknown-linux-gnu.
Any comments?
Thanks,
Richard.
2009-07-14 Richard Guenther <rguenther@suse.de>
* tree-ssa-pre.c (do_regular_insertion): Only insert if a
redundancy along a path in the CFG we want to optimize for speed
is going to be removed.
(execute_pre): Do partial-PRE only if the function is to be
optimized for speed.
(gate_pre): Do not turn off all of PRE when not optimizing a
function for speed.
Index: gcc/tree-ssa-pre.c
===================================================================
*** gcc/tree-ssa-pre.c (revision 149626)
--- gcc/tree-ssa-pre.c (working copy)
*************** do_regular_insertion (basic_block block,
*** 3352,3357 ****
--- 3352,3358 ----
pre_expr eprime = NULL;
edge_iterator ei;
pre_expr edoubleprime = NULL;
+ bool do_insertion = false;
val = get_expr_value_id (expr);
if (bitmap_set_contains_value (PHI_GEN (block), val))
*************** do_regular_insertion (basic_block block,
*** 3403,3408 ****
--- 3404,3413 ----
{
avail[bprime->index] = edoubleprime;
by_some = true;
+ /* We want to perform insertions to remove a redundancy on
+ a path in the CFG we want to optimize for speed. */
+ if (optimize_edge_for_speed_p (pred))
+ do_insertion = true;
if (first_s == NULL)
first_s = edoubleprime;
else if (!pre_expr_eq (first_s, edoubleprime))
*************** do_regular_insertion (basic_block block,
*** 3413,3419 ****
already existing along every predecessor, and
it's defined by some predecessor, it is
partially redundant. */
! if (!cant_insert && !all_same && by_some && dbg_cnt (treepre_insert))
{
if (insert_into_preds_of_block (block, get_expression_id (expr),
avail))
--- 3418,3425 ----
already existing along every predecessor, and
it's defined by some predecessor, it is
partially redundant. */
! if (!cant_insert && !all_same && by_some && do_insertion
! && dbg_cnt (treepre_insert))
{
if (insert_into_preds_of_block (block, get_expression_id (expr),
avail))
*************** fini_pre (bool do_fre)
*** 4475,4485 ****
only wants to do full redundancy elimination. */
static unsigned int
! execute_pre (bool do_fre ATTRIBUTE_UNUSED)
{
unsigned int todo = 0;
! do_partial_partial = optimize > 2;
/* This has to happen before SCCVN runs because
loop_optimizer_init may create new phis, etc. */
--- 4481,4491 ----
only wants to do full redundancy elimination. */
static unsigned int
! execute_pre (bool do_fre)
{
unsigned int todo = 0;
! do_partial_partial = optimize > 2 && optimize_function_for_speed_p (cfun);
/* This has to happen before SCCVN runs because
loop_optimizer_init may create new phis, etc. */
*************** do_pre (void)
*** 4563,4570 ****
static bool
gate_pre (void)
{
! /* PRE tends to generate bigger code. */
! return flag_tree_pre != 0 && optimize_function_for_speed_p (cfun);
}
struct gimple_opt_pass pass_pre =
--- 4569,4575 ----
static bool
gate_pre (void)
{
! return flag_tree_pre != 0;
}
struct gimple_opt_pass pass_pre =