[PATCH, RFC] Expose target addressing modes before expand
William J. Schmidt
wschmidt@linux.vnet.ibm.com
Mon Feb 7 17:20:00 GMT 2011
Hi Richi,
Thanks very much for the quick and thorough response! Comments inline
below.
Richard Guenther wrote:
> On Fri, 4 Feb 2011, William J. Schmidt wrote:
>
>
>> Problem report 46556 (http://gcc.gnu.org/PR46556) identifies a code-size
>> regression for powerpc targets. The root of the problem is changes in
>> address canonicalization a couple of years back. Discussion in the PR
>> suggests some form of exposure of target addressing modes prior to the
>> expand phase. Steven Bosscher (CCed) suggested this be done in a late
>> GIMPLE pass, using logic similar to what IVopts does for loops.
>>
>> Following is an early prototype of such a pass. I've verified that it
>> restores the regressed code to its earlier, more compact form, and most
>> of the addressing sequences on powerpc appear to be better or the same
>> as before. There are at least some complex expressions where early
>> exposure produces worse code in the form of extra multiplies; I have
>> some ideas for how to tune that down so the shift-add machinery for
>> multiplies isn't derailed. There are still 16 test cases that regress
>> on rev 169834 for powerpc, and I haven't examined other targets yet.
>> Clearly this is not yet ready, but before investing time into further
>> refinement, I'd like to get commentary on the general approach, as well
>> as any specific concerns. Please bear in mind that I am new to GCC, so
>> I'm still familiarizing myself with the infrastructure and am likely to
>> miss some things that are obvious to the initiated...
>>
>> The general approach is to rely on existing machinery used by
>> tree-ssa-loop-ivopts.c, simplifying out the parts that have to do with
>> induction variable optimizations. The new logic is pretty simple, but
>> the interactions with later phases may not be. For now, I've placed the
>> new pass in passes.c as the last -O1 tree-ssa pass.
>>
>> In addition to general comments, I'd be interested in answers to some
>> specific questions:
>>
>> 1) Are there concerns with early exposure of target addressing modes on
>> other targets? Should the pass be gated to only operate on a subset of
>> targets?
>> 2) Is the pass placement appropriate?
>> 3) Have I missed any necessary fix-ups after modifying the GIMPLE code?
>>
>> Thanks very much for your insights!
>>
>
> Comments inline.
>
>
>> Bill
>>
>>
>> [gcc]
>> 2011-02-04 William Schmidt <wschmidt@linux.vnet.ibm.com>
>>
>> * tree.h (copy_ref_info): New prototype of existing function.
>> * tree-pass.h (pass_tree_addr_mode): New declaration.
>> * tree-ssa-loop-ivopts.c (may_be_unaligned_p, copy_ref_info):
>> Remove static keyword.
>> * timevar.def (TV_TREE_ADDR_MODE): New timevar definition.
>> * tree-ssa-address.c (addr_to_parts): Change to handle null
>> iv_cand parameter.
>> * tree-ssa-addr-mode.c: New.
>> * tree-flow.h (may_be_unaligned_p): New prototype of existing
>> function.
>> * Makefile.in: Add build step for tree-ssa-addr-mode.o.
>> * passes.c: Add pass for pass_tree_addr_mode.
>>
>> Index: gcc/tree.h
>> ===================================================================
>> --- gcc/tree.h (revision 169834)
>> +++ gcc/tree.h (working copy)
>> @@ -5580,6 +5580,9 @@
>> extern tree tree_mem_ref_addr (tree, tree);
>> extern void copy_mem_ref_info (tree, tree);
>>
>> +/* In tree-ssa-loop-ivopts.c */
>> +extern void copy_ref_info (tree, tree);
>> +
>> /* In tree-vrp.c */
>> extern bool ssa_name_nonnegative_p (const_tree);
>>
>> Index: gcc/tree-pass.h
>> ===================================================================
>> --- gcc/tree-pass.h (revision 169834)
>> +++ gcc/tree-pass.h (working copy)
>> @@ -385,6 +385,7 @@
>> extern struct gimple_opt_pass pass_parallelize_loops;
>> extern struct gimple_opt_pass pass_loop_prefetch;
>> extern struct gimple_opt_pass pass_iv_optimize;
>> +extern struct gimple_opt_pass pass_tree_addr_mode;
>> extern struct gimple_opt_pass pass_tree_loop_done;
>> extern struct gimple_opt_pass pass_ch;
>> extern struct gimple_opt_pass pass_ccp;
>> Index: gcc/tree-ssa-loop-ivopts.c
>> ===================================================================
>> --- gcc/tree-ssa-loop-ivopts.c (revision 169834)
>> +++ gcc/tree-ssa-loop-ivopts.c (working copy)
>> @@ -1604,7 +1604,7 @@
>>
>> /* Returns true if memory reference REF with step STEP may be unaligned. */
>>
>> -static bool
>> +bool
>> may_be_unaligned_p (tree ref, tree step)
>> {
>> tree base;
>> @@ -5916,7 +5916,7 @@
>>
>> /* Copies the reference information from OLD_REF to NEW_REF. */
>>
>> -static void
>> +void
>> copy_ref_info (tree new_ref, tree old_ref)
>> {
>> tree new_ptr_base = NULL_TREE;
>> Index: gcc/timevar.def
>> ===================================================================
>> --- gcc/timevar.def (revision 169834)
>> +++ gcc/timevar.def (working copy)
>> @@ -236,6 +236,7 @@
>> DEFTIMEVAR (TV_TREE_UNINIT , "uninit var analysis")
>> DEFTIMEVAR (TV_PLUGIN_INIT , "plugin initialization")
>> DEFTIMEVAR (TV_PLUGIN_RUN , "plugin execution")
>> +DEFTIMEVAR (TV_TREE_ADDR_MODE , "expose addressing modes")
>>
>
> I think we should name this pass lower-address or similar, it is
> really doing target specific lowering of memory accesses.
>
>
Agreed.
>> /* Everything else in rest_of_compilation not included above. */
>> DEFTIMEVAR (TV_EARLY_LOCAL , "early local passes")
>> Index: gcc/tree-ssa-address.c
>> ===================================================================
>> --- gcc/tree-ssa-address.c (revision 169834)
>> +++ gcc/tree-ssa-address.c (working copy)
>> @@ -631,7 +631,7 @@
>> is <= 2 -- in that case, no loop invariant code motion can be
>> exposed. */
>>
>> - if (!base_hint && (addr->n > 2))
>> + if (!base_hint && (addr->n > 2) && iv_cand)
>> move_variant_to_index (parts, addr, iv_cand);
>>
>> /* First move the most expensive feasible multiplication
>> Index: gcc/tree-ssa-addr-mode.c
>> ===================================================================
>> --- gcc/tree-ssa-addr-mode.c (revision 0)
>> +++ gcc/tree-ssa-addr-mode.c (revision 0)
>> @@ -0,0 +1,251 @@
>> +/* Expose target addressing modes in gimple representation.
>> + Copyright (C) 2011
>> + Free Software Foundation, Inc.
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify it
>> +under the terms of the GNU General Public License as published by the
>> +Free Software Foundation; either version 3, or (at your option) any
>> +later version.
>> +
>> +GCC is distributed in the hope that it will be useful, but WITHOUT
>> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
>> +for more details.
>> +
>> +You should have received a copy of the GNU General Public License
>> +along with GCC; see the file COPYING3. If not see
>> +<http://www.gnu.org/licenses/>. */
>> +
>> +/* This pass performs a scan over the gimple representation to expose
>> + target machine addressing in reference-class expressions. This is
>> + necessary for some targets for which the current address
>> + canonicalization is suboptimal. Induction variable optimizations
>> + already expose target addressing for many (but not all) expressions
>> + in loops, so this has most benefit in non-loop code. Algorithms
>> + used here are simplifications of those used in the induction
>> + variable optimizations. */
>> +
>> +#include "config.h"
>> +#include "system.h"
>> +#include "coretypes.h"
>> +#include "tree.h"
>> +#include "basic-block.h"
>> +#include "tree-dump.h"
>> +#include "tree-pass.h"
>> +#include "timevar.h"
>> +#include "cfgloop.h"
>> +#include "gimple.h"
>> +#include "tree-flow.h"
>> +#include "tree-affine.h"
>> +
>> +extern void copy_ref_info (tree, tree);
>> +
>> +static bool
>> +tree_contains_mem_ref_p (tree base)
>> +{
>> + unsigned int i;
>> + enum tree_code code;
>> +
>> + if (!base)
>> + return false;
>> +
>> + code = TREE_CODE (base);
>> + if (code == TARGET_MEM_REF || code == MEM_REF)
>> + return true;
>> +
>> + for (i = 0; i < TREE_CODE_LENGTH (code); i++)
>> + if (tree_contains_mem_ref_p (TREE_OPERAND (base, i)))
>> + return true;
>> +
>> + return false;
>> +}
>>
>
> I think you don't need this function, but see below.
>
>
This was added when I encountered a number of ICEs downstream. I wasn't
sure at the time whether to treat those as real bugs or whether what I
was doing was breaking a contract with a downstream pass. I will remove
this and investigate those regressions more carefully.
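For the record, here is the shape of the walk that helper performs, modeled on a toy tree so it can be shown standalone (none of these types or names are GCC's; as you note, the real check can probably collapse to a top-level TREE_CODE test):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for tree codes; illustrative only, not GCC's enums.  */
enum toy_code { TOY_MEM_REF, TOY_PLUS, TOY_VAR };

struct toy_node
{
  enum toy_code code;
  int n_ops;
  const struct toy_node *ops[2];
};

/* Return nonzero if T or any sub-operand carries TOY_MEM_REF,
   mirroring the recursive shape of tree_contains_mem_ref_p.  */
static int
toy_contains_mem_ref_p (const struct toy_node *t)
{
  int i;

  if (!t)
    return 0;

  if (t->code == TOY_MEM_REF)
    return 1;

  for (i = 0; i < t->n_ops; i++)
    if (toy_contains_mem_ref_p (t->ops[i]))
      return 1;

  return 0;
}
```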
>> +/* Expose target addressing modes in EXPR. Logic extracted and
>> + simplified from tree-ssa-loop-ivopts.c to handle non-IV
>> + opportunities. */
>> +
>> +static bool
>> +tree_ssa_addr_mode_tree (tree *expr, gimple_stmt_iterator *gsi, bool speed)
>> +{
>> + tree base = *expr, mem_ref;
>> + struct pointer_map_t *cache;
>> + aff_tree aff;
>> +
>> + if (REFERENCE_CLASS_P (*expr))
>>
>
> I personally prefer indentation like
>
> if (!REFERENCE_CLASS_P (*expr))
> return false;
>
>
So do I. Will clean this up.
>> + {
>> + /* If the expression is a virtual method table reference, don't attempt
>> + this. Incorrect code generation occurs; an example test case is
>> + testsuite/g++.dg/opt/pr47366.C. */
>> + if (TREE_CODE (base) == COMPONENT_REF
>> + && DECL_VIRTUAL_P (TREE_OPERAND (base, 1)))
>> + return false;
>>
>
> ;) It probably confuses Martin's devirtualization code.
>
>
>> + base = unshare_expr (base);
>>
>
> Ick you don't need this I think.
>
>
Agreed. Left over from my initial theft of code from IVopts; should be
more conservative.
>> + /*
>> + if (TREE_CODE (base) == TARGET_MEM_REF)
>> + {
>> + tree type = build_pointer_type (TREE_TYPE (base));
>> + base = tree_mem_ref_addr (type, base);
>> + }
>> + */
>> + /* If the addressing mode is already exposed, let it be.
>> + #### I am not certain this doesn't miss out on some
>> + opportunities, so just in case I'm leaving the above
>> + code commented out in place for now. But I think all this
>> + does is make things more explicit that CSE can catch
>> + anyhow, and in some cases it produces expressions that
>> + the may-aliasing infrastructure can't handle. */
>> + if (tree_contains_mem_ref_p (base))
>>
>
> TREE_CODE (base) == TARGET_MEM_REF || TREE_CODE (base) == MEM_REF
>
> sub-operands can never be complex. I _think_ you want to not
> bail out for MEM_REFs, though.
>
>
Per above, will investigate the regressions that resulted.
>> + return false;
>> + else
>>
>
> no need for else {
>
>
>> + {
>> + /* Check that the base expression is addressable. */
>> + if (may_be_nonaddressable_p (base))
>> + return false;
>>
>
> Correct (the access can include a V_C_E or a bitfield-ref, in which
> case we cannot compute the address of the base of the outermost
> access -- the variable name "base" is probably misleading here).
>
> What we want to do during memory access lowering is decompose the
> access into a part that we can always take the address of,
> like for a bitfield access a.b.c into a.b and .c, eventually
> re-write a.b and re-attach .c. I have plans to re-surrect a lowering
> pass I developed some years ago for 4.7 (it's on the mem-ref branch).
>
>
>> + /* On strict alignment platforms, check that the base expression
>> + is sufficiently aligned. There is no additional "step"
>> + information required, so specify single-byte alignment
>> + for that parameter. */
>> + if (STRICT_ALIGNMENT && may_be_unaligned_p (base, size_one_node))
>> + return false;
>> +
>> + base = build_fold_addr_expr (base);
>>
>
> This is probably where you require unshare_expr ...
>
>
>> + }
>> +
>> + /* Express the address of the original expression as an affine
>> + combination of trees, looking through SSA defs. */
>> + cache = pointer_map_create ();
>> + tree_to_aff_combination_expand (base, TREE_TYPE (base), &aff, &cache);
>> + pointer_map_destroy (cache);
>>
>
> ... but here the result shouldn't require it anymore. It's better
> to make a wrapper/worker of tree_to_aff_combination_expand that
> can handle non-addresses. Probably generally useful (did you run
> into actual problems w/o unshare_expr?).
>
>
I will do this and place it in tree-affine.[ch]. I did not run into
actual problems without unshare_expr, and agree it is likely unnecessary.
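As a side note, the affine view this code relies on is just offset plus a sum of coefficient-times-term entries. A standalone toy model of how terms accumulate and merge (names invented here; GCC's real aff_tree API lives in tree-affine.h and differs):

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_ELTS 8

/* Toy affine combination: value = offset + sum coef[i] * elt[i].
   Illustrative only; not GCC's aff_tree.  */
struct toy_aff
{
  long offset;
  int n;
  long coef[TOY_MAX_ELTS];
  const char *elt[TOY_MAX_ELTS];
};

/* Add COEF * NAME to COMB, merging with an existing term for NAME.  */
static void
toy_aff_add_term (struct toy_aff *comb, long coef, const char *name)
{
  int i;

  for (i = 0; i < comb->n; i++)
    if (strcmp (comb->elt[i], name) == 0)
      {
        comb->coef[i] += coef;
        return;
      }

  if (comb->n < TOY_MAX_ELTS)
    {
      comb->coef[comb->n] = coef;
      comb->elt[comb->n] = name;
      comb->n++;
    }
}
```

An address like base + 8*i + 16 then becomes { offset = 16, terms = { 1*base, 8*i } }, which is the form create_mem_ref consumes.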
>> +
>> + /* It is possible for aff to have no elements, in which case do
>> + nothing. Example: lhs = MEM[(struct foo *) 0B].bar does not
>> + produce an affine combination. */
>> + if (!aff.n)
>> + return false;
>> +
>> + /* Replace the input expression with an equivalent memory reference
>> + legal on the target machine. */
>> + mem_ref = create_mem_ref (gsi, TREE_TYPE (*expr), &aff,
>> + reference_alias_ptr_type (*expr),
>> + NULL_TREE, NULL_TREE, speed);
>>
>
> Doing the above will un-CSE a lot of dependent computations. I think
> you want to limit the SSA walking of tree_to_aff_combination_expand
> or invent some mechanism to re-use existing values from the IL.
>
> That said - the above needs lots of work ;) Like it misses a
> cost function - we should try to re-use base or index registers
> across memory accesses, like IVOPTs does in loops. It probably
> is enough to collect candidates on a per-BB basis for this.
>
>
I completely agree. As noted, I saw some poor results involving extra
multiplies, and realized I have some work cut out for me here. I was
initially trying not to reinvent the wheel where possible, but the
existing machinery will need more refinement. I will consult with you
as I get deeper into that.
>> + /* Ensure the memory reference was not built with a base expression
>> + of zero, which confuses the may-aliasing in subsequent dead
>> + store elimination. Assume this isn't a useful replacement when
>> + this occurs. */
>> + if (integer_zerop (TREE_OPERAND (mem_ref, 0)))
>> + return false;
>>
>
> Huh, that would be a bug in create_mem_ref I suppose. Or in alias
> analysis.
>
>
I saw this several times, IIRC usually when processing existing
mem_refs. I threw this catch-all in so I could make progress. I will
dig into this further and let you know what I'm seeing.
>> + /* Finish the replacement. */
>> + copy_ref_info (mem_ref, *expr);
>> + *expr = mem_ref;
>> + return true;
>> + }
>> + return false;
>> +}
>> +
>> +/* Expose target addressing modes in STMT, which must be an assignment. */
>> +
>> +static void
>> +tree_ssa_addr_mode_stmt (gimple stmt, gimple_stmt_iterator *gsi, bool speed)
>> +{
>> + tree *lhs, *rhs1, *rhs2, *rhs3;
>> + enum tree_code subcode = gimple_assign_rhs_code (stmt);
>> + enum gimple_rhs_class rhs_class = get_gimple_rhs_class (subcode);
>> + bool changed;
>> +
>> + /* Bitfields are being ignored for now, as they are in induction
>> + variable optimization. */
>> + if (subcode == BIT_FIELD_REF)
>> + return;
>>
>
> You can have BIT_FIELD_REFs on the lhs as well.
>
> Note that COMPONENT_REFs with a DECL_BIT_FIELD FIELD_DECL are also
> bit-field accesses.
>
> It's probably easier to move the checks into tree_ssa_addr_mode_tree
> and operate on TREE_CODE of the reference.
>
>
OK, thanks!
>> + lhs = gimple_assign_lhs_ptr (stmt);
>> + changed = tree_ssa_addr_mode_tree (lhs, gsi, speed);
>> +
>> + rhs1 = gimple_assign_rhs1_ptr (stmt);
>> + changed = tree_ssa_addr_mode_tree (rhs1, gsi, speed) || changed;
>>
>
> No need for all the code below - there are never multiple memory
> accesses (you are interested in) in a single statement.
>
>
OK. I thought as much, but wasn't certain whether it was a rule I could
rely on as future ops are added. I'll remove the extra logic.
>> + if (rhs_class == GIMPLE_BINARY_RHS || rhs_class == GIMPLE_TERNARY_RHS)
>> + {
>> + rhs2 = gimple_assign_rhs2_ptr (stmt);
>> + changed = tree_ssa_addr_mode_tree (rhs2, gsi, speed) || changed;
>> + }
>> +
>> + if (rhs_class == GIMPLE_TERNARY_RHS)
>> + {
>> + rhs3 = gimple_assign_rhs3_ptr (stmt);
>> + changed = tree_ssa_addr_mode_tree (rhs3, gsi, speed) || changed;
>> + }
>> +
>> + if (changed)
>> + update_stmt (stmt);
>> +}
>> +
>> +/* Process the statements in block BB. */
>> +
>> +static void
>> +tree_ssa_addr_mode_block (basic_block bb)
>> +{
>> + gimple_stmt_iterator gsi;
>> +
>> + bool speed = optimize_bb_for_speed_p (bb);
>> +
>> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> + {
>> + gimple stmt = gsi_stmt (gsi);
>> +
>> + /* #### Avoiding volatile memory refs like tree-ssa-loop-ivopts.c
>> + does, but need to revisit whether that's necessary. */
>>
>
> Do you want to iterate over only memory accesses or also over
> address computations? I suppose you are only interested in
> scalar memory accesses (memory accesses with some non-BLKmode).
> In that case you get all relevant stms with
>
> if (gimple_vuse (stmt)
> && gimple_assign_single_p (stmt))
>
>
OK, good idea. Any thoughts on volatile? It seems like lowering those
should be harmless, unless the volatile flag is somehow lost in the
lowering.
>> + if (is_gimple_assign (stmt) && !gimple_has_volatile_ops (stmt))
>> + {
>> + tree_ssa_addr_mode_stmt (stmt, &gsi, speed);
>> + }
>> + }
>> +}
>>
>
> I think you got the easy pieces done - now the hard part of
> creating target dependent addressing that actually helps is missing.
> And to not do too much un-CSE that will never be cleaned up afterwards.
>
> Richard.
>
>
/nod. Thanks very much! I appreciate the help and will work on
refining this.
>> +/* Main entry point for exposing target addressing modes not caught
>> + by tree-ssa-loop-ivopts.c. */
>> +
>> +static unsigned int
>> +tree_ssa_addr_mode (void)
>> +{
>> + basic_block bb;
>> + bool chicken_switch = false;
>> +
>> + /* #### Temporary debug aid; set to true to disable this phase in debugger. */
>> + if (chicken_switch)
>> + return 0;
>> +
>> + FOR_EACH_BB (bb)
>> + tree_ssa_addr_mode_block (bb);
>> +
>> + return 0;
>> +}
>> +
>> +struct gimple_opt_pass pass_tree_addr_mode =
>> +{
>> + {
>> + GIMPLE_PASS,
>> + "addrmode", /* name */
>> + NULL, /* gate */
>> + tree_ssa_addr_mode, /* execute */
>> + NULL, /* sub */
>> + NULL, /* next */
>> + 0, /* static_pass_number */
>> + TV_TREE_ADDR_MODE, /* tv_id */
>> + PROP_cfg, /* properties_required */
>> + 0, /* properties_provided */
>> + 0, /* properties_destroyed */
>> + 0, /* todo_flags_start */
>> + /* #### Probably can remove TODO_rebuild_alias. */
>> + TODO_dump_func | TODO_update_ssa |
>> + TODO_rebuild_alias /* todo_flags_finish */
>> + }
>> +};
>> Index: gcc/tree-flow.h
>> ===================================================================
>> --- gcc/tree-flow.h (revision 169834)
>> +++ gcc/tree-flow.h (working copy)
>> @@ -808,6 +808,7 @@
>> addr_space_t);
>> unsigned multiply_by_cost (HOST_WIDE_INT, enum machine_mode, bool);
>> bool may_be_nonaddressable_p (tree expr);
>> +bool may_be_unaligned_p (tree ref, tree step);
>>
>> /* In tree-ssa-threadupdate.c. */
>> extern bool thread_through_all_blocks (bool);
>> Index: gcc/Makefile.in
>> ===================================================================
>> --- gcc/Makefile.in (revision 169834)
>> +++ gcc/Makefile.in (working copy)
>> @@ -1389,6 +1389,7 @@
>> tree-sra.o \
>> tree-switch-conversion.o \
>> tree-ssa-address.o \
>> + tree-ssa-addr-mode.o \
>> tree-ssa-alias.o \
>> tree-ssa-ccp.o \
>> tree-ssa-coalesce.o \
>> @@ -2556,6 +2557,9 @@
>> $(TREE_PASS_H) $(FLAGS_H) $(TREE_INLINE_H) $(RECOG_H) insn-config.h \
>> $(EXPR_H) gt-tree-ssa-address.h $(GGC_H) tree-affine.h $(TARGET_H) \
>> tree-pretty-print.h
>> +tree-ssa-addr-mode.o : tree-ssa-addr-mode.c $(CONFIG_H) $(SYSTEM_H) \
>> + coretypes.h $(TREE_H) $(BASIC_BLOCK_H) $(TREE_DUMP_H) $(TREE_PASS_H) \
>> + $(TIMEVAR_H) $(CFGLOOP_H) $(GIMPLE_H) $(TREE_FLOW_H) $(TREE_AFFINE_H)
>> tree-ssa-loop-niter.o : tree-ssa-loop-niter.c $(TREE_FLOW_H) $(CONFIG_H) \
>> $(SYSTEM_H) $(TREE_H) $(TM_P_H) $(CFGLOOP_H) $(PARAMS_H) \
>> $(TREE_INLINE_H) output.h $(DIAGNOSTIC_H) $(TM_H) coretypes.h $(TREE_DUMP_H) \
>> Index: gcc/passes.c
>> ===================================================================
>> --- gcc/passes.c (revision 169834)
>> +++ gcc/passes.c (working copy)
>> @@ -944,6 +944,10 @@
>> NEXT_PASS (pass_rename_ssa_copies);
>> NEXT_PASS (pass_uncprop);
>> NEXT_PASS (pass_local_pure_const);
>> + /* Note: pass_tree_addr_mode should be as close to pass_expand as
>> + possible. In particular, it must run after DCE, since DCE has
>> + trouble seeing through some of the generated mem_refs. */
>> + NEXT_PASS (pass_tree_addr_mode);
>> }
>> NEXT_PASS (pass_lower_complex_O0);
>> NEXT_PASS (pass_cleanup_eh);
>>
>>
>>
>>
>
>