[PATCH, RFC] Expose target addressing modes before expand

William J. Schmidt <wschmidt@linux.vnet.ibm.com>
Mon Feb 7 17:20:00 GMT 2011


Hi Richi,

Thanks very much for the quick and thorough response!  Comments inline 
below.

Richard Guenther wrote:
> On Fri, 4 Feb 2011, William J. Schmidt wrote:
>
>   
>> Problem report 46556 (http://gcc.gnu.org/PR46556) identifies a code-size
>> regression for powerpc targets.  The root of the problem is changes in
>> address canonicalization a couple of years back.  Discussion in the PR
>> suggests some form of exposure of target addressing modes prior to the
>> expand phase.  Steven Bosscher (CCed) suggested this be done in a late
>> GIMPLE pass, using logic similar to what IVopts does for loops.
>>
>> Following is an early prototype of such a pass.  I've verified that it
>> restores the regressed code to its earlier, more compact form, and most
>> of the addressing sequences on powerpc appear to be better or the same
>> as before.  There are at least some complex expressions where early
>> exposure produces worse code in the form of extra multiplies; I have
>> some ideas for how to tune that down so the shift-add machinery for
>> multiplies isn't derailed.  There are still 16 test cases that regress
>> on rev 169834 for powerpc, and I haven't examined other targets yet.
>> Clearly this is not yet ready, but before investing time into further
>> refinement, I'd like to get commentary on the general approach, as well
>> as any specific concerns.  Please bear in mind that I am new to GCC, so
>> I'm still familiarizing myself with the infrastructure and am likely to
>> miss some things that are obvious to the initiated...
>>
>> The general approach is to rely on existing machinery used by
>> tree-ssa-loop-ivopts.c, simplifying out the parts that have to do with
>> induction variable optimizations.  The new logic is pretty simple, but
>> the interactions with later phases may not be.  For now, I've placed the
>> new pass in passes.c as the last -O1 tree-ssa pass.
>>
>> In addition to general comments, I'd be interested in answers to some
>> specific questions:
>>
>> 1) Are there concerns with early exposure of target addressing nodes on
>> other targets?  Should the pass be gated to only operate on a subset of
>> targets?
>> 2) Is the pass placement appropriate?
>> 3) Have I missed any necessary fix-ups after modifying the GIMPLE code?
>>
>> Thanks very much for your insights!
>>     
>
> Comments inline.
>
>   
>> Bill
>>
>>
>> [gcc]
>> 2011-02-04  William Schmidt  <wschmidt@linux.vnet.ibm.com>
>>
>>         * tree.h (copy_ref_info): New prototype of existing function.
>>         * tree-pass.h (pass_tree_addr_mode): New declaration.
>>         * tree-ssa-loop-ivopts.c (may_be_unaligned_p, copy_ref_info):
>>         Remove static keyword.
>>         * timevar.def (TV_TREE_ADDR_MODE): New timevar definition.
>>         * tree-ssa-address.c (addr_to_parts): Change to handle null
>>         iv_cand parameter.
>>         * tree-ssa-addr-mode.c:  New.
>>         * tree-flow.h (may_be_unaligned_p): New prototype of existing
>>         function.
>>         * Makefile.in: Add build step for tree-ssa-addr-mode.o.
>>         * passes.c: Add pass for pass_tree_addr_mode.
>>
>> Index: gcc/tree.h
>> ===================================================================
>> --- gcc/tree.h  (revision 169834)
>> +++ gcc/tree.h  (working copy)
>> @@ -5580,6 +5580,9 @@
>>  extern tree tree_mem_ref_addr (tree, tree);
>>  extern void copy_mem_ref_info (tree, tree);
>>  
>> +/* In tree-ssa-loop-ivopts.c */
>> +extern void copy_ref_info (tree, tree);
>> +
>>  /* In tree-vrp.c */
>>  extern bool ssa_name_nonnegative_p (const_tree);
>>  
>> Index: gcc/tree-pass.h
>> ===================================================================
>> --- gcc/tree-pass.h     (revision 169834)
>> +++ gcc/tree-pass.h     (working copy)
>> @@ -385,6 +385,7 @@
>>  extern struct gimple_opt_pass pass_parallelize_loops;
>>  extern struct gimple_opt_pass pass_loop_prefetch;
>>  extern struct gimple_opt_pass pass_iv_optimize;
>> +extern struct gimple_opt_pass pass_tree_addr_mode;
>>  extern struct gimple_opt_pass pass_tree_loop_done;
>>  extern struct gimple_opt_pass pass_ch;
>>  extern struct gimple_opt_pass pass_ccp;
>> Index: gcc/tree-ssa-loop-ivopts.c
>> ===================================================================
>> --- gcc/tree-ssa-loop-ivopts.c  (revision 169834)
>> +++ gcc/tree-ssa-loop-ivopts.c  (working copy)
>> @@ -1604,7 +1604,7 @@
>>  
>>  /* Returns true if memory reference REF with step STEP may be unaligned.  */
>>  
>> -static bool
>> +bool
>>  may_be_unaligned_p (tree ref, tree step)
>>  {
>>    tree base;
>> @@ -5916,7 +5916,7 @@
>>  
>>  /* Copies the reference information from OLD_REF to NEW_REF.  */
>>  
>> -static void
>> +void
>>  copy_ref_info (tree new_ref, tree old_ref)
>>  {
>>    tree new_ptr_base = NULL_TREE;
>> Index: gcc/timevar.def
>> ===================================================================
>> --- gcc/timevar.def     (revision 169834)
>> +++ gcc/timevar.def     (working copy)
>> @@ -236,6 +236,7 @@
>>  DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
>>  DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
>>  DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
>> +DEFTIMEVAR (TV_TREE_ADDR_MODE        , "expose addressing modes")
>>     
>
> I think we should name this pass lower-address or similar, as it is
> really doing target-specific lowering of memory accesses.
>
>   
Agreed.
>>  /* Everything else in rest_of_compilation not included above.  */
>>  DEFTIMEVAR (TV_EARLY_LOCAL          , "early local passes")
>> Index: gcc/tree-ssa-address.c
>> ===================================================================
>> --- gcc/tree-ssa-address.c      (revision 169834)
>> +++ gcc/tree-ssa-address.c      (working copy)
>> @@ -631,7 +631,7 @@
>>       is <= 2 -- in that case, no loop invariant code motion can be
>>       exposed.  */
>>  
>> -  if (!base_hint && (addr->n > 2))
>> +  if (!base_hint && (addr->n > 2) && iv_cand)
>>      move_variant_to_index (parts, addr, iv_cand);
>>  
>>    /* First move the most expensive feasible multiplication
>> Index: gcc/tree-ssa-addr-mode.c
>> ===================================================================
>> --- gcc/tree-ssa-addr-mode.c    (revision 0)
>> +++ gcc/tree-ssa-addr-mode.c    (revision 0)
>> @@ -0,0 +1,251 @@
>> +/* Expose target addressing modes in gimple representation.
>> +   Copyright (C) 2011
>> +   Free Software Foundation, Inc.
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify it
>> +under the terms of the GNU General Public License as published by the
>> +Free Software Foundation; either version 3, or (at your option) any
>> +later version.
>> +
>> +GCC is distributed in the hope that it will be useful, but WITHOUT
>> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>> +for more details.
>> +
>> +You should have received a copy of the GNU General Public License
>> +along with GCC; see the file COPYING3.  If not see
>> +<http://www.gnu.org/licenses/>.  */
>> +
>> +/* This pass performs a scan over the gimple representation to expose
>> +   target machine addressing in reference-class expressions.  This is
>> +   necessary for some targets for which the current address
>> +   canonicalization is suboptimal.  Induction variable optimizations
>> +   already expose target addressing for many (but not all) expressions
>> +   in loops, so this has most benefit in non-loop code.  Algorithms
>> +   used here are simplifications of those used in the induction
>> +   variable optimizations. */
>> +
>> +#include "config.h"
>> +#include "system.h"
>> +#include "coretypes.h"
>> +#include "tree.h"
>> +#include "basic-block.h"
>> +#include "tree-dump.h"
>> +#include "tree-pass.h"
>> +#include "timevar.h"
>> +#include "cfgloop.h"
>> +#include "gimple.h"
>> +#include "tree-flow.h"
>> +#include "tree-affine.h"
>> +
>> +extern void copy_ref_info (tree, tree);
>> +
>> +static bool
>> +tree_contains_mem_ref_p (tree base)
>> +{
>> +  unsigned int i;
>> +  enum tree_code code;
>> +
>> +  if (!base)
>> +    return false;
>> +
>> +  code = TREE_CODE (base);
>> +  if (code == TARGET_MEM_REF || code == MEM_REF)
>> +    return true;
>> +
>> +  for (i = 0; i < TREE_CODE_LENGTH (code); i++)
>> +    if (tree_contains_mem_ref_p (TREE_OPERAND (base, i)))
>> +      return true;
>> +
>> +  return false;
>> +}
>>     
>
> I think you don't need this function, but see below.
>
>   
This was added when I encountered a number of ICEs downstream.  I wasn't 
sure at the time whether to treat those as real bugs or whether what I 
was doing was breaking a contract with a downstream pass.  I will remove 
this and investigate those regressions more carefully.
>> +/* Expose target addressing modes in EXPR.  Logic extracted and
>> +   simplified from tree-ssa-loop-ivopts.c to handle non-ivar
>> +   opportunities. */
>> +
>> +static bool
>> +tree_ssa_addr_mode_tree (tree *expr, gimple_stmt_iterator *gsi, bool speed)
>> +{
>> +  tree base = *expr, mem_ref;
>> +  struct pointer_map_t *cache;
>> +  aff_tree aff;
>> +
>> +  if (REFERENCE_CLASS_P (*expr))
>>     
>
> I personally prefer indentation like
>
>      if (!REFERENCE_CLASS_P (*expr))
>        return false;
>
>   
So do I.  Will clean this up.
>> +    {
>> +      /* If the expression is a virtual method table reference, don't attempt
>> +        this.  Incorrect code generation occurs; an example test case is
>> +        testsuite/g++.dg/opt/pr47366.C. */
>> +      if (TREE_CODE (base) == COMPONENT_REF
>> +         && DECL_VIRTUAL_P (TREE_OPERAND (base, 1)))
>> +       return false;
>>     
>
> ;)  It probably confuses Martin's devirtualization code.
>
>   
>> +      base = unshare_expr (base);
>>     
>
> Ick you don't need this I think.
>
>   
Agreed.  Left over from my initial theft of code from IVopts; should be 
more conservative.
>> +      /*
>> +      if (TREE_CODE (base) == TARGET_MEM_REF)
>> +       {
>> +         tree type = build_pointer_type (TREE_TYPE (base));
>> +         base = tree_mem_ref_addr (type, base);
>> +       }
>> +      */
>> +      /* If the addressing mode is already exposed, let it be.
>> +         #### I am not certain this doesn't miss out on some 
>> +         opportunities, so just in case I'm leaving the above
>> +         code commented out in place for now.  But I think all this
>> +         does is make things more explicit that CSE can catch
>> +         anyhow, and in some cases it produces expressions that
>> +         the may-aliasing infrastructure can't handle. */
>> +      if (tree_contains_mem_ref_p (base))
>>     
>
> TREE_CODE (base) == TARGET_MEM_REF || TREE_CODE (base) == MEM_REF
>
> sub-operands can never be complex.  I _think_ you want to not
> bail out for MEM_REFs, though.
>
>   
Per above, will investigate the regressions that resulted.
>> +       return false;
>> +      else
>>     
>
> no need for else {
>
>   
>> +       {
>> +         /* Check that the base expression is addressable. */
>> +         if (may_be_nonaddressable_p (base))
>> +           return false;
>>     
>
> Correct (the access can include a V_C_E or a bitfield-ref, in which
> case we cannot compute the address of the base of the outermost
> access -- the variable name "base" is probably misleading here).
>
> What we want to do during memory access lowering is decompose the
> access into a part that we can always take the address of,
> like for a bitfield access a.b.c into a.b and .c, eventually
> re-write a.b and re-attach .c.  I have plans to re-surrect a lowering
> pass I developed some years ago for 4.7 (it's on the mem-ref branch).
>
>   
>> +         /* On strict alignment platforms, check that the base expression
>> +            is sufficiently aligned.  There is no additional "step"
>> +             information required, so specify single-byte alignment
>> +             for that parameter.  */
>> +         if (STRICT_ALIGNMENT && may_be_unaligned_p (base, size_one_node))
>> +           return false;
>> +
>> +         base = build_fold_addr_expr (base);
>>     
>
> This is probably where you require unshare_expr ...
>
>   
>> +       }
>> +
>> +      /* Express the address of the original expression as an affine
>> +         combination of trees, looking through SSA defs. */
>> +      cache = pointer_map_create ();
>> +      tree_to_aff_combination_expand (base, TREE_TYPE (base), &aff, &cache);
>> +      pointer_map_destroy (cache);
>>     
>
> ... but here the result shouldn't require it anymore.  It's better
> to make a wrapper/worker of tree_to_aff_combination_expand that
> can handle non-addresses.  Probably generally useful (did you run
> into actual problems w/o unshare_expr?).
>
>   
I will do this and place it in tree-affine.[ch].  I did not run into 
actual problems without unshare_expr, and agree it is likely unnecessary.
>> +
>> +      /* It is possible for aff to have no elements, in which case do
>> +        nothing.  Example: lhs = MEM[(struct foo *) 0B].bar does not
>> +         produce an affine combination. */
>> +      if (!aff.n)
>> +       return false;
>> +
>> +      /* Replace the input expression with an equivalent memory reference
>> +        legal on the target machine. */
>> +      mem_ref = create_mem_ref (gsi, TREE_TYPE (*expr), &aff, 
>> +                                reference_alias_ptr_type (*expr),
>> +                               NULL_TREE, NULL_TREE, speed);
>>     
>
> Doing the above will un-CSE a lot of dependent computations.  I think
> you want to limit the SSA walking of tree_to_aff_combination_expand
> or invent some mechanism to re-use existing values from the IL.
>
> That said - the above needs lots of work ;)  Like it misses a
> cost function - we should try to re-use base or index registers
> across memory accesses, like IVOPTs does in loops.  It probably
> is enough to collect candidates on a per-BB basis for this.
>
>   
I completely agree.  As noted, I saw some poor results involving extra 
multiplies, and realized I have some work cut out here.  I was initially 
trying not to reinvent the wheel where possible, but the existing 
machinery will need more refinement.  I will consult with you as I get 
deeper into that.
>> +      /* Ensure the memory reference was not built with a base expression
>> +        of zero, which confuses the may-aliasing in subsequent dead
>> +        store elimination.  Assume this isn't a useful replacement when
>> +        this occurs. */
>> +      if (integer_zerop (TREE_OPERAND (mem_ref, 0)))
>> +       return false;
>>     
>
> Huh, that would be a bug in create_mem_ref I suppose.  Or in alias 
> analysis.
>
>   
I saw this several times, IIRC usually when processing existing 
mem_refs.  I threw this catch-all in so I could make progress.  I will 
dig into this further and let you know what I'm seeing.
>> +      /* Finish the replacement. */
>> +      copy_ref_info (mem_ref, *expr);
>> +      *expr = mem_ref;
>> +      return true;
>> +    }
>> +  return false;
>> +}
>> +
>> +/* Expose target addressing modes in STMT, which must be an assignment. */
>> +
>> +static void
>> +tree_ssa_addr_mode_stmt (gimple stmt, gimple_stmt_iterator *gsi, bool speed)
>> +{
>> +  tree *lhs, *rhs1, *rhs2, *rhs3;
>> +  enum tree_code subcode = gimple_assign_rhs_code (stmt);
>> +  enum gimple_rhs_class rhs_class = get_gimple_rhs_class (subcode);
>> +  bool changed;
>> +
>> +  /* Bitfields are being ignored for now, as they are in induction
>> +     variable optimization. */
>> +  if (subcode == BIT_FIELD_REF)
>> +    return;
>>     
>
> You can have BIT_FIELD_REFs on the lhs as well.
>
> Note that COMPONENT_REFs with a DECL_BIT_FIELD FIELD_DECL are also
> bit-field accesses.
>
> It's probably easier to move the checks into tree_ssa_addr_mode_tree
> and operate on TREE_CODE of the reference.
>
>   
OK, thanks!
>> +  lhs = gimple_assign_lhs_ptr (stmt);
>> +  changed = tree_ssa_addr_mode_tree (lhs, gsi, speed);
>> +
>> +  rhs1 = gimple_assign_rhs1_ptr (stmt);
>> +  changed = tree_ssa_addr_mode_tree (rhs1, gsi, speed) || changed;
>>     
>
> No need for all the code below - there are never multiple memory
> accesses (you are interested in) in a single statement.
>
>   
OK.  I thought as much, but wasn't certain whether it was a rule I could 
rely on as future ops are added.  I'll remove the extra logic.
>> +  if (rhs_class == GIMPLE_BINARY_RHS || rhs_class == GIMPLE_TERNARY_RHS)
>> +    {
>> +      rhs2 = gimple_assign_rhs2_ptr (stmt);
>> +      changed = tree_ssa_addr_mode_tree (rhs2, gsi, speed) || changed;
>> +    }
>> +
>> +  if (rhs_class == GIMPLE_TERNARY_RHS)
>> +    {
>> +      rhs3 = gimple_assign_rhs3_ptr (stmt);
>> +      changed = tree_ssa_addr_mode_tree (rhs3, gsi, speed) || changed;
>> +    }
>> +
>> +  if (changed)
>> +    update_stmt (stmt);
>> +}
>> +
>> +/* Process the statements in block BB. */
>> +
>> +static void
>> +tree_ssa_addr_mode_block (basic_block bb)
>> +{
>> +  gimple_stmt_iterator gsi;
>> +
>> +  bool speed = optimize_bb_for_speed_p (bb);
>> +
>> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> +    {
>> +      gimple stmt = gsi_stmt (gsi);
>> +
>> +      /* #### Avoiding volatile memory refs like tree-ssa-loop-ivopts.c
>> +         does, but need to revisit whether that's necessary. */
>>     
>
> Do you want to iterate over only memory accesses or also over
> address computations?  I suppose you are only interested in
> scalar memory accesses (memory accesses with some non-BLKmode).
> In that case you get all relevant stms with
>
>          if (gimple_vuse (stmt)
>              && gimple_assign_single_p (stmt))
>
>   
OK, good idea.  Any thoughts on volatile?  It seems like lowering volatile 
references should be harmless, unless the volatile flag is somehow lost by 
the lowering.
>> +      if (is_gimple_assign (stmt) && !gimple_has_volatile_ops (stmt))
>> +       {
>> +         tree_ssa_addr_mode_stmt (stmt, &gsi, speed);
>> +       }
>> +    }
>> +}
>>     
>
> I think you got the easy pieces done - now the hard part of
> creating target dependent addressing that actually helps is missing.
> And to not do too much un-CSE that will never be cleaned up afterwards.
>
> Richard.
>
>   
/nod.  Thanks very much!  I appreciate the help and will work on 
refining this.
>> +/* Main entry point for exposing target addressing modes not caught
>> +   by tree-ssa-loop-ivopts.c. */
>> +
>> +static unsigned int
>> +tree_ssa_addr_mode (void)
>> +{
>> +  basic_block bb;
>> +  bool chicken_switch = false;
>> +
>> +  /* #### Temporary debug aid; set to true to disable this phase in debugger. */
>> +  if (chicken_switch)
>> +    return 0;
>> +
>> +  FOR_EACH_BB (bb)
>> +    tree_ssa_addr_mode_block (bb);
>> +
>> +  return 0;
>> +}
>> +
>> +struct gimple_opt_pass pass_tree_addr_mode =
>> +{
>> + {
>> +  GIMPLE_PASS,
>> +  "addrmode",                          /* name */
>> +  NULL,                                        /* gate */
>> +  tree_ssa_addr_mode,                  /* execute */
>> +  NULL,                                        /* sub */
>> +  NULL,                                        /* next */
>> +  0,                                   /* static_pass_number */
>> +  TV_TREE_ADDR_MODE,                   /* tv_id */
>> +  PROP_cfg,                            /* properties_required */
>> +  0,                                   /* properties_provided */
>> +  0,                                   /* properties_destroyed */
>> +  0,                                   /* todo_flags_start */
>> +  /* #### Probably can remove TODO_rebuild_alias. */
>> +  TODO_dump_func | TODO_update_ssa |
>> +    TODO_rebuild_alias                         /* todo_flags_finish */
>> + }
>> +};
>> Index: gcc/tree-flow.h
>> ===================================================================
>> --- gcc/tree-flow.h     (revision 169834)
>> +++ gcc/tree-flow.h     (working copy)
>> @@ -808,6 +808,7 @@
>>                                       addr_space_t);
>>  unsigned multiply_by_cost (HOST_WIDE_INT, enum machine_mode, bool);
>>  bool may_be_nonaddressable_p (tree expr);
>> +bool may_be_unaligned_p (tree ref, tree step);
>>  
>>  /* In tree-ssa-threadupdate.c.  */
>>  extern bool thread_through_all_blocks (bool);
>> Index: gcc/Makefile.in
>> ===================================================================
>> --- gcc/Makefile.in     (revision 169834)
>> +++ gcc/Makefile.in     (working copy)
>> @@ -1389,6 +1389,7 @@
>>         tree-sra.o \
>>         tree-switch-conversion.o \
>>         tree-ssa-address.o \
>> +       tree-ssa-addr-mode.o \
>>         tree-ssa-alias.o \
>>         tree-ssa-ccp.o \
>>         tree-ssa-coalesce.o \
>> @@ -2556,6 +2557,9 @@
>>     $(TREE_PASS_H) $(FLAGS_H) $(TREE_INLINE_H) $(RECOG_H) insn-config.h \
>>     $(EXPR_H) gt-tree-ssa-address.h $(GGC_H) tree-affine.h $(TARGET_H) \
>>     tree-pretty-print.h
>> +tree-ssa-addr-mode.o : tree-ssa-addr-mode.c $(CONFIG_H) $(SYSTEM_H) \
>> +   coretypes.h $(TREE_H) $(BASIC_BLOCK_H) $(TREE_DUMP_H) $(TREE_PASS_H) \
>> +   $(TIMEVAR_H) $(CFGLOOP_H) $(GIMPLE_H) $(TREE_FLOW_H) $(TREE_AFFINE_H)
>>  tree-ssa-loop-niter.o : tree-ssa-loop-niter.c $(TREE_FLOW_H) $(CONFIG_H) \
>>     $(SYSTEM_H) $(TREE_H) $(TM_P_H) $(CFGLOOP_H) $(PARAMS_H) \
>>     $(TREE_INLINE_H) output.h $(DIAGNOSTIC_H) $(TM_H) coretypes.h $(TREE_DUMP_H) \
>> Index: gcc/passes.c
>> ===================================================================
>> --- gcc/passes.c        (revision 169834)
>> +++ gcc/passes.c        (working copy)
>> @@ -944,6 +944,10 @@
>>        NEXT_PASS (pass_rename_ssa_copies);
>>        NEXT_PASS (pass_uncprop);
>>        NEXT_PASS (pass_local_pure_const);
>> +      /* Note: pass_tree_addr_mode should be as close to pass_expand as
>> +        possible.  Particularly must be after DCE since DCE has
>> +         trouble seeing through some of the generated mem_refs. */
>> +      NEXT_PASS (pass_tree_addr_mode);
>>      }
>>    NEXT_PASS (pass_lower_complex_O0);
>>    NEXT_PASS (pass_cleanup_eh);
>>
>>
>>
>>     
>
>   


