This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: [PATCH: PR/40314] extract common address for memory access to fields of large struct


On Mon, Jun 1, 2009 at 4:13 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Sun, May 31, 2009 at 11:12 AM, Carrot Wei <carrot@google.com> wrote:
>> ChangeLog:
>> 2009-05-31  Carrot Wei  <carrot@google.com>
>>
>>        PR/40314
>>        * config/arm/arm.c: add functions arm_address_offset_range,
>>        thumb1_address_offset_range, thumb2_address_offset_range,
>>        arm_address_offset_range_1.
>>        * config/arm/arm-protos.h: add prototype of arm_address_offset_range.
>>        * doc/tm.texi: add entry for TARGET_ADDRESS_OFFSET_RANGE.
>>        * Makefile.in: add rule for extract_common_addr.
>>        * target.h: add a hook function get_address_offset_range.
>>        * target-def.h: add a hook function TARGET_ADDRESS_OFFSET_RANGE.
>>        * tree-pass.h: add a pass_extract_common_addr.
>>        * passes.c: add a pass_extract_common_addr.
>>        * extract-common-addr.c: implementation of the new
>>        pass_extract_common_addr.
>>
>> Test:
>> This patch was applied to trunk GCC and tested on the arm emulator with newlib.
>> No new failures.
>
> And what does it do for code size?  Compile time?  Do you know CSiBE?
> It's a nice benchmark to check those two things, too.
>
I've tested CSiBE; there are nearly no changes to code size or compile
time. It seems there is no large structure in CSiBE that triggers this
optimization. For mcf from SPEC CPU2006, this optimization reduces the
static instruction count by about 7%.

>
> And then some random comments:
>
> First, it would be nice to have some general description at the top of
> the file of how the algorithm works.
>
> It looks to me like you're collecting base+offset per basic block (so
> the optimization is local) and then process the recorded information
> globally.  Why not record-and-process one basic block at a time?
>
Recording and processing one basic block at a time may lose optimization
opportunities. In the following example the two accesses to structure
fields are in different basic blocks, so they won't be optimized if we
limit the scope to a single basic block, but we can find them if we
process the collected information globally.

if (p->field1)
  return p->field2;
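
For concreteness, here is a minimal C sketch (the struct layout and the
4096-byte pad are hypothetical, chosen only to push the fields past a
small immediate offset range) where the two accesses fall into different
basic blocks:

  typedef struct
  {
    char pad[4096];   /* push field1/field2 past the reachable offset range */
    int  field1;
    int  field2;
  } big_t;

  int
  get_field (big_t *p)
  {
    if (p->field1)        /* first basic block */
      return p->field2;   /* second basic block */
    return 0;
  }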

> Perhaps you can add some comments in process_one_base in the part with
> the fuzzy math?  ;-)  I got all confused by how you calculate the new
> base+offset and alignment.
>
done
>
>> +/* Given a register rtx, look up the offset_table. If we found it and they
>> +   have same machine mode returns the pointer to the offset information.
>> +   Otherwise returns NULL.  */
>> +
>> +static inline struct offset_info *
>> +find_reg (rtx x)
>
> A more descriptive function name, perhaps?  Everyone else also calls
> their functions find_reg already... :-)
>
done
>
>> +/* If x is MEM, and the address is in the form of
>> +   (plus base_reg offset_reg)
>> +   remove the corresponding item from offset table and
>> +   insert it into a base_reg_info object. */
>
> Describe the return value of the function too, please.  Just saying
> it's called via for_each_rtx is enough.
> And again, a more descriptive function name, perhaps?  Explain what is
> passed in DATA.
>
done
>
>> +static int
>> +find_mem (rtx x, void *data)
>> +{
> (...)
>> +  /* If the offset is not large enough, ignore it.  */
>> +  targetm.get_address_offset_range (GET_MODE (x), reg1,
>> +                                    &min_offset, &max_offset, &alignment);
>> +  if (offset_item->offset >= min_offset
>> +      && offset_item->offset <= max_offset)
>> +    return -1;
>> +
>> +  /* If the base register and offset register don't have the same
>> +     machine mode, ignore this offset.  */
>> +  if (GET_MODE (reg1) != offset_item->mode)
>> +    return -1;
>
> Move this second check before the targetm.get_address_offset_range check?
>
done
>
>> +  /* Look up the corresponding base register information.
>> +     If there is no one, create it.  */
>> +  base_reg_item = &base_table[REGNO (reg1)];
>> +  if (base_reg_item->offsets == NULL)
>> +    {
>> +      base_reg_item->mode = GET_MODE (reg1);
>> +      base_reg_item->base_regno = REGNO (reg1);
>> +      base_reg_item->base_reg = reg1;
>> +    }
>> +
>> +  /* Associate this offset information with the base information.  */
>> +  offset_item->mem = x;
>> +  offset_item->mem_insn = (rtx) data;
>> +  offset_item->next = base_reg_item->offsets;
>> +  base_reg_item->offsets = offset_item;
>
> Why collect them in a list here when you later need the offsets in an array?
>
>
>> +/* If a reg is clobbered in insn, remove it from offset table.  */
>> +
>> +static void
>> +clobber_insn_reg (rtx insn)
>> +{
>
> Can write this and clobber_reg by walking over DF_INSN_DEFS looking
> for refs with DF_REF_MUST_CLOBBER, and with DF_REF_REAL_REG.
>
done
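
The updated clobber handling walks the insn's DF defs; roughly (the exact
form is in the revised patch below):

  df_ref *def_rec;

  /* Any register defined by this insn can no longer be trusted as a
     recorded offset, so drop it from the offset table.  */
  for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
    {
      unsigned int regno = REGNO (DF_REF_REAL_REG (*def_rec));
      if (offset_table[regno] != NULL)
        {
          pool_free (offset_element_pool, offset_table[regno]);
          offset_table[regno] = NULL;
        }
    }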
>
>> +/* If this insn is in the form of
>> +   (set reg offset)
>> +   insert this offset info into offset table.  */
>> +
>> +static void
>> +find_offset (rtx insn)
>> +{
>> +  rtx src, dest;
>> +  unsigned int regno;
>> +  HOST_WIDE_INT offset_val;
>> +  struct offset_info *offset;
>> +  rtx x = single_set (insn);
>> +  if (!x)
>> +    return;
>> +
>> +  dest = SET_DEST (x);
>> +  if (!REG_P (dest))
>> +    return;
>> +  regno = REGNO (dest);
>> +  if (regno < FIRST_PSEUDO_REGISTER)
>> +    return;
>
> if (HARD_REGISTER_NUM_P (regno))
>   return;
>
done

> Same with other FIRST_PSEUDO_REGISTER.  You should use
> HARD_REGISTER_NUM_P and HARD_REGISTER_P instead of comparing against
> FIRST_PSEUDO_REGISTER.
>
>
>> +  src = SET_SRC (x);
>> +  if (GET_CODE (src) != CONST_INT)
>> +    return;
>> +  offset_val = INTVAL (src);
>> +  if (offset_val <= 0)
>> +    return;
>> +
>> +  offset = (struct offset_info *) pool_alloc (offset_element_pool);
>> +  offset->mem = NULL;
>> +  offset->offset_regno = regno;
>> +  offset->mode = GET_MODE (dest);
>> +  offset->offset = offset_val;
>> +  offset->next = NULL;
>> +  offset_table[regno] = offset;
>
> Maybe pool_free the existing offset_table[regno] entry?  Or are there
> still base_reg_items that can point to that entry?  Or are you sure
> that offset_table[regno] is always NULL at this point?  If the latter,
> maybe gcc_assert that?
>
Function clobber_insn_reg clears offset_table[regno] first, so
offset_table[regno] should always be NULL at this point. Added a
gcc_assert.
>
>> +/* Look for optimization opportunities.  */
>> +
>> +static void
>> +collect_candidates (void)
>> +{
>> +  basic_block block;
>> +
>> +  FOR_EACH_BB (block)
>> +    {
>> +      rtx insn;
>> +      memset (offset_table, 0, sizeof (struct offset_info *) * max_reg_num ());
>
> You may want to experiment with something to track the registers for
> which you have recorded something in the offset_table.  I think the
> number of offsets you find is generally very small compared to
> max_reg_num().  It may speed things up a bit if you keep a bitmap of
> registers that you recorded something for and only clean those
> entries, instead of using memset.
>
Added a bitmap to track the non-empty entries and speed up resetting.
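
Roughly, the reset loop becomes something like the following sketch (the
exact form is in the updated patch below):

  /* Clear only the offset_table entries we actually used, instead of
     memset'ing the whole table.  */
  bitmap_iterator bi;
  unsigned int regno;

  EXECUTE_IF_SET_IN_BITMAP (offset_bitmap, 0, regno, bi)
    {
      pool_free (offset_element_pool, offset_table[regno]);
      offset_table[regno] = NULL;
    }
  bitmap_clear (offset_bitmap);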
>
>> +}
>> +
>> +/* Optimize a list of memory addressing relative to one base.  */
>> +static void
>> +process_one_base (struct base_reg_info *base_reg)
>> +{
>
> Describe the function arguments please, always.
>
done

>> +  HOST_WIDE_INT max_offset, min_offset;
>> +  HOST_WIDE_INT base_offset, max_base;
>> +  int alignment, new_alignment;
>> +
>> +  struct offset_info **offset_array;
>> +  struct offset_info *offset = base_reg->offsets;
>> +  int count = 0;
>> +  int i, group_count;
>> +
>> +  /* Count the number of memory accesses using the same base register. */
>> +  while (offset)
>> +    {
>> +      count++;
>> +      offset = offset->next;
>> +    }
>> +  if (count < 2)
>> +    return;
>> +
>> +  /* Sort the memory accesses according to their offset.  */
>> +  offset_array = XNEWVEC (struct offset_info *, count);
>> +  offset = base_reg->offsets;
>> +  for (i = 0; i < count; i++)
>> +    {
>> +      offset_array[i] = offset;
>> +      offset = offset->next;
>> +    }
>> +  qsort (offset_array, count, sizeof(struct offset_info *), compare_offset);
>
> So you have recorded the offsets in the list base_reg->offsets.  But
> you want to know the number of offsets of a base reg and you want to
> sort the list.  So you walk the list to count and then stuff the list
> into an array to sort it.
>
> Why not record the offsets into an array straight away, then?  Looks
> like you could better use a VEC instead, e.g.:
>
> typedef struct offset_info *offset_info_t;
> DEF_VEC_P(offset_info_t);
> DEF_VEC_ALLOC_P(offset_info_t, heap);
>
> /* Memory base address information.  */
> struct base_reg_info
> {
>   /* All offset information used with this base register.  */
>   VEC(offset_info_t, heap) *offsets;
> (...)
> };
>
> and use VEC_safe_push to record the offset_info entries.  You can use
> VEC_length to get the number of memory accesses using the same base
> register.  You can qsort a VEC in place, grep for "qsort (VEC_address"
> for examples.
>
This is a really good method. Thank you.
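
With the VEC in place, the count and in-place sort reduce to roughly the
following sketch (using the VEC_length/VEC_address idiom Steven mentions):

  /* Number of memory accesses recorded for this base register.  */
  int count = VEC_length (offset_info_t, base_reg->offsets);

  /* Sort the recorded accesses by offset, directly on the VEC's
     underlying array.  */
  qsort (VEC_address (offset_info_t, base_reg->offsets), count,
         sizeof (offset_info_t), compare_offset);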
>
>> +static unsigned int
>> +rest_of_handle_extract_common_addr (void)
>> +{
>> +  int i;
>> +  int reg_count = max_reg_num();
>> +  offset_table = XNEWVEC (struct offset_info *, reg_count);
>> +  base_table = XNEWVEC (struct base_reg_info, reg_count);
>> +  memset (offset_table, 0, sizeof (struct offset_info *) * reg_count);
>> +  memset (base_table, 0, sizeof (struct base_reg_info) * reg_count);
>
> XCNEWVEC instead of XNEWVEC+memset.
>
done
>
>> +struct rtl_opt_pass pass_extract_common_addr =
>> +{
>> + {
>> +  RTL_PASS,
>> +  "extract_common_addr",                /* name */
>> +  gate_handle_common_addr,              /* gate */
>> +  rest_of_handle_extract_common_addr,   /* execute */
>> +  NULL,                                 /* sub */
>> +  NULL,                                 /* next */
>> +  0,                                    /* static_pass_number */
>> +  TV_NONE,                              /* tv_id */
>
> New pass needs a new timevar.
>
>
> Ciao!
> Steven
>

Following is my modified patch.

thanks
Carrot


ChangeLog:
2009-05-31  Carrot Wei  <carrot@google.com>

        PR/40314
        * config/arm/arm.c: add functions arm_address_offset_range,
        thumb1_address_offset_range, thumb2_address_offset_range,
        arm_address_offset_range_1.
        * config/arm/arm-protos.h: add prototype of arm_address_offset_range.
        * doc/tm.texi: add entry for TARGET_ADDRESS_OFFSET_RANGE.
        * Makefile.in: add rule for extract_common_addr.
        * target.h: add a hook function get_address_offset_range.
        * target-def.h: add a hook function TARGET_ADDRESS_OFFSET_RANGE.
        * tree-pass.h: add a pass_extract_common_addr.
        * passes.c: add a pass_extract_common_addr.
        * extract-common-addr.c: implementation of the new
        pass_extract_common_addr.
        * timevar.def: add a new tv_id TV_EXTRACT_COMMON_ADDR.


Index: timevar.def
===================================================================
--- timevar.def (revision 148009)
+++ timevar.def (working copy)
@@ -150,6 +150,7 @@ DEFTIMEVAR (TV_VARCONST              , "
 DEFTIMEVAR (TV_LOWER_SUBREG         , "lower subreg")
 DEFTIMEVAR (TV_JUMP                  , "jump")
 DEFTIMEVAR (TV_FWPROP                , "forward prop")
+DEFTIMEVAR (TV_EXTRACT_COMMON_ADDR   , "extract common address")
 DEFTIMEVAR (TV_CSE                   , "CSE")
 DEFTIMEVAR (TV_DCE                   , "dead code elimination")
 DEFTIMEVAR (TV_DSE1                  , "dead store elim1")

Index: tree-pass.h
===================================================================
--- tree-pass.h (revision 148009)
+++ tree-pass.h (working copy)
@@ -433,6 +433,7 @@ extern struct rtl_opt_pass pass_rtl_fwpr
 extern struct rtl_opt_pass pass_rtl_fwprop_addr;
 extern struct rtl_opt_pass pass_jump2;
 extern struct rtl_opt_pass pass_lower_subreg;
+extern struct rtl_opt_pass pass_extract_common_addr;
 extern struct rtl_opt_pass pass_cse;
 extern struct rtl_opt_pass pass_fast_rtl_dce;
 extern struct rtl_opt_pass pass_ud_rtl_dce;

Index: target.h
===================================================================
--- target.h    (revision 148009)
+++ target.h    (working copy)
@@ -619,6 +619,10 @@ struct gcc_target
   /* Given an address RTX, say whether it is valid.  */
   bool (* legitimate_address_p) (enum machine_mode, rtx, bool);

+  /* Given a base reg, return the min and max address offsets.  */
+  void (* get_address_offset_range)(enum machine_mode, const_rtx,
+                                    HOST_WIDE_INT*, HOST_WIDE_INT*, int*);
+
   /* True if the given constant can be put into an object_block.  */
   bool (* use_blocks_for_constant_p) (enum machine_mode, const_rtx);

Index: target-def.h
===================================================================
--- target-def.h        (revision 148009)
+++ target-def.h        (working copy)
@@ -490,6 +490,7 @@
 #define TARGET_LEGITIMIZE_ADDRESS default_legitimize_address
 #define TARGET_DELEGITIMIZE_ADDRESS hook_rtx_rtx_identity
 #define TARGET_LEGITIMATE_ADDRESS_P default_legitimate_address_p
+#define TARGET_ADDRESS_OFFSET_RANGE NULL
 #define TARGET_USE_BLOCKS_FOR_CONSTANT_P hook_bool_mode_const_rtx_false
 #define TARGET_MIN_ANCHOR_OFFSET 0
 #define TARGET_MAX_ANCHOR_OFFSET 0
@@ -879,6 +880,7 @@
   TARGET_LEGITIMIZE_ADDRESS,                   \
   TARGET_DELEGITIMIZE_ADDRESS,                 \
   TARGET_LEGITIMATE_ADDRESS_P,                 \
+  TARGET_ADDRESS_OFFSET_RANGE,                  \
   TARGET_USE_BLOCKS_FOR_CONSTANT_P,            \
   TARGET_MIN_ANCHOR_OFFSET,                    \
   TARGET_MAX_ANCHOR_OFFSET,                    \

Index: passes.c
===================================================================
--- passes.c    (revision 148009)
+++ passes.c    (working copy)
@@ -725,6 +725,7 @@ init_optimization_passes (void)
       NEXT_PASS (pass_jump2);
       NEXT_PASS (pass_lower_subreg);
       NEXT_PASS (pass_df_initialize_opt);
+      NEXT_PASS (pass_extract_common_addr);
       NEXT_PASS (pass_cse);
       NEXT_PASS (pass_rtl_fwprop);
       NEXT_PASS (pass_rtl_cprop);

Index: Makefile.in
===================================================================
--- Makefile.in (revision 148009)
+++ Makefile.in (working copy)
@@ -1129,6 +1129,7 @@ OBJS-common = \
        explow.o \
        expmed.o \
        expr.o \
+       extract-common-addr.o \
        final.o \
        fixed-value.o \
        fold-const.o \
@@ -2704,6 +2705,9 @@ ipa-struct-reorg.o: ipa-struct-reorg.c i
    $(TREE_PASS_H) opts.h $(IPA_TYPE_ESCAPE_H) $(TREE_DUMP_H) \
    $(GIMPLE_H)

+extract-common-addr.o : extract-common-addr.c $(CONFIG_H) $(SYSTEM_H) \
+   coretypes.h $(BITMAP_H) $(TM_H) $(RTL_H) $(FLAGS_H) insn-config.h $(DF_H) \
+   $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) alloc-pool.h $(TARGET_H) $(TREE_PASS_H)
 coverage.o : coverage.c $(GCOV_IO_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    $(TM_H) $(RTL_H) $(TREE_H) $(FLAGS_H) output.h $(REGS_H) $(EXPR_H) \
    $(FUNCTION_H) $(TOPLEV_H) $(GGC_H) langhooks.h $(COVERAGE_H) gt-coverage.h \

Index: tm.texi
===================================================================
--- tm.texi     (revision 148009)
+++ tm.texi     (working copy)
@@ -5564,6 +5564,11 @@ holding the constant.  This restriction
 of TLS symbols for various targets.
 @end deftypefn

+@deftypefn {Target Hook} void TARGET_ADDRESS_OFFSET_RANGE (enum machine_mode @var{mode}, const_rtx @var{x}, HOST_WIDE_INT *@var{min_offset}, HOST_WIDE_INT *@var{max_offset}, int *@var{alignment})
+Given a memory access @var{mode} and base register @var{x}, this hook should
+return the offset range and alignment in a legal (base + offset) addressing.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_USE_BLOCKS_FOR_CONSTANT_P (enum machine_mode @var{mode}, rtx @var{x})
 This hook should return true if pool entries for constant @var{x} can
 be placed in an @code{object_block} structure.  @var{mode} is the mode

Index: arm-protos.h
===================================================================
--- arm-protos.h        (revision 148009)
+++ arm-protos.h        (working copy)
@@ -58,6 +58,8 @@ extern int arm_legitimate_address_outer_
 extern int thumb_legitimate_offset_p (enum machine_mode, HOST_WIDE_INT);
 extern rtx thumb_legitimize_reload_address (rtx *, enum machine_mode, int, int,
                                            int);
+extern void arm_address_offset_range (enum machine_mode, const_rtx,
+                                      HOST_WIDE_INT *, HOST_WIDE_INT *, int *);
 extern int arm_const_double_rtx (rtx);
 extern int neg_const_double_rtx_ok_for_fpa (rtx);
 extern int vfp3_const_double_rtx (rtx);

Index: arm.c
===================================================================
--- arm.c       (revision 148009)
+++ arm.c       (working copy)
@@ -407,6 +407,9 @@ static bool arm_allocate_stack_slots_for
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P    arm_legitimate_address_p

+#undef  TARGET_ADDRESS_OFFSET_RANGE
+#define TARGET_ADDRESS_OFFSET_RANGE arm_address_offset_range
+
 struct gcc_target targetm = TARGET_INITIALIZER;

 /* Obstack for minipool constant handling.  */
@@ -4476,6 +4479,180 @@ arm_legitimate_address_p (enum machine_m
     return thumb1_legitimate_address_p (mode, x, strict_p);
 }

+/* In arm mode, returns the min and max address offset relative to the
+   specified base register.  */
+
+static void
+arm_address_offset_range_1 (enum machine_mode mode, const_rtx base_reg,
+                            HOST_WIDE_INT *min_offset,
+                            HOST_WIDE_INT *max_offset,
+                            int *alignment)
+{
+  if (TARGET_HARD_FLOAT && (TARGET_FPA || TARGET_MAVERICK)
+      && (GET_MODE_CLASS (mode) == MODE_FLOAT
+          || (TARGET_MAVERICK && mode == DImode)))
+    {
+      *min_offset = -1023;
+      *max_offset = 1023;
+      *alignment = 4;
+      return;
+    }
+
+  if (TARGET_NEON
+      && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode)))
+    {
+      *min_offset = -1023;
+      *max_offset = 1015;
+      *alignment = 4;
+      return;
+    }
+
+  if (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (mode))
+    {
+      *min_offset = -1023;
+      *max_offset = 1023;
+      *alignment = 4;
+      return;
+    }
+
+  if (mode == DImode || mode == DFmode)
+    {
+      if (TARGET_LDRD)
+        {
+          *min_offset = -255;
+          *max_offset = 255;
+        }
+      else
+        {
+          *min_offset = -4095;
+          *max_offset = 4091;
+        }
+      *alignment = 1;
+      return;
+    }
+
+  if (arm_arch4)
+    {
+      if (mode == HImode || mode == QImode)
+        *max_offset = 255;
+      else
+        *max_offset = 4095;
+    }
+  else
+    *max_offset = (mode == HImode) ? 4094 : 4095;
+  *min_offset = - (*max_offset);
+  *alignment = 1;
+}
+
+/* In thumb2 mode, returns the min and max address offset relative to the
+   specified base register.  */
+
+static void
+thumb2_address_offset_range (enum machine_mode mode, const_rtx base_reg,
+                             HOST_WIDE_INT *min_offset,
+                             HOST_WIDE_INT *max_offset,
+                             int *alignment)
+{
+  if (TARGET_HARD_FLOAT && (TARGET_FPA || TARGET_MAVERICK)
+      && (GET_MODE_CLASS (mode) == MODE_FLOAT
+          || (TARGET_MAVERICK && mode == DImode)))
+    {
+      *min_offset = -1023;
+      *max_offset = 1023;
+      *alignment = 4;
+      return;
+    }
+
+  if (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (mode))
+    {
+      if (!TARGET_LDRD || mode != DImode)
+        {
+          *min_offset = -1023;
+          *max_offset = 1023;
+          *alignment = 4;
+          return;
+        }
+    }
+
+  if (TARGET_NEON
+      && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode)))
+    {
+      *min_offset = -1023;
+      *max_offset = 1015;
+      *alignment = 4;
+      return;
+    }
+
+  if (mode == DImode || mode == DFmode)
+    {
+      *min_offset = -255;
+      *max_offset = 255;
+      *alignment = 4;
+      return;
+    }
+
+  *min_offset = -255;
+  *max_offset = 4095;
+  *alignment = 1;
+}
+
+/* In thumb1 mode, returns the min and max address offset relative to the
+   specified base register.  */
+
+static void
+thumb1_address_offset_range (enum machine_mode mode, const_rtx base_reg,
+                             HOST_WIDE_INT *min_offset,
+                             HOST_WIDE_INT *max_offset,
+                             int *alignment)
+{
+  *min_offset = 0;
+  switch (GET_MODE_SIZE (mode))
+    {
+    case 1:
+      *max_offset = 31;
+      *alignment = 1;
+      break;
+
+    case 2:
+      *max_offset = 63;
+      *alignment = 2;
+      break;
+
+    default:
+      *max_offset = 127 - GET_MODE_SIZE (mode);
+      *alignment = 4;
+    }
+}
+
+/* Returns the min and max address offset relative to the specified
+   base register.  */
+
+void
+arm_address_offset_range (enum machine_mode mode, const_rtx base_reg,
+                          HOST_WIDE_INT *min_offset,
+                          HOST_WIDE_INT *max_offset,
+                          int *alignment)
+{
+  HOST_WIDE_INT mask;
+
+  /* Currently only deal with pseudo base register.  */
+  gcc_assert (!base_reg || !HARD_REGISTER_P (base_reg));
+
+  if (TARGET_ARM)
+    arm_address_offset_range_1 (mode, base_reg,
+                                min_offset, max_offset, alignment);
+  else if (TARGET_THUMB2)
+    thumb2_address_offset_range (mode, base_reg,
+                                 min_offset, max_offset, alignment);
+  else /* (TARGET_THUMB1) */
+    thumb1_address_offset_range (mode, base_reg,
+                                 min_offset, max_offset, alignment);
+
+  mask = ~(*alignment - 1);
+  *max_offset = *max_offset & mask;
+  *min_offset = (*min_offset + *alignment - 1) & mask;
+}
+
 /* Build the SYMBOL_REF for __tls_get_addr.  */

 static GTY(()) rtx tls_get_addr_libfunc;

Index: extract-common-addr.c
===================================================================
--- extract-common-addr.c       (revision 0)
+++ extract-common-addr.c       (revision 0)
@@ -0,0 +1,616 @@
+/* Extract common address calculation for accesses to
+   fields of large struct.
+   Copyright (C) 2009
+   Free Software Foundation, Inc.
+   Contributed by Carrot Wei <carrot@google.com>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* This file contains optimization for complex address calculation for
+   large structures.
+
+   Suppose we have a struct defined as:
+   typedef struct large_struct
+   {
+     char dummy[400];
+     int  field1;
+     int  field2;
+   } large_struct_t;
+
+   Now we want to access individual fields through a pointer:
+
+   large_struct_t *p = ...;
+   ... = p->field1;
+   ... = p->field2;
+
+   If the load instruction of the target architecture can't support such
+   a large offset, gcc usually generates the following insns to access struct
+   fields:
+
+   (set r200 400)                     # 400 is offset of field1
+   (set r201 (mem (plus r100 r200)))  # r100 contains pointer p
+   ...
+   (set r300 404)                     # 404 is offset of field2
+   (set r301 (mem (plus r100 r300)))  # r100 contains pointer p
+
+   Sometimes loading the large constant (the offset) needs more than one
+   instruction, which makes things worse.
+
+   One possible optimization for multiple accesses is to first compute a new
+   base that is nearer to the fields that we want to access, then use the
+   normal (base + offset) addressing mode to access them. So ideally the
+   previous insns should be transformed to:
+
+   (set r101 (plus r100 400))
+   (set r201 (mem (plus r101 0)))
+   ...
+   (set r301 (mem (plus r101 4)))
+
+   The actual transformation done by this pass is:
+
+   (set r200 400)                     # keep the original insn
+   (set r250 (plus r100 400))         # r250 is new base
+   (set r201 (mem (plus r250 0)))
+   ...
+   (set r300 404)
+   (set r251 (plus r100 400))         # r251 contains same value as r250
+   (set r301 (mem (plus r251 4)))
+
+   And let dce and cse do the rest.
+
+   The implementation can be divided into 2 parts:
+
+   1. Collect instruction sequences like
+
+   (set r200 400)
+   (set r201 (mem (plus r100 r200)))
+
+   For simplicity the corresponding two insns must reside in the same basic
+   block. For example, the instruction sequence for p->feild1 style accesses
+   should always be in the same basic block.
+
+   2. For each base register, split the corresponding memory accesses into
+   groups. Create a new base value for each group and modify the memory
+   accesses to use the new base value and offset.
+*/
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "basic-block.h"
+#include "alloc-pool.h"
+#include "flags.h"
+#include "insn-config.h"
+#include "recog.h"
+#include "expr.h"
+#include "target.h"
+#include "tree-pass.h"
+#include "vec.h"
+#include "df.h"
+#include "timevar.h"
+
+/* Memory offset information.  */
+struct offset_info
+{
+  rtx mem_insn;              /* The insn that does the memory access.  */
+  rtx mem;                   /* The mem rtx which uses this offset.  */
+  unsigned int offset_regno; /* The regno of the dest of the set.  */
+  enum machine_mode mode;    /* The dest machine mode.  */
+  HOST_WIDE_INT offset;      /* The offset constant.  */
+};
+
+typedef struct offset_info *offset_info_t;
+DEF_VEC_P(offset_info_t);
+DEF_VEC_ALLOC_P(offset_info_t, heap);
+
+/* Memory base address information.  */
+struct base_reg_info
+{
+  /* All offset information used with this base register.  */
+  VEC(offset_info_t, heap) *offsets;
+  unsigned int base_regno;     /* Base register number.  */
+  rtx base_reg;                /* Base register rtx.  */
+  enum machine_mode mode;      /* Base register machine mode.  */
+};
+
+/* Offset table, indexed by the offset register number.  */
+static struct offset_info **offset_table;
+
+/* Base register table, indexed by the base register number.  */
+static struct base_reg_info *base_table;
+
+/* Memory pool for offset_info elements.  */
+static alloc_pool offset_element_pool;
+
+/* Bitmap of offset_table, used to speed up resetting offset_table.  */
+static bitmap offset_bitmap;
+
+/* Given a register rtx, look it up in the offset_table.  If it is found
+   and the machine modes match, return a pointer to the offset information.
+   Otherwise return NULL.  */
+
+static inline struct offset_info *
+find_offset_reg (rtx x)
+{
+  struct offset_info *offset_item;
+
+  gcc_assert (REG_P (x));
+  offset_item = offset_table[REGNO (x)];
+  if (offset_item == NULL)
+    return NULL;
+  if (offset_item->mode != GET_MODE (x))
+    return NULL;
+  return offset_item;
+}
+
+/* If x is MEM, and the address is in the form of
+   (plus base_reg offset_reg)
+   remove the corresponding item from offset table and
+   insert it into a base_reg_info object.
+   This function is called via for_each_rtx.
+   data is the insn containing x.  */
+
+static int
+find_mem_with_offset_reg (rtx *x, void *data)
+{
+  HOST_WIDE_INT min_offset, max_offset;
+  int alignment;
+  struct base_reg_info *base_reg_item;
+  struct offset_info *offset_item;
+  rtx reg1, reg2, address;
+
+  if (!MEM_P (*x))
+    return 0;
+
+  /* Look for (plus reg1 reg2) pattern.  */
+  address = XEXP (*x, 0);
+  if (GET_CODE (address) != PLUS)
+    return 0;
+  reg1 = XEXP (address, 0);
+  reg2 = XEXP (address, 1);
+  if (!REG_P (reg1) || !REG_P (reg2))
+    return 0;
+
+  /* Look for addressing offset. If necessary swap reg1 and reg2
+     so that reg1 always contains base address and reg2 always
+     contains offset.  */
+  offset_item = find_offset_reg (reg2);
+  if (offset_item == NULL)
+    {
+      rtx t = reg1;
+      reg1 = reg2;
+      reg2 = t;
+      offset_item = find_offset_reg (reg2);
+      if (offset_item == NULL)
+        return -1;
+    }
+
+  /* Currently only deals with pseudo base register. Other register
+     classes are also possible but need more complicated
+     implementation of targetm.get_address_offset_range.  */
+  if (HARD_REGISTER_P (reg1))
+    return -1;
+
+  /* If the base register and offset register don't have the same
+     machine mode, ignore this offset.  */
+  if (GET_MODE (reg1) != offset_item->mode)
+    return -1;
+
+  /* If the offset is not large enough, ignore it.  */
+  targetm.get_address_offset_range (GET_MODE (*x), reg1,
+                                    &min_offset, &max_offset, &alignment);
+  if (offset_item->offset >= min_offset
+      && offset_item->offset <= max_offset)
+    return -1;
+
+  /* Look up the corresponding base register information.
+     If there is not one, create it.  */
+  base_reg_item = &base_table[REGNO (reg1)];
+  if (base_reg_item->offsets == NULL)
+    {
+      base_reg_item->mode = GET_MODE (reg1);
+      base_reg_item->base_regno = REGNO (reg1);
+      base_reg_item->base_reg = reg1;
+      base_reg_item->offsets = VEC_alloc (offset_info_t, heap, 10);
+    }
+
+  /* Associate this offset information with the base information.  */
+  offset_item->mem = *x;
+  offset_item->mem_insn = (rtx) data;
+  VEC_safe_push (offset_info_t, heap, base_reg_item->offsets, offset_item);
+
+  /* Remove the offset information from offset table, since we don't
+     expect it will be used more than once.  */
+  offset_table[offset_item->offset_regno] = NULL;
+  bitmap_clear_bit (offset_bitmap, offset_item->offset_regno);
+  return -1;
+}
+
+/* If a reg is clobbered in insn, remove it from offset table.  */
+
+static void
+clobber_insn_reg (rtx insn)
+{
+  df_ref *def_rec;
+
+  for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
+    {
+      df_ref def = *def_rec;
+      rtx reg = DF_REF_REAL_REG (def);
+      unsigned int regno = REGNO (reg);
+      if (offset_table[regno] != NULL)
+        {
+          pool_free (offset_element_pool, offset_table[regno]);
+          offset_table[regno] = NULL;
+          bitmap_clear_bit (offset_bitmap, regno);
+        }
+    }
+}
+
+/* If this insn is in the form of
+   (set reg offset)
+   insert this offset info into offset table.  */
+
+static void
+find_offset (rtx insn)
+{
+  rtx src, dest;
+  unsigned int regno;
+  HOST_WIDE_INT offset_val;
+  struct offset_info *offset;
+  rtx x = single_set (insn);
+  if (!x)
+    return;
+
+  dest = SET_DEST (x);
+  if (!REG_P (dest))
+    return;
+  regno = REGNO (dest);
+  if (HARD_REGISTER_NUM_P (regno))
+    return;
+
+  src = SET_SRC (x);
+  if (GET_CODE (src) != CONST_INT)
+    return;
+  offset_val = INTVAL (src);
+  if (offset_val <= 0)
+    return;
+
+  offset = (struct offset_info *) pool_alloc (offset_element_pool);
+  offset->mem = NULL;
+  offset->offset_regno = regno;
+  offset->mode = GET_MODE (dest);
+  offset->offset = offset_val;
+  gcc_assert (!offset_table[regno]);
+  offset_table[regno] = offset;
+  bitmap_set_bit (offset_bitmap, regno);
+}
+
+/* Look for optimization opportunities.  */
+
+static void
+collect_candidates (void)
+{
+  basic_block block;
+
+  FOR_EACH_BB (block)
+    {
+      rtx insn;
+      bitmap_iterator bi;
+      unsigned int regno;
+
+      /* Clear offset_table.  */
+      EXECUTE_IF_SET_IN_BITMAP (offset_bitmap, 0, regno, bi)
+        {
+          pool_free (offset_element_pool, offset_table[regno]);
+          offset_table[regno] = NULL;
+          bitmap_clear_bit (offset_bitmap, regno);
+        }
+
+      FOR_BB_INSNS (block, insn)
+        {
+          if (!INSN_P (insn))
+            continue;
+
+          for_each_rtx (&PATTERN (insn), find_mem_with_offset_reg, insn);
+
+          clobber_insn_reg (insn);
+
+          find_offset (insn);
+        }
+    }
+}
+
+/* Compare function used by qsort.  */
+
+static int
+compare_offset (const void *pa, const void *pb)
+{
+  struct offset_info * const *offset1 = (struct offset_info* const *)pa;
+  struct offset_info * const *offset2 = (struct offset_info* const *)pb;
+  if ((*offset1)->offset != (*offset2)->offset)
+    return (*offset1)->offset - (*offset2)->offset;
+  return GET_MODE ((*offset1)->mem) - GET_MODE ((*offset2)->mem);
+}
+
+/* Do the actual modification to a group of memory accesses.
+   base_reg contains the original base information.
+   base_offset will be added to original base to get new base.
+   count is the number of memory accesses in the group.
+   offset_array is the array of memory accesses.  */
+
+static void
+rewrite_memory_access (struct base_reg_info *base_reg,
+                       struct offset_info **offset_array,
+                       int count,
+                       HOST_WIDE_INT base_offset)
+{
+  int i;
+  for (i = 0; i < count; i++)
+    {
+      rtx new_base_reg, new_base, new_base_insn, new_offset, x;
+      struct offset_info *offset = offset_array[i];
+
+      /* Insert an insn to compute the new base reg.  */
+      new_base_reg = gen_reg_rtx (offset->mode);
+      new_base = gen_rtx_PLUS (offset->mode,
+                          base_reg->base_reg, GEN_INT (base_offset));
+      new_base_insn = gen_move_insn (new_base_reg, new_base);
+      emit_insn_before (new_base_insn, offset->mem_insn);
+
+      /* Modify the original MEM rtx to use new base and offset address.  */
+      new_offset = GEN_INT (offset->offset - base_offset);
+      x = XEXP (offset->mem, 0);
+      if (REGNO (XEXP (x, 0)) == base_reg->base_regno)
+        {
+          validate_change (offset->mem, &XEXP (x, 0), new_base_reg, 1);
+          validate_change (offset->mem, &XEXP (x, 1), new_offset, 1);
+        }
+      else
+        {
+          validate_change (offset->mem, &XEXP (x, 1), new_base_reg, 1);
+          validate_change (offset->mem, &XEXP (x, 0), new_offset, 1);
+        }
+
+      if (!apply_change_group ())
+        remove_insn(new_base_insn);
+
+      df_insn_rescan (offset->mem_insn);
+    }
+}
+
+/* Optimize a list of memory addressing relative to one base register
+   specified in base_reg.
+
+   For each memory access we already know its offset; by consulting the
+   backend we can also learn the min and max offset constants that can
+   be used in the MEM rtx.  So we can easily compute the new base value
+   range by subtracting min_offset and max_offset from the actual
+   memory offset.
+
+   If two MEMs' new base value ranges overlap, they can be put into the same
+   group, and the group's base value range is the overlapping part.  By this
+   method we can add MEMs to the group one by one until the next MEM's base
+   value range doesn't overlap with the current group's.  Then we rewrite the
+   group of memory accesses using the low end of the group's value range as
+   the new base value.  A new group starts with the next MEM access.
+
+   For some architectures the constant offset part of a MEM rtx has an
+   alignment requirement, so we need to take it into account when computing
+   the overlapping base value range.  */
+
+static void
+process_one_base (struct base_reg_info *base_reg)
+{
+  HOST_WIDE_INT max_offset, min_offset;
+  HOST_WIDE_INT base_offset, max_base;
+  int alignment, new_alignment;
+
+  offset_info_t *offset_array = VEC_address (offset_info_t, base_reg->offsets);
+  int count = VEC_length (offset_info_t, base_reg->offsets);
+  int i, group_count;
+
+  if (count < 2)
+    return;
+
+  /* Sort the memory accesses according to their offset.  */
+  qsort (offset_array, count, sizeof (offset_info_t), compare_offset);
+
+  /* group_count contains the number of MEM accesses that should be relative
+     to the same new base address.
+     (base_offset + old_base) is the new base address of the current group.
+     max_base is the maximum possible value of base_offset.
+     alignment is the offset alignment requirement, not the memory alignment.  */
+  group_count = 1;
+  targetm.get_address_offset_range (GET_MODE (offset_array[0]->mem),
+                                    NULL, &min_offset, &max_offset,
+                                    &new_alignment);
+  base_offset = offset_array[0]->offset - max_offset;
+  max_base = offset_array[0]->offset - min_offset;
+  alignment = new_alignment;
+
+  /* Add the memory access to current group one by one until it is not
+     feasible to add the new memory access.  */
+  for (i = 1; i < count; i++)
+    {
+      HOST_WIDE_INT base2, max_base2;
+      int alignment2;
+
+      targetm.get_address_offset_range (GET_MODE (offset_array[i]->mem),
+              NULL, &min_offset, &max_offset, &new_alignment);
+      base2 = offset_array[i]->offset - max_offset;
+      alignment2 = alignment;
+
+      /* Adjust the new base_offset and alignment.  */
+      if (base2 < base_offset)
+        {
+          /* New base offset is less than current offset.  */
+          if (new_alignment > alignment)
+            {
+              /* But we have stricter alignment. We should use the new
+                 alignment and the aligned old offset as new base offset.  */
+              HOST_WIDE_INT delta = offset_array[i]->offset - base_offset;
+              alignment2 = new_alignment;
+              delta = delta & ~(alignment2 - 1);
+              base2 = offset_array[i]->offset - delta;
+            }
+          else
+            /* New alignment isn't stricter than old alignment.
+               We can simply use old base offset and alignment.  */
+            base2 = base_offset;
+        }
+      else
+        {
+          /* New base offset is greater than current offset.  */
+          if (new_alignment >= alignment)
+            /* New alignment is also stricter than old alignment.
+               We can simply use new base offset and new alignment.  */
+            alignment2 = new_alignment;
+          else
+            {
+              /* Old alignment is stricter than new alignment. We need to
+                 adjust new base value according to old alignment and keep
+                 the old alignment.  */
+              HOST_WIDE_INT delta = max_offset & ~(alignment - 1);
+              base2 = offset_array[i]->offset - delta;
+            }
+        }
+
+      /* Adjust max_base.  */
+      max_base2 = offset_array[i]->offset - min_offset;
+      if (new_alignment < alignment)
+        {
+          /* Old alignment is stricter than new alignment, adjust max_base
+             according to old alignment.  */
+          HOST_WIDE_INT mask = alignment - 1;
+          HOST_WIDE_INT aligned_min = (min_offset + alignment - 1) & ~mask;
+          max_base2 = offset_array[i]->offset - aligned_min;
+          /* If the new max_base is smaller, update max_base.*/
+          if (max_base2 < max_base)
+            max_base = max_base2;
+        }
+      else
+        {
+          /* New alignment is stricter than old alignment.  */
+          if (max_base2 > max_base)
+            {
+              /* New max_base is greater than old max_base, we must use the
+                 old one, and adjust its alignment.  */
+              HOST_WIDE_INT delta = offset_array[i]->offset - max_base;
+              delta = (delta + new_alignment - 1) & ~(new_alignment - 1);
+              max_base = offset_array[i]->offset - delta;
+            }
+          else
+            /* New max_base is less than old max_base, use the new one.  */
+            max_base = max_base2;
+        }
+
+      /* If the new base_offset is greater than the new max_base, this
+         memory access can't be put into current group.  */
+      if (base2 > max_base)
+        {
+          /* Only when there are two or more memory accesses in one group,
+             this optimization can be beneficial.  */
+          if (group_count > 1)
+          {
+            struct offset_info **start = &offset_array[i - group_count];
+            rewrite_memory_access (base_reg, start, group_count, base_offset);
+          }
+
+          /* Start a new group.  */
+          base_offset = offset_array[i]->offset - max_offset;
+          max_base = offset_array[i]->offset - min_offset;
+          alignment = new_alignment;
+          group_count = 0;
+        }
+      else
+        {
+          alignment = alignment2;
+          base_offset = base2;
+        }
+
+      group_count++;
+    }
+
+  /* The final group.  */
+  if (group_count > 1)
+    {
+      struct offset_info **start = &offset_array[i - group_count];
+      rewrite_memory_access (base_reg, start, group_count, base_offset);
+    }
+}
+
+/* This optimization can be enabled by providing an implementation of
+   targetm.get_address_offset_range.  */
+
+static bool
+gate_handle_common_addr (void)
+{
+  return targetm.get_address_offset_range != NULL &&  optimize > 0;
+}
+
+static unsigned int
+rest_of_handle_extract_common_addr (void)
+{
+  int i;
+  int reg_count = max_reg_num();
+  offset_table = XCNEWVEC (struct offset_info *, reg_count);
+  base_table = XCNEWVEC (struct base_reg_info, reg_count);
+  offset_bitmap = BITMAP_ALLOC (NULL);
+  offset_element_pool = create_alloc_pool ("offset pool",
+                                           sizeof (struct offset_info), 64);
+
+  /* Phase 1: collect optimization candidates.  */
+  collect_candidates ();
+
+  /* Phase 2: create new bases and rewrite related memory accesses.  */
+  for (i = 0; i < reg_count; i++)
+    {
+      if (base_table[i].offsets != NULL)
+        {
+          process_one_base(&base_table[i]);
+          VEC_free (offset_info_t, heap, base_table[i].offsets);
+        }
+    }
+
+  free_alloc_pool (offset_element_pool);
+  BITMAP_FREE (offset_bitmap);
+  free (base_table);
+  free (offset_table);
+  return 0;
+}
+
+struct rtl_opt_pass pass_extract_common_addr =
+{
+ {
+  RTL_PASS,
+  "extract_common_addr",                /* name */
+  gate_handle_common_addr,              /* gate */
+  rest_of_handle_extract_common_addr,   /* execute */
+  NULL,                                 /* sub */
+  NULL,                                 /* next */
+  0,                                    /* static_pass_number */
+  TV_EXTRACT_COMMON_ADDR,               /* tv_id */
+  0,                                    /* properties_required */
+  0,                                    /* properties_provided */
+  0,                                    /* properties_destroyed */
+  0,                                    /* todo_flags_start */
+  TODO_dump_func                        /* todo_flags_finish */
+ }
+};

