This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)


> Hello,
> This is new patch version.
> Comments from Bernhard Reutner-Fischer review applied.
> Also, test case bitfildmrg2.c modified - it is now execute test.
> 
> 
> Example:
> 
> Original code:
>   <unnamed-unsigned:3> D.1351;
>   <unnamed-unsigned:9> D.1350;
>   <unnamed-unsigned:7> D.1349;
>   D.1349_2 = p1_1(D)->f1;
>   p2_3(D)->f1 = D.1349_2;
>   D.1350_4 = p1_1(D)->f2;
>   p2_3(D)->f2 = D.1350_4;
>   D.1351_5 = p1_1(D)->f3;
>   p2_3(D)->f3 = D.1351_5;
> 
> Optimized code:
>   <unnamed-unsigned:19> D.1358;
>   _16 = pr1_2(D)->_field0;
>   pr2_4(D)->_field0 = _16;
> 
> Algorithm works on basic block level and consists of following 3 major steps:
> 1. Go trough basic block statements list. If there are statement pairs
> that implement copy of bit field content from one memory location to
> another record statements pointers and other necessary data in
> corresponding data structure.
> 2. Identify records that represent adjacent bit field accesses and
> mark them as merged.
> 3. Modify trees accordingly.
> 
> New command line option "-ftree-bitfield-merge" is introduced.
> 
> Tested - passed gcc regression tests.

Comments inline (sorry for the late reply...)

> Changelog -
> 
> gcc/ChangeLog:
> 2013-09-24 Zoran Jovanovic (zoran.jovanovic@imgtec.com)
>   * Makefile.in : Added tree-sra.c to GTFILES.
>   * common.opt (ftree-bitfield-merge): New option.
>   * doc/invoke.texi: Added reference to "-ftree-bitfield-merge".
>   * tree-sra.c (ssa_bitfield_merge): New function.
>   Entry for (-ftree-bitfield-merge).
>   (bitfield_stmt_access_pair_htab_hash): New function.
>   (bitfield_stmt_access_pair_htab_eq): New function.
>   (cmp_access): New function.
>   (create_and_insert_access): New function.
>   (get_bit_offset): New function.
>   (get_merged_bit_field_size): New function.
>   (add_stmt_access_pair): New function.
>   * dwarf2out.c (simple_type_size_in_bits): moved to tree.c.
>   (field_byte_offset): declaration moved to tree.h, static removed.
>   * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test.
>   * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test.
>   * tree-ssa-sccvn.c (expressions_equal_p): moved to tree.c.
>   * tree-ssa-sccvn.h (expressions_equal_p): declaration moved to tree.h.
>   * tree.c (expressions_equal_p): moved from tree-ssa-sccvn.c.
>   (simple_type_size_in_bits): moved from dwarf2out.c.
>   * tree.h (expressions_equal_p): declaration added.
>   (field_byte_offset): declaration added.
> 
> Patch -
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index a2e3f7a..54aa8e7 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -3847,6 +3847,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h
> $(srcdir)/coretypes.h \
>    $(srcdir)/asan.c \
>    $(srcdir)/ubsan.c \
>    $(srcdir)/tsan.c $(srcdir)/ipa-devirt.c \
> +  $(srcdir)/tree-sra.c \
>    @all_gtfiles@
> 
>  # Compute the list of GT header files from the corresponding C sources,
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 202e169..afac514 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2164,6 +2164,10 @@ ftree-sra
>  Common Report Var(flag_tree_sra) Optimization
>  Perform scalar replacement of aggregates
> 
> +ftree-bitfield-merge
> +Common Report Var(flag_tree_bitfield_merge) Init(0) Optimization
> +Enable bit-field merge on trees
> +

Drop the tree- prefix for new options, it doesn't tell users anything.
I suggest

fmerge-bitfields
Common Report Var(flag_tree_bitfield_merge) Init(0) Optimization
Merge loads and stores of consecutive bitfields

and adjust docs with regarding to the flag name change of course.

>  ftree-ter
>  Common Report Var(flag_tree_ter) Optimization
>  Replace temporary expressions in the SSA->normal pass
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index aa0f4ed..e588cae 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -412,7 +412,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol
>  -fstack-protector-all -fstack-protector-strong -fstrict-aliasing @gol
>  -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
> --ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
> +-ftree-bitfield-merge -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
>  -ftree-coalesce-inline-vars -ftree-coalesce-vars -ftree-copy-prop @gol
>  -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
>  -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
> @@ -7679,6 +7679,11 @@ pointer alignment information.
>  This pass only operates on local scalar variables and is enabled by default
>  at @option{-O} and higher.  It requires that @option{-ftree-ccp} is enabled.
> 
> +@item -ftree-bitfield-merge
> +@opindex ftree-bitfield-merge
> +Combines several adjacent bit-field accesses that copy values
> +from one memory location to another into one single bit-field access.
> +
>  @item -ftree-ccp
>  @opindex ftree-ccp
>  Perform sparse conditional constant propagation (CCP) on trees.  This
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index 95049e4..e74db17 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -3108,8 +3108,6 @@ static HOST_WIDE_INT ceiling (HOST_WIDE_INT,
> unsigned int);
>  static tree field_type (const_tree);
>  static unsigned int simple_type_align_in_bits (const_tree);
>  static unsigned int simple_decl_align_in_bits (const_tree);
> -static unsigned HOST_WIDE_INT simple_type_size_in_bits (const_tree);
> -static HOST_WIDE_INT field_byte_offset (const_tree);
>  static void add_AT_location_description        (dw_die_ref, enum
> dwarf_attribute,
>                                          dw_loc_list_ref);
>  static void add_data_member_location_attribute (dw_die_ref, tree);
> @@ -10149,25 +10147,6 @@ is_base_type (tree type)
>    return 0;
>  }
> 
> -/* Given a pointer to a tree node, assumed to be some kind of a ..._TYPE
> -   node, return the size in bits for the type if it is a constant, or else
> -   return the alignment for the type if the type's size is not constant, or
> -   else return BITS_PER_WORD if the type actually turns out to be an
> -   ERROR_MARK node.  */
> -
> -static inline unsigned HOST_WIDE_INT
> -simple_type_size_in_bits (const_tree type)
> -{
> -  if (TREE_CODE (type) == ERROR_MARK)
> -    return BITS_PER_WORD;
> -  else if (TYPE_SIZE (type) == NULL_TREE)
> -    return 0;
> -  else if (host_integerp (TYPE_SIZE (type), 1))
> -    return tree_low_cst (TYPE_SIZE (type), 1);
> -  else
> -    return TYPE_ALIGN (type);
> -}
> -
>  /* Similarly, but return a double_int instead of UHWI.  */
> 
>  static inline double_int
> @@ -14521,7 +14500,7 @@ round_up_to_align (double_int t, unsigned int align)
>     because the offset is actually variable.  (We can't handle the latter case
>     just yet).  */
> 
> -static HOST_WIDE_INT
> +HOST_WIDE_INT
>  field_byte_offset (const_tree decl)
>  {
>    double_int object_offset_in_bits;
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitfldmrg1.c
> b/gcc/testsuite/gcc.dg/tree-ssa/bitfldmrg1.c
> new file mode 100644
> index 0000000..e9e96b7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/bitfldmrg1.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-bitfield-merge -fdump-tree-esra" }  */
> +
> +struct S
> +{
> +  unsigned f1:7;
> +  unsigned f2:9;
> +  unsigned f3:3;
> +  unsigned f4:5;
> +  unsigned f5:1;
> +  unsigned f6:2;
> +};
> +
> +unsigned
> +foo (struct S *p1, struct S *p2, int *ptr)
> +{
> +  p2->f1 = p1->f1;
> +  p2->f2 = p1->f2;
> +  p2->f3 = p1->f3;
> +  *ptr = 7;
> +  p2->f4 = p1->f4;
> +  p2->f5 = p1->f5;
> +  p2->f6 = p1->f6;
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "19" "esra" } } */
> +/* { dg-final { scan-tree-dump "8" "esra"} } */

That's an awfully unspecific dump scan ;)  Please make it more
specific and add a comment what code generation you expect.

> +/* { dg-final { cleanup-tree-dump "esra" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitfldmrg2.c
> b/gcc/testsuite/gcc.dg/tree-ssa/bitfldmrg2.c
> new file mode 100644
> index 0000000..653e904
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/bitfldmrg2.c
> @@ -0,0 +1,90 @@
> +/* Check whether use of -ftree-bitfield-merge results in presence of overlaping
> +   unions results in incorrect code.  */
> +/* { dg-options "-O2 -ftree-bitfield-merge" }  */
> +/* { dg-do run } */
> +#include <stdio.h>
> +extern void abort (void);
> +
> +struct s1
> +{
> +  unsigned f1:4;
> +  unsigned f2:4;
> +  unsigned f3:4;
> +};
> +
> +struct s2
> +{
> +  unsigned char c;
> +  unsigned f1:4;
> +  unsigned f2:4;
> +  unsigned f3:4;
> +};
> +
> +struct s3
> +{
> +  unsigned f1:3;
> +  unsigned f2:3;
> +  unsigned f3:3;
> +};
> +
> +struct s4
> +{
> +  unsigned f0:3;
> +  unsigned f1:3;
> +  unsigned f2:3;
> +  unsigned f3:3;
> +};
> +
> +union un_1
> +{
> +  struct s1 a;
> +  struct s2 b;
> +};
> +
> +union un_2
> +{
> +  struct s3 a;
> +  struct s4 b;
> +};
> +
> +void f1 (union un_1 *p1, union un_1 *p2)
> +{
> +  p2->a.f3 = p1->b.f3;
> +  p2->a.f2 = p1->b.f2;
> +  p2->a.f1 = p1->b.f1;
> +
> +  if (p1->b.f1 != 3)
> +    abort ();
> +}
> +
> +void f2 (union un_2 *p1, union un_2 *p2)
> +{
> +  p2->b.f1 = p1->a.f1;
> +  p2->b.f2 = p1->a.f2;
> +  p2->b.f3 = p1->a.f3;
> +
> +  if (p2->b.f1 != 0 || p2->b.f2 != 0 || p2->b.f3 != 0)
> +    abort ();
> +}
> +
> +int main ()
> +{
> +  union un_1 u1;
> +  union un_2 u2;
> +
> +  u1.b.f1 = 1;
> +  u1.b.f2 = 2;
> +  u1.b.f3 = 3;
> +  u1.b.c = 0;
> +
> +  f1 (&u1, &u1);
> +
> +  u2.b.f0 = 0;
> +  u2.b.f1 = 1;
> +  u2.b.f2 = 2;
> +  u2.b.f3 = 3;
> +
> +  f2 (&u2, &u2);
> +
> +  return 0;
> +}
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 58c7565..610245b 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -92,6 +92,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimple-pretty-print.h"
>  #include "ipa-inline.h"
>  #include "ipa-utils.h"
> +#include "ggc.h"

You shouldn't need that I think.

>  /* Enumeration of all aggregate reductions we can do.  */
>  enum sra_mode { SRA_MODE_EARLY_IPA,   /* early call regularization */
> @@ -3424,12 +3425,476 @@ perform_intra_sra (void)
>    return ret;
>  }
> 
> +/* This optimization combines several adjacent bit-field accesses that copy
> +   values from one memory location to another into one single bit-field
> +   access.  */
> +
> +/* Data for single bit-field read/write sequence.  */
> +struct GTY (()) bitfield_access_d {
> +  gimple load_stmt;              /* Bit-field load statement.  */
> +  gimple store_stmt;             /* Bit-field store statement.  */
> +  unsigned src_offset_words;     /* Bit-field offset at src in words.  */
> +  unsigned src_bit_offset;       /* Bit-field offset inside source word.  */
> +  unsigned src_bit_size;         /* Size of bit-field in source word.  */
> +  unsigned dst_offset_words;     /* Bit-field offset at dst in words.  */
> +  unsigned dst_bit_offset;       /* Bit-field offset inside destination
> +                                    word.  */
> +  unsigned src_field_offset;     /* Source field offset.  */
> +  unsigned dst_bit_size;         /* Size of bit-field in destination word.  */
> +  tree src_addr;                 /* Address of source memory access.  */
> +  tree dst_addr;                 /* Address of destination memory access.  */
> +  bool merged;                   /* True if access is merged with another
> +                                    one.  */
> +  bool modified;                 /* True if bit-field size is modified.  */
> +  bool is_barrier;               /* True if access is barrier (call or mem
> +                                    access).  */
> +  struct bitfield_access_d *next; /* Access with which this one is merged.  */
> +  tree bitfield_type;            /* Field type.  */
> +  tree bitfield_representative;          /* Bit field representative
> of original
> +                                    declaration.  */
> +  tree field_decl_context;       /* Context of original bit-field
> +                                    declaration.  */
> +};
> +
> +typedef struct bitfield_access_d bitfield_access_o;
> +typedef struct bitfield_access_d *bitfield_access;
> +
> +/* Connecting register with bit-field access sequence that defines value in
> +   that register.  */
> +struct GTY (()) bitfield_stmt_access_pair_d
> +{
> +  gimple stmt;
> +  bitfield_access access;
> +};
> +
> +typedef struct bitfield_stmt_access_pair_d bitfield_stmt_access_pair_o;
> +typedef struct bitfield_stmt_access_pair_d *bitfield_stmt_access_pair;
> +
> +static GTY ((param_is (struct bitfield_stmt_access_pair_d)))
> +  htab_t bitfield_stmt_access_htab;

Nor does this need to be registered with the garbage collector.  Its
lifetime is not greater than that of the SRA pass, right?

> +/* Hash table callbacks for bitfield_stmt_access_htab.  */
> +
> +static hashval_t
> +bitfield_stmt_access_pair_htab_hash (const void *p)
> +{
> +  const bitfield_stmt_access_pair entry = (const bitfield_stmt_access_pair)p;
> +  return (hashval_t) (uintptr_t) entry->stmt;
> +}
> +
> +static int
> +bitfield_stmt_access_pair_htab_eq (const void *p1, const void *p2)
> +{
> +  const struct bitfield_stmt_access_pair_d *entry1 =
> +    (const bitfield_stmt_access_pair)p1;
> +  const struct bitfield_stmt_access_pair_d *entry2 =
> +    (const bitfield_stmt_access_pair)p2;
> +  return entry1->stmt == entry2->stmt;
> +}
> +
> +/* Counter used for generating unique names for new fields.  */
> +static unsigned new_field_no;

Must be stale...?

> +/* Compare two bit-field access records.  */
> +
> +static int
> +cmp_access (const void *p1, const void *p2)
> +{
> +  const bitfield_access a1 = (*(const bitfield_access*)p1);
> +  const bitfield_access a2 = (*(const bitfield_access*)p2);
> +
> +  if (a1->bitfield_representative - a2->bitfield_representative)
> +    return a1->bitfield_representative - a2->bitfield_representative;

Subtracting two unrelated pointers is undefined - I suggest you
use DECL_UID (a1->bitfield_representative).

> +  if (!expressions_equal_p (a1->src_addr, a2->src_addr))
> +    return a1 - a2;

Same - the comparison ends up depending on memory layout, that's bad.

> +
> +  if (!expressions_equal_p (a1->dst_addr, a2->dst_addr))
> +    return a1 - a2;
> +  if (a1->src_offset_words - a2->src_offset_words)
> +    return a1->src_offset_words - a2->src_offset_words;
> +
> +  return a1->src_bit_offset - a2->src_bit_offset;
> +}
> +
> +/* Create new bit-field access structure and add it to given bitfield_accesses
> +   htab.  */
> +
> +static bitfield_access
> +create_and_insert_access (vec<bitfield_access>
> +                      *bitfield_accesses)
> +{
> +  bitfield_access access = ggc_alloc_bitfield_access_d ();
> +  memset (access, 0, sizeof (struct bitfield_access_d));
> +  bitfield_accesses->safe_push (access);
> +  return access;
> +}
> +
> +/* Slightly modified add_bit_offset_attribute from dwarf2out.c.  */
> +
> +static inline HOST_WIDE_INT
> +get_bit_offset (tree decl)
> +{
> +  tree type = DECL_BIT_FIELD_TYPE (decl);
> +  HOST_WIDE_INT bitpos_int;
> +
> +  /* Must be a field and a bit-field.  */
> +  gcc_assert (type && TREE_CODE (decl) == FIELD_DECL);
> +  /* Bit position and decl size should be integer constants that can be
> +     represented in a signle HOST_WIDE_INT.  */
> +  if (! host_integerp (bit_position (decl), 0)
> +      || ! host_integerp (DECL_SIZE (decl), 1))
> +    return -1;
> +
> +  bitpos_int = int_bit_position (decl);
> +  return bitpos_int;
> +}
> +
> +/* Returns size of combined bitfields.  Size cannot be larger than size
> +   of largest directly accessible memory unit.  */
> +
> +static int
> +get_merged_bit_field_size (bitfield_access access)
> +{
> +  bitfield_access tmp_access = access;
> +  int size = 0;
> +
> +  while (tmp_access)
> +  {
> +    size += tmp_access->src_bit_size;
> +    tmp_access = tmp_access->next;
> +  }
> +  return size;
> +}
> +
> +/* Adds new pair consisting of statement and bit-field access structure that
> +   contains it.  */
> +
> +static bool add_stmt_access_pair (bitfield_access access, gimple stmt)
> +{
> +  bitfield_stmt_access_pair new_pair;
> +  void **slot;
> +  if (!bitfield_stmt_access_htab)
> +    bitfield_stmt_access_htab =
> +      htab_create_ggc (128, bitfield_stmt_access_pair_htab_hash,
> +                      bitfield_stmt_access_pair_htab_eq, NULL);
> +  new_pair = ggc_alloc_bitfield_stmt_access_pair_o ();
> +  new_pair->stmt = stmt;
> +  new_pair->access = access;
> +  slot = htab_find_slot (bitfield_stmt_access_htab, new_pair, INSERT);
> +  if (*slot == HTAB_EMPTY_ENTRY)
> +    {
> +      *slot = new_pair;
> +      return true;
> +    }
> +  return false;
> +}
> +
> +/* Returns true if given COMPONENT_REF is part of an union.  */
> +
> +static bool part_of_union_p (tree component)
> +{
> +  tree tmp = component;
> +  bool res = false;
> +  while (TREE_CODE (tmp) == COMPONENT_REF)
> +    {
> +      if (TREE_CODE (TREE_TYPE (tmp)) == UNION_TYPE)
> +       {
> +         res = true;
> +         break;
> +       }
> +      tmp = TREE_OPERAND (tmp, 0);
> +    }
> +  if (tmp && (TREE_CODE (TREE_TYPE (tmp)) == UNION_TYPE))
> +    res = true;
> +  return res;
> +}
> +
> +/* Main entry point for the bit-field merge optimization.  */

Ok, I'm skipping to here (I was wondering what you need all the above
functions for)

> +static unsigned int
> +ssa_bitfield_merge (void)
> +{
> +  basic_block bb;
> +  unsigned int todoflags = 0;
> +  vec<bitfield_access> bitfield_accesses;
> +  int ix, iy;
> +  bitfield_access access;
> +  bool cfg_changed = false;
> +
> +  /* In the strict volatile bitfields case, doing code changes here may prevent
> +     other optimizations, in particular in a SLOW_BYTE_ACCESS setting.  */
> +  if (flag_strict_volatile_bitfields > 0)
> +    return 0;

Hmm, I'm not sure we should care ... - but well, I don't care ;)

> +  FOR_EACH_BB (bb)
> +    {
> +      gimple_stmt_iterator gsi;
> +      vec<bitfield_access> bitfield_accesses_merge = vNULL;
> +      tree prev_representative = NULL_TREE;
> +      bitfield_accesses.create (0);
> +
> +      /* Identify all bitfield copy sequences in the basic-block.  */
> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
> +       {
> +         gimple stmt = gsi_stmt (gsi);
> +         tree lhs, rhs;
> +         void **slot;
> +         struct bitfield_stmt_access_pair_d asdata;
> +
> +         if (!is_gimple_assign (stmt))

Instead of checking TREE_THIS_VOLATILE below check here
  || gimple_has_volatile_ops (stmt)

also you can narrow the assigns to visit by checking for

           !is_gimple_assign_single (stmt)

instead

> +           {
> +             gsi_next (&gsi);
> +             continue;
> +           }
> +
> +         lhs = gimple_assign_lhs (stmt);
> +         rhs = gimple_assign_rhs1 (stmt);
> +
> +         if (TREE_CODE (rhs) == COMPONENT_REF)
> +           {
> +             use_operand_p use;
> +             gimple use_stmt;
> +             tree op0 = TREE_OPERAND (rhs, 0);
> +             tree op1 = TREE_OPERAND (rhs, 1);
> +
> +             if (TREE_CODE (op1) == FIELD_DECL && DECL_BIT_FIELD_TYPE (op1)

op1 is always a FIELD_DECL

> +                 && !TREE_THIS_VOLATILE (op1) && !part_of_union_p (rhs))

I wonder what's the issue with unions ... sure, for

union {
  int field1 : 3;
  int field2 : 2;
};

but for

union {
  struct {
     int field1 : 3;
     int field2 : 2;
  } a;
...
};

?  Thus I'd simply check

            && TREE_CODE (DECL_CONTEXT (op1)) != UNION_TYPE
            && TREE_CODE (DECL_CONTEXT (op1)) != QUAL_UNION_TYPE

maybe introduce

#define UNION_TYPE_P(TYPE)
  (TREE_CODE (TYPE) == UNION_TYPE            \
   || TREE_CODE (TYPE) == QUAL_UNION_TYPE)

alongside RECORD_OR_UNION_TYPE_P in tree.h.

> +               {
> +                 if (single_imm_use (lhs, &use, &use_stmt)
> +                      && is_gimple_assign (use_stmt))
> +                   {
> +                     tree use_lhs = gimple_assign_lhs (use_stmt);
> +                     if (TREE_CODE (use_lhs) == COMPONENT_REF)

I'm not sure I follow the logic here, but it seems that you match
a very specific pattern only - a bitfield copy.

I'd have applied the lowering (this is really a lowering pass
with a cost model you make up here) if the same 
DECL_BIT_FIELD_REPRESENTATIVE is used more than once in a BB
on the same underlying base object.  That is, we are reasonably
sure we can remove a redundant load and eventually redundant stores.

Thus, very simplistic you'd just record a op0, 
DECL_BIT_FIELD_REPRESENTATIVE pair into the hashtable, using
iterative_hash_expr for op0 and DECL_UID for the representative,
counting the number of times you see them.  Make sure to apply
the same for stores to bitfields, of course.

> +                       {
> +                         tree use_op0 = TREE_OPERAND (use_lhs, 0);
> +                         tree use_op1 = TREE_OPERAND (use_lhs, 1);
> +                         tree tmp_repr = DECL_BIT_FIELD_REPRESENTATIVE (op1);
> +
> +                         if (TREE_CODE (use_op1) == FIELD_DECL
> +                             && DECL_BIT_FIELD_TYPE (use_op1)
> +                             && !TREE_THIS_VOLATILE (use_op1))
> +                           {
> +                             if (prev_representative
> +                                 && (prev_representative != tmp_repr))
> +                               {
> +                                 /* If previous access has different
> +                                    representative then barrier is needed
> +                                    between it and new access.  */
> +                                 access = create_and_insert_access
> +                                            (&bitfield_accesses);
> +                                 access->is_barrier = true;
> +                               }
> +                             prev_representative = tmp_repr;
> +                             /* Create new bit-field access structure.  */
> +                             access = create_and_insert_access
> +                                        (&bitfield_accesses);
> +                             /* Collect access data - load instruction.  */
> +                             access->src_bit_size = tree_low_cst
> +                                                     (DECL_SIZE (op1), 1);
> +                             access->src_bit_offset = get_bit_offset (op1);
> +                             access->src_offset_words =
> +                               field_byte_offset (op1) / UNITS_PER_WORD;
> +                             access->src_field_offset =
> +                               tree_low_cst (DECL_FIELD_OFFSET (op1),1);
> +                             access->src_addr = op0;
> +                             access->load_stmt = gsi_stmt (gsi);
> +                             /* Collect access data - store instruction.  */
> +                             access->dst_bit_size =
> +                               tree_low_cst (DECL_SIZE (use_op1), 1);
> +                             access->dst_bit_offset =
> +                               get_bit_offset (use_op1);
> +                             access->dst_offset_words =
> +                               field_byte_offset (use_op1) / UNITS_PER_WORD;
> +                             access->dst_addr = use_op0;
> +                             access->store_stmt = use_stmt;
> +                             add_stmt_access_pair (access, stmt);
> +                             add_stmt_access_pair (access, use_stmt);
> +                             access->bitfield_type
> +                               = DECL_BIT_FIELD_TYPE (use_op1);
> +                             access->bitfield_representative = tmp_repr;
> +                             access->field_decl_context =
> +                               DECL_FIELD_CONTEXT (op1);
> +                           }
> +                       }
> +                   }
> +               }
> +           }
> +
> +         /* Insert barrier for merging if statement is function call or memory
> +            access.  */
> +         if (bitfield_stmt_access_htab)
> +           {
> +             asdata.stmt = stmt;
> +             slot
> +               = htab_find_slot (bitfield_stmt_access_htab, &asdata,
> +                                 NO_INSERT);
> +             if (!slot
> +                 && ((gimple_code (stmt) == GIMPLE_CALL)
> +                     || (gimple_has_mem_ops (stmt))))
> +               {
> +                 /* Create new bit-field access structure.  */
> +                 access = create_and_insert_access (&bitfield_accesses);
> +                 /* Mark it as barrier.  */
> +                 access->is_barrier = true;
> +               }
> +           }
> +         gsi_next (&gsi);
> +       }
> +
> +      /* If there are no at least two accesses go to the next basic block.  */
> +      if (bitfield_accesses.length () <= 1)
> +       {
> +         bitfield_accesses.release ();
> +         continue;
> +       }
> +
> +      /* Find bit-field accesses that can be merged.  */
> +      for (ix = 0; bitfield_accesses.iterate (ix, &access); ix++)
> +       {
> +         bitfield_access head_access;
> +         bitfield_access mrg_access;
> +         bitfield_access prev_access;
> +         if (!bitfield_accesses_merge.exists ())
> +           bitfield_accesses_merge.create (0);
> +
> +         bitfield_accesses_merge.safe_push (access);
> +
> +         if (!access->is_barrier
> +             && !(access == bitfield_accesses.last ()
> +             && !bitfield_accesses_merge.is_empty ()))
> +           continue;
> +
> +         bitfield_accesses_merge.qsort (cmp_access);
> +
> +         head_access = prev_access = NULL;
> +         for (iy = 0; bitfield_accesses_merge.iterate (iy, &mrg_access); iy++)
> +           {
> +             if (head_access
> +                 && expressions_equal_p (head_access->src_addr,
> +                                         mrg_access->src_addr)
> +                 && expressions_equal_p (head_access->dst_addr,
> +                                         mrg_access->dst_addr)
> +                 && prev_access->src_offset_words
> +                    == mrg_access->src_offset_words
> +                 && prev_access->dst_offset_words
> +                    == mrg_access->dst_offset_words
> +                 && prev_access->src_bit_offset + prev_access->src_bit_size
> +                    == mrg_access->src_bit_offset
> +                 && prev_access->dst_bit_offset + prev_access->dst_bit_size
> +                    == mrg_access->dst_bit_offset
> +                 && prev_access->bitfield_representative
> +                    == mrg_access->bitfield_representative)
> +               {
> +                 /* Merge conditions are satisfied - merge accesses.  */
> +                 mrg_access->merged = true;
> +                 prev_access->next = mrg_access;
> +                 head_access->modified = true;
> +                 prev_access = mrg_access;
> +               }
> +             else
> +               head_access = prev_access = mrg_access;
> +           }
> +         bitfield_accesses_merge.release ();
> +         bitfield_accesses_merge = vNULL;
> +       }
> +
> +      /* Modify generated code.  */

Ick, so you are actually applying an optimization instead of just
lowering the accesses to DECL_BIT_FIELD_REPRESENTATIVE accesses
and letting CSE and DSE do their job.  Hmm.  I don't think this is
a good idea.

Instead what I'd like to see is more something like the following
(bah, I never merged BIT_FIELD_EXPR which performs the bit-field-insert
in a single stmt);  quickly hacked together, without a cost model,
just the lowering piece (likely not endian clean - but who knows).

Index: gcc/tree-sra.c
===================================================================
--- gcc/tree-sra.c	(revision 204561)
+++ gcc/tree-sra.c	(working copy)
@@ -3445,10 +3445,121 @@ perform_intra_sra (void)
   return ret;
 }
 
+static void
+lower_bitfields (void)
+{
+  basic_block bb;
+  FOR_EACH_BB (bb)
+    for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+	 !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+	gimple stmt = gsi_stmt (gsi);
+	if (!gimple_assign_single_p (stmt)
+	    || gimple_has_volatile_ops (stmt))
+	  continue;
+
+	/* Lower a bitfield read.  */
+	tree ref = gimple_assign_rhs1 (stmt);
+	if (TREE_CODE (ref) == COMPONENT_REF
+	    && DECL_BIT_FIELD_TYPE (TREE_OPERAND (ref, 1))
+	    && DECL_BIT_FIELD_REPRESENTATIVE (TREE_OPERAND (ref, 1)))
+	  {
+	    tree field = TREE_OPERAND (ref, 1);
+	    tree rep = DECL_BIT_FIELD_REPRESENTATIVE (field);
+	    unsigned HOST_WIDE_INT off;
+	    if (host_integerp (DECL_FIELD_OFFSET (field), 1)
+		&& host_integerp (DECL_FIELD_OFFSET (rep), 1))
+	      off = (tree_low_cst (DECL_FIELD_OFFSET (field), 1)
+		     - tree_low_cst (DECL_FIELD_OFFSET (rep), 1)) * BITS_PER_UNIT;
+	    else
+	      off = 0;
+	    off += (tree_low_cst (DECL_FIELD_BIT_OFFSET (field), 1)
+		    - tree_low_cst (DECL_FIELD_BIT_OFFSET (rep), 1));
+	    tree loadres = make_ssa_name (TREE_TYPE (rep), NULL);
+	    gimple load
+	      = gimple_build_assign (loadres,
+				     build3 (COMPONENT_REF, TREE_TYPE (rep),
+					     TREE_OPERAND (ref, 0), rep,
+					     NULL_TREE));
+	    gimple_set_vuse (load, gimple_vuse (stmt));
+	    gsi_insert_before (&gsi, load, GSI_SAME_STMT);
+	    gimple_assign_set_rhs1 (stmt,
+				    build3 (BIT_FIELD_REF, TREE_TYPE (ref),
+					    loadres,
+					    DECL_SIZE (field),
+					    bitsize_int (off)));
+	    update_stmt (stmt);
+	  }
+	ref = gimple_assign_lhs (stmt);
+	if (TREE_CODE (ref) == COMPONENT_REF
+	    && DECL_BIT_FIELD_TYPE (TREE_OPERAND (ref, 1))
+	    && DECL_BIT_FIELD_REPRESENTATIVE (TREE_OPERAND (ref, 1)))
+	  {
+	    tree field = TREE_OPERAND (ref, 1);
+	    tree rep = DECL_BIT_FIELD_REPRESENTATIVE (field);
+	    unsigned HOST_WIDE_INT off;
+	    if (host_integerp (DECL_FIELD_OFFSET (field), 1)
+		&& host_integerp (DECL_FIELD_OFFSET (rep), 1))
+	      off = (tree_low_cst (DECL_FIELD_OFFSET (field), 1)
+		     - tree_low_cst (DECL_FIELD_OFFSET (rep), 1)) * BITS_PER_UNIT;
+	    else
+	      off = 0;
+	    off += (tree_low_cst (DECL_FIELD_BIT_OFFSET (field), 1)
+		    - tree_low_cst (DECL_FIELD_BIT_OFFSET (rep), 1));
+	    tree loadres = make_ssa_name (TREE_TYPE (rep), NULL);
+	    gimple load
+	      = gimple_build_assign (loadres,
+				     build3 (COMPONENT_REF, TREE_TYPE (rep),
+					     unshare_expr (TREE_OPERAND (ref, 0)),
+					     rep,
+					     NULL_TREE));
+	    gimple_set_vuse (load, gimple_vuse (stmt));
+	    gsi_insert_before (&gsi, load, GSI_SAME_STMT);
+	    /* FIXME:  BIT_FIELD_EXPR.  */
+	    /* Mask out bits.  */
+	    tree masked = make_ssa_name (TREE_TYPE (rep), NULL);
+	    gimple tems
+	      = gimple_build_assign_with_ops (BIT_AND_EXPR,
+					      masked, loadres,
+					      double_int_to_tree (TREE_TYPE (rep),
+								  double_int::mask (TREE_INT_CST_LOW (DECL_SIZE (field))).lshift (off)));
+	    gsi_insert_before (&gsi, tems, GSI_SAME_STMT);
+	    /* Zero-extend the value to representative size.  */
+	    tree tem2 = make_ssa_name (unsigned_type_for (TREE_TYPE (field)), NULL);
+	    tems = gimple_build_assign_with_ops (NOP_EXPR, tem2,
+						 gimple_assign_rhs1 (stmt),
+						 NULL_TREE);
+	    gsi_insert_before (&gsi, tems, GSI_SAME_STMT);
+	    tree tem = make_ssa_name (TREE_TYPE (rep), NULL);
+	    tems = gimple_build_assign_with_ops (NOP_EXPR, tem, tem2, NULL_TREE);
+	    gsi_insert_before (&gsi, tems, GSI_SAME_STMT);
+	    /* Shift the value into place.  */
+	    tem2 = make_ssa_name (TREE_TYPE (rep), NULL);
+	    tems = gimple_build_assign_with_ops (LSHIFT_EXPR, tem2, tem,
+						 size_int (off));
+	    gsi_insert_before (&gsi, tems, GSI_SAME_STMT);
+	    /* Merge masked loaded value and value.  */
+	    tree modres = make_ssa_name (TREE_TYPE (rep), NULL);
+	    gimple mod
+	      = gimple_build_assign_with_ops (BIT_IOR_EXPR, modres,
+					      masked, tem2);
+	    gsi_insert_before (&gsi, mod, GSI_SAME_STMT);
+	    /* Finally adjust the store.  */
+	    gimple_assign_set_rhs1 (stmt, modres);
+	    gimple_assign_set_lhs (stmt,
+				   build3 (COMPONENT_REF, TREE_TYPE (rep),
+					   TREE_OPERAND (ref, 0), rep,
+					   NULL_TREE));
+	    update_stmt (stmt);
+	  }
+      }
+}
+
 /* Perform early intraprocedural SRA.  */
 static unsigned int
 early_intra_sra (void)
 {
+  lower_bitfields ();
   sra_mode = SRA_MODE_EARLY_INTRA;
   return perform_intra_sra ();
 }


The idea is that this lowering then makes ESRA able to handle
it (you can see that for example in the result from
gcc.c-torture/execute/20000113-1.c which is miscompiled by
the above, eh ... what did I say about not testing it??).

I'll try to get the above working and cleaned up a bit today
and maybe early next week.  So stay tuned, I'll hand it over
to make you test if it works as advertised for you.

Richard.


> +      for (ix = 0; bitfield_accesses.iterate (ix, &access); ix++)
> +       {
> +         if (access->merged)
> +           {
> +             /* Access merged - remove instructions.  */
> +             gimple_stmt_iterator tmp_gsi;
> +             tmp_gsi = gsi_for_stmt (access->load_stmt);
> +             gsi_remove (&tmp_gsi, true);
> +             tmp_gsi = gsi_for_stmt (access->store_stmt);
> +             gsi_remove (&tmp_gsi, true);
> +           }
> +         else if (access->modified)
> +           {
> +             /* Access modified - modify generated code.  */
> +             gimple_stmt_iterator tmp_gsi;
> +             tree tmp_ssa;
> +             tree itype = make_node (INTEGER_TYPE);
> +             tree new_rhs;
> +             tree new_lhs;
> +             gimple new_stmt;
> +             char new_field_name [15];
> +             int decl_size;
> +
> +             /* Bit-field size changed - modify load statement.  */
> +             access->src_bit_size = get_merged_bit_field_size (access);
> +
> +             TYPE_PRECISION (itype) = access->src_bit_size;
> +             fixup_unsigned_type (itype);
> +
> +             /* Create new declaration.  */
> +             tree new_field = make_node (FIELD_DECL);
> +             sprintf (new_field_name, "_field%x", new_field_no++);
> +             DECL_NAME (new_field) = get_identifier (new_field_name);
> +             TREE_TYPE (new_field) = itype;
> +             DECL_BIT_FIELD (new_field) = 1;
> +             DECL_BIT_FIELD_TYPE (new_field) = access->bitfield_type;
> +             DECL_BIT_FIELD_REPRESENTATIVE (new_field) =
> +               access->bitfield_representative;
> +             DECL_FIELD_CONTEXT (new_field) = access->field_decl_context;
> +             DECL_NONADDRESSABLE_P (new_field) = 1;
> +             DECL_FIELD_OFFSET (new_field) =
> +               build_int_cst (unsigned_type_node, access->src_field_offset);
> +             DECL_FIELD_BIT_OFFSET (new_field) =
> +               build_int_cst (unsigned_type_node, access->src_bit_offset);
> +             DECL_SIZE (new_field) = build_int_cst (unsigned_type_node,
> +                                                    access->src_bit_size);
> +             decl_size = access->src_bit_size / BITS_PER_UNIT
> +               + (access->src_bit_size % BITS_PER_UNIT ? 1 : 0);
> +             DECL_SIZE_UNIT (new_field) =
> +               build_int_cst (unsigned_type_node, decl_size);
> +
> +             tmp_ssa = make_ssa_name (create_tmp_var (itype, NULL), NULL);
> +
> +             /* Create new component ref.  */
> +             new_rhs = build3 (COMPONENT_REF, itype, access->src_addr,
> +                               new_field, NULL);
> +             tmp_gsi = gsi_for_stmt (access->load_stmt);
> +             new_stmt = gimple_build_assign (tmp_ssa, new_rhs);
> +             gsi_insert_after (&tmp_gsi, new_stmt, GSI_SAME_STMT);
> +             SSA_NAME_DEF_STMT (tmp_ssa) = new_stmt;
> +             gsi_remove (&tmp_gsi, true);
> +
> +             /* Bit-field size changed - modify store statement.  */
> +             /* Create new component ref.  */
> +             new_lhs = build3 (COMPONENT_REF, itype, access->dst_addr,
> +                               new_field, NULL);
> +             new_stmt = gimple_build_assign (new_lhs, tmp_ssa);
> +             tmp_gsi = gsi_for_stmt (access->store_stmt);
> +             gsi_insert_after (&tmp_gsi, new_stmt, GSI_SAME_STMT);
> +             gsi_remove (&tmp_gsi, true);
> +             cfg_changed = true;
> +           }
> +       }
> +      /* Empty or delete data structures used for basic block.  */
> +      htab_empty (bitfield_stmt_access_htab);
> +      bitfield_stmt_access_htab = NULL;
> +      bitfield_accesses.release ();
> +    }
> +
> +  if (cfg_changed)
> +    todoflags |= TODO_cleanup_cfg;
> +
> +  return todoflags;
> +}
> +
>  /* Perform early intraprocedural SRA.  */
>  static unsigned int
>  early_intra_sra (void)
>  {
> +  unsigned int todoflags = 0;
>    sra_mode = SRA_MODE_EARLY_INTRA;
> -  return perform_intra_sra ();
> +  if (flag_tree_bitfield_merge)
> +    todoflags = ssa_bitfield_merge ();
> +  return todoflags | perform_intra_sra ();
>  }
> 
>  /* Perform "late" intraprocedural SRA.  */
> @@ -5105,3 +5570,5 @@ make_pass_early_ipa_sra (gcc::context *ctxt)
>  {
>    return new pass_early_ipa_sra (ctxt);
>  }
> +
> +#include "gt-tree-sra.h"
> diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
> index bd2feb4..683fd76 100644
> --- a/gcc/tree-ssa-sccvn.c
> +++ b/gcc/tree-ssa-sccvn.c
> @@ -4176,29 +4176,6 @@ get_next_value_id (void)
>    return next_value_id++;
>  }
> 
> -
> -/* Compare two expressions E1 and E2 and return true if they are equal.  */
> -
> -bool
> -expressions_equal_p (tree e1, tree e2)
> -{
> -  /* The obvious case.  */
> -  if (e1 == e2)
> -    return true;
> -
> -  /* If only one of them is null, they cannot be equal.  */
> -  if (!e1 || !e2)
> -    return false;
> -
> -  /* Now perform the actual comparison.  */
> -  if (TREE_CODE (e1) == TREE_CODE (e2)
> -      && operand_equal_p (e1, e2, OEP_PURE_SAME))
> -    return true;
> -
> -  return false;
> -}
> -
> -
>  /* Return true if the nary operation NARY may trap.  This is a copy
>     of stmt_could_throw_1_p adjusted to the SCCVN IL.  */
> 
> diff --git a/gcc/tree-ssa-sccvn.h b/gcc/tree-ssa-sccvn.h
> index 94e3603..707b18c 100644
> --- a/gcc/tree-ssa-sccvn.h
> +++ b/gcc/tree-ssa-sccvn.h
> @@ -21,10 +21,6 @@
>  #ifndef TREE_SSA_SCCVN_H
>  #define TREE_SSA_SCCVN_H
> 
> -/* In tree-ssa-sccvn.c  */
> -bool expressions_equal_p (tree, tree);
> -
> -
>  /* TOP of the VN lattice.  */
>  extern tree VN_TOP;
> 
> diff --git a/gcc/tree.c b/gcc/tree.c
> index 6593cf8..6683957 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -12155,4 +12155,44 @@ contains_bitfld_component_ref_p (const_tree ref)
>    return false;
>  }
> 
> +/* Compare two expressions E1 and E2 and return true if they are equal.  */
> +
> +bool
> +expressions_equal_p (tree e1, tree e2)
> +{
> +  /* The obvious case.  */
> +  if (e1 == e2)
> +    return true;
> +
> +  /* If only one of them is null, they cannot be equal.  */
> +  if (!e1 || !e2)
> +    return false;
> +
> +  /* Now perform the actual comparison.  */
> +  if (TREE_CODE (e1) == TREE_CODE (e2)
> +      && operand_equal_p (e1, e2, OEP_PURE_SAME))
> +    return true;
> +
> +  return false;
> +}
> +
> +/* Given a pointer to a tree node, assumed to be some kind of a ..._TYPE
> +   node, return the size in bits for the type if it is a constant, or else
> +   return the alignment for the type if the type's size is not constant, or
> +   else return BITS_PER_WORD if the type actually turns out to be an
> +   ERROR_MARK node.  */
> +
> +unsigned HOST_WIDE_INT
> +simple_type_size_in_bits (const_tree type)
> +{
> +  if (TREE_CODE (type) == ERROR_MARK)
> +    return BITS_PER_WORD;
> +  else if (TYPE_SIZE (type) == NULL_TREE)
> +    return 0;
> +  else if (host_integerp (TYPE_SIZE (type), 1))
> +    return tree_low_cst (TYPE_SIZE (type), 1);
> +  else
> +    return TYPE_ALIGN (type);
> +}
> +
>  #include "gt-tree.h"
> diff --git a/gcc/tree.h b/gcc/tree.h
> index a263a2c..b2bd481 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -4528,6 +4528,7 @@ extern tree get_ref_base_and_extent (tree,
> HOST_WIDE_INT *,
>                                      HOST_WIDE_INT *, HOST_WIDE_INT *);
>  extern bool contains_bitfld_component_ref_p (const_tree);
>  extern bool type_in_anonymous_namespace_p (tree);
> +extern bool expressions_equal_p (tree e1, tree e2);
> 
>  /* In tree-nested.c */
>  extern tree build_addr (tree, tree);
> @@ -4693,6 +4694,10 @@ extern tree resolve_asm_operand_names (tree,
> tree, tree, tree);
>  extern tree tree_overlaps_hard_reg_set (tree, HARD_REG_SET *);
>  #endif
> 
> +/* In dwarf2out.c.  */
> +HOST_WIDE_INT
> +field_byte_offset (const_tree decl);
> +
>  ?
>  /* In tree-inline.c  */
> 
> @@ -4979,5 +4984,6 @@ builtin_decl_implicit_p (enum built_in_function fncode)
>  #endif /* NO_DOLLAR_IN_LABEL */
>  #endif /* NO_DOT_IN_LABEL */
> 
> +extern unsigned HOST_WIDE_INT simple_type_size_in_bits (const_tree type);
> 
>  #endif  /* GCC_TREE_H  */
> 
> 
> Regards,
> Zoran Jovanovic
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]