This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)
- From: Richard Biener <rguenther at suse dot de>
- To: Zoran Jovanovic <Zoran dot Jovanovic at imgtec dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Fri, 8 Nov 2013 13:25:24 +0100 (CET)
- Subject: Re: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)
- References: <386B40EC5E8DBF459FD11A754D868AD922E31112 at BADAG02 dot ba dot imgtec dot org> <386B40EC5E8DBF459FD11A754D868AD922E333D4 at BADAG02 dot ba dot imgtec dot org> <386B40EC5E8DBF459FD11A754D868AD922E34DA7 at BADAG02 dot ba dot imgtec dot org> <CAFiYyc0dcpDeXqwM2G3BTJUkpTsjzivRVEuWGfmGE4QcMhxERA at mail dot gmail dot com>
> Hello,
> This is the new version of the patch.
> The comments from Bernhard Reutner-Fischer's review have been applied.
> Also, test case bitfldmrg2.c was modified - it is now an execute test.
>
>
> Example:
>
> Original code:
> <unnamed-unsigned:3> D.1351;
> <unnamed-unsigned:9> D.1350;
> <unnamed-unsigned:7> D.1349;
> D.1349_2 = p1_1(D)->f1;
> p2_3(D)->f1 = D.1349_2;
> D.1350_4 = p1_1(D)->f2;
> p2_3(D)->f2 = D.1350_4;
> D.1351_5 = p1_1(D)->f3;
> p2_3(D)->f3 = D.1351_5;
>
> Optimized code:
> <unnamed-unsigned:19> D.1358;
> _16 = pr1_2(D)->_field0;
> pr2_4(D)->_field0 = _16;
>
> The algorithm works at the basic-block level and consists of the following 3 major steps:
> 1. Go through the basic block's statement list. If there are statement pairs
> that implement a copy of bit-field contents from one memory location to
> another, record the statement pointers and other necessary data in a
> corresponding data structure.
> 2. Identify records that represent adjacent bit-field accesses and
> mark them as merged.
> 3. Modify the trees accordingly.
>
> A new command-line option "-ftree-bitfield-merge" is introduced.
>
> Tested - the patch passes the GCC regression tests.
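As a source-level illustration of the pattern the pass targets (a hypothetical sketch, not part of the patch; the field widths mirror the 7/9/3 example above):

```c
/* Three adjacent bit-fields whose widths sum to 19 bits, matching the
   <unnamed-unsigned:19> merged access in the example above.  */
struct S
{
  unsigned f1:7;
  unsigned f2:9;
  unsigned f3:3;
};

/* Field-by-field copy: without the optimization each assignment is a
   separate bit-field load/store pair; the pass merges the three pairs
   into a single 19-bit load and a single 19-bit store.  */
void
copy_fields (struct S *p1, struct S *p2)
{
  p2->f1 = p1->f1;
  p2->f2 = p1->f2;
  p2->f3 = p1->f3;
}
```

Compiled with `-O2 -ftree-bitfield-merge`, the three copies above are what the pass is expected to collapse into one access of the merged width.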
Comments inline (sorry for the late reply...)
> Changelog -
>
> gcc/ChangeLog:
> 2013-09-24 Zoran Jovanovic (zoran.jovanovic@imgtec.com)
> * Makefile.in: Add tree-sra.c to GTFILES.
> * common.opt (ftree-bitfield-merge): New option.
> * doc/invoke.texi: Add reference to "-ftree-bitfield-merge".
> * tree-sra.c (ssa_bitfield_merge): New function.
> Entry for (-ftree-bitfield-merge).
> (bitfield_stmt_access_pair_htab_hash): New function.
> (bitfield_stmt_access_pair_htab_eq): New function.
> (cmp_access): New function.
> (create_and_insert_access): New function.
> (get_bit_offset): New function.
> (get_merged_bit_field_size): New function.
> (add_stmt_access_pair): New function.
> * dwarf2out.c (simple_type_size_in_bits): Move to tree.c.
> (field_byte_offset): Move declaration to tree.h; remove static.
> * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test.
> * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test.
> * tree-ssa-sccvn.c (expressions_equal_p): Move to tree.c.
> * tree-ssa-sccvn.h (expressions_equal_p): Move declaration to tree.h.
> * tree.c (expressions_equal_p): Move from tree-ssa-sccvn.c.
> (simple_type_size_in_bits): Move from dwarf2out.c.
> * tree.h (expressions_equal_p): Declare.
> (field_byte_offset): Declare.
>
> Patch -
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index a2e3f7a..54aa8e7 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -3847,6 +3847,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h
> $(srcdir)/coretypes.h \
> $(srcdir)/asan.c \
> $(srcdir)/ubsan.c \
> $(srcdir)/tsan.c $(srcdir)/ipa-devirt.c \
> + $(srcdir)/tree-sra.c \
> @all_gtfiles@
>
> # Compute the list of GT header files from the corresponding C sources,
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 202e169..afac514 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2164,6 +2164,10 @@ ftree-sra
> Common Report Var(flag_tree_sra) Optimization
> Perform scalar replacement of aggregates
>
> +ftree-bitfield-merge
> +Common Report Var(flag_tree_bitfield_merge) Init(0) Optimization
> +Enable bit-field merge on trees
> +
Drop the tree- prefix for new options; it doesn't tell users anything.
I suggest
fmerge-bitfields
Common Report Var(flag_tree_bitfield_merge) Init(0) Optimization
Merge loads and stores of consecutive bitfields
and adjust the docs with regard to the flag name change, of course.
> ftree-ter
> Common Report Var(flag_tree_ter) Optimization
> Replace temporary expressions in the SSA->normal pass
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index aa0f4ed..e588cae 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -412,7 +412,7 @@ Objective-C and Objective-C++ Dialects}.
> -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol
> -fstack-protector-all -fstack-protector-strong -fstrict-aliasing @gol
> -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
> --ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
> +-ftree-bitfield-merge -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
> -ftree-coalesce-inline-vars -ftree-coalesce-vars -ftree-copy-prop @gol
> -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
> -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
> @@ -7679,6 +7679,11 @@ pointer alignment information.
> This pass only operates on local scalar variables and is enabled by default
> at @option{-O} and higher. It requires that @option{-ftree-ccp} is enabled.
>
> +@item -ftree-bitfield-merge
> +@opindex ftree-bitfield-merge
> +Combines several adjacent bit-field accesses that copy values
> +from one memory location to another into one single bit-field access.
> +
> @item -ftree-ccp
> @opindex ftree-ccp
> Perform sparse conditional constant propagation (CCP) on trees. This
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index 95049e4..e74db17 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -3108,8 +3108,6 @@ static HOST_WIDE_INT ceiling (HOST_WIDE_INT,
> unsigned int);
> static tree field_type (const_tree);
> static unsigned int simple_type_align_in_bits (const_tree);
> static unsigned int simple_decl_align_in_bits (const_tree);
> -static unsigned HOST_WIDE_INT simple_type_size_in_bits (const_tree);
> -static HOST_WIDE_INT field_byte_offset (const_tree);
> static void add_AT_location_description (dw_die_ref, enum
> dwarf_attribute,
> dw_loc_list_ref);
> static void add_data_member_location_attribute (dw_die_ref, tree);
> @@ -10149,25 +10147,6 @@ is_base_type (tree type)
> return 0;
> }
>
> -/* Given a pointer to a tree node, assumed to be some kind of a ..._TYPE
> - node, return the size in bits for the type if it is a constant, or else
> - return the alignment for the type if the type's size is not constant, or
> - else return BITS_PER_WORD if the type actually turns out to be an
> - ERROR_MARK node. */
> -
> -static inline unsigned HOST_WIDE_INT
> -simple_type_size_in_bits (const_tree type)
> -{
> - if (TREE_CODE (type) == ERROR_MARK)
> - return BITS_PER_WORD;
> - else if (TYPE_SIZE (type) == NULL_TREE)
> - return 0;
> - else if (host_integerp (TYPE_SIZE (type), 1))
> - return tree_low_cst (TYPE_SIZE (type), 1);
> - else
> - return TYPE_ALIGN (type);
> -}
> -
> /* Similarly, but return a double_int instead of UHWI. */
>
> static inline double_int
> @@ -14521,7 +14500,7 @@ round_up_to_align (double_int t, unsigned int align)
> because the offset is actually variable. (We can't handle the latter case
> just yet). */
>
> -static HOST_WIDE_INT
> +HOST_WIDE_INT
> field_byte_offset (const_tree decl)
> {
> double_int object_offset_in_bits;
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitfldmrg1.c
> b/gcc/testsuite/gcc.dg/tree-ssa/bitfldmrg1.c
> new file mode 100644
> index 0000000..e9e96b7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/bitfldmrg1.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-bitfield-merge -fdump-tree-esra" } */
> +
> +struct S
> +{
> + unsigned f1:7;
> + unsigned f2:9;
> + unsigned f3:3;
> + unsigned f4:5;
> + unsigned f5:1;
> + unsigned f6:2;
> +};
> +
> +unsigned
> +foo (struct S *p1, struct S *p2, int *ptr)
> +{
> + p2->f1 = p1->f1;
> + p2->f2 = p1->f2;
> + p2->f3 = p1->f3;
> + *ptr = 7;
> + p2->f4 = p1->f4;
> + p2->f5 = p1->f5;
> + p2->f6 = p1->f6;
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "19" "esra" } } */
> +/* { dg-final { scan-tree-dump "8" "esra"} } */
That's an awfully unspecific dump scan ;) Please make it more
specific and add a comment describing what code generation you expect.
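For instance, a tighter pair of scans might look like the following (a hypothetical tightening, assuming the esra dump prints the merged access type as `<unnamed-unsigned:N>` as in the example at the top of the mail; 19 = 7+9+3 bits before the `*ptr` barrier, 8 = 5+1+2 bits after it):

```c
/* Expect one merged 19-bit copy (f1+f2+f3) and one merged 8-bit copy
   (f4+f5+f6) instead of six separate bit-field accesses.  */
/* { dg-final { scan-tree-dump "<unnamed-unsigned:19>" "esra" } } */
/* { dg-final { scan-tree-dump "<unnamed-unsigned:8>" "esra" } } */
```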
> +/* { dg-final { cleanup-tree-dump "esra" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitfldmrg2.c
> b/gcc/testsuite/gcc.dg/tree-ssa/bitfldmrg2.c
> new file mode 100644
> index 0000000..653e904
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/bitfldmrg2.c
> @@ -0,0 +1,90 @@
> +/* Check whether use of -ftree-bitfield-merge in the presence of overlapping
> + unions results in incorrect code. */
> +/* { dg-options "-O2 -ftree-bitfield-merge" } */
> +/* { dg-do run } */
> +#include <stdio.h>
> +extern void abort (void);
> +
> +struct s1
> +{
> + unsigned f1:4;
> + unsigned f2:4;
> + unsigned f3:4;
> +};
> +
> +struct s2
> +{
> + unsigned char c;
> + unsigned f1:4;
> + unsigned f2:4;
> + unsigned f3:4;
> +};
> +
> +struct s3
> +{
> + unsigned f1:3;
> + unsigned f2:3;
> + unsigned f3:3;
> +};
> +
> +struct s4
> +{
> + unsigned f0:3;
> + unsigned f1:3;
> + unsigned f2:3;
> + unsigned f3:3;
> +};
> +
> +union un_1
> +{
> + struct s1 a;
> + struct s2 b;
> +};
> +
> +union un_2
> +{
> + struct s3 a;
> + struct s4 b;
> +};
> +
> +void f1 (union un_1 *p1, union un_1 *p2)
> +{
> + p2->a.f3 = p1->b.f3;
> + p2->a.f2 = p1->b.f2;
> + p2->a.f1 = p1->b.f1;
> +
> + if (p1->b.f1 != 3)
> + abort ();
> +}
> +
> +void f2 (union un_2 *p1, union un_2 *p2)
> +{
> + p2->b.f1 = p1->a.f1;
> + p2->b.f2 = p1->a.f2;
> + p2->b.f3 = p1->a.f3;
> +
> + if (p2->b.f1 != 0 || p2->b.f2 != 0 || p2->b.f3 != 0)
> + abort ();
> +}
> +
> +int main ()
> +{
> + union un_1 u1;
> + union un_2 u2;
> +
> + u1.b.f1 = 1;
> + u1.b.f2 = 2;
> + u1.b.f3 = 3;
> + u1.b.c = 0;
> +
> + f1 (&u1, &u1);
> +
> + u2.b.f0 = 0;
> + u2.b.f1 = 1;
> + u2.b.f2 = 2;
> + u2.b.f3 = 3;
> +
> + f2 (&u2, &u2);
> +
> + return 0;
> +}
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 58c7565..610245b 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -92,6 +92,7 @@ along with GCC; see the file COPYING3. If not see
> #include "gimple-pretty-print.h"
> #include "ipa-inline.h"
> #include "ipa-utils.h"
> +#include "ggc.h"
You shouldn't need that I think.
> /* Enumeration of all aggregate reductions we can do. */
> enum sra_mode { SRA_MODE_EARLY_IPA, /* early call regularization */
> @@ -3424,12 +3425,476 @@ perform_intra_sra (void)
> return ret;
> }
>
> +/* This optimization combines several adjacent bit-field accesses that copy
> + values from one memory location to another into one single bit-field
> + access. */
> +
> +/* Data for single bit-field read/write sequence. */
> +struct GTY (()) bitfield_access_d {
> + gimple load_stmt; /* Bit-field load statement. */
> + gimple store_stmt; /* Bit-field store statement. */
> + unsigned src_offset_words; /* Bit-field offset at src in words. */
> + unsigned src_bit_offset; /* Bit-field offset inside source word. */
> + unsigned src_bit_size; /* Size of bit-field in source word. */
> + unsigned dst_offset_words; /* Bit-field offset at dst in words. */
> + unsigned dst_bit_offset; /* Bit-field offset inside destination
> + word. */
> + unsigned src_field_offset; /* Source field offset. */
> + unsigned dst_bit_size; /* Size of bit-field in destination word. */
> + tree src_addr; /* Address of source memory access. */
> + tree dst_addr; /* Address of destination memory access. */
> + bool merged; /* True if access is merged with another
> + one. */
> + bool modified; /* True if bit-field size is modified. */
> + bool is_barrier; /* True if access is barrier (call or mem
> + access). */
> + struct bitfield_access_d *next; /* Access with which this one is merged. */
> + tree bitfield_type; /* Field type. */
> + tree bitfield_representative; /* Bit field representative
> of original
> + declaration. */
> + tree field_decl_context; /* Context of original bit-field
> + declaration. */
> +};
> +
> +typedef struct bitfield_access_d bitfield_access_o;
> +typedef struct bitfield_access_d *bitfield_access;
> +
> +/* Connecting register with bit-field access sequence that defines value in
> + that register. */
> +struct GTY (()) bitfield_stmt_access_pair_d
> +{
> + gimple stmt;
> + bitfield_access access;
> +};
> +
> +typedef struct bitfield_stmt_access_pair_d bitfield_stmt_access_pair_o;
> +typedef struct bitfield_stmt_access_pair_d *bitfield_stmt_access_pair;
> +
> +static GTY ((param_is (struct bitfield_stmt_access_pair_d)))
> + htab_t bitfield_stmt_access_htab;
Nor does this need to be registered with the garbage collector. Its
lifetime is not greater than that of the SRA pass, right?
> +/* Hash table callbacks for bitfield_stmt_access_htab. */
> +
> +static hashval_t
> +bitfield_stmt_access_pair_htab_hash (const void *p)
> +{
> + const bitfield_stmt_access_pair entry = (const bitfield_stmt_access_pair)p;
> + return (hashval_t) (uintptr_t) entry->stmt;
> +}
> +
> +static int
> +bitfield_stmt_access_pair_htab_eq (const void *p1, const void *p2)
> +{
> + const struct bitfield_stmt_access_pair_d *entry1 =
> + (const bitfield_stmt_access_pair)p1;
> + const struct bitfield_stmt_access_pair_d *entry2 =
> + (const bitfield_stmt_access_pair)p2;
> + return entry1->stmt == entry2->stmt;
> +}
> +
> +/* Counter used for generating unique names for new fields. */
> +static unsigned new_field_no;
Must be stale...?
> +/* Compare two bit-field access records. */
> +
> +static int
> +cmp_access (const void *p1, const void *p2)
> +{
> + const bitfield_access a1 = (*(const bitfield_access*)p1);
> + const bitfield_access a2 = (*(const bitfield_access*)p2);
> +
> + if (a1->bitfield_representative - a2->bitfield_representative)
> + return a1->bitfield_representative - a2->bitfield_representative;
Subtracting two unrelated pointers is undefined - I suggest you
use DECL_UID (a1->bitfield_representative).
> + if (!expressions_equal_p (a1->src_addr, a2->src_addr))
> + return a1 - a2;
Same - the comparison ends up depending on memory layout, that's bad.
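The general fix is to key the comparison on a stable unique ID rather than on addresses. A minimal standalone sketch of such a comparator, with an explicit `uid` field standing in for GCC's DECL_UID (the struct names here are hypothetical, not the patch's types):

```c
#include <stdlib.h>

/* Stand-in for a tree node: DECL_UID is modeled by an explicit uid.  */
struct node
{
  unsigned uid;
};

struct access
{
  struct node *representative;
  unsigned bit_offset;
};

/* qsort comparator over access pointers: comparing uids yields a total
   order independent of memory layout, unlike subtracting unrelated
   pointers, which is undefined behavior.  */
static int
cmp_access (const void *p1, const void *p2)
{
  const struct access *a1 = *(const struct access *const *) p1;
  const struct access *a2 = *(const struct access *const *) p2;
  if (a1->representative->uid != a2->representative->uid)
    return a1->representative->uid < a2->representative->uid ? -1 : 1;
  if (a1->bit_offset != a2->bit_offset)
    return a1->bit_offset < a2->bit_offset ? -1 : 1;
  return 0;
}
```

Returning -1/0/1 explicitly also avoids overflow pitfalls that plain subtraction of unsigned fields would introduce.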
> +
> + if (!expressions_equal_p (a1->dst_addr, a2->dst_addr))
> + return a1 - a2;
> + if (a1->src_offset_words - a2->src_offset_words)
> + return a1->src_offset_words - a2->src_offset_words;
> +
> + return a1->src_bit_offset - a2->src_bit_offset;
> +}
> +
> +/* Create new bit-field access structure and add it to given bitfield_accesses
> + htab. */
> +
> +static bitfield_access
> +create_and_insert_access (vec<bitfield_access>
> + *bitfield_accesses)
> +{
> + bitfield_access access = ggc_alloc_bitfield_access_d ();
> + memset (access, 0, sizeof (struct bitfield_access_d));
> + bitfield_accesses->safe_push (access);
> + return access;
> +}
> +
> +/* Slightly modified add_bit_offset_attribute from dwarf2out.c. */
> +
> +static inline HOST_WIDE_INT
> +get_bit_offset (tree decl)
> +{
> + tree type = DECL_BIT_FIELD_TYPE (decl);
> + HOST_WIDE_INT bitpos_int;
> +
> + /* Must be a field and a bit-field. */
> + gcc_assert (type && TREE_CODE (decl) == FIELD_DECL);
> + /* Bit position and decl size should be integer constants that can be
> + represented in a single HOST_WIDE_INT. */
> + if (! host_integerp (bit_position (decl), 0)
> + || ! host_integerp (DECL_SIZE (decl), 1))
> + return -1;
> +
> + bitpos_int = int_bit_position (decl);
> + return bitpos_int;
> +}
> +
> +/* Returns size of combined bitfields. Size cannot be larger than size
> + of largest directly accessible memory unit. */
> +
> +static int
> +get_merged_bit_field_size (bitfield_access access)
> +{
> + bitfield_access tmp_access = access;
> + int size = 0;
> +
> + while (tmp_access)
> + {
> + size += tmp_access->src_bit_size;
> + tmp_access = tmp_access->next;
> + }
> + return size;
> +}
> +
> +/* Adds new pair consisting of statement and bit-field access structure that
> + contains it. */
> +
> +static bool add_stmt_access_pair (bitfield_access access, gimple stmt)
> +{
> + bitfield_stmt_access_pair new_pair;
> + void **slot;
> + if (!bitfield_stmt_access_htab)
> + bitfield_stmt_access_htab =
> + htab_create_ggc (128, bitfield_stmt_access_pair_htab_hash,
> + bitfield_stmt_access_pair_htab_eq, NULL);
> + new_pair = ggc_alloc_bitfield_stmt_access_pair_o ();
> + new_pair->stmt = stmt;
> + new_pair->access = access;
> + slot = htab_find_slot (bitfield_stmt_access_htab, new_pair, INSERT);
> + if (*slot == HTAB_EMPTY_ENTRY)
> + {
> + *slot = new_pair;
> + return true;
> + }
> + return false;
> +}
> +
> +/* Returns true if given COMPONENT_REF is part of an union. */
> +
> +static bool part_of_union_p (tree component)
> +{
> + tree tmp = component;
> + bool res = false;
> + while (TREE_CODE (tmp) == COMPONENT_REF)
> + {
> + if (TREE_CODE (TREE_TYPE (tmp)) == UNION_TYPE)
> + {
> + res = true;
> + break;
> + }
> + tmp = TREE_OPERAND (tmp, 0);
> + }
> + if (tmp && (TREE_CODE (TREE_TYPE (tmp)) == UNION_TYPE))
> + res = true;
> + return res;
> +}
> +
> +/* Main entry point for the bit-field merge optimization. */
OK, I'm skipping to here (I was wondering what you need all of the above
functions for).
> +static unsigned int
> +ssa_bitfield_merge (void)
> +{
> + basic_block bb;
> + unsigned int todoflags = 0;
> + vec<bitfield_access> bitfield_accesses;
> + int ix, iy;
> + bitfield_access access;
> + bool cfg_changed = false;
> +
> + /* In the strict volatile bitfields case, doing code changes here may prevent
> + other optimizations, in particular in a SLOW_BYTE_ACCESS setting. */
> + if (flag_strict_volatile_bitfields > 0)
> + return 0;
Hmm, I'm not sure we should care ... - but well, I don't care ;)
> + FOR_EACH_BB (bb)
> + {
> + gimple_stmt_iterator gsi;
> + vec<bitfield_access> bitfield_accesses_merge = vNULL;
> + tree prev_representative = NULL_TREE;
> + bitfield_accesses.create (0);
> +
> + /* Identify all bitfield copy sequences in the basic-block. */
> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
> + {
> + gimple stmt = gsi_stmt (gsi);
> + tree lhs, rhs;
> + void **slot;
> + struct bitfield_stmt_access_pair_d asdata;
> +
> + if (!is_gimple_assign (stmt))
Instead of checking TREE_THIS_VOLATILE below, check here for
  || gimple_has_volatile_ops (stmt)
Also, you can narrow the assigns to visit by checking for
  !gimple_assign_single_p (stmt)
instead.
> + {
> + gsi_next (&gsi);
> + continue;
> + }
> +
> + lhs = gimple_assign_lhs (stmt);
> + rhs = gimple_assign_rhs1 (stmt);
> +
> + if (TREE_CODE (rhs) == COMPONENT_REF)
> + {
> + use_operand_p use;
> + gimple use_stmt;
> + tree op0 = TREE_OPERAND (rhs, 0);
> + tree op1 = TREE_OPERAND (rhs, 1);
> +
> + if (TREE_CODE (op1) == FIELD_DECL && DECL_BIT_FIELD_TYPE (op1)
op1 is always a FIELD_DECL
> + && !TREE_THIS_VOLATILE (op1) && !part_of_union_p (rhs))
I wonder what's the issue with unions ... sure, for
union {
int field1 : 3;
int field2 : 2;
};
but for
union {
struct {
int field1 : 3;
int field2 : 2;
} a;
...
};
? Thus I'd simply check
&& TREE_CODE (DECL_CONTEXT (op1)) != UNION_TYPE
&& TREE_CODE (DECL_CONTEXT (op1)) != QUAL_UNION_TYPE
maybe introduce
#define UNION_TYPE_P(TYPE) \
  (TREE_CODE (TYPE) == UNION_TYPE \
   || TREE_CODE (TYPE) == QUAL_UNION_TYPE)
alongside RECORD_OR_UNION_TYPE_P in tree.h.
> + {
> + if (single_imm_use (lhs, &use, &use_stmt)
> + && is_gimple_assign (use_stmt))
> + {
> + tree use_lhs = gimple_assign_lhs (use_stmt);
> + if (TREE_CODE (use_lhs) == COMPONENT_REF)
I'm not sure I follow the logic here, but it seems that you match
a very specific pattern only - a bitfield copy.
I'd have applied the lowering (this is really a lowering pass
with a cost model you make up here) if the same
DECL_BIT_FIELD_REPRESENTATIVE is used more than once in a BB
on the same underlying base object. That is, we are reasonably
sure we can remove a redundant load and eventually redundant stores.
Thus, very simplistic you'd just record a op0,
DECL_BIT_FIELD_REPRESENTATIVE pair into the hashtable, using
iterative_hash_expr for op0 and DECL_UID for the representative,
counting the number of times you see them. Make sure to apply
the same for stores to bitfields, of course.
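The counting idea above can be sketched generically as follows (a hypothetical stand-in: plain integers replace `iterative_hash_expr (op0)` and `DECL_UID` of the representative, and a flat table replaces the hash table):

```c
/* Count how often each (base object, representative) pair is accessed
   in a block.  Lowering pays off once a pair is seen more than once,
   since CSE/DSE can then remove the redundant representative load or
   store.  */
#define MAX_PAIRS 64

struct pair_count
{
  unsigned base_hash;          /* stand-in for iterative_hash_expr (op0) */
  unsigned representative_uid; /* stand-in for DECL_UID (repr) */
  unsigned count;
};

static struct pair_count table[MAX_PAIRS];
static unsigned n_pairs;

/* Record one access and return how many times this pair has now been
   seen; a result > 1 marks the pair as profitable to lower.  */
static unsigned
record_access (unsigned base_hash, unsigned representative_uid)
{
  for (unsigned i = 0; i < n_pairs; i++)
    if (table[i].base_hash == base_hash
        && table[i].representative_uid == representative_uid)
      return ++table[i].count;
  table[n_pairs].base_hash = base_hash;
  table[n_pairs].representative_uid = representative_uid;
  table[n_pairs].count = 1;
  return table[n_pairs++].count;
}
```

In the real pass the same recording would be applied to bit-field stores as well as loads, as noted above.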
> + {
> + tree use_op0 = TREE_OPERAND (use_lhs, 0);
> + tree use_op1 = TREE_OPERAND (use_lhs, 1);
> + tree tmp_repr = DECL_BIT_FIELD_REPRESENTATIVE (op1);
> +
> + if (TREE_CODE (use_op1) == FIELD_DECL
> + && DECL_BIT_FIELD_TYPE (use_op1)
> + && !TREE_THIS_VOLATILE (use_op1))
> + {
> + if (prev_representative
> + && (prev_representative != tmp_repr))
> + {
> + /* If previous access has different
> + representative then barrier is needed
> + between it and new access. */
> + access = create_and_insert_access
> + (&bitfield_accesses);
> + access->is_barrier = true;
> + }
> + prev_representative = tmp_repr;
> + /* Create new bit-field access structure. */
> + access = create_and_insert_access
> + (&bitfield_accesses);
> + /* Collect access data - load instruction. */
> + access->src_bit_size = tree_low_cst
> + (DECL_SIZE (op1), 1);
> + access->src_bit_offset = get_bit_offset (op1);
> + access->src_offset_words =
> + field_byte_offset (op1) / UNITS_PER_WORD;
> + access->src_field_offset =
> + tree_low_cst (DECL_FIELD_OFFSET (op1),1);
> + access->src_addr = op0;
> + access->load_stmt = gsi_stmt (gsi);
> + /* Collect access data - store instruction. */
> + access->dst_bit_size =
> + tree_low_cst (DECL_SIZE (use_op1), 1);
> + access->dst_bit_offset =
> + get_bit_offset (use_op1);
> + access->dst_offset_words =
> + field_byte_offset (use_op1) / UNITS_PER_WORD;
> + access->dst_addr = use_op0;
> + access->store_stmt = use_stmt;
> + add_stmt_access_pair (access, stmt);
> + add_stmt_access_pair (access, use_stmt);
> + access->bitfield_type
> + = DECL_BIT_FIELD_TYPE (use_op1);
> + access->bitfield_representative = tmp_repr;
> + access->field_decl_context =
> + DECL_FIELD_CONTEXT (op1);
> + }
> + }
> + }
> + }
> + }
> +
> + /* Insert barrier for merging if statement is function call or memory
> + access. */
> + if (bitfield_stmt_access_htab)
> + {
> + asdata.stmt = stmt;
> + slot
> + = htab_find_slot (bitfield_stmt_access_htab, &asdata,
> + NO_INSERT);
> + if (!slot
> + && ((gimple_code (stmt) == GIMPLE_CALL)
> + || (gimple_has_mem_ops (stmt))))
> + {
> + /* Create new bit-field access structure. */
> + access = create_and_insert_access (&bitfield_accesses);
> + /* Mark it as barrier. */
> + access->is_barrier = true;
> + }
> + }
> + gsi_next (&gsi);
> + }
> +
> + /* If there are fewer than two accesses, go to the next basic block. */
> + if (bitfield_accesses.length () <= 1)
> + {
> + bitfield_accesses.release ();
> + continue;
> + }
> +
> + /* Find bit-field accesses that can be merged. */
> + for (ix = 0; bitfield_accesses.iterate (ix, &access); ix++)
> + {
> + bitfield_access head_access;
> + bitfield_access mrg_access;
> + bitfield_access prev_access;
> + if (!bitfield_accesses_merge.exists ())
> + bitfield_accesses_merge.create (0);
> +
> + bitfield_accesses_merge.safe_push (access);
> +
> + if (!access->is_barrier
> + && !(access == bitfield_accesses.last ()
> + && !bitfield_accesses_merge.is_empty ()))
> + continue;
> +
> + bitfield_accesses_merge.qsort (cmp_access);
> +
> + head_access = prev_access = NULL;
> + for (iy = 0; bitfield_accesses_merge.iterate (iy, &mrg_access); iy++)
> + {
> + if (head_access
> + && expressions_equal_p (head_access->src_addr,
> + mrg_access->src_addr)
> + && expressions_equal_p (head_access->dst_addr,
> + mrg_access->dst_addr)
> + && prev_access->src_offset_words
> + == mrg_access->src_offset_words
> + && prev_access->dst_offset_words
> + == mrg_access->dst_offset_words
> + && prev_access->src_bit_offset + prev_access->src_bit_size
> + == mrg_access->src_bit_offset
> + && prev_access->dst_bit_offset + prev_access->dst_bit_size
> + == mrg_access->dst_bit_offset
> + && prev_access->bitfield_representative
> + == mrg_access->bitfield_representative)
> + {
> + /* Merge conditions are satisfied - merge accesses. */
> + mrg_access->merged = true;
> + prev_access->next = mrg_access;
> + head_access->modified = true;
> + prev_access = mrg_access;
> + }
> + else
> + head_access = prev_access = mrg_access;
> + }
> + bitfield_accesses_merge.release ();
> + bitfield_accesses_merge = vNULL;
> + }
> +
> + /* Modify generated code. */
Ick, so you are actually applying an optimization instead of just
lowering the accesses to DECL_BIT_FIELD_REPRESENTATIVE accesses
and letting CSE and DSE do their job. Hmm. I don't think this is
a good idea.
Instead what I'd like to see is more something like the following
(bah, I never merged BIT_FIELD_EXPR which performs the bit-field-insert
in a single stmt); quickly hacked together, without a cost model,
just the lowering piece (likely not endian clean - but who knows).
Index: gcc/tree-sra.c
===================================================================
--- gcc/tree-sra.c (revision 204561)
+++ gcc/tree-sra.c (working copy)
@@ -3445,10 +3445,121 @@ perform_intra_sra (void)
return ret;
}
+static void
+lower_bitfields (void)
+{
+ basic_block bb;
+ FOR_EACH_BB (bb)
+ for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+ !gsi_end_p (gsi); gsi_next (&gsi))
+ {
+ gimple stmt = gsi_stmt (gsi);
+ if (!gimple_assign_single_p (stmt)
+ || gimple_has_volatile_ops (stmt))
+ continue;
+
+ /* Lower a bitfield read. */
+ tree ref = gimple_assign_rhs1 (stmt);
+ if (TREE_CODE (ref) == COMPONENT_REF
+ && DECL_BIT_FIELD_TYPE (TREE_OPERAND (ref, 1))
+ && DECL_BIT_FIELD_REPRESENTATIVE (TREE_OPERAND (ref, 1)))
+ {
+ tree field = TREE_OPERAND (ref, 1);
+ tree rep = DECL_BIT_FIELD_REPRESENTATIVE (field);
+ unsigned HOST_WIDE_INT off;
+ if (host_integerp (DECL_FIELD_OFFSET (field), 1)
+ && host_integerp (DECL_FIELD_OFFSET (rep), 1))
+ off = (tree_low_cst (DECL_FIELD_OFFSET (field), 1)
+ - tree_low_cst (DECL_FIELD_OFFSET (rep), 1)) * BITS_PER_UNIT;
+ else
+ off = 0;
+ off += (tree_low_cst (DECL_FIELD_BIT_OFFSET (field), 1)
+ - tree_low_cst (DECL_FIELD_BIT_OFFSET (rep), 1));
+ tree loadres = make_ssa_name (TREE_TYPE (rep), NULL);
+ gimple load
+ = gimple_build_assign (loadres,
+ build3 (COMPONENT_REF, TREE_TYPE (rep),
+ TREE_OPERAND (ref, 0), rep,
+ NULL_TREE));
+ gimple_set_vuse (load, gimple_vuse (stmt));
+ gsi_insert_before (&gsi, load, GSI_SAME_STMT);
+ gimple_assign_set_rhs1 (stmt,
+ build3 (BIT_FIELD_REF, TREE_TYPE (ref),
+ loadres,
+ DECL_SIZE (field),
+ bitsize_int (off)));
+ update_stmt (stmt);
+ }
+ ref = gimple_assign_lhs (stmt);
+ if (TREE_CODE (ref) == COMPONENT_REF
+ && DECL_BIT_FIELD_TYPE (TREE_OPERAND (ref, 1))
+ && DECL_BIT_FIELD_REPRESENTATIVE (TREE_OPERAND (ref, 1)))
+ {
+ tree field = TREE_OPERAND (ref, 1);
+ tree rep = DECL_BIT_FIELD_REPRESENTATIVE (field);
+ unsigned HOST_WIDE_INT off;
+ if (host_integerp (DECL_FIELD_OFFSET (field), 1)
+ && host_integerp (DECL_FIELD_OFFSET (rep), 1))
+ off = (tree_low_cst (DECL_FIELD_OFFSET (field), 1)
+ - tree_low_cst (DECL_FIELD_OFFSET (rep), 1)) * BITS_PER_UNIT;
+ else
+ off = 0;
+ off += (tree_low_cst (DECL_FIELD_BIT_OFFSET (field), 1)
+ - tree_low_cst (DECL_FIELD_BIT_OFFSET (rep), 1));
+ tree loadres = make_ssa_name (TREE_TYPE (rep), NULL);
+ gimple load
+ = gimple_build_assign (loadres,
+ build3 (COMPONENT_REF, TREE_TYPE (rep),
+ unshare_expr (TREE_OPERAND (ref, 0)),
+ rep,
+ NULL_TREE));
+ gimple_set_vuse (load, gimple_vuse (stmt));
+ gsi_insert_before (&gsi, load, GSI_SAME_STMT);
+ /* FIXME: BIT_FIELD_EXPR. */
+ /* Mask out bits. */
+ tree masked = make_ssa_name (TREE_TYPE (rep), NULL);
+ gimple tems
+ = gimple_build_assign_with_ops (BIT_AND_EXPR,
+ masked, loadres,
+ double_int_to_tree (TREE_TYPE (rep),
+ double_int::mask (TREE_INT_CST_LOW (DECL_SIZE (field))).lshift (off)));
+ gsi_insert_before (&gsi, tems, GSI_SAME_STMT);
+ /* Zero-extend the value to representative size. */
+ tree tem2 = make_ssa_name (unsigned_type_for (TREE_TYPE (field)), NULL);
+ tems = gimple_build_assign_with_ops (NOP_EXPR, tem2,
+ gimple_assign_rhs1 (stmt),
+ NULL_TREE);
+ gsi_insert_before (&gsi, tems, GSI_SAME_STMT);
+ tree tem = make_ssa_name (TREE_TYPE (rep), NULL);
+ tems = gimple_build_assign_with_ops (NOP_EXPR, tem, tem2, NULL_TREE);
+ gsi_insert_before (&gsi, tems, GSI_SAME_STMT);
+ /* Shift the value into place. */
+ tem2 = make_ssa_name (TREE_TYPE (rep), NULL);
+ tems = gimple_build_assign_with_ops (LSHIFT_EXPR, tem2, tem,
+ size_int (off));
+ gsi_insert_before (&gsi, tems, GSI_SAME_STMT);
+ /* Merge masked loaded value and value. */
+ tree modres = make_ssa_name (TREE_TYPE (rep), NULL);
+ gimple mod
+ = gimple_build_assign_with_ops (BIT_IOR_EXPR, modres,
+ masked, tem2);
+ gsi_insert_before (&gsi, mod, GSI_SAME_STMT);
+ /* Finally adjust the store. */
+ gimple_assign_set_rhs1 (stmt, modres);
+ gimple_assign_set_lhs (stmt,
+ build3 (COMPONENT_REF, TREE_TYPE (rep),
+ TREE_OPERAND (ref, 0), rep,
+ NULL_TREE));
+ update_stmt (stmt);
+ }
+ }
+}
+
/* Perform early intraprocedural SRA. */
static unsigned int
early_intra_sra (void)
{
+ lower_bitfields ();
sra_mode = SRA_MODE_EARLY_INTRA;
return perform_intra_sra ();
}
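In source terms, the store-side lowering sketched above is the classic mask/shift/or sequence. A standalone illustration (a simplified little-endian-only sketch; the real lowering must also handle big-endian bit numbering and arbitrary representative widths):

```c
#include <stdint.h>

/* Manual equivalent of storing VAL into a bit-field of WIDTH bits at
   bit offset OFF inside the representative word *WORD: mask out the
   old bits, then OR in the shifted, masked new value.  This mirrors
   the BIT_AND_EXPR / LSHIFT_EXPR / BIT_IOR_EXPR chain in the lowering
   sketch above.  */
static void
store_bitfield (uint32_t *word, unsigned off, unsigned width, uint32_t val)
{
  uint32_t mask = ((width < 32 ? (1u << width) : 0u) - 1u) << off;
  *word = (*word & ~mask) | ((val << off) & mask);
}
```

The read side is the converse: load the representative word once, then shift and mask to extract the field, which is what the BIT_FIELD_REF in the lowering produces.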
The idea is that this lowering then makes ESRA able to handle
it (you can see that for example in the result from
gcc.c-torture/execute/20000113-1.c which is miscompiled by
the above, eh ... what did I say about not testing it??).
I'll try to get the above working and cleaned up a bit today
and maybe early next week. So stay tuned - I'll hand it over
for you to test whether it works as advertised.
Richard.
> + for (ix = 0; bitfield_accesses.iterate (ix, &access); ix++)
> + {
> + if (access->merged)
> + {
> + /* Access merged - remove instructions. */
> + gimple_stmt_iterator tmp_gsi;
> + tmp_gsi = gsi_for_stmt (access->load_stmt);
> + gsi_remove (&tmp_gsi, true);
> + tmp_gsi = gsi_for_stmt (access->store_stmt);
> + gsi_remove (&tmp_gsi, true);
> + }
> + else if (access->modified)
> + {
> + /* Access modified - modify generated code. */
> + gimple_stmt_iterator tmp_gsi;
> + tree tmp_ssa;
> + tree itype = make_node (INTEGER_TYPE);
> + tree new_rhs;
> + tree new_lhs;
> + gimple new_stmt;
> + char new_field_name [15];
> + int decl_size;
> +
> + /* Bit-field size changed - modify load statement. */
> + access->src_bit_size = get_merged_bit_field_size (access);
> +
> + TYPE_PRECISION (itype) = access->src_bit_size;
> + fixup_unsigned_type (itype);
> +
> + /* Create new declaration. */
> + tree new_field = make_node (FIELD_DECL);
> + sprintf (new_field_name, "_field%x", new_field_no++);
> + DECL_NAME (new_field) = get_identifier (new_field_name);
> + TREE_TYPE (new_field) = itype;
> + DECL_BIT_FIELD (new_field) = 1;
> + DECL_BIT_FIELD_TYPE (new_field) = access->bitfield_type;
> + DECL_BIT_FIELD_REPRESENTATIVE (new_field) =
> + access->bitfield_representative;
> + DECL_FIELD_CONTEXT (new_field) = access->field_decl_context;
> + DECL_NONADDRESSABLE_P (new_field) = 1;
> + DECL_FIELD_OFFSET (new_field) =
> + build_int_cst (unsigned_type_node, access->src_field_offset);
> + DECL_FIELD_BIT_OFFSET (new_field) =
> + build_int_cst (unsigned_type_node, access->src_bit_offset);
> + DECL_SIZE (new_field) = build_int_cst (unsigned_type_node,
> + access->src_bit_size);
> + decl_size = access->src_bit_size / BITS_PER_UNIT
> + + (access->src_bit_size % BITS_PER_UNIT ? 1 : 0);
> + DECL_SIZE_UNIT (new_field) =
> + build_int_cst (unsigned_type_node, decl_size);
> +
> + tmp_ssa = make_ssa_name (create_tmp_var (itype, NULL), NULL);
> +
> + /* Create new component ref. */
> + new_rhs = build3 (COMPONENT_REF, itype, access->src_addr,
> + new_field, NULL);
> + tmp_gsi = gsi_for_stmt (access->load_stmt);
> + new_stmt = gimple_build_assign (tmp_ssa, new_rhs);
> + gsi_insert_after (&tmp_gsi, new_stmt, GSI_SAME_STMT);
> + SSA_NAME_DEF_STMT (tmp_ssa) = new_stmt;
> + gsi_remove (&tmp_gsi, true);
> +
> + /* Bit-field size changed - modify store statement. */
> + /* Create new component ref. */
> + new_lhs = build3 (COMPONENT_REF, itype, access->dst_addr,
> + new_field, NULL);
> + new_stmt = gimple_build_assign (new_lhs, tmp_ssa);
> + tmp_gsi = gsi_for_stmt (access->store_stmt);
> + gsi_insert_after (&tmp_gsi, new_stmt, GSI_SAME_STMT);
> + gsi_remove (&tmp_gsi, true);
> + cfg_changed = true;
> + }
> + }
> + /* Empty and release the data structures used for this basic block. */
> + htab_empty (bitfield_stmt_access_htab);
> + bitfield_stmt_access_htab = NULL;
> + bitfield_accesses.release ();
> + }
> +
> + if (cfg_changed)
> + todoflags |= TODO_cleanup_cfg;
> +
> + return todoflags;
> +}
> +
> /* Perform early intraprocedural SRA. */
> static unsigned int
> early_intra_sra (void)
> {
> + unsigned int todoflags = 0;
> sra_mode = SRA_MODE_EARLY_INTRA;
> - return perform_intra_sra ();
> + if (flag_tree_bitfield_merge)
> + todoflags = ssa_bitfield_merge ();
> + return todoflags | perform_intra_sra ();
> }
>
> /* Perform "late" intraprocedural SRA. */
> @@ -5105,3 +5570,5 @@ make_pass_early_ipa_sra (gcc::context *ctxt)
> {
> return new pass_early_ipa_sra (ctxt);
> }
> +
> +#include "gt-tree-sra.h"
> diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
> index bd2feb4..683fd76 100644
> --- a/gcc/tree-ssa-sccvn.c
> +++ b/gcc/tree-ssa-sccvn.c
> @@ -4176,29 +4176,6 @@ get_next_value_id (void)
> return next_value_id++;
> }
>
> -
> -/* Compare two expressions E1 and E2 and return true if they are equal. */
> -
> -bool
> -expressions_equal_p (tree e1, tree e2)
> -{
> - /* The obvious case. */
> - if (e1 == e2)
> - return true;
> -
> - /* If only one of them is null, they cannot be equal. */
> - if (!e1 || !e2)
> - return false;
> -
> - /* Now perform the actual comparison. */
> - if (TREE_CODE (e1) == TREE_CODE (e2)
> - && operand_equal_p (e1, e2, OEP_PURE_SAME))
> - return true;
> -
> - return false;
> -}
> -
> -
> /* Return true if the nary operation NARY may trap. This is a copy
> of stmt_could_throw_1_p adjusted to the SCCVN IL. */
>
> diff --git a/gcc/tree-ssa-sccvn.h b/gcc/tree-ssa-sccvn.h
> index 94e3603..707b18c 100644
> --- a/gcc/tree-ssa-sccvn.h
> +++ b/gcc/tree-ssa-sccvn.h
> @@ -21,10 +21,6 @@
> #ifndef TREE_SSA_SCCVN_H
> #define TREE_SSA_SCCVN_H
>
> -/* In tree-ssa-sccvn.c */
> -bool expressions_equal_p (tree, tree);
> -
> -
> /* TOP of the VN lattice. */
> extern tree VN_TOP;
>
> diff --git a/gcc/tree.c b/gcc/tree.c
> index 6593cf8..6683957 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -12155,4 +12155,44 @@ contains_bitfld_component_ref_p (const_tree ref)
> return false;
> }
>
> +/* Compare two expressions E1 and E2 and return true if they are equal. */
> +
> +bool
> +expressions_equal_p (tree e1, tree e2)
> +{
> + /* The obvious case. */
> + if (e1 == e2)
> + return true;
> +
> + /* If only one of them is null, they cannot be equal. */
> + if (!e1 || !e2)
> + return false;
> +
> + /* Now perform the actual comparison. */
> + if (TREE_CODE (e1) == TREE_CODE (e2)
> + && operand_equal_p (e1, e2, OEP_PURE_SAME))
> + return true;
> +
> + return false;
> +}
> +
> +/* Given a pointer to a tree node, assumed to be some kind of a ..._TYPE
> + node, return the size in bits for the type if it is a constant, or else
> + return the alignment for the type if the type's size is not constant, or
> + else return BITS_PER_WORD if the type actually turns out to be an
> + ERROR_MARK node. */
> +
> +unsigned HOST_WIDE_INT
> +simple_type_size_in_bits (const_tree type)
> +{
> + if (TREE_CODE (type) == ERROR_MARK)
> + return BITS_PER_WORD;
> + else if (TYPE_SIZE (type) == NULL_TREE)
> + return 0;
> + else if (host_integerp (TYPE_SIZE (type), 1))
> + return tree_low_cst (TYPE_SIZE (type), 1);
> + else
> + return TYPE_ALIGN (type);
> +}
> +
> #include "gt-tree.h"
> diff --git a/gcc/tree.h b/gcc/tree.h
> index a263a2c..b2bd481 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -4528,6 +4528,7 @@ extern tree get_ref_base_and_extent (tree,
> HOST_WIDE_INT *,
> HOST_WIDE_INT *, HOST_WIDE_INT *);
> extern bool contains_bitfld_component_ref_p (const_tree);
> extern bool type_in_anonymous_namespace_p (tree);
> +extern bool expressions_equal_p (tree e1, tree e2);
>
> /* In tree-nested.c */
> extern tree build_addr (tree, tree);
> @@ -4693,6 +4694,10 @@ extern tree resolve_asm_operand_names (tree,
> tree, tree, tree);
> extern tree tree_overlaps_hard_reg_set (tree, HARD_REG_SET *);
> #endif
>
> +/* In dwarf2out.c. */
> +HOST_WIDE_INT
> +field_byte_offset (const_tree decl);
> +
> ^L
> /* In tree-inline.c */
>
> @@ -4979,5 +4984,6 @@ builtin_decl_implicit_p (enum built_in_function fncode)
> #endif /* NO_DOLLAR_IN_LABEL */
> #endif /* NO_DOT_IN_LABEL */
>
> +extern unsigned HOST_WIDE_INT simple_type_size_in_bits (const_tree type);
>
> #endif /* GCC_TREE_H */
>
>
> Regards,
> Zoran Jovanovic
>
>
--
Richard Biener <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer