This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH 6/6] Make SRA replace constant-pool loads
- From: Alan Lawrence <alan dot lawrence at arm dot com>
- To: gcc-patches at gcc dot gnu dot org
- Date: Thu, 29 Oct 2015 19:18:22 +0000
- Subject: [PATCH 6/6] Make SRA replace constant-pool loads
- Authentication-results: sourceware.org; auth=none
- References: <1446146302-17002-1-git-send-email-alan dot lawrence at arm dot com>
This has changed quite a bit since the previous revision
(https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01484.html), mostly due to Ada
and specifically Ada on ARM.
I didn't find a good alternative to scanning for constant-pool accesses "as we
go" through the function. In C I never saw any of these accesses being
disqualified after we'd found them, but some awkward constant-pool expressions
in Ada forced me to disqualify them for a new reason (inability to
constant-propagate). Hence, I introduced a new bitmap of
disqualified_constants, rather than an additional pass through the function.
Next, the approach of using a replacement_expr (holding the constant value) in
place of the old replacement_decl also failed: the decl could be used in an
expression where substituting in the expr produced ill-formed gimple. Hence,
constant-pool entries now use replacement_decls just like everything else, for
which I generate initializers in initialize_constant_pool_replacements.
However, I was able to avoid generating the replacement constants early,
deferring them instead to initialize_constant_pool_replacements; the main trick
here was to use fold_array_ctor_reference, kindly pointed out by Richie :), to
fold the MEM_REFs built in analyze_access_subtree.
Finally, I found completely_scalarize was still failing to fold all the
constant expressions, because Ada was putting VIEW_CONVERT_EXPRs in the
constant pool, and fold-const.c did not deal with ARRAY_REFs of these. However,
requiring completely_scalarize to be able to fold everything seemed fragile;
instead, I decoupled it from fold-const by allowing it to fail.
With these changes, ssa-dom-cse-2.c is fixed on all platforms with appropriate
--param.
There are still a few remaining test failures, on AArch64:
gcc.dg/guality/pr54970.c -O1 line 15 a[0] == 1
gcc.dg/guality/pr54970.c -O1 line 20 a[0] == 1
gcc.dg/guality/pr54970.c -O1 line 25 a[0] == 1
(also at -O2, -O2 -flto with/without plugin, -O3 -g and -Os).
...which I'm working on. I also tested with the hack at https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01483.html . This revealed one new failure in advsimd-intrinsics/vldX.c on AArch64 (fixed by https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02777.html), and fixed a bunch of guality failures on ARM:
gcc.dg/guality/pr54970.c -O1 line 31 a[0] == 4
gcc.dg/guality/pr54970.c -O1 line 36 a[0] == 4
gcc.dg/guality/pr54970.c -O1 line 45 a[0] == 4
gcc.dg/guality/pr54970.c -O1 line 45 p[-2] == 4
gcc.dg/guality/pr54970.c -O1 line 45 q[-1] == 4
(also at -O2, -O2 -flto with/without plugin, -O3 -g and -Os)
--Alan
gcc/ChangeLog:
* tree-sra.c (disqualified_constants): New.
(sra_initialize): Initialize it.
(sra_deinitialize): Deallocate it.
(disqualify_candidate): Set bit in disqualified_constants.
(subst_constant_pool_initial): New.
(create_access): Scan for constant-pool entries as we go.
(scalarizable_type_p): Disallow types containing placeholders.
(completely_scalarize): Return bool to allow failure.
(scalarize_elem): Likewise; check we can generate constant replacements.
(maybe_add_sra_candidate): Allow constant-pool entries.
(analyze_access_subtree): Checking-assert that we can fold any refs
built for constant-pool entries.
(analyze_all_variable_accesses): Deal with completely_scalarize failing.
(initialize_constant_pool_replacements): New.
(sra_modify_function_body): Call previous.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Add sra-max-scalarization-size param,
remove xfail.
---
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c | 7 +-
gcc/tree-sra.c | 221 ++++++++++++++++++++++++--
2 files changed, 208 insertions(+), 20 deletions(-)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
index 9eccdc9..b13d583 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */
+/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized --param sra-max-scalarization-size-Ospeed=32" } */
int
foo ()
@@ -17,7 +17,4 @@ foo ()
/* After late unrolling the above loop completely DOM should be
able to optimize this to return 28. */
-/* See PR63679 and PR64159, if the target forces the initializer to memory then
- DOM is not able to perform this optimization. */
-
-/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail aarch64*-*-* alpha*-*-* hppa*-*-* powerpc*-*-* sparc*-*-* s390*-*-* } } } */
+/* { dg-final { scan-tree-dump "return 28;" "optimized" } } */
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 358db79..1920265 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -90,6 +90,7 @@ along with GCC; see the file COPYING3. If not see
#include "gimple-iterator.h"
#include "gimplify-me.h"
#include "gimple-walk.h"
+#include "gimple-fold.h"
#include "tree-cfg.h"
#include "flags.h"
#include "insn-config.h"
@@ -336,6 +337,10 @@ candidate (unsigned uid)
those which cannot be (because they are and need be used as a whole). */
static bitmap should_scalarize_away_bitmap, cannot_scalarize_away_bitmap;
+/* Bitmap of candidates in the constant pool, which cannot be scalarized
+ because this would produce non-constant expressions (e.g. Ada). */
+static bitmap disqualified_constants;
+
/* Obstack for creation of fancy names. */
static struct obstack name_obstack;
@@ -658,6 +663,7 @@ sra_initialize (void)
(vec_safe_length (cfun->local_decls) / 2);
should_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
cannot_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
+ disqualified_constants = BITMAP_ALLOC (NULL);
gcc_obstack_init (&name_obstack);
base_access_vec = new hash_map<tree, auto_vec<access_p> >;
memset (&sra_stats, 0, sizeof (sra_stats));
@@ -676,6 +682,7 @@ sra_deinitialize (void)
candidates = NULL;
BITMAP_FREE (should_scalarize_away_bitmap);
BITMAP_FREE (cannot_scalarize_away_bitmap);
+ BITMAP_FREE (disqualified_constants);
access_pool.release ();
assign_link_pool.release ();
obstack_free (&name_obstack, NULL);
@@ -690,6 +697,8 @@ disqualify_candidate (tree decl, const char *reason)
{
if (bitmap_clear_bit (candidate_bitmap, DECL_UID (decl)))
candidates->remove_elt_with_hash (decl, DECL_UID (decl));
+ if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))
+ bitmap_set_bit (disqualified_constants, DECL_UID (decl));
if (dump_file && (dump_flags & TDF_DETAILS))
{
@@ -841,8 +850,76 @@ create_access_1 (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size)
return access;
}
+/* Given EXPR a chain of COMPONENT_REFs and/or ARRAY_REFs built around a
+ constant-pool declaration VAR, builds a tree like EXPR but with VAR replaced
+ by its DECL_INITIAL, and tries to fold it to a constant. If folding succeeds
+ then return the folded tree, else NULL_TREE. */
+
+static tree
+subst_constant_pool_initial (tree expr, tree var)
+{
+ if (TREE_CODE (expr) == VAR_DECL)
+ {
+ gcc_assert (DECL_IN_CONSTANT_POOL (expr));
+ gcc_assert (expr == var);
+ return DECL_INITIAL (expr);
+ }
+ if (TREE_CODE (expr) == COMPONENT_REF)
+ {
+ tree field = TREE_OPERAND (expr, 1);
+ gcc_assert (TREE_CODE (field) == FIELD_DECL);
+ gcc_assert (TREE_OPERAND (expr, 2) == NULL_TREE);
+ tree record = subst_constant_pool_initial (TREE_OPERAND (expr, 0), var);
+ if (!record)
+ return NULL_TREE;
+ tree ref = build3 (COMPONENT_REF, TREE_TYPE (expr),
+ record, field, NULL_TREE);
+ tree res = fold (ref);
+ if (res != ref)
+ return res;
+ }
+ else if (TREE_CODE (expr) == ARRAY_REF)
+ {
+ tree array = subst_constant_pool_initial (TREE_OPERAND (expr, 0), var);
+ if (!array)
+ return NULL_TREE;
+ tree ref = build4 (ARRAY_REF, TREE_TYPE (expr),
+ array,
+ TREE_OPERAND (expr, 1),
+ TREE_OPERAND (expr, 2),
+ TREE_OPERAND (expr, 3));
+ /* Ada (at least) builds ARRAY_REFs around strings. */
+ if (tree folded = fold_read_from_constant_string (ref))
+ return folded;
+ tree res = fold (ref);
+ if (res != ref)
+ return res;
+ }
+ else if (TREE_CODE (expr) == MEM_REF)
+ {
+ gcc_assert (TREE_CODE (TREE_OPERAND (expr, 0)) == ADDR_EXPR);
+ tree type = TREE_TYPE (expr);
+ tree sz = TYPE_SIZE (TREE_TYPE (expr));
+ tree val = TREE_OPERAND (TREE_OPERAND (expr, 0), 0);
+ val = subst_constant_pool_initial (val, var);
+ if (TREE_CODE (val) != CONSTRUCTOR
+ || !tree_fits_uhwi_p (TREE_OPERAND (expr, 1))
+ || !tree_fits_uhwi_p (sz))
+ return NULL_TREE;
+ unsigned HOST_WIDE_INT offset =
+ tree_to_uhwi (TREE_OPERAND (expr, 1)) * BITS_PER_UNIT;
+ return fold_ctor_reference (type, val, offset, tree_to_uhwi (sz), var);
+ }
+
+ /* Unknown expression, or could not constant-fold. */
+ return NULL_TREE;
+}
+
+static bool maybe_add_sra_candidate (tree var);
+
/* Create and insert access for EXPR. Return created access, or NULL if it is
- not possible. */
+ not possible. Also scan for uses of constant pool as we go along and add
+ to candidates. */
static struct access *
create_access (tree expr, gimple *stmt, bool write)
@@ -865,6 +942,32 @@ create_access (tree expr, gimple *stmt, bool write)
else
ptr = false;
+ /* For constant-pool entries, check we can substitute the constant value. */
+ if (TREE_CODE (base) == VAR_DECL && DECL_IN_CONSTANT_POOL (base)
+ && (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA))
+ {
+ gcc_assert (!write);
+ if (bitmap_bit_p (disqualified_constants, DECL_UID (base)))
+ return NULL;
+ tree repl = subst_constant_pool_initial (expr, base);
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, "Trying to scalarize constant-pool ");
+ print_generic_expr (dump_file, base, 0);
+ fprintf (dump_file, "\n In expression: ");
+ print_generic_expr (dump_file, expr, 0);
+ fprintf (dump_file, "\n Constant replacement: ");
+ print_generic_expr (dump_file, repl, 0);
+ fprintf (dump_file, "\n");
+ }
+ /* If we can't resolve the expression to a constant, we risk creating
+ a malformed expression (e.g. Ada testsuite, ACATS chapter C3). */
+ if (!repl || !TREE_CONSTANT (repl))
+ disqualify_candidate (base, "Cannot constant-propagate out of pool.");
+ else
+ maybe_add_sra_candidate (base);
+ }
+
if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
return NULL;
@@ -923,6 +1026,8 @@ static bool
scalarizable_type_p (tree type)
{
gcc_assert (!is_gimple_reg_type (type));
+ if (type_contains_placeholder_p (type))
+ return false;
switch (TREE_CODE (type))
{
@@ -970,15 +1075,19 @@ scalarizable_type_p (tree type)
}
}
-static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree);
+static bool scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree);
/* Create total_scalarization accesses for all scalar fields of a member
of type DECL_TYPE conforming to scalarizable_type_p. BASE
must be the top-most VAR_DECL representing the variable; within that,
OFFSET locates the member and REF must be the memory reference expression for
- the member. */
+ the member.
-static void
+ Return TRUE if successful; may fail in the case of constant-pool entry BASE
+ whose DECL_INITIAL contains values to which we cannot fold ARRAY_REF or
+ COMPONENT_REF expressions. */
+
+static bool
completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
{
switch (TREE_CODE (decl_type))
@@ -991,10 +1100,11 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
tree ft = TREE_TYPE (fld);
tree nref = build3 (COMPONENT_REF, ft, ref, fld, NULL_TREE);
- scalarize_elem (base, pos, tree_to_uhwi (DECL_SIZE (fld)), nref,
- ft);
+ if (!scalarize_elem (base, pos, tree_to_uhwi (DECL_SIZE (fld)),
+ nref, ft))
+ return false;
}
- break;
+ return true;
case ARRAY_TYPE:
{
tree elemtype = TREE_TYPE (decl_type);
@@ -1026,12 +1136,13 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
ref, size_int (idx + min),
NULL_TREE, NULL_TREE);
int el_off = offset + idx * el_size;
- scalarize_elem (base, el_off, el_size, nref, elemtype);
+ if (!scalarize_elem (base, el_off, el_size, nref, elemtype))
+ return false;
}
while (++idx <= lenlessone);
}
}
- break;
+ return true;
default:
gcc_unreachable ();
}
@@ -1040,22 +1151,32 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
/* Create total_scalarization accesses for a member of type TYPE, which must
satisfy either is_gimple_reg_type or scalarizable_type_p. BASE must be the
top-most VAR_DECL representing the variable; within that, POS and SIZE locate
- the member and REF must be the reference expression for it. */
+ the member and REF must be the reference expression for it.
-static void
+ Return TRUE if successful; may fail in the case of constant-pool entry BASE
+ whose DECL_INITIAL contains values to which we cannot fold ARRAY_REF or
+ COMPONENT_REF expressions. */
+
+static bool
scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size,
tree ref, tree type)
{
if (is_gimple_reg_type (type))
{
+ if (TREE_CODE (base) == VAR_DECL
+ && DECL_IN_CONSTANT_POOL (base)
+ && subst_constant_pool_initial (ref, base) == NULL_TREE)
+ return false;
struct access *access = create_access_1 (base, pos, size);
access->expr = ref;
access->type = type;
access->grp_total_scalarization = 1;
/* Accesses for intraprocedural SRA can have their stmt NULL. */
+
+ return true;
}
- else
- completely_scalarize (base, type, pos, ref);
+
+ return completely_scalarize (base, type, pos, ref);
}
/* Create a total_scalarization access for VAR as a whole. VAR must be of a
@@ -1843,7 +1964,12 @@ maybe_add_sra_candidate (tree var)
reject (var, "not aggregate");
return false;
}
- if (needs_to_live_in_memory (var))
+ /* Allow constant-pool entries (that "need to live in memory")
+ unless we are doing IPA SRA. */
+ if (needs_to_live_in_memory (var)
+ && (sra_mode == SRA_MODE_EARLY_IPA
+ || TREE_CODE (var) != VAR_DECL
+ || !DECL_IN_CONSTANT_POOL (var)))
{
reject (var, "needs to live in memory");
return false;
@@ -2343,6 +2469,10 @@ analyze_access_subtree (struct access *root, struct access *parent,
(unsigned) root->offset, (unsigned) root->size);
fprintf (dump_file, " to an integer.\n");
}
+ gcc_checking_assert (TREE_CODE (root->base) != VAR_DECL
+ || !DECL_IN_CONSTANT_POOL (root->base)
+ || subst_constant_pool_initial (root->expr,
+ root->base));
}
root->grp_to_be_replaced = 1;
@@ -2604,8 +2734,17 @@ analyze_all_variable_accesses (void)
if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
<= max_scalarization_size)
{
+ vec<access_p> &accesses = base_access_vec->get_or_insert (var);
+ unsigned orig_count = accesses.length ();
+ if (!completely_scalarize (var, TREE_TYPE (var), 0, var))
+ {
+ /* Failed. Discard any accesses created during attempt. */
+ for (unsigned i = orig_count; i < accesses.length (); i++)
+ access_pool.remove (accesses[i]);
+ accesses.truncate (orig_count);
+ continue;
+ }
create_total_scalarization_access (var);
- completely_scalarize (var, TREE_TYPE (var), 0, var);
if (dump_file && (dump_flags & TDF_DETAILS))
{
fprintf (dump_file, "Will attempt to totally scalarize ");
@@ -3480,6 +3619,56 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
}
}
+static void
+initialize_constant_pool_replacements (void)
+{
+ gimple_seq seq = NULL;
+ gimple_stmt_iterator gsi = gsi_start (seq);
+ bitmap_iterator bi;
+ unsigned i;
+
+ EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi)
+ if (bitmap_bit_p (should_scalarize_away_bitmap, i)
+ && !bitmap_bit_p (cannot_scalarize_away_bitmap, i))
+ {
+ tree var = candidate (i);
+ if (TREE_CODE (var) != VAR_DECL || !DECL_IN_CONSTANT_POOL (var))
+ continue;
+ vec<access_p> *access_vec = get_base_access_vector (var);
+ if (!access_vec)
+ continue;
+ for (unsigned i = 0; i < access_vec->length (); i++)
+ {
+ struct access *access = (*access_vec)[i];
+ tree val = subst_constant_pool_initial (access->expr, access->base);
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, "Expression ");
+ print_generic_expr (dump_file, access->expr, 0);
+ fprintf (dump_file, " replaced with ");
+ print_generic_expr (dump_file, access->replacement_decl, 0);
+ fprintf (dump_file, " initialized to ");
+ print_generic_expr (dump_file, val, 0);
+ fprintf (dump_file, "\n");
+ if (!access->replacement_decl)
+ dump_access (dump_file, access, true);
+ }
+ if (!access->replacement_decl)
+ continue;
+ gcc_assert (val);
+ gassign *stmt = gimple_build_assign (
+ get_access_replacement (access), val);
+ gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+ update_stmt (stmt);
+ }
+ }
+
+ seq = gsi_seq (gsi);
+ if (seq)
+ gsi_insert_seq_on_edge_immediate (
+ single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)), seq);
+}
+
/* Traverse the function body and all modifications as decided in
analyze_all_variable_accesses. Return true iff the CFG has been
changed. */
@@ -3490,6 +3679,8 @@ sra_modify_function_body (void)
bool cfg_changed = false;
basic_block bb;
+ initialize_constant_pool_replacements ();
+
FOR_EACH_BB_FN (bb, cfun)
{
gimple_stmt_iterator gsi = gsi_start_bb (bb);
--
1.9.1