This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[Patchv2 3/4] Control SRA and IPA-SRA by a param rather than MOVE_RATIO
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: gcc-patches at gcc dot gnu dot org
- Cc: richard dot guenther at gmail dot com, richard dot earnshaw at arm dot com, marcus dot shawcroft at arm dot com, pinskia at gmail dot com
- Date: Thu, 25 Sep 2014 15:57:35 +0100
- Subject: [Patchv2 3/4] Control SRA and IPA-SRA by a param rather than MOVE_RATIO
- References: <CAFiYyc33WnFRGZiSLz+b8dFX=eE_pkoHPoTMFEN3zna-rRUKTQ at mail dot gmail dot com> <1411657056-24865-1-git-send-email-james dot greenhalgh at arm dot com>
Hi,
After hookizing MOVE_BY_PIECES_P and migrating tree-inline.c, we are
left with only one user of MOVE_RATIO - deciding the maximum size of
aggregate for SRA.
Past discussions have made it clear [1] that keeping this use of
MOVE_RATIO is undesirable. Clearly it is now also misnamed.
The previous iteration of this patch was rejected as too complicated. I
went off and tried simplifying it to use MOVE_RATIO, but if we do that we
end up breaking some interface boundaries between the driver and the
backend.
This patch partially hookizes MOVE_RATIO under the new name
TARGET_MAX_SCALARIZATION_SIZE and uses it to set default values for two
new parameters:
  sra-max-scalarization-size-Ospeed - The maximum size of aggregate
  to consider when compiling for speed.
  sra-max-scalarization-size-Osize - The maximum size of aggregate
  to consider when compiling for size.
We then modify SRA to use these parameters rather than MOVE_RATIO.
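To make the intended lookup order concrete, here is a minimal standalone sketch of it, not the code in the patch itself. The names prefixed with sketch_ and the two param_ variables are hypothetical stand-ins for the --param values and the target hook, and the piece size of 8 storage units is an assumption chosen only for illustration. A parameter left at zero defers to the target's default; a nonzero parameter overrides it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two --param values.  Zero means the
   user did not set the parameter.  */
static unsigned int param_max_size_speed = 0;
static unsigned int param_max_size_size = 0;

/* Hypothetical stand-in for the target hook default.  The move ratios
   (15 and 3) mirror the generic defaults.h values; the piece size of 8
   storage units is an assumption made for this sketch.  */
static unsigned int
sketch_target_max_scalarization_size (bool speed_p)
{
  const unsigned int move_ratio = speed_p ? 15 : 3;
  const unsigned int move_max_pieces = 8;
  return move_ratio * move_max_pieces;
}

/* A nonzero parameter wins; otherwise ask the target.  */
static unsigned int
sketch_get_max_scalarization_size (bool speed_p)
{
  unsigned int param = speed_p ? param_max_size_speed : param_max_size_size;
  return param ? param : sketch_target_max_scalarization_size (speed_p);
}

/* Small self-check that exercises both paths.  */
static void
sketch_self_check (void)
{
  param_max_size_speed = 0;
  param_max_size_size = 0;
  assert (sketch_get_max_scalarization_size (true) == 120);
  assert (sketch_get_max_scalarization_size (false) == 24);
  param_max_size_speed = 32;
  assert (sketch_get_max_scalarization_size (true) == 32);
  assert (sketch_get_max_scalarization_size (false) == 24);
  param_max_size_speed = 0;
}
```

Using zero as the "unset" marker keeps the parameter defaults target-independent, at the cost that a user cannot request a limit of zero through the parameter.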
Bootstrapped and regression tested for x86, arm and aarch64 with no
issues.
OK for trunk?
[1]: https://gcc.gnu.org/ml/gcc-patches/2014-08/msg01997.html
---
gcc/
2014-09-25 James Greenhalgh <james.greenhalgh@arm.com>
* doc/invoke.texi (sra-max-scalarization-size-Ospeed): Document.
(sra-max-scalarization-size-Osize): Likewise.
* doc/tm.texi.in
(MOVE_RATIO): Reduce documentation to a stub, deprecate.
(TARGET_MAX_SCALARIZATION_SIZE): Add hook.
* doc/tm.texi: Regenerate.
* defaults.h (MOVE_RATIO): Remove default implementation.
(SET_RATIO): Add a default implementation if MOVE_RATIO
is not defined.
* params.def (sra-max-scalarization-size-Ospeed): New.
(sra-max-scalarization-size-Osize): Likewise.
* target.def (max_scalarization_size): New.
* targhooks.c (default_max_scalarization_size): New.
* targhooks.h (default_max_scalarization_size): New.
* tree-sra.c (get_max_scalarization_size): New.
(analyze_all_variable_accesses): Use it.
diff --git a/gcc/defaults.h b/gcc/defaults.h
index c1776b0..f723e2c 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -1191,18 +1191,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
#define BRANCH_COST(speed_p, predictable_p) 1
#endif
-/* If a memory-to-memory move would take MOVE_RATIO or more simple
- move-instruction sequences, we will do a movmem or libcall instead. */
-
-#ifndef MOVE_RATIO
-#if defined (HAVE_movmemqi) || defined (HAVE_movmemhi) || defined (HAVE_movmemsi) || defined (HAVE_movmemdi) || defined (HAVE_movmemti)
-#define MOVE_RATIO(speed) 2
-#else
-/* If we are optimizing for space (-Os), cut down the default move ratio. */
-#define MOVE_RATIO(speed) ((speed) ? 15 : 3)
-#endif
-#endif
-
/* If a clear memory operation would take CLEAR_RATIO or more simple
move-instruction sequences, we will do a setmem or libcall instead. */
@@ -1219,7 +1207,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
SET_RATIO or more simple move-instruction sequences, we will do a movmem
or libcall instead. */
#ifndef SET_RATIO
+#ifdef MOVE_RATIO
#define SET_RATIO(speed) MOVE_RATIO (speed)
+#elif defined (HAVE_movmemqi) || defined (HAVE_movmemhi) || defined (HAVE_movmemsi) || defined (HAVE_movmemdi) || defined (HAVE_movmemti)
+#define SET_RATIO(speed) 2
+#else
+/* If we are optimizing for space (-Os), cut down the default move ratio. */
+#define SET_RATIO(speed) ((speed) ? 15 : 3)
+#endif
#endif
/* Supply a default definition for FUNCTION_ARG_PADDING:
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index eae4ab1..c3e6eaa 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10301,6 +10301,16 @@ parameters only when their cumulative size is less or equal to
@option{ipa-sra-ptr-growth-factor} times the size of the original
pointer parameter.
+@item sra-max-scalarization-size-Ospeed
+@item sra-max-scalarization-size-Osize
+The two Scalar Replacement of Aggregates passes (SRA and IPA-SRA) aim to
+replace scalar parts of aggregates with uses of independent scalar
+variables. These parameters control the maximum size, in storage units,
+of aggregate which will be considered for replacement when compiling for
+speed
+(@option{sra-max-scalarization-size-Ospeed}) or size
+(@option{sra-max-scalarization-size-Osize}) respectively.
+
@item tm-max-aggregate-size
When making copies of thread-local variables in a transaction, this
parameter specifies the size in bytes after which variables are
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f59641a..b4061eb 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6098,20 +6098,25 @@ this macro is defined, it should produce a nonzero value when
@end defmac
@defmac MOVE_RATIO (@var{speed})
-The threshold of number of scalar memory-to-memory move insns, @emph{below}
-which a sequence of insns should be generated instead of a
-string move insn or a library call. Increasing the value will always
-make code faster, but eventually incurs high cost in increased code size.
+This macro is deprecated and is only used to guide the default behaviours
+of @code{TARGET_MOVE_BY_PIECES_PROFITABLE_P} and
+@code{TARGET_MAX_SCALARIZATION_SIZE}. New ports should implement
+those hooks in preference to this macro.
+@end defmac
-Note that on machines where the corresponding move insn is a
-@code{define_expand} that emits a sequence of insns, this macro counts
-the number of such sequences.
+@deftypefn {Target Hook} {unsigned int} TARGET_MAX_SCALARIZATION_SIZE (bool @var{speed_p})
+This target hook is used by the Scalar Replacement of Aggregates passes
+(SRA and IPA-SRA). This hook gives the maximum size, in storage units,
+of aggregate to consider for replacement. @var{speed_p} is true if we are
+currently compiling for speed.
-The parameter @var{speed} is true if the code is currently being
-optimized for speed rather than size.
+By default, the maximum scalarization size is determined by MOVE_RATIO,
+if it is defined. Otherwise, a sensible default is chosen.
-If you don't define this, a reasonable default is used.
-@end defmac
+Note that a user may choose to override this target hook with the
+parameters @code{sra-max-scalarization-size-Ospeed} and
+@code{sra-max-scalarization-size-Osize}.
+@end deftypefn
@defmac MOVE_BY_PIECES_P (@var{size}, @var{alignment})
A C expression used to implement the default behaviour of
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index d2a4386..bdd1ec4 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4581,21 +4581,14 @@ this macro is defined, it should produce a nonzero value when
@end defmac
@defmac MOVE_RATIO (@var{speed})
-The threshold of number of scalar memory-to-memory move insns, @emph{below}
-which a sequence of insns should be generated instead of a
-string move insn or a library call. Increasing the value will always
-make code faster, but eventually incurs high cost in increased code size.
-
-Note that on machines where the corresponding move insn is a
-@code{define_expand} that emits a sequence of insns, this macro counts
-the number of such sequences.
-
-The parameter @var{speed} is true if the code is currently being
-optimized for speed rather than size.
-
-If you don't define this, a reasonable default is used.
+This macro is deprecated and is only used to guide the default behaviours
+of @code{TARGET_MOVE_BY_PIECES_PROFITABLE_P} and
+@code{TARGET_MAX_SCALARIZATION_SIZE}. New ports should implement
+those hooks in preference to this macro.
@end defmac
+@hook TARGET_MAX_SCALARIZATION_SIZE
+
@defmac MOVE_BY_PIECES_P (@var{size}, @var{alignment})
A C expression used to implement the default behaviour of
@code{TARGET_MOVE_BY_PIECES_PROFITABLE_P}. New ports should implement
diff --git a/gcc/params.def b/gcc/params.def
index aefdd07..7b6c7e2 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -942,6 +942,18 @@ DEFPARAM (PARAM_TM_MAX_AGGREGATE_SIZE,
"pairs",
9, 0, 0)
+DEFPARAM (PARAM_SRA_MAX_SCALARIZATION_SIZE_SPEED,
+ "sra-max-scalarization-size-Ospeed",
+ "Maximum size, in storage units, of an aggregate which should be "
+ "considered for scalarization when compiling for speed",
+ 0, 0, 0)
+
+DEFPARAM (PARAM_SRA_MAX_SCALARIZATION_SIZE_SIZE,
+ "sra-max-scalarization-size-Osize",
+ "Maximum size, in storage units, of an aggregate which should be "
+ "considered for scalarization when compiling for size",
+ 0, 0, 0)
+
DEFPARAM (PARAM_IPA_CP_VALUE_LIST_SIZE,
"ipa-cp-value-list-size",
"Maximum size of a list of values associated with each parameter for "
diff --git a/gcc/target.def b/gcc/target.def
index 10f3b2e..4e19845 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3049,6 +3049,24 @@ are the same as to this target hook.",
int, (enum machine_mode mode, reg_class_t rclass, bool in),
default_memory_move_cost)
+/* Return the maximum size, in storage units, of aggregate which will be
+ considered for replacement by SRA/IPA-SRA. */
+DEFHOOK
+(max_scalarization_size,
+ "This target hook is used by the Scalar Replacement of Aggregates passes\n\
+(SRA and IPA-SRA). This hook gives the maximum size, in storage units,\n\
+of aggregate to consider for replacement. @var{speed_p} is true if we are\n\
+currently compiling for speed.\n\
+\n\
+By default, the maximum scalarization size is determined by MOVE_RATIO,\n\
+if it is defined. Otherwise, a sensible default is chosen.\n\
+\n\
+Note that a user may choose to override this target hook with the\n\
+parameters @code{sra-max-scalarization-size-Ospeed} and\n\
+@code{sra-max-scalarization-size-Osize}.",
+ unsigned int, (bool speed_p),
+ default_max_scalarization_size)
+
DEFHOOK
(move_by_pieces_profitable_p,
"GCC will attempt several strategies when asked to copy between\n\
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index eb0a4cd..abc94ff 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1421,6 +1421,15 @@ get_move_ratio (bool speed_p ATTRIBUTE_UNUSED)
return move_ratio;
}
+/* Return the maximum size, in storage units, of aggregate
+ which will be considered for replacement by SRA/IPA-SRA. */
+
+unsigned int
+default_max_scalarization_size (bool speed_p ATTRIBUTE_UNUSED)
+{
+ return get_move_ratio (speed_p) * MOVE_MAX_PIECES;
+}
+
/* The threshold of move insns below which the movmem optab is expanded or a
call to memcpy is emitted. */
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index f76ad31..35467f8 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -181,6 +181,7 @@ extern int default_memory_move_cost (enum machine_mode, reg_class_t, bool);
extern int default_register_move_cost (enum machine_mode, reg_class_t,
reg_class_t);
+extern unsigned int default_max_scalarization_size (bool speed_p);
extern bool default_move_by_pieces_profitable_p (unsigned int,
unsigned int, bool);
extern unsigned int default_estimate_block_copy_ninsns (HOST_WIDE_INT, bool);
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 8259dba..c611d29 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -2482,6 +2482,25 @@ propagate_all_subaccesses (void)
}
}
+/* Return the appropriate parameter value giving the maximum size of
+ aggregate (in storage units) to be considered for scalarization.
+ SPEED_P is true if we are currently optimizing for speed
+ rather than size. */
+
+unsigned int
+get_max_scalarization_size (bool speed_p)
+{
+ unsigned param_max_scalarization_size
+ = speed_p
+ ? PARAM_VALUE (PARAM_SRA_MAX_SCALARIZATION_SIZE_SPEED)
+ : PARAM_VALUE (PARAM_SRA_MAX_SCALARIZATION_SIZE_SIZE);
+
+ if (!param_max_scalarization_size)
+ return targetm.max_scalarization_size (speed_p);
+
+ return param_max_scalarization_size;
+}
+
/* Go through all accesses collected throughout the (intraprocedural) analysis
stage, exclude overlapping ones, identify representatives and build trees
out of them, making decisions about scalarization on the way. Return true
@@ -2493,10 +2512,10 @@ analyze_all_variable_accesses (void)
int res = 0;
bitmap tmp = BITMAP_ALLOC (NULL);
bitmap_iterator bi;
- unsigned i, max_total_scalarization_size;
-
- max_total_scalarization_size = UNITS_PER_WORD * BITS_PER_UNIT
- * MOVE_RATIO (optimize_function_for_speed_p (cfun));
+ unsigned i;
+ unsigned int max_scalarization_size
+ = get_max_scalarization_size (optimize_function_for_speed_p (cfun))
+ * BITS_PER_UNIT;
EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi)
if (bitmap_bit_p (should_scalarize_away_bitmap, i)
@@ -2508,7 +2527,7 @@ analyze_all_variable_accesses (void)
&& type_consists_of_records_p (TREE_TYPE (var)))
{
if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
- <= max_total_scalarization_size)
+ <= max_scalarization_size)
{
completely_scalarize_var (var);
if (dump_file && (dump_flags & TDF_DETAILS))