[PATCH, i386]: Implement post-reload vzeroupper insertion pass
Uros Bizjak
ubizjak@gmail.com
Sun Nov 11 20:47:00 GMT 2012
On Sun, Nov 11, 2012 at 7:36 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> Regarding vzeroupper insertion pass - we will use gcc pass manager to
> insert a target-dependent pass directly after reload ...
... like the attached patch. The patch inserts the vzeroupper pass directly
after reload, so spills from 256-bit registers are considered when
processing the AVX_U128 entity. The patched gcc reruns the mode-switching
pass, so the entry function of mode-switching has to be exported.
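
As a rough illustration of the re-run idea only (a standalone toy model, not gcc code): the sketch below assumes a per-entity flag array and a generic pass that handles every enabled entity. All toy_* names are hypothetical stand-ins for ix86_optimize_mode_switching[], MAX_386_ENTITIES and optimize_mode_switching ().

```c
#include <assert.h>

/* Hypothetical entity list; only the idea of "one flag per entity"
   is taken from the patch.  */
enum toy_entity { TOY_OTHER_A, TOY_AVX_U128, TOY_OTHER_B, TOY_MAX_ENTITIES };

/* Assumed state after the first run: the other entities were enabled
   and have already been processed, AVX_U128 was not.  */
static int toy_flags[TOY_MAX_ENTITIES] = { 1, 0, 1 };

/* Stand-in for the generic pass: "processes" every enabled entity
   and returns how many entities it handled.  */
static int
toy_optimize_mode_switching (void)
{
  int i, handled = 0;

  for (i = 0; i < TOY_MAX_ENTITIES; i++)
    if (toy_flags[i])
      handled++;
  return handled;
}

/* Re-run the generic pass for a single entity: clear every flag so
   already-processed entities are skipped, then enable only ENTITY.  */
static int
toy_rerun_for (enum toy_entity entity)
{
  int i;

  assert ((int) entity >= 0 && (int) entity < TOY_MAX_ENTITIES);
  for (i = 0; i < TOY_MAX_ENTITIES; i++)
    toy_flags[i] = 0;
  toy_flags[entity] = 1;
  return toy_optimize_mode_switching ();
}
```

In this model, calling toy_rerun_for (TOY_AVX_U128) makes the second run touch only the AVX_U128 entity, whatever was enabled for the first run.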

2012-11-10  Uros Bizjak  <ubizjak@gmail.com>
	    Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>

	PR target/47440
	* config/i386/i386.c (struct rtl_opt_pass pass_insert_vzeroupper): New.
	(gate_insert_vzeroupper): New function.
	(rest_of_handle_insert_vzeroupper): Ditto.
	(ix86_option_override): Register vzeroupper insertion pass here.
	(ix86_init_machine_status): Remove optimize_mode_switching[AVX_U128]
	initialization.
	* mode-switching.c (optimize_mode_switching): Export.
	* rtl.h (optimize_mode_switching): Declare prototype.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} AVX
target. A functionally equivalent patch was tested on SPEC2000/2006 by
Vladimir.

I will wait a day or two for possible comments. I guess that a
non-algorithmic change to mode-switching doesn't need approval...

Uros.
-------------- next part --------------
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 193409)
+++ config/i386/i386.c (working copy)
@@ -2301,6 +2301,51 @@ static const char *const cpu_names[TARGET_CPU_DEFA
"btver2"
};
+static bool
+gate_insert_vzeroupper (void)
+{
+ return TARGET_VZEROUPPER;
+}
+
+static unsigned int
+rest_of_handle_insert_vzeroupper (void)
+{
+ int i;
+
+ /* vzeroupper instructions are inserted immediately after reload to
+ account for possible spills from 256bit registers. The pass
+ reuses mode switching infrastructure by re-running mode insertion
+ pass, so disable entities that have already been processed. */
+ for (i = 0; i < MAX_386_ENTITIES; i++)
+ ix86_optimize_mode_switching[i] = 0;
+
+ ix86_optimize_mode_switching[AVX_U128] = 1;
+
+ optimize_mode_switching ();
+ return 0;
+}
+
+struct rtl_opt_pass pass_insert_vzeroupper =
+{
+ {
+ RTL_PASS,
+ "vzeroupper", /* name */
+ OPTGROUP_NONE, /* optinfo_flags */
+ gate_insert_vzeroupper, /* gate */
+ rest_of_handle_insert_vzeroupper, /* execute */
+ NULL, /* sub */
+ NULL, /* next */
+ 0, /* static_pass_number */
+ TV_NONE, /* tv_id */
+ 0, /* properties_required */
+ 0, /* properties_provided */
+ 0, /* properties_destroyed */
+ 0, /* todo_flags_start */
+ TODO_df_finish | TODO_verify_rtl_sharing |
+ 0, /* todo_flags_finish */
+ }
+};
+
/* Return true if a red-zone is in use. */
static inline bool
@@ -3705,7 +3750,16 @@ ix86_option_override_internal (bool main_args_p)
static void
ix86_option_override (void)
{
+ static struct register_pass_info insert_vzeroupper_info
+ = { &pass_insert_vzeroupper.pass, "reload",
+ 1, PASS_POS_INSERT_AFTER
+ };
+
ix86_option_override_internal (true);
+
+
+ /* This needs to be done at start up. It's convenient to do it here. */
+ register_pass (&insert_vzeroupper_info);
}
/* Update register usage after having seen the compiler flags. */
@@ -23406,7 +23460,6 @@ ix86_init_machine_status (void)
f = ggc_alloc_cleared_machine_function ();
f->use_fast_prologue_epilogue_nregs = -1;
f->call_abi = ix86_abi;
- f->optimize_mode_switching[AVX_U128] = TARGET_VZEROUPPER;
return f;
}
Index: mode-switching.c
===================================================================
--- mode-switching.c (revision 193407)
+++ mode-switching.c (working copy)
@@ -447,7 +447,7 @@ create_pre_exit (int n_entities, int *entity_map,
/* Find all insns that need a particular mode setting, and insert the
necessary mode switches. Return true if we did work. */
-static int
+int
optimize_mode_switching (void)
{
rtx insn;
Index: rtl.h
===================================================================
--- rtl.h (revision 193407)
+++ rtl.h (working copy)
@@ -2719,6 +2719,9 @@ extern rtx get_reg_base_value (unsigned int);
extern int stack_regs_mentioned (const_rtx insn);
#endif
+/* In mode-switching.c */
+extern int optimize_mode_switching (void);
+
/* In toplev.c */
extern GTY(()) rtx stack_limit_rtx;