[PATCH, i386]: Implement post-reload vzeroupper insertion pass

Uros Bizjak ubizjak@gmail.com
Sun Nov 11 20:47:00 GMT 2012


On Sun, Nov 11, 2012 at 7:36 PM, Uros Bizjak <ubizjak@gmail.com> wrote:

> Regarding vzeroupper insertion pass - we will use the gcc pass manager to
> insert a target-dependent pass directly after reload ...

... like the attached patch. The patch inserts the vzeroupper pass directly
after reload, so spills from 256bit registers are taken into account when
processing the AVX_U128 entity. The patched gcc re-runs the mode-switching
pass, so its entry function has to be exported from mode-switching.c.

2012-11-10  Uros Bizjak  <ubizjak@gmail.com>
            Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>

        PR target/47440
        * config/i386/i386.c (struct rtl_opt_pass pass_insert_vzeroupper): New.
        (gate_insert_vzeroupper): New function.
        (rest_of_handle_insert_vzeroupper): Ditto.
        (ix86_option_override): Register vzeroupper insertion pass here.
        (ix86_init_machine_status): Remove optimize_mode_switching[AVX_U128]
        initialization.
        * mode-switching.c (optimize_mode_switching): Export.
        * rtl.h (optimize_mode_switching): Declare prototype.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} AVX
target. A functionally equivalent patch was tested on SPEC2000/2006 by
Vladimir.

I will wait a day or two for possible comments. I guess that a
non-algorithmic change to mode-switching doesn't need approval...

Uros.
-------------- next part --------------
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 193409)
+++ config/i386/i386.c	(working copy)
@@ -2301,6 +2301,51 @@ static const char *const cpu_names[TARGET_CPU_DEFA
   "btver2"
 };
 

+static bool
+gate_insert_vzeroupper (void)
+{
+  return TARGET_VZEROUPPER;
+}
+
+static unsigned int
+rest_of_handle_insert_vzeroupper (void)
+{
+  int i;
+
+  /* vzeroupper instructions are inserted immediately after reload to
+     account for possible spills from 256bit registers.  The pass
+     reuses mode switching infrastructure by re-running mode insertion
+     pass, so disable entities that have already been processed.  */
+  for (i = 0; i < MAX_386_ENTITIES; i++)
+    ix86_optimize_mode_switching[i] = 0;
+
+  ix86_optimize_mode_switching[AVX_U128] = 1;
+
+  optimize_mode_switching ();
+  return 0;
+}
+
+struct rtl_opt_pass pass_insert_vzeroupper =
+{
+ {
+  RTL_PASS,
+  "vzeroupper",				/* name */
+  OPTGROUP_NONE,			/* optinfo_flags */
+  gate_insert_vzeroupper,		/* gate */
+  rest_of_handle_insert_vzeroupper,	/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_NONE,				/* tv_id */
+  0,					/* properties_required */
+  0,					/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_df_finish | TODO_verify_rtl_sharing |
+  0,					/* todo_flags_finish */
+ }
+};
+
 /* Return true if a red-zone is in use.  */
 
 static inline bool
@@ -3705,7 +3750,16 @@ ix86_option_override_internal (bool main_args_p)
 static void
 ix86_option_override (void)
 {
+  static struct register_pass_info insert_vzeroupper_info
+    = { &pass_insert_vzeroupper.pass, "reload",
+	1, PASS_POS_INSERT_AFTER
+      };
+
   ix86_option_override_internal (true);
+
+
+  /* This needs to be done at start up.  It's convenient to do it here.  */
+  register_pass (&insert_vzeroupper_info);
 }
 
 /* Update register usage after having seen the compiler flags.  */
@@ -23406,7 +23460,6 @@ ix86_init_machine_status (void)
   f = ggc_alloc_cleared_machine_function ();
   f->use_fast_prologue_epilogue_nregs = -1;
   f->call_abi = ix86_abi;
-  f->optimize_mode_switching[AVX_U128] = TARGET_VZEROUPPER;
 
   return f;
 }
Index: mode-switching.c
===================================================================
--- mode-switching.c	(revision 193407)
+++ mode-switching.c	(working copy)
@@ -447,7 +447,7 @@ create_pre_exit (int n_entities, int *entity_map,
 /* Find all insns that need a particular mode setting, and insert the
    necessary mode switches.  Return true if we did work.  */
 
-static int
+int
 optimize_mode_switching (void)
 {
   rtx insn;
Index: rtl.h
===================================================================
--- rtl.h	(revision 193407)
+++ rtl.h	(working copy)
@@ -2719,6 +2719,9 @@ extern rtx get_reg_base_value (unsigned int);
 extern int stack_regs_mentioned (const_rtx insn);
 #endif
 
+/* In mode-switching.c */
+extern int optimize_mode_switching (void);
+
 /* In toplev.c */
 extern GTY(()) rtx stack_limit_rtx;
 

