[PATCH] [8/9 Regression] i386: Add pass_remove_partial_avx_dependency

H.J. Lu hjl.tools@gmail.com
Thu Feb 21 17:43:00 GMT 2019


On Thu, Feb 21, 2019 at 5:58 AM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> Hello,
>
> 2019-02-01  H.J. Lu  <hongjiu.lu@intel.com>
>             Hongtao Liu  <hongtao.liu@intel.com>
>             Sunil K Pandey  <sunil.k.pandey@intel.com>
>
>         PR target/87007
>         * config/i386/i386-passes.def: Add
>         pass_remove_partial_avx_dependency.
>         * config/i386/i386-protos.h
>         (make_pass_remove_partial_avx_dependency): New.
>         * config/i386/i386.c (make_pass_remove_partial_avx_dependency):
>         New function.
>         (pass_data_remove_partial_avx_dependency): New.
>         (pass_remove_partial_avx_dependency): Likewise.
>         (make_pass_remove_partial_avx_dependency): Likewise.
>         * config/i386/i386.md (partial_xmm_update): New attribute.
>         (*extendsfdf2): Add partial_xmm_update.
>         (truncdfsf2): Likewise.
>         (*float<SWI48:mode><MODEF:mode>2): Likewise.
>         (SF/DF conversion splitters): Disabled for TARGET_AVX.
>
> gcc/testsuite/
>
> 2019-02-01  H.J. Lu  <hongjiu.lu@intel.com>
>             Hongtao Liu  <hongtao.liu@intel.com>
>             Sunil K Pandey  <sunil.k.pandey@intel.com>
>
>         PR target/87007
>         * gcc.target/i386/pr87007-1.c: New test.
>         * gcc.target/i386/pr87007-2.c: Likewise.
>
>
> It seems to me that a more systematic way would be to use the mode switching
> pass that uses the LCM framework and possibly tweak LCM to do the right
> thing with respect to loops (an easy solution would be to lift insertion
> points to the dominators with smaller frequency even if there may be a path
> that does not execute the instruction needing the pxor).
>
> Teaching the LCM framework is, however, more intrusive than a self-contained
> minipass, and since the patch solves a regression and is self-contained, I
> guess we should go ahead with it for this release and look for more
> systematic solutions later.
>
> Patch is OK with the following change.
>
> +static unsigned int
> +remove_partial_avx_dependency (void)
> +{
> +  timevar_push (TV_MACH_DEP);
> +
> +  calculate_dominance_info (CDI_DOMINATORS);
> +  df_set_flags (DF_DEFER_INSN_RESCAN);
> +  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
> +  df_md_add_problem ();
> +  df_analyze ();
>
> Please delay the initialization until you hit the first instruction that

I changed it to:

  if (v4sf_const0)
    {
      calculate_dominance_info (CDI_DOMINATORS);
      df_set_flags (DF_DEFER_INSN_RESCAN);
      df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
      df_md_add_problem ();
      df_analyze ();

      /* (Re-)discover loops so that bb->loop_father can be used in the
         analysis below.  */
      loop_optimizer_init (AVOID_CFG_MODIFICATIONS);

      /* Generate a vxorps at entry of the nearest dominator for basic
         blocks with conversions, which is in the fake loop that
         contains the whole function, so that there is only a single
         vxorps in the whole function.  */
      bb = nearest_common_dominator_for_set (CDI_DOMINATORS,
                                             convert_bbs);
      while (bb->loop_father->latch
             != EXIT_BLOCK_PTR_FOR_FN (cfun))
        bb = get_immediate_dominator (CDI_DOMINATORS,
                                      bb->loop_father->header);

      insn = BB_HEAD (bb);
      if (!NONDEBUG_INSN_P (insn))
        insn = next_nonnote_nondebug_insn (insn);
      set = gen_rtx_SET (v4sf_const0, CONST0_RTX (V4SFmode));
      set_insn = emit_insn_before (set, insn);
      df_insn_rescan (set_insn);
      df_process_deferred_rescans ();
      loop_optimizer_finalize ();
    }
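
To make the block selection above easier to follow outside of GCC, here is
a minimal sketch of the same walk on a toy CFG.  It is not code from the
patch: struct toy_cfg, dom_depth, nearest_common_dominator and
toy_insertion_block are hypothetical names, the dominator tree and loop
nesting are given as plain arrays, and loop 0 stands in for the fake loop
that contains the whole function.

```c
#include <assert.h>
#include <stddef.h>

/* Toy CFG description.  All names here are hypothetical, not GCC's.  */
struct toy_cfg
{
  const int *idom;        /* idom[b]: immediate dominator of block b;
                             the entry block is 0 and idom[0] == 0.  */
  const int *loop_of;     /* loop_of[b]: innermost loop containing b;
                             loop 0 is the whole function.  */
  const int *loop_header; /* loop_header[l]: header block of loop l.  */
};

/* Depth of block B in the dominator tree (entry has depth 0).  */
static int
dom_depth (const struct toy_cfg *g, int b)
{
  int d = 0;
  while (b != 0)
    {
      b = g->idom[b];
      d++;
    }
  return d;
}

/* Nearest common dominator of A and B: bring both to the same depth,
   then climb the dominator tree in lock step until they meet.  */
static int
nearest_common_dominator (const struct toy_cfg *g, int a, int b)
{
  int da = dom_depth (g, a);
  int db = dom_depth (g, b);
  for (; da > db; da--)
    a = g->idom[a];
  for (; db > da; db--)
    b = g->idom[b];
  while (a != b)
    {
      a = g->idom[a];
      b = g->idom[b];
    }
  return a;
}

/* Pick one block for the zeroing instruction: start at the nearest
   common dominator of all blocks with conversions, then, while that
   block is still inside a real loop, move to the immediate dominator
   of the loop header, which lies outside that loop.  */
int
toy_insertion_block (const struct toy_cfg *g, const int *blocks, size_t n)
{
  assert (n > 0);
  int bb = blocks[0];
  for (size_t i = 1; i < n; i++)
    bb = nearest_common_dominator (g, bb, blocks[i]);
  while (g->loop_of[bb] != 0)
    bb = g->idom[g->loop_header[g->loop_of[bb]]];
  return bb;
}
```

For a CFG 0 -> 1 -> 2 -> 3 -> 2 and 2 -> 4, where blocks 2 and 3 form
loop 1 with header 2, conversions in blocks 3 and 4 have block 2 as their
nearest common dominator.  Block 2 is inside the loop, so the walk moves
to block 1, the immediate dominator of the header.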

> needs processing.  The pass is run unconditionally and in many functions
> it will do nothing. Can you also gate the pass to run only if AVX is
> enabled?

There is already a gate:

  virtual bool gate (function *)
    {
      return (TARGET_AVX
              && TARGET_SSE_PARTIAL_REG_DEPENDENCY
              && TARGET_SSE_MATH
              && optimize
              && optimize_function_for_speed_p (cfun));
    }
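
For illustration only, a function of the following shape is the kind of
code I would expect to pass this gate.  It is a hypothetical example, not
one of the pr87007 tests, and sum_as_double is a made-up name.  The
assumption is a build along the lines of -O2 -mavx -mfpmath=sse with a
-mtune setting that enables TARGET_SSE_PARTIAL_REG_DEPENDENCY; whether the
conversion stays scalar, and what the final assembly looks like, depends
on the compiler version and options.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: every iteration performs a scalar int -> double
   conversion.  Under AVX such a conversion writes only part of the
   destination XMM register, which is the partial dependency that the
   pass is meant to break with a single zeroing instruction.  */
double
sum_as_double (const int *v, size_t n)
{
  assert (v != NULL || n == 0);
  double s = 0.0;
  for (size_t i = 0; i < n; i++)
    s += (double) v[i];   /* scalar int -> double conversion */
  return s;
}
```

With -Os, or without optimization, the gate returns false and the pass
does not run on such a function.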

> Patch is OK with this change. Please wait a day for possible Uros' or RM
> reactions.  Sorry for the delayed reaction.
> Honza

This is the updated patch I am going to check in tomorrow.

Thanks.

-- 
H.J.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-8-9-Regression-i386-Add-pass_remove_partial_avx_depe.patch
Type: text/x-patch
Size: 14989 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20190221/9e9286d5/attachment.bin>

