This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion


> 2013-09-16  Wei Mi  <wmi@google.com>
> 
>         * gcc/config/i386/i386-c.c (ix86_target_macros_internal): Separate
>         PROCESSOR_COREI7_AVX out from PROCESSOR_COREI7.
>         * gcc/config/i386/i386.c (ix86_option_override_internal): Ditto.
>         (ix86_issue_rate): Ditto.
>         (ia32_multipass_dfa_lookahead): Ditto.
>         (ix86_sched_init_global): Ditto.
>         (get_builtin_code_for_version): Ditto.
>         * gcc/config/i386/i386.h (enum target_cpu_default): Ditto.
>         (enum processor_type): Ditto.
>         * gcc/config/i386/x86-tune.def (DEF_TUNE): Ditto.

This patch is OK.
> 
> 2013-09-16  Wei Mi  <wmi@google.com>
> 
>         * gcc/config/i386/i386.c (ix86_macro_fusion_p): New Function.
>         (ix86_macro_fusion_pair_p): Ditto.
>         * gcc/config/i386/i386.h: Add new tune features about macro-fusion.
>         * gcc/config/i386/x86-tune.def (DEF_TUNE): Ditto.
>         * gcc/doc/tm.texi: Generated.
>         * gcc/doc/tm.texi.in: Ditto.
>         * gcc/haifa-sched.c (try_group_insn): New function.
>         (group_insns_for_macro_fusion): Ditto.
>         (sched_init): Call group_insns_for_macro_fusion.
>         * gcc/sched-rgn.c (add_branch_dependences): Keep insns in
>         a SCHED_GROUP at the end of BB to remain their location.
>         * gcc/target.def: Add two hooks: macro_fusion_p and
>         macro_fusion_pair_p.

I think original plan for fusing was to turn the fused instructions into one within
combiner.  I guess this would lead to quite a large explossion of insns patterns,
so the scheduler approach may be better, if scheduler maintainer agree.
> 
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 1fd3f60..85b7aa0 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -24856,6 +24856,90 @@ ia32_multipass_dfa_lookahead (void)
>      }
>  }
> 
> +/* Return true if target platform supports macro-fusion.  */
> +
> +static bool
> +ix86_macro_fusion_p ()
> +{
> +  if (TARGET_FUSE_CMP_AND_BRANCH
> +      && (!TARGET_64BIT || TARGET_FUSE_CMP_AND_BRANCH_64))

You disable fusion for Budozer here sinze you did not add it into
TARGET_FUSE_CMP_AND_BRANCH_64.
> @@ -364,6 +364,12 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
>         ix86_tune_features[X86_TUNE_USE_VECTOR_CONVERTS]
>  #define TARGET_FUSE_CMP_AND_BRANCH \
>         ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH]
> +#define TARGET_FUSE_CMP_AND_BRANCH_64 \
> +       ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_64]

Perhaps we can have TARGET_FUSE_CMP_AND_BRANCH_64 and TARGET_FUSE_CMP_AND_BRANCH_32
plus an macro TARGET_FUSE_CMP_AND_BRANCH that chose corresponding variant based
on TARGET_64BIT rather than having to wind down the test in every use.
> +#define TARGET_FUSE_CMP_AND_BRANCH_SOFLAGS \
> +       ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS]
> +#define TARGET_FUSE_ALU_AND_BRANCH \
> +       ix86_tune_features[X86_TUNE_FUSE_ALU_AND_BRANCH]
>  #define TARGET_OPT_AGU ix86_tune_features[X86_TUNE_OPT_AGU]
>  #define TARGET_VECTORIZE_DOUBLE \
>         ix86_tune_features[X86_TUNE_VECTORIZE_DOUBLE]
> diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
> index 4ae5f70..a60d0f4 100644
> --- a/gcc/config/i386/x86-tune.def
> +++ b/gcc/config/i386/x86-tune.def
> @@ -196,7 +196,22 @@ DEF_TUNE (X86_TUNE_USE_VECTOR_CONVERTS,
> "use_vector_converts", m_AMDFAM10)
>  /* X86_TUNE_FUSE_CMP_AND_BRANCH: Fuse a compare or test instruction
>     with a subsequent conditional jump instruction into a single
>     compare-and-branch uop.  */
> -DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH, "fuse_cmp_and_branch", m_BDVER)
> +DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH, "fuse_cmp_and_branch",
> +          m_CORE_ALL | m_BDVER)
> +/* X86_TUNE_FUSE_CMP_AND_BRANCH_64: Fuse compare with a subsequent
> +   conditional jump instruction for TARGET_64BIT.  */
> +DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH_64, "fuse_cmp_and_branch_64",
> +          m_COREI7 | m_COREI7_AVX | m_HASWELL)
Add m_BDVER
> +/* X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS: Fuse compare with a
> +   subsequent conditional jump instruction when the condition jump
> +   check sign flag (SF) or overflow flag (OF).  */
> +DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS, "fuse_cmp_and_branch_soflags",
> +          m_COREI7 | m_COREI7_AVX | m_HASWELL)

This flag is affecting only fuding of ALU and BRANCh or should it also affect
X86_TUNE_FUSE_CMP_AND_BRANCH?  In current implementation it seems to be the first
and in that case it ought to be documented that way and probably
called ALT_AND_BRANCH_SOFLAGS to avoid confussion.

I am not sure if AMD hardware has any limitations here.  It fuses only cmp/test
as far as I know, but I do not think it matters what flags you use.

The i386 specific part of the change seems resonable to me.

Honza


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]