This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: Wei Mi <wmi at google dot com>
- Cc: "H.J. Lu" <hjl dot tools at gmail dot com>, Alexander Monakov <amonakov at ispras dot ru>, Steven Bosscher <stevenb dot gcc at gmail dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, David Li <davidxl at google dot com>, Kirill Yukhin <kirill dot yukhin at gmail dot com>
- Date: Sun, 22 Sep 2013 11:57:26 +0200
- Subject: Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
- Authentication-results: sourceware.org; auth=none
- References: <CAMe9rOo-dc7=ax8_pA21wuxnqphLBvf_Voi2n1OHJX7ZEab=ew at mail dot gmail dot com> <CA+4CFy4fqCRvM2Luw2_p6AEZOmucSV1KemntEO3_XU5TfzA-7A at mail dot gmail dot com> <CA+4CFy6gdxREYiJa2B70RBe2aUtLY3zQ9ShK9jGEy26Hdn9QOg at mail dot gmail dot com> <CAMe9rOp1R8XACsL=v-JZkvpPzTOFiZhZPMqQXWkmPgHW5cjC6w at mail dot gmail dot com> <CA+4CFy5nM2Dw7kv0G61N5PKHoAanmAaKm+45oS4pN22TKgSAFg at mail dot gmail dot com>
> 2013-09-16 Wei Mi <wmi@google.com>
>
> * gcc/config/i386/i386-c.c (ix86_target_macros_internal): Separate
> PROCESSOR_COREI7_AVX out from PROCESSOR_COREI7.
> * gcc/config/i386/i386.c (ix86_option_override_internal): Ditto.
> (ix86_issue_rate): Ditto.
> (ia32_multipass_dfa_lookahead): Ditto.
> (ix86_sched_init_global): Ditto.
> (get_builtin_code_for_version): Ditto.
> * gcc/config/i386/i386.h (enum target_cpu_default): Ditto.
> (enum processor_type): Ditto.
> * gcc/config/i386/x86-tune.def (DEF_TUNE): Ditto.
This patch is OK.
>
> 2013-09-16 Wei Mi <wmi@google.com>
>
> * gcc/config/i386/i386.c (ix86_macro_fusion_p): New Function.
> (ix86_macro_fusion_pair_p): Ditto.
> * gcc/config/i386/i386.h: Add new tune features about macro-fusion.
> * gcc/config/i386/x86-tune.def (DEF_TUNE): Ditto.
> * gcc/doc/tm.texi: Generated.
> * gcc/doc/tm.texi.in: Ditto.
> * gcc/haifa-sched.c (try_group_insn): New function.
> (group_insns_for_macro_fusion): Ditto.
> (sched_init): Call group_insns_for_macro_fusion.
> * gcc/sched-rgn.c (add_branch_dependences): Keep insns in
> a SCHED_GROUP at the end of BB to remain their location.
> * gcc/target.def: Add two hooks: macro_fusion_p and
> macro_fusion_pair_p.
I think original plan for fusing was to turn the fused instructions into one within
combiner. I guess this would lead to quite a large explossion of insns patterns,
so the scheduler approach may be better, if scheduler maintainer agree.
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 1fd3f60..85b7aa0 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -24856,6 +24856,90 @@ ia32_multipass_dfa_lookahead (void)
> }
> }
>
> +/* Return true if target platform supports macro-fusion. */
> +
> +static bool
> +ix86_macro_fusion_p ()
> +{
> + if (TARGET_FUSE_CMP_AND_BRANCH
> + && (!TARGET_64BIT || TARGET_FUSE_CMP_AND_BRANCH_64))
You disable fusion for Budozer here sinze you did not add it into
TARGET_FUSE_CMP_AND_BRANCH_64.
> @@ -364,6 +364,12 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
> ix86_tune_features[X86_TUNE_USE_VECTOR_CONVERTS]
> #define TARGET_FUSE_CMP_AND_BRANCH \
> ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH]
> +#define TARGET_FUSE_CMP_AND_BRANCH_64 \
> + ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_64]
Perhaps we can have TARGET_FUSE_CMP_AND_BRANCH_64 and TARGET_FUSE_CMP_AND_BRANCH_32
plus an macro TARGET_FUSE_CMP_AND_BRANCH that chose corresponding variant based
on TARGET_64BIT rather than having to wind down the test in every use.
> +#define TARGET_FUSE_CMP_AND_BRANCH_SOFLAGS \
> + ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS]
> +#define TARGET_FUSE_ALU_AND_BRANCH \
> + ix86_tune_features[X86_TUNE_FUSE_ALU_AND_BRANCH]
> #define TARGET_OPT_AGU ix86_tune_features[X86_TUNE_OPT_AGU]
> #define TARGET_VECTORIZE_DOUBLE \
> ix86_tune_features[X86_TUNE_VECTORIZE_DOUBLE]
> diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
> index 4ae5f70..a60d0f4 100644
> --- a/gcc/config/i386/x86-tune.def
> +++ b/gcc/config/i386/x86-tune.def
> @@ -196,7 +196,22 @@ DEF_TUNE (X86_TUNE_USE_VECTOR_CONVERTS,
> "use_vector_converts", m_AMDFAM10)
> /* X86_TUNE_FUSE_CMP_AND_BRANCH: Fuse a compare or test instruction
> with a subsequent conditional jump instruction into a single
> compare-and-branch uop. */
> -DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH, "fuse_cmp_and_branch", m_BDVER)
> +DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH, "fuse_cmp_and_branch",
> + m_CORE_ALL | m_BDVER)
> +/* X86_TUNE_FUSE_CMP_AND_BRANCH_64: Fuse compare with a subsequent
> + conditional jump instruction for TARGET_64BIT. */
> +DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH_64, "fuse_cmp_and_branch_64",
> + m_COREI7 | m_COREI7_AVX | m_HASWELL)
Add m_BDVER
> +/* X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS: Fuse compare with a
> + subsequent conditional jump instruction when the condition jump
> + check sign flag (SF) or overflow flag (OF). */
> +DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS, "fuse_cmp_and_branch_soflags",
> + m_COREI7 | m_COREI7_AVX | m_HASWELL)
This flag is affecting only fuding of ALU and BRANCh or should it also affect
X86_TUNE_FUSE_CMP_AND_BRANCH? In current implementation it seems to be the first
and in that case it ought to be documented that way and probably
called ALT_AND_BRANCH_SOFLAGS to avoid confussion.
I am not sure if AMD hardware has any limitations here. It fuses only cmp/test
as far as I know, but I do not think it matters what flags you use.
The i386 specific part of the change seems resonable to me.
Honza