[PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059
will schmidt
will_schmidt@vnet.ibm.com
Wed Apr 20 21:20:23 GMT 2022
On Tue, 2022-04-12 at 21:14 -0400, Michael Meissner wrote:
> Eliminate power8 fusion options, use power8 tuning, PR target/102059
>
> This is V4 of the patch. Compared to V3 of the patch, GCC will just
> ignore -m{,no-}power8-fusion and -m{,no-}power8-fusion-sign.
>
Hi,
No comments on code, a few comments about the comments below.
> The splitting of signed halfword and word loads into unsigned load and
> sign extension is now suppressed with -Os, but it is done normally if we
> are not optimizing for space.
I see references to TARGET_P8_FUSION_SIGN in the patch below, and some
removal of old code. I assume this describes the implementation that
remains.
>
> The power8 fusion support used to be set automatically when -mcpu=power8 or
> -mtune=power8 was used, and it was cleared for other cpu's. However, if you
> used the target attribute or target #pragma to change the default cpu type or
> tuning, you would get an error that a target specifiction option mismatch
> occurred.
specification. :-)
>
> This occurred because the rs6000_can_inline_p function just compares the ISA
> bits between the called inline function and the caller. If the ISA flags of
> the called function is not a subset of the ISA flags of the caller, we won't do
> the inlinging. When a power9 or power10 function inlines a function that is
> explicitly compiled for power8, the power8 function has the power8 fusion bits
> set and the power9 or power10 functions do not have the fusion bits set.
inlining.
>
> This code makes the -mpower8-fusion option a nop. It is accepted without
> warning, but it does nothing. Power8 fusion is only enabled if we are tuning
> for a power8.
>
> The undocumented -mpower8-fusion-sign option is also made into a nop.
>
> I left in the pragma target and attribute target support for power8-fusion, but
> using it doesn't do anything now. This is because I told the customer who
> encountered this problem that one solution was to add an explicit
> no-power8-fusion option in their target pragma or attribute to work around the
> problem.
>
> I have tested this patch on a little endian power10 system. I have tested
> previous versions on little endian power9 and big endian power8 systems.
> Can I apply this patch to the master branch?
>
> If it is accepted, I will produce a similar patch for back porting to GCC 11
> and GCC 10.
>
> 2022-04-12 Michael Meissner <meissner@linux.ibm.com>
>
> gcc/
> PR target/102059
> * config/rs6000/rs6000-cpus.def (OTHER_FUSION_MASKS): Delete.
> (ISA_3_0_MASKS_SERVER): Don't clear the fusion masks.
> (POWERPC_MASKS): Remove OPTION_MASK_P8_FUSION.
> * config/rs6000/rs6000.cc (rs6000_option_override_internal):
> Delete code that set the power8 fusion options automatically.
> (rs6000_opt_masks): Allow #pragma target and attribute target
> power8-fusion option for backwards compatibility.
> (rs6000_print_options_internal): Skip printing backward
> compatibility options that are just ignored.
> * config/rs6000/rs6000.h (TARGET_P8_FUSION): New macro.
> (TARGET_P8_FUSION_SIGN): Likewise.
> (MASK_P8_FUSION): Delete.
> * config/rs6000/rs6000.opt (-mpower8-fusion): Recognize the option but
> ignore it completely.
> (-mpower8-fusion-sign): Likewise.
> * doc/invoke.texi (RS/6000 and PowerPC Options): Delete
> -mpower8-fusion.
>
> gcc/testsuite/
> PR target/102059
> * gcc.dg/lto/pr102059-1_0.c: Remove -mno-power8-fusion.
> * gcc.dg/lto/pr102059-2_0.c: Likewise.
> * gcc.target/powerpc/pr102059-3.c: Likewise.
> * gcc.target/powerpc/pr102059-4.c: New test.
> ---
> gcc/config/rs6000/rs6000-cpus.def | 18 +++----
> gcc/config/rs6000/rs6000.cc | 49 +++++--------------
> gcc/config/rs6000/rs6000.h | 13 ++++-
> gcc/config/rs6000/rs6000.opt | 8 +--
> gcc/doc/invoke.texi | 13 +----
> gcc/testsuite/gcc.dg/lto/pr102059-1_0.c | 2 +-
> gcc/testsuite/gcc.dg/lto/pr102059-2_0.c | 2 +-
> gcc/testsuite/gcc.target/powerpc/pr102059-3.c | 2 +-
> gcc/testsuite/gcc.target/powerpc/pr102059-4.c | 23 +++++++++
> 9 files changed, 62 insertions(+), 68 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-4.c
>
> diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
> index 963947f6939..d913a3d6b73 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -54,19 +54,14 @@
> | OPTION_MASK_QUAD_MEMORY \
> | OPTION_MASK_QUAD_MEMORY_ATOMIC)
>
> -/* ISA masks setting fusion options. */
> -#define OTHER_FUSION_MASKS (OPTION_MASK_P8_FUSION \
> - | OPTION_MASK_P8_FUSION_SIGN)
> -
> /* Add ISEL back into ISA 3.0, since it is supposed to be a win. Do not add
> FLOAT128_HW here until we are ready to make -mfloat128 on by default. */
> -#define ISA_3_0_MASKS_SERVER ((ISA_2_7_MASKS_SERVER \
> - | OPTION_MASK_ISEL \
> - | OPTION_MASK_MODULO \
> - | OPTION_MASK_P9_MINMAX \
> - | OPTION_MASK_P9_MISC \
> - | OPTION_MASK_P9_VECTOR) \
> - & ~OTHER_FUSION_MASKS)
> +#define ISA_3_0_MASKS_SERVER (ISA_2_7_MASKS_SERVER \
> + | OPTION_MASK_ISEL \
> + | OPTION_MASK_MODULO \
> + | OPTION_MASK_P9_MINMAX \
> + | OPTION_MASK_P9_MISC \
> + | OPTION_MASK_P9_VECTOR)
>
> /* Support for the IEEE 128-bit floating point hardware requires a lot of the
> VSX instructions that are part of ISA 3.0. */
> @@ -140,7 +135,6 @@
> | OPTION_MASK_MODULO \
> | OPTION_MASK_MULHW \
> | OPTION_MASK_NO_UPDATE \
> - | OPTION_MASK_P8_FUSION \
> | OPTION_MASK_P8_VECTOR \
> | OPTION_MASK_P9_MINMAX \
> | OPTION_MASK_P9_MISC \
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index ceaddafd33b..269c7510e3e 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -4040,41 +4040,6 @@ rs6000_option_override_internal (bool global_init_p)
> && optimize_function_for_speed_p (cfun))
> rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
>
> - /* Enable power8 fusion if we are tuning for power8, even if we aren't
> - generating power8 instructions. Power9 does not optimize power8 fusion
> - cases. */
> - if (!(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION))
> - {
> - if (processor_target_table[tune_index].processor == PROCESSOR_POWER8)
> - rs6000_isa_flags |= OPTION_MASK_P8_FUSION;
> - else
> - rs6000_isa_flags &= ~OPTION_MASK_P8_FUSION;
> - }
> -
> - /* Setting additional fusion flags turns on base fusion. */
> - if (!TARGET_P8_FUSION && TARGET_P8_FUSION_SIGN)
> - {
> - if (rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION)
> - {
> - if (TARGET_P8_FUSION_SIGN)
> - error ("%qs requires %qs", "-mpower8-fusion-sign",
> - "-mpower8-fusion");
> -
> - rs6000_isa_flags &= ~OPTION_MASK_P8_FUSION;
> - }
> - else
> - rs6000_isa_flags |= OPTION_MASK_P8_FUSION;
> - }
> -
> - /* Power8 does not fuse sign extended loads with the addis. If we are
> - optimizing at high levels for speed, convert a sign extended load into a
> - zero extending load, and an explicit sign extension. */
> - if (TARGET_P8_FUSION
> - && !(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION_SIGN)
> - && optimize_function_for_speed_p (cfun)
> - && optimize >= 3)
> - rs6000_isa_flags |= OPTION_MASK_P8_FUSION_SIGN;
> -
> /* ISA 3.0 vector instructions include ISA 2.07. */
> if (TARGET_P9_VECTOR && !TARGET_P8_VECTOR)
> {
> @@ -24010,8 +23975,6 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
> { "pcrel-opt", OPTION_MASK_PCREL_OPT, false, true },
> { "popcntb", OPTION_MASK_POPCNTB, false, true },
> { "popcntd", OPTION_MASK_POPCNTD, false, true },
> - { "power8-fusion", OPTION_MASK_P8_FUSION, false, true },
> - { "power8-fusion-sign", OPTION_MASK_P8_FUSION_SIGN, false, true },
> { "power8-vector", OPTION_MASK_P8_VECTOR, false, true },
> { "power9-minmax", OPTION_MASK_P9_MINMAX, false, true },
> { "power9-misc", OPTION_MASK_P9_MISC, false, true },
> @@ -24051,6 +24014,13 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
> #endif
> { "soft-float", OPTION_MASK_SOFT_FLOAT, false, false },
> { "string", 0, false, false },
> +
> + /* The Power8 fusion option was removed. We ignore using it in #pragma and
> + attribute target. Users may have used the options to suppress errors if
> + they declare an inline function to be specifically power8 and the function
> + was included by power9 or power10 which turned off the power8 fusion
> + support. */
"was removed" is a good detail for history, but possibly not required
here? Maybe stl "The Power8 fusion option is a no-op". ?
> + { "power8-fusion", 0, false, true },
> };
>
> /* Builtin mask mapping for printing the flags. */
> @@ -24687,6 +24657,11 @@ rs6000_print_options_internal (FILE *file,
> HOST_WIDE_INT mask = opts[i].mask;
> size_t len = comma_len + prefix_len + strlen (name);
>
> + /* Don't print options that exist for backwards compatibility, but are
> + ignored now like -mpower8-fusion. */
> + if (!mask)
> + continue;
I'm not sure calling out the -mpower8-fusion option here is necessary.
perhaps s/ignored now like -mpower8-fusion/masked out./
> +
> if (!invert)
> {
> if ((flags & mask) == 0)
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 523256a5c9d..52a29e1c702 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -504,6 +504,18 @@ extern int rs6000_vector_align[];
> #define TARGET_MINMAX (TARGET_HARD_FLOAT && TARGET_PPC_GFXOPT \
> && (TARGET_P9_MINMAX || !flag_trapping_math))
>
> +/* Power8 has special fusion operations that are enabled if we are tuning for
> + power8. This used to be settable with an option (-mpower8-fusion), but that
> + option has been removed. */
s/has been removed/is now ignored/ ?
> +#define TARGET_P8_FUSION (rs6000_tune == PROCESSOR_POWER8)
> +
> +/* Power8 fusion does not fuse loads with sign extends. If we are not
> + optimizing for space, split loads with sign extension to loads with zero
> + extension and an explicit sign extend operation, so that the zero extending
> + load can be fused. */
> +#define TARGET_P8_FUSION_SIGN (TARGET_P8_FUSION \
> + && !optimize_function_for_size_p (cfun))
> +
> /* In switching from using target_flags to using rs6000_isa_flags, the options
> machinery creates OPTION_MASK_<xxx> instead of MASK_<xxx>. For now map
> OPTION_MASK_<xxx> back into MASK_<xxx>. */
> @@ -517,7 +529,6 @@ extern int rs6000_vector_align[];
> #define MASK_FLOAT128_KEYWORD OPTION_MASK_FLOAT128_KEYWORD
> #define MASK_FLOAT128_HW OPTION_MASK_FLOAT128_HW
> #define MASK_FPRND OPTION_MASK_FPRND
> -#define MASK_P8_FUSION OPTION_MASK_P8_FUSION
> #define MASK_HARD_FLOAT OPTION_MASK_HARD_FLOAT
> #define MASK_HTM OPTION_MASK_HTM
> #define MASK_ISEL OPTION_MASK_ISEL
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index 4931d781c4e..6c4caf4c9ee 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -474,13 +474,13 @@ Save the TOC in the prologue for indirect calls rather than inline.
> mvsx-timode
> Target RejectNegative Undocumented Ignore
>
> +; The -mpower8-fusion and -mpower8-fusion-sign options existed in the past, but
> +; they are ignored now.
If the comment is necessary (it may be redundant with the Ignore
attribute in place) I suggest s/existed in the past, but they//
> mpower8-fusion
> -Target Mask(P8_FUSION) Var(rs6000_isa_flags)
> -Fuse certain integer operations together for better performance on power8.
> +Target Undocumented Ignore
>
> mpower8-fusion-sign
> -Target Undocumented Mask(P8_FUSION_SIGN) Var(rs6000_isa_flags)
> -Allow sign extension in fusion operations.
> +Target Undocumented Ignore
>
> mpower8-vector
> Target Mask(P8_VECTOR) Var(rs6000_isa_flags)
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 1a51759e6e4..7c21779e3b2 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1266,8 +1266,7 @@ See RS/6000 and PowerPC Options.
> -mveclibabi=@var{type} -mfriz -mno-friz @gol
> -mpointers-to-nested-functions -mno-pointers-to-nested-functions @gol
> -msave-toc-indirect -mno-save-toc-indirect @gol
> --mpower8-fusion -mno-mpower8-fusion -mpower8-vector -mno-power8-vector @gol
> --mcrypto -mno-crypto -mhtm -mno-htm @gol
> +-mpower8-vector -mno-power8-vector -mcrypto -mno-crypto -mhtm -mno-htm @gol
> -mquad-memory -mno-quad-memory @gol
> -mquad-memory-atomic -mno-quad-memory-atomic @gol
> -mcompat-align-parm -mno-compat-align-parm @gol
> @@ -28355,7 +28354,7 @@ following options:
> -mpopcntb -mpopcntd -mpowerpc64 @gol
> -mpowerpc-gpopt -mpowerpc-gfxopt @gol
> -mmulhw -mdlmzb -mmfpgpr -mvsx @gol
> --mcrypto -mhtm -mpower8-fusion -mpower8-vector @gol
> +-mcrypto -mhtm -mpower8-vector @gol
> -mquad-memory -mquad-memory-atomic -mfloat128 @gol
> -mfloat128-hardware -mprefixed -mpcrel -mmma @gol
> -mrop-protect}
> @@ -28470,14 +28469,6 @@ Enable (disable) the use of the built-in functions that allow direct
> access to the Hardware Transactional Memory (HTM) instructions that
> were added in version 2.07 of the PowerPC ISA.
>
> -@item -mpower8-fusion
> -@itemx -mno-power8-fusion
> -@opindex mpower8-fusion
> -@opindex mno-power8-fusion
> -Generate code that keeps (does not keeps) some integer operations
> -adjacent so that the instructions can be fused together on power8 and
> -later processors.
> -
> @item -mpower8-vector
> @itemx -mno-power8-vector
> @opindex mpower8-vector
> diff --git a/gcc/testsuite/gcc.dg/lto/pr102059-1_0.c b/gcc/testsuite/gcc.dg/lto/pr102059-1_0.c
> index e1004f5cfbf..b215b701097 100644
> --- a/gcc/testsuite/gcc.dg/lto/pr102059-1_0.c
> +++ b/gcc/testsuite/gcc.dg/lto/pr102059-1_0.c
> @@ -1,7 +1,7 @@
> /* { dg-lto-do link } */
> /* { dg-skip-if "power10 and above only" { ! { power10_ok } } } */
> /* -Wno-attributes suppresses always_inline warnings. */
> -/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -flto -Wno-attributes -mno-power8-fusion" } } */
> +/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -flto -Wno-attributes" } } */
>
> int __attribute__ ((always_inline))
> foo1 (int *b)
> diff --git a/gcc/testsuite/gcc.dg/lto/pr102059-2_0.c b/gcc/testsuite/gcc.dg/lto/pr102059-2_0.c
> index ebb0a7b8fa1..ffab23ab7e1 100644
> --- a/gcc/testsuite/gcc.dg/lto/pr102059-2_0.c
> +++ b/gcc/testsuite/gcc.dg/lto/pr102059-2_0.c
> @@ -1,6 +1,6 @@
> /* { dg-lto-do link } */
> /* { dg-skip-if "power10 and above only" { ! { power10_ok } } } */
> -/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -mno-power8-fusion -flto -fdump-ipa-inline" } } */
> +/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -flto -fdump-ipa-inline" } } */
>
> int
> foo1 (int *b)
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-3.c b/gcc/testsuite/gcc.target/powerpc/pr102059-3.c
> index 21c982d93f0..0cb3a4cb9f9 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr102059-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr102059-3.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-O2 -mdejagnu-cpu=power8 -mno-power8-fusion -fdump-tree-einline-optimized" } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8 -fdump-tree-einline-optimized" } */
>
> /* Like pr102059-1.c, to verify the inlining still happens
> even without always_inline attribute. */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-4.c b/gcc/testsuite/gcc.target/powerpc/pr102059-4.c
> new file mode 100644
> index 00000000000..5fe66f8af4b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr102059-4.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
> +/* { dg-require-effective-target power10_ok } */
I think for coverage, this test could actually target power9_ok, but
sufficient as is.
lgtm,
thanks
-Will
> +
> +/* Verify that power10 can explicity include functions compiled for power8.
> + The issue was -mcpu=power8 enables -mpower8-fusion, but -mcpu=power9 or
> + -mcpu=power10 do not set power8-fusion by default. Thus when doing this
> + compilation, they would get an error that the inline function failed in its
> + inlining due to having incompatible options. */
> +
> +static inline int __attribute__ ((always_inline,target("cpu=power8")))
> +foo (int *b)
> +{
> + *b += 10;
> + return *b;
> +}
> +
> +int
> +bar (int *a)
> +{
> + *a = foo (a);
> + return 0;
> +}
> --
> 2.35.1
>
>
More information about the Gcc-patches
mailing list