[PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059

Wed Apr 20 21:20:23 GMT 2022

On Tue, 2022-04-12 at 21:14 -0400, Michael Meissner wrote:
> Eliminate power8 fusion options, use power8 tuning, PR target/102059
> 
> This is V4 of the patch.  Compared to V3 of the patch, GCC will just
> ignore -m{,no-}power8-fusion and -m{,no-}power8-fusion-sign.
> 

Hi, 
No comments on code, a few comments about the comments below.

> The splitting of signed halfword and word loads into unsigned load and
> sign extension is now suppressed with -Os, but it is done normally if we
> are not optimizing for space.

I see references to TARGET_P8_FUSION_SIGN in the patch below, and some
removal of old code.  I assume this describes the implementation that
remains.  

> 
> The power8 fusion support used to be set automatically when -mcpu=power8 or
> -mtune=power8 was used, and it was cleared for other cpu's.  However, if you
> used the target attribute or target #pragma to change the default cpu type or
> tuning, you would get an error that a target specifiction option mismatch
> occurred.

specification.  :-)

> 
> This occurred because the rs6000_can_inline_p function just compares the ISA
> bits between the called inline function and the caller.  If the ISA flags of
> the called function is not a subset of the ISA flags of the caller, we won't do
> the inlinging.  When a power9 or power10 function inlines a function that is
> explicitly compiled for power8, the power8 function has the power8 fusion bits
> set and the power9 or power10 functions do not have the fusion bits set.

inlining. 

> 
> This code makes the -mpower8-fusion option a nop.  It is accepted without
> warning, but it does nothing.  Power8 fusion is only enabled if we are tuning
> for a power8.
> 
> The undocumented -mpower8-fusion-sign option is also made into a nop.
> 
> I left in the pragma target and attribute target support for power8-fusion, but
> using it doesn't do anything now.  This is because I told the customer who
> encountered this problem that one solution was to add an explicit
> no-power8-fusion option in their target pragma or attribute to work around the
> problem.
> 
> I have tested this patch on a little endian power10 system.  I have tested
> previous versions on little endian power9 and big endian power8 systems.
> Can I apply this patch to the master branch?
> 
> If it is accepted, I will produce a similar patch for back porting to GCC 11
> and GCC 10.
> 
> 2022-04-12   Michael Meissner  <meissner@linux.ibm.com>
> 
> gcc/
> 	PR target/102059
> 	* config/rs6000/rs6000-cpus.def (OTHER_FUSION_MASKS): Delete.
> 	(ISA_3_0_MASKS_SERVER): Don't clear the fusion masks.
> 	(POWERPC_MASKS): Remove OPTION_MASK_P8_FUSION.
> 	* config/rs6000/rs6000.cc (rs6000_option_override_internal):
> 	Delete code that set the power8 fusion options automatically.
> 	(rs6000_opt_masks): Allow #pragma target and attribute target
> 	power8-fusion option for backwards compatibility.
> 	(rs6000_print_options_internal): Skip printing backward
> 	compatibility options that are just ignored.
> 	* config/rs6000/rs6000.h (TARGET_P8_FUSION): New macro.
> 	(TARGET_P8_FUSION_SIGN): Likewise.
> 	(MASK_P8_FUSION): Delete.
> 	* config/rs6000/rs6000.opt (-mpower8-fusion): Recognize the option but
> 	ignore it completely.
> 	(-mpower8-fusion-sign): Likewise.
> 	* doc/invoke.texi (RS/6000 and PowerPC Options): Delete
> 	-mpower8-fusion.
> 
> gcc/testsuite/
> 	PR target/102059
> 	* gcc.dg/lto/pr102059-1_0.c: Remove -mno-power8-fusion.
> 	* gcc.dg/lto/pr102059-2_0.c: Likewise.
> 	* gcc.target/powerpc/pr102059-3.c: Likewise.
> 	* gcc.target/powerpc/pr102059-4.c: New test.
> ---
>  gcc/config/rs6000/rs6000-cpus.def             | 18 +++----
>  gcc/config/rs6000/rs6000.cc                   | 49 +++++--------------
>  gcc/config/rs6000/rs6000.h                    | 13 ++++-
>  gcc/config/rs6000/rs6000.opt                  |  8 +--
>  gcc/doc/invoke.texi                           | 13 +----
>  gcc/testsuite/gcc.dg/lto/pr102059-1_0.c       |  2 +-
>  gcc/testsuite/gcc.dg/lto/pr102059-2_0.c       |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr102059-3.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr102059-4.c | 23 +++++++++
>  9 files changed, 62 insertions(+), 68 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-4.c
> 
> diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
> index 963947f6939..d913a3d6b73 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -54,19 +54,14 @@
>  				 | OPTION_MASK_QUAD_MEMORY		\
>  				 | OPTION_MASK_QUAD_MEMORY_ATOMIC)
> 
> -/* ISA masks setting fusion options.  */
> -#define OTHER_FUSION_MASKS	(OPTION_MASK_P8_FUSION			\
> -				 | OPTION_MASK_P8_FUSION_SIGN)
> -
>  /* Add ISEL back into ISA 3.0, since it is supposed to be a win.  Do not add
>     FLOAT128_HW here until we are ready to make -mfloat128 on by default.  */
> -#define ISA_3_0_MASKS_SERVER	((ISA_2_7_MASKS_SERVER			\
> -				  | OPTION_MASK_ISEL			\
> -				  | OPTION_MASK_MODULO			\
> -				  | OPTION_MASK_P9_MINMAX		\
> -				  | OPTION_MASK_P9_MISC			\
> -				  | OPTION_MASK_P9_VECTOR)		\
> -				 & ~OTHER_FUSION_MASKS)
> +#define ISA_3_0_MASKS_SERVER	(ISA_2_7_MASKS_SERVER			\
> +				 | OPTION_MASK_ISEL			\
> +				 | OPTION_MASK_MODULO			\
> +				 | OPTION_MASK_P9_MINMAX		\
> +				 | OPTION_MASK_P9_MISC			\
> +				 | OPTION_MASK_P9_VECTOR)
> 
>  /* Support for the IEEE 128-bit floating point hardware requires a lot of the
>     VSX instructions that are part of ISA 3.0.  */
> @@ -140,7 +135,6 @@
>  				 | OPTION_MASK_MODULO			\
>  				 | OPTION_MASK_MULHW			\
>  				 | OPTION_MASK_NO_UPDATE		\
> -				 | OPTION_MASK_P8_FUSION		\
>  				 | OPTION_MASK_P8_VECTOR		\
>  				 | OPTION_MASK_P9_MINMAX		\
>  				 | OPTION_MASK_P9_MISC			\
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index ceaddafd33b..269c7510e3e 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -4040,41 +4040,6 @@ rs6000_option_override_internal (bool global_init_p)
>        && optimize_function_for_speed_p (cfun))
>      rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
> 
> -  /* Enable power8 fusion if we are tuning for power8, even if we aren't
> -     generating power8 instructions.  Power9 does not optimize power8 fusion
> -     cases.  */
> -  if (!(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION))
> -    {
> -      if (processor_target_table[tune_index].processor == PROCESSOR_POWER8)
> -	rs6000_isa_flags |= OPTION_MASK_P8_FUSION;
> -      else
> -	rs6000_isa_flags &= ~OPTION_MASK_P8_FUSION;
> -    }
> -
> -  /* Setting additional fusion flags turns on base fusion.  */
> -  if (!TARGET_P8_FUSION && TARGET_P8_FUSION_SIGN)
> -    {
> -      if (rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION)
> -	{
> -	  if (TARGET_P8_FUSION_SIGN)
> -	    error ("%qs requires %qs", "-mpower8-fusion-sign",
> -		   "-mpower8-fusion");
> -
> -	  rs6000_isa_flags &= ~OPTION_MASK_P8_FUSION;
> -	}
> -      else
> -	rs6000_isa_flags |= OPTION_MASK_P8_FUSION;
> -    }
> -
> -  /* Power8 does not fuse sign extended loads with the addis.  If we are
> -     optimizing at high levels for speed, convert a sign extended load into a
> -     zero extending load, and an explicit sign extension.  */
> -  if (TARGET_P8_FUSION
> -      && !(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION_SIGN)
> -      && optimize_function_for_speed_p (cfun)
> -      && optimize >= 3)
> -    rs6000_isa_flags |= OPTION_MASK_P8_FUSION_SIGN;
> -
>    /* ISA 3.0 vector instructions include ISA 2.07.  */
>    if (TARGET_P9_VECTOR && !TARGET_P8_VECTOR)
>      {
> @@ -24010,8 +23975,6 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
>    { "pcrel-opt",		OPTION_MASK_PCREL_OPT,		false, true  },
>    { "popcntb",			OPTION_MASK_POPCNTB,		false, true  },
>    { "popcntd",			OPTION_MASK_POPCNTD,		false, true  },
> -  { "power8-fusion",		OPTION_MASK_P8_FUSION,		false, true  },
> -  { "power8-fusion-sign",	OPTION_MASK_P8_FUSION_SIGN,	false, true  },
>    { "power8-vector",		OPTION_MASK_P8_VECTOR,		false, true  },
>    { "power9-minmax",		OPTION_MASK_P9_MINMAX,		false, true  },
>    { "power9-misc",		OPTION_MASK_P9_MISC,		false, true  },
> @@ -24051,6 +24014,13 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
>  #endif
>    { "soft-float",		OPTION_MASK_SOFT_FLOAT,		false, false },
>    { "string",			0,				false, false },
> +
> +  /* The Power8 fusion option was removed.  We ignore using it in #pragma and
> +     attribute target.  Users may have used the options to suppress errors if
> +     they declare an inline function to be specifically power8 and the function
> +     was included by power9 or power10 which turned off the power8 fusion
> +     support.  */

"was removed" is a good detail for history, but possibly not required
here?    Maybe stl "The Power8 fusion option is a no-op".  ?

> +  { "power8-fusion",		0,				false, true  },
>  };
> 
>  /* Builtin mask mapping for printing the flags.  */
> @@ -24687,6 +24657,11 @@ rs6000_print_options_internal (FILE *file,
>        HOST_WIDE_INT mask = opts[i].mask;
>        size_t len = comma_len + prefix_len + strlen (name);
> 
> +      /* Don't print options that exist for backwards compatibility, but are
> +	 ignored now like -mpower8-fusion.  */
> +      if (!mask)
> +	continue;

I'm not sure calling out the -mpower8-fusion option here is necessary. 
perhaps s/ignored now like -mpower8-fusion/masked out./

> +
>        if (!invert)
>  	{
>  	  if ((flags & mask) == 0)
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 523256a5c9d..52a29e1c702 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -504,6 +504,18 @@ extern int rs6000_vector_align[];
>  #define TARGET_MINMAX	(TARGET_HARD_FLOAT && TARGET_PPC_GFXOPT		\
>  			 && (TARGET_P9_MINMAX || !flag_trapping_math))
> 
> +/* Power8 has special fusion operations that are enabled if we are tuning for
> +   power8.  This used to be settable with an option (-mpower8-fusion), but that
> +   option has been removed.  */

s/has been removed/is now ignored/  ?

> +#define TARGET_P8_FUSION	(rs6000_tune == PROCESSOR_POWER8)
> +
> +/* Power8 fusion does not fuse loads with sign extends.  If we are not
> +   optimizing for space, split loads with sign extension to loads with zero
> +   extension and an explicit sign extend operation, so that the zero extending
> +   load can be fused.  */
> +#define TARGET_P8_FUSION_SIGN	(TARGET_P8_FUSION			\
> +				 && !optimize_function_for_size_p (cfun))
> +
>  /* In switching from using target_flags to using rs6000_isa_flags, the options
>     machinery creates OPTION_MASK_<xxx> instead of MASK_<xxx>.  For now map
>     OPTION_MASK_<xxx> back into MASK_<xxx>.  */
> @@ -517,7 +529,6 @@ extern int rs6000_vector_align[];
>  #define MASK_FLOAT128_KEYWORD		OPTION_MASK_FLOAT128_KEYWORD
>  #define MASK_FLOAT128_HW		OPTION_MASK_FLOAT128_HW
>  #define MASK_FPRND			OPTION_MASK_FPRND
> -#define MASK_P8_FUSION			OPTION_MASK_P8_FUSION
>  #define MASK_HARD_FLOAT			OPTION_MASK_HARD_FLOAT
>  #define MASK_HTM			OPTION_MASK_HTM
>  #define MASK_ISEL			OPTION_MASK_ISEL
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index 4931d781c4e..6c4caf4c9ee 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -474,13 +474,13 @@ Save the TOC in the prologue for indirect calls rather than inline.
>  mvsx-timode
>  Target RejectNegative Undocumented Ignore
> 
> +; The -mpower8-fusion and -mpower8-fusion-sign options existed in the past, but
> +; they are ignored now.

If the comment is necessary (it may be redundant with the Ignore
attribute in place)  I suggest  s/existed in the past, but they//

>  mpower8-fusion
> -Target Mask(P8_FUSION) Var(rs6000_isa_flags)
> -Fuse certain integer operations together for better performance on power8.
> +Target Undocumented Ignore
> 
>  mpower8-fusion-sign
> -Target Undocumented Mask(P8_FUSION_SIGN) Var(rs6000_isa_flags)
> -Allow sign extension in fusion operations.
> +Target Undocumented Ignore
> 
>  mpower8-vector
>  Target Mask(P8_VECTOR) Var(rs6000_isa_flags)
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 1a51759e6e4..7c21779e3b2 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1266,8 +1266,7 @@ See RS/6000 and PowerPC Options.
>  -mveclibabi=@var{type}  -mfriz  -mno-friz @gol
>  -mpointers-to-nested-functions  -mno-pointers-to-nested-functions @gol
>  -msave-toc-indirect  -mno-save-toc-indirect @gol
> --mpower8-fusion  -mno-mpower8-fusion  -mpower8-vector  -mno-power8-vector @gol
> --mcrypto  -mno-crypto  -mhtm  -mno-htm @gol
> +-mpower8-vector  -mno-power8-vector -mcrypto  -mno-crypto  -mhtm  -mno-htm @gol
>  -mquad-memory  -mno-quad-memory @gol
>  -mquad-memory-atomic  -mno-quad-memory-atomic @gol
>  -mcompat-align-parm  -mno-compat-align-parm @gol
> @@ -28355,7 +28354,7 @@ following options:
>  -mpopcntb  -mpopcntd  -mpowerpc64 @gol
>  -mpowerpc-gpopt  -mpowerpc-gfxopt @gol
>  -mmulhw  -mdlmzb  -mmfpgpr  -mvsx @gol
> --mcrypto  -mhtm  -mpower8-fusion  -mpower8-vector @gol
> +-mcrypto  -mhtm  -mpower8-vector @gol
>  -mquad-memory  -mquad-memory-atomic  -mfloat128 @gol
>  -mfloat128-hardware -mprefixed -mpcrel -mmma @gol
>  -mrop-protect}
> @@ -28470,14 +28469,6 @@ Enable (disable) the use of the built-in functions that allow direct
>  access to the Hardware Transactional Memory (HTM) instructions that
>  were added in version 2.07 of the PowerPC ISA.
> 
> -@item -mpower8-fusion
> -@itemx -mno-power8-fusion
> -@opindex mpower8-fusion
> -@opindex mno-power8-fusion
> -Generate code that keeps (does not keeps) some integer operations
> -adjacent so that the instructions can be fused together on power8 and
> -later processors.
> -
>  @item -mpower8-vector
>  @itemx -mno-power8-vector
>  @opindex mpower8-vector
> diff --git a/gcc/testsuite/gcc.dg/lto/pr102059-1_0.c b/gcc/testsuite/gcc.dg/lto/pr102059-1_0.c
> index e1004f5cfbf..b215b701097 100644
> --- a/gcc/testsuite/gcc.dg/lto/pr102059-1_0.c
> +++ b/gcc/testsuite/gcc.dg/lto/pr102059-1_0.c
> @@ -1,7 +1,7 @@
>  /* { dg-lto-do link } */
>  /* { dg-skip-if "power10 and above only" { ! { power10_ok } } } */
>  /* -Wno-attributes suppresses always_inline warnings.  */
> -/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -flto -Wno-attributes -mno-power8-fusion" } } */
> +/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -flto -Wno-attributes" } } */
> 
>  int __attribute__ ((always_inline))
>  foo1 (int *b)
> diff --git a/gcc/testsuite/gcc.dg/lto/pr102059-2_0.c b/gcc/testsuite/gcc.dg/lto/pr102059-2_0.c
> index ebb0a7b8fa1..ffab23ab7e1 100644
> --- a/gcc/testsuite/gcc.dg/lto/pr102059-2_0.c
> +++ b/gcc/testsuite/gcc.dg/lto/pr102059-2_0.c
> @@ -1,6 +1,6 @@
>  /* { dg-lto-do link } */
>  /* { dg-skip-if "power10 and above only" { ! { power10_ok } } } */
> -/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -mno-power8-fusion -flto -fdump-ipa-inline" } } */
> +/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -flto -fdump-ipa-inline" } } */
> 
>  int
>  foo1 (int *b)
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-3.c b/gcc/testsuite/gcc.target/powerpc/pr102059-3.c
> index 21c982d93f0..0cb3a4cb9f9 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr102059-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr102059-3.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -mdejagnu-cpu=power8 -mno-power8-fusion -fdump-tree-einline-optimized" } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8 -fdump-tree-einline-optimized" } */
> 
>  /* Like pr102059-1.c, to verify the inlining still happens
>     even without always_inline attribute.  */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-4.c b/gcc/testsuite/gcc.target/powerpc/pr102059-4.c
> new file mode 100644
> index 00000000000..5fe66f8af4b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr102059-4.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
> +/* { dg-require-effective-target power10_ok } */

I think for coverage, this test could actually target power9_ok, but
sufficient as is. 

lgtm,
thanks
-Will

> +
> +/* Verify that power10 can explicity include functions compiled for power8.
> +   The issue was -mcpu=power8 enables -mpower8-fusion, but -mcpu=power9 or
> +   -mcpu=power10 do not set power8-fusion by default.  Thus when doing this
> +   compilation, they would get an error that the inline function failed in its
> +   inlining due to having incompatible options.  */
> +
> +static inline int __attribute__ ((always_inline,target("cpu=power8")))
> +foo (int *b)
> +{
> +  *b += 10;
> +  return *b;
> +}
> +
> +int
> +bar (int *a)
> +{
> +  *a = foo (a);
> +  return 0;
> +}
> -- 
> 2.35.1
> 
>