Bug 93932 - PowerPC vec_extract with variable element number has code regressions for V2DI/V2DF vectors
Summary: PowerPC vec_extract with variable element number has code regressions for V2D...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 10.0
: P3 normal
Target Milestone: ---
Assignee: Michael Meissner
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2020-02-25 19:43 UTC by Michael Meissner
Modified: 2020-04-16 18:05 UTC (History)
4 users (show)

See Also:
Host:
Target: powerpc
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-02-25 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Meissner 2020-02-25 19:43:21 UTC
I've been looking at vec_extract recently, both in terms of support for the -mcpu=future and to look at supporting PR target/93230 in GCC 11.

In some cases, if you have a vec_extract built-in function where the vector is in a register, and the element is variable, the compiler decides store this vector to memory, and then do the variable extract using a scalar load.  Unfortunately, this lead to a STORE-HIT-LOAD slowdown, as the scalar load will likely have to wait for the vector store to finish.

The test cases are the fold-vect-extract-<type>.p{7,8,9}.c} files in the gcc.target/powerpc directory, where <type> is 'char', 'short', 'int', 'longlong', 'float' and 'double', and the p7/p8/p9 indicates whether the test is for -mcpu=power7, -mcpu=power8, or -mcpu=power9.

For -mcpu=power8, the regressions are:
fold-vect-extract-double.p8.c: GCC 9.x and current trunk
fold-vect-extract-longlong.p8.c: GCC 9.x and current trunk

For -mcpu=power9, the regressions are:
fold-vect-extract-double.p9.c: GCC 9.x (current trunk is ok)
fold-vect-extract-longlong.p9.c: GCC 9.x (current trunk is ok)
Comment 1 Michael Meissner 2020-02-25 19:49:25 UTC
I've discovered that the issue is the combined insn that does variable extract where it handles both the register case and the memory case:

(define_insn_and_split "vsx_extract_<mode>_var"
  [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=v,wa,r")
	(unspec:<VS_scalar> [(match_operand:VSX_D 1 "input_operand" "v,Q,Q")
			     (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
			    UNSPEC_VSX_EXTRACT))
   (clobber (match_scratch:DI 3 "=r,&b,&b"))
   (clobber (match_scratch:V2DI 4 "=&v,X,X"))]
  "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
  "#"
  "&& reload_completed"
  [(const_int 0)]
{
  rs6000_split_vec_extract_var (operands[0], operands[1], operands[2],
				operands[3], operands[4]);
  DONE;
})

If I split the insn into two separate patterns, one that handles only the register, and the other that only handles memory accesses.  This way the compiler doesn't create the store and does the variable extract in the register.
Comment 2 Segher Boessenkool 2020-02-25 21:41:50 UTC
So the issue is that input_operand allows too much?

Other patterns that could use such a fix are float<QHI:mode><FP_ISA3:mode>2,
floatuns<QHI:mode><FP_ISA3:mode>2, movsd_store, movsd_load,
*vsx_le_permute_<mode>, vsx_set_v1ti, vsx_extract_<mode>_var,
vsx_extract_v4sf_var, vsx_extract_<mode>_var, 
*vsx_extract_<mode>_<VS_scalar>mode_var, vsx_splat_<mode>.
A few of those are probably safe, the define_expand ones for example; but
those are all uses of input_operand in an insn that isn't a movM (which
is the only place this predicate should be used).
Comment 3 Michael Meissner 2020-02-25 23:09:58 UTC
While I agree that in general, we should only use input_operand for moves and define_expands, I tend to think in the short term (GCC 10) we should just fix the case we know about.  As you point out, this is used in every single place where we fold sign/zero/float extension into a load.

In looking at gcc 8 and gcc 9, the variable extract patterns are mostly the same, except gcc 8 uses 'ww', etc. constraints, while gcc 9/10 uses the 'isa' attribute to eliminate the cases using power9 instructions on power8.

I don't know why only V2DI/V2DF shows it up, when V4SF/V4SI/V8HI/V16QI use the same construct, and why -mcpu=power9 compiles it ok on trunk, but not gcc 9.
Comment 4 GCC Commits 2020-02-27 19:42:48 UTC
The master branch has been updated by Michael Meissner <meissner@gcc.gnu.org>:

https://gcc.gnu.org/g:75c299acc47b19fd6c445a6931798325a54e6fb4

commit r10-6908-g75c299acc47b19fd6c445a6931798325a54e6fb4
Author: Michael Meissner <meissner@gcc.gnu.org>
Date:   Thu Feb 27 14:41:39 2020 -0500

    Fix PR target/93932
    
    [gcc]
    2020-02-27  Michael Meissner  <meissner@linux.ibm.com>
    
    	PR target/93932
    	* config/rs6000/vsx.md (vsx_extract_<mode>_var, VSX_D iterator):
    	Split the insn into two parts.  This insn only does variable
    	extract from a register.
    	(vsx_extract_<mode>_var_load, VSX_D iterator): New insn, do
    	variable extract from memory.
    	(vsx_extract_v4sf_var): Split the insn into two parts.  This insn
    	only does variable extract from a register.
    	(vsx_extract_v4sf_var_load): New insn, do variable extract from
    	memory.
    	(vsx_extract_<mode>_var, VSX_EXTRACT_I iterator): Split the insn
    	into two parts.  This insn only does variable extract from a
    	register.
    	(vsx_extract_<mode>_var_load, VSX_EXTRACT_I iterator): New insn,
    	do variable extract from memory.
    
    [gcc/testsuite]
    2020-02-27  Michael Meissner  <meissner@linux.ibm.com>
    
    	PR target/93932
    	* gcc.target/powerpc/fold-vec-extract-longlong.p8.c: Adjust
    	instruction counts.
Comment 5 Richard Biener 2020-03-20 08:36:00 UTC
Fixed on trunk?
Comment 6 GCC Commits 2020-04-09 17:25:37 UTC
The releases/gcc-9 branch has been updated by Michael Meissner <meissner@gcc.gnu.org>:

https://gcc.gnu.org/g:892c755eae2e2e45547395013695fdd819c297fa

commit r9-8486-g892c755eae2e2e45547395013695fdd819c297fa
Author: Michael Meissner <meissner@gcc.gnu.org>
Date:   Thu Apr 9 12:25:05 2020 -0500

    Backport PR target/93932 (variable vec_extract) to GCC 9
    
    2020-04-09  Michael Meissner  <meissner@linux.ibm.com>
    
            Back port from trunk
            2020-02-26  Michael Meissner  <meissner@linux.ibm.com>
    
            PR target/93932
            * config/rs6000/vsx.md (vsx_extract_<mode>_var, VSX_D iterator):
            Split the insn into two parts.  This insn only does variable
            extract from a register.
            (vsx_extract_<mode>_var_load, VSX_D iterator): New insn, do
            variable extract from memory.
            (vsx_extract_v4sf_var): Split the insn into two parts.  This insn
            only does variable extract from a register.
            (vsx_extract_v4sf_var_load): New insn, do variable extract from
            memory.
            (vsx_extract_<mode>_var, VSX_EXTRACT_I iterator): Split the insn
            into two parts.  This insn only does variable extract from a
            register.
            (vsx_extract_<mode>_var_load, VSX_EXTRACT_I iterator): New insn,
            do variable extract from memory.
Comment 7 GCC Commits 2020-04-16 16:51:10 UTC
The releases/gcc-9 branch has been updated by Michael Meissner <meissner@gcc.gnu.org>:

https://gcc.gnu.org/g:baf3a5a94244b4a260810825870be6ecc15fa35a

commit r9-8504-gbaf3a5a94244b4a260810825870be6ecc15fa35a
Author: Michael Meissner <2019-02-12  Michael Meissner  meissner@linux.ibm.com>
Date:   Thu Apr 16 12:49:22 2020 -0400

    Fix target/94557 PowerPC regression on GCC 9 (variable vec_extract)
    
    2020-04-16  Michael Meissner  <meissner@linux.ibm.com>
    
            PR target/94557
            * config/rs6000/rs6000.c (rs6000_adjust_vec_address): Fix
            regression caused by PR target/93932 backport.  Mask variable
            vector extract index so it does not go beyond the vector when
            extracting a vector element from memory.
Comment 8 Michael Meissner 2020-04-16 18:05:05 UTC
With the committal of the fix to PR target/94557 (fix regression caused on the GCC 9 branch by PR target/93932 patch), this patch now can be closed.