I've been looking at vec_extract recently, both in terms of support for the -mcpu=future and to look at supporting PR target/93230 in GCC 11. In some cases, if you have a vec_extract built-in function where the vector is in a register, and the element is variable, the compiler decides store this vector to memory, and then do the variable extract using a scalar load. Unfortunately, this lead to a STORE-HIT-LOAD slowdown, as the scalar load will likely have to wait for the vector store to finish. The test cases are the fold-vect-extract-<type>.p{7,8,9}.c} files in the gcc.target/powerpc directory, where <type> is 'char', 'short', 'int', 'longlong', 'float' and 'double', and the p7/p8/p9 indicates whether the test is for -mcpu=power7, -mcpu=power8, or -mcpu=power9. For -mcpu=power8, the regressions are: fold-vect-extract-double.p8.c: GCC 9.x and current trunk fold-vect-extract-longlong.p8.c: GCC 9.x and current trunk For -mcpu=power9, the regressions are: fold-vect-extract-double.p9.c: GCC 9.x (current trunk is ok) fold-vect-extract-longlong.p9.c: GCC 9.x (current trunk is ok)
I've discovered that the issue is the combined insn that does variable extract where it handles both the register case and the memory case: (define_insn_and_split "vsx_extract_<mode>_var" [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=v,wa,r") (unspec:<VS_scalar> [(match_operand:VSX_D 1 "input_operand" "v,Q,Q") (match_operand:DI 2 "gpc_reg_operand" "r,r,r")] UNSPEC_VSX_EXTRACT)) (clobber (match_scratch:DI 3 "=r,&b,&b")) (clobber (match_scratch:V2DI 4 "=&v,X,X"))] "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT" "#" "&& reload_completed" [(const_int 0)] { rs6000_split_vec_extract_var (operands[0], operands[1], operands[2], operands[3], operands[4]); DONE; }) If I split the insn into two separate patterns, one that handles only the register, and the other that only handles memory accesses. This way the compiler doesn't create the store and does the variable extract in the register.
So the issue is that input_operand allows too much? Other patterns that could use such a fix are float<QHI:mode><FP_ISA3:mode>2, floatuns<QHI:mode><FP_ISA3:mode>2, movsd_store, movsd_load, *vsx_le_permute_<mode>, vsx_set_v1ti, vsx_extract_<mode>_var, vsx_extract_v4sf_var, vsx_extract_<mode>_var, *vsx_extract_<mode>_<VS_scalar>mode_var, vsx_splat_<mode>. A few of those are probably safe, the define_expand ones for example; but those are all uses of input_operand in an insn that isn't a movM (which is the only place this predicate should be used).
While I agree that in general, we should only use input_operand for moves and define_expands, I tend to think in the short term (GCC 10) we should just fix the case we know about. As you point out, this is used in every single place where we fold sign/zero/float extension into a load. In looking at gcc 8 and gcc 9, the variable extract patterns are mostly the same, except gcc 8 uses 'ww', etc. constraints, while gcc 9/10 uses the 'isa' attribute to eliminate the cases using power9 instructions on power8. I don't know why only V2DI/V2DF shows it up, when V4SF/V4SI/V8HI/V16QI use the same construct, and why -mcpu=power9 compiles it ok on trunk, but not gcc 9.
The master branch has been updated by Michael Meissner <meissner@gcc.gnu.org>: https://gcc.gnu.org/g:75c299acc47b19fd6c445a6931798325a54e6fb4 commit r10-6908-g75c299acc47b19fd6c445a6931798325a54e6fb4 Author: Michael Meissner <meissner@gcc.gnu.org> Date: Thu Feb 27 14:41:39 2020 -0500 Fix PR target/93932 [gcc] 2020-02-27 Michael Meissner <meissner@linux.ibm.com> PR target/93932 * config/rs6000/vsx.md (vsx_extract_<mode>_var, VSX_D iterator): Split the insn into two parts. This insn only does variable extract from a register. (vsx_extract_<mode>_var_load, VSX_D iterator): New insn, do variable extract from memory. (vsx_extract_v4sf_var): Split the insn into two parts. This insn only does variable extract from a register. (vsx_extract_v4sf_var_load): New insn, do variable extract from memory. (vsx_extract_<mode>_var, VSX_EXTRACT_I iterator): Split the insn into two parts. This insn only does variable extract from a register. (vsx_extract_<mode>_var_load, VSX_EXTRACT_I iterator): New insn, do variable extract from memory. [gcc/testsuite] 2020-02-27 Michael Meissner <meissner@linux.ibm.com> PR target/93932 * gcc.target/powerpc/fold-vec-extract-longlong.p8.c: Adjust instruction counts.
Fixed on trunk?
The releases/gcc-9 branch has been updated by Michael Meissner <meissner@gcc.gnu.org>: https://gcc.gnu.org/g:892c755eae2e2e45547395013695fdd819c297fa commit r9-8486-g892c755eae2e2e45547395013695fdd819c297fa Author: Michael Meissner <meissner@gcc.gnu.org> Date: Thu Apr 9 12:25:05 2020 -0500 Backport PR target/93932 (variable vec_extract) to GCC 9 2020-04-09 Michael Meissner <meissner@linux.ibm.com> Back port from trunk 2020-02-26 Michael Meissner <meissner@linux.ibm.com> PR target/93932 * config/rs6000/vsx.md (vsx_extract_<mode>_var, VSX_D iterator): Split the insn into two parts. This insn only does variable extract from a register. (vsx_extract_<mode>_var_load, VSX_D iterator): New insn, do variable extract from memory. (vsx_extract_v4sf_var): Split the insn into two parts. This insn only does variable extract from a register. (vsx_extract_v4sf_var_load): New insn, do variable extract from memory. (vsx_extract_<mode>_var, VSX_EXTRACT_I iterator): Split the insn into two parts. This insn only does variable extract from a register. (vsx_extract_<mode>_var_load, VSX_EXTRACT_I iterator): New insn, do variable extract from memory.
The releases/gcc-9 branch has been updated by Michael Meissner <meissner@gcc.gnu.org>: https://gcc.gnu.org/g:baf3a5a94244b4a260810825870be6ecc15fa35a commit r9-8504-gbaf3a5a94244b4a260810825870be6ecc15fa35a Author: Michael Meissner <2019-02-12 Michael Meissner meissner@linux.ibm.com> Date: Thu Apr 16 12:49:22 2020 -0400 Fix target/94557 PowerPC regression on GCC 9 (variable vec_extract) 2020-04-16 Michael Meissner <meissner@linux.ibm.com> PR target/94557 * config/rs6000/rs6000.c (rs6000_adjust_vec_address): Fix regression caused by PR target/93932 backport. Mask variable vector extract index so it does not go beyond the vector when extracting a vector element from memory.
With the committal of the fix to PR target/94557 (fix regression caused on the GCC 9 branch by PR target/93932 patch), this patch now can be closed.