ISA 2.07 (i.e. -mcpu=power8) and above added support for doing various operations on V2DI (i.e. vector long long) data types, including shifts. If you generate code to shift a V2DI type by a constant, the compiler generates sub-optimal code: For example: #include <altivec.h> typedef vector long long vi64_t; typedef vector unsigned long long vui64_t; vi64_t shiftra_test64 (vi64_t a) { vui64_t x = {4, 4}; return (vi64_t) vec_vsrad (a, x); } Generates: shiftra_test64: xxspltib 0,4 xxlor 32,0,0 vextsb2d 0,0 vsrad 2,2,0 blr when it could generate: shiftra_test64: vspltisw 0,4 vsrad 2,2,0 blr This is true of all 3 shift operations (shift left, logical shift right, and arithmetic shift right).
Created attachment 45611 [details] Proposed patch to fix the problem This patch adds combiner insns to match attempted vector long long shifts by a constant on ISA 3.0.
Created attachment 45612 [details] Revised proposed patch This patch also adds optimizations for V4SI for constant shifts in the 16..31 range on ISA 3.0 and also adds a test.
The vector shift sequence does not appear in Spec 2006 CPU compiled for power9. The vector shift sequence does appear 4 times in the 602_gcc_s benchmark, and once in the 683_imagick_s benchmark.
You could just do xxspltib xx,sh vsrad 2,2,xx (only the low 6 bits of the shift count are looked at, for 64-bit shifts, in all vector insns).
Sure I could use XXSPLTIB all of the time if I limit the optimization to ISA 3.0 (power9). I was trying to add optimization for shift counts for 1..15 on ISA 2.07 (power8) as well, hence using VSPLTISW for the constants that fit in that range. Looking at the code for ISA 2.07, I think we need to extend the code for loading duplicated constants to consider using vspltisw/vupklsw on ISA 2.07 instead of xxspltib/vecb2d that we can use on ISA 3.0. I think I missed that the last time I was updating the code.
For 32-bit or smaller shifts you can use vspltisb always, or vspltis[hw] if you prefer. If generating code for ISA 2.07 (Power8) you don't have xxspltib but you do have vsld/vsrd/vsrad/vrld, hrm. You still can always generate the constant with just two insns of course (vspltis* and either a vsl* or a vrl* by 1, depending if you need the low bit 0 or 1, for example). But you cannot in general do it in one (non-load ;-) ) insn I think.
Created attachment 58918 [details] Updated patch for GCC 15 I updated this patch for GCC 15. I will try it out and submit.
Created attachment 58919 [details] Updated patch for GCC 15 V2 I had the wrong ChangeLog message in the previous version of the patch.
The master branch has been updated by Michael Meissner <meissner@gcc.gnu.org>: https://gcc.gnu.org/g:9a07ac151327f61963b092062eb8566dd0c6f0cd commit r15-3676-g9a07ac151327f61963b092062eb8566dd0c6f0cd Author: Michael Meissner <meissner@linux.ibm.com> Date: Tue Sep 17 21:05:27 2024 -0400 PR 89213: Add better support for shifting vectors with 64-bit elements This patch fixes PR target/89213 to allow better code to be generated to do constant shifts of V2DI/V2DF vectors. Previously GCC would do constant shifts of vectors with 64-bit elements by using: XXSPLTIB 32,4 VEXTSB2D 0,0 VSRAD 2,2,0 I.e., the PowerPC does not have a VSPLTISD instruction to load -15..14 for the 64-bit shift count in one instruction. Instead, it would need to load a byte and then convert it to 64-bit. With this patch, GCC now realizes that the vector shift instructions will look at the bottom 6 bits for the shift count, and it can use either a VSPLTISW or XXSPLTIB instruction to load the shift count. 2024-09-17 Michael Meissner <meissner@linux.ibm.com> gcc/ PR target/89213 * config/rs6000/altivec.md (UNSPEC_VECTOR_SHIFT): New unspec. (VSHIFT_MODE): New mode iterator. (vshift_code): New code iterator. (vshift_attr): New code attribute. (altivec_<mode>_<vshift_attr>_const): New pattern to optimize vector long long/int shifts by a constant. (altivec_<mode>_shift_const): New helper insn to load up a constant used by the shift operation. * config/rs6000/predicates.md (vector_shift_constant): New predicate. gcc/testsuite/ PR target/89213 * gcc.target/powerpc/pr89213.c: New test. * gcc.target/powerpc/vec-rlmi-rlnm.c: Update instruction count.
Patch committed on September 18th, 2024.