Bug 89213 - Optimize V2DI shifts by a constant on power8 & above systems.
Summary: Optimize V2DI shifts by a constant on power8 & above systems.
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 9.0
: P3 enhancement
Target Milestone: ---
Assignee: Michael Meissner
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2019-02-05 20:54 UTC by Michael Meissner
Modified: 2024-09-18 01:06 UTC (History)
3 users (show)

See Also:
Host:
Target: powerpc*-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2019-02-05 00:00:00


Attachments
Proposed patch to fix the problem (1.43 KB, patch)
2019-02-05 21:28 UTC, Michael Meissner
Details | Diff
Revised proposed patch (2.33 KB, patch)
2019-02-05 22:49 UTC, Michael Meissner
Details | Diff
Updated patch for GCC 15 (2.88 KB, patch)
2024-08-12 21:45 UTC, Michael Meissner
Details | Diff
Updated patch for GCC 15 V2 (2.65 KB, patch)
2024-08-12 21:57 UTC, Michael Meissner
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Meissner 2019-02-05 20:54:33 UTC
ISA 2.07 (i.e. -mcpu=power8) and above added support for doing various operations on V2DI (i.e. vector long long) data types, including shifts.

If you generate code to shift a V2DI type by a constant, the compiler generates sub-optimal code:

For example:

#include <altivec.h>

typedef vector long long vi64_t;
typedef vector unsigned long long vui64_t;

vi64_t
shiftra_test64 (vi64_t a)
{
  vui64_t x = {4, 4};
  return (vi64_t) vec_vsrad (a, x);
}

Generates:

shiftra_test64:
        xxspltib 0,4
        xxlor 32,0,0
        vextsb2d 0,0
        vsrad 2,2,0
        blr

when it could generate:

shiftra_test64:
        vspltisw 0,4
        vsrad 2,2,0
        blr

This is true of all 3 shift operations (shift left, logical shift right, and arithmetic shift right).
Comment 1 Michael Meissner 2019-02-05 21:28:17 UTC
Created attachment 45611 [details]
Proposed patch to fix the problem

This patch adds combiner insns to match attempted vector long long shifts by a constant on ISA 3.0.
Comment 2 Michael Meissner 2019-02-05 22:49:51 UTC
Created attachment 45612 [details]
Revised proposed patch

This patch also adds optimizations for V4SI for constant shifts in the 16..31 range on ISA 3.0 and also adds a test.
Comment 3 Michael Meissner 2019-02-06 08:30:22 UTC
The vector shift sequence does not appear in Spec 2006 CPU compiled for power9.

The vector shift sequence does appear 4 times in the 602_gcc_s benchmark, and once in the 683_imagick_s benchmark.
Comment 4 Segher Boessenkool 2019-02-06 11:31:43 UTC
You could just do

xxspltib xx,sh
vsrad 2,2,xx

(only the low 6 bits of the shift count are looked at, for 64-bit shifts,
in all vector insns).
Comment 5 Michael Meissner 2019-02-06 14:53:42 UTC
Sure I could use XXSPLTIB all of the time if I limit the optimization to ISA 3.0 (power9).  I was trying to add optimization for shift counts for 1..15 on ISA 2.07 (power8) as well, hence using VSPLTISW for the constants that fit in that range.

Looking at the code for ISA 2.07, I think we need to extend the code for loading duplicated constants to consider using vspltisw/vupklsw on ISA 2.07 instead of xxspltib/vecb2d that we can use on ISA 3.0.  I think I missed that the last time I was updating the code.
Comment 6 Segher Boessenkool 2019-02-06 16:46:58 UTC
For 32-bit or smaller shifts you can use vspltisb always, or vspltis[hw] if
you prefer.

If generating code for ISA 2.07 (Power8) you don't have xxspltib but you do
have vsld/vsrd/vsrad/vrld, hrm.  You still can always generate the constant
with just two insns of course (vspltis* and either a vsl* or a vrl* by 1,
depending if you need the low bit 0 or 1, for example).  But you cannot in
general do it in one (non-load ;-) ) insn I think.
Comment 7 Michael Meissner 2024-08-12 21:45:54 UTC
Created attachment 58918 [details]
Updated patch for GCC 15

I updated this patch for GCC 15.  I will try it out and submit.
Comment 8 Michael Meissner 2024-08-12 21:57:40 UTC
Created attachment 58919 [details]
Updated patch for GCC 15 V2

I had the wrong ChangeLog message in the previous version of the patch.
Comment 9 GCC Commits 2024-09-18 01:06:06 UTC
The master branch has been updated by Michael Meissner <meissner@gcc.gnu.org>:

https://gcc.gnu.org/g:9a07ac151327f61963b092062eb8566dd0c6f0cd

commit r15-3676-g9a07ac151327f61963b092062eb8566dd0c6f0cd
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue Sep 17 21:05:27 2024 -0400

    PR 89213: Add better support for shifting vectors with 64-bit elements
    
    This patch fixes PR target/89213 to allow better code to be generated to do
    constant shifts of V2DI/V2DF vectors.  Previously GCC would do constant shifts
    of vectors with 64-bit elements by using:
    
            XXSPLTIB 32,4
            VEXTSB2D 0,0
            VSRAD 2,2,0
    
    I.e., the PowerPC does not have a VSPLTISD instruction to load -15..14 for the
    64-bit shift count in one instruction.  Instead, it would need to load a byte
    and then convert it to 64-bit.
    
    With this patch, GCC now realizes that the vector shift instructions will look
    at the bottom 6 bits for the shift count, and it can use either a VSPLTISW or
    XXSPLTIB instruction to load the shift count.
    
    2024-09-17  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            PR target/89213
            * config/rs6000/altivec.md (UNSPEC_VECTOR_SHIFT): New unspec.
            (VSHIFT_MODE): New mode iterator.
            (vshift_code): New code iterator.
            (vshift_attr): New code attribute.
            (altivec_<mode>_<vshift_attr>_const): New pattern to optimize
            vector long long/int shifts by a constant.
            (altivec_<mode>_shift_const): New helper insn to load up a
            constant used by the shift operation.
            * config/rs6000/predicates.md (vector_shift_constant): New
            predicate.
    
    gcc/testsuite/
    
            PR target/89213
            * gcc.target/powerpc/pr89213.c: New test.
            * gcc.target/powerpc/vec-rlmi-rlnm.c: Update instruction count.