[PATCH v2] rs6000: Optimize __builtin_shuffle when it's used to zero the upper bits [PR102868]

Thu Oct 28 15:00:30 GMT 2021

On Thu, Oct 28, 2021 at 1:39 AM Xionghu Luo <luoxhu@linux.ibm.com> wrote:
>
> On 2021/10/27 21:24, David Edelsohn wrote:
> > On Sun, Oct 24, 2021 at 10:51 PM Xionghu Luo <luoxhu@linux.ibm.com> wrote:
> >>
> >> If the second operand of __builtin_shuffle is a constant zero vector
> >> and the mask matches one of a few specific patterns, the shuffle can
> >> be optimized to vspltisw+xxpermdi instead of lxv.
> >>
> >> gcc/ChangeLog:
> >>
> >>         * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Add
> >>         pattern matching and emission for VSX xxpermdi.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>         * gcc.target/powerpc/pr102868.c: New test.
> >> ---
> >>  gcc/config/rs6000/rs6000.c                  | 47 ++++++++++++++++--
> >>  gcc/testsuite/gcc.target/powerpc/pr102868.c | 53 +++++++++++++++++++++
> >>  2 files changed, 97 insertions(+), 3 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102868.c
> >>
> >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> >> index d0730253bcc..5d802c1fa96 100644
> >> --- a/gcc/config/rs6000/rs6000.c
> >> +++ b/gcc/config/rs6000/rs6000.c
> >> @@ -23046,7 +23046,23 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
> >>      {OPTION_MASK_P8_VECTOR,
> >>       BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgow_v4sf_direct
> >>                       : CODE_FOR_p8_vmrgew_v4sf_direct,
> >> -     {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}}};
> >> +     {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}},
> >> +    {OPTION_MASK_VSX,
> >> +     (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
> >> +                      : CODE_FOR_vsx_xxpermdi_v16qi),
> >> +     {0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23}},
> >> +    {OPTION_MASK_VSX,
> >> +     (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
> >> +                      : CODE_FOR_vsx_xxpermdi_v16qi),
> >> +     {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23}},
> >> +    {OPTION_MASK_VSX,
> >> +     (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
> >> +                      : CODE_FOR_vsx_xxpermdi_v16qi),
> >> +     {0, 1, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31}},
> >> +    {OPTION_MASK_VSX,
> >> +     (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
> >> +                      : CODE_FOR_vsx_xxpermdi_v16qi),
> >> +     {8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31}}};
> >
> > If the insn_code is the same for big endian and little endian, why
> > does the new code test BYTES_BIG_ENDIAN to set the same value
> > (CODE_FOR_vsx_xxpermdi_v16qi)?
> >
>
> Thanks for the catch. I have updated the patch as below:
>
> [PATCH v2] rs6000: Optimize __builtin_shuffle when it's used to zero the upper bits [PR102868]
>
> If the second operand of __builtin_shuffle is a constant zero vector
> and the mask matches one of a few specific patterns, the shuffle can
> be optimized to vspltisw+xxpermdi instead of lxv.
>
> gcc/ChangeLog:
>
>         * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Add
>         pattern matching and emission for VSX xxpermdi.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/powerpc/pr102868.c: New test.

Okay.

Thanks, David
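
For readers following the thread: each of the four new selectors in the
quoted diff takes its first eight bytes from op0 and its last eight bytes
from op1, so when op1 is a zero vector the result is one doubleword of op0
followed by eight zero bytes. The portable C model below is only a sketch
of that byte-selection arithmetic, not the rs6000 code. The names
perm_bytes, patterns and keeps_one_dword_and_zeroes_rest are hypothetical,
and the model assumes the simple convention that selector values 0..15
index op0 and 16..31 index op1, ignoring target endianness details.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: portable model of a two-operand byte permute.
   Assumption: selector values 0..15 pick from op0, 16..31 from op1.  */
static void
perm_bytes (uint8_t out[16], const uint8_t op0[16],
            const uint8_t op1[16], const uint8_t sel[16])
{
  for (int i = 0; i < 16; i++)
    {
      unsigned s = sel[i] & 31;
      out[i] = (s < 16) ? op0[s] : op1[s - 16];
    }
}

/* The four selectors added by the patch, copied from the diff.  */
static const uint8_t patterns[4][16] = {
  {0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23},
  {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23},
  {0, 1, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31},
  {8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31}
};

/* Hypothetical check: with op1 all zero, does pattern P keep exactly one
   doubleword of op0 in bytes 0..7 and zero bytes 8..15?  Returns 1 if so.  */
static int
keeps_one_dword_and_zeroes_rest (int p)
{
  uint8_t op0[16], out[16];
  uint8_t zero[16] = {0};

  /* Fill op0 with distinct nonzero bytes so copies are recognizable.  */
  for (int i = 0; i < 16; i++)
    op0[i] = (uint8_t) (0xA0 + i);

  perm_bytes (out, op0, zero, patterns[p]);

  /* patterns[p][0] is 0 or 8: the start of the doubleword taken from op0.  */
  unsigned first = patterns[p][0];
  return memcmp (out, op0 + first, 8) == 0
         && memcmp (out + 8, zero, 8) == 0;
}
```

Calling keeps_one_dword_and_zeroes_rest with 0 through 3 exercises each
selector in turn. Since op1 is all zero, the two selectors that differ only
in which half of op1 they read produce identical results in this model.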