[PATCH] Add vec_sh{l,r}_v4sf (PR libgomp/91530)
Uros Bizjak
ubizjak@gmail.com
Wed Aug 28 12:03:00 GMT 2019
On Wed, Aug 28, 2019 at 8:45 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> Hi!
>
> The following two testcases FAIL to be vectorized, because SSE2 doesn't have
> many permutation instructions, and the ones that actually work (whole-vector
> shifts) aren't enabled for V4SFmode.
>
> The following patch fixes it by enabling those optabs for V4SFmode as well
> (and V2DFmode). Strictly speaking, we need it only for the VI_128 modes plus
> V4SFmode, but I'm not sure it is worth adding yet another iterator for
> VI_128 + V4SF, and the instructions do work for V2DFmode too; it is just
> that other permutation instructions already handle V2DFmode.
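To make the point about whole-vector shifts concrete, here is a minimal hand-written sketch (not code from the patch or from GCC) of an inclusive prefix sum over four floats that uses only SSE2. It reinterprets the float vector as a 128-bit integer and shifts it by whole bytes with `_mm_slli_si128` (pslldq), which is presumably the kind of permutation the vectorizer needs for scan loops once the `vec_shl_<mode>` expander accepts V4SFmode. The helper names `prefix_sum_v4sf` and `prefix_sum4` are hypothetical, and the sketch assumes an x86 target with SSE2.

```c
#include <emmintrin.h> /* SSE2 intrinsics: _mm_slli_si128, cast helpers */

/* Inclusive prefix sum of four floats using only whole-vector byte
   shifts applied to the float bits (hypothetical helper, x86 SSE2). */
static inline __m128 prefix_sum_v4sf(__m128 x)
{
  /* [x0, x1, x2, x3] + [0, x0, x1, x2]: shift left by 4 bytes (one lane). */
  x = _mm_add_ps(x, _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(x), 4)));
  /* [y0, y1, y2, y3] + [0, 0, y0, y1]: shift left by 8 bytes (two lanes). */
  x = _mm_add_ps(x, _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(x), 8)));
  return x;
}

/* Convenience wrapper on plain arrays (hypothetical). */
void prefix_sum4(const float in[4], float out[4])
{
  _mm_storeu_ps(out, prefix_sum_v4sf(_mm_loadu_ps(in)));
}
```

For `{1, 2, 3, 4}` the two shift-and-add steps yield `{1, 3, 6, 10}`. The shifts move lanes away from element 0, matching the direction of a `vec_shl` on a little-endian target.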
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2019-08-28 Jakub Jelinek <jakub@redhat.com>
>
> PR libgomp/91530
> * config/i386/sse.md (vec_shl_<mode>, vec_shr_<mode>): Use
> V_128 iterator instead of VI_128.
>
> * testsuite/libgomp.c/scan-21.c: New test.
> * testsuite/libgomp.c/scan-22.c: New test.
OK.
(We already use integer shifts in a floating-point context, e.g. the
signbit<mode>2 expander in sse.md.)
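As a rough illustration of what "integer shifts in a floating-point context" means, the sketch below extracts the sign bits of four floats with a per-element integer shift by 31 (psrld) on the reinterpreted float bits. It is written with intrinsics and is only meant to convey the idea; it is not the `signbit<mode>2` expander itself. The function names are hypothetical, and an x86 target with SSE2 is assumed.

```c
#include <emmintrin.h> /* SSE2 intrinsics: _mm_srli_epi32, cast helpers */

/* Sign bit of each of four floats: treat the float bits as int32 lanes
   and shift each lane right logically by 31 (hypothetical helper). */
static inline __m128i signbit_v4sf(__m128 x)
{
  return _mm_srli_epi32(_mm_castps_si128(x), 31);
}

/* Array wrapper (hypothetical); out[i] is 1 if in[i] has its sign bit set. */
void signbit4(const float in[4], int out[4])
{
  _mm_storeu_si128((__m128i *) out, signbit_v4sf(_mm_loadu_ps(in)));
}
```

Note that -0.0f has its sign bit set and therefore yields 1, just as `signbit` does.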
Thanks,
Uros.
> --- gcc/config/i386/sse.md.jj 2019-08-27 12:26:25.385089103 +0200
> +++ gcc/config/i386/sse.md 2019-08-27 13:50:42.594849445 +0200
> @@ -12047,9 +12047,9 @@ (define_insn "<shift_insn><mode>3<mask_n
> (define_expand "vec_shl_<mode>"
> [(set (match_dup 3)
> (ashift:V1TI
> - (match_operand:VI_128 1 "register_operand")
> + (match_operand:V_128 1 "register_operand")
> (match_operand:SI 2 "const_0_to_255_mul_8_operand")))
> - (set (match_operand:VI_128 0 "register_operand") (match_dup 4))]
> + (set (match_operand:V_128 0 "register_operand") (match_dup 4))]
> "TARGET_SSE2"
> {
> operands[1] = gen_lowpart (V1TImode, operands[1]);
> @@ -12060,9 +12060,9 @@ (define_expand "vec_shl_<mode>"
> (define_expand "vec_shr_<mode>"
> [(set (match_dup 3)
> (lshiftrt:V1TI
> - (match_operand:VI_128 1 "register_operand")
> + (match_operand:V_128 1 "register_operand")
> (match_operand:SI 2 "const_0_to_255_mul_8_operand")))
> - (set (match_operand:VI_128 0 "register_operand") (match_dup 4))]
> + (set (match_operand:V_128 0 "register_operand") (match_dup 4))]
> "TARGET_SSE2"
> {
> operands[1] = gen_lowpart (V1TImode, operands[1]);
> --- libgomp/testsuite/libgomp.c/scan-21.c.jj 2019-08-27 22:56:03.805127837 +0200
> +++ libgomp/testsuite/libgomp.c/scan-21.c 2019-08-27 22:58:26.347043679 +0200
> @@ -0,0 +1,6 @@
> +/* { dg-require-effective-target size32plus } */
> +/* { dg-require-effective-target avx_runtime } */
> +/* { dg-additional-options "-O2 -fopenmp -fdump-tree-vect-details -msse2 -mno-sse3" } */
> +/* { dg-final { scan-tree-dump-times "vectorized \[2-6] loops" 2 "vect" } } */
> +
> +#include "scan-13.c"
> --- libgomp/testsuite/libgomp.c/scan-22.c.jj 2019-08-27 22:56:51.034437425 +0200
> +++ libgomp/testsuite/libgomp.c/scan-22.c 2019-08-27 22:59:01.978522645 +0200
> @@ -0,0 +1,6 @@
> +/* { dg-require-effective-target size32plus } */
> +/* { dg-require-effective-target avx_runtime } */
> +/* { dg-additional-options "-O2 -fopenmp -fdump-tree-vect-details -msse2 -mno-sse3" } */
> +/* { dg-final { scan-tree-dump-times "vectorized \[2-6] loops" 2 "vect" } } */
> +
> +#include "scan-17.c"
>
> Jakub
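The new tests only re-include scan-13.c and scan-17.c with `-msse2 -mno-sse3`, and those files are not shown here. For orientation, the following is a hypothetical loop in the OpenMP 5.0 `inscan` style that such tests presumably exercise: a float inclusive scan under `#pragma omp simd`. It is an assumption for illustration, not the contents of either file. When compiled without `-fopenmp`, the pragmas are ignored and the loop runs serially with the same result.

```c
/* Hypothetical float inclusive-scan loop (not the actual scan-13.c). */
#define N 1024

float a[N], b[N];

float inclusive_scan(void)
{
  float r = 0.0f;
  /* OpenMP 5.0 inscan reduction: b[i] receives the running sum through a[i]. */
  #pragma omp simd reduction(inscan, +:r)
  for (int i = 0; i < N; i++)
    {
      r += a[i];
      #pragma omp scan inclusive(r)
      b[i] = r;
    }
  return r;
}
```

With every `a[i]` set to 1.0f, `b[i]` becomes `i + 1` and the function returns `N`.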