[PATCH] Reintroduce vec_shl_optab and use it for #pragma omp scan inclusive
Richard Biener
rguenther@suse.de
Wed Jun 19 09:02:00 GMT 2019
On June 19, 2019 10:55:16 AM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>Hi!
>
>When VEC_[LR]SHIFT_EXPR was replaced with VEC_PERM_EXPR, vec_shl_optab
>was removed as unused, because only vec_shr_optab was used for the
>reductions.
>Without this patch the vect-simd-*.c tests can be vectorized just fine
>for SSE4 and above, but cannot be with SSE2.  As the comment in
>tree-vect-stmts.c tries to explain, for the inclusive scan operation we
>want (when using V8SImode vectors):
> _30 = MEM <vector(8) int> [(int *)&D.2043];
> _31 = MEM <vector(8) int> [(int *)&D.2042];
> _32 = VEC_PERM_EXPR <_31, _40, { 8, 0, 1, 2, 3, 4, 5, 6 }>;
> _33 = _31 + _32;
> // _33 = { _31[0], _31[0]+_31[1], _31[1]+_31[2], ..., _31[6]+_31[7] };
> _34 = VEC_PERM_EXPR <_33, _40, { 8, 9, 0, 1, 2, 3, 4, 5 }>;
> _35 = _33 + _34;
> // _35 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
> // _31[1]+.._31[4], ... _31[4]+.._31[7] };
> _36 = VEC_PERM_EXPR <_35, _40, { 8, 9, 10, 11, 0, 1, 2, 3 }>;
> _37 = _35 + _36;
> // _37 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
> // _31[0]+.._31[4], ... _31[0]+.._31[7] };
> _38 = _30 + _37;
> _39 = VEC_PERM_EXPR <_38, _38, { 7, 7, 7, 7, 7, 7, 7, 7 }>;
> MEM <vector(8) int> [(int *)&D.2043] = _39;
> MEM <vector(8) int> [(int *)&D.2042] = _38; */
>For V4SImode vectors that would be VEC_PERM_EXPR <x, init, { 4, 0, 1, 2 }>,
>VEC_PERM_EXPR <x2, init, { 4, 5, 0, 1 }> and
>VEC_PERM_EXPR <x3, init, { 3, 3, 3, 3 }> etc.
>Unfortunately, SSE2 can't do the VEC_PERM_EXPR <x, init, { 4, 0, 1, 2 }>
>permutation (the other two it can do).  To be precise, it can do it using
>the whole-vector left shift that had been removed as unused, provided
>that init is initializer_zerop (the vacated elements are filled with
>zeros).
>init usually is all zeros, as that is the neutral element of additive
>reductions and a couple of others too.  In the unlikely case that some
>other reduction is used with scan (multiplication, minimum, maximum,
>bitwise and), we can use a VEC_COND_EXPR with a constant first argument,
>i.e. a blend or an and/or pair.
>
>So, this patch reintroduces vec_shl_optab (most backends actually have
>those patterns already) and handles its expansion and generic vector
>lowering similarly to vec_shr_optab - i.e. it is a VEC_PERM_EXPR whose
>first operand is initializer_zerop and whose third operand starts with a
>few numbers smaller than the number of elements (it doesn't matter which
>ones, as all those elements are the same - zero), followed by nelts,
>nelts+1, nelts+2, ...
>Unlike vec_shr_optab, which has zero as its second operand, this one has
>it as the first operand, because VEC_PERM_EXPR canonicalization wants
>the first selector element to be smaller than the number of elements.
>And unlike vec_shr_optab, where we also have a fallback in
>have_whole_vector_shift using normal permutations, this one doesn't need
>one - that "fallback" is tried first, before vec_shl_optab.
>
>For the vec_shl_optab checks, it tests only for vectors with a constant
>number of elements; I'm not really sure whether our VECTOR_CST encoding
>can express the left shifts in any way, nor whether SVE supports them
>(I see aarch64 has vec_shl_insert, but that is just a fixed shift by
>element bits, and it shifts in a scalar rather than zeros).
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Ok.
Richard.
>2019-06-19 Jakub Jelinek <jakub@redhat.com>
>
> * doc/md.texi: Document vec_shl_<mode> pattern.
> * optabs.def (vec_shl_optab): New optab.
> * optabs.c (shift_amt_for_vec_perm_mask): Add shift_optab
> argument, if == vec_shl_optab, check for left whole vector shift
> pattern rather than right shift.
> (expand_vec_perm_const): Add vec_shl_optab support.
> * optabs-query.c (can_vec_perm_var_p): Mention also vec_shl optab
> in the comment.
> * tree-vect-generic.c (lower_vec_perm): Support permutations which
> can be handled by vec_shl_optab.
> * tree-vect-stmts.c (scan_store_can_perm_p): New function.
> (check_scan_store): Use it.
> (vectorizable_scan_store): If target can't do normal permutations,
> try to use whole vector left shifts and if needed a VEC_COND_EXPR
> after it.
> * config/i386/sse.md (vec_shl_<mode>): New expander.
>
> * gcc.dg/vect/vect-simd-8.c: If main is defined, don't include
> tree-vect.h nor call check_vect.
> * gcc.dg/vect/vect-simd-9.c: Likewise.
> * gcc.dg/vect/vect-simd-10.c: New test.
> * gcc.target/i386/sse2-vect-simd-8.c: New test.
> * gcc.target/i386/sse2-vect-simd-9.c: New test.
> * gcc.target/i386/sse2-vect-simd-10.c: New test.
> * gcc.target/i386/avx2-vect-simd-8.c: New test.
> * gcc.target/i386/avx2-vect-simd-9.c: New test.
> * gcc.target/i386/avx2-vect-simd-10.c: New test.
> * gcc.target/i386/avx512f-vect-simd-8.c: New test.
> * gcc.target/i386/avx512f-vect-simd-9.c: New test.
> * gcc.target/i386/avx512f-vect-simd-10.c: New test.
>
>--- gcc/doc/md.texi.jj 2019-06-13 00:35:43.518942525 +0200
>+++ gcc/doc/md.texi 2019-06-18 15:32:38.496629946 +0200
>@@ -5454,6 +5454,14 @@ in operand 2. Store the result in vecto
> 0 and 1 have mode @var{m} and operand 2 has the mode appropriate for
> one element of @var{m}.
>
>+@cindex @code{vec_shl_@var{m}} instruction pattern
>+@item @samp{vec_shl_@var{m}}
>+Whole vector left shift in bits, i.e.@: away from element 0.
>+Operand 1 is a vector to be shifted.
>+Operand 2 is an integer shift amount in bits.
>+Operand 0 is where the resulting shifted vector is stored.
>+The output and input vectors should have the same modes.
>+
> @cindex @code{vec_shr_@var{m}} instruction pattern
> @item @samp{vec_shr_@var{m}}
> Whole vector right shift in bits, i.e.@: towards element 0.
>--- gcc/optabs.def.jj 2019-02-11 11:38:08.263617017 +0100
>+++ gcc/optabs.def 2019-06-18 14:56:57.934971410 +0200
>@@ -348,6 +348,7 @@ OPTAB_D (vec_packu_float_optab, "vec_pac
> OPTAB_D (vec_perm_optab, "vec_perm$a")
> OPTAB_D (vec_realign_load_optab, "vec_realign_load_$a")
> OPTAB_D (vec_set_optab, "vec_set$a")
>+OPTAB_D (vec_shl_optab, "vec_shl_$a")
> OPTAB_D (vec_shr_optab, "vec_shr_$a")
> OPTAB_D (vec_unpack_sfix_trunc_hi_optab, "vec_unpack_sfix_trunc_hi_$a")
> OPTAB_D (vec_unpack_sfix_trunc_lo_optab, "vec_unpack_sfix_trunc_lo_$a")
>--- gcc/optabs.c.jj 2019-02-13 13:11:47.927612362 +0100
>+++ gcc/optabs.c 2019-06-18 16:45:29.347895585 +0200
>@@ -5444,19 +5444,45 @@ vector_compare_rtx (machine_mode cmp_mod
> }
>
> /* Check if vec_perm mask SEL is a constant equivalent to a shift of
>-   the first vec_perm operand, assuming the second operand is a constant
>-   vector of zeros.  Return the shift distance in bits if so, or NULL_RTX
>-   if the vec_perm is not a shift.  MODE is the mode of the value being
>-   shifted.  */
>+   the first vec_perm operand, assuming the second operand (for left shift
>+   first operand) is a constant vector of zeros.  Return the shift distance
>+   in bits if so, or NULL_RTX if the vec_perm is not a shift.  MODE is the
>+   mode of the value being shifted.  SHIFT_OPTAB is vec_shr_optab for right
>+   shift or vec_shl_optab for left shift.  */
> static rtx
>-shift_amt_for_vec_perm_mask (machine_mode mode, const vec_perm_indices &sel)
>+shift_amt_for_vec_perm_mask (machine_mode mode, const vec_perm_indices &sel,
>+			     optab shift_optab)
> {
> unsigned int bitsize = GET_MODE_UNIT_BITSIZE (mode);
> poly_int64 first = sel[0];
> if (maybe_ge (sel[0], GET_MODE_NUNITS (mode)))
> return NULL_RTX;
>
>- if (!sel.series_p (0, 1, first, 1))
>+ if (shift_optab == vec_shl_optab)
>+ {
>+ unsigned int nelt;
>+ if (!GET_MODE_NUNITS (mode).is_constant (&nelt))
>+ return NULL_RTX;
>+ unsigned firstidx = 0;
>+ for (unsigned int i = 0; i < nelt; i++)
>+ {
>+ if (known_eq (sel[i], nelt))
>+ {
>+ if (i == 0 || firstidx)
>+ return NULL_RTX;
>+ firstidx = i;
>+ }
>+ else if (firstidx
>+ ? maybe_ne (sel[i], nelt + i - firstidx)
>+ : maybe_ge (sel[i], nelt))
>+ return NULL_RTX;
>+ }
>+
>+ if (firstidx == 0)
>+ return NULL_RTX;
>+ first = firstidx;
>+ }
>+ else if (!sel.series_p (0, 1, first, 1))
> {
> unsigned int nelt;
> if (!GET_MODE_NUNITS (mode).is_constant (&nelt))
>@@ -5544,25 +5570,37 @@ expand_vec_perm_const (machine_mode mode
> target instruction. */
> vec_perm_indices indices (sel, 2, GET_MODE_NUNITS (mode));
>
>-  /* See if this can be handled with a vec_shr.  We only do this if the
>-     second vector is all zeroes.  */
>-  insn_code shift_code = optab_handler (vec_shr_optab, mode);
>-  insn_code shift_code_qi = ((qimode != VOIDmode && qimode != mode)
>-			     ? optab_handler (vec_shr_optab, qimode)
>-			     : CODE_FOR_nothing);
>-
>-  if (v1 == CONST0_RTX (GET_MODE (v1))
>-      && (shift_code != CODE_FOR_nothing
>-	  || shift_code_qi != CODE_FOR_nothing))
>+  /* See if this can be handled with a vec_shr or vec_shl.  We only do this
>+     if the second (for vec_shr) or first (for vec_shl) vector is all
>+     zeroes.  */
>+  insn_code shift_code = CODE_FOR_nothing;
>+ insn_code shift_code_qi = CODE_FOR_nothing;
>+ optab shift_optab = unknown_optab;
>+ rtx v2 = v0;
>+ if (v1 == CONST0_RTX (GET_MODE (v1)))
>+ shift_optab = vec_shr_optab;
>+ else if (v0 == CONST0_RTX (GET_MODE (v0)))
>+ {
>+ shift_optab = vec_shl_optab;
>+ v2 = v1;
>+ }
>+ if (shift_optab != unknown_optab)
>+ {
>+ shift_code = optab_handler (shift_optab, mode);
>+ shift_code_qi = ((qimode != VOIDmode && qimode != mode)
>+ ? optab_handler (shift_optab, qimode)
>+ : CODE_FOR_nothing);
>+ }
>+  if (shift_code != CODE_FOR_nothing || shift_code_qi != CODE_FOR_nothing)
>     {
>-      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, indices);
>+      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, indices,
>+						   shift_optab);
> if (shift_amt)
> {
> struct expand_operand ops[3];
> if (shift_code != CODE_FOR_nothing)
> {
> create_output_operand (&ops[0], target, mode);
>- create_input_operand (&ops[1], v0, mode);
>+ create_input_operand (&ops[1], v2, mode);
> create_convert_operand_from_type (&ops[2], shift_amt, sizetype);
> if (maybe_expand_insn (shift_code, 3, ops))
> return ops[0].value;
>@@ -5571,7 +5609,7 @@ expand_vec_perm_const (machine_mode mode
> {
> rtx tmp = gen_reg_rtx (qimode);
> create_output_operand (&ops[0], tmp, qimode);
>-	  create_input_operand (&ops[1], gen_lowpart (qimode, v0), qimode);
>+	  create_input_operand (&ops[1], gen_lowpart (qimode, v2), qimode);
> create_convert_operand_from_type (&ops[2], shift_amt, sizetype);
> if (maybe_expand_insn (shift_code_qi, 3, ops))
> return gen_lowpart (mode, ops[0].value);
>--- gcc/optabs-query.c.jj 2019-05-20 11:40:16.691121967 +0200
>+++ gcc/optabs-query.c 2019-06-18 15:26:53.028980804 +0200
>@@ -415,8 +415,9 @@ can_vec_perm_var_p (machine_mode mode)
> permute (if the target supports that).
>
> Note that additional permutations representing whole-vector shifts may
>-   also be handled via the vec_shr optab, but only where the second input
>-   vector is entirely constant zeroes; this case is not dealt with here.  */
>+   also be handled via the vec_shr or vec_shl optab, but only where the
>+   second input vector is entirely constant zeroes; this case is not dealt
>+   with here.  */
>
> bool
> can_vec_perm_const_p (machine_mode mode, const vec_perm_indices &sel,
>--- gcc/tree-vect-generic.c.jj 2019-01-07 09:47:32.988518893 +0100
>+++ gcc/tree-vect-generic.c 2019-06-18 16:35:29.033319526 +0200
>@@ -1367,6 +1367,32 @@ lower_vec_perm (gimple_stmt_iterator *gs
> return;
> }
> }
>+ /* And similarly vec_shl pattern. */
>+ if (optab_handler (vec_shl_optab, TYPE_MODE (vect_type))
>+ != CODE_FOR_nothing
>+ && TREE_CODE (vec0) == VECTOR_CST
>+ && initializer_zerop (vec0))
>+ {
>+ unsigned int first = 0;
>+ for (i = 0; i < elements; ++i)
>+ if (known_eq (poly_uint64 (indices[i]), elements))
>+ {
>+ if (i == 0 || first)
>+ break;
>+ first = i;
>+ }
>+ else if (first
>+ ? maybe_ne (poly_uint64 (indices[i]),
>+ elements + i - first)
>+ : maybe_ge (poly_uint64 (indices[i]), elements))
>+ break;
>+ if (i == elements)
>+ {
>+ gimple_assign_set_rhs3 (stmt, mask);
>+ update_stmt (stmt);
>+ return;
>+ }
>+ }
> }
> else if (can_vec_perm_var_p (TYPE_MODE (vect_type)))
> return;
>--- gcc/tree-vect-stmts.c.jj 2019-06-17 23:18:53.620850072 +0200
>+++ gcc/tree-vect-stmts.c 2019-06-18 17:43:27.484350807 +0200
>@@ -6356,6 +6356,71 @@ scan_operand_equal_p (tree ref1, tree re
>
> /* Function check_scan_store.
>
>+   Verify if we can perform the needed permutations or whole vector shifts.
>+   Return -1 on failure, otherwise exact log2 of vectype's nunits.  */
>+
>+static int
>+scan_store_can_perm_p (tree vectype, tree init, int *use_whole_vector_p = NULL)
>+{
>+ enum machine_mode vec_mode = TYPE_MODE (vectype);
>+ unsigned HOST_WIDE_INT nunits;
>+ if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
>+ return -1;
>+ int units_log2 = exact_log2 (nunits);
>+ if (units_log2 <= 0)
>+ return -1;
>+
>+ int i;
>+ for (i = 0; i <= units_log2; ++i)
>+ {
>+ unsigned HOST_WIDE_INT j, k;
>+ vec_perm_builder sel (nunits, nunits, 1);
>+ sel.quick_grow (nunits);
>+ if (i == 0)
>+ {
>+ for (j = 0; j < nunits; ++j)
>+ sel[j] = nunits - 1;
>+ }
>+ else
>+ {
>+ for (j = 0; j < (HOST_WIDE_INT_1U << (i - 1)); ++j)
>+ sel[j] = j;
>+ for (k = 0; j < nunits; ++j, ++k)
>+ sel[j] = nunits + k;
>+ }
>+ vec_perm_indices indices (sel, i == 0 ? 1 : 2, nunits);
>+ if (!can_vec_perm_const_p (vec_mode, indices))
>+ break;
>+ }
>+
>+ if (i == 0)
>+ return -1;
>+
>+ if (i <= units_log2)
>+ {
>+ if (optab_handler (vec_shl_optab, vec_mode) == CODE_FOR_nothing)
>+ return -1;
>+ int kind = 1;
>+      /* Whole vector shifts shift in zeros, so if init is all zero
>+	 constant, there is no need to do anything further.  */
>+ if ((TREE_CODE (init) != INTEGER_CST
>+ && TREE_CODE (init) != REAL_CST)
>+ || !initializer_zerop (init))
>+ {
>+ tree masktype = build_same_sized_truth_vector_type (vectype);
>+ if (!expand_vec_cond_expr_p (vectype, masktype, VECTOR_CST))
>+ return -1;
>+ kind = 2;
>+ }
>+ if (use_whole_vector_p)
>+ *use_whole_vector_p = kind;
>+ }
>+ return units_log2;
>+}
>+
>+
>+/* Function check_scan_store.
>+
> Check magic stores for #pragma omp scan {in,ex}clusive reductions. */
>
> static bool
>@@ -6596,34 +6661,9 @@ check_scan_store (stmt_vec_info stmt_inf
> if (!optab || optab_handler (optab, vec_mode) == CODE_FOR_nothing)
> goto fail;
>
>- unsigned HOST_WIDE_INT nunits;
>- if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
>+ int units_log2 = scan_store_can_perm_p (vectype, *init);
>+ if (units_log2 == -1)
> goto fail;
>- int units_log2 = exact_log2 (nunits);
>- if (units_log2 <= 0)
>- goto fail;
>-
>- for (int i = 0; i <= units_log2; ++i)
>- {
>- unsigned HOST_WIDE_INT j, k;
>- vec_perm_builder sel (nunits, nunits, 1);
>- sel.quick_grow (nunits);
>- if (i == units_log2)
>- {
>- for (j = 0; j < nunits; ++j)
>- sel[j] = nunits - 1;
>- }
>- else
>- {
>- for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
>- sel[j] = nunits + j;
>- for (k = 0; j < nunits; ++j, ++k)
>- sel[j] = k;
>- }
>- vec_perm_indices indices (sel, i == units_log2 ? 1 : 2, nunits);
>- if (!can_vec_perm_const_p (vec_mode, indices))
>- goto fail;
>- }
>
> return true;
> }
>@@ -6686,7 +6726,8 @@ vectorizable_scan_store (stmt_vec_info s
> unsigned HOST_WIDE_INT nunits;
> if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
> gcc_unreachable ();
>- int units_log2 = exact_log2 (nunits);
>+ int use_whole_vector_p = 0;
>+  int units_log2 = scan_store_can_perm_p (vectype, *init,
>+					  &use_whole_vector_p);
> gcc_assert (units_log2 > 0);
> auto_vec<tree, 16> perms;
> perms.quick_grow (units_log2 + 1);
>@@ -6696,21 +6737,25 @@ vectorizable_scan_store (stmt_vec_info s
> vec_perm_builder sel (nunits, nunits, 1);
> sel.quick_grow (nunits);
> if (i == units_log2)
>- {
>- for (j = 0; j < nunits; ++j)
>- sel[j] = nunits - 1;
>- }
>- else
>- {
>- for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
>- sel[j] = nunits + j;
>- for (k = 0; j < nunits; ++j, ++k)
>- sel[j] = k;
>- }
>+ for (j = 0; j < nunits; ++j)
>+ sel[j] = nunits - 1;
>+ else
>+ {
>+ for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
>+ sel[j] = j;
>+ for (k = 0; j < nunits; ++j, ++k)
>+ sel[j] = nunits + k;
>+ }
> vec_perm_indices indices (sel, i == units_log2 ? 1 : 2, nunits);
>- perms[i] = vect_gen_perm_mask_checked (vectype, indices);
>+ if (use_whole_vector_p && i < units_log2)
>+ perms[i] = vect_gen_perm_mask_any (vectype, indices);
>+ else
>+ perms[i] = vect_gen_perm_mask_checked (vectype, indices);
> }
>
>+  tree zero_vec = use_whole_vector_p ? build_zero_cst (vectype) : NULL_TREE;
>+  tree masktype = (use_whole_vector_p == 2
>+		   ? build_same_sized_truth_vector_type (vectype) : NULL_TREE);
> stmt_vec_info prev_stmt_info = NULL;
> tree vec_oprnd1 = NULL_TREE;
> tree vec_oprnd2 = NULL_TREE;
>@@ -6742,8 +6787,9 @@ vectorizable_scan_store (stmt_vec_info s
> for (int i = 0; i < units_log2; ++i)
> {
> tree new_temp = make_ssa_name (vectype);
>- gimple *g = gimple_build_assign (new_temp, VEC_PERM_EXPR, v,
>- vec_oprnd1, perms[i]);
>+ gimple *g = gimple_build_assign (new_temp, VEC_PERM_EXPR,
>+ zero_vec ? zero_vec : vec_oprnd1, v,
>+ perms[i]);
> new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> if (prev_stmt_info == NULL)
> STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt_info;
>@@ -6751,6 +6797,25 @@ vectorizable_scan_store (stmt_vec_info s
> STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> prev_stmt_info = new_stmt_info;
>
>+ if (use_whole_vector_p == 2)
>+ {
>+ /* Whole vector shift shifted in zero bits, but if *init
>+ is not initializer_zerop, we need to replace those elements
>+ with elements from vec_oprnd1. */
>+ tree_vector_builder vb (masktype, nunits, 1);
>+ for (unsigned HOST_WIDE_INT k = 0; k < nunits; ++k)
>+ vb.quick_push (k < (HOST_WIDE_INT_1U << i)
>+ ? boolean_false_node : boolean_true_node);
>+
>+ tree new_temp2 = make_ssa_name (vectype);
>+ g = gimple_build_assign (new_temp2, VEC_COND_EXPR, vb.build (),
>+ new_temp, vec_oprnd1);
>+	  new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
>+ STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
>+ prev_stmt_info = new_stmt_info;
>+ new_temp = new_temp2;
>+ }
>+
> tree new_temp2 = make_ssa_name (vectype);
> g = gimple_build_assign (new_temp2, code, v, new_temp);
> new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
>--- gcc/config/i386/sse.md.jj 2019-06-17 23:18:26.821267440 +0200
>+++ gcc/config/i386/sse.md 2019-06-18 15:37:28.342043528 +0200
>@@ -11758,6 +11758,19 @@ (define_insn "<shift_insn><mode>3<mask_n
> (set_attr "mode" "<sseinsnmode>")])
>
>
>+(define_expand "vec_shl_<mode>"
>+ [(set (match_dup 3)
>+ (ashift:V1TI
>+ (match_operand:VI_128 1 "register_operand")
>+ (match_operand:SI 2 "const_0_to_255_mul_8_operand")))
>+ (set (match_operand:VI_128 0 "register_operand") (match_dup 4))]
>+ "TARGET_SSE2"
>+{
>+ operands[1] = gen_lowpart (V1TImode, operands[1]);
>+ operands[3] = gen_reg_rtx (V1TImode);
>+ operands[4] = gen_lowpart (<MODE>mode, operands[3]);
>+})
>+
> (define_expand "vec_shr_<mode>"
> [(set (match_dup 3)
> (lshiftrt:V1TI
>--- gcc/testsuite/gcc.dg/vect/vect-simd-8.c.jj	2019-06-17 23:18:53.621850057 +0200
>+++ gcc/testsuite/gcc.dg/vect/vect-simd-8.c	2019-06-18 18:02:09.428798006 +0200
>@@ -3,7 +3,9 @@
> /* { dg-additional-options "-mavx" { target avx_runtime } } */
> /* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" { target i?86-*-* x86_64-*-* } } } */
>
>+#ifndef main
> #include "tree-vect.h"
>+#endif
>
> int r, a[1024], b[1024];
>
>@@ -63,7 +65,9 @@ int
> main ()
> {
> int s = 0;
>+#ifndef main
> check_vect ();
>+#endif
> for (int i = 0; i < 1024; ++i)
> {
> a[i] = i;
>--- gcc/testsuite/gcc.dg/vect/vect-simd-9.c.jj	2019-06-17 23:18:53.621850057 +0200
>+++ gcc/testsuite/gcc.dg/vect/vect-simd-9.c	2019-06-18 18:02:34.649406773 +0200
>@@ -3,7 +3,9 @@
> /* { dg-additional-options "-mavx" { target avx_runtime } } */
> /* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" { target i?86-*-* x86_64-*-* } } } */
>
>+#ifndef main
> #include "tree-vect.h"
>+#endif
>
> int r, a[1024], b[1024];
>
>@@ -65,7 +67,9 @@ int
> main ()
> {
> int s = 0;
>+#ifndef main
> check_vect ();
>+#endif
> for (int i = 0; i < 1024; ++i)
> {
> a[i] = i;
>--- gcc/testsuite/gcc.dg/vect/vect-simd-10.c.jj	2019-06-18 18:37:30.742838613 +0200
>+++ gcc/testsuite/gcc.dg/vect/vect-simd-10.c	2019-06-18 19:44:20.614082076 +0200
>@@ -0,0 +1,96 @@
>+/* { dg-require-effective-target size32plus } */
>+/* { dg-additional-options "-fopenmp-simd" } */
>+/* { dg-additional-options "-mavx" { target avx_runtime } } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" { target i?86-*-* x86_64-*-* } } } */
>+
>+#ifndef main
>+#include "tree-vect.h"
>+#endif
>+
>+float r = 1.0f, a[1024], b[1024];
>+
>+__attribute__((noipa)) void
>+foo (float *a, float *b)
>+{
>+ #pragma omp simd reduction (inscan, *:r)
>+ for (int i = 0; i < 1024; i++)
>+ {
>+ r *= a[i];
>+ #pragma omp scan inclusive(r)
>+ b[i] = r;
>+ }
>+}
>+
>+__attribute__((noipa)) float
>+bar (void)
>+{
>+ float s = -__builtin_inff ();
>+ #pragma omp simd reduction (inscan, max:s)
>+ for (int i = 0; i < 1024; i++)
>+ {
>+ s = s > a[i] ? s : a[i];
>+ #pragma omp scan inclusive(s)
>+ b[i] = s;
>+ }
>+ return s;
>+}
>+
>+int
>+main ()
>+{
>+ float s = 1.0f;
>+#ifndef main
>+ check_vect ();
>+#endif
>+ for (int i = 0; i < 1024; ++i)
>+ {
>+ if (i < 80)
>+ a[i] = (i & 1) ? 0.25f : 0.5f;
>+ else if (i < 200)
>+ a[i] = (i % 3) == 0 ? 2.0f : (i % 3) == 1 ? 4.0f : 1.0f;
>+ else if (i < 280)
>+ a[i] = (i & 1) ? 0.25f : 0.5f;
>+ else if (i < 380)
>+ a[i] = (i % 3) == 0 ? 2.0f : (i % 3) == 1 ? 4.0f : 1.0f;
>+ else
>+ switch (i % 6)
>+ {
>+ case 0: a[i] = 0.25f; break;
>+ case 1: a[i] = 2.0f; break;
>+ case 2: a[i] = -1.0f; break;
>+ case 3: a[i] = -4.0f; break;
>+ case 4: a[i] = 0.5f; break;
>+ case 5: a[i] = 1.0f; break;
>+ default: a[i] = 0.0f; break;
>+ }
>+ b[i] = -19.0f;
>+ asm ("" : "+g" (i));
>+ }
>+ foo (a, b);
>+ if (r * 16384.0f != 0.125f)
>+ abort ();
>+ float m = -175.25f;
>+ for (int i = 0; i < 1024; ++i)
>+ {
>+ s *= a[i];
>+ if (b[i] != s)
>+ abort ();
>+ else
>+ {
>+ a[i] = m - ((i % 3) == 1 ? 2.0f : (i % 3) == 2 ? 4.0f : 0.0f);
>+ b[i] = -231.75f;
>+ m += 0.75f;
>+ }
>+ }
>+ if (bar () != 592.0f)
>+ abort ();
>+ s = -__builtin_inff ();
>+ for (int i = 0; i < 1024; ++i)
>+ {
>+ if (s < a[i])
>+ s = a[i];
>+ if (b[i] != s)
>+ abort ();
>+ }
>+ return 0;
>+}
>--- gcc/testsuite/gcc.target/i386/sse2-vect-simd-8.c.jj	2019-06-18 17:59:27.182314827 +0200
>+++ gcc/testsuite/gcc.target/i386/sse2-vect-simd-8.c	2019-06-18 18:19:48.417341734 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -msse2 -mno-sse3 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target sse2 } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "sse2-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-8.c"
>+
>+static void
>+sse2_test (void)
>+{
>+ do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/sse2-vect-simd-9.c.jj	2019-06-18 18:03:30.174545446 +0200
>+++ gcc/testsuite/gcc.target/i386/sse2-vect-simd-9.c	2019-06-18 18:20:05.770072628 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -msse2 -mno-sse3 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target sse2 } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "sse2-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-9.c"
>+
>+static void
>+sse2_test (void)
>+{
>+ do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/sse2-vect-simd-10.c.jj	2019-06-18 19:46:09.015410603 +0200
>+++ gcc/testsuite/gcc.target/i386/sse2-vect-simd-10.c	2019-06-18 19:50:31.621361409 +0200
>@@ -0,0 +1,15 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -msse2 -mno-sse3 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target sse2 } */
>+
>+#include "sse2-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-10.c"
>+
>+static void
>+sse2_test (void)
>+{
>+ do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/avx2-vect-simd-8.c.jj	2019-06-18 17:59:27.182314827 +0200
>+++ gcc/testsuite/gcc.target/i386/avx2-vect-simd-8.c	2019-06-18 18:19:40.310467451 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -mavx2 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target avx2 } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "avx2-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-8.c"
>+
>+static void
>+avx2_test (void)
>+{
>+ do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/avx2-vect-simd-9.c.jj	2019-06-18 18:03:30.174545446 +0200
>+++ gcc/testsuite/gcc.target/i386/avx2-vect-simd-9.c	2019-06-18 18:19:56.479216712 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -mavx2 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target avx2 } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "avx2-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-9.c"
>+
>+static void
>+avx2_test (void)
>+{
>+ do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/avx2-vect-simd-10.c.jj	2019-06-18 19:50:47.692113611 +0200
>+++ gcc/testsuite/gcc.target/i386/avx2-vect-simd-10.c	2019-06-18 19:50:56.180982721 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -mavx2 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target avx2 } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "avx2-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-10.c"
>+
>+static void
>+avx2_test (void)
>+{
>+ do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/avx512f-vect-simd-8.c.jj	2019-06-18 17:59:27.182314827 +0200
>+++ gcc/testsuite/gcc.target/i386/avx512f-vect-simd-8.c	2019-06-18 18:19:44.364404586 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -mavx512f -mprefer-vector-width=512 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target avx512f } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "avx512f-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-8.c"
>+
>+static void
>+avx512f_test (void)
>+{
>+ do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/avx512f-vect-simd-9.c.jj	2019-06-18 18:03:30.174545446 +0200
>+++ gcc/testsuite/gcc.target/i386/avx512f-vect-simd-9.c	2019-06-18 18:20:00.884148400 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -mavx512f -mprefer-vector-width=512 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target avx512f } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "avx512f-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-9.c"
>+
>+static void
>+avx512f_test (void)
>+{
>+ do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/avx512f-vect-simd-10.c.jj	2019-06-18 19:51:12.309734025 +0200
>+++ gcc/testsuite/gcc.target/i386/avx512f-vect-simd-10.c	2019-06-18 19:51:18.285641883 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -mavx512f -mprefer-vector-width=512 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target avx512f } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "avx512f-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-10.c"
>+
>+static void
>+avx512f_test (void)
>+{
>+ do_main ();
>+}
>
> Jakub