[RFC][PR88838][SVE] Use 32-bit WHILELO in LP64 mode
Kugan Vivekanandarajah
kugan.vivekanandarajah@linaro.org
Thu Jun 6 01:28:00 GMT 2019
Hi Richard,
Thanks for the review. Attached is the latest patch.
For test cases such as cond_arith_1.c, GCC ICEs in fwprop with this patch
applied. As a workaround, I am limiting fwprop in cases like this. Is there
a better fix?
index cf2c9de..2c99285 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -1358,6 +1358,15 @@ forward_propagate_and_simplify (df_ref use, rtx_insn *def_insn, rtx def_set)
else
mode = GET_MODE (*loc);
+ /* TODO. We can't get the mode for
+ (set (reg:VNx16BI 109)
+ (unspec:VNx16BI [
+ (reg:SI 131)
+ (reg:SI 106)
+ ] UNSPEC_WHILE_LO))
+ Thus, bailout when it is UNSPEC and MODEs are not compatible. */
+ if (GET_MODE_CLASS (mode) != GET_MODE_CLASS (GET_MODE (reg)))
+ return false;
new_rtx = propagate_rtx (*loc, mode, reg, src,
optimize_bb_for_speed_p (BLOCK_FOR_INSN (use_insn)));
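
To make the intent of the guard above concrete, here is a minimal standalone
sketch. The enum and function names are hypothetical stand-ins, not GCC's real
mode-class definitions. It only models the idea that a register in an integer
mode (such as reg:SI) is not propagated into a use whose mode belongs to a
different class (such as the VNx16BI predicate result).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC mode classes; not the real enum.  */
enum toy_mode_class
{
  TOY_MODE_INT,          /* models scalar integer modes such as SI */
  TOY_MODE_VECTOR_BOOL   /* models predicate modes such as VNx16BI */
};

/* Return true if forward propagation may be attempted, i.e. the mode
   class of the use matches the mode class of the register being
   replaced.  This mirrors the shape of the check in the hunk above,
   under the assumption that a class mismatch is the only problem.  */
static bool
toy_can_try_propagation (enum toy_mode_class use_class,
                         enum toy_mode_class reg_class)
{
  return use_class == reg_class;
}
```

With this model, an SI register feeding an SI use is still a candidate, while
an SI register feeding a VNx16BI use is rejected before propagate_rtx is
reached.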
Thanks,
Kugan
On Mon, 3 Jun 2019 at 19:08, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org> writes:
> > diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> > index b3fae5b..ad838dd 100644
> > --- a/gcc/tree-vect-loop-manip.c
> > +++ b/gcc/tree-vect-loop-manip.c
> > @@ -415,6 +415,7 @@ vect_set_loop_masks_directly (struct loop *loop, loop_vec_info loop_vinfo,
> > bool might_wrap_p)
> > {
> > tree compare_type = LOOP_VINFO_MASK_COMPARE_TYPE (loop_vinfo);
> > + tree iv_type = LOOP_VINFO_MASK_IV_TYPE (loop_vinfo);
> > tree mask_type = rgm->mask_type;
> > unsigned int nscalars_per_iter = rgm->max_nscalars_per_iter;
> > poly_uint64 nscalars_per_mask = TYPE_VECTOR_SUBPARTS (mask_type);
> > @@ -445,11 +446,16 @@ vect_set_loop_masks_directly (struct loop *loop, loop_vec_info loop_vinfo,
> > tree index_before_incr, index_after_incr;
> > gimple_stmt_iterator incr_gsi;
> > bool insert_after;
> > - tree zero_index = build_int_cst (compare_type, 0);
> > standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> > - create_iv (zero_index, nscalars_step, NULL_TREE, loop, &incr_gsi,
> > +
> > + tree zero_index = build_int_cst (iv_type, 0);
> > + tree step = build_int_cst (iv_type,
> > + LOOP_VINFO_VECT_FACTOR (loop_vinfo));
> > + /* Creating IV of iv_type. */
>
> s/Creating/Create/
>
> > + create_iv (zero_index, step, NULL_TREE, loop, &incr_gsi,
> > insert_after, &index_before_incr, &index_after_incr);
> >
> > + zero_index = build_int_cst (compare_type, 0);
> > tree test_index, test_limit, first_limit;
> > gimple_stmt_iterator *test_gsi;
> > if (might_wrap_p)
> > [...]
> > @@ -1066,11 +1077,17 @@ vect_verify_full_masking (loop_vec_info loop_vinfo)
> > if (this_type
> > && can_produce_all_loop_masks_p (loop_vinfo, this_type))
> > {
> > - /* Although we could stop as soon as we find a valid mode,
> > - it's often better to continue until we hit Pmode, since the
> > + /* See whether zero-based IV would ever generate all-false masks
> > + before wrapping around. */
> > + bool might_wrap_p = (iv_precision > cmp_bits);
> > + /* Stop as soon as we find a valid mode. If we decided to use
> > + cmp_type which is less than Pmode precision, it is often better
> > + to use iv_type corresponding to Pmode, since the
> > operands to the WHILE are more likely to be reusable in
> > - address calculations. */
> > - cmp_type = this_type;
> > + address calculations in this case. */
>
> We're not stopping as soon as we find a valid mode though. Any type
> that satisfies the if condition above is valid, but we pick wider
> cmp_types and iv_types for optimisation reasons. How about:
>
> /* Although we could stop as soon as we find a valid mode,
> there are at least two reasons why that's not always the
> best choice:
>
> - An IV that's Pmode or wider is more likely to be reusable
> in address calculations than an IV that's narrower than
> Pmode.
>
> - Doing the comparison in IV_PRECISION or wider allows
> a natural 0-based IV, whereas using a narrower comparison
> type requires mitigations against wrap-around.
>
> Conversely, if the IV limit is variable, doing the comparison
> in a wider type than the original type can introduce
> unnecessary extensions, so picking the widest valid mode
> is not always a good choice either.
>
> Here we prefer the first IV type that's Pmode or wider,
> and the first comparison type that's IV_PRECISION or wider.
> (The comparison type must be no wider than the IV type,
> to avoid extensions in the vector loop.)
>
> ??? We might want to try continuing beyond Pmode for ILP32
> targets if CMP_BITS < IV_PRECISION. */
>
> > + iv_type = this_type;
> > + if (!cmp_type || iv_precision > TYPE_PRECISION (cmp_type))
> > + cmp_type = this_type;
> > if (cmp_bits >= GET_MODE_BITSIZE (Pmode))
> > break;
> > }
>
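
The selection rule described in the suggested comment can be sketched as a
small standalone C function. This is only an illustration under stated
assumptions: the struct and function names are hypothetical, candidate types
are modelled as plain bit widths in increasing order, and min_valid_bits
stands in for the real validity test (can_produce_all_loop_masks_p and
friends).

```c
/* Hypothetical result pair; not a GCC structure.  */
struct toy_mask_types
{
  unsigned cmp_bits;  /* width chosen for the WHILE comparison */
  unsigned iv_bits;   /* width chosen for the induction variable */
};

/* Walk WIDTHS (sorted narrowest first) and pick:
   - as IV width, the first valid width that is PMODE_BITS or wider;
   - as comparison width, the first valid width that is IV_PRECISION
     or wider.
   MIN_VALID_BITS is an assumed stand-in for the validity check.  */
static struct toy_mask_types
toy_choose_mask_types (const unsigned *widths, unsigned n,
                       unsigned min_valid_bits, unsigned iv_precision,
                       unsigned pmode_bits)
{
  struct toy_mask_types r = { 0, 0 };
  for (unsigned i = 0; i < n; i++)
    {
      unsigned bits = widths[i];
      if (bits < min_valid_bits)
        continue;                 /* not a valid candidate */
      r.iv_bits = bits;           /* keep widening the IV ...  */
      if (r.cmp_bits == 0 || iv_precision > r.cmp_bits)
        r.cmp_bits = bits;        /* ... but widen the compare only if needed */
      if (bits >= pmode_bits)
        break;                    /* reached Pmode: stop */
    }
  return r;
}
```

For example, with candidate widths 8, 16, 32 and 64, min_valid_bits of 16,
an iv_precision of 20 and a 64-bit Pmode, the sketch settles on a 32-bit
comparison and a 64-bit IV, which is the LP64 case the subject line refers to.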
> > [...]
> > @@ -9014,3 +9032,45 @@ optimize_mask_stores (struct loop *loop)
> > add_phi_arg (phi, gimple_vuse (last_store), e, UNKNOWN_LOCATION);
> > }
> > }
> > +
> > +/* Decide whether it is possible to use a zero-based induction variable
> > + when vectorizing LOOP_VINFO with a fully-masked loop. If it is,
> > + return the value that the induction variable must be able to hold
> > + in order to ensure that the loop ends with an all-false mask.
> > + Return -1 otherwise. */
> > +widest_int
> > +vect_iv_limit_for_full_masking (loop_vec_info loop_vinfo)
> > +{
> > + tree niters_skip = LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo);
> > + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > + unsigned HOST_WIDE_INT max_vf = vect_max_vf (loop_vinfo);
> > +
> > + /* Now calculate the value that the induction variable must be able
>
> s/Now calculate/Calculate/
>
> since this comment is no longer following on from earlier code.
>
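
As a rough illustration of how the value returned by
vect_iv_limit_for_full_masking relates to the might_wrap_p test quoted
earlier, here is a standalone sketch. The names are hypothetical, a negative
limit models the documented "return -1" case, and treating an unknown limit
as "might wrap" is an assumption of this sketch rather than something stated
in the hunks above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of bits needed to hold VALUE as an unsigned integer
   (at least 1).  */
static unsigned
toy_min_precision (uint64_t value)
{
  unsigned bits = 1;
  while (value >>= 1)
    bits++;
  return bits;
}

/* Return true if a zero-based IV compared in CMP_BITS bits could wrap
   before producing an all-false mask.  IV_LIMIT < 0 models "no usable
   limit is known" and is conservatively assumed to wrap.  */
static bool
toy_might_wrap_p (int64_t iv_limit, unsigned cmp_bits)
{
  if (iv_limit < 0)
    return true;
  return toy_min_precision ((uint64_t) iv_limit) > cmp_bits;
}
```

Under these assumptions, a limit of 255 fits an 8-bit comparison, while a
limit of 256 needs 9 bits and so an 8-bit comparison might wrap.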
> OK with those changes, thanks.
>
> Richard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-PR88838-V5.patch
Type: text/x-patch
Size: 14838 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20190606/75974e54/attachment.bin>