This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [SVE] PR86753
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> On Mon, 26 Aug 2019 at 14:48, Richard Biener <richard.guenther@gmail.com> wrote:
>>
>> On Sun, Aug 25, 2019 at 11:13 PM Prathamesh Kulkarni
>> <prathamesh.kulkarni@linaro.org> wrote:
>> >
>> > On Fri, 23 Aug 2019 at 19:43, Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> > >
>> > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> > > > On Fri, 23 Aug 2019 at 18:15, Richard Sandiford
>> > > > <richard.sandiford@arm.com> wrote:
>> > > >>
>> > > >> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> > > >> > On Thu, 22 Aug 2019 at 16:44, Richard Biener <richard.guenther@gmail.com> wrote:
>> > > >> >> It looks a bit odd to me. I'd have expected it to work by generating
>> > > >> >> the stmts as before in the vectorizer and then, on the stmts we care
>> > > >> >> about, invoke vn_visit_stmt, which does both value-numbering and elimination.
>> > > >> >> Alternatively you could ask the VN state to generate the stmt for
>> > > >> >> you via vn_nary_build_or_lookup () (certainly that needs a bit more
>> > > >> >> work). One complication might be availability if you don't value-number
>> > > >> >> all stmts in the block, but well. I'm not sure constraining to a single
>> > > >> >> block is necessary - I've thought of having a "CSE"ing gimple_build
>> > > >> >> for some time (add & CSE new stmts onto a sequence), so one
>> > > >> >> should keep this mode in mind when designing the one working on
>> > > >> >> an existing BB. Note that, as you have written it, it depends on
>> > > >> >> visiting the stmts in the proper order - is that guaranteed when,
>> > > >> >> for example, vectorizing SLP?
>> > > >> > Hi,
>> > > >> > Indeed, I wrote the function with the assumption that stmts would be
>> > > >> > visited in the proper order.
>> > > >> > This doesn't affect SLP currently, because the call to vn_visit_stmt in
>> > > >> > vect_transform_stmt is conditional on cond_to_vec_mask, which is only
>> > > >> > allocated inside vect_transform_loop.
>> > > >> > But I agree we could make it more general.
>> > > >> > AFAIU, the idea of constraining VN to a single block was to avoid using
>> > > >> > defs from non-dominating scalar stmts during outer-loop vectorization.
>> > > >>
>> > > >> Maybe we could do the numbering in a separate walk immediately before
>> > > >> the transform phase instead.
>> > > > Sorry, I didn't understand. Do you mean we should do dom-based VN
>> > > > just before the transform phase, or run full VN?
>> > >
>> > > No, I just meant that we could do a separate walk of the contents
>> > > of the basic block:
>> > >
>> > > > @@ -8608,6 +8609,8 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>> > > > {
>> > > > basic_block bb = bbs[i];
>> > > > stmt_vec_info stmt_info;
>> > > > + vn_bb_init (bb);
>> > > > + loop_vinfo->cond_to_vec_mask = new cond_vmask_map_type (8);
>> > > >
>> > >
>> > > ...here, rather than doing it on the fly during vect_transform_stmt
>> > > itself. The walk should be gated on LOOP_VINFO_FULLY_MASKED_P so that
>> > > others don't have to pay the compile-time penalty. (Same for
>> > > cond_to_vec_mask itself really.)
>> > Hi,
>> > Does the attached patch look OK?
>> > In the patch, I put the call to vn_visit_stmt in the bb loop in
>> > vect_transform_loop, to avoid replicating the logic for processing
>> > phis and stmts.
>> > AFAIU, vect_transform_loop_stmt is only called from the bb loop, so
>> > the compile-time penalty for checking cond_to_vec_mask should be
>> > pretty small?
>> > If this is not OK, I will walk the bb immediately before the bb loop.
>>
>> So if I understand correctly you never have vectorizable COND_EXPRs
>> in SLP mode? Because we vectorize all SLP chains before entering
>> the loop in vect_transform_loop where you VN existing scalar(!) stmts.
On the "!": the idea behind the patch is to find cases in which a
scalar condition is used in both a statement that needs to be masked
for correctness reasons and a statement that we can choose to mask
if we want to. It also tries (opportunistically) to match the ?: order
with other conditions.
That's why it's operating on scalar values rather than vector values.
In principle it could be done as a subpass before vectorisation rather
than on the fly, when there aren't any vector stmts around.
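The kind of loop being targeted can be sketched in plain C (a hedged reconstruction to match the ifcvt dump quoted below, not necessarily the exact PR86753 testcase; the function name is illustrative):

```c
/* Sketch of the pattern the patch targets: y[i] != 0 guards the load
   from z (which must be masked for correctness after if-conversion)
   and also selects the result, so the same scalar condition feeds
   both a MASK_LOAD and a COND_EXPR.  */
void
f (int *restrict x, int *restrict y, int *restrict z, int n)
{
  for (int i = 0; i < n; ++i)
    x[i] = y[i] != 0 ? z[i] : 10;
}
```

When vectorizing fully-masked, one would expect the vector form of y[i] != 0 to serve both as the governing mask of the load from z and as the selector of the final result.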
>> Then all this new hash-table stuff should not be needed since this
>> is what VN should provide you with. You of course need to visit
>> generated condition stmts. And condition support is weak
>> in VN due to it possibly having two operations in a single stmt.
>> Bad GIMPLE IL. So I'm not sure VN is up to the task here or
>> why you even need it given you are doing your own hashing?
> Well, we thought of using VN for comparing operands in cases where
> operand_equal_p would not work. Actually, VN seems not to be required
> for the test-cases in the PR, because both conditions are _4 != 0
> (in _35 = _4 != 0 and in the cond_expr), which operand_equal_p can
> match.
Right, that's why I was suggesting in the earlier thread that we
treat value numbering as a follow-on. But...
> Input to vectorizer is:
>
> <bb 3> [local count: 1063004407]:
> # i_20 = PHI <i_16(7), 0(15)>
> # ivtmp_19 = PHI <ivtmp_9(7), 100(15)>
> _1 = (long unsigned int) i_20;
> _2 = _1 * 4;
> _3 = y_11(D) + _2;
> _4 = *_3;
> _5 = z_12(D) + _2;
> _35 = _4 != 0;
> iftmp.0_13 = .MASK_LOAD (_5, 32B, _35);
> iftmp.0_8 = _4 != 0 ? iftmp.0_13 : 10;
>
> In prepare_load_store_mask, we record (ne_expr, _4, 0) -> vec_mask in
> cond_to_vec_mask, and in vectorizable_condition, we look up
> (ne_expr, _4, 0), which does not require VN since the operands are the
> same.
>
> Initially, I was trying to change the generated vectorized code:
>
> mask__35.8_43 = vect__4.7_41 != vect_cst__42;
> vect_iftmp.12_50 = VEC_COND_EXPR <vect__4.7_41 != vect_cst__48,
> vect_iftmp.11_47, vect_cst__49>;
>
> where both conditions are equivalent because vect_cst__42 and
> vect_cst__48 are zero vectors but operand_equal_p failed to catch
> those.
>
> Sorry, I mixed up scalar and vector stmts later.
> I wonder if we then need VN at all? Perhaps there are other cases
> where operands of scalar conditions are equivalent but do not match
> with operand_equal_p?
> In the attached patch, changing operator== to compare using
> operand_equal_p works for the tests.
...those are only very simple cases :-) The problem is that ifcvt
doesn't leave the code free of redundancies, and pattern stmt generation
can create redundancies too. Of course, we could fix those instead.
E.g. for:
void
f (short *restrict x, short *restrict a, short *restrict b, short *restrict c)
{
  for (int i = 0; i < 100; ++i)
    if (a[i] >= 1)
      x[i] = b[i] >= 1 ? a[i] : c[i];
}
ifcvt produces:
  <bb 3> [local count: 1063004407]:
  # i_34 = PHI <i_30(10), 0(21)>
  # ivtmp_5 = PHI <ivtmp_6(10), 100(21)>
  _1 = (long unsigned int) i_34;
  _2 = _1 * 2;
  _3 = a_23(D) + _2;
  _4 = *_3;
  _7 = b_24(D) + _2;
  _49 = _4 > 0;
  _8 = .MASK_LOAD (_7, 16B, _49);
  _12 = _4 > 0;
  _13 = _8 > 0;
  _9 = _12 & _13;
  _10 = _4 > 0;
  _11 = _8 > 0;
  _27 = ~_11;
  _15 = _10 & _27;
  _14 = c_25(D) + _2;
  iftmp.0_26 = .MASK_LOAD (_14, 16B, _15);
  iftmp.0_19 = _9 ? _4 : iftmp.0_26;
  _17 = x_28(D) + _2;
  _50 = _4 > 0;
  .MASK_STORE (_17, 16B, _50, iftmp.0_19);
  i_30 = i_34 + 1;
  ivtmp_6 = ivtmp_5 - 1;
  if (ivtmp_6 != 0)
    goto <bb 10>; [98.99%]
  else
    goto <bb 9>; [1.01%]

  <bb 10> [local count: 1052266994]:
  goto <bb 3>; [100.00%]
which has 4 copies of _4 > 0 (a[i] > 0) and 2 copies of _8 > 0 (b[i] > 0).
Looking through the definition of an SSA name means that we can cope
with these redundancies for single comparisons, but not for comparisons
combined through & and |. This hurts most when trying to decide whether
to invert comparisons.
But maybe value numbering won't cope with that either :-)
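The inversion problem can be seen in a scalar re-expression of the dump above (the helper name and the 0 placeholder for the suppressed store are illustrative): the two load/select masks are (a > 0) & (b > 0) and (a > 0) & ~(b > 0), and recognising the second as the first with b's comparison inverted means looking through both the & and the ~, which structural comparison of the combined conditions cannot do.

```c
#include <assert.h>

/* Scalar re-expression of the per-element masks in the ifcvt output:
   m1 corresponds to _9 = _12 & _13, m2 to _15 = _10 & _27.  Under
   a > 0 the two masks are exact complements, but that relationship is
   only visible by looking through the '&' and the negation.  */
short
select_one (short a, short b, short c)
{
  int m1 = (a > 0) & (b > 0);    /* mask of the x[i] = a[i] branch */
  int m2 = (a > 0) & !(b > 0);   /* mask of the c[i] MASK_LOAD */
  assert (!(m1 && m2));          /* the masks are mutually exclusive */
  return m1 ? a : (m2 ? c : 0);  /* 0 stands for "store suppressed" */
}
```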
Thanks,
Richard