VEC_COND_EXPR optimizations v2

Richard Biener richard.guenther@gmail.com
Fri Aug 7 08:47:28 GMT 2020


On Fri, Aug 7, 2020 at 10:33 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Fri, 7 Aug 2020, Richard Biener wrote:
>
> > On Thu, Aug 6, 2020 at 8:07 PM Marc Glisse <marc.glisse@inria.fr> wrote:
> >>
> >> On Thu, 6 Aug 2020, Christophe Lyon wrote:
> >>
> >>>> Was I on the right track configuring with
> >>>> --target=arm-none-linux-gnueabihf --with-cpu=cortex-a9
> >>>> --with-fpu=neon-fp16
> >>>> then compiling without any special option?
> >>>
> >>> Maybe you also need --with-float=hard, I don't remember if it's
> >>> implied by the 'hf' target suffix
> >>
> >> Thanks! That's what I was missing to reproduce the issue. Now I can
> >> reproduce it with just
> >>
> >> typedef unsigned int vec __attribute__((vector_size(16)));
> >> typedef int vi __attribute__((vector_size(16)));
> >> vi f(vec a,vec b){
> >>      return a==5 | b==7;
> >> }
> >>
> >> with -fdisable-tree-forwprop1 -fdisable-tree-forwprop2 -fdisable-tree-forwprop3 -O1
> >>
> >>    _1 = a_5(D) == { 5, 5, 5, 5 };
> >>    _3 = b_6(D) == { 7, 7, 7, 7 };
> >>    _9 = _1 | _3;
> >>    _7 = .VCOND (_9, { 0, 0, 0, 0 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 107);
> >>
> >> we fail to expand the equality comparison (expand_vec_cmp_expr_p returns
> >> false), while with -fdisable-tree-forwprop4 we do manage to expand
> >>
> >>    _2 = .VCONDU (a_5(D), { 5, 5, 5, 5 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }, 112);
> >>
> >> It doesn't make much sense to me that we can expand the more complicated
> >> form and not the simpler form of the same operation (both compare a to 5
> >> and produce a vector of -1 or 0 of the same size), especially when the
> >> target has an instruction (vceq) that does just what we want.
> >>
> >> Introducing boolean vectors was fine, but I think they should be real
> >> types, that we can operate on, not be forced to appear only as the first
> >> argument of a vcond.
> >>
> >> I can think of 2 natural ways to improve things: either implement vector
> >> comparisons in the ARM backend (possibly by forwarding to their existing
> >> code for vcond), or in the generic expansion code try using vcond if the
> >> direct comparison opcode is not provided.
> >>
> >> We can temporarily revert my patch, but I would like it to be temporary.
> >> Since aarch64 seems to handle the same code just fine, maybe someone who
> >> knows arm could copy the relevant code over?
> >>
> >> Does my message make sense, do people have comments?
> >
> > So what complicates things now (and to some extent pre-existed when you
> > used AVX512 which _could_ operate on boolean vectors) is that we
> > have split out the condition from VEC_COND_EXPR to separate stmts
> > but we do not expect backends to be able to code-generate the separate
> > form - instead we rely on the ISEL pass to trasform VEC_COND_EXPRs
> > to .VCOND[U] "merging" the compares again.  Now that process breaks
> > down once we have things like _9 = _1 | _3;  -  at some point I argued
> > that we should handle vector compares [and operations on boolean vectors]
> > as well in ISEL but then when it came up again for some reason I
> > disregarded that again.
> >
> > Thus - we don't want to go back to fixing up the generic expansion code
> > (which looks at one instruction at a time and is restricted by TER single-use
> > restrictions).
>
> Here, it would only be handling comparisons left over by ISEL because they
> do not feed a VEC_COND_EXPR, maybe not so bad. Handling it in ISEL would
> be more consistent, but then we may have to teach the vector lowering pass
> about this.
>
> > Instead we want to deal with this in ISEL which should
> > behave more intelligently.  In the above case it might involve turning
> > the _1 and _3 defs into .VCOND [with different result type], doing
> > _9 in that type and then somehow dealing with _7 ... but this eventually
> > means undoing the match simplification that introduced the code?
>
> For targets that do not have compact boolean vectors,
> VEC_COND_EXPR<*,-1,0> does nothing, it is just there as a technicality.
> Removing it to allow more simplifications makes sense, reintroducing it
> for expansion can make sense as well, I think it is ok if the second one
> reverses the first, but very locally, without having to propagate a change
> of type to the uses. So on ARM we could turn _1 and _3 into .VCOND
> producing bool:32 vectors, and replace _7 with a VIEW_CONVERT_EXPR. Or
> does using bool vectors like that seem bad? (maybe they aren't guaranteed
> to be equivalent to signed integer vectors with values 0 and -1 and we
> need to query the target to know if that is the case, or introduce an
> extra .VCOND)
>
> Also, we only want to replace comparisons with .VCOND if the target
> doesn't handle the comparison directly. For AVX512, we do want to produce
> SImode bool vectors (for k* registers) and operate on them directly,
> that's the whole point of introducing the bool vectors (if bool vectors
> were only used to feed VEC_COND_EXPR and were all turned into .VCOND
> before expansion, I don't see the point of specifying different types for
> bool vectors for AVX512 vs non-AVX512, as it would make no difference on
> what is passed to the backend).
>
> I think targets should provide some number of operations, including for
> instance bit_and and bit_ior on bool vectors, and be a bit consistent
> about what they provide, it becomes unmanageable in the middle-end
> otherwise...

It's currently somewhat of a mess I guess.  It might help to
restrict the simplification patterns to passes before vector lowering
so maybe add something like (or re-use)
canonicalize_math[_after_vectorization]_p ()?

Richard.

>
> --
> Marc Glisse


More information about the Gcc-patches mailing list