[PATCH] tree-optimization/94865 - combine BIT_INSERT_EXPR of BIT_FIELD_REF
Richard Biener
rguenther@suse.de
Thu May 7 12:40:27 GMT 2020
On Thu, 7 May 2020, Richard Sandiford wrote:
> Richard Biener <rguenther@suse.de> writes:
> > On Thu, 7 May 2020, Richard Sandiford wrote:
> >
> >> Richard Biener <rguenther@suse.de> writes:
> >> > This implements patterns combining vector element insertion of
> >> > vector element extraction to a VEC_PERM_EXPR of both vectors
> >> > when supported. Plus it adds the more generic identity transform
> >> > of inserting a piece of itself at the same position.
> >> >
> >> > Richard - is there anything I can do to make this SVE aware?
> >> > I'd need to construct an identity permute and "insert" into
> >> > that permute that element from the other (or same) vector.
> >> > I suppose for most element positions that won't work but
> >> > at least inserting at [0] should? I'm mostly struggling
> >> > on how to use vec_perm_builder here when nelts is not constant,
> >> > since it's derived from vec<> can I simply start with
> >> > a single pattern with 1 stride and then insert by using []?
> >>
> >> I guess for SVE we still want to know that the range is safe
> >> for all VL, so after dropping the is_constant check, we'd
> >> want something like:
> >>
> >> {
> >> poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
> >> unsigned int min_nelts = constant_lower_bound (nelts);
> >> }
> >> (if (...
> >> && at + elemsz <= min_nelts)
> >>
> >> In theory (hah) it should then just be a case of changing the
> >> vec_perm_builder constructor to:
> >>
> >> vec_perm_builder sel (nelts, min_nelts, 3);
> >>
> >> and then iterating over min_nelts * 3 instead of nelts here:
> >>
> >> > + for (unsigned i = 0; i < nelts; ++i)
> >> > + sel.quick_push (i / elemsz == at
> >> > + ? nelts + elem * elemsz + i % elemsz : i);
> >>
> >> So as far as the encoding goes, the first min_nelts elements are arbitrary
> >> values, and the following two min_nelts sequences form individual linear
> >> series.
> >
> > OK - not sure why we need exactly three nelts per pattern here.
>
> There are three styles of encoding (see the VECTOR_CST docs in
> generic.texi for the full gory details):
>
> - replicated {a0,...,an} (1 element per pattern)
>
> - {a0,...,an} followed by replicated {b0,...,bn} (2 elements per pattern)
>
> - {a0,...,an} followed by {b0,...,bn,b0+step0,...,bn+stepn,b0+step0*2,...}
> (3 elements per pattern)
>
> The min_nelts check ensures that the difference from the identity permute
> selector is all in {a0,...,an}. The rest of the vector contains the normal
> elements for an identity selector and extends for as long as the runtime
> VL needs it to extend.
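The 3-elements-per-pattern encoding can be expanded outside GCC as follows (an editorial sketch; `decode` and the flat std::vector representation are hypothetical, GCC stores this in a vector_builder):

```cpp
#include <cassert>
#include <vector>

// Editorial sketch (not GCC code): expand an encoded selector with np
// patterns and 3 elements per pattern into a full vector of length n.
// `enc` holds np * 3 encoded elements; within pattern p, the first
// element enc[p] is arbitrary and the remaining elements form a linear
// series starting at enc[np + p] with step enc[2*np + p] - enc[np + p].
std::vector<long> decode (const std::vector<long> &enc, unsigned np,
                          unsigned n)
{
  std::vector<long> out (n);
  for (unsigned i = 0; i < n; ++i)
    {
      unsigned p = i % np, k = i / np;
      long step = enc[2 * np + p] - enc[np + p];
      out[i] = k == 0 ? enc[p] : enc[np + p] + (long) (k - 1) * step;
    }
  return out;
}
```

For the insert-at-[0] case this gives exactly the shape described: {5, 1, 2} with one pattern expands to {5, 1, 2, 3, 4, 5, 6, 7}, i.e. an arbitrary first element followed by an identity series that extends for as long as the runtime VL requires.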
>
> > It also looks like all the constant_multiple_p () checks constrain
> > things quite a bit.
>
> I don't think it constrains it beyond what we can reasonably do.
> For SVE this is most likely to be useful when converting between
> SVE and Advanced SIMD.
>
> > Oh, and does a BIT_FIELD_REF with poly-int position
> > extract multiple elements in the end?! For the case we are extracting
> > a sub-vector and thus elemsz != 1 we constrain it so that this
> > sub-vector is not of variable size (err, not "independently" so,
> > whatever that means..)?
>
> No, a poly-int position doesn't change how many elements we extract.
> It just defers the calculation of the position until runtime.
>
> > My brain hurts... how do you write a GIMPLE testcase for aarch64
> > SVE covering such cases?
>
> Gimple testcase: with difficulty :-) I don't think we have a
> gimple FE syntax for poly_ints yet.
>
> It might be possible to construct a C testcase using intrinsics.
> I'll give it a go...
>
> >> This ought to work for both SVE and non-SVE, although obviously
> >> there's a bit of wasted work for non-SVE.
> >>
> >> (And thanks for asking :-))
> >
> > So like this, it seems to still work on the x86 testcases?
>
> LGTM. I think the elemsz calculation is going to run into the same
> kind of trouble as PR94980 for AVX/SVE vector booleans, but that
> shouldn't hold the patch up.
I see.  Looks like I have to fix PR88540 first, since both
gcc.target/i386/pr54855-8.c and gcc.target/i386/pr54855-9.c FAIL
after the patch.  Those are testcases for fancy combiner patterns
involving min/max operations on the [0] lane, leaving the rest of
the vector unmodified.  With the patch we enter PRE with
  <bb 2> [local count: 1073741824]:
    _1 = BIT_FIELD_REF <x_3(D), 64, 0>;
    if (_1 > a_4(D))
      goto <bb 3>; [50.00%]
    else
      goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870913]:

  <bb 4> [local count: 1073741824]:
    # iftmp.0_2 = PHI <_1(3), a_4(D)(2)>
    x_5 = BIT_INSERT_EXPR <x_3(D), iftmp.0_2, 0>;
    return x_5;
and of course PRE sees that on one path x is unchanged and turns
it into
  <bb 2> [local count: 1073741824]:
    _1 = BIT_FIELD_REF <x_3(D), 64, 0>;
    if (_1 > a_4(D))
      goto <bb 4>; [50.00%]
    else
      goto <bb 3>; [50.00%]

  <bb 3> [local count: 536870912]:
    _7 = BIT_INSERT_EXPR <x_3(D), a_4(D), 0>;

  <bb 4> [local count: 1073741824]:
    # prephitmp_8 = PHI <_7(3), x_3(D)(2)>
    return prephitmp_8;
defeating RTL if-conversion (there are IEEE_MIN/MAX UNSPECs in the
backend).  Without -ffast-math we cannot use MIN/MAX_EXPR here on
GIMPLE (the testcase passes with -ffast-math).  We do have
IFN_FMIN/IFN_FMAX for IEEE-compatible fmin/fmax, but the x86 backend
has no expanders for those, nor does phiopt use them without
-ffast-math.
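The kind of source the pr54855 tests compile is along these lines (a hedged reconstruction using GCC vector extensions, not the actual testcase; `lane0_max` is a hypothetical name):

```cpp
#include <cassert>

/* Hedged reconstruction (not the actual testcase source) of the kind of
   code gcc.target/i386/pr54855-8.c exercises: a max computed on lane [0]
   only, the other lane left unmodified, which the combiner matches to a
   single scalar maxsd-style instruction.  */
typedef double v2df __attribute__ ((vector_size (16)));

v2df lane0_max (v2df x, double a)
{
  /* On GIMPLE this is a BIT_FIELD_REF of lane 0 feeding a PHI that a
     BIT_INSERT_EXPR writes back -- the shape shown in the dumps above.  */
  x[0] = x[0] > a ? x[0] : a;
  return x;
}
```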
So I'll delay the patch a bit and give that a go (I have halfway
finished phiopt support somewhere lying around).
Richard.