[RFC] avoid type conversion through versioning loop

Richard Biener richard.guenther@gmail.com
Wed Mar 24 07:55:29 GMT 2021


On Wed, Mar 24, 2021 at 3:55 AM guojiufu <guojiufu@linux.ibm.com> wrote:
>
> On 2021-03-23 16:25, Richard Biener via Gcc wrote:
> > On Tue, Mar 23, 2021 at 4:33 AM guojiufu <guojiufu@imap.linux.ibm.com>
> > wrote:
> >>
> >> On 2021-03-22 16:31, Jakub Jelinek via Gcc wrote:
> >> > On Mon, Mar 22, 2021 at 09:22:26AM +0100, Richard Biener via Gcc wrote:
> >> >> Better than doing loop versioning is to enhance SCEV (and thus also
> >> >> dependence analysis) to track extra conditions they need to handle
> >> >> cases similar to how niter analysis computes its 'assumptions'
> >> >> condition.  That allows the versioning to be done when there's an
> >> >> actual beneficial transform (like vectorization) rather than just
> >> >> upfront for the eventual chance that there'll be any.  Ideally such
> >> >> transform would then choose IVs in their transformed copy that
> >> >> are analyzable w/o repeating such versioning exercise for the next
> >> >> transform.
> >> >
> >> > And it might be beneficial to perform some type promotion/demotion
> >> > pass, either early during vectorization or separately before
> >> > vectorization
> >> > on a loop copy guarded with the ifns e.g. ifconv uses too.
> >> > Find out what type sizes the loop uses, first try to demote computations
> >> > to narrower types in the vectorized loop candidate (e.g. if something
> >> > is computed in a wider type only to have the result demoted to narrower
> >> > type), then pick up the widest type size still in use in the loop (ok,
> >> > this assumes we don't mix multiple vector sizes in the loop, but
> >> > currently
> >> > our vectorizer doesn't do that) and try to promote computations that
> >> > could
> >> > be promoted to that type size.  We already do something like that
> >> > partially
> >> > during vect patterns for bool types, but not for other types, I think.
> >> >
> >> >       Jakub
> >>
> >> Thanks for the suggestions!
> >>
> >> Enhancing SCEV could help other optimizations and improve performance
> >> in some cases.
> >> That said, one direct motivation for using a 64-bit type is to
> >> eliminate conversions, even in cases that are not easy to optimize
> >> through ifconv/vectorization, for example:
> >>
> >>    unsigned int i = 0;
> >>    while (a[i]>1e-3)
> >>      i++;
> >>
> >>    unsigned int i = 0;
> >>    while (p1[i] == p2[i] && p1[i] != '\0')
> >>      i++;
> >>
> >> Or should we only do versioning on the type for this kind of loop?
> >> Any suggestions?
> >
> > But the "optimization" resulting from such versioning is hard to
> > determine upfront which means we'll pay quite a big code size cost
> > for unknown questionable gain.  What's the particular optimization
>
> Right.  The code size increase is a big pain for large loops.  If the
> gain is not significant, this optimization may not be a net positive.
>
> > in the above cases?  Note that for example for
> >
> >     unsigned int i = 0;
> >     while (a[i]>1e-3)
> >        i++;
> >
> > you know that when 'i' wraps then the loop will not terminate.  There's
>
> Thanks :) The code would be "while (a[i] > 1e-3 && i < n)", so the
> upper bound is checkable.  Otherwise, the optimization to avoid the
> zext is not applicable.
>
> > the address computation that is i * sizeof (T) which is done in a
> > larger
> > type to avoid overflow so we have &a + zext (i) * 8 - is that the
> > operation
> > that is 'slow' for you?
>
> This is the point: "zext(i)" is the instruction that I want to
> eliminate; that is the direct goal of the optimization.
>
> Whether the gain from eliminating the 'zext' is visible, and whether
> the code size increase is small enough, is a trade-off question.
> It may only be acceptable if the loop is very small: then eliminating
> the 'zext' would help save runtime, and the code size increase may
> not be big.

OK, so I indeed think that the desire to micro-optimize a 'zext' doesn't
make versioning a good trade-off.  The micro-architecture should better
not make that totally slow (I'd expect an extra latency comparable to
the multiply or add on the &a + zext(i) * 8 instruction chain).

OTOH making SCEV analysis not give up but instead record the constraints
under which its solution is valid is a very good and useful thing to do.

Richard.

> Thanks again for your very helpful comments!
>
> BR.
> Jiufu Guo.
>
> >
> > Richard.
> >
> >> BR.
> >> Jiufu Guo.
