[RFC] avoid type conversion through versioning loop

Wed Mar 24 02:55:30 GMT 2021

On 2021-03-23 16:25, Richard Biener via Gcc wrote:
> On Tue, Mar 23, 2021 at 4:33 AM guojiufu <guojiufu@imap.linux.ibm.com> 
> wrote:
>> 
>> On 2021-03-22 16:31, Jakub Jelinek via Gcc wrote:
>> > On Mon, Mar 22, 2021 at 09:22:26AM +0100, Richard Biener via Gcc wrote:
>> >> Better than doing loop versioning is to enhance SCEV (and thus also
>> >> dependence analysis) to track extra conditions they need to handle
>> >> cases similar to how niter analysis computes its 'assumptions'
>> >> condition.  That allows the versioning to be done when there's an
>> >> actual beneficial transform (like vectorization) rather than just
>> >> upfront for the eventual chance that there'll be any.  Ideally such
>> >> transform would then choose IVs in their transformed copy that
>> >> are analyzable w/o repeating such versioning exercise for the next
>> >> transform.
>> >
>> > And it might be beneficial to perform some type promotion/demotion
>> > pass, either early during vectorization or separately before
>> > vectorization on a loop copy guarded with the ifns e.g. ifconv uses too.
>> > Find out what type sizes the loop uses, first try to demote computations
>> > to narrower types in the vectorized loop candidate (e.g. if something
>> > is computed in a wider type only to have the result demoted to a narrower
>> > type), then pick the widest type size still in use in the loop (ok,
>> > this assumes we don't mix multiple vector sizes in the loop, but
>> > currently our vectorizer doesn't do that) and try to promote computations
>> > that could be promoted to that type size.  We partially do something like
>> > that during vect patterns for bool types, but not for other types I think.
>> >
>> >       Jakub
>> 
>> Thanks for the suggestions!
>> 
>> Enhancing SCEV could help other optimizations and improve performance in
>> some cases.
>> However, one of the direct motivations for using a 64-bit type is to
>> eliminate conversions, even in cases that are not easy to optimize through
>> ifconv/vectorization, for example:
>> 
>>    unsigned int i = 0;
>>    while (a[i]>1e-3)
>>      i++;
>> 
>>    unsigned int i = 0;
>>    while (p1[i] == p2[i] && p1[i] != '\0')
>>      i++;
>> 
>> Or should we only do type versioning for this kind of loop? Any suggestions?
> 
> But the "optimization" resulting from such versioning is hard to
> determine upfront which means we'll pay quite a big code size cost
> for unknown questionable gain.  What's the particular optimization

Right.  Code size growth is a big pain for large loops.  If the gain
is not significant, this optimization may not pay off.

> in the above cases?  Note that for example for
> 
>     unsigned int i = 0;
>     while (a[i]>1e-3)
>        i++;
> 
> you know that when 'i' wraps then the loop will not terminate.  There's

Thanks :) The code would be "while (i < n && a[i] > 1e-3)", so the upper
bound is checkable.  Otherwise, the optimization to avoid the zext is not
applicable.

> the address computation that is i * sizeof (T) which is done in a larger
> type to avoid overflow so we have &a + zext (i) * 8 - is that the operation
> that is 'slow' for you?

This is the point: "zext(i)" is the instruction that I want to eliminate,
which is the direct goal of the optimization.
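To make that operation concrete, here is a source-level sketch of the address arithmetic behind a[i] for 8-byte double elements, assuming an LP64 target (the thread does not name one). The helpers elem_addr_u32/elem_addr_u64 are hypothetical; they model the computation in C, not GCC's internal representation.

```c
#include <stdint.h>

/* Hypothetical helper: address of a[i] with a 32-bit unsigned IV.
   The index is zero-extended to 64 bits before scaling,
   i.e. &a + zext (i) * 8 for 8-byte doubles.  */
static const double *
elem_addr_u32 (const double *a, unsigned int i)
{
  uint64_t wide = (uint64_t) i;  /* the zext paid on every iteration */
  return (const double *) ((const char *) a + wide * sizeof (double));
}

/* Hypothetical helper: with a 64-bit IV the index is already pointer
   width, so only the scaling remains.  */
static const double *
elem_addr_u64 (const double *a, uint64_t i)
{
  return (const double *) ((const char *) a + i * sizeof (double));
}
```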

Whether the gain from eliminating the 'zext' is visible, and whether the
code size increase is small enough, is an open question that needs a
trade-off.  It may only be acceptable when the loop is very small: then
eliminating the 'zext' would help save runtime, and the code size increase
may not be big.
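A hand-written sketch of what versioning on type could look like for the bounded loop, assuming n has type size_t (the thread does not state its type) and an LP64 target. The names scan_orig/scan_versioned are hypothetical, and this is not how GCC would implement the transform; it only shows the shape of the guard and the duplicated loop whose code-size cost is being weighed.

```c
#include <stddef.h>
#include <stdint.h>
#include <limits.h>

/* Original loop: 32-bit unsigned IV compared against a 64-bit bound.
   i may wrap when n > UINT_MAX, so the 32-bit semantics must be kept
   and i is zero-extended for every a[i] access.  */
static size_t
scan_orig (const double *a, size_t n)
{
  unsigned int i = 0;
  while (i < n && a[i] > 1e-3)
    i++;
  return i;
}

/* Hypothetical versioned form: when n <= UINT_MAX the 32-bit i cannot
   wrap (it stops at n at the latest), so a 64-bit IV yields the same
   values and a[i] needs no zext.  Otherwise fall back to the original.  */
static size_t
scan_versioned (const double *a, size_t n)
{
  if (n <= UINT_MAX)
    {
      size_t i = 0;              /* pointer-width IV: no zext needed */
      while (i < n && a[i] > 1e-3)
        i++;
      return i;
    }
  return scan_orig (a, n);       /* unmodified copy for the rare case */
}
```

The two loop bodies in scan_versioned are the code-size cost discussed above: the fast copy only pays off if dropping the per-iteration zext is measurable.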

Thanks again for your very helpful comments!

BR.
Jiufu Guo.

> 
> Richard.
> 
>> BR.
>> Jiufu Guo.