[PATCH PR69564]Improve vectorizer's runtime alias check for wrapping type

Bin.Cheng amker.cheng@gmail.com
Mon Feb 27 12:47:00 GMT 2017


On Fri, Feb 24, 2017 at 12:34 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Feb 24, 2017 at 11:53 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>> Hi,
>> As analyzed in PR69564, inefficient code for runtime alias check is generated in benchmark
>> scimark2.  It is suspected vectorized loop doesn't run enough iterations to cover the loss
>> of the inefficient code.  This patch improves runtime alias check for (unsigned) wrapping
>> types.
>>
>> Originally, vectorizer checks if below two ranges overlap with each other
>>     range a: [min_a, max_a]  ; abs(length_a) = max_a - min_a
>>     range_b: [min_b, max_b]  ; abs(length_b) = max_b - min_b
>> with condition like:
>>     max_a <= min_b || max_b <= min_a
>>
>> With knowledge of length of the two ranges, this patch checks overlap with condition like:
>>     (min_b - min_a) >= abs(len_a) && (min_b - min_a) <= (- abs(len_b))
>> It's better because common sub expressions in b/a can be folded.
>>
>> Note there is an implicit condition for above statements that length of range needs to be
>> no larger than middle value of the unsigned type.  This is always true since object never
>> spans over half address space in GCC.
>>
>> As mentioned, this is only done for case with wrap type and also constant segment length
>> for both a and b, which is the most common case.  It is possible to improve signed cases
>> or compilation time unknown segment length cases too, but may not worth the effort.
>>
>> Unfortunately, test case is hard to create for such code optimization.
>> Bootstrap and test on x86)64 and AArch64.  Is it OK?
>
> The patch mixes the above together with better folding (cancelling
> equal offset, decomposing
> pointer-plus).  Please separate those.
>
> Note that seg_len is always positive and the abs(len_N) in the
> comments doesn't make
> much sense to me.
>
> Meanwhile the simplification (ignoring overflow issues) can go,
> assuming equal lenghts
>
>     mina + len <= minb || minb + len <= mina
>
> -> mina - minb <= -len || minb - mina <= -len
> -> minb - mina >= len || mina - minb >= len
>
> and as we know len >= 0 this is the same as
>
>    abs (minb - mina) >= len
>
> and thus even simpler than your variant.  But maybe I'm missing something?
>
> As for the overflow issues, you do all computation in sizetype and
> thus guarantee
> unsigned compares, but I think to do these kind of "simplifications" you have to
> use signed quantities which opens up the possibility of objects crossing the
> signed-address-space wrapping point and the simplification to break apart.

Hi Richard,
I might understand your concern about overflow now.  The validity of
simplification is based on unsigned wrapping behavior, rather than
"mis-use" of infinite precision for unsigned type.  Take example of
two simple ranges:  range_a [6, 12] and range_b [2, 8] and below
simplified condition:
      (min_b - min_a) >= len_a && (min_b - min_a) <= (0 - len_b)
For signed type, we have 2 - 6 == -4 > 2 - 8 == -6;
For unsigned wrapping type, we also have 2 - 6 == 0x80000000 + -4 > 2
- 8 == 0x80000000 + -6;

So actually I think the simplification stands for both signed type and
unsigned wrapping type.

I rewrite below comment in other to better explain that simplified
condition equals to the original one:

  /* Intersection between ranges [min_a, max_a] and [min_b, max_b] can be
     checked by below condition:

       max_a <= min_b || max_b <= min_a                           ;cond_A
     Or
       (min_a + len_a) <= min_b || (min_b + len_b) <= min_a

     For type which wraps on overflow, it equals to:

       (min_b - min_a) >= len_a && (min_b - min_a) <= (0 - len_b) ;cond_B

     Proof:
       1) In case min_b >= min_a, i.e, (min_b - min_a) doesn't underflow.
     1.1) When min_b >= max_a
          (min_b - min_a) >= len_a is true.
          (min_b - min_a) <= (0 - len_b) is true because (0 - len_b)
          wraps to a large number.  Note (min_b - min_a) can only take
          the maxmum vaue (0 - len_b) if (min_a == 0 && max_b == MAX).

          Thus both cond_A and cond_B evaluate to TRUE.

     1.2) when min_b >= min_a && min_b < max_a
          (min_b - min_a) >= len_a is false.

          Thus both cond_A and cond_B evaluate to FALSE.

       2) In case min_a > min_b, i.e, (min_b - min_a) underflow.
     2.1) when min_a > min_b && min_a < max_b
          (min_b - min_a) <= (0 - len_b) == (min_b - max_b) is false.

          Thus both cond_A and cond_B evaluate to FALSE.

     2.2) When min_a >= max_b
          (min_b - min_a) >= len_a is true because (min_b - min_a) can
          only take the maximum value when (min_b == 0 && max_a == MAX).
          (min_b - min_a) <= (0 - len_b) == (min_b - max_b) is true.

          Thus both cond_A and cond_B evaluate to TRUE.

       We iterate all possible cases for two ranges and for each case cond_A
       and cond_B take the same value.  So the two conditions equal to each
       other.  */

Do I understand the issue correctly?

Thanks,
bin



More information about the Gcc-patches mailing list