[PATCH PR69564]Improve vectorizer's runtime alias check for wrapping type
Bin.Cheng
amker.cheng@gmail.com
Mon Feb 27 12:47:00 GMT 2017
On Fri, Feb 24, 2017 at 12:34 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Feb 24, 2017 at 11:53 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>> Hi,
>> As analyzed in PR69564, inefficient code for runtime alias check is generated in benchmark
>> scimark2. It is suspected vectorized loop doesn't run enough iterations to cover the loss
>> of the inefficient code. This patch improves runtime alias check for (unsigned) wrapping
>> types.
>>
>> Originally, vectorizer checks if below two ranges overlap with each other
>> range a: [min_a, max_a] ; abs(length_a) = max_a - min_a
>> range_b: [min_b, max_b] ; abs(length_b) = max_b - min_b
>> with condition like:
>> max_a <= min_b || max_b <= min_a
>>
>> With knowledge of length of the two ranges, this patch checks overlap with condition like:
>> (min_b - min_a) >= abs(len_a) && (min_b - min_a) <= (- abs(len_b))
>> It's better because common sub expressions in b/a can be folded.
>>
>> Note there is an implicit condition for above statements that length of range needs to be
>> no larger than middle value of the unsigned type. This is always true since object never
>> spans over half address space in GCC.
>>
>> As mentioned, this is only done for case with wrap type and also constant segment length
>> for both a and b, which is the most common case. It is possible to improve signed cases
>> or compilation time unknown segment length cases too, but may not worth the effort.
>>
>> Unfortunately, test case is hard to create for such code optimization.
>> Bootstrap and test on x86)64 and AArch64. Is it OK?
>
> The patch mixes the above together with better folding (cancelling
> equal offset, decomposing
> pointer-plus). Please separate those.
>
> Note that seg_len is always positive and the abs(len_N) in the
> comments doesn't make
> much sense to me.
>
> Meanwhile the simplification (ignoring overflow issues) can go,
> assuming equal lenghts
>
> mina + len <= minb || minb + len <= mina
>
> -> mina - minb <= -len || minb - mina <= -len
> -> minb - mina >= len || mina - minb >= len
>
> and as we know len >= 0 this is the same as
>
> abs (minb - mina) >= len
>
> and thus even simpler than your variant. But maybe I'm missing something?
>
> As for the overflow issues, you do all computation in sizetype and
> thus guarantee
> unsigned compares, but I think to do these kind of "simplifications" you have to
> use signed quantities which opens up the possibility of objects crossing the
> signed-address-space wrapping point and the simplification to break apart.
Hi Richard,
I might understand your concern about overflow now. The validity of
simplification is based on unsigned wrapping behavior, rather than
"mis-use" of infinite precision for unsigned type. Take example of
two simple ranges: range_a [6, 12] and range_b [2, 8] and below
simplified condition:
(min_b - min_a) >= len_a && (min_b - min_a) <= (0 - len_b)
For signed type, we have 2 - 6 == -4 > 2 - 8 == -6;
For unsigned wrapping type, we also have 2 - 6 == 0x80000000 + -4 > 2
- 8 == 0x80000000 + -6;
So actually I think the simplification stands for both signed type and
unsigned wrapping type.
I rewrite below comment in other to better explain that simplified
condition equals to the original one:
/* Intersection between ranges [min_a, max_a] and [min_b, max_b] can be
checked by below condition:
max_a <= min_b || max_b <= min_a ;cond_A
Or
(min_a + len_a) <= min_b || (min_b + len_b) <= min_a
For type which wraps on overflow, it equals to:
(min_b - min_a) >= len_a && (min_b - min_a) <= (0 - len_b) ;cond_B
Proof:
1) In case min_b >= min_a, i.e, (min_b - min_a) doesn't underflow.
1.1) When min_b >= max_a
(min_b - min_a) >= len_a is true.
(min_b - min_a) <= (0 - len_b) is true because (0 - len_b)
wraps to a large number. Note (min_b - min_a) can only take
the maxmum vaue (0 - len_b) if (min_a == 0 && max_b == MAX).
Thus both cond_A and cond_B evaluate to TRUE.
1.2) when min_b >= min_a && min_b < max_a
(min_b - min_a) >= len_a is false.
Thus both cond_A and cond_B evaluate to FALSE.
2) In case min_a > min_b, i.e, (min_b - min_a) underflow.
2.1) when min_a > min_b && min_a < max_b
(min_b - min_a) <= (0 - len_b) == (min_b - max_b) is false.
Thus both cond_A and cond_B evaluate to FALSE.
2.2) When min_a >= max_b
(min_b - min_a) >= len_a is true because (min_b - min_a) can
only take the maximum value when (min_b == 0 && max_a == MAX).
(min_b - min_a) <= (0 - len_b) == (min_b - max_b) is true.
Thus both cond_A and cond_B evaluate to TRUE.
We iterate all possible cases for two ranges and for each case cond_A
and cond_B take the same value. So the two conditions equal to each
other. */
Do I understand the issue correctly?
Thanks,
bin
More information about the Gcc-patches
mailing list