[PATCH] PR/67682, break SLP groups up if only some elements match

Richard Biener richard.guenther@gmail.com
Tue Nov 3 13:39:00 GMT 2015

```On Tue, Oct 27, 2015 at 6:38 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> On 26/10/15 15:04, Richard Biener wrote:
>>
>>
>> apart from the fact that you'll post a new version you need to adjust
>> GROUP_GAP.
>> You also seem to somewhat "confuse" "first I stmts" and "a group of
>> size I", those
>> are not the same when the group has haps.  I'd say "a group of size i"
>> makes the
>> most sense here thus I suggest to adjust the function comment accordingly.
>
>
> Ok, thanks for pointing this out. My objective had been to only split the
> store groups - which in BB vectorization, always seem to have gap 0 1 1 ....
> 1. I didn't come up with a good scheme for how to split load groups, but it
> seemed that I didn't need to do anything there if I restricted to BB
> vectorization only. For example, consider (ignoring that we could multiply
> the first four elements by 1 and add 0 to the last four):
>
>     a[0] = b[I] + 1;
>     a[1] = b[J] + 2;
>     a[2] = b[K] + 3;
>     a[3] = b[L] + 4;
>     a[4] = b[M] * 3;
>     a[5] = b[N] * 4;
>     a[6] = b[O] * 5;
>     a[7] = b[P] * 7;
>
>
> with constants I,J,K,L,M,N,O,P. Even with those being a sequence 2 0 1 1 3 0
> 2 1 with overlaps and repetitions, this works fine for BB SLP (two subgroups
> of stores, *sharing* a load group but with different permutations). Likewise
> 0 1 2 3 0 2 4 6.
>
> For loop SLP, yes it looks like the load group needs to be split. So how;
> and what constraints to impose on those constants? (There is no single right
>
> A fairly-strict scheme could be that (I,J,K,L) must be within a contiguous
> block of memory, that does not overlap with the contiguous block containing
> (M,N,O,P). Then, splitting the load group on the boundary seems reasonable,
> and updating the gaps as you suggest. However, when you say "the group first
> elements GROUP_GAP is the gap at the _end_ of the whole group" - the gap at
> the end is the gap that comes after the last element and up to....what?
>
> Say I...P are consecutive, the input would have gaps 0 1 1 1 1 1 1 1. If we
> split the load group, we would want subgroups with gaps 0 1 1 1 and 0 1 1 1?
> (IIUC, you suggest 1111 and 0111?)

As said on IRC it should be 4 1 1 1 and 4 1 1 1.

> If they are disjoint sets, but overlapping blocks of memory, say 0 2 4 6 1 3
> 5 7...then do we create two load groups, with gap 0 2 2 2 and 0 2 2 2 again?
> Does something record that the load groups access overlapping areas, and
> record the offset against each other?

No, I don't think we can split load groups that way.  So I think if
splitting store
groups works well (with having larger load groups) then that's the way to go
(even for loop vect).

> If there are repeated elements (as in the BB SLP case mentioned above), I'm
> not clear how we can split this effectively...so may have to rule out that
> case. (Moreover, if we are considering hybrid SLP, it may not be clear what
> the loop accesses are, we may be presented only with the SLP accesses. Do we
> necessarily want to pull those out of a load group?)
>
> So I expect I may resolve some of these issues as I progress, but I'm
> curious as to whether (and why) the patch was really broken (wrt gaps) as it
> stood...

Yes, the gaps were clearly bogously constructed in general.  If you have an
existing group you can only split it into non-overlapping groups.  Thus for
two load SLP nodes loading from 0 2 4 6 and from 1 3 5 7 you will have
a single "group" (0 1 2 3 4 5 6 7) and you can at most split it as
0 1 2 3, 4 5 6 7 which won't help in this case (but would be actually worse).

So I think restricting the splitting to the stores should work fine.

Richard.

> Thanks,
> Alan
>

```