[PATCH] PR/67682, break SLP groups up if only some elements match
Tue Nov 3 13:39:00 GMT 2015
On Tue, Oct 27, 2015 at 6:38 PM, Alan Lawrence <firstname.lastname@example.org> wrote:
> On 26/10/15 15:04, Richard Biener wrote:
>> apart from the fact that you'll post a new version you need to adjust
>> You also seem to somewhat "confuse" "first I stmts" and "a group of
>> size I"; those are not the same when the group has gaps. I'd say
>> "a group of size i" makes the most sense here, thus I suggest to adjust
>> the function comment accordingly.
> Ok, thanks for pointing this out. My objective had been to only split the
> store groups - which in BB vectorization, always seem to have gap 0 1 1 ....
> 1. I didn't come up with a good scheme for how to split load groups, but it
> seemed that I didn't need to do anything there if I restricted to BB
> vectorization only. For example, consider (ignoring that we could multiply
> the first four elements by 1 and add 0 to the last four):
> a[0] = b[I] + 1;
> a[1] = b[J] + 2;
> a[2] = b[K] + 3;
> a[3] = b[L] + 4;
> a[4] = b[M] * 3;
> a[5] = b[N] * 4;
> a[6] = b[O] * 5;
> a[7] = b[P] * 7;
> with constants I,J,K,L,M,N,O,P. Even with those being a sequence 2 0 1 1 3 0
> 2 1 with overlaps and repetitions, this works fine for BB SLP (two subgroups
> of stores, *sharing* a load group but with different permutations). Likewise
> 0 1 2 3 0 2 4 6.
> For loop SLP, yes it looks like the load group needs to be split. So how;
> and what constraints to impose on those constants? (There is no single right
> A fairly-strict scheme could be that (I,J,K,L) must be within a contiguous
> block of memory, that does not overlap with the contiguous block containing
> (M,N,O,P). Then, splitting the load group on the boundary seems reasonable,
> and updating the gaps as you suggest. However, when you say "the group first
> elements GROUP_GAP is the gap at the _end_ of the whole group" - the gap at
> the end is the gap that comes after the last element and up to....what?
> Say I...P are consecutive, the input would have gaps 0 1 1 1 1 1 1 1. If we
> split the load group, we would want subgroups with gaps 0 1 1 1 and 0 1 1 1?
> (IIUC, you suggest 1111 and 0111?)
As said on IRC it should be 4 1 1 1 and 4 1 1 1.
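
To make the "4 1 1 1 and 4 1 1 1" answer concrete, here is a minimal sketch of the
gap arithmetic. It is a toy model and not GCC's data structures: the array layout,
`span` and `split_group` are hypothetical names, and the model assumes that entry 0
holds the gap after the last member (up to the next repetition of the group) while
entry i > 0 holds the distance from member i-1 to member i.

```c
#include <assert.h>

/* Toy model (an assumption, not GCC's representation):
   gap[0]        = elements skipped after the last member of the group,
   gap[i], i > 0 = distance from member i-1 to member i.  */

/* Elements covered from member FIRST to member LAST, inclusive.  */
static unsigned
span (const unsigned *gap, unsigned first, unsigned last)
{
  unsigned s = 1;
  for (unsigned i = first + 1; i <= last; i++)
    s += gap[i];
  return s;
}

/* Split the N-member group described by GAP before member K.
   OUT1 receives K gaps, OUT2 receives N - K gaps.  Each subgroup keeps
   the stride of the original group, so its trailing gap has to skip
   over everything that is no longer part of it.  */
static void
split_group (const unsigned *gap, unsigned n, unsigned k,
             unsigned *out1, unsigned *out2)
{
  assert (k > 0 && k < n);

  /* Distance between two repetitions of the whole group.  */
  unsigned stride = span (gap, 0, n - 1) + gap[0];

  out1[0] = stride - span (gap, 0, k - 1);
  for (unsigned i = 1; i < k; i++)
    out1[i] = gap[i];

  out2[0] = stride - span (gap, k, n - 1);
  for (unsigned i = k + 1; i < n; i++)
    out2[i - k] = gap[i];
}
```

Under this model, splitting the eight-member group with gaps 0 1 1 1 1 1 1 1 at
k = 4 gives a stride of 8 and a span of 4 for each half, hence a leading entry of
4 in both subgroups.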
> If they are disjoint sets, but overlapping blocks of memory, say 0 2 4 6 1 3
> 5 7...then do we create two load groups, with gap 0 2 2 2 and 0 2 2 2 again?
> Does something record that the load groups access overlapping areas, and
> record the offset against each other?
No, I don't think we can split load groups that way. So I think if
groups works well (with having larger load groups) then that's the way to go
(even for loop vect).
> If there are repeated elements (as in the BB SLP case mentioned above), I'm
> not clear how we can split this effectively...so may have to rule out that
> case. (Moreover, if we are considering hybrid SLP, it may not be clear what
> the loop accesses are, we may be presented only with the SLP accesses. Do we
> necessarily want to pull those out of a load group?)
> So I expect I may resolve some of these issues as I progress, but I'm
> curious as to whether (and why) the patch was really broken (wrt gaps) as it
Yes, the gaps were clearly bogously constructed in general. If you have an
existing group you can only split it into non-overlapping groups. Thus for
two load SLP nodes loading from 0 2 4 6 and from 1 3 5 7 you will have
a single "group" (0 1 2 3 4 5 6 7) and you can at most split it as
0 1 2 3, 4 5 6 7, which won't help in this case (and would actually be worse).
So I think restricting the splitting to the stores should work fine.
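
As a rough illustration of why the contiguous split does not help the interleaved
case, the sketch below counts how many of the two halves a load node touches. The
function name `subgroups_touched` and the plain offset arrays are hypothetical and
only model the argument above; they are not taken from the vectorizer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the group of N elements is split into the
   contiguous subgroups [0, K) and [K, N).  Return how many of those
   two subgroups the load node with the given OFFSETS reads from.  */
static unsigned
subgroups_touched (const unsigned *offsets, size_t count, unsigned k)
{
  unsigned low = 0, high = 0;
  for (size_t i = 0; i < count; i++)
    {
      if (offsets[i] < k)
        low = 1;   /* Reads from the first subgroup.  */
      else
        high = 1;  /* Reads from the second subgroup.  */
    }
  return low + high;
}
```

With k = 4, both the node loading 0 2 4 6 and the node loading 1 3 5 7 touch two
subgroups, so after the split each node would need elements from both halves.
A node loading 0 1 2 3 touches only one.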