Re: [committed][PR tree-optimization/83410] Avoid some jump threads when parallelizing loops
- From: Jeff Law <law@redhat.com>
- To: Richard Biener <richard.guenther@gmail.com>, gcc-patches <gcc-patches@gcc.gnu.org>
- Date: Mon, 18 Dec 2017 09:06:25 -0700
- Subject: Re: [committed][PR tree-optimization/83410] Avoid some jump threads when parallelizing loops
- References: <c56e84d9-5e0a-5430-4d2f-0584c51e3012@redhat.com> <1BA1571E-A51D-45EE-A2A7-B5370E01A077@gmail.com> <CAFiYyc1HcJs4jDsmXxqUxV2vDGPVL3w=8wv+y6LcBme2aC4=rA@mail.gmail.com>
On 12/18/2017 03:15 AM, Richard Biener wrote:
> On Fri, Dec 15, 2017 at 6:00 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On December 15, 2017 5:19:14 PM GMT+01:00, Jeff Law <law@redhat.com> wrote:
>>> I hate this patch.
>>>
>>> The fundamental problem we have is that there are times when we very
>>> much want to thread jumps and yet there are other times when we do not.
>>>
>>> To date we have been able to largely select between those two by
>>> looking at the shape of the CFG and the jump thread to see how
>>> threading a particular jump would impact the shape of the CFG
>>> (particularly loops in the CFG).
>>>
>>> In this BZ we have a loop like this:
>>>
>>> 2
>>> |
>>> 3<-------+
>>> | |
>>> 4<---+ |
>>> / \ | |
>>> 5 6 | |
>>> \ / | |
>>> 7 | |
>>> / \ | |
>>> 8 11-+ |
>>> / \ |
>>> 9 10-------+
>>> |
>>> E
>>>
>>> We want to thread the path (6, 7) (7, 11); i.e., there's a path through
>>> that inner loop where we don't need to do the loop test. Here's an
>>> example (from libgomp testsuite)
>>>
>>> (N is 1500)
>>>
>>> for (j = 0; j < N; j++)
>>>   {
>>>     if (j > 500)
>>>       {
>>>         x[i][j] = i + j + 3;
>>>         y[j] = i*j + 10;
>>>       }
>>>     else
>>>       x[i][j] = x[i][j]*3;
>>>   }
>>>
>>> Threading (in effect) puts the loop test into both arms of the
>>> conditional, then removes it from the ELSE arm because the condition is
>>> always "keep looping" in that arm.
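To make that concrete, here is a hand-written sketch (not compiler
output, and not part of the patch) of what the inner loop from the
testcase effectively becomes once the (6, 7) (7, 11) path is threaded:

  /* Hand-written sketch, not compiler output.  The loop test is
     duplicated into both arms; the copy in the else arm folds away
     because j <= 500 there, so the incremented j is at most 501 < N.  */
  j = 0;
  do
    {
      if (j > 500)
        {
          x[i][j] = i + j + 3;
          y[j] = i*j + 10;
          j++;
          if (!(j < N))     /* the exit test survives only in this arm */
            break;
        }
      else
        {
          x[i][j] = x[i][j]*3;
          j++;              /* exit test folded away: j < N known true */
        }
    }
  while (1);

Note that the surviving exit test now sits on only one of the two paths
to the latch, which is exactly the dominance problem discussed below.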
>>>
>>> This plays poorly with loop parallelization -- I tried to follow
>>> what's going on in there and simply got lost. I got as far as seeing
>>> that the parallelization code thought there were loop carried
>>> dependencies. At least that's how it looked to me. But I don't see
>>> any loop carried dependencies in the code.
>>
>> Hmm. I'll double check if I remember on Monday.
>
> So I did, and the ultimate reason GRAPHITE FAILs is that, for the
> loop with the threading applied, we can no longer compute the number
> of iterations. That's of course bad for _all_ loop optimizers,
> not just GRAPHITE (we just don't have any other test coverage).
> The main issue here is that after the threading is applied the
> block with the loop exit no longer dominates the loop latch.
> This means the loop should really be split, but it also means we
> should definitely avoid performing such threading before the loop
> optimizers run, and not just when parallelizing loops.
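For reference, a minimal sketch of how a loop pass observes this
failure, using GCC's niter interfaces (tree-ssa-loop-niter.h,
tree-chrec.h); the snippet is illustrative, not taken from the patch:

  /* Ask for a symbolic count of latch executions; after the threading
     above this comes back undetermined, so passes that need a trip
     count (GRAPHITE included) have to give up on the loop.  */
  tree niter = number_of_latch_executions (loop);
  if (chrec_contains_undetermined (niter))
    return false;   /* number of iterations unknown */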
>
> Can you adjust your patch this way? The condition you check seems
> to more or less match the case where we thread through the exit
> test _into_ the loop? The bad thing is really if, in the resulting
> CFG, the exit test no longer dominates the latch (so we have an exit
> test that is not always executed):
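A sketch of that condition as an explicit check, assuming a
single-exit loop; the helper name is hypothetical, while single_exit,
dominated_by_p, and loop->latch are GCC's own loop/dominance
primitives (cfgloop.h, dominance.h):

  /* Hypothetical helper: true iff the block holding LOOP's exit test
     still dominates the latch, i.e. the exit test is executed on
     every iteration.  Threading that breaks this property is the bad
     case described above.  */
  static bool
  exit_test_dominates_latch_p (struct loop *loop)
  {
    edge exit = single_exit (loop);
    if (!exit)
      return false;    /* multiple exits: be conservative */
    return dominated_by_p (CDI_DOMINATORS, loop->latch, exit->src);
  }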
The problem is we'll end up regressing other vectorization cases. I
looked at a variety of things WRT the shape of the CFG and ultimately
found that there wasn't anything I could use to distinguish the two
cases. I'll look again though.
Jeff