This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [committed][PR tree-optimization/83410] Avoid some jump threads when parallelizing loops


On 12/18/2017 03:15 AM, Richard Biener wrote:
> On Fri, Dec 15, 2017 at 6:00 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On December 15, 2017 5:19:14 PM GMT+01:00, Jeff Law <law@redhat.com> wrote:
>>> I hate this patch.
>>>
>>> The fundamental problem we have is that there are times when we very
>>> much want to thread jumps and yet there are other times when we do not.
>>>
>>> To date we have largely been able to select between those two by
>>> looking at the shape of the CFG and the jump thread to see how
>>> threading a particular jump would impact the shape of the CFG
>>> (particularly loops in the CFG).
>>>
>>> In this BZ we have a loop like this:
>>>
>>>        2
>>>        |
>>>        3<-------+
>>>        |        |
>>>        4<---+   |
>>>       / \   |   |
>>>      5   6  |   |
>>>       \  /  |   |
>>>         7   |   |
>>>        / \  |   |
>>>       8  11-+   |
>>>      / \        |
>>>     9  10-------+
>>>     |
>>>     E
>>>
>>> We want to thread the path (6, 7) (7, 11).  I.e., there's a path through
>>> that inner loop where we don't need to do the loop test.  Here's an
>>> example (from the libgomp testsuite):
>>>
>>> (N is 1500)
>>>
>>>     for (j = 0; j < N; j++)
>>>        {
>>>          if (j > 500)
>>>            {
>>>              x[i][j] = i + j + 3;
>>>              y[j] = i*j + 10;
>>>            }
>>>          else
>>>            x[i][j] = x[i][j]*3;
>>>        }
>>>
>>>
>>>
>>> Threading (in effect) puts the loop test into both arms of the
>>> conditional, then removes it from the ELSE arm because the condition is
>>> always "keep looping" in that arm.
>>>
>>> This plays poorly with loop parallelization -- I tried to follow what's
>>> going on in there and simply got lost.  I got as far as seeing that
>>> the parallelization code thought there were loop-carried dependencies.
>>> At least that's how it looked to me.  But I don't see any loop-carried
>>> dependencies in the code.
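
For illustration, here is a rough source-level sketch of what the inner
loop ends up looking like after threading the path (6, 7) (7, 11)
described above.  This is only an approximation, not GCC's actual dump,
and it assumes block 6 corresponds to the j <= 500 arm:

    /* Sketch: after the thread, the exit test is duplicated into the
       j > 500 arm and dropped from the j <= 500 arm, where
       j + 1 <= 501 < N already guarantees "keep looping".  */
    #define N 1500

    void
    inner_loop_after_threading (int i, int x[][N], int y[N])
    {
      int j = 0;
    top:
      if (j > 500)
        {
          x[i][j] = i + j + 3;
          y[j] = i*j + 10;
          j++;
          if (j < N)            /* exit test kept on this path only */
            goto top;
        }
      else
        {
          x[i][j] = x[i][j]*3;
          j++;
          goto top;             /* exit test threaded away on this path */
        }
    }

Only one of the two paths back to the top of the loop passes through the
exit test, which is exactly the situation Richard analyzes below.
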
>>
>> Hmm. I'll double check if I remember on Monday.
> 
> So I did, and the ultimate reason GRAPHITE FAILs is that for
> the loop with the threading applied we can no longer compute the
> number of iterations.  That's of course bad for _all_ loop optimizers,
> not just GRAPHITE (we just don't have any other test coverage).
> The main issue here is that after the threading is applied, the
> block with the loop exit no longer dominates the loop latch.
> This means the loop should really be split, but it also means we should
> definitely avoid performing such threading before loop optimizers run,
> and not just when parallelizing loops.
> 
> Can you adjust your patch this way?  The condition you check seems
> to more or less match the case where we thread through the exit
> test _into_ the loop.  The bad thing is really if, in the resulting CFG,
> the exit test doesn't dominate the latch (thus we have an exit test
> that is not always executed):
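
Splitting the loop as suggested would, roughly, turn the libgomp example
into two loops, each with an exit test that dominates its latch and a
trivially computable iteration count.  A sketch of what that split form
might look like (illustrative only; it assumes j starts at 0 and N is
1500):

    /* Sketch of the split loop: the j <= 500 and j > 500 iterations
       become separate loops, the condition disappears from the bodies,
       and each loop's number of iterations is trivially computable.  */
    #define N 1500

    void
    inner_loop_split (int i, int x[][N], int y[N])
    {
      int j;

      for (j = 0; j <= 500; j++)        /* former else arm */
        x[i][j] = x[i][j]*3;

      for (j = 501; j < N; j++)         /* former then arm */
        {
          x[i][j] = i + j + 3;
          y[j] = i*j + 10;
        }
    }
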
The problem is we'll end up regressing other vectorization cases.  I
looked at a variety of things WRT the shape of the CFG and ultimately
found that there wasn't anything I could use to distinguish the two
cases.  I'll look again, though.

Jeff
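
For reference, the property Richard points at -- whether the block
holding the loop exit test dominates the latch -- maps onto GCC's
dominance API.  Below is a very rough sketch of such a check; the helper
name and the exit_test_bb parameter are placeholders rather than
anything from the actual patch, and a real version would have to reason
about the CFG the thread would produce, not the CFG as it stands:

    /* Hypothetical helper, assumed to sit inside GCC's jump-threading
       code where basic_block, struct loop and the dominance API are in
       scope.  dominated_by_p (CDI_DOMINATORS, A, B) returns true when
       B dominates A.  */

    static bool
    exit_test_dominates_latch_p (struct loop *loop, basic_block exit_test_bb)
    {
      /* If the exit test does not dominate the latch, the test is not
         executed on every iteration and the number of iterations can
         no longer be computed -- the case to avoid threading into.  */
      return dominated_by_p (CDI_DOMINATORS, loop->latch, exit_test_bb);
    }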

