This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [committed][PR tree-optimization/83410] Avoid some jump threads when parallelizing loops
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Jeff Law <law at redhat dot com>, gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 15 Dec 2017 18:00:40 +0100
- Subject: Re: [committed][PR tree-optimization/83410] Avoid some jump threads when parallelizing loops
- References: <c56e84d9-5e0a-5430-4d2f-0584c51e3012@redhat.com>
On December 15, 2017 5:19:14 PM GMT+01:00, Jeff Law <law@redhat.com> wrote:
>I hate this patch.
>
>The fundamental problem we have is that there are times when we very
>much want to thread jumps and yet there are other times when we do not.
>
>To date we have been able to largely select between those two by looking
>at the shape of the CFG and the jump thread to see how threading a
>particular jump would impact the shape of the CFG (particularly loops in
>the CFG).
>
>In this BZ we have a loop like this:
>
> 2
> |
> 3<-------+
> | |
> 4<---+ |
> / \ | |
> 5 6 | |
> \ / | |
> 7 | |
> / \ | |
> 8 11-+ |
> / \ |
> 9 10-------+
> |
> E
>
>We want to thread the path (6, 7) (7, 11), i.e., there's a path through
>that inner loop where we don't need to do the loop test. Here's an
>example (from the libgomp testsuite):
>
>(N is 1500)
>
> for (j = 0; j < N; j++)
> {
> if (j > 500)
> {
> x[i][j] = i + j + 3;
> y[j] = i*j + 10;
> }
> else
> x[i][j] = x[i][j]*3;
> }
>
>Threading (in effect) puts the loop test into both arms of the
>conditional, then removes it from the ELSE arm because the condition is
>always "keep looping" in that arm.
>
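In source terms, the transformation described above amounts to something like the following hand-written sketch. This is an illustration of what the threaded CFG computes, not GCC's actual output; the array is simplified to one dimension and the function names `original`/`threaded` are made up:

```c
#include <assert.h>

#define N 1500

/* Original loop: the loop test j < N runs on every iteration. */
static void original(int *x, int *y, int i)
{
    for (int j = 0; j < N; j++)
    {
        if (j > 500)
        {
            x[j] = i + j + 3;
            y[j] = i * j + 10;
        }
        else
            x[j] = x[j] * 3;
    }
}

/* Hand-threaded sketch: in the ELSE arm (j <= 500) the back edge is
   taken unconditionally, because j <= 500 after the increment still
   implies j < N.  The loop test survives only on the THEN side.  */
static void threaded(int *x, int *y, int i)
{
    int j = 0;
    while (j <= 500)          /* ELSE arm loops back without testing j < N */
    {
        x[j] = x[j] * 3;
        j++;
    }
    for (; j < N; j++)        /* THEN arm keeps the original loop test */
    {
        x[j] = i + j + 3;
        y[j] = i * j + 10;
    }
}
```

Both versions compute the same values; the threaded form just executes one fewer comparison per iteration on the `else` path, at the cost of a more complicated loop structure.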
>This plays poorly with loop parallelization -- I tried to follow what's
>going on in there and simply got lost. I got as far as seeing that
>the parallelization code thought there were loop carried dependencies.
>At least that's how it looked to me. But I don't see any loop carried
>dependencies in the code.
Hmm. I'll double check if I remember on Monday.
>
>You might naturally ask if threading this is actually wise. It seems to
>broadly fit into the cases where we throttle threading so as not to muck
>around too much with the loop structure. It's not terrible to detect a
>CFG of this shape and avoid threading.
>
>BUT....
>
>
>You can have essentially the same shape CFG (not 100% the same, but the
>same key characteristics), but where jump threading simplifies things in
>ways that are good for the SLP vectorizer (vect/bb-slp-16.c) or where
>jump threading avoids spurious warnings (graphite/scop-4.c).
>
>Initially I thought I'd seen a key difference in the contents of the
>latch block, but I think I was just up too late and mis-read the dump.
>
>So I don't see anything in the CFG shape or the contents of the blocks
>that can be reasonably analyzed at jump threading time. Given we have
>two opposite needs and no reasonable way I can find to select between
>them, I'm resorting to a disgusting hack. Namely, to avoid threading
>through the latch to another point in the loop *when loop
>parallelization is enabled*.
>
>Let me be clear. This is a hack. I don't like it, not one little bit.
>But I don't see a way to resolve the regression without introducing
>other regressions and the loop parallelization code is a total mystery
>to me.
>
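The heuristic described above might look roughly like the sketch below. This is a hypothetical reconstruction, not the committed patch: the function name is invented, although `flag_tree_parallelize_loops` is the real flag behind `-ftree-parallelize-loops=` and `latch`/`header` are real fields of GCC's internal loop representation:

```c
/* Hypothetical sketch (not the committed code): reject a jump-threading
   path that leaves the loop latch for another block in the same loop
   when auto-parallelization is requested.  */
static bool
latch_thread_allowed_p (struct loop *loop, edge e)
{
  /* flag_tree_parallelize_loops > 1 means -ftree-parallelize-loops=N
     with N > 1, i.e. parallelization is actually in play.  */
  if (flag_tree_parallelize_loops > 1
      && e->src == loop->latch
      && e->dest != loop->header)
    return false;
  return true;
}
```

The point of gating on the flag rather than on CFG shape is exactly the difficulty Jeff describes: the same shape must be threaded in some pipelines and left alone in others, so the only available discriminator is whether parallelization will run later.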
>Bootstrapped on x86_64 and regression tested with and without graphite.
>Confirmed it fixes the graphite related regressions mentioned in the BZ
>on x86_64.
>
>Committing to the trunk and hanging my head in shame.
I'd not have worried much about auto-parallelization or graphite here. It does look like missed handling there.
Richard.
>Jeff