This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [committed][PR tree-optimization/83410] Avoid some jump threads when parallelizing loops


On December 15, 2017 5:19:14 PM GMT+01:00, Jeff Law <law@redhat.com> wrote:
>I hate this patch.
>
>The fundamental problem we have is that there are times when we very
>much want to thread jumps and yet there are other times when we do not.
>
>To date we have been able to largely select between those two by
>looking at the shape of the CFG and the jump thread to see how
>threading a particular jump would impact the shape of the CFG
>(particularly loops in the CFG).
>
>In this BZ we have a loop like this:
>
>        2
>        |
>        3<-------+
>        |        |
>        4<---+   |
>       / \   |   |
>      5   6  |   |
>       \  /  |   |
>         7   |   |
>        / \  |   |
>       8  11-+   |
>      / \        |
>     9  10-------+
>     |
>     E
>
>We want to thread the path (6, 7) (7, 11).  I.e., there's a path
>through that inner loop where we don't need to do the loop test.
>Here's an example (from the libgomp testsuite):
>
>(N is 1500)
>
>     for (j = 0; j < N; j++)
>        {
>          if (j > 500)
>            {
>              x[i][j] = i + j + 3;
>              y[j] = i*j + 10;
>            }
>          else
>            x[i][j] = x[i][j]*3;
>        }
>
>
>
>Threading (in effect) puts the loop test into both arms of the
>conditional, then removes it from the ELSE arm because the condition is
>always "keep looping" in that arm.
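
[Editorial sketch, not part of the original mail: the transformation
described above can be illustrated on a simplified, single-dimensional
version of the libgomp loop.  The function names and 1-D arrays are
illustrative only (the real test indexes x[i][j]), and the "threaded"
form is a hand-written approximation of what the jump threader
produces, not compiler output.]

```c
#include <assert.h>
#include <string.h>

#define N 1500

/* The original inner loop from the libgomp test, simplified to 1-D
   arrays; i is passed in for illustration. */
static void original_loop(int i, int x[N], int y[N])
{
    for (int j = 0; j < N; j++)
    {
        if (j > 500)
        {
            x[j] = i + j + 3;
            y[j] = i * j + 10;
        }
        else
            x[j] = x[j] * 3;
    }
}

/* Hand-written approximation of the threaded form: the loop test is
   duplicated into both arms of the conditional, then deleted from the
   ELSE arm, because on that path j <= 500 < N and the answer is
   always "keep looping". */
static void threaded_loop(int i, int x[N], int y[N])
{
    int j = 0;
    if (j >= N)               /* entry test (never taken here, N > 0) */
        return;
    for (;;)
    {
        if (j > 500)
        {
            x[j] = i + j + 3;
            y[j] = i * j + 10;
            j++;
            if (j >= N)       /* loop test survives only in the THEN arm */
                break;
        }
        else
        {
            x[j] = x[j] * 3;
            j++;              /* no exit test: j <= 501 < N on this path */
        }
    }
}
```

Both versions compute identical results; the threaded one simply skips
the redundant exit test on every iteration that takes the ELSE arm.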
>
>This plays poorly with loop parallelization -- I tried to follow
>what's going on in there and simply got lost.  I got as far as seeing
>that the parallelization code thought there were loop-carried
>dependencies.  At least that's how it looked to me.  But I don't see
>any loop-carried dependencies in the code.

Hmm. I'll double check if I remember on Monday. 

>
>You might naturally ask if threading this is actually wise.  It seems
>to broadly fit into the cases where we throttle threading so as not to
>muck around too much with the loop structure.  It's not terrible to
>detect a CFG of this shape and avoid threading.
>
>BUT....
>
>
>You can have essentially the same shape CFG (not 100% the same, but
>the same key characteristics), but where jump threading simplifies
>things in ways that are good for the SLP vectorizer (vect/bb-slp-16.c)
>or where jump threading avoids spurious warnings (graphite/scop-4.c).
>
>Initially I thought I'd seen a key difference in the contents of the
>latch block, but I think I was just up too late and mis-read the dump.
>
>So I don't see anything in the CFG shape or the contents of the blocks
>that can be reasonably analyzed at jump threading time.  Given we have
>two opposite needs and no reasonable way I can find to select between
>them, I'm resorting to a disgusting hack.  Namely to avoid threading
>through the latch to another point in the loop *when loop
>parallelization is enabled*.
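
[Editorial sketch, not part of the original mail: the guard described
above boils down to a simple predicate.  The function and parameter
names below are hypothetical placeholders for illustration, not GCC's
actual internal API, and the real patch works on GCC's jump-thread
path data structures rather than two booleans.]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of the guard: reject a jump-thread path that
   goes through the loop latch to another point in the same loop
   whenever loop parallelization is enabled. */
static bool thread_path_allowed(bool path_through_latch_into_loop,
                                bool parallelize_loops_enabled)
{
    if (parallelize_loops_enabled && path_through_latch_into_loop)
        return false;   /* the "hack": back off to protect parloops */
    return true;        /* otherwise thread as before */
}
```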
>
>Let me be clear.  This is a hack.  I don't like it, not one little bit.
>But I don't see a way to resolve the regression without introducing
>other regressions and the loop parallelization code is a total mystery
>to me.
>
>Bootstrapped on x86_64 and regression tested with and without graphite.
>Confirmed it fixes the graphite related regressions mentioned in the BZ
>on x86_64.
>
>Committing to the trunk and hanging my head in shame.

I'd not have worried much about auto-parallelization or graphite here. It does look like a case of missed handling there. 

Richard. 


>Jeff

