[RFC] More jump threading restrictions in the presence of loops.

Tue Oct 5 15:10:28 GMT 2021


On 10/5/2021 7:33 AM, Aldy Hernandez wrote:
> On Mon, Oct 4, 2021 at 6:29 PM Michael Matz <matz@suse.de> wrote:
>> Hello,
>>
>> On Mon, 4 Oct 2021, Jeff Law wrote:
>>
>>> And just in case it got lost.  Here's the analysis on 960218-1 on visium:
>>>
>>> We've got this going into ethread:
>>>
>>> int f (int x)
>>> {
>>>    int D.1420;
>>>    int a;
>>>
>>> ;;   basic block 2, loop depth 0, maybe hot
>>> ;;    prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED)
>>> ;;    pred:       ENTRY (FALLTHRU,EXECUTABLE)
>>>    a_4 = ~x_3(D);
>>>    goto <bb 4>; [INV]
>>> ;;    succ:       4 (FALLTHRU,EXECUTABLE)
>>>
>>> ;;   basic block 3, loop depth 1, maybe hot
>>> ;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
>>> ;;    pred:       4 (TRUE_VALUE,EXECUTABLE)
>>>    gl = a_1;
>>> ;;    succ:       4 (FALLTHRU,DFS_BACK,EXECUTABLE)
>>>
>>> ;;   basic block 4, loop depth 1, maybe hot
>>> ;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
>>> ;;    pred:       2 (FALLTHRU,EXECUTABLE)
>>> ;;                3 (FALLTHRU,DFS_BACK,EXECUTABLE)
>>>    # a_1 = PHI <a_4(2), 0(3)>
>>>    if (a_1 != 0)
>>>      goto <bb 3>; [INV]
>>>    else
>>>      goto <bb 5>; [INV]
>>> ;;    succ:       3 (TRUE_VALUE,EXECUTABLE)
>>> ;;                5 (FALSE_VALUE,EXECUTABLE)
>>>
>>> ;;   basic block 5, loop depth 0, maybe hot
>>> ;;    prev block 4, next block 1, flags: (NEW, REACHABLE, VISITED)
>>> ;;    pred:       4 (FALSE_VALUE,EXECUTABLE)
>>>    return;
>>> ;;    succ:       EXIT
>>>
>>> }
>> First notice that this doesn't have an empty latch block to start with
>> (in fact, there is no separate latch block at all), so the loop optimizers
>> haven't been initialized for simple latches at this point.  Still, I would
>> say that even then we want to be careful with the latch edge, as putting
>> code onto it will most probably create a problem downstream once we _do_
>> want to intialize the loop optimizers for simple latches.  So, that we are
>> careful here is okay, we are just too careful in this specific case.
>>
>>> There's a pretty obvious jump thread in there.  3->4->5.
>>>
>>> We used to allow this, but it's currently being rejected and I'm not
>>> sure it should.
>>>
>>> In simplest terms we're threading from inside the loop, through the
>>> latch to a point outside the loop.  At first glance, that seems safe.
>> Is at least the unrestricted jump threader (the one after loop optimizers)
>> picking this up?
>>
>> Independend of that, I agree, that this specific instance of threading
>> through the latch block, even though followed by to-be-copied
>> instructions, is okay.  We need to determine precisely when that is okay,
>> though.  In this case there are several reasons I could see why this is
>> so:
>> (a) while the thread is through the latch block, it's _not_ through the
>>      latch edge (which is 4->3).  That would create the problem, so here
>>      we're fine.
> It seems we all agree Jeff's finding should be allowed, so let's
> attack this one first, since it gets almost all of his visium
> failures.  I can submit the rest of the cases separately.
>
> The attached patch catches the IL discussed, and adds a relevant
> gimple FE test others can use for experimenting :).
>
> Tested on x86-64 and by visual inspection on visium-elf on the
> regressions Jeff pointed me at.
>
> OK?
>
> BTW Jeff, this fixes all the regressions you mention except:
>
> 1. pr68911.c
>
> The path not being threaded here is 7->10->11->12.  It crosses loops
> multiple times, so I believe the restriction code is correct.
>
> 7, 10, 12 are in loop1.
> 11 is in loop 2.
>
> So we have a path going from loop1 -> loop2 -> loop1.  I can't
> conceive any scenario where this is ok, but I can attach the entire IL
> if I'm missing something.
>
> 2. 961125-1.c
>
> This one is a bit trickier.  Here we're trying to thread the following
> conditional, which I mentioned when I contributed this work, we don't
> handle (and happens infrequently enough not to matter):
>
> +  // Loop 4
> +  <bb 6> [local count: 114863531]:
> +  # ptr_8 = PHI <ptr_2(4), ptr_2(5)>
> +  if (ptr_8 < &MEM <char[4]> [(void *)":ab" + 3B])
> +    goto <bb 7>; [50.00%]
> +  else
> +    goto <bb 17>; [50.00%]
>
> The hybrid threader doesn't handle &MEM in the final conditional.  As
> I mentioned earlier, if this becomes an issue, we can adapt class
> pointer_equiv_analyzer like we did for evrp.  I have a gimple FE test
> I will contribute as an XFAIL with an associated PR to keep us honest.
>
> That being said... in this particular test, this is all irrelevant
> because the path will be disallowed for two reasons:
>
> a) The path crosses loops, and the reason we didn't realize it in the
> old code was because the ASSERT_EXPR had pulled the SSA outside the
> loop, so it looks like the entire path is l in the same loop.  If you
> look at the original IL, it's not.
>
> b) Now the path actually fits the pattern being discussed in this
> patch, where there's an early exit out of a loop, so it looks like we
> should handle it.  But...in this case, we would fill a presently empty
> latch.  Interestingly, the old code didn't catch it, because....you
> guessed it...there was an ASSERT_EXPR in the latch.
>
> So I argue that even in the absence of MEM_REF handling, we shouldn't
> thread this path.
>
> 0001-Loosen-loop-crossing-restriction-in-threader.patch
>
>  From 5abe65668f602d53b8f3dc515508dc4616beb048 Mon Sep 17 00:00:00 2001
> From: Aldy Hernandez <aldyh@redhat.com>
> Date: Tue, 5 Oct 2021 15:03:34 +0200
> Subject: [PATCH] Loosen loop crossing restriction in threader.
>
> Crossing loops is generally discouraged from the threader, but we can
> make an exception when we don't cross the latch or enter another loop,
> since this is just an early exit out of the loop.
>
> In fact, the whole threaded path is logically outside the loop.  This
> has nice secondary effects.  For example, objects on the threaded path
> will no longer necessarily be live throughout the loop, so we can get
> register allocation improvements.  The threaded path can physically
> move outside the loop resulting in better icache efficiency, etc.
>
> Tested on x86-64 Linux, and on a visium-elf cross making sure that the
> following tests do not have an abort in the final assembly:
>
> gcc.c-torture/execute/960218-1.c
> gcc.c-torture/execute/visium-pending-4.c
> gcc.c-torture/execute/pr58209.c
>
> gcc/ChangeLog:
>
> 	* tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths):
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.dg/tree-ssa/ssa-thread-valid.c: New test.
I'll throw it in and see what we get :-)

jeff