[RFC] More jump threading restrictions in the presence of loops.

Sun Oct 17 01:32:17 GMT 2021

On 10/6/2021 4:22 AM, Aldy Hernandez wrote:
> [Here go the bits by Richi, tested on x86-64 Linux, and ongoing tests
> on aarch64 and ppc64le.]
>
> There is a lot of fall-out from this patch, as there were many threading
> tests that assumed the restrictions introduced by this patch were valid.
> Some tests have merely shifted the threading to after loop
> optimizations, but others ended up with no threading opportunities at
> all.  Surprisingly some tests ended up with more total threads.  It was
> a crapshoot all around.
>
> On a postive note, there are 6 tests that no longer XFAIL, and one
> guality test which now passes.
>
> I would appreciate someone reviewing the test changes.  I am unsure
> whether some of the tests should be removed, XFAILed, or some other
> thing.
>
> I felt a bit queasy about such a fundamental change wrt threading, so I
> ran it through my callgrind test harness (.ii files from a bootstrap).
> There was no change in overall compilation, DOM, or the VRP threaders.
>
> However, there was a slight increase of 1.63% in the backward threader.
> I'm pretty sure we could reduce this if we incorporated the restrictions
> into their profitability code.  This way we could stop the search when
> we ran into one of these restrictions.  Not sure it's worth it at this
> point.
>
> Note, that this ad-hoc testing is not meant to replace a more thorough
> SPEC, etc test.
>
> Tested on x86-64 Linux.
>
> OK pending tests on aarch64 and ppc64le?
>
> Co-authored-by: Richard Biener <rguenther@suse.de>
>
> 0001-Disallow-loop-rotation-and-loop-header-crossing-in-j.patch
>
>  From 5c0fb55b45733ec38918f5d7a576719da32e4a6c Mon Sep 17 00:00:00 2001
> From: Aldy Hernandez <aldyh@redhat.com>
> Date: Mon, 4 Oct 2021 09:47:02 +0200
> Subject: [PATCH] Disallow loop rotation and loop header crossing in jump
>   threaders.
>
> There is a lot of fall-out from this patch, as there are many threading
> tests that assumed the restrictions introduced by this patch were valid.
> Some tests have merely shifted the threading to after loop
> optimizations, but others ended up with no threading opportunities at
> all.  Surprisingly some tests ended up with more total threads.  It was
> a crapshoot all around.
>
> On a postive note, there are 6 tests that no longer XFAIL, and one
> guality test which now passes.
>
> I would appreciate someone reviewing the test changes.  I am unsure
> whether some of the tests should be removed, XFAILed, or some other
> thing.
>
> I felt a bit queasy about such a fundamental change wrt threading, so I
> ran it through my callgrind test harness (.ii files from a bootstrap).
> There was no change in overall compilation, DOM, or the VRP threaders.
>
> However, there was a slight increase of 1.63% in the backward threader.
> I'm pretty sure we could reduce this if we incorporated the restrictions
> into their profitability code.  This way we could stop the search when
> we ran into one of these restrictions.  Not sure it's worth it at this
> point.
>
> Note, that this ad-hoc testing is not meant to replace a more thorough
> SPEC, etc test.
>
> Tested on x86-64 Linux.
>
> OK pending tests on aarch64 and ppc64le?
>
> Co-authored-by: Richard Biener <rguenther@suse.de>
>
> gcc/ChangeLog:
>
> 	* tree-ssa-threadupdate.c (cancel_thread): Dump threading reason
> 	on the same line as the threading cancellation.
> 	(jt_path_registry::cancel_invalid_paths): Avoid rotating loops.
> 	Avoid threading through loop headers where the path remains in the
> 	loop.
>
> libgomp/ChangeLog:
>
> 	* testsuite/libgomp.graphite/force-parallel-5.c: Remove xfail.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.dg/Warray-bounds-87.c: Remove xfail.
> 	* gcc.dg/analyzer/pr94851-2.c: Remove xfail.
> 	* gcc.dg/graphite/pr69728.c: Remove xfail.
> 	* gcc.dg/graphite/scop-dsyr2k.c: Remove xfail.
> 	* gcc.dg/graphite/scop-dsyrk.c: Remove xfail.
> 	* gcc.dg/shrink-wrap-loop.c: Remove xfail.
> 	* gcc.dg/loop-8.c: Adjust for new threading restrictions.
> 	* gcc.dg/tree-ssa/ifc-20040816-1.c: Same.
> 	* gcc.dg/tree-ssa/pr21559.c: Same.
> 	* gcc.dg/tree-ssa/pr59597.c: Same.
> 	* gcc.dg/tree-ssa/pr71437.c: Same.
> 	* gcc.dg/tree-ssa/pr77445-2.c: Same.
> 	* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Same.
> 	* gcc.dg/tree-ssa/ssa-dom-thread-2a.c: Same.
> 	* gcc.dg/tree-ssa/ssa-dom-thread-4.c: Same.
> 	* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
> 	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
> 	* gcc.dg/vect/bb-slp-16.c: Same.
> 	* gcc.dg/tree-ssa/ssa-thread-invalid.c: New test.
So there is some light fallout on our friend visium.  I must say that 
having a port which fails to link if there's a call to abort () left in 
the program is awful helpful :-)    I reviewed the half-dozen or so 
tests that fail after this change, they all look like any jump threads 
are for loop rotation.  So I'm inclined to ignore those and just 
generate new baselines for the visium port.

Some additional thoughts on the tests below.  I didn't see anything 
that's worth an objection, but at least in a couple cases there's ways 
to save the test.  In others I might have made a slightly different 
decision, but I can live with yours.

I think once we reach a consensus on the tests, this will be good to go.

> diff --git a/gcc/testsuite/gcc.dg/loop-8.c b/gcc/testsuite/gcc.dg/loop-8.c
> index 90ea1c45524..66318fc08dc 100644
> --- a/gcc/testsuite/gcc.dg/loop-8.c
> +++ b/gcc/testsuite/gcc.dg/loop-8.c
> @@ -24,5 +24,9 @@ f (int *a, int *b)
>   
>   /* Load of 42 is moved out of the loop, introducing a new pseudo register.  */
>   /* { dg-final { scan-rtl-dump-times "Decided" 1 "loop2_invariant" } } */
> -/* { dg-final { scan-rtl-dump-not "without introducing a new temporary register" "loop2_invariant" } } */
> +
> +
> +/* ?? The expected behavior below depends on threading the 2->3->5 path
> +   in DOM2, but this is invalid since it would rotate the loop.  */
> +/* { dg-final { scan-rtl-dump-not "without introducing a new temporary register" "loop2_invariant" { xfail *-*-* } } } */
So maybe the thing to do here since I guess we want to keep the test 
would be to manually rotate the loop in the source.  In theory that 
should restore the test to validating what we want it to validate 
(specifically the behavior of LICM).

> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
> index 0246ebf3c63..f83cefd8d89 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
> @@ -22,6 +22,8 @@
>   
>      All the cases are picked up by VRP1 as jump threads.  */
>   
> -/* There used to be 6 jump threads found by thread1, but they all
> -   depended on threading through distinct loops in ethread.  */
> -/* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp-thread1" } } */
> +/* This test should be obsoleted.  We used to catch 2 total threads in
> +   vrp-thread1, but after adding loop rotating restrictions, we get
> +   none.  Interestingly, on x86-64 we now get 1 in DOM2, 5 in DOM3,
> +   and 1 in vrp-thread2.  */
> +/* { dg-final { scan-tree-dump-not "Threaded" "vrp-thread1" } } */
I think that testing nothing was threaded in vrp1 is probably best. 
Though I wouldn't lose any sleep if this just went away.

> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
> index 8f0a12c12ee..68808bd09fc 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
> @@ -1,10 +1,9 @@
>   /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-vrp-thread1-stats -fdump-tree-dom2-stats" } */
> +/* { dg-options "-O2 -fdump-statistics" } */
>   
>   void bla();
>   
> -/* In the following case, we should be able to thread edge through
> -   the loop header.  */
> +/* No one should thread through the loop header.  */
>   
>   void thread_entry_through_header (void)
>   {
> @@ -14,8 +13,4 @@ void thread_entry_through_header (void)
>       bla ();
>   }
>   
> -/* There's a single jump thread that should be handled by the VRP
> -   jump threading pass.  */
> -/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "vrp-thread1"} } */
> -/* { dg-final { scan-tree-dump-times "Jumps threaded: 2" 0 "vrp-thread1"} } */
> -/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2"} } */
> +/* { dg-final { scan-tree-dump-not "Jumps threaded" "statistics"} } */
Similarly.

> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
> index b0a7d423475..24de9d57d50 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
> @@ -1,8 +1,12 @@
>   /* { dg-do compile } */
>   /* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3-details" } */
>   
> -/* { dg-final { scan-tree-dump-times "Registering jump" 6 "thread1" } } */
> -/* { dg-final { scan-tree-dump-times "Registering jump" 1 "thread3" } } */
> +/* ?? We should obsolete this test.  All the threads in thread1 and
> +   thread3 we used to get cross the loop header but does not exit the
> +   loop, so they have been deemed invalid.  */
> +
> +/* { dg-final { scan-tree-dump-times "Registering jump" 0 "thread1" } } */
> +/* { dg-final { scan-tree-dump-times "Registering jump" 0 "thread3" } } */
>   
>   int sum0, sum1, sum2, sum3;
>   int foo (char *s, char **ret)
Similarly.

> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
> index e68a9b62535..fc3adab3fc3 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
> @@ -65,5 +65,9 @@ int main (void)
>     return 0;
>   }
>   
> -/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp1" } } */
> +/* ?? The check below depends on jump threading.  There are no a
> +   couple threaded paths that are being rejected because they either
> +   rotate a loop or cross the loop header without exiting the
> +   loop.  */
> +/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp1" { xfail *-*-* } } } */
So we could manually rotate the loop in the source which in turn would 
allow this test to continue to validate SLP's behavior.