This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [patch] jump threading multiple paths that start from the same BB




On 07/03/2018 08:16 PM, Jeff Law wrote:
On 07/03/2018 03:31 AM, Aldy Hernandez wrote:
On 07/02/2018 07:08 AM, Christophe Lyon wrote:

On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
While poking around in the backwards threader I noticed that we bail if

we have already seen a starting BB.

         /* Do not jump-thread twice from the same block.  */
         if (bitmap_bit_p (threaded_blocks, entry->src->index)

This limitation discards paths that are sub-paths of paths that have
already been threaded.

The following patch scans the remaining to-be-threaded paths to identify

if any of them start from the same point, and are thus sub-paths of the

just-threaded path.  By removing the common prefix of blocks in upcoming

threadable paths, and then rewiring first non-common block
appropriately, we expose new threading opportunities, since we are no
longer starting from the same BB.  We also simplify the would-be
threaded paths, because we don't duplicate already duplicated paths.
[snip]
Hi,

I've noticed a regression on aarch64:
FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump thread3 "Jumps
threaded: 3"
very likely caused by this patch (appeared between 262282 and 262294)

Christophe

The test needs to be adjusted here.

The long story is that the aarch64 IL is different at thread3 time in
that it has 2 profitable sub-paths that can now be threaded with my
patch.  This is causing the threaded count to be 5 for aarch64, versus 3
for x86 64.  Previously we couldn't thread these in aarch64, so the
backwards threader would bail.

One can see the different threading opportunities by sticking
debug_all_paths() at the top of thread_through_all_blocks().  You will
notice that aarch64 has far more candidates to begin with.  The IL on
the x86 backend, has no paths that start on the same BB.  The aarch64,
on the other hand, has many to choose from:

path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11,
path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 19,

Some of these prove unprofitable, but 2 more than before are profitable now.


BTW, I see another threading related failure on aarch64 which is
unrelated to my patch, and was previously there:

FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump-not vrp2 "Jumps
threaded"

This is probably another IL incompatibility between architectures.

Anyways... the attached path fixes the regression.  I have added a note
to the test explaining the IL differences.  We really should rewrite all
the threading tests (I am NOT volunteering ;-)).

OK for trunk?
Aldy

curr.patch


gcc/testsuite/

	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust test because aarch64
	has a slightly different IL that provides more threading
	opportunities.
OK.

WRT rewriting the tests.  I'd certainly agree that we don't have the
right set of knobs to allow us to characterize the target nor do we have
the right dumping/scanning facilities to describe and query the CFG changes.

The fact that the IL changes so much across targets is a sign that
target dependency (probably BRANCH_COST) is twiddling the gimple we
generate.  I strongly suspect we'd be a lot better off if we tackled the
BRANCH_COST problem first.

Huh. I've always accepted differing IL between architectures as a necessary evil for things like auto-vectorization and the like.

What's the ideal plan here? A knob to set default values for target dependent variables that can affect IL layout? Then we could pass -fthis-is-an-IL-test and things be normalized?

Thanks.
Aldy


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]