This is the mail archive of the mailing list for the GCC project.
RE: Predictive commoning leads to register to register moves through memory.
- From: Simon Dardis <Simon dot Dardis at imgtec dot com>
- To: Richard Biener <richard dot guenther at gmail dot com>, Jeff Law <law at redhat dot com>
- Cc: "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Date: Thu, 17 Sep 2015 15:58:32 +0000
- Subject: RE: Predictive commoning leads to register to register moves through memory.
- References: <B83211783F7A334B926F0C0CA42E32CAF21F34 at hhmail02 dot hh dot imgtec dot org> <55E082E9 dot 4000803 at redhat dot com> <CAFiYyc3A49Fx2pRQUm4tyX3dPhULzT5OqjSDZVJVt=u7G8nmfg at mail dot gmail dot com>
I've since taken another look at this and tracked the issue down to
tree-predcom.c, specifically to ref_at_iteration almost always generating MEM_REFs.
GCC's RTL GCSE cannot prove two such MEM_REF accesses equal, and hence cannot
remove the redundant ones. A previous version of the code did generate ARRAY_REFs
(pre-204458), but that was changed to generate MEM_REFs for pr/58653.
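For context, predictive commoning (the tree-predcom.c pass) reuses values loaded in earlier loop iterations instead of reloading them from memory. The hand-written C sketch below shows the idea on a hypothetical loop; the names `sum_adjacent` and `sum_adjacent_pcom` are illustrative and not from this thread. The real pass operates on GIMPLE, and it must also prove the arrays do not overlap, which the sketch expresses with `restrict`.

```c
#include <stddef.h>

/* Original form: iteration i loads a[i] and a[i + 1], so every interior
   element of a is loaded twice over the whole loop. */
void sum_adjacent(const double *a, double *b, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++)
        b[i] = a[i] + a[i + 1];
}

/* Predictive commoning applied by hand: the value of a[i + 1] loaded in
   iteration i is carried in a scalar and reused as a[i] in iteration
   i + 1.  Only valid because a and b do not overlap (restrict). */
void sum_adjacent_pcom(const double *restrict a, double *restrict b, size_t n)
{
    if (n < 2)
        return;
    double cur = a[0];              /* holds a[i] for the current i */
    for (size_t i = 0; i + 1 < n; i++) {
        double next = a[i + 1];     /* the only load per iteration */
        b[i] = cur + next;
        cur = next;                 /* rotate: next becomes a[i] */
    }
}
```

Each value carried between iterations needs a register for the life of the loop, which is where the register pressure discussed later in the thread comes from.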
Would something like:
@@ -1409,7 +1409,21 @@ ref_at_iteration (data_reference_p dr, int iter, gimple_seq *stmts)
DECL_SIZE (field), bitsize_zero_node);
+ /* Generate an ARRAY_REF for array references when all details are INTEGER_CST
+ rather than a MEM_REF so that CSE passes can potentially optimize them. */
+ else if (TREE_CODE (DR_REF (dr)) == ARRAY_REF
+ && TREE_CODE (DR_STEP (dr)) == INTEGER_CST
+ && TREE_CODE (DR_INIT (dr)) == INTEGER_CST
+ && TREE_CODE (DR_OFFSET (dr)) == INTEGER_CST)
+ /* Reverse engineer the element from memory offset. */
+ tree offset = size_binop (MINUS_EXPR, coff, off);
+ tree sizdiv = TYPE_SIZE (TREE_TYPE (TREE_TYPE (DR_BASE_OBJECT (dr))));
+ sizdiv = div_if_zero_remainder (EXACT_DIV_EXPR, sizdiv, ssize_int (BITS_PER_UNIT));
+ tree element = div_if_zero_remainder (EXACT_DIV_EXPR, offset, sizdiv);
+ if (element != NULL_TREE)
+ return build4 (ARRAY_REF, TREE_TYPE (DR_REF (dr)), DR_BASE_OBJECT (dr),
+ element, NULL_TREE, NULL_TREE);
return fold_build2 (MEM_REF, TREE_TYPE (DR_REF (dr)), addr, alias_ptr);
be an appropriate start to fixing this? That fix appears to work in my testing.
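The "reverse engineer the element" step in the proposed branch is plain integer arithmetic, mirrored here in C as a sketch. The element size from TYPE_SIZE is in bits, so it is first divided by BITS_PER_UNIT, then the byte offset is divided by the element size in bytes; the ARRAY_REF is built only when both divisions are exact, and otherwise div_if_zero_remainder yields NULL_TREE and the code falls back to the MEM_REF. The helper `element_index` and the 8-bit unit are assumptions for illustration, and like the patch the sketch ignores a non-zero array lower bound.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: BITS_PER_UNIT == 8, as on the usual targets. */
#define UNIT_BITS 8

/* Hypothetical mirror of the patch's index recovery.  Returns true and
   stores the element index in *index only when both divisions are
   exact; returns false where the patch would get NULL_TREE. */
bool element_index(int64_t byte_offset, int64_t elem_size_bits,
                   int64_t *index)
{
    if (elem_size_bits <= 0 || elem_size_bits % UNIT_BITS != 0)
        return false;                 /* element is not a whole number of bytes */
    int64_t elem_size_bytes = elem_size_bits / UNIT_BITS;
    if (byte_offset % elem_size_bytes != 0)
        return false;                 /* offset lands inside an element */
    *index = byte_offset / elem_size_bytes;
    return true;
}
```

For a `double` array (64-bit elements) a byte offset of 8, as in the `MEM[(double *)&poly + 8B]` access in Richard's dump, recovers index 1, while an offset of 12 is rejected and would keep the MEM_REF.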
From: Richard Biener
Sent: 31 August 2015 11:40
To: Jeff Law
Cc: Simon Dardis
Subject: Re: Predictive commoning leads to register to register moves through memory.
On Fri, Aug 28, 2015 at 5:48 PM, Jeff Law wrote:
> On 08/28/2015 09:43 AM, Simon Dardis wrote:
>> Following Jeff's advice to extract more information from GCC, I've
>> narrowed the cause down to the predictive commoning pass inserting
>> the load in a loop-header-style basic block. However, the next pass
>> in GCC, tree-cunroll, promptly removes the loop and joins the loop
>> header to the body of the (non-)loop. Stranger still, disabling the
>> conditional store elimination pass, the dominator optimization pass,
>> or jump threading (with --param max-jump-thread-duplication-stmts=0)
>> produces the above assembly code. Any ideas on an approach to this
>> issue?
> I'd probably start by looking at the .optimized tree dump in both
> cases to understand the difference, then (most likely) tracing that
> through the RTL optimizers into the register allocator.
It's the known issue of LIM (here the one after pcom and complete unrolling of the inner loop) being too aggressive with store motion. Here the complete array is replaced with registers for the outer loop. Were 'poly' a local variable, we'd have optimized it away completely.
  _8 = 1.0e+0 / pretmp_42;
  _12 = _8 * _8;
  poly = _12;

  # prephitmp_30 = PHI <_12(6), _36(9)>
  # T_lsm.8_22 = PHI <_8(6), pretmp_42(9)>
  poly_I_lsm0.10_38 = MEM[(double *)&poly + 8B];
  _2 = prephitmp_30 * poly_I_lsm0.10_38;
  _54 = _2 * poly_I_lsm0.10_38;
  _67 = poly_I_lsm0.10_38 * _54;
  _80 = poly_I_lsm0.10_38 * _67;
  _93 = poly_I_lsm0.10_38 * _80;
  _106 = poly_I_lsm0.10_38 * _93;
  _19 = poly_I_lsm0.10_38 * _106;
  count_23 = count_28 + 1;
  if (count_23 != iterations_6(D))
    goto <bb 5>;
  else
    goto <bb 8>;

  poly = _2;
  poly = _54;
  poly = _67;
  poly = _80;
  poly = _93;
  poly = _106;
  poly = _19;
  i1 = 9;
  T = T_lsm.8_22;
Note that DOM fails to CSE poly (a known defect), but, heh, doing that would only increase register pressure even more.
Note the above is on x86_64.
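The store motion that LIM performs here can be sketched by hand in C. A location that a loop repeatedly loads and stores is read into a temporary before the loop (the `_lsm` names in the dump), kept in a register, and written back once at the exit. The global `acc` and both function names are hypothetical and not taken from the test case in this thread; the transformation is only valid if `x` cannot point at `acc`, which the second version expresses with `restrict`.

```c
#include <stddef.h>

/* Hypothetical global: its final value is observable after the loop,
   so the stores can be sunk out of the loop but not deleted. */
double acc;

/* Before: every iteration loads acc from memory and stores it back. */
void scale_in_memory(const double *x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        acc = acc * x[i];
}

/* After store motion: acc lives in the local acc_lsm (a register) for
   the whole loop and is stored back once.  restrict promises that x
   does not alias acc, which is what makes the rewrite legal. */
void scale_store_motion(const double *restrict x, size_t n)
{
    double acc_lsm = acc;            /* load hoisted before the loop */
    for (size_t i = 0; i < n; i++)
        acc_lsm = acc_lsm * x[i];
    acc = acc_lsm;                   /* store sunk after the loop */
}
```

Each promoted location occupies a register for the whole loop, so promoting every element of an array, as happens with poly above, trades memory traffic for register pressure. If the variable were a local whose value is never read afterwards, the final store would be dead as well.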