This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Predictive commoning leads to register to register moves through memory.


On Fri, Aug 28, 2015 at 5:48 PM, Jeff Law <law@redhat.com> wrote:
> On 08/28/2015 09:43 AM, Simon Dardis wrote:
>
>> Following Jeff's advice[1] to extract more information from GCC, I've
>> narrowed the cause down to the predictive commoning pass inserting
>> the load in a loop header style basic block. However, the next pass
>> in GCC, tree-cunroll, promptly removes the loop and joins the loop
>> header to the body of the (non)loop. More oddly, disabling the
>> conditional store elimination pass, the dominator optimizations
>> pass, or jump-threading (with --param
>> max-jump-thread-duplication-stmts=0) nets the above assembly code. Any
>> ideas on an approach for this issue?
>
> I'd probably start by looking at the .optimized tree dump in both cases to
> understand the difference, then (most likely) tracing that through the RTL
> optimizers into the register allocator.

It's the known issue of LIM (here the one after pcom and complete unrolling of
the inner loop) being too aggressive with store-motion.  Here the complete
array is replaced with registers for the outer loop.  Were 'poly' a local
variable, we'd have optimized it away completely.

  <bb 6>:
  _8 = 1.0e+0 / pretmp_42;
  _12 = _8 * _8;
  poly[1] = _12;

  <bb 7>:
  # prephitmp_30 = PHI <_12(6), _36(9)>
  # T_lsm.8_22 = PHI <_8(6), pretmp_42(9)>
  poly_I_lsm0.10_38 = MEM[(double *)&poly + 8B];
  _2 = prephitmp_30 * poly_I_lsm0.10_38;
  _54 = _2 * poly_I_lsm0.10_38;
  _67 = poly_I_lsm0.10_38 * _54;
  _80 = poly_I_lsm0.10_38 * _67;
  _93 = poly_I_lsm0.10_38 * _80;
  _106 = poly_I_lsm0.10_38 * _93;
  _19 = poly_I_lsm0.10_38 * _106;
  count_23 = count_28 + 1;
  if (count_23 != iterations_6(D))
    goto <bb 5>;
  else
    goto <bb 8>;

  <bb 8>:
  poly[2] = _2;
  poly[3] = _54;
  poly[4] = _67;
  poly[5] = _80;
  poly[6] = _93;
  poly[7] = _106;
  poly[8] = _19;
  i1 = 9;
  T = T_lsm.8_22;

Note that DOM fails to CSE poly[1] (a known defect), but heh, doing that
would only increase register pressure even more.

Note the above is on x86_64.
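
For readers without the test case at hand, the effect of store motion in the
dump can be sketched by hand in C.  This is only an illustration under stated
assumptions: the source shape in kernel_memory is a guess reconstructed from
the dump, not the original program, and kernel_memory, kernel_store_motion,
results_match, N and the starting value 2.0 for T are hypothetical names and
choices.  Only poly, T, i1, count and iterations come from the dump.

```c
#include <assert.h>
#include <string.h>

#define N 9   /* assumed array size: the dump stores poly[1]..poly[8] */

/* Globals: because 'poly' is not local, its final contents are
   observable and the stores cannot simply be deleted. */
double poly[N];
double T = 2.0;   /* assumed starting value, chosen so results are exact */
int i1;

/* Guessed source shape: every element goes through memory. */
void kernel_memory(int iterations)
{
    assert(iterations >= 0);
    for (int count = 0; count != iterations; count++) {
        T = 1.0 / T;
        poly[1] = T * T;
        for (i1 = 2; i1 < N; i1++)
            poly[i1] = poly[i1 - 1] * poly[1];
    }
}

/* Hand-written equivalent of what the dump shows after unrolling and
   store motion: poly[2..8], i1 and T live in locals for the whole
   outer loop and are stored once on exit (bb 8 in the dump). */
void kernel_store_motion(int iterations)
{
    assert(iterations >= 0);
    if (iterations == 0)
        return;                 /* loop never runs: nothing to store */

    double t = T;
    double p1 = 0, p2 = 0, p3 = 0, p4 = 0, p5 = 0, p6 = 0, p7 = 0, p8 = 0;
    for (int count = 0; count != iterations; count++) {
        t = 1.0 / t;
        p1 = t * t;
        poly[1] = p1;           /* this store stays in the loop (bb 6) */
        p2 = p1 * p1;
        p3 = p2 * p1;
        p4 = p3 * p1;
        p5 = p4 * p1;
        p6 = p5 * p1;
        p7 = p6 * p1;
        p8 = p7 * p1;
    }
    poly[2] = p2; poly[3] = p3; poly[4] = p4; poly[5] = p5;
    poly[6] = p6; poly[7] = p7; poly[8] = p8;
    i1 = N;
    T = t;
}

/* Runs both variants from the same starting state and reports whether
   they leave identical values in poly, T and i1. */
int results_match(int iterations)
{
    double ref[N];

    memset(poly, 0, sizeof poly); T = 2.0; i1 = 0;
    kernel_memory(iterations);
    memcpy(ref, poly, sizeof poly);
    double ref_T = T;
    int ref_i1 = i1;

    memset(poly, 0, sizeof poly); T = 2.0; i1 = 0;
    kernel_store_motion(iterations);
    return memcmp(ref, poly, sizeof poly) == 0 && ref_T == T && ref_i1 == i1;
}
```

In the second variant seven products plus t and p1 are live across the loop
body, which is where the register pressure comes from.  Calling
results_match(iterations) from a small main is one way to check that the two
variants agree.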

Richard.


> jeff

