This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH/RFC] Make loop-header-copying more aggressive, rerun before tree-if-conversion


Jeff Law wrote:
On 05/22/2015 09:42 AM, Alan Lawrence wrote:
This patch does so (and makes slightly less conservative, to tackle the
example above). I found I had to make this a separate pass, so that the
phi nodes were cleaned up at the end of the pass before running
tree_if_conversion.
What PHI node cleanup needs to be done? I don't doubt something's needed, but would like to understand the cleanup -- depending on what needs to be done, it may be the case that we can cleanup on-the-fly or it may point at a general issue we should be resolving prior to running tree_if_conversion.

If I change pass_ch_vect to return 0 rather than TODO_update_cfg, my testcase gives:

foo ()
{
  int m;
  int i;
  unsigned int ivtmp_3;
  int _5;
  int _6;
  int _7;
  int _10;
  int _12;
  unsigned int ivtmp_13;
  int _15;
  unsigned int ivtmp_19;
  int _20;
  unsigned int ivtmp_23;
  unsigned int ivtmp_27;

  <bb 2>:
  _10 = A[0];

  <bb 8>:
  # m_11 = PHI <4(2)>
  # _12 = PHI <_10(2)>
  # i_1 = PHI <0(2)>
  # ivtmp_19 = PHI <16(2)>
  _20 = m_11 * _12;
  A[i_1] = _20;
  i_22 = i_1 + 1;
  ivtmp_23 = ivtmp_19 - 1;
  if (ivtmp_23 != 0)
    goto <bb 9>;
  else
    goto <bb 7>;

  <bb 9>:

  <bb 3>:
  # i_26 = PHI <i_9(10), i_22(9)>
  # ivtmp_27 = PHI <ivtmp_13(10), ivtmp_23(9)>
  _5 = A[i_26];
  _6 = _5 & i_26;
  if (_6 != 0)
    goto <bb 5>;
  else
    goto <bb 4>;

  <bb 4>:

  <bb 5>:
  # m_14 = PHI <5(3), 4(4)>

  <bb 6>:
  # m_2 = PHI <m_14(5)>
  # _15 = PHI <_5(5)>
  # i_16 = PHI <i_26(5)>
  # ivtmp_3 = PHI <ivtmp_27(5)>
  _7 = m_2 * _15;
  A[i_16] = _7;
  i_9 = i_16 + 1;
  ivtmp_13 = ivtmp_3 - 1;
  if (ivtmp_13 != 0)
    goto <bb 10>;
  else
    goto <bb 7>;

  <bb 10>:
  goto <bb 3>;

  <bb 7>:
  return;

}

if-conversion then bails out in if_convertible_phi_p, with a phi (whose result is a virtual operand, specifically an SSA name) used as operand to another phi ("Difficult to handle this virtual phi" in tree-if-conv.c): see m_2, i_16. Returning TODO_update_cfg, causes merging of blocks 2 and 8, and blocks 5 and 6, giving instead:

foo ()
{
  int m;
  int i;
  int _5;
  int _6;
  int _7;
  int _10;
  unsigned int ivtmp_13;
  int _20;
  unsigned int ivtmp_23;
  unsigned int ivtmp_27;

  <bb 2>:
  _10 = A[0];
  _20 = _10 * 4;
  A[0] = _20;
  i_22 = 1;
  ivtmp_23 = 15;
  if (ivtmp_23 != 0)
    goto <bb 3>;
  else
    goto <bb 8>;

  <bb 3>:

  <bb 4>:
  # i_26 = PHI <i_9(7), i_22(3)>
  # ivtmp_27 = PHI <ivtmp_13(7), ivtmp_23(3)>
  _5 = A[i_26];
  _6 = _5 & i_26;
  if (_6 != 0)
    goto <bb 6>;
  else
    goto <bb 5>;

  <bb 5>:

  <bb 6>:
  # m_14 = PHI <5(4), 4(5)>
  _7 = _5 * m_14;
  A[i_26] = _7;
  i_9 = i_26 + 1;
  ivtmp_13 = ivtmp_27 + 4294967295;
  if (ivtmp_13 != 0)
    goto <bb 7>;
  else
    goto <bb 8>;

  <bb 7>:
  goto <bb 4>;

  <bb 8>:
  return;

}

and one can see that the troublesome phi's are gone.

Don't we have a flag to turn off loop header copying? If so, does adding that flag to the test "fix" it without resorting to something gross like preserving the old logic for the first pass and new logic for the second pass.

Ah, yes, thanks, -fno-tree-ch works. In fact here the problem was caused by the second pass, which following Richard's suggestion, I've now gated with -ftree-vectorize also - which is turned off by default at -O2, so the test is passing without changes.

However, it turns out I was implicitly using different logic in the two passes: the 'if (!single_exit(loop)) return true' always bailed out in the first pass_ch, as single_exit-ness is not known in the absence of LOOPS_HAVE_RECORDED_EXITS, so the first pass_ch was still using the original logic anyway! I've now had to refactor the two classes to allow different code anyway, which makes the distinction clear.

(I also tried a more complicated version of do_while_loop_p that computed it's own single_exit criteria to have that in the first pass; this gave more test failures, which are fixable, but it's not really the purpose of this patch to do additional header-copying when we are not vectorizing.)

My biggest worry would be cases where data initialized by loop_optimizer_init gets invalidated by the header copying. Have you looked at all at that possibility? I don't have anything specific in mind to point you at -- just a general concern.

Well, this code from tree-ssa-loop-ch.c appears to update the preheader and loop latch (I've verified with printf's that loop->latch, loop->header, loop_preheader_edge(loop) and loop_latch_edge(loop) are all sensible after this):

/* Ensure that the latch and the preheader is simple (we know that they
	 are not now, since there was the loop exit condition.  */
      split_edge (loop_preheader_edge (loop));
      split_edge (loop_latch_edge (loop));

and I think the exit edges don't change (they get cloned outside the loop). The one remaining worry might be irreducible code, but I believe this should have been removed by the stage pass_ch_vect runs: I've also run an entire bootstrap + testsuite with both (a) an assertion in pass_ch_vect::process_loop_p that none of the blocks in the loop are BB_IRREDUCIBLE, and (b) a call to verify_loop_structure at the end of pass_ch_base::copy_headers.

I agree this isn't (/cannot be) totally definitive, so if you are not sufficiently reassured - would you be if I called loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS) at the end of pass_ch_vect, redoing the setup done in pass_tree_loop_init::execute?

Patch v2 at https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01355.html .

Thanks for the review!

--Alan


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]