Bug 63168 - loop header copying fails - not vectorized: latch block not empty
Summary: loop header copying fails - not vectorized: latch block not empty
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 5.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
: 28643 (view as bug list)
Depends on:
Blocks: vectorizer
  Show dependency treegraph
 
Reported: 2014-09-04 13:48 UTC by Marc Glisse
Modified: 2014-10-16 10:14 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work: 5.0
Known to fail:
Last reconfirmed: 2014-09-04 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Marc Glisse 2014-09-04 13:48:10 UTC
Consider the 2 codes below.

int mymax1(int *it, int *end)
{
  int max = *it++;
  for (; it != end; it++)
    if (*it > max)
      max = *it;
  return max;
}
int mymax2(int *it, int *end)
{
  int max = *it;
  while (++it != end)
    if (*it > max)
      max = *it;
  return max;
}

Compiled with g++ -Ofast, the first one is vectorized but not the second one. It gives up immediately because of "latch block not empty". That seems rather fragile :-(

  <bb 3>:
  max_8 = *it_6;
  max_10 = MAX_EXPR <max_2, max_8>;

  <bb 4>:
  # it_1 = PHI <it_6(3), it_4(D)(2)>
  # max_2 = PHI <max_10(3), max_5(2)>
  it_6 = it_1 + 4;
  if (it_6 != end_7(D))
    goto <bb 3>;
  else
    goto <bb 5>;


There are several related bugs (like PR 33447), feel free to mark as a dup if you can identify one for sure.

(code taken from http://stackoverflow.com/q/25622109/1918193 apparently there may be other issues with what gcc produces)
Comment 1 Richard Biener 2014-09-04 14:18:13 UTC
Confirmed.  For some reason loop header copying is not applied to the 2nd variant.
Comment 2 Richard Biener 2014-10-15 14:59:08 UTC
It looks like do_while_loop_p is fooled by the CFG structure and hits

static bool
do_while_loop_p (struct loop *loop)
{
  gimple stmt = last_stmt (loop->latch);

  /* If the latch of the loop is not empty, it is not a do-while loop.  */
  if (stmt
      && gimple_code (stmt) != GIMPLE_LABEL)
    return false;

the loop at CH time looks like
      |
      v
/-> <bb 5>
|      exit test  --->
|      |
|   <bb 3>
|      ...
|      |
\   <bb 4>
 --   (empty but with a PHI)

phiopt introduces this singleton PHI:

   <bb 3>:
   max_8 = *it_6;
-  if (max_8 > max_2)
-    goto <bb 5>;
-  else
-    goto <bb 4>;
+  max_10 = MAX_EXPR <max_2, max_8>;
 
   <bb 4>:
+  # max_9 = PHI <max_10(3)>
 
   <bb 5>:
-  # max_9 = PHI <max_2(4), max_8(3)>
-
-  <bb 6>:
-  # it_1 = PHI <it_6(5), it_4(D)(2)>
-  # max_2 = PHI <max_9(5), max_5(2)>
+  # it_1 = PHI <it_6(4), it_4(D)(2)>
+  # max_2 = PHI <max_9(4), max_5(2)>

it looks like the CH predicate relies on copy-propagated form
(a latch block with an SSA name copy should also be considered empty IMHO).

I think that cfgcleanup should merge

  <bb A>
   ....
    | single-succ/pred
  <bb B>
    # var = PHI <arg>

and rewrite the PHIs to copies.  Its comment even says so:

  /* Merging the blocks may create new opportunities for folding
     conditional branches (due to the elimination of single-valued PHI
     nodes).  */
  if (single_succ_p (bb)
      && can_merge_blocks_p (bb, single_succ (bb)))
    {
      merge_blocks (bb, single_succ (bb));
      return true;
    }

Ah, but

/* Checks whether we can merge block B into block A.  */

static bool
gimple_can_merge_blocks_p (basic_block a, basic_block b)
{
...
  /* Protect the loop latches.  */
  if (current_loops && b->loop_father->latch == b)
    return false;

triggers here (merge_blocks doesn't handle merging with a latch).

Mine.
Comment 3 Richard Biener 2014-10-16 08:34:54 UTC
*** Bug 28643 has been marked as a duplicate of this bug. ***
Comment 4 Richard Biener 2014-10-16 10:14:18 UTC
Fixed.
Comment 5 Richard Biener 2014-10-16 10:14:23 UTC
Author: rguenth
Date: Thu Oct 16 10:13:52 2014
New Revision: 216304

URL: https://gcc.gnu.org/viewcvs?rev=216304&root=gcc&view=rev
Log:
2014-10-16  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/63168
	* tree-cfg.c (gimple_can_merge_blocks_p): Only protect
	latches if after merging they are no longer simple.
	* cfghooks.c (merge_blocks): Handle merging a latch block
	into another block.

	* gcc.dg/tree-ssa/loop-40.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/loop-40.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfghooks.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-cfg.c