More aggressive threading causing loop-interchange-9.c regression

Tue Sep 7 11:49:59 GMT 2021

Hi folks.

I have a pending patch for the path solver that pulls in global ranges 
when available (the stuff in SSA_NAME_RANGE_INFO).  In doing so, I have 
run into a regression I was hoping the loop experts could pontificate on.

The regression comes from the simple_reduc_1() function in 
tree-ssa/loop-interchange-9.c, and it involves the following path:

=========== BB 5 ============
Imports: n_10(D)  j_19
Exports: n_10(D)  j_13  j_19
          j_13 : j_19(I)
n_10(D)	int [1, 257]
j_19	int [0, 256]
Relational : (j_13 > j_19)
     <bb 5> [local count: 118111600]:
     # sum_21 = PHI <sum_14(4), sum_11(3)>
     c[j_19] = sum_21;
     j_13 = j_19 + 1;
     if (n_10(D) > j_13)
       goto <bb 9>; [89.00%]
     else
       goto <bb 6>; [11.00%]

j_13 : int [1, 257]
5->9  (T) n_10(D) : 	int [2, 257]
5->9  (T) j_13 : 	int [1, 256]
5->9  (T) j_19 : 	int [0, 255]
5->6  (F) n_10(D) : 	int [1, 257]
5->6  (F) j_13 : 	int [1, 257]
5->6  (F) j_19 : 	int [0, 256]

=========== BB 9 ============
n_10(D)	int [2, 257]
j_13	int [1, 256]
Relational : (n_10(D) > j_19)
Relational : (n_10(D) > j_13)
     <bb 9> [local count: 105119324]:
     goto <bb 3>; [100.00%]

=========== BB 3 ============
Imports: n_10(D)
Exports: n_10(D)
n_10(D)	int [1, +INF]
     <bb 3> [local count: 118111600]:
     # j_19 = PHI <j_13(9), 0(7)>
     sum_11 = c[j_19];
     if (n_10(D) > 0)
       goto <bb 8>; [89.00%]
     else
       goto <bb 5>; [11.00%]

With knowledge from previous passes (SSA_NAME_RANGE_INFO), we know that 
the loop cannot legally iterate outside the size of c[256].  So, j_13 
lies within [1, 257] and n_10 is [2, +INF] at the end of the path.  This 
allows us to thread BB3 to BB8.  Doing so, causes the test to fail here:

/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is 
interchanged" 1 "linterchange" } } */

All the blocks lie within the same loop, and the path passes all the 
tests in path_profitable_p().

Is there something about the shape of this path that should make it 
invalid in the backward threader, or should the test be marked with 
-fdisable-tree-threadN (or something else entirely)?  Note that the 
backward threader is allowed to peel through loop headers.

I am attaching the gimple dump as well as the graphviz output for easier 
analysis.

Thanks.
Aldy
-------------- next part --------------
__attribute__((noinline))
void simple_reduc_1 (int n)
{
  int i;
  int sum;
  int j;
  int _1;
  int _2;
  int _3;

  <bb 2> [local count: 14598063]:
  if (n_10(D) > 0)
    goto <bb 7>; [89.00%]
  else
    goto <bb 6>; [11.00%]

  <bb 7> [local count: 12992276]:

  <bb 3> [local count: 118111600]:
  # j_19 = PHI <j_13(9), 0(7)>
  sum_11 = c[j_19];
  if (n_10(D) > 0)
    goto <bb 8>; [89.00%]
  else
    goto <bb 5>; [11.00%]

  <bb 8> [local count: 105119324]:

  <bb 4> [local count: 955630225]:
  # sum_20 = PHI <sum_14(10), sum_11(8)>
  # i_22 = PHI <i_15(10), 0(8)>
  _1 = a[i_22][j_19];
  _2 = b[i_22][j_19];
  _3 = _1 * _2;
  sum_14 = _3 + sum_20;
  i_15 = i_22 + 1;
  if (n_10(D) > i_15)
    goto <bb 10>; [89.00%]
  else
    goto <bb 5>; [11.00%]

  <bb 10> [local count: 850510901]:
  goto <bb 4>; [100.00%]

  <bb 5> [local count: 118111600]:
  # sum_21 = PHI <sum_14(4), sum_11(3)>
  c[j_19] = sum_21;
  j_13 = j_19 + 1;
  if (n_10(D) > j_13)
    goto <bb 9>; [89.00%]
  else
    goto <bb 6>; [11.00%]

  <bb 9> [local count: 105119324]:
  goto <bb 3>; [100.00%]

  <bb 6> [local count: 14598063]:
  return;

}
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graph.ps
Type: application/postscript
Size: 22338 bytes
Desc: not available
URL: <https://gcc.gnu.org/pipermail/gcc/attachments/20210907/ea6ed978/attachment-0001.ps>