[Bug tree-optimization/90270] [8/9/10 Regression] Do not select best induction variable optimization

Mon Apr 29 02:30:00 GMT 2019

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90270

--- Comment #4 from bin cheng <amker at gcc dot gnu.org> ---
On AArch64, ivopts generates the following code:
  <bb 3> [local count: 954449108]:
  # crc_20 = PHI <crc_7(D)(2), crc_12(5)>
  # ivtmp.5_18 = PHI <1(2), ivtmp.5_17(5)>
  _19 = &final_counts + 18446744073709551612;
  _1 = MEM[base: _19, index: ivtmp.5_18, step: 4, offset: 0B];
  crc_10 = crcu32 (_1, crc_20);
  _5 = &track_counts + 18446744073709551612;
  _2 = MEM[base: _5, index: ivtmp.5_18, step: 4, offset: 0B];
  crc_12 = crcu32 (_2, crc_10);
  ivtmp.5_17 = ivtmp.5_18 + 1;
  if (ivtmp.5_17 != 9)
    goto <bb 5>; [87.50%]
  else
    goto <bb 4>; [12.50%]
This looks optimal to me, provided _19/_5 can be hoisted out of the loop, which
is what rtl liv is intended to do.  (TREE liv doesn't help much, but that's
another story.)
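
As a sanity check on the constant above, here is a minimal C sketch of the
address arithmetic.  It assumes 4-byte elements and an 8-element array
(inferred from step: 4 and the != 9 exit test); final_counts_demo,
load_via_biased_base and check_biased_base are hypothetical names for
illustration, not the testcase from the bug.  It only shows that
18446744073709551612 is -4 modulo 2^64, so the biased base plus iv * 4 for
iv = 1..8 touches elements 0..7.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for final_counts; element type and length are
   assumptions inferred from "step: 4" and the "!= 9" exit test.  */
static uint32_t final_counts_demo[8] = {10, 11, 12, 13, 14, 15, 16, 17};

/* Mirrors MEM[base: _19, index: iv, step: 4, offset: 0B], where
   _19 = &final_counts + 18446744073709551612.  The arithmetic is done on
   integers so no out-of-bounds pointer is ever formed.  */
static uint32_t load_via_biased_base(uint64_t iv)
{
    uintptr_t base = (uintptr_t)final_counts_demo;
    uintptr_t biased = base + (uintptr_t)18446744073709551612ULL; /* _19 */
    return *(const uint32_t *)(uintptr_t)(biased + iv * 4);
}

/* Checks that the biased form reads the same elements as plain indexing.  */
static void check_biased_base(void)
{
    /* The large constant is just -4 in unsigned 64-bit arithmetic.  */
    assert(18446744073709551612ULL == (uint64_t)-4);

    /* iv runs 1..8, as in the loop: PHI <1, ...> and exit on iv == 9.  */
    for (uint64_t iv = 1; iv != 9; iv++)
        assert(load_via_biased_base(iv) == final_counts_demo[iv - 1]);
}
```

The biased base is loop invariant, which is why hoisting it leaves a single
register-plus-scaled-index access inside the loop.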

The problem is in the dom3 pass: in cprop_operand, _19/_5 is propagated into the
memory access even though this results in an invalid addressing mode on AArch64:
  [&MEM[(void *)&final_counts + -4B], &MEM[(void *)&final_counts + -4B]] 
EQUIVALENCES: { _19 } (1 elements)
Optimizing statement _1 = MEM[base: _19, index: ivtmp.5_18, step: 4, offset: 0B];
  Replaced '_19' with constant '&MEM[(void *)&final_counts + -4B]'
  Folded to: _1 = MEM[symbol: final_counts, index: ivtmp.5_18, step: 4, offset: -4B];
LKUP STMT _1 = MEM[symbol: final_counts, index: ivtmp.5_18, step: 4, offset: -4B] with .MEM_22
2>>> STMT _1 = MEM[symbol: final_counts, index: ivtmp.5_18, step: 4, offset: -4B] with .MEM_22

The access is kept in this form until the end of GIMPLE, and is then badly
legitimized.

So ivopts worked hard to get the addressing mode and the invariant expression
right in this case; we need to avoid ill-advised transformations afterwards.

BTW, with dom disabled by -fno-tree-dominator-opts, vrp2 does the same
transformation, so -fno-tree-vrp is also necessary to get the optimal code.

Well, you can argue [base + iv << 2] is sub-optimal compared to [base + iv],
but that's hard to tune.  Also, a bias towards the original IV is in general
preferred for reasons like smaller setup code, better debug info, and even
performance in complicated loops.
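
To make the two shapes concrete, here is a rough source-level C sketch of
them.  The function names and the summing loop are hypothetical and chosen
only for illustration; the compiler is free to rewrite either form, so this
shows what the two induction variables look like, not what ivopts will
actually select.

```c
#include <stddef.h>
#include <stdint.h>

/* Shape [base + iv << 2]: iv counts elements (the original IV) and the
   scaling by the element size is folded into the address.  */
static uint32_t sum_scaled_index(const uint32_t *base, size_t n)
{
    uint32_t sum = 0;
    for (size_t iv = 0; iv != n; iv++)
        sum += base[iv];
    return sum;
}

/* Shape [base + iv]: iv is a byte offset stepping by the element size,
   so the address needs no scaling but the IV is no longer the original
   counter.  */
static uint32_t sum_byte_offset(const uint32_t *base, size_t n)
{
    const unsigned char *bytes = (const unsigned char *)base;
    uint32_t sum = 0;
    for (size_t iv = 0; iv != n * sizeof(uint32_t); iv += sizeof(uint32_t))
        sum += *(const uint32_t *)(bytes + iv);
    return sum;
}
```

Both functions read the same elements; they differ only in which quantity
the loop counter tracks.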