[Bug rtl-optimization/50176] [4.7 Regression] 4.7 generates spill-fill dealing with char->int conversion

Tue Jan 10 12:12:00 GMT 2012

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50176

--- Comment #10 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-01-10 12:11:31 UTC ---
We are expanding from

  # BLOCK 5 freq:9100
  # PRED: 5 [91.0%]  (dfs_back,true,exec) 3 [91.0%]  (true,exec)
  # outptr_89 = PHI <outptr_77(5), outptr_26(3)>
  # col_90 = PHI <col_78(5), 0(3)>
  D.1396_32 = MEM[base: inptr0_14, index: col_90, offset: 0B];
  y_33 = (int) D.1396_32;
  D.1398_35 = MEM[base: inptr1_19, index: col_90, offset: 0B];  <----
  D.1400_38 = MEM[base: inptr2_24, index: col_90, offset: 0B];
  cr.0_40 = (unsigned int) D.1400_38;
  D.1402_41 = cr.0_40 * 4;
  D.1403_43 = Crrtab_42(D) + D.1402_41;
  D.1404_44 = *D.1403_43;
  D.1405_45 = D.1404_44 + y_33;
  D.1406_46 = (sizetype) D.1405_45;
  D.1407_48 = range_limit_47(D) + D.1406_46;
  D.1408_49 = *D.1407_48;
  MEM[base: outptr_89, offset: 0B] = D.1408_49;
  cb.1_51 = (unsigned int) D.1398_35;                  <----------
  D.1411_52 = cb.1_51 * 4;
  D.1412_54 = Cbgtab_53(D) + D.1411_52;
  D.1413_55 = *D.1412_54;
  D.1414_59 = Crgtab_58(D) + D.1402_41;
  D.1415_60 = *D.1414_59;
  D.1416_61 = D.1413_55 + D.1415_60;
  D.1417_62 = D.1416_61 >> 16;
  D.1418_63 = D.1417_62 + y_33;
  D.1419_64 = (sizetype) D.1418_63;
  D.1420_65 = range_limit_47(D) + D.1419_64;
  D.1421_66 = *D.1420_65;
  MEM[base: outptr_89, offset: 1B] = D.1421_66;
  D.1423_71 = Cbbtab_70(D) + D.1411_52;
  D.1424_72 = *D.1423_71;
  D.1425_73 = D.1424_72 + y_33;
  D.1426_74 = (sizetype) D.1425_73;
  D.1427_75 = range_limit_47(D) + D.1426_74;
  D.1428_76 = *D.1427_75;
  MEM[base: outptr_89, offset: 2B] = D.1428_76;
  outptr_77 = outptr_89 + 3;
  col_78 = col_90 + 1;
  if (col_78 != num_cols.2_88)
    goto <bb 5>;

where you can see that we could shorten the lifetime of a QImode register
in favor of a SImode register by moving the extension right after the load.
This is what both -fschedule-insns and -fschedule-insns -fsched-pressure
achieve (both of which produce good, non-regressed code).
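For illustration only, the two orderings can be sketched at the C level. This is a hypothetical reconstruction from the GIMPLE dump above, not the actual testcase: the function names, parameter types and the use of `int` tables are assumptions, and a compiler is of course free to reorder these statements itself. The first function keeps the narrow byte live across the first store, as in the dump; the second widens it right after the load.

```c
/* Hypothetical reconstruction of the loop in the GIMPLE dump.
   All names and types here are assumptions made for illustration. */

/* Order as in the dump: the byte loaded from inptr1 stays narrow
   (QImode) until after the first store to outptr. */
void convert_late_extend(const unsigned char *inptr0,
                         const unsigned char *inptr1,
                         const unsigned char *inptr2,
                         unsigned char *outptr,
                         const int *Crrtab, const int *Cbbtab,
                         const int *Crgtab, const int *Cbgtab,
                         const unsigned char *range_limit,
                         int num_cols)
{
    for (int col = 0; col < num_cols; col++) {
        int y = inptr0[col];
        unsigned char cb_byte = inptr1[col];   /* narrow load */
        unsigned int cr = inptr2[col];
        outptr[0] = range_limit[y + Crrtab[cr]];
        unsigned int cb = cb_byte;             /* extension after the store */
        outptr[1] = range_limit[y + ((Cbgtab[cb] + Crgtab[cr]) >> 16)];
        outptr[2] = range_limit[y + Cbbtab[cb]];
        outptr += 3;
    }
}

/* Same computation with the extension placed directly after the load,
   so only the widened (SImode) value is live across the store. */
void convert_early_extend(const unsigned char *inptr0,
                          const unsigned char *inptr1,
                          const unsigned char *inptr2,
                          unsigned char *outptr,
                          const int *Crrtab, const int *Cbbtab,
                          const int *Crgtab, const int *Cbgtab,
                          const unsigned char *range_limit,
                          int num_cols)
{
    for (int col = 0; col < num_cols; col++) {
        int y = inptr0[col];
        unsigned int cb = inptr1[col];         /* load and extend together */
        unsigned int cr = inptr2[col];
        outptr[0] = range_limit[y + Crrtab[cr]];
        outptr[1] = range_limit[y + ((Cbgtab[cb] + Crgtab[cr]) >> 16)];
        outptr[2] = range_limit[y + Cbbtab[cb]];
        outptr += 3;
    }
}
```

Both functions compute the same result; the difference is only which value, the narrow byte or its widened form, has to be kept in a register across the store.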

On the tree level there isn't really an issue apart from the fact that
after expansion combine sees

;; D.1398_35 = MEM[base: inptr1_19, index: col_90, offset: 0B];

(insn 47 46 0 (set (reg:QI 83 [ D.1398 ])
        (mem:QI (plus:SI (reg/v/f:SI 75 [ inptr1 ])
                (reg/v:SI 117 [ col ])) [0 MEM[base: inptr1_19, index: col_90, offset: 0B]+0 S1 A8])) t.c:42 -1
     (nil))

...

;; MEM[base: outptr_89, offset: 0B] = D.1408_49;

....

(insn 54 53 0 (set (mem:QI (reg/v/f:SI 116 [ outptr ]) [0 MEM[base: outptr_89, offset: 0B]+0 S1 A8])
        (reg:QI 150)) t.c:45 -1
     (nil))

;; D.1411_52 = cb.1_51 * 4;

(insn 55 54 56 (parallel [
            (set (reg:SI 151)
                (zero_extend:SI (reg:QI 83 [ D.1398 ])))
            (clobber (reg:CC 17 flags))
        ]) t.c:47 -1
     (nil))

thus there is a store between the load and the zero_extend (and combine
only combines forward, not backward):

  /* Verify that I2 and I1 are valid for combining.  */
  if (! can_combine_p (i2, i3, i0, i1, NULL_RTX, NULL_RTX, &i2dest, &i2src)

already fails.

This is a missed optimization on the RTL level.