Differences between revisions 14 and 15
Revision 14 as of 2017-11-27 00:44:49
Size: 5752
Editor: kugan
Comment:
Revision 15 as of 2017-11-29 01:52:49
Size: 8639
Editor: kugan
Comment:
Deletions are marked like this. Additions are marked like this.
Line 51: Line 51:
   {{{
insert example for type promotion
   }}}
Before type promotion:
{{{

;; Function crc (crc, funcdef_no=0, decl_uid=3129, cgraph_uid=0, symbol_order=0)

crc (short unsigned int crc, unsigned char data)
{
  unsigned char carry;
  unsigned char x16;
  unsigned char i;
  unsigned char _1;
  unsigned char _2;
  short unsigned int _7;
  unsigned char ivtmp_24;
  unsigned char ivtmp_25;

  <bb 2> [11.11%] [count: INV]:

  <bb 3> [88.89%] [count: INV]:
  # crc_26 = PHI <crc_9(D)(2), crc_4(5)>
  # data_27 = PHI <data_10(D)(2), data_13(5)>
  # ivtmp_25 = PHI <8(2), ivtmp_24(5)>
  _1 = (unsigned char) crc_26;
  _2 = _1 ^ data_27;
  x16_12 = _2 & 1;
  data_13 = data_27 >> 1;
  _7 = crc_26 >> 1;
  if (x16_12 != 0)
    goto <bb 4>; [50.00%] [count: INV]
  else
    goto <bb 5>; [50.00%] [count: INV]

  <bb 4> [44.45%] [count: INV]:
  crc_20 = _7 ^ 8193;
  crc_17 = crc_20 | 32768;

  <bb 5> [88.89%] [count: INV]:
  # crc_4 = PHI <crc_17(4), _7(3)>
  ivtmp_24 = ivtmp_25 - 1;
  if (ivtmp_24 != 0)
    goto <bb 3>; [87.50%] [count: INV]
  else
    goto <bb 6>; [12.50%] [count: INV]

  <bb 6> [11.11%] [count: INV]:
  # crc_14 = PHI <crc_4(5)>
  return crc_14;

}

}}}

After type promotion:
{{{

;; Function crc (crc, funcdef_no=0, decl_uid=3129, cgraph_uid=0, symbol_order=0)

crc (short unsigned int crc, unsigned char data)
{
  unsigned char carry;
  unsigned char x16;
  unsigned char i;
  unsigned int _1;
  unsigned int _2;
  unsigned int _3;
  unsigned int _4;
  unsigned int _5;
  unsigned int _6;
  unsigned int _7;
  short unsigned int _8;
  unsigned int _9;
  unsigned int _10;
  unsigned int _12;
  unsigned int _13;
  unsigned int _14;
  unsigned int _15;
  unsigned int _16;
  unsigned int _17;
  unsigned int _20;
  unsigned int _21;
  unsigned int _23;
  unsigned int _24;
  unsigned int _25;
  unsigned int _26;
  unsigned int _27;

  <bb 2> [11.11%] [count: INV]:
  _10 = (unsigned int) data_18(D);
  _9 = (unsigned int) crc_28(D);

  <bb 3> [88.89%] [count: INV]:
  # _26 = PHI <_9(2), _4(5)>
  # _27 = PHI <_10(2), _13(5)>
  # _25 = PHI <8(2), _24(5)>
  _1 = (unsigned int) _26;
  _5 = _27 & 255;
  _21 = _1 & 255;
  _2 = _21 ^ _5;
  _12 = _2 & 1;
  _6 = _27 & 255;
  _13 = _6 >> 1;
  _23 = _26 & 65535;
  _7 = _23 >> 1;
  _3 = _12 & 255;
  if (_3 != 0)
    goto <bb 4>; [50.00%] [count: INV]
  else
    goto <bb 5>; [50.00%] [count: INV]

  <bb 4> [44.45%] [count: INV]:
  _15 = _7 & 65535;
  _20 = _15 ^ 8193;
  _17 = _20 | 4294934528;

  <bb 5> [88.89%] [count: INV]:
  # _4 = PHI <_17(4), _7(3)>
  _24 = _25 - 1;
  _16 = _24 & 255;
  if (_16 != 0)
    goto <bb 3>; [87.50%] [count: INV]
  else
    goto <bb 6>; [12.50%] [count: INV]

  <bb 6> [11.11%] [count: INV]:
  # _14 = PHI <_4(5)>
  _8 = (short unsigned int) _14;
  return _8;

}



}}}
   
Line 64: Line 195:
== Proposed Improvements == == Current Issues ==
Line 70: Line 201:
 * Signed to/from unsigned conversions results in additional CONVERT_EXPRs apart from AND_EXPR/SEXT_EXPR because of the GIMPLE type systems. Some of this CONVERT_EXPRs are not optimised away. This could be improved  * Signed to/from unsigned conversions results in additional CONVERT_EXPRs apart from AND_EXPR/SEXT_EXPR because of the GIMPLE type systems. Some of this CONVERT_EXPRs are not optimised away. This could be improved.
Line 74: Line 205:
 * In some cases, subsequent passes like forwprop are undoing the type promotion.

Redundant Zero/Sign Extensions

GIMPLE optimisations are performed without any consideration for the PROMOTE_MODE. Later when cfgexpand generates RTL, it is lowered to promoted mode, as per PROMOTE_MODE definition, statement by statement. In doing so, we generate ZERO_EXTEND/SIGN_EXTEND as required and it is the responsibility of the RTL infrastructure to optimise any redundant extensions.

Currently we rely on md patterrns with extensions so the RTL infrastructure can combine them and eliminate. This is mainly done in simplify_rtx/combine RTL pass. This mostly works within the basic blocks. In RISC processors, the problem is excessive as there are few instructions that can perform extension implicitly as part of the operations.

REE pass helps here to limited extend in the sense they can combine extensions with definitions across basic blocs. However, since REE is performed after register allocations this does not help with the register pressure as the extension instruction needing additional registers. And also, REE seems to be mainly tuned for patterns that are common in CISC like architecture.

Past Attempts to Fix Redundant Zero/Sign Extensions

Extension Elimination Pass in RTL

Tom de Vries proposed pass_ee which analyse what parts of registers are used and concludes based on that analysis that an extension is redundant before register allocation. It can replace a extension with a regcopy without changing the semantics of the program if the analysis finds that the extended portion of the register is not used. A version if this patch is posted at https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00408.html but this never made it upstream.

One of the issue we face in implementing this in RTL is that the sign information is lost when transitioning to RTL. This makes the analysis difficult.

Using VRP range info while generating RTL

Value range is used to annotate RTL SUBREG with SRP_SIGNED_AND_UNSIGNED when RTL SUBREG is generated. RTL infrastructure will then be able optimized away when this is true.

For Example, when we have a conversion and when we know that value in _5 is already converted, we can annotate that.

_6 = (short int) _5;   Value range of _6: [3, 10]

(insn 13 12 0 (set (reg:SI 73 [ D.2640 ])
        (sign_extend:SI (subreg:HI (reg:SI 81) 0))) t.c:6 -1
        (nil))

Value range of _6 means reg:SI 81 is already sign and zero extended (SRP_SIGNED_AND_UNSIGNED) and this will be set.

However, value ranges calculated by VRP are for the type of SSA_NAME and if the operation can WRAP, higher bits in PROMOTE_MODE can be unpredictable as it stands now. Additional WRAP attribute is needed if this is to be useful.

Example (on alpha where promoted register width is 64 bit and _344 is 32 bit unsigned variable) (more information in https://gcc.gnu.org/ml/gcc-patches/2014-08/msg02458.html):

_343 = ivtmp.179_52 + 2147483645;        [0x80000004, 0x800000043]
_344 = _343 * 2;                         [0x8, 0x86]
_345 = (integer(kind=4)) _344;           [0x8, 0x86]

The patch which is reverted can be found in https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00672.html

Type promotion

The motivation for this pass is to lower GIMPLE to promoted mode and let the GIMPLE optimisations optimise it. The expectation is that this intern will minimise generation of SUBREGs and avoid redundant zero/sign extensions.

Working of the pass

Type promotion pass promotes SSA_NAME at the place it is defined in-place. As a result, all the uses are automatically promoted (in promote_ssa). However, to preserve the semantics, some of the uses may need to be extended (with AND_EXPR for zero-extend and SEXT_EXPR for sign extend) with extend instructions. In some cases, it has to be in the original type (not the promoted type) and CONVERT_EXPR are inserted.

Before type promotion:

;; Function crc (crc, funcdef_no=0, decl_uid=3129, cgraph_uid=0, symbol_order=0)

crc (short unsigned int crc, unsigned char data)
{
  unsigned char carry;
  unsigned char x16;
  unsigned char i;
  unsigned char _1;
  unsigned char _2;
  short unsigned int _7;
  unsigned char ivtmp_24;
  unsigned char ivtmp_25;

  <bb 2> [11.11%] [count: INV]:

  <bb 3> [88.89%] [count: INV]:
  # crc_26 = PHI <crc_9(D)(2), crc_4(5)>
  # data_27 = PHI <data_10(D)(2), data_13(5)>
  # ivtmp_25 = PHI <8(2), ivtmp_24(5)>
  _1 = (unsigned char) crc_26;
  _2 = _1 ^ data_27;
  x16_12 = _2 & 1;
  data_13 = data_27 >> 1;
  _7 = crc_26 >> 1;
  if (x16_12 != 0)
    goto <bb 4>; [50.00%] [count: INV]
  else
    goto <bb 5>; [50.00%] [count: INV]

  <bb 4> [44.45%] [count: INV]:
  crc_20 = _7 ^ 8193;
  crc_17 = crc_20 | 32768;

  <bb 5> [88.89%] [count: INV]:
  # crc_4 = PHI <crc_17(4), _7(3)>
  ivtmp_24 = ivtmp_25 - 1;
  if (ivtmp_24 != 0)
    goto <bb 3>; [87.50%] [count: INV]
  else
    goto <bb 6>; [12.50%] [count: INV]

  <bb 6> [11.11%] [count: INV]:
  # crc_14 = PHI <crc_4(5)>
  return crc_14;

}

After type promotion:

;; Function crc (crc, funcdef_no=0, decl_uid=3129, cgraph_uid=0, symbol_order=0)

crc (short unsigned int crc, unsigned char data)
{
  unsigned char carry;
  unsigned char x16;
  unsigned char i;
  unsigned int _1;
  unsigned int _2;
  unsigned int _3;
  unsigned int _4;
  unsigned int _5;
  unsigned int _6;
  unsigned int _7;
  short unsigned int _8;
  unsigned int _9;
  unsigned int _10;
  unsigned int _12;
  unsigned int _13;
  unsigned int _14;
  unsigned int _15;
  unsigned int _16;
  unsigned int _17;
  unsigned int _20;
  unsigned int _21;
  unsigned int _23;
  unsigned int _24;
  unsigned int _25;
  unsigned int _26;
  unsigned int _27;

  <bb 2> [11.11%] [count: INV]:
  _10 = (unsigned int) data_18(D);
  _9 = (unsigned int) crc_28(D);

  <bb 3> [88.89%] [count: INV]:
  # _26 = PHI <_9(2), _4(5)>
  # _27 = PHI <_10(2), _13(5)>
  # _25 = PHI <8(2), _24(5)>
  _1 = (unsigned int) _26;
  _5 = _27 & 255;
  _21 = _1 & 255;
  _2 = _21 ^ _5;
  _12 = _2 & 1;
  _6 = _27 & 255;
  _13 = _6 >> 1;
  _23 = _26 & 65535;
  _7 = _23 >> 1;
  _3 = _12 & 255;
  if (_3 != 0)
    goto <bb 4>; [50.00%] [count: INV]
  else
    goto <bb 5>; [50.00%] [count: INV]

  <bb 4> [44.45%] [count: INV]:
  _15 = _7 & 65535;
  _20 = _15 ^ 8193;
  _17 = _20 | 4294934528;

  <bb 5> [88.89%] [count: INV]:
  # _4 = PHI <_17(4), _7(3)>
  _24 = _25 - 1;
  _16 = _24 & 255;
  if (_16 != 0)
    goto <bb 3>; [87.50%] [count: INV]
  else
    goto <bb 6>; [12.50%] [count: INV]

  <bb 6> [11.11%] [count: INV]:
  # _14 = PHI <_4(5)>
  _8 = (short unsigned int) _14;
  return _8;

}

It should be again highlighted that type promotion as it stands now is a lowering pass and expects subsequent passes to optimise any redundancies. As part of this patch, VRP pass is extended to handle SEXT_EXPR. It is expected that other optimisation passes might need enhancing to support type promotion. This has to be done based on test cases extctated from benchmarking.

Current status

The patch (https://gcc.gnu.org/ml/gcc/2017-09/msg00024.html) is based on r249469. Passes bootstrap+test on x86_64-unknown-linux-gnu, pplc64le-linux, arm-linux-gnueabihf and aarch64-linx-gnu. As expected there are testecase regressions especially for cases where it scans for some specific instructions/patterns. No functional regressions are observed. Benchmarking on SPRC2006 and SPEC2k17 for aarch64-linux-gnu shows that the performance was the same (difference is in the noice region). However, we observed some code-size improvements for many benchmarks.

There are some enhancements possible within the type promotion and in the subsequent optimisation that could improve performance.

Current Issues

  • Parameters/return types are always converted to/from promoted type. However, ABI in many cases guarantees or do not care about the undefined bits. Therefore, type promotion pass could use the ABI definition. This needs some form of support from GIMPLE to keep the type systems happy.
    • insert example
  • Signed to/from unsigned conversions results in additional CONVERT_EXPRs apart from AND_EXPR/SEXT_EXPR because of the GIMPLE type systems. Some of this CONVERT_EXPRs are not optimised away. This could be improved.
    • insert example
  • In some cases, subsequent passes like forwprop are undoing the type promotion.

Testcases that are regressing and needs fixing

Plan forward

  • Item 1
  • Item N

None: TypePromotion (last edited 2017-11-29 01:52:49 by kugan)