This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: VRP causing extra register usage


On November 12, 2015 8:10:05 PM GMT+01:00, Senthil Kumar Selvaraj <senthil_kumar.selvaraj@atmel.com> wrote:
>Hi,
>
>  When analyzing code size differences between a 4.9.x compiler and
>  trunk for the AVR target, I found quite a few cases where extra
>  registers were being used to hold smaller types (two 8 bit registers
>  for a uint8_t, for example).
>
>On deeper analysis, I found that the VRP pass (gcc/tree-vrp.c) was the
>point
>  at which the dumps start to diverge.
>
>  For code like this
>
>#include <stdint.h>
>
>uint16_t get(const uint16_t wvalue)
>{
>  const uint8_t type = (wvalue >> 8);
>  uint16_t size = 0;
>
>  if (type == 1)
>  {
>    size = 20;
>  }
>  else if (type == 2)
>  {
>    size = 32;
>  }
>  return size;
>}
>
>  VRP substitutes wvalue for the type variable in the conditionals 
>  (simplify_cond_using_ranges:9506), as it figures out that the range 
>  of wvalue is the same as a uint8_t). That is, code goes from
>
><bb 2>:
>_3 = wvalue_2(D) >> 8;
>type_4 = (const uint8_t) _3;
>if (type_4 == 1)
>  goto <bb 5>;
>else
>  goto <bb 3>;
>
>to
>
><bb 2>:
>_3 = wvalue_2(D) >> 8;
>type_4 = (const uint8_t) _3;
>if (_3 == 1)
>  goto <bb 5>;
>else
>  goto <bb 3>;
>
>  This "widening" causes later passes to allocate extra registers 
>  (holding zeros for the extra bits) and generate extra comparisons
>  for the AVR target.
>
>  I found that in the 4.9.2 compiler I was benchmarking against, VRP
>  didn't figure out the range for wvalue and therefore the folding
>  didn't happen, which was why the code was better.
>
>  With the native compiler on my machine (gcc 5.2 x86_64) - replacing 
>  uint8_t by uint32_t and uint16_t by uint64_t, and shifting right by 
>  32 bits instead of 8 shows the same effect - the generated code uses
>  rdi instead of just di to hold the type variable.
>
>  Is this a bug? Should the folding happen only if the type
>  conversion was from a smaller type to a bigger one? Or is the backend
>  supposed to detect this pattern and deal with it?

We should probably avoid widening beyond word_mode (or sth else if there is sth suitable).

Richard.

>Regards
>Senthil
>
>
>details-raw vrp1 dump
>
>;; Function get (get, funcdef_no=0, decl_uid=1522, cgraph_uid=0,
>symbol_order=0)
>
>;; 1 loops found
>;;
>;; Loop 0
>;;  header 0, latch 1
>;;  depth 0, outer -1
>;;  nodes: 0 1 2 3 4 5
>;; 2 succs { 5 3 }
>;; 3 succs { 4 5 }
>;; 4 succs { 5 }
>;; 5 succs { 1 }
>
>ASSERT_EXPRs to be inserted
>
>Assertions to be inserted for type_4
>	if (type_4 == 1)
>
>	BB #3
>	EDGE 2->3 2 [55.9%]  (FALSE_VALUE,EXECUTABLE)
>	PREDICATE: type_4 ne_expr 1
>
>
>
>
>Updating SSA:
>Registering new PHI nodes in block #2
>Updating SSA information for statement type_4 = (const uint8_t) _3;
>Updating SSA information for statement if (type_4 == 1)
>Registering new PHI nodes in block #3
>Updating SSA information for statement type_6 = ASSERT_EXPR <type_4,
>type_4 != 1>;
>Updating SSA information for statement if (type_4 == 2)
>Registering new PHI nodes in block #4
>Registering new PHI nodes in block #5
>
>SSA replacement table
>N_i -> { O_1 ... O_j } means that N_i replaces O_1, ..., O_j
>
>type_6 -> { type_4 }
>Incremental SSA update started at block: 2
>Number of blocks in CFG: 6
>Number of blocks to update: 2 ( 33%)
>Affected blocks: 2 3
>
>
>
>SSA form after inserting ASSERT_EXPRs
>get (const uint16_t wvalue, const uint8_t windex, const void * * const
>address)
>{
>  uint16_t size;
>  const uint8_t type;
>  unsigned int _3;
>
>  <bb 2>:
>  _3 = wvalue_2(D) >> 8;
>  type_4 = (const uint8_t) _3;
>  if (type_4 == 1)
>    goto <bb 5>;
>  else
>    goto <bb 3>;
>
>  <bb 3>:
>  type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
>  if (type_6 == 2)
>    goto <bb 4>;
>  else
>    goto <bb 5>;
>
>  <bb 4>:
>
>  <bb 5>:
>  # size_1 = PHI <20(2), 0(3), 32(4)>
>  return size_1;
>
>}
>
>
>Immediate_uses: 
>
>size_1 : --> single use.
>return size_1;
>
>wvalue_2(D) : --> single use.
>_3 = wvalue_2(D) >> 8;
>
>_3 : --> single use.
>type_4 = (const uint8_t) _3;
>
>type_4 : -->3 uses.
>type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
>type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
>if (type_4 == 1)
>
>.MEM_5(D) : --> single use.
># VUSE <.MEM_5(D)>
>return size_1;
>
>type_6 : --> single use.
>if (type_6 == 2)
>
>Adding destination of edge (0 -> 2) to worklist
>
>Simulating block 2
>
>Visiting statement:
>_3 = wvalue_2(D) >> 8;
>Intersecting
>  [0, 255]
>and
>  [0, +INF]
>to
>  [0, 255]
>Found new range for _3: [0, 255]
>interesting_ssa_edges: adding SSA use in type_4 = (const uint8_t) _3;
>marking stmt to be not simulated again
>
>Visiting statement:
>type_4 = (const uint8_t) _3;
>Found new range for type_4: [0, +INF]
>interesting_ssa_edges: adding SSA use in type_6 = ASSERT_EXPR <type_4,
>type_4 != 1>;
>interesting_ssa_edges: adding SSA use in if (type_4 == 1)
>marking stmt to be not simulated again
>
>Visiting statement:
>if (type_4 == 1)
>
>Visiting conditional with predicate: if (type_4 == 1)
>
>With known ranges
>	type_4: [0, +INF]
>
>Predicate evaluates to: DON'T KNOW
>Adding destination of edge (2 -> 5) to worklist
>Adding destination of edge (2 -> 3) to worklist
>
>Simulating block 3
>
>Visiting statement:
>type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
>Intersecting
>  ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
>and
>  [0, +INF]
>to
>  ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
>Found new range for type_6: ~[1, 1]
>interesting_ssa_edges: adding SSA use in if (type_6 == 2)
>marking stmt to be not simulated again
>
>Visiting statement:
>if (type_6 == 2)
>
>Visiting conditional with predicate: if (type_6 == 2)
>
>With known ranges
>	type_6: ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
>
>Predicate evaluates to: DON'T KNOW
>Adding destination of edge (3 -> 4) to worklist
>
>Simulating block 4
>
>Simulating block 5
>
>Visiting PHI node: size_1 = PHI <20(2), 0(3), 32(4)>
>    Argument #0 (2 -> 5 executable)
>	20: [20, 20]
>    Argument #1 (3 -> 5 executable)
>	0: [0, 0]
>Meeting
>  [20, 20]
>and
>  [0, 0]
>to
>  [0, 20]
>    Argument #2 (4 -> 5 executable)
>	32: [32, 32]
>Meeting
>  [0, 20]
>and
>  [32, 32]
>to
>  [0, 32]
>Intersecting
>  [0, 32]
>and
>  [0, +INF]
>to
>  [0, 32]
>Found new range for size_1: [0, 32]
>interesting_ssa_edges: adding SSA use in return size_1;
>marking stmt to be not simulated again
>
>Visiting statement:
>return size_1;
>
>Value ranges after VRP:
>
>size_1: [0, 32]
>wvalue_2(D): VARYING
>_3: [0, 255]
>type_4: [0, +INF]
>type_6: ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
>
>
>
>Substituting values and folding statements
>
>Folding statement: _3 = wvalue_2(D) >> 8;
>Not folded
>Folding statement: type_4 = (const uint8_t) _3;
>Not folded
>Folding statement: if (type_4 == 1)
>Folded into: if (_3 == 1)
>
>Folding statement: if (type_6 == 2)
>Not folded
>Folding PHI node: size_1 = PHI <20(2), 0(3), 32(4)>
>No folding possible
>Folding statement: return size_1;
>Not folded
>get (const uint16_t wvalue, const uint8_t windex, const void * * const
>address)
>{
>  uint16_t size;
>  const uint8_t type;
>  unsigned int _3;
>
>  <bb 2>:
>  _3 = wvalue_2(D) >> 8;
>  type_4 = (const uint8_t) _3;
>  if (_3 == 1)
>    goto <bb 5>;
>  else
>    goto <bb 3>;
>
>  <bb 3>:
>  if (type_4 == 2)
>    goto <bb 4>;
>  else
>    goto <bb 5>;
>
>  <bb 4>:
>
>  <bb 5>:
>  # size_1 = PHI <20(2), 0(3), 32(4)>
>  return size_1;
>
>}



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]