This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: VRP causing extra register usage


On November 13, 2015 2:35:41 PM GMT+01:00, Senthil Kumar Selvaraj <senthil_kumar.selvaraj@atmel.com> wrote:
>
>On Thu, Nov 12, 2015 at 08:37:02PM +0100, Richard Biener wrote:
>> On November 12, 2015 8:10:05 PM GMT+01:00, Senthil Kumar Selvaraj
><senthil_kumar.selvaraj@atmel.com> wrote:
>> >Hi,
>> >
>> >  When analyzing code size differences between a 4.9.x compiler and
>> >  trunk for the AVR target, I found quite a few cases where extra
>> >  registers were being used to hold smaller types (two 8 bit
>registers
>> >  for a uint8_t, for example).
>> >
>> >On deeper analysis, I found that the VRP pass (gcc/tree-vrp.c) was
>the
>> >point
>> >  at which the dumps start to diverge.
>> >
>> >  For code like this
>> >
>> >#include <stdint.h>
>> >
>> >uint16_t get(const uint16_t wvalue)
>> >{
>> >  const uint8_t type = (wvalue >> 8);
>> >  uint16_t size = 0;
>> >
>> >  if (type == 1)
>> >  {
>> >    size = 20;
>> >  }
>> >  else if (type == 2)
>> >  {
>> >    size = 32;
>> >  }
>> >  return size;
>> >}
>> >
>> >  VRP substitutes wvalue for the type variable in the conditionals 
>> >  (simplify_cond_using_ranges:9506), as it figures out that the
>range 
>> >  of wvalue is the same as a uint8_t). That is, code goes from
>> >
>> ><bb 2>:
>> >_3 = wvalue_2(D) >> 8;
>> >type_4 = (const uint8_t) _3;
>> >if (type_4 == 1)
>> >  goto <bb 5>;
>> >else
>> >  goto <bb 3>;
>> >
>> >to
>> >
>> ><bb 2>:
>> >_3 = wvalue_2(D) >> 8;
>> >type_4 = (const uint8_t) _3;
>> >if (_3 == 1)
>> >  goto <bb 5>;
>> >else
>> >  goto <bb 3>;
>> >
>> >  This "widening" causes later passes to allocate extra registers 
>> >  (holding zeros for the extra bits) and generate extra comparisons
>> >  for the AVR target.
>> >
>> >  I found that in the 4.9.2 compiler I was benchmarking against, VRP
>> >  didn't figure out the range for wvalue and therefore the folding
>> >  didn't happen, which was why the code was better.
>> >
>> >  With the native compiler on my machine (gcc 5.2 x86_64) -
>replacing 
>> >  uint8_t by uint32_t and uint16_t by uint64_t, and shifting right
>by 
>> >  32 bits instead of 8 shows the same effect - the generated code
>uses
>> >  rdi instead of just di to hold the type variable.
>> >
>> >  Is this a bug? Should the folding happen only if the type
>> >  conversion was from a smaller type to a bigger one? Or is the
>backend
>> >  supposed to detect this pattern and deal with it?
>> 
>> We should probably avoid widening beyond word_mode (or sth else if
>there is sth suitable).
>> 
>
>Hmm, that does fix this problem. The below patch allows folding only if
>the operand size is smaller or equal to a word.
>
>diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>index e2393e4..3f5668c 100644
>--- a/gcc/tree-vrp.c
>+++ b/gcc/tree-vrp.c
>@@ -9467,6 +9467,8 @@ simplify_cond_using_ranges (gcond *stmt)
>       innerop = gimple_assign_rhs1 (def_stmt);
> 
>       if (TREE_CODE (innerop) == SSA_NAME
>+    && (GET_MODE_SIZE(TYPE_MODE(TREE_TYPE(innerop)))
>+      <= GET_MODE_SIZE(word_mode))
>          && !POINTER_TYPE_P (TREE_TYPE (innerop)))
>        {
>          value_range *vr = get_value_range (innerop);
>
>However, if the type conversion was from a smaller type to a larger
>type, and the
>smaller type was bigger than a word, this patch results in worse code.
>For example.
>
>#include <stdint.h>
>
>uint16_t get(const uint16_t wvalue)
>{
>  const uint32_t type = (wvalue >> 8);
>  uint16_t size = 0;
>
>  if (type == 1)
>  {
>    size = 20;
>  }
>  else if (type == 2)
>  {
>    size = 32;
>  }
>  return size;
>}
>
>Before the patch, the folding done caused type to be substituted with
>wvalue, and the
>smaller size resulted in better code. After the patch, since wvalue is
>bigger than a 
>word (for AVR), the folding doesn't happen and registers are allocated
>to hold all 32
>bits.
>
>How does the below patch look? It does the folding only if it is
>beneficial i.e., the
>value being substituted (innerop) is smaller than or equal the current
>one (op0)?

I think widenings up to word-mode are OK.

>diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>index e2393e4..3f5668c 100644
>--- a/gcc/tree-vrp.c
>+++ b/gcc/tree-vrp.c
>@@ -9467,6 +9467,8 @@ simplify_cond_using_ranges (gcond *stmt)
>       innerop = gimple_assign_rhs1 (def_stmt);
> 
>       if (TREE_CODE (innerop) == SSA_NAME
>+    && (GET_MODE_SIZE(TYPE_MODE(TREE_TYPE(innerop)))
>+      <= GET_MODE_SIZE(TYPE_MODE(TREE_TYPE(op0))))
>          && !POINTER_TYPE_P (TREE_TYPE (innerop)))
>        {
>          value_range *vr = get_value_range (innerop);
>
>
>Regards
>Senthil
>
>> Richard.
>> 
>> >Regards
>> >Senthil
>> >
>> >
>> >details-raw vrp1 dump
>> >
>> >;; Function get (get, funcdef_no=0, decl_uid=1522, cgraph_uid=0,
>> >symbol_order=0)
>> >
>> >;; 1 loops found
>> >;;
>> >;; Loop 0
>> >;;  header 0, latch 1
>> >;;  depth 0, outer -1
>> >;;  nodes: 0 1 2 3 4 5
>> >;; 2 succs { 5 3 }
>> >;; 3 succs { 4 5 }
>> >;; 4 succs { 5 }
>> >;; 5 succs { 1 }
>> >
>> >ASSERT_EXPRs to be inserted
>> >
>> >Assertions to be inserted for type_4
>> >	if (type_4 == 1)
>> >
>> >	BB #3
>> >	EDGE 2->3 2 [55.9%]  (FALSE_VALUE,EXECUTABLE)
>> >	PREDICATE: type_4 ne_expr 1
>> >
>> >
>> >
>> >
>> >Updating SSA:
>> >Registering new PHI nodes in block #2
>> >Updating SSA information for statement type_4 = (const uint8_t) _3;
>> >Updating SSA information for statement if (type_4 == 1)
>> >Registering new PHI nodes in block #3
>> >Updating SSA information for statement type_6 = ASSERT_EXPR <type_4,
>> >type_4 != 1>;
>> >Updating SSA information for statement if (type_4 == 2)
>> >Registering new PHI nodes in block #4
>> >Registering new PHI nodes in block #5
>> >
>> >SSA replacement table
>> >N_i -> { O_1 ... O_j } means that N_i replaces O_1, ..., O_j
>> >
>> >type_6 -> { type_4 }
>> >Incremental SSA update started at block: 2
>> >Number of blocks in CFG: 6
>> >Number of blocks to update: 2 ( 33%)
>> >Affected blocks: 2 3
>> >
>> >
>> >
>> >SSA form after inserting ASSERT_EXPRs
>> >get (const uint16_t wvalue, const uint8_t windex, const void * *
>const
>> >address)
>> >{
>> >  uint16_t size;
>> >  const uint8_t type;
>> >  unsigned int _3;
>> >
>> >  <bb 2>:
>> >  _3 = wvalue_2(D) >> 8;
>> >  type_4 = (const uint8_t) _3;
>> >  if (type_4 == 1)
>> >    goto <bb 5>;
>> >  else
>> >    goto <bb 3>;
>> >
>> >  <bb 3>:
>> >  type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
>> >  if (type_6 == 2)
>> >    goto <bb 4>;
>> >  else
>> >    goto <bb 5>;
>> >
>> >  <bb 4>:
>> >
>> >  <bb 5>:
>> >  # size_1 = PHI <20(2), 0(3), 32(4)>
>> >  return size_1;
>> >
>> >}
>> >
>> >
>> >Immediate_uses: 
>> >
>> >size_1 : --> single use.
>> >return size_1;
>> >
>> >wvalue_2(D) : --> single use.
>> >_3 = wvalue_2(D) >> 8;
>> >
>> >_3 : --> single use.
>> >type_4 = (const uint8_t) _3;
>> >
>> >type_4 : -->3 uses.
>> >type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
>> >type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
>> >if (type_4 == 1)
>> >
>> >.MEM_5(D) : --> single use.
>> ># VUSE <.MEM_5(D)>
>> >return size_1;
>> >
>> >type_6 : --> single use.
>> >if (type_6 == 2)
>> >
>> >Adding destination of edge (0 -> 2) to worklist
>> >
>> >Simulating block 2
>> >
>> >Visiting statement:
>> >_3 = wvalue_2(D) >> 8;
>> >Intersecting
>> >  [0, 255]
>> >and
>> >  [0, +INF]
>> >to
>> >  [0, 255]
>> >Found new range for _3: [0, 255]
>> >interesting_ssa_edges: adding SSA use in type_4 = (const uint8_t)
>_3;
>> >marking stmt to be not simulated again
>> >
>> >Visiting statement:
>> >type_4 = (const uint8_t) _3;
>> >Found new range for type_4: [0, +INF]
>> >interesting_ssa_edges: adding SSA use in type_6 = ASSERT_EXPR
><type_4,
>> >type_4 != 1>;
>> >interesting_ssa_edges: adding SSA use in if (type_4 == 1)
>> >marking stmt to be not simulated again
>> >
>> >Visiting statement:
>> >if (type_4 == 1)
>> >
>> >Visiting conditional with predicate: if (type_4 == 1)
>> >
>> >With known ranges
>> >	type_4: [0, +INF]
>> >
>> >Predicate evaluates to: DON'T KNOW
>> >Adding destination of edge (2 -> 5) to worklist
>> >Adding destination of edge (2 -> 3) to worklist
>> >
>> >Simulating block 3
>> >
>> >Visiting statement:
>> >type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
>> >Intersecting
>> >  ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
>> >and
>> >  [0, +INF]
>> >to
>> >  ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
>> >Found new range for type_6: ~[1, 1]
>> >interesting_ssa_edges: adding SSA use in if (type_6 == 2)
>> >marking stmt to be not simulated again
>> >
>> >Visiting statement:
>> >if (type_6 == 2)
>> >
>> >Visiting conditional with predicate: if (type_6 == 2)
>> >
>> >With known ranges
>> >	type_6: ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
>> >
>> >Predicate evaluates to: DON'T KNOW
>> >Adding destination of edge (3 -> 4) to worklist
>> >
>> >Simulating block 4
>> >
>> >Simulating block 5
>> >
>> >Visiting PHI node: size_1 = PHI <20(2), 0(3), 32(4)>
>> >    Argument #0 (2 -> 5 executable)
>> >	20: [20, 20]
>> >    Argument #1 (3 -> 5 executable)
>> >	0: [0, 0]
>> >Meeting
>> >  [20, 20]
>> >and
>> >  [0, 0]
>> >to
>> >  [0, 20]
>> >    Argument #2 (4 -> 5 executable)
>> >	32: [32, 32]
>> >Meeting
>> >  [0, 20]
>> >and
>> >  [32, 32]
>> >to
>> >  [0, 32]
>> >Intersecting
>> >  [0, 32]
>> >and
>> >  [0, +INF]
>> >to
>> >  [0, 32]
>> >Found new range for size_1: [0, 32]
>> >interesting_ssa_edges: adding SSA use in return size_1;
>> >marking stmt to be not simulated again
>> >
>> >Visiting statement:
>> >return size_1;
>> >
>> >Value ranges after VRP:
>> >
>> >size_1: [0, 32]
>> >wvalue_2(D): VARYING
>> >_3: [0, 255]
>> >type_4: [0, +INF]
>> >type_6: ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
>> >
>> >
>> >
>> >Substituting values and folding statements
>> >
>> >Folding statement: _3 = wvalue_2(D) >> 8;
>> >Not folded
>> >Folding statement: type_4 = (const uint8_t) _3;
>> >Not folded
>> >Folding statement: if (type_4 == 1)
>> >Folded into: if (_3 == 1)
>> >
>> >Folding statement: if (type_6 == 2)
>> >Not folded
>> >Folding PHI node: size_1 = PHI <20(2), 0(3), 32(4)>
>> >No folding possible
>> >Folding statement: return size_1;
>> >Not folded
>> >get (const uint16_t wvalue, const uint8_t windex, const void * *
>const
>> >address)
>> >{
>> >  uint16_t size;
>> >  const uint8_t type;
>> >  unsigned int _3;
>> >
>> >  <bb 2>:
>> >  _3 = wvalue_2(D) >> 8;
>> >  type_4 = (const uint8_t) _3;
>> >  if (_3 == 1)
>> >    goto <bb 5>;
>> >  else
>> >    goto <bb 3>;
>> >
>> >  <bb 3>:
>> >  if (type_4 == 2)
>> >    goto <bb 4>;
>> >  else
>> >    goto <bb 5>;
>> >
>> >  <bb 4>:
>> >
>> >  <bb 5>:
>> >  # size_1 = PHI <20(2), 0(3), 32(4)>
>> >  return size_1;
>> >
>> >}
>> 
>> 



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]