This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: VRP causing extra register usage
- From: Richard Biener <rguenther at suse dot de>
- To: Senthil Kumar Selvaraj <senthil_kumar dot selvaraj at atmel dot com>,Richard Biener <richard dot guenther at gmail dot com>
- Cc: gcc at gcc dot gnu dot org,law at redhat dot com,amacleod at redhat dot com
- Date: Fri, 13 Nov 2015 16:15:23 +0100
- Subject: Re: VRP causing extra register usage
- Authentication-results: sourceware.org; auth=none
- References: <20151112191005 dot GA2731 at jaguar dot atmel dot com> <B093C07E-9DD6-4AC5-90C3-5005E39A6123 at gmail dot com> <20151113133541 dot GA17639 at jaguar dot atmel dot com>
On November 13, 2015 2:35:41 PM GMT+01:00, Senthil Kumar Selvaraj <senthil_kumar.selvaraj@atmel.com> wrote:
>
>On Thu, Nov 12, 2015 at 08:37:02PM +0100, Richard Biener wrote:
>> On November 12, 2015 8:10:05 PM GMT+01:00, Senthil Kumar Selvaraj
><senthil_kumar.selvaraj@atmel.com> wrote:
>> >Hi,
>> >
>> > When analyzing code size differences between a 4.9.x compiler and
>> > trunk for the AVR target, I found quite a few cases where extra
>> > registers were being used to hold smaller types (two 8 bit
>registers
>> > for a uint8_t, for example).
>> >
>> >On deeper analysis, I found that the VRP pass (gcc/tree-vrp.c) was
>the
>> >point
>> > at which the dumps start to diverge.
>> >
>> > For code like this
>> >
>> >#include <stdint.h>
>> >
>> >uint16_t get(const uint16_t wvalue)
>> >{
>> > const uint8_t type = (wvalue >> 8);
>> > uint16_t size = 0;
>> >
>> > if (type == 1)
>> > {
>> > size = 20;
>> > }
>> > else if (type == 2)
>> > {
>> > size = 32;
>> > }
>> > return size;
>> >}
>> >
>> > VRP substitutes wvalue for the type variable in the conditionals
>> > (simplify_cond_using_ranges:9506), as it figures out that the
>range
>> > of wvalue is the same as a uint8_t). That is, code goes from
>> >
>> ><bb 2>:
>> >_3 = wvalue_2(D) >> 8;
>> >type_4 = (const uint8_t) _3;
>> >if (type_4 == 1)
>> > goto <bb 5>;
>> >else
>> > goto <bb 3>;
>> >
>> >to
>> >
>> ><bb 2>:
>> >_3 = wvalue_2(D) >> 8;
>> >type_4 = (const uint8_t) _3;
>> >if (_3 == 1)
>> > goto <bb 5>;
>> >else
>> > goto <bb 3>;
>> >
>> > This "widening" causes later passes to allocate extra registers
>> > (holding zeros for the extra bits) and generate extra comparisons
>> > for the AVR target.
>> >
>> > I found that in the 4.9.2 compiler I was benchmarking against, VRP
>> > didn't figure out the range for wvalue and therefore the folding
>> > didn't happen, which was why the code was better.
>> >
>> > With the native compiler on my machine (gcc 5.2 x86_64) -
>replacing
>> > uint8_t by uint32_t and uint16_t by uint64_t, and shifting right
>by
>> > 32 bits instead of 8 shows the same effect - the generated code
>uses
>> > rdi instead of just di to hold the type variable.
>> >
>> > Is this a bug? Should the folding happen only if the type
>> > conversion was from a smaller type to a bigger one? Or is the
>backend
>> > supposed to detect this pattern and deal with it?
>>
>> We should probably avoid widening beyond word_mode (or sth else if
>there is sth suitable).
>>
>
>Hmm, that does fix this problem. The below patch allows folding only if
>the operand size is smaller or equal to a word.
>
>diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>index e2393e4..3f5668c 100644
>--- a/gcc/tree-vrp.c
>+++ b/gcc/tree-vrp.c
>@@ -9467,6 +9467,8 @@ simplify_cond_using_ranges (gcond *stmt)
> innerop = gimple_assign_rhs1 (def_stmt);
>
> if (TREE_CODE (innerop) == SSA_NAME
>+ && (GET_MODE_SIZE(TYPE_MODE(TREE_TYPE(innerop)))
>+ <= GET_MODE_SIZE(word_mode))
> && !POINTER_TYPE_P (TREE_TYPE (innerop)))
> {
> value_range *vr = get_value_range (innerop);
>
>However, if the type conversion was from a smaller type to a larger
>type, and the
>smaller type was bigger than a word, this patch results in worse code.
>For example.
>
>#include <stdint.h>
>
>uint16_t get(const uint16_t wvalue)
>{
> const uint32_t type = (wvalue >> 8);
> uint16_t size = 0;
>
> if (type == 1)
> {
> size = 20;
> }
> else if (type == 2)
> {
> size = 32;
> }
> return size;
>}
>
>Before the patch, the folding done caused type to be substituted with
>wvalue, and the
>smaller size resulted in better code. After the patch, since wvalue is
>bigger than a
>word (for AVR), the folding doesn't happen and registers are allocated
>to hold all 32
>bits.
>
>How does the below patch look? It does the folding only if it is
>beneficial i.e., the
>value being substituted (innerop) is smaller than or equal the current
>one (op0)?
I think widenings up to word-mode are OK.
>diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>index e2393e4..3f5668c 100644
>--- a/gcc/tree-vrp.c
>+++ b/gcc/tree-vrp.c
>@@ -9467,6 +9467,8 @@ simplify_cond_using_ranges (gcond *stmt)
> innerop = gimple_assign_rhs1 (def_stmt);
>
> if (TREE_CODE (innerop) == SSA_NAME
>+ && (GET_MODE_SIZE(TYPE_MODE(TREE_TYPE(innerop)))
>+ <= GET_MODE_SIZE(TYPE_MODE(TREE_TYPE(op0))))
> && !POINTER_TYPE_P (TREE_TYPE (innerop)))
> {
> value_range *vr = get_value_range (innerop);
>
>
>Regards
>Senthil
>
>> Richard.
>>
>> >Regards
>> >Senthil
>> >
>> >
>> >details-raw vrp1 dump
>> >
>> >;; Function get (get, funcdef_no=0, decl_uid=1522, cgraph_uid=0,
>> >symbol_order=0)
>> >
>> >;; 1 loops found
>> >;;
>> >;; Loop 0
>> >;; header 0, latch 1
>> >;; depth 0, outer -1
>> >;; nodes: 0 1 2 3 4 5
>> >;; 2 succs { 5 3 }
>> >;; 3 succs { 4 5 }
>> >;; 4 succs { 5 }
>> >;; 5 succs { 1 }
>> >
>> >ASSERT_EXPRs to be inserted
>> >
>> >Assertions to be inserted for type_4
>> > if (type_4 == 1)
>> >
>> > BB #3
>> > EDGE 2->3 2 [55.9%] (FALSE_VALUE,EXECUTABLE)
>> > PREDICATE: type_4 ne_expr 1
>> >
>> >
>> >
>> >
>> >Updating SSA:
>> >Registering new PHI nodes in block #2
>> >Updating SSA information for statement type_4 = (const uint8_t) _3;
>> >Updating SSA information for statement if (type_4 == 1)
>> >Registering new PHI nodes in block #3
>> >Updating SSA information for statement type_6 = ASSERT_EXPR <type_4,
>> >type_4 != 1>;
>> >Updating SSA information for statement if (type_4 == 2)
>> >Registering new PHI nodes in block #4
>> >Registering new PHI nodes in block #5
>> >
>> >SSA replacement table
>> >N_i -> { O_1 ... O_j } means that N_i replaces O_1, ..., O_j
>> >
>> >type_6 -> { type_4 }
>> >Incremental SSA update started at block: 2
>> >Number of blocks in CFG: 6
>> >Number of blocks to update: 2 ( 33%)
>> >Affected blocks: 2 3
>> >
>> >
>> >
>> >SSA form after inserting ASSERT_EXPRs
>> >get (const uint16_t wvalue, const uint8_t windex, const void * *
>const
>> >address)
>> >{
>> > uint16_t size;
>> > const uint8_t type;
>> > unsigned int _3;
>> >
>> > <bb 2>:
>> > _3 = wvalue_2(D) >> 8;
>> > type_4 = (const uint8_t) _3;
>> > if (type_4 == 1)
>> > goto <bb 5>;
>> > else
>> > goto <bb 3>;
>> >
>> > <bb 3>:
>> > type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
>> > if (type_6 == 2)
>> > goto <bb 4>;
>> > else
>> > goto <bb 5>;
>> >
>> > <bb 4>:
>> >
>> > <bb 5>:
>> > # size_1 = PHI <20(2), 0(3), 32(4)>
>> > return size_1;
>> >
>> >}
>> >
>> >
>> >Immediate_uses:
>> >
>> >size_1 : --> single use.
>> >return size_1;
>> >
>> >wvalue_2(D) : --> single use.
>> >_3 = wvalue_2(D) >> 8;
>> >
>> >_3 : --> single use.
>> >type_4 = (const uint8_t) _3;
>> >
>> >type_4 : -->3 uses.
>> >type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
>> >type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
>> >if (type_4 == 1)
>> >
>> >.MEM_5(D) : --> single use.
>> ># VUSE <.MEM_5(D)>
>> >return size_1;
>> >
>> >type_6 : --> single use.
>> >if (type_6 == 2)
>> >
>> >Adding destination of edge (0 -> 2) to worklist
>> >
>> >Simulating block 2
>> >
>> >Visiting statement:
>> >_3 = wvalue_2(D) >> 8;
>> >Intersecting
>> > [0, 255]
>> >and
>> > [0, +INF]
>> >to
>> > [0, 255]
>> >Found new range for _3: [0, 255]
>> >interesting_ssa_edges: adding SSA use in type_4 = (const uint8_t)
>_3;
>> >marking stmt to be not simulated again
>> >
>> >Visiting statement:
>> >type_4 = (const uint8_t) _3;
>> >Found new range for type_4: [0, +INF]
>> >interesting_ssa_edges: adding SSA use in type_6 = ASSERT_EXPR
><type_4,
>> >type_4 != 1>;
>> >interesting_ssa_edges: adding SSA use in if (type_4 == 1)
>> >marking stmt to be not simulated again
>> >
>> >Visiting statement:
>> >if (type_4 == 1)
>> >
>> >Visiting conditional with predicate: if (type_4 == 1)
>> >
>> >With known ranges
>> > type_4: [0, +INF]
>> >
>> >Predicate evaluates to: DON'T KNOW
>> >Adding destination of edge (2 -> 5) to worklist
>> >Adding destination of edge (2 -> 3) to worklist
>> >
>> >Simulating block 3
>> >
>> >Visiting statement:
>> >type_6 = ASSERT_EXPR <type_4, type_4 != 1>;
>> >Intersecting
>> > ~[1, 1] EQUIVALENCES: { type_4 } (1 elements)
>> >and
>> > [0, +INF]
>> >to
>> > ~[1, 1] EQUIVALENCES: { type_4 } (1 elements)
>> >Found new range for type_6: ~[1, 1]
>> >interesting_ssa_edges: adding SSA use in if (type_6 == 2)
>> >marking stmt to be not simulated again
>> >
>> >Visiting statement:
>> >if (type_6 == 2)
>> >
>> >Visiting conditional with predicate: if (type_6 == 2)
>> >
>> >With known ranges
>> > type_6: ~[1, 1] EQUIVALENCES: { type_4 } (1 elements)
>> >
>> >Predicate evaluates to: DON'T KNOW
>> >Adding destination of edge (3 -> 4) to worklist
>> >
>> >Simulating block 4
>> >
>> >Simulating block 5
>> >
>> >Visiting PHI node: size_1 = PHI <20(2), 0(3), 32(4)>
>> > Argument #0 (2 -> 5 executable)
>> > 20: [20, 20]
>> > Argument #1 (3 -> 5 executable)
>> > 0: [0, 0]
>> >Meeting
>> > [20, 20]
>> >and
>> > [0, 0]
>> >to
>> > [0, 20]
>> > Argument #2 (4 -> 5 executable)
>> > 32: [32, 32]
>> >Meeting
>> > [0, 20]
>> >and
>> > [32, 32]
>> >to
>> > [0, 32]
>> >Intersecting
>> > [0, 32]
>> >and
>> > [0, +INF]
>> >to
>> > [0, 32]
>> >Found new range for size_1: [0, 32]
>> >interesting_ssa_edges: adding SSA use in return size_1;
>> >marking stmt to be not simulated again
>> >
>> >Visiting statement:
>> >return size_1;
>> >
>> >Value ranges after VRP:
>> >
>> >size_1: [0, 32]
>> >wvalue_2(D): VARYING
>> >_3: [0, 255]
>> >type_4: [0, +INF]
>> >type_6: ~[1, 1] EQUIVALENCES: { type_4 } (1 elements)
>> >
>> >
>> >
>> >Substituting values and folding statements
>> >
>> >Folding statement: _3 = wvalue_2(D) >> 8;
>> >Not folded
>> >Folding statement: type_4 = (const uint8_t) _3;
>> >Not folded
>> >Folding statement: if (type_4 == 1)
>> >Folded into: if (_3 == 1)
>> >
>> >Folding statement: if (type_6 == 2)
>> >Not folded
>> >Folding PHI node: size_1 = PHI <20(2), 0(3), 32(4)>
>> >No folding possible
>> >Folding statement: return size_1;
>> >Not folded
>> >get (const uint16_t wvalue, const uint8_t windex, const void * *
>const
>> >address)
>> >{
>> > uint16_t size;
>> > const uint8_t type;
>> > unsigned int _3;
>> >
>> > <bb 2>:
>> > _3 = wvalue_2(D) >> 8;
>> > type_4 = (const uint8_t) _3;
>> > if (_3 == 1)
>> > goto <bb 5>;
>> > else
>> > goto <bb 3>;
>> >
>> > <bb 3>:
>> > if (type_4 == 2)
>> > goto <bb 4>;
>> > else
>> > goto <bb 5>;
>> >
>> > <bb 4>:
>> >
>> > <bb 5>:
>> > # size_1 = PHI <20(2), 0(3), 32(4)>
>> > return size_1;
>> >
>> >}
>>
>>