[PATCH] PR tree-optimization/103254 - Limit depth for all GORI expressions.

Andrew MacLeod amacleod@redhat.com
Fri Nov 19 15:00:01 GMT 2021


On 11/19/21 04:21, Richard Biener wrote:
> On Thu, Nov 18, 2021 at 8:30 PM Andrew MacLeod via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>> At issue here is the dynamic approach we currently use for outgoing edge
>> calculations.  It isn't normally a problem, but once you get a very
>> large number of possible outgoing values (ie very long unrolled blocks)
>> with pairs of values on a statement, and individual queries for each
>> one, it becomes exponential if they relate to each other.
>>
>> So.  We previously introduced a param for PR 100081 which limited the
>> depth of logical expressions GORI would pursue because we always make 2
>> or 4 queries for each logical expression, which accelerated the
>> exponential behaviour much quicker.
>>
>> This patch reuses that to apply to any statement which has 2 ssa-names,
>> as the potential for 2 queries can happen then as well..  leading to
>> exponential behaviour.  Each statement we step back through with
>> multiple ssa-names decreases the odds of calculating a useful outgoing
>> range anyway.  This will remove ridiculous behaviour like this PR
>> demonstrates.
>>
>> Next release I plan to revamp GORI to cache outgoing values, which will
>> eliminate all this sort of behaviour, and we can remove all such
>> restrictions.
>>
>> Bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK for trunk?
> I think it's reasonable.  SCEV tries to limit the overall number of SSA -> def
> visits, capturing deep vs. wide in a more uniform (and single) way.
> (--param scev-max-expr-size and the duplicate - huh - --param
> scev-max-expr-complexity).
>
> For SCEV running into the limit means fatal fail of the analysis, for ranger
> it only means less refinement?  So it might make a difference which path
> you visit and which not (just thinking about reproducability of range analysis
> when we say, swap operands of a stmt)?
>
Yes, Its means less refinement for "deeper" dependencies in the block..  
It will not be affected by operand swapping because we will either look 
at dependencies for both operands of a stmt, or neither.   The path you 
visit shouldn't make any difference. Note this has nothing to do with 
generating ranges for those statements, this is only for refining 
outgoing ranges created by the condition at the bottom of the block.

To be precise, we will stop looking for possible range changes once our 
dependency search, which always starts from the bottom of the block, 
encounters 6 stmts with multiple SSA operands. Statements with a single 
SSA operand will continue to build the dependency chains.

so for instance, following just the chain for first operands of this 
somewhat truncated listing:
  <...>
   _137 = (short int) j_131;
   _138 = _129 & _137;
   _139 = (int) _138;
   j_140 = j_131 & _139;    << depth 6     ... we don't go look at j_131 
or _139 for further dependencies
   _146 = (short int) j_140;
   _147 = _138 & _146;
   _148 = (int) _147;
   j_149 = j_140 & _148;   << depth 5
   _155 = (short int) j_149;
   _156 = _147 & _155;
   _157 = (int) _156;
   j_158 = j_149 & _157;   << depth 4
   _164 = (short int) j_158;
   _165 = _156 & _164;
   _166 = (int) _165;
   j_167 = j_158 & _166;  <<-- depth 3
   _1 = (short int) j_167;
   _3 = _1 & _165;            <<-depth 2
   _5 = (int) _3;
   j_22 = _5 & j_167;         <<-- depth 1
   j_20 = j_22 + 4;
   if (j_20 == 4)
     goto <bb 3>;

We'll generate dependencies, and thus able to generate outgoing ranges 
for all these ssa-names still until we hit a dependency depth of 6 (the 
current default), which for one chain comes to an end at

j_140 = j_131 & _139

   We will be able to generate outgoing ranges for j_131 and _139 on the 
edges, but we will not try for any ssa_names used to define those.. ie, 
_138 might not be able to have outgoing ranges generated for it in this 
block.

working the j_20 : [4,4] range on the true edge back thru those 
expressions, once we get that deep the odds of finding something of 
value is remote :-)

We will also only generate these outgoing ranges when they are asked 
for.  Many of these names are not used outside the block (especially 
back the deeper you go), and so never get asked for anyway...  but you 
could :-)

Anyway, this limit in no way introduces any kind of ordering variation..

Andrew

PS for amusement and information overload...  it turns out in this hunk 
of code we end up knowing that globally the range of most of these names 
are [0, 4], and the actual ranges we COULD generate if asked:

3->5  (T) _1 :  short int [0, 4]
3->5  (T) _3 :  short int [0, 4]
3->5  (T) _5 :  int [0, 4]
3->5  (T) j_20 :        int [4, 4]
3->5  (T) j_22 :        int [0, 0]
3->5  (T) _120 :        short int [0, 4]
3->5  (T) _121 :        int [0, 4]
3->5  (T) _128 :        short int [0, 4]
3->5  (T) _129 :        short int [0, 4]
3->5  (T) _130 :        int [0, 4]
3->5  (T) j_131 :       int [0, 4]
3->5  (T) _137 :        short int [0, 4]
3->5  (T) _138 :        short int [0, 4]
3->5  (T) _139 :        int [0, 4]
3->5  (T) j_140 :       int [0, 4]
3->5  (T) _146 :        short int [0, 4]
3->5  (T) _147 :        short int [0, 4]
3->5  (T) _148 :        int [0, 4]
3->5  (T) j_149 :       int [0, 4]
3->5  (T) _155 :        short int [0, 4]
3->5  (T) _156 :        short int [0, 4]
3->5  (T) _157 :        int [0, 4]
3->5  (T) j_158 :       int [0, 4]
3->5  (T) _164 :        short int [0, 4]
3->5  (T) _165 :        short int [0, 4]
3->5  (T) _166 :        int [0, 4]
3->5  (T) j_167 :       int [0, 4]
3->4  (F) _1 :  short int [1, 4]
3->4  (F) _3 :  short int [1, 4]
3->4  (F) _5 :  int [1, 4]
3->4  (F) j_20 :        int [5, 8]
3->4  (F) j_22 :        int [1, 4]
3->4  (F) _120 :        short int [1, 4]
3->4  (F) _121 :        int [1, 4]
3->4  (F) _128 :        short int [1, 4]
3->4  (F) _129 :        short int [1, 4]
3->4  (F) _130 :        int [1, 4]
3->4  (F) j_131 :       int [1, 4]
3->4  (F) _137 :        short int [1, 4]
3->4  (F) _138 :        short int [1, 4]
3->4  (F) _139 :        int [1, 4]
3->4  (F) j_140 :       int [1, 4]
3->4  (F) _146 :        short int [1, 4]
3->4  (F) _147 :        short int [1, 4]
3->4  (F) _148 :        int [1, 4]
3->4  (F) j_149 :       int [1, 4]
3->4  (F) _155 :        short int [1, 4]
3->4  (F) _156 :        short int [1, 4]
3->4  (F) _157 :        int [1, 4]
3->4  (F) j_158 :       int [1, 4]
3->4  (F) _164 :        short int [1, 4]
3->4  (F) _165 :        short int [1, 4]
3->4  (F) _166 :        int [1, 4]
3->4  (F) j_167 :       int [1, 4]

So the chains go slightly deeper than my example, but the end result is 
most of the names in the _30 thru _119 range on the false edge would 
come back with [0,4] instead of [1,4] with this change.

The actual dependency chains from the dump:

Exports: _1  _3  _5  j_20  j_22  _120  _128  _129  j_131  _137 _138  
_139  j_140  _146  _147  _148  j_149  _155  _156  _157 j_158  _164  
_165  _166  j_167

The exports are the names that ranger can potentially generate a range 
on outgoing edges that may differ from the range in the block.  It means 
GORI might be able to take the condition at the bottom and adjust those 
ranges.   Any name not in the list will simply use the range it has in 
the block.  Adjusting that param value will change the number of these.

The dependency list for each ssa name is also shown:   and we can see 
how the higher in the block we go, the deeper in the dependency chain 
from the bottom of the block we are, and thus there are less names in 
the list for each name

          _1 : _120  _128  _129  j_131  _137  _138  _139  j_140 _146  
_147  _148  j_149  _155  _156  _157  j_158  _164  _165 _166  j_167
          _3 : _1  _120  _128  _129  j_131  _137  _138  _139 j_140  
_146  _147  _148  j_149  _155  _156  _157  j_158  _164 _165  _166  j_167
          _5 : _1  _3  _120  _128  _129  j_131  _137  _138  _139 j_140  
_146  _147  _148  j_149  _155  _156  _157  j_158  _164 _165  _166  j_167
          j_20 : _1  _3  _5  j_22  _120  _128  _129  j_131  _137 _138  
_139  j_140  _146  _147  _148  j_149  _155  _156  _157 j_158  _164  
_165  _166  j_167
          j_22 : _1  _3  _5  _120  _128  _129  j_131  _137  _138 _139  
j_140  _146  _147  _148  j_149  _155  _156  _157  j_158 _164  _165  
_166  j_167
          _129 : _120  _128
          _137 : j_131
          _138 : _120  _128  _129  j_131  _137
          j_140 : j_131  _139
          _146 : j_131  _139  j_140
          _147 : _120  _128  _129  j_131  _137  _138  _139  j_140 _146
          _148 : _147
          j_149 : j_131  _139  j_140  _147  _148
          _155 : j_131  _139  j_140  _147  _148  j_149
          _156 : _120  _128  _129  j_131  _137  _138  _139  j_140 _146  
_147  _148  j_149  _155
          _157 : _120  _128  _129  j_131  _137  _138  _139  j_140 _146  
_147  _148  j_149  _155  _156
          j_158 : _120  _128  _129  j_131  _137  _138  _139  j_140 _146  
_147  _148  j_149  _155  _156  _157
          _164 : _120  _128  _129  j_131  _137  _138  _139  j_140 _146  
_147  _148  j_149  _155  _156  _157  j_158
          _165 : _120  _128  _129  j_131  _137  _138  _139  j_140 _146  
_147  _148  j_149  _155  _156  _157  j_158  _164
          _166 : _120  _128  _129  j_131  _137  _138  _139  j_140 _146  
_147  _148  j_149  _155  _156  _157  j_158  _164  _165
          j_167 : _120  _128  _129  j_131  _137  _138  _139  j_140 _146  
_147  _148  j_149  _155  _156  _157  j_158  _164  _165  _166



More information about the Gcc-patches mailing list