[PATCH 2/3] Fix incorrect loop exit edge probability [PR103270]

Tue Dec 21 03:56:37 GMT 2021

On 2021/12/16 19:18, Jan Hubicka wrote:
>>>
>>>
>>> ./contrib/analyze_brprob.py ~/workspace/tests/spec2017/dump_file_all
>>> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
>>> noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
>>> Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
>>> loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
>>> __builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
>>> loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
>>> extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
>>> guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
>>> negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
>>> loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
>>> const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
>>> indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
>>> polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688244    3.27G   0.6%                     53%:2
>>> recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
>>> goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
>>> null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
>>> continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303799    3.78G   0.7%                     52%:3
>>> loop guard                                   1177   1.6%       56.33%   42.54% /  80.32%     7373601457    7.37G   1.4%                     50%:2
>>> opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   6.0%                     21%:2
>>> loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.6%                     18%:1
>>> loop iterations                              4761   6.3%       99.98%   84.27% /  84.27%    73463634555   73.46G  13.9%
>>> pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
>>> call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.8%                     34%:1
>>> opcode values nonequal (on trees)           12237  16.3%       70.70%   70.86% /  83.54%    36638772333   36.64G   6.9%
>>> guessed loop iterations                     16760  22.3%       99.78%   91.49% /  91.49%   162952747918  162.95G  30.9%
>>>
>>> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
>>> no prediction                               12730  16.9%       39.29%   33.32% /  79.93%   121106031835  121.11G  23.0%
>>> first match                                 25261  33.6%       92.17%   88.33% /  88.98%   296652487962  296.65G  56.3%
>>> DS theory                                   28333  37.7%       63.03%   72.05% /  85.00%   109563734005  109.56G  20.8%
>>> combined                                    75232 100.0%       73.17%   72.32% /  86.08%   527351738575  527.35G 100.0%
>>>
>>> Loop count: 37870
>>>   avg. # of iter: 8444.77
>>>   median # of iter: 7.00
>>>   avg. (1% cutoff) # of iter: 174.68
>>>   avg. (5% cutoff) # of iter: 55.14
>>>   avg. (10% cutoff) # of iter: 35.21
>>>   avg. (20% cutoff) # of iter: 26.23
>>>   avg. (30% cutoff) # of iter: 21.70
>>
>> This is the output data collected without the patch, as can be seen, no difference on "extra loop exit".
>> But this issue should be fixed.
>>
>>
>> ./contrib/analyze_brprob_spec.py ~/workspace/tests/spec2017/
>>
>> benchspec
>> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
>> noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
>> Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
>> loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
>> __builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
>> loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
>> extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
>> guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
>> negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
>> loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
>> const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
>> indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
>> polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688238    3.27G   0.6%                     53%:2
>> recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
>> goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
>> null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
>> continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303795    3.78G   0.7%                     52%:3
>> loop guard                                   1178   1.6%       56.37%   42.54% /  80.32%     7373601533    7.37G   1.4%                     50%:2
>> opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   5.9%                     21%:2
>> loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.4%                     18%:1
>> loop iterations                              4772   6.3%       99.98%   84.27% /  84.27%    74045982111   74.05G  13.8%
>> pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
>> call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.7%                     34%:1
>> opcode values nonequal (on trees)           12240  16.2%       70.71%   70.86% /  83.54%    36638772682   36.64G   6.9%
>> guessed loop iterations                     16854  22.4%       99.78%   91.21% /  91.22%   169765264401  169.77G  31.7%
>>
>> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
>> no prediction                               12731  16.9%       39.30%   33.32% /  79.93%   121106031963  121.11G  22.6%
>> first match                                 25366  33.7%       92.20%   88.24% /  88.88%   304047352001  304.05G  56.9%
>> DS theory                                   28337  37.6%       63.03%   72.05% /  85.00%   109563734430  109.56G  20.5%
>> combined                                    75342 100.0%       73.21%   72.49% /  86.06%   534746603167  534.75G 100.0%
> 
> Thank you.  So it seems that the problem does not trigger in Spec but I
> was also wondering if our current predict.def values are anywhere near
> to reality.
> 
> THe table reads as follows:  
>  - BRANCHES is number of branches the heuristics hit on (so extra loop
>    exit has 80 and therefore we do not have that good statistics on it)
>  - HITRATE is the probability that the prediction goes given direction
>    during the train run.
>    after / is the value which would be reached by perfect predictor
>    (which predict branch to the direction that dominates during train)
>    Extra loop exit is 81% out of 89% so it is pretty close to optimum
>  - COVERAGE is how many times the predicted branch was executed
> 
> In general the idea is that for most heuristics (wihch can not determine
> exact value like loop iteraitons) HITRATE values can be put to
> predict.def so the Dempster-Shafer formula (DS theory) combines the
> hypothesis sort of realistically (it assumes that all the predictors are
> staistically independent which they are not).
> 
> We have HITRATE 67 for extra loop exit which is bit off what we do have
> in the measured data, but I think our predict.def is still based on
> spec2006 numbers.
> 
> So the patch is OK.  Perhaps we could experiment with updating
> predict.def (It does develop even when run across same benchmark suite
> since early optimizations change - this stage1 I think the threading
> work definitly affects the situation substantially)

Thanks, committed to r12-6085.

> 
> Honza
>>
>> Loop count: 38058
>>   avg. # of iter: 8403.32
>>   median # of iter: 7.00
>>   avg. (1% cutoff) # of iter: 173.72
>>   avg. (5% cutoff) # of iter: 54.90
>>   avg. (10% cutoff) # of iter: 35.20
>>   avg. (20% cutoff) # of iter: 26.35
>>   avg. (30% cutoff) # of iter: 21.87
>>
>>
>> -- 
>> Thanks,
>> Xionghu

-- 
Thanks,
Xionghu