Implement loop guard heuristics

Mon Jun 27 18:36:00 GMT 2016

> Jan Hubicka <hubicka@ucw.cz> writes:
> > Hi,
> > this patch implements loop guard heuristics predicting that if one loop is
> > nested in another and controlled by a conditional that conditional is likely
> > false. This helps to get more realistic predictionsin larger loop nests.
> 
> Just curious: what's the basis for the condition being likely false?
> (Sorry if this is a well-known heuristic.)

Well, the reason is that if you have large loop nest there must be some way
to trim it down.  The "science" behind predictors is that they are driven
by real world data. You implement predictor, collect profile feedback and
then use analyze_brprob script to tell you how often given predictor match.

These values are then fed into predict.def and prodice.c uses Dempster-Shaffer
method to combine the hypoteses into likely outcome.  

Here are data collected by Martin (Liska) on SPEC.  Loop guard predicts
1109 branches statically, executed 6531156943 times and the prediction
is correct 61.88% cases.

Not the most reliable heuristics but it did caused quite interesting improvements
this weekend
https://gcc.opensuse.org/SPEC/CINT/sb-vangelis-head-64/recent.html
(gzip&gcc seems to get off-noise improvements at least)

HEURISTICS                               BRANCHES  (REL)  HITRATE                COVERAGE COVERAGE  (REL)
loop guard with recursion                       3   0.0%  17.17% /  93.91%           7717    7.72K   0.0%
no prediction                               10690  18.6%  33.65% /  85.08%   163062046405  163.06G  15.6%
loop iv compare                                40   0.1%  52.06% /  52.15%        8806870    8.81M   0.0%
early return (on trees)                      6599  11.5%  54.39% /  86.51%    33950850516   33.95G   3.2%
loop guard                                   1109   1.9%  61.88% /  88.38%     6531156943    6.53G   0.6%
opcode values positive (on trees)            3368   5.9%  64.55% /  90.39%    17957835504   17.96G   1.7%
continue                                      505   0.9%  66.66% /  82.85%    10086801881   10.09G   1.0%
call                                        11893  20.7%  67.26% /  92.26%    34994677942   34.99G   3.3%
opcode values nonequal (on trees)            7118  12.4%  67.63% /  81.38%    74854159732   74.85G   7.2%
loop iterations                              2761   4.8%  67.99% /  67.99%   408310259008  408.31G  39.1%
const return                                  271   0.5%  69.39% /  87.09%      301566712  301.57M   0.0%
pointer (on trees)                           5990  10.4%  69.59% /  87.18%    16667684592   16.67G   1.6%
combined                                    57560 100.0%  69.74% /  80.61%  1044875078660    1.04T 100.0%
DS theory                                   29243  50.8%  70.14% /  86.40%   160792553051  160.79G  15.4%
loop exit with recursion                       40   0.1%  72.17% /  92.33%      454740027  454.74M   0.0%
recursive call                                 65   0.1%  75.19% /  76.33%      189825093  189.83M   0.0%
first match                                 17627  30.6%  77.81% /  78.31%   721020479204  721.02G  69.0%
extra loop exit                               139   0.2%  82.80% /  88.17%     1696816070    1.70G   0.2%
loop exit                                    2813   4.9%  85.36% /  87.83%    60086533565   60.09G   5.8%
null return                                   393   0.7%  91.47% /  93.08%     3268678197    3.27G   0.3%
guessed loop iterations                      8084  14.0%  91.73% /  92.49%   242056066564  242.06G  23.2%
guess loop iv compare                         207   0.4%  97.75% /  97.79%     4319186982    4.32G   0.4%
negative return                               277   0.5%  97.94% /  99.23%     1062119028    1.06G   0.1%
Fortran loop preheader                       2536   4.4%  99.81% /  99.88%     6138904177    6.14G   0.6%
noreturn call                                2468   4.3% 100.00% / 100.00%     8232182915    8.23G   0.8%
Fortran fail alloc                            393   0.7% 100.00% / 100.00%            393   393.00   0.0%
Fortran repeated allocation/deallocation      218   0.4% 100.00% / 100.00%            218   218.00   0.0%
Fortran zero-sized array                      677   1.2% 100.00% / 100.00%      112723807  112.72M   0.0%
Fortran overflow                             1282   2.2% 100.00% / 100.00%      175074185  175.07M   0.0%

Loop count: 17886
  avg. # of iter: 12141.20
  median # of iter: 10.00
  avg. (1% cutoff) # of iter: 1038.28
  avg. (5% cutoff) # of iter: 464.53
  avg. (10% cutoff) # of iter: 222.38
  avg. (20% cutoff) # of iter: 64.87
  avg. (30% cutoff) # of iter: 44.06