Bug 77484 - [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP
Summary: [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 6.0
Importance: P2 normal
Target Milestone: 6.4
Assignee: Jan Hubicka
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2016-09-05 11:53 UTC by Wilco
Modified: 2017-02-02 13:35 UTC
CC List: 3 users

See Also:
Host:
Target: aarch64
Build:
Known to work:
Known to fail: 6.0, 7.0
Last reconfirmed: 2016-09-15 00:00:00


Attachments
predict (461 bytes, text/plain)
2016-12-01 15:14 UTC, Jan Hubicka
predict (508 bytes, text/plain)
2016-12-01 16:06 UTC, Jan Hubicka

Description Wilco 2016-09-05 11:53:23 UTC
Changes in the static branch predictor (around August last year) caused regressions on SPEC2000. The PRED_CALL predictor causes GAP to regress by 6-8% on AArch64, and this hasn't been fixed on trunk. With this predictor turned off, INT is 0.6% faster and FP 0.4%.

The reason is that the predictor causes calls that are guarded by if-statements to be placed at the end of the function. For Gap this is bad as it often executes several such statements in a row, resulting in 2 extra taken branches and additional I-cache misses per if-statement. So it seems that on average this prediction makes things worse.
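A minimal sketch of the shape involved (hypothetical code, not taken from the GAP sources): PRED_CALL predicts each guarded call as not executed, so -freorder-blocks moves the call blocks to the end of the function, and whenever the calls do execute, every if-statement pays the two extra taken branches described above.

extern void (*handler) (int);

/* Hypothetical sketch: each if guards a call.  PRED_CALL marks the
   call edges as unlikely, so -freorder-blocks places all three call
   blocks at the end of the function; when the flags are usually set,
   each guard costs a taken branch into the moved block and another
   taken branch back, plus extra I-cache pressure.  */
void
process (unsigned flags)
{
  if (flags & 1)
    handler (0);
  if (flags & 2)
    handler (1);
  if (flags & 4)
    handler (2);
}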

Overall the static prediction and -freorder-blocks provide a benefit. However, does the gain when each static prediction is correct outweigh the cost when it is incorrect? Has this been measured for each of the static predictors across multiple targets?
Comment 1 Richard Biener 2016-09-06 07:57:55 UTC
IIRC the measurements have been run on x86 only; they are done "statically", that is,
by verifying the prediction against real outcomes as computed by the edge profile,
which is target independent.
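A rough sketch of what such a static check computes (my reading of the setup, not the actual measurement code; the execution-count weighting is an assumption):

/* For one heuristic: of all the branches it fired on, how often did
   the predicted direction match the direction the edge profile says
   was taken more often?  Presumably the real measurement also weights
   by execution counts, giving separate branch-count and
   execution-weighted hitrates.  */
struct branch { int predicted_taken; long taken_count; long not_taken_count; };

double
hitrate (const struct branch *b, int n)
{
  int hits = 0;
  for (int i = 0; i < n; i++)
    {
      int actually_taken = b[i].taken_count > b[i].not_taken_count;
      hits += (b[i].predicted_taken == actually_taken);
    }
  return n ? 100.0 * hits / n : 0.0;
}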
Comment 2 Jan Hubicka 2016-09-06 11:59:54 UTC
> IIRC the measurements have been run on x86 only; they are done "statically",
> that is, by verifying the prediction against real outcomes as computed by the
> edge profile, which is target independent.

Yes, the measurements should be target independent modulo the differences we already
see at early gimple level (different headers, different word sizes etc). 

I will take a look at how the predictions work for gap and whether they can be improved.
Of course, even improvements in profile guessing don't always guarantee better
final code.  It sometimes happens that codegen just got lucky with the numbers it
used to be fed.

We do have important problems with jump threading and profile updating. Those need
to be solved first IMO.  I will try to find time for that during Cauldron.
Comment 3 Ramana Radhakrishnan 2016-09-15 13:39:47 UTC
Confirmed then
Comment 4 Jan Hubicka 2016-12-01 13:51:41 UTC
Wilco,
do you have a specific function where this happens?

Martin,
do you know what the hitrate of the call predictor is here?  I am not sure how much we can do about this (it is all heuristics after all).
Comment 5 Wilco 2016-12-01 14:52:12 UTC
(In reply to Jan Hubicka from comment #4)
> Wilco,
> do you have a specific function where this happens?
> 
> Martin,
> do you know what the hitrate of the call predictor is here?  I am not sure how
> much we can do about this (it is all heuristics after all).

The top functions in the profile with this issue are EvElmList, Sum, EvAssList, Diff and Prod. It's the macro EVAL: it does a test and then an indirect call. If it is used multiple times in a row, all the extra taken branches start to take their toll.
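For reference, a reconstructed sketch of that pattern (assumed names and layout, not a verbatim quote of the GAP sources):

struct obj { int type; };
typedef struct obj *Obj;

extern Obj (*EvTab[]) (Obj);                /* per-type evaluators */

#define IS_IMMEDIATE(hd) ((long) (hd) & 1)  /* assumed tag check */

/* A test followed by an indirect call; several of these in a row in
   Sum, Diff, Prod etc. produce the chain of guarded indirect calls
   described above.  */
#define EVAL(hd) \
  (IS_IMMEDIATE (hd) ? (hd) : (*EvTab[(hd)->type]) (hd))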
Comment 6 Jan Hubicka 2016-12-01 15:14:45 UTC
Created attachment 40216 [details]
predict

Aha, indirect calls should probably be treated separately as their use cases are
quite special. What about this patch? (Martin, it would be great if you could run
the analyze_brprob script for it.)

Honza
Comment 7 Wilco 2016-12-01 15:57:00 UTC
(In reply to Jan Hubicka from comment #6)
> Created attachment 40216 [details]
> predict
> 
> Aha, indirect calls should probably be treated separately as their use cases
> are quite special. What about this patch? (Martin, it would be great if you
> could run the analyze_brprob script for it.)

Yes, that's it - a single run shows a 12% speedup with this patch!
Comment 8 Jan Hubicka 2016-12-01 15:58:59 UTC
> Yes, that's it - a single run shows a 12% speedup with this patch!

Looks promising.  We probably should try to differentiate polymorphic calls as
well, since virtual methods are used in yet different patterns.
Let me cook up the patch.

Honza
Comment 9 Jan Hubicka 2016-12-01 16:06:03 UTC
Created attachment 40217 [details]
predict

Hi,
here is a patch adding the polymorphic case, too.

Honza
Comment 10 Wilco 2016-12-06 11:30:49 UTC
(In reply to Jan Hubicka from comment #9)
> Created attachment 40217 [details]
> predict
> 
> Hi,
> here is a patch adding the polymorphic case, too.
> 
> Honza

Looks good - gap still improves by 12%, SPECINT2k by 0.5%, SPECFP2k flat. So that fixes this issue.
Comment 11 Martin Liška 2016-12-06 13:28:03 UTC
I'm planning to run SPEC benchmarks late this week to find a proper value for the new predictor.
Comment 12 Wilco 2016-12-15 12:10:42 UTC
(In reply to wilco from comment #10)
> (In reply to Jan Hubicka from comment #9)
> > Created attachment 40217 [details]
> > predict
> > 
> > Hi,
> > here is a patch adding the polymorphic case, too.
> > 
> > Honza
> 
> Looks good - gap still improves by 12%, SPECINT2k by 0.5%, SPECFP2k flat. So
> that fixes this issue.

I also ran SPEC2006, which didn't show any differences.

(In reply to Martin Liška from comment #11)
> I'm planning to run SPEC benchmarks late this week to find a proper value
> for the new predictor.

Any news on that? I ran SPEC2006 as well with the suggested values, and this didn't show any differences.
Comment 13 Jakub Jelinek 2016-12-21 10:55:40 UTC
GCC 6.3 is being released, adjusting target milestone.
Comment 14 Jan Hubicka 2017-01-01 15:41:01 UTC
Author: hubicka
Date: Sun Jan  1 15:40:29 2017
New Revision: 243995

URL: https://gcc.gnu.org/viewcvs?rev=243995&root=gcc&view=rev
Log:

	PR middle-end/77484
	* predict.def (PRED_CALL): Update hitrate.
	(PRED_INDIR_CALL, PRED_POLYMORPHIC_CALL): New predictors.
	* predict.c (tree_estimate_probability_bb): Split CALL predictor
	into direct/indirect/polymorphic variants.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/predict.c
    trunk/gcc/predict.def
Comment 15 Dominik Vogt 2017-01-02 16:18:04 UTC
The commit in comment 14 has introduced size and runtime regressions in the Spec2006 testsuite on s390x:

Runtime (only changes > 2%):

                               run-old.result                run-new.result
f416.gamess                             6.55s    6.70s (   2.29%,  -2.24% )
i400.perlbench                          7.17s    7.37s (   2.79%,  -2.71% )
i445.gobmk                              3.64s    3.55s (  -2.47%,   2.54% )
i458.sjeng                              3.83s    3.75s (  -2.09%,   2.13% )
i473.astar                              7.33s    7.62s (   3.96%,  -3.81% )
i483.xalancbmk                          7.47s    8.06s (   7.90%,  -7.32% )

Executable size ("+/- lines" means number of instructions):

f470.lbm: 2718 old.s 27 changed (+0 lines)
i429.mcf: 2801 old.s 2091 BIGGER! (+346 lines) (2 funcs bigger)
i462.libquantum: 8200 old.s 2723 smaller (-15 lines)
i473.astar: 9458 old.s 4936 smaller (-294 lines) (11 funcs bigger)
f410.bwaves: 9035 old.s 0 identical (+/- 0 lines)
i401.bzip2: 18190 old.s 8032 smaller (-11 lines) (6 funcs bigger)
f437.leslie3d: 19536 old.s 1939 smaller (-14 lines)
i458.sjeng: 34678 old.s 23820 smaller (-197 lines) (7 funcs bigger)
f433.milc: 29745 old.s 14898 smaller (-276 lines) (5 funcs bigger)
f482.sphinx3: 37726 old.s 23881 BIGGER! (+115 lines) (15 funcs bigger)
i456.hmmer: 64427 old.s 33803 smaller (-698 lines) (28 funcs bigger)
f444.namd: 55512 old.s 1785 smaller (-2 lines) (1 funcs bigger)
f434.zeusmp: 63606 old.s 2764 BIGGER! (+4 lines) (1 funcs bigger)
f459.GemsFDTD: 76971 old.s 32948 BIGGER! (+30 lines) (3 funcs bigger)
f436.cactusADM: 148768 old.s 60861 smaller (-547 lines) (68 funcs bigger)
f435.gromacs: 198339 old.s 86425 smaller (-1483 lines) (62 funcs bigger)
i471.omnetpp: 118737 old.s 37232 BIGGER! (+1879 lines) (59 funcs bigger)
i445.gobmk: 216664 old.s 152439 smaller (-1352 lines) (178 funcs bigger)
f450.soplex: 94178 old.s 55926 smaller (-2624 lines) (39 funcs bigger)
f453.povray: 221353 old.s 144618 smaller (-1680 lines) (118 funcs bigger)
i400.perlbench: 248535 old.s 232015 smaller (-1201 lines) (209 funcs bigger)
f454.calculix: 372030 old.s 222377 smaller (-788 lines) (96 funcs bigger)
i464.h264ref: 302278 old.s 152578 BIGGER! (+51 lines) (45 funcs bigger)
i403.gcc: 715454 old.s 614572 smaller (-2639 lines) (504 funcs bigger)
f465.tonto: 760124 old.s 174792 smaller (-987 lines) (140 funcs bigger)
f447.dealII: 553779 old.s 247834 smaller (-1134 lines) (246 funcs bigger)
f481.wrf: 811803 old.s 238154 smaller (-76 lines) (120 funcs bigger)
i483.xalancbmk: 743937 old.s 441474 BIGGER! (+2434 lines) (733 funcs bigger)
f416.gamess: 1913604 old.s 1175120 BIGGER! (+7805 lines) (327 funcs bigger)
Comment 16 Jan Hubicka 2017-01-02 20:15:16 UTC
>                                run-old.result                run-new.result
> f416.gamess                             6.55s    6.70s (   2.29%,  -2.24% )
> i400.perlbench                          7.17s    7.37s (   2.79%,  -2.71% )
> i445.gobmk                              3.64s    3.55s (  -2.47%,   2.54% )
> i458.sjeng                              3.83s    3.75s (  -2.09%,   2.13% )
> i473.astar                              7.33s    7.62s (   3.96%,  -3.81% )

I can imagine perlbench having an indirect call in its inner loop, but the other
benchmarks may just be noise from reducing the hitrate.

> i483.xalancbmk                          7.47s    8.06s (   7.90%,  -7.32% )

This however is probably a bug.  Does it help to change the direction of the
predictor for polymorphic calls back to likely not taken?
Index: predict.c
===================================================================
--- predict.c   (revision 244002)
+++ predict.c   (working copy)
@@ -2789,7 +2789,7 @@ tree_estimate_probability_bb (basic_bloc
                  if (gimple_call_fndecl (stmt))
                    predict_edge_def (e, PRED_CALL, NOT_TAKEN);
                  else if (virtual_method_call_p (gimple_call_fn (stmt)))
-                   predict_edge_def (e, PRED_POLYMORPHIC_CALL, TAKEN);
+                   predict_edge_def (e, PRED_POLYMORPHIC_CALL, NOT_TAKEN);
                  else
                    predict_edge_def (e, PRED_INDIR_CALL, TAKEN);
                  break;


Honza
Comment 17 Dominik Vogt 2017-01-03 10:25:02 UTC
Can you make sense of these results?  The size of gamess has not changed, but the runtime still looks noticeably worse.  The astar performance looks similar to yesterday's result without the change from comment 16.

--
Diffing i473.astar 9458 old.s 4936 smaller (-294 lines) (11 funcs bigger)
Diffing i458.sjeng 34678 old.s 23820 smaller (-197 lines) (7 funcs bigger)
Diffing i445.gobmk 216664 old.s 152439 smaller (-1352 lines) (178 funcs bigger)
Diffing i400.perlbench 248535 old.s 232015 smaller (-1201 lines) (209 funcs bigger)
Diffing i483.xalancbmk 743937 old.s 374620 BIGGER! (+404 lines) (630 funcs bigger)
Diffing f416.gamess 1913604 old.s 1175120 BIGGER! (+7805 lines) (327 funcs bigger)
--                               run-old.result                run-new.result
f416.gamess                             6.55s    6.70s (   2.29%,  -2.24% )
i400.perlbench                          7.69s    7.20s (  -6.37%,   6.81% )
i445.gobmk                              3.65s    3.55s (  -2.74%,   2.82% )
i458.sjeng                              3.83s    3.75s (  -2.09%,   2.13% )
i473.astar                              7.34s    7.61s (   3.68%,  -3.55% )
i483.xalancbmk                          7.62s    7.55s (  -0.92%,   0.93% )
--
Comment 18 Dominik Vogt 2017-01-03 10:27:37 UTC
(The perlbench result looks like a bad measurement; we sometimes see this on the devel machine for unknown reasons, possibly when someone compiles or tests on a different partition.)
Comment 19 Wilco 2017-01-05 13:43:20 UTC
> The commit in comment 14 has introduced size and runtime regressions in the
> Spec2006 testsuite on s390x:

I get reproducible regressions on AArch64 as well with the latest patch (changes >0.5%):

400.perlbench	-1.26%
403.gcc		-3.16%
445.gobmk	-2.70%
458.sjeng	1.65%
464.h264ref	-0.78%
453.povray	-2.65%

It seems this is worse than the earlier versions of the patch, which all used NOT_TAKEN.
Comment 20 Jan Hubicka 2017-01-06 12:46:10 UTC
Hi,
it turns out that Martin added another column to his statistics script, which I had
misinterpreted.
https://gcc.opensuse.org/SPEC/CINT/sb-terbium-head-64/recent.html also shows an
interesting reaction to the change.  I will update the probabilities to the correct
values one by one and we will see how the benchmarks react.  Is changing one a day
enough for periodic testers to catch up?

The hitrates on spec2k6 combined with spec v6 are as follows.
They mean that indirect call should have 14%, call 67% and polymorphic call
should have the opposite outcome and 59%.  The statistical samples are quite
small and dominated by one use, so we may diverge from those values if it seems
to make sense.
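For concreteness, this is roughly how those numbers end up in predict.def once the follow-up commits below (comments 21, 24 and 25) are in; a sketch pieced together from the commit logs, with the flags argument (0) being an assumption:

/* HITRATE (N) records that the predictor is expected to be right N%
   of the time in its predicted direction.  */
DEF_PREDICTOR (PRED_CALL, "call", HITRATE (67), 0)
DEF_PREDICTOR (PRED_INDIR_CALL, "indirect call", HITRATE (86), 0)
DEF_PREDICTOR (PRED_POLYMORPHIC_CALL, "polymorphic call", HITRATE (58), 0)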

Honza


COMBINED
========
HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)
loop guard with recursion                      13   0.0%       92.31%   85.06% /  85.06%     6672440267    6.67G   0.4%
Fortran loop preheader                         42   0.0%       97.62%   97.78% /  99.07%        1448718    1.45M   0.0%
loop iv compare                                71   0.1%       78.87%   49.26% /  63.89%      163168352  163.17M   0.0%
loop exit with recursion                       85   0.1%       75.29%   84.89% /  86.96%     9873741435    9.87G   0.6%
extra loop exit                                98   0.1%       71.43%   31.39% /  76.96%      312263024  312.26M   0.0%
recursive call                                121   0.1%       64.46%   37.55% /  82.78%      531996473  532.00M   0.0%
Fortran repeated allocation/deallocation      392   0.4%      100.00%  100.00% / 100.00%            630   630.00   0.0%
guess loop iv compare                         402   0.4%       90.05%   95.82% /  96.04%     5683151771    5.68G   0.3%
indirect call                                 425   0.4%       52.00%   14.06% /  91.41%     3815332963    3.82G   0.2%
Fortran zero-sized array                      549   0.6%       99.64%  100.00% / 100.00%       20794317   20.79M   0.0%
const return                                  651   0.7%       94.93%   84.04% /  93.18%     1067774653    1.07G   0.1%
null return                                   716   0.7%       92.18%   91.83% /  93.39%     3421321956    3.42G   0.2%
negative return                               734   0.8%       97.14%   64.62% /  65.01%     4744886315    4.74G   0.3%
continue                                      773   0.8%       66.11%   79.71% /  87.43%    29089649102   29.09G   1.6%
polymorphic call                              803   0.8%       43.59%   59.05% /  86.63%     3828555030    3.83G   0.2%
Fortran fail alloc                            944   1.0%      100.00%  100.00% / 100.00%         167691  167.69K   0.0%
Fortran overflow                             1237   1.3%      100.00%  100.00% / 100.00%       55197159   55.20M   0.0%
loop guard                                   1861   1.9%       48.90%   69.86% /  84.62%    17979028127   17.98G   1.0%
noreturn call                                3769   3.9%       99.95%  100.00% / 100.00%     8175053425    8.18G   0.5%
loop exit                                    5017   5.2%       83.50%   90.01% /  91.68%   143815212145  143.82G   8.1%
opcode values positive (on trees)            5763   6.0%       66.23%   60.41% /  86.25%    43349211449   43.35G   2.4%
loop iterations                              6276   6.6%       99.94%   78.54% /  78.54%   662671304506  662.67G  37.3%
early return (on trees)                      9512   9.9%       61.02%   58.05% /  85.82%    56222551226   56.22G   3.2%
pointer (on trees)                          10969  11.5%       63.29%   75.18% /  89.33%    23318221976   23.32G   1.3%
opcode values nonequal (on trees)           11633  12.2%       63.23%   74.53% /  85.23%   118246028039  118.25G   6.7%
guessed loop iterations                     13631  14.2%       96.54%   92.61% /  93.12%   417886815060  417.89G  23.5%
call                                        20747  21.7%       54.47%   67.24% /  92.56%    50096613362   50.10G   2.8%
Comment 21 Jan Hubicka 2017-01-06 16:10:41 UTC
Author: hubicka
Date: Fri Jan  6 16:10:09 2017
New Revision: 244167

URL: https://gcc.gnu.org/viewcvs?rev=244167&root=gcc&view=rev
Log:
	PR middle-end/77484
	* predict.def (PRED_POLYMORPHIC_CALL): Set to 58
	* predict.c (tree_estimate_probability_bb): Reverse direction of
	polymorphic call predictor.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/predict.c
    trunk/gcc/predict.def
Comment 22 Dominik Vogt 2017-01-07 10:49:40 UTC
> Is changing one a day enough for periodic testers to catch up?

I'll try to keep up with testing.

> New Revision: 244167

Which numbers do you need: r244167 vs. r244166, or vs. r243994, or both?  (If I'm supposed to run the statistics script, I'd need a pointer to where to find it and how to run it.)
Comment 23 Markus Trippelsdorf 2017-01-07 11:52:32 UTC
Unfortunately the vmakarov SPEC tester is currently stalled for most archs.
However, it still works for POWER7, and there r244167 shows no effect.

https://vmakarov.fedorapeople.org/spec/spec2000.ibm-p730-05-lp5/gcc/home.html
Comment 24 Jan Hubicka 2017-01-08 09:53:39 UTC
Author: hubicka
Date: Sun Jan  8 09:53:06 2017
New Revision: 244207

URL: https://gcc.gnu.org/viewcvs?rev=244207&root=gcc&view=rev
Log:
	PR middle-end/77484
	* predict.def (PRED_INDIR_CALL): Set to 86.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/predict.def
Comment 25 Jan Hubicka 2017-01-10 09:15:26 UTC
Author: hubicka
Date: Tue Jan 10 09:14:54 2017
New Revision: 244260

URL: https://gcc.gnu.org/viewcvs?rev=244260&root=gcc&view=rev
Log:
	PR middle-end/77484
	* predict.def (PRED_CALL): Set to 67.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/predict.def
Comment 26 Jan Hubicka 2017-01-14 09:21:50 UTC
Hello, did the Gap scores improve on arm too? Both the Itanium and PPC testers seem to show improved gap scores, so I hope arm and the other ppc tester do too.
Comment 27 Wilco 2017-01-16 17:26:40 UTC
(In reply to Jan Hubicka from comment #26)
> Hello, did the Gap scores improve on arm too? Both the Itanium and PPC testers
> seem to show improved gap scores, so I hope arm and the other ppc tester do too.

On SPEC2000 the latest changes look good: compared to the old predictor, gap improved by 10% and INT/FP by 0.8%/0.6%. I'll run SPEC2006 tonight.
Comment 28 Jan Hubicka 2017-01-16 23:34:26 UTC
> On SPEC2000 the latest changes look good: compared to the old predictor, gap
> improved by 10% and INT/FP by 0.8%/0.6%. I'll run SPEC2006 tonight.

It is rather surprising that you are seeing such large changes for one branch
predictor change.  Is most of it really coming just from the bb-reorder changes?
On x86 the effect is mostly within noise, and on Itanium Gap improves by 2-3%.
It may be interesting to experiment with reordering and prediction more on this target.

Honza
Comment 29 Wilco 2017-01-17 00:29:46 UTC
(In reply to Jan Hubicka from comment #28)
> > On SPEC2000 the latest changes look good: compared to the old predictor, gap
> > improved by 10% and INT/FP by 0.8%/0.6%. I'll run SPEC2006 tonight.
> 
> It is rather surprising that you are seeing such large changes for one branch
> predictor change.  Is most of it really coming just from the bb-reorder
> changes?  On x86 the effect is mostly within noise, and on Itanium Gap
> improves by 2-3%.  It may be interesting to experiment with reordering and
> prediction more on this target.

When I looked at gap at the time, the main change was the reordering of a few if statements in several hot functions. Incorrect block frequencies also change register allocation in a bad way, but I didn't notice anything obvious in gap. And many optimizations are being disabled on blocks with an incorrect frequency - this happens all over the place and is the issue causing the huge Coremark regression.

I could do some experiments but I believe the key underlying problem is that GCC treats the block frequencies as accurate when they are really very vague estimates (often incorrect) and so should only be used to break ties.

In fact I would claim that even modelling if-statements as a balanced 50/50 is incorrect. It suggests that a block that is guarded by multiple if-statements handling exceptional cases is much less important than the very same block that isn't, even if they are both always executed. Without profile data providing actual frequencies we should not optimize the outer block for speed and the inner block for size.
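As a minimal illustration of that point (hypothetical code, with both guards guessed at roughly 50/50):

extern void work (void);

/* With each guard guessed at ~50%, work () is statically estimated at
   ~25% of the entry frequency and optimized as if it were cold - even
   when, at run time, both guards almost always pass and work () runs
   on nearly every call.  */
void
f (int fits, int valid)
{
  if (fits)      /* exceptional case, guessed ~50% */
    if (valid)   /* exceptional case, guessed ~50% */
      work ();   /* estimated cold, possibly hot in practice */
}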
Comment 30 Jan Hubicka 2017-01-17 00:34:54 UTC
> 
> When I looked at gap at the time, the main change was the reordering of a few
> if statements in several hot functions. Incorrect block frequencies also change
> register allocation in a bad way, but I didn't notice anything obvious in gap.
> And many optimizations are being disabled on blocks with an incorrect frequency
> - this happens all over the place and is the issue causing the huge Coremark
> regression.

This is the issue with the jump threading code no longer sanely updating the
profile, right?  I will try to find time to look into it this week.
> 
> I could do some experiments but I believe the key underlying problem is that
> GCC treats the block frequencies as accurate when they are really very vague
> estimates (often incorrect) and so should only be used to break ties.
> 
> In fact I would claim that even modelling if-statements as a balanced 50/50 is
> incorrect. It suggests that a block that is guarded by multiple if-statements
> handling exceptional cases is much less important than the very same block that
> isn't, even if they are both always executed. Without profile data providing
> actual frequencies we should not optimize the outer block for speed and the
> inner block for size.

There are --param options to control this. They were originally tuned based on
Spec2000 and x86_64 scores (in the GCC 3.x timeframe). If you can get reasonable
data that they are not working very well anymore (or not for ARM), we could try
to tune them better.

I have WIP patches to make the propagation a bit more fine-grained and to
propagate e.g. info on whether a BB is reachable only by a path known to be cold
(such as one that has an EH edge on it). This may make the logic a bit more
reliable.

Honza
Comment 31 Wilco 2017-01-17 01:16:07 UTC
(In reply to Jan Hubicka from comment #30)
> > 
> > When I looked at gap at the time, the main change was the reordering of a few
> > if statements in several hot functions. Incorrect block frequencies also change
> > register allocation in a bad way, but I didn't notice anything obvious in gap.
> > And many optimizations are being disabled on blocks with an incorrect frequency
> > - this happens all over the place and is the issue causing the huge Coremark
> > regression.
> 
> This is the issue with jump threading code no longer sanely updating profile,
> right?  I will try to find time to look into it this week.

I don't know the exact details, but James proved that the blocks are incorrectly assumed cold, so part of the optimization doesn't trigger as expected. I'm not sure whether that is because the frequencies got too low, were set incorrectly or were not set at all.

> > I could do some experiments but I believe the key underlying problem is that
> > GCC treats the block frequencies as accurate when they are really very vague
> > estimates (often incorrect) and so should only be used to break ties.
> > 
> > In fact I would claim that even modelling if-statements as a balanced 50/50 is
> > incorrect. It suggests that a block that is guarded by multiple if-statements
> > handling exceptional cases is much less important than the very same block that
> > isn't, even if they are both always executed. Without profile data providing
> > actual frequencies we should not optimize the outer block for speed and the
> > inner block for size.
> 
> There are --param options to control this. They were originally tuned based on
> Spec2000 and x86_64 scores (in the GCC 3.x timeframe). If you can get
> reasonable data that they are not working very well anymore (or not for ARM),
> we could try to tune them better.
> 
> I have WIP patches to make the propagation a bit more fine-grained and to
> propagate e.g. info on whether a BB is reachable only by a path known to be
> cold (such as one that has an EH edge on it). This may make the logic a bit
> more reliable.

I'll have a look, but I think the key is to think in terms of block importance (from cold to hot). Apart from highly skewed cases (e.g. exception edges or loops), most blocks should be equally important to optimize.
Comment 32 Jan Hubicka 2017-01-19 09:58:40 UTC
Apparently fixed. The Coremark issue is PR77445.
Comment 33 Wilco 2017-01-19 14:24:24 UTC
(In reply to Jan Hubicka from comment #32)
> Apparently fixed. The Coremark issue is PR77445.

Yes, my SPEC2006 results look good, no real change. Coremark is now up by 20% or more, thanks for that :-)
Comment 34 Dominik Vogt 2017-02-01 16:40:15 UTC
Some Spec2006 results on s390x (zEC12) for the individual commits:

r243995 vs. r243994 (comment 14)
-------------------
                               run-old.result                run-new.result
f410.bwaves                             1.27s    1.28s (   0.79%,  -0.78% )
f416.gamess                             7.09s    6.61s (  -6.77%,   7.26% )
f433.milc                               5.53s    5.54s (   0.18%,  -0.18% )
f434.zeusmp                             2.19s    2.19s (   0.00%,   0.00% )
f435.gromacs                            1.34s    1.34s (   0.00%,   0.00% )
f436.cactusADM                         24.63s   24.67s (   0.16%,  -0.16% )
f437.leslie3d                           2.76s    2.75s (  -0.36%,   0.36% )
f444.namd                              12.13s   12.13s (   0.00%,   0.00% )
f447.dealII                             2.03s    2.02s (  -0.49%,   0.50% )
f450.soplex                             3.92s    3.96s (   1.02%,  -1.01% )
f453.povray                             2.89s    2.87s (  -0.69%,   0.70% )
f454.calculix                          17.32s   17.23s (  -0.52%,   0.52% )
f459.GemsFDTD                           7.24s    7.19s (  -0.69%,   0.70% )
f465.tonto                              0.94s    0.94s (   0.00%,   0.00% )
f470.lbm                                2.65s    2.66s (   0.38%,  -0.38% )
f481.wrf                                3.84s    3.85s (   0.26%,  -0.26% )
f482.sphinx3                           10.50s   10.54s (   0.38%,  -0.38% )
i400.perlbench                          7.58s    7.37s (  -2.77%,   2.85% )
i401.bzip2                              3.98s    3.95s (  -0.75%,   0.76% )
i403.gcc                                1.01s    1.00s (  -0.99%,   1.00% )
i429.mcf                                1.49s    1.49s (   0.00%,   0.00% )
i445.gobmk                              3.56s    3.62s (   1.69%,  -1.66% )
i456.hmmer                              1.59s    1.57s (  -1.26%,   1.27% )
i458.sjeng                              3.81s    3.84s (   0.79%,  -0.78% )
i462.libquantum                        17.13s   17.46s (   1.93%,  -1.89% )
i464.h264ref                            3.14s    3.31s (   5.41%,  -5.14% )
i471.omnetpp                           11.50s   11.51s (   0.09%,  -0.09% )
i473.astar                              7.22s    7.54s (   4.43%,  -4.24% )
i483.xalancbmk                          7.51s    8.15s (   8.52%,  -7.85% )

--

f470.lbm 2984 insns +0 changed
i429.mcf 4165 insns +346 BIGGER!, 3 funcs bigger (max +339 insns)
i462.libquantum 11735 insns -7 smaller, 4 funcs bigger (max +3 insns)
i473.astar 12460 insns -181 smaller, 12 funcs bigger (max +9 insns)
i401.bzip2 22439 insns -13 smaller, 6 funcs bigger (max +25 insns)
f437.leslie3d 28725 insns -21 smaller
i458.sjeng 38864 insns -144 smaller, 10 funcs bigger (max +26 insns)
f433.milc 35091 insns -262 smaller, 7 funcs bigger (max +6 insns)
f482.sphinx3 51879 insns +139 BIGGER!, 16 funcs bigger (max +326 insns)
i456.hmmer 85157 insns -677 smaller, 26 funcs bigger (max +463 insns)
f444.namd 76220 insns +0 changed, 1 funcs bigger (max +11 insns)
f434.zeusmp 73937 insns +6 BIGGER!, 1 funcs bigger (max +7 insns)
f459.GemsFDTD 111465 insns -151 smaller, 3 funcs bigger (max +21 insns)
f436.cactusADM 201648 insns -638 smaller, 69 funcs bigger (max +103 insns)
f435.gromacs 250725 insns -1425 smaller, 64 funcs bigger (max +316 insns)
i471.omnetpp 135992 insns +2095 BIGGER!, 70 funcs bigger (max +2101 insns)
i445.gobmk 249112 insns -726 smaller, 188 funcs bigger (max +1282 insns)
f450.soplex 131531 insns -3584 smaller, 33 funcs bigger (max +546 insns)
f453.povray 247399 insns -1827 smaller, 118 funcs bigger (max +1481 insns)
i400.perlbench 305683 insns -886 smaller, 207 funcs bigger (max +480 insns)
f454.calculix 478026 insns -837 smaller, 97 funcs bigger (max +1036 insns)
i464.h264ref 316483 insns +35 BIGGER!, 44 funcs bigger (max +2229 insns)
i403.gcc 800574 insns -4048 smaller, 514 funcs bigger (max +1614 insns)
f465.tonto 1138432 insns -970 smaller, 139 funcs bigger (max +653 insns)
f447.dealII 764322 insns -873 smaller, 248 funcs bigger (max +2980 insns)
f481.wrf 1081604 insns +32 BIGGER!, 126 funcs bigger (max +404 insns)
i483.xalancbmk 919758 insns +1914 BIGGER!, 731 funcs bigger (max +1835 insns)
f416.gamess 2553939 insns +9369 BIGGER!, 328 funcs bigger (max +9157 insns)

statistics:
-----------
29      tests (total)
8       test executables have grown (more insns)
18      test executables have shrunk (fewer insns)
10140169        insns total (old)
-3334   insns difference
-328    insns per 1,000,000
+435    weighted insns per 1,000,000 *
3065    functions have grown (total) **
+9157   insns in most grown function
 * Each test case is scaled to 1000000 insns.  The displayed number is the
   average of all tests.
Comment 35 Dominik Vogt 2017-02-01 16:40:28 UTC
r244167 vs. r244166 (comment 21)
-------------------
                               run-old.result                run-new.result
f410.bwaves                             1.27s    1.27s (   0.00%,   0.00% )
f416.gamess                             6.87s    6.87s (   0.00%,   0.00% )
f433.milc                               5.57s    5.57s (   0.00%,   0.00% )
f434.zeusmp                             2.18s    2.19s (   0.46%,  -0.46% )
f435.gromacs                            1.34s    1.34s (   0.00%,   0.00% )
f436.cactusADM                         24.71s   24.69s (  -0.08%,   0.08% )
f437.leslie3d                           2.76s    2.76s (   0.00%,   0.00% )
f444.namd                              12.13s   12.13s (   0.00%,   0.00% )
f447.dealII                             2.04s    2.03s (  -0.49%,   0.49% )
f450.soplex                             3.91s    3.98s (   1.79%,  -1.76% )
f453.povray                             2.90s    2.89s (  -0.34%,   0.35% )
f454.calculix                          17.29s   17.29s (   0.00%,   0.00% )
f459.GemsFDTD                           7.27s    7.30s (   0.41%,  -0.41% )
f465.tonto                              0.94s    0.94s (   0.00%,   0.00% )
f470.lbm                                2.65s    2.66s (   0.38%,  -0.38% )
f481.wrf                                3.84s    3.84s (   0.00%,   0.00% )
f482.sphinx3                           10.62s   10.62s (   0.00%,   0.00% )
i400.perlbench                          7.27s    7.34s (   0.96%,  -0.95% )
i401.bzip2                              3.97s    3.97s (   0.00%,   0.00% )
i403.gcc                                1.01s    1.01s (   0.00%,   0.00% )
i429.mcf                                1.49s    1.49s (   0.00%,   0.00% )
i445.gobmk                              3.59s    3.59s (   0.00%,   0.00% )
i456.hmmer                              1.57s    1.59s (   1.27%,  -1.26% )
i458.sjeng                              3.77s    3.76s (  -0.27%,   0.27% )
i462.libquantum                        17.09s   17.14s (   0.29%,  -0.29% )
i464.h264ref                            3.09s    3.09s (   0.00%,   0.00% )
i471.omnetpp                           11.16s   11.25s (   0.81%,  -0.80% )
i473.astar                              7.56s    7.58s (   0.26%,  -0.26% )
i483.xalancbmk                          7.80s    7.37s (  -5.51%,   5.83% )

--

i471.omnetpp 138049 insns -75 smaller, 26 funcs bigger (max +202 insns)
f450.soplex 127589 insns +43 BIGGER!, 13 funcs bigger (max +108 insns)
f453.povray 245456 insns +10 BIGGER!, 5 funcs bigger (max +5 insns)
f447.dealII 764156 insns +150 BIGGER!, 35 funcs bigger (max +1800 insns)
i483.xalancbmk 921932 insns -2045 smaller, 391 funcs bigger (max +1513 insns)

command line:
-------------
  # ak-scripts/compare.sh --suffix -244167-244166 -r 10 -o /home/vogt/src/gcc/install-244166 -n /home/vogt/src/gcc/install-244167 -c -O3 -march=zEC12 -funroll-loops

output files:
-------------
executable diff: /home/vogt/src/minispec-2006/diff-31012017.result-244167-244166
functions grown: /home/vogt/src/minispec-2006/funcs-grown-31012017.result-244167-244166
build times:     /home/vogt/src/minispec-2006/buildtime-31012017.result-244167-244166

statistics:
-----------
29      tests (total)
3       test executables have grown (more insns)
2       test executables have shrunk (fewer insns)
10143197        insns total (old)
-1917   insns difference
-188    insns per 1,000,000
-77     weighted insns per 1,000,000 *
470     functions have grown (total) **
+1800   insns in most grown function
Comment 36 Dominik Vogt 2017-02-01 16:42:35 UTC
r244207 vs. r244206 (comment 24)
-------------------
                               run-old.result                run-new.result
f410.bwaves                             1.27s    1.27s (   0.00%,   0.00% )
f416.gamess                             6.87s    7.21s (   4.95%,  -4.72% )
f433.milc                               5.57s    5.57s (   0.00%,   0.00% )
f434.zeusmp                             2.18s    2.18s (   0.00%,   0.00% )
f435.gromacs                            1.34s    1.36s (   1.49%,  -1.47% )
f436.cactusADM                         24.63s   24.56s (  -0.28%,   0.29% )
f437.leslie3d                           2.76s    2.76s (   0.00%,   0.00% )
f444.namd                              12.13s   12.13s (   0.00%,   0.00% )
f447.dealII                             2.03s    2.02s (  -0.49%,   0.50% )
f450.soplex                             3.98s    3.98s (   0.00%,   0.00% )
f453.povray                             2.89s    2.90s (   0.35%,  -0.34% )
f454.calculix                          17.28s   17.30s (   0.12%,  -0.12% )
f459.GemsFDTD                           7.29s    7.29s (   0.00%,   0.00% )
f465.tonto                              0.94s    0.94s (   0.00%,   0.00% )
f470.lbm                                2.65s    2.64s (  -0.38%,   0.38% )
f481.wrf                                3.84s    3.84s (   0.00%,   0.00% )
f482.sphinx3                           10.61s   10.58s (  -0.28%,   0.28% )
i400.perlbench                          7.32s    7.46s (   1.91%,  -1.88% )
i401.bzip2                              3.97s    3.97s (   0.00%,   0.00% )
i403.gcc                                1.00s    1.01s (   1.00%,  -0.99% )
i429.mcf                                1.49s    1.49s (   0.00%,   0.00% )
i445.gobmk                              3.59s    3.61s (   0.56%,  -0.55% )
i456.hmmer                              1.57s    1.56s (  -0.64%,   0.64% )
i458.sjeng                              3.76s    3.77s (   0.27%,  -0.27% )
i462.libquantum                        17.11s   17.08s (  -0.18%,   0.18% )
i464.h264ref                            3.09s    3.29s (   6.47%,  -6.08% )
i471.omnetpp                           11.20s   11.16s (  -0.36%,   0.36% )
i473.astar                              7.58s    7.56s (  -0.26%,   0.26% )
i483.xalancbmk                          7.43s    7.49s (   0.81%,  -0.80% )

--

i401.bzip2 22375 insns +0 changed
i458.sjeng 38701 insns -8 smaller
f482.sphinx3 52038 insns +7 BIGGER!, 1 funcs bigger (max +7 insns)
i456.hmmer 84421 insns +0 changed
f436.cactusADM 201172 insns -6 smaller, 11 funcs bigger (max +5 insns)
f435.gromacs 249282 insns -3 smaller, 1 funcs bigger (max +2 insns)
i471.omnetpp 137988 insns -86 smaller, 3 funcs bigger (max +2 insns)
i445.gobmk 247886 insns +11 BIGGER!, 6 funcs bigger (max +17 insns)
f450.soplex 127628 insns +3 BIGGER!, 2 funcs bigger (max +2 insns)
f453.povray 245457 insns -2 smaller, 3 funcs bigger (max +3 insns)
i400.perlbench 304597 insns +249 BIGGER!, 17 funcs bigger (max +419 insns)
f454.calculix 477770 insns identical
i464.h264ref 316393 insns -3 smaller, 4 funcs bigger (max +7 insns)
i403.gcc 796623 insns +798 BIGGER!, 31 funcs bigger (max +977 insns)
f465.tonto 1141420 insns +0 changed
f447.dealII 764301 insns +1 BIGGER!, 11 funcs bigger (max +139 insns)
f481.wrf 1084840 insns -11 smaller
i483.xalancbmk 919877 insns +3 BIGGER!, 1 funcs bigger (max +12 insns)
f416.gamess 2562020 insns +2 BIGGER!, 1 funcs bigger (max +2 insns)

statistics:
-----------
29      tests (total)
8       test executables have grown (more insns)
7       test executables have shrunk (fewer insns)
10141266        insns total (old)
+955    insns difference
+94     insns per 1,000,000
+38     weighted insns per 1,000,000 *
92      functions have grown (total) **
+977    insns in most grown function
Comment 37 Dominik Vogt 2017-02-01 16:43:39 UTC
r244260 vs. r244256 (comment 25)
-------------------
                               run-old.result                run-new.result
f410.bwaves                             1.27s    1.27s (   0.00%,   0.00% )
f416.gamess                             6.80s    6.82s (   0.29%,  -0.29% )
f433.milc                               5.56s    5.53s (  -0.54%,   0.54% )
f434.zeusmp                             2.18s    2.18s (   0.00%,   0.00% )
f435.gromacs                            1.36s    1.33s (  -2.21%,   2.26% )
f436.cactusADM                         24.66s   24.75s (   0.36%,  -0.36% )
f437.leslie3d                           2.76s    2.75s (  -0.36%,   0.36% )
f444.namd                              12.13s   12.13s (   0.00%,   0.00% )
f447.dealII                             2.05s    2.01s (  -1.95%,   1.99% )
f450.soplex                             3.97s    3.92s (  -1.26%,   1.28% )
f453.povray                             2.91s    2.86s (  -1.72%,   1.75% )
f454.calculix                          17.28s   17.36s (   0.46%,  -0.46% )
f459.GemsFDTD                           7.28s    7.14s (  -1.92%,   1.96% )
f465.tonto                              0.94s    0.94s (   0.00%,   0.00% )
f470.lbm                                2.66s    2.65s (  -0.38%,   0.38% )
f481.wrf                                3.84s    3.84s (   0.00%,   0.00% )
f482.sphinx3                           10.59s   10.61s (   0.19%,  -0.19% )
i400.perlbench                          7.49s    7.30s (  -2.54%,   2.60% )
i401.bzip2                              3.97s    3.96s (  -0.25%,   0.25% )
i403.gcc                                1.01s    1.01s (   0.00%,   0.00% )
i429.mcf                                1.49s    1.49s (   0.00%,   0.00% )
i445.gobmk                              3.61s    3.53s (  -2.22%,   2.27% )
i456.hmmer                              1.56s    1.57s (   0.64%,  -0.64% )
i458.sjeng                              3.77s    3.79s (   0.53%,  -0.53% )
i462.libquantum                        17.13s   17.08s (  -0.29%,   0.29% )
i464.h264ref                            3.30s    3.17s (  -3.94%,   4.10% )
i471.omnetpp                           11.38s   11.52s (   1.23%,  -1.22% )
i473.astar                              7.58s    7.26s (  -4.22%,   4.41% )
i483.xalancbmk                          7.53s    7.73s (   2.66%,  -2.59% )

--

f470.lbm 2984 insns +0 changed
i429.mcf 4506 insns -346 smaller, 1 funcs bigger (max +2 insns)
i462.libquantum 11728 insns +7 BIGGER!, 5 funcs bigger (max +4 insns)
i473.astar 12309 insns +182 BIGGER!, 8 funcs bigger (max +109 insns)
i401.bzip2 22375 insns +11 BIGGER!, 20 funcs bigger (max +25 insns)
f437.leslie3d 28715 insns +21 BIGGER!, 2 funcs bigger (max +18 insns)
i458.sjeng 38693 insns +145 BIGGER!, 15 funcs bigger (max +69 insns)
f433.milc 34740 insns +265 BIGGER!, 49 funcs bigger (max +72 insns)
f482.sphinx3 52048 insns -148 smaller, 37 funcs bigger (max +195 insns)
i456.hmmer 84420 insns +676 BIGGER!, 61 funcs bigger (max +518 insns)
f444.namd 76218 insns +0 changed, 1 funcs bigger (max +11 insns)
f434.zeusmp 73993 insns -7 smaller, 1 funcs bigger (max +1 insns)
f459.GemsFDTD 111458 insns +85 BIGGER!, 9 funcs bigger (max +89 insns)
f436.cactusADM 201167 insns +608 BIGGER!, 86 funcs bigger (max +264 insns)
f435.gromacs 249275 insns +1416 BIGGER!, 104 funcs bigger (max +978 insns)
i471.omnetpp 137902 insns -2351 smaller, 64 funcs bigger (max +410 insns)
i445.gobmk 247898 insns +57 BIGGER!, 182 funcs bigger (max +782 insns)
f450.soplex 127631 insns +3348 BIGGER!, 56 funcs bigger (max +2104 insns)
f453.povray 245450 insns +1900 BIGGER!, 197 funcs bigger (max +2029 insns)
i400.perlbench 304835 insns +632 BIGGER!, 365 funcs bigger (max +930 insns)
f454.calculix 477770 insns +714 BIGGER!, 182 funcs bigger (max +562 insns)
i464.h264ref 316389 insns -34 smaller, 61 funcs bigger (max +1116 insns)
i403.gcc 797389 insns +2408 BIGGER!, 503 funcs bigger (max +1371 insns)
f465.tonto 1141420 insns -449 smaller, 329 funcs bigger (max +874 insns)
f447.dealII 764299 insns +556 BIGGER!, 291 funcs bigger (max +1826 insns)
f481.wrf 1084747 insns -388 smaller, 196 funcs bigger (max +1552 insns)
i483.xalancbmk 919878 insns -411 smaller, 507 funcs bigger (max +1508 insns)
f416.gamess 2561829 insns -9468 smaller, 714 funcs bigger (max +1562 insns)

statistics:
-----------
29      tests (total)
17      test executables have grown (more insns)
9       test executables have shrunk (fewer insns)
10141892        insns total (old)
-571    insns difference
-56     insns per 1,000,000
-524    weighted insns per 1,000,000 *
4046    functions have grown (total) **
+2104   insns in most grown function
Comment 38 Dominik Vogt 2017-02-01 16:51:02 UTC
Finally, the totals, comparing after the last patch with before the first one.  Overall, some tests gain some performance and others lose some.  The total number of instructions has grown somewhat (especially tonto, calculix, dealII and wrf), but there's no obvious connection between an increased number of instructions and loss of performance.

Is this what can be expected of the patches?

All compiled with -O3 -funroll-loops -march=zEC12.

r244260 vs. r243994
-------------------
                               run-old.result                run-new.result
f410.bwaves                             1.28s    1.27s (  -0.78%,   0.79% )
f416.gamess                             7.10s    6.82s (  -3.94%,   4.11% )
f433.milc                               5.53s    5.53s (   0.00%,   0.00% )
f434.zeusmp                             2.19s    2.18s (  -0.46%,   0.46% )
f435.gromacs                            1.34s    1.33s (  -0.75%,   0.75% )
f436.cactusADM                         24.72s   24.80s (   0.32%,  -0.32% )
f437.leslie3d                           2.76s    2.75s (  -0.36%,   0.36% )
f444.namd                              12.13s   12.13s (   0.00%,   0.00% )
f447.dealII                             2.03s    2.02s (  -0.49%,   0.50% )
f450.soplex                             3.90s    3.92s (   0.51%,  -0.51% )
f453.povray                             2.88s    2.86s (  -0.69%,   0.70% )
f454.calculix                          17.32s   17.36s (   0.23%,  -0.23% )
f459.GemsFDTD                           7.22s    7.13s (  -1.25%,   1.26% )
f465.tonto                              0.93s    0.93s (   0.00%,   0.00% )
f470.lbm                                2.65s    2.66s (   0.38%,  -0.38% )
f481.wrf                                3.84s    3.84s (   0.00%,   0.00% )
f482.sphinx3                           10.49s   10.56s (   0.67%,  -0.66% )
i400.perlbench                          7.58s    7.25s (  -4.35%,   4.55% )
i401.bzip2                              3.98s    3.96s (  -0.50%,   0.51% )
i403.gcc                                1.00s    1.01s (   1.00%,  -0.99% )
i429.mcf                                1.49s    1.49s (   0.00%,   0.00% )
i445.gobmk                              3.55s    3.53s (  -0.56%,   0.57% )
i456.hmmer                              1.56s    1.55s (  -0.64%,   0.65% )
i458.sjeng                              3.81s    3.79s (  -0.52%,   0.53% )
i462.libquantum                        17.12s   17.11s (  -0.06%,   0.06% )
i464.h264ref                            3.14s    3.17s (   0.96%,  -0.95% )
i471.omnetpp                           11.39s   11.52s (   1.14%,  -1.13% )
i473.astar                              7.22s    7.26s (   0.55%,  -0.55% )
i483.xalancbmk                          7.62s    7.69s (   0.92%,  -0.91% )

--

f470.lbm 2984 insns identical
i429.mcf 4165 insns -4 smaller
i462.libquantum 11735 insns +0 changed
i473.astar 12460 insns +32 BIGGER!, 2 funcs bigger (max +79 insns)
f410.bwaves 9820 insns +7 BIGGER!, 1 funcs bigger (max +7 insns)
i401.bzip2 22439 insns -63 smaller
f437.leslie3d 28725 insns +9 BIGGER!, 5 funcs bigger (max +19 insns)
i458.sjeng 38864 insns -26 smaller, 2 funcs bigger (max +24 insns)
f433.milc 35091 insns -70 smaller, 1 funcs bigger (max +5 insns)
f482.sphinx3 51879 insns +4 BIGGER!, 5 funcs bigger (max +15 insns)
i456.hmmer 85157 insns -33 smaller, 4 funcs bigger (max +91 insns)
f444.namd 76220 insns -3 smaller
f434.zeusmp 73937 insns +43 BIGGER!, 3 funcs bigger (max +27 insns)
f459.GemsFDTD 111465 insns +84 BIGGER!, 5 funcs bigger (max +57 insns)
f436.cactusADM 201648 insns +125 BIGGER!, 37 funcs bigger (max +68 insns)
f435.gromacs 250725 insns -53 smaller, 11 funcs bigger (max +25 insns)
i471.omnetpp 135992 insns -435 smaller, 15 funcs bigger (max +80 insns)
i445.gobmk 249112 insns -1167 smaller, 16 funcs bigger (max +82 insns)
f450.soplex 131531 insns -558 smaller, 22 funcs bigger (max +18 insns)
f453.povray 247399 insns -48 smaller, 3 funcs bigger (max +92 insns)
i400.perlbench 305683 insns -216 smaller, 51 funcs bigger (max +554 insns)
f454.calculix 478026 insns +485 BIGGER!, 22 funcs bigger (max +157 insns)
i464.h264ref 316483 insns -76 smaller, 8 funcs bigger (max +76 insns)
i403.gcc 800574 insns -782 smaller, 100 funcs bigger (max +1674 insns)
f465.tonto 1138432 insns +2511 BIGGER!, 235 funcs bigger (max +455 insns)
f447.dealII 764322 insns +597 BIGGER!, 171 funcs bigger (max +295 insns)
f481.wrf 1081604 insns +2769 BIGGER!, 141 funcs bigger (max +2329 insns)
i483.xalancbmk 919758 insns -483 smaller, 277 funcs bigger (max +1002 insns)
f416.gamess 2553939 insns -1589 smaller, 127 funcs bigger (max +46 insns)

statistics:
-----------
29      tests (total)
11      test executables have grown (more insns)
16      test executables have shrunk (fewer insns)
10140169        insns total (old)
+1060   insns difference
+104    insns per 1,000,000
-360    weighted insns per 1,000,000 *
1264    functions have grown (total) **
+2329   insns in most grown function
Comment 39 Jan Hubicka 2017-02-02 13:35:06 UTC
> Finally, the totals, comparing after the last patch with before the first
> one.  Overall, some tests gain some performance and others lose some.  The
> total number of
> instructions has grown somewhat (especially tonto, calculix, dealII and wrf),
> but there's no obvious connection between an increased number of instructions
> and loss of performance.
> 
> Is this what can be expected of the patches?

I would say so - the prediction controls a lot of different heuristics, and the
call predictor is quite weak (close to random), so it is expected to have
somewhat random effects.

I also can't see much correlation in the tests, so I guess it is just random
noise.  Thanks for the tests!

Honza