66584 – gcc differs in static, branch-prediction cost from icc in switch.

Bug 66584 - gcc differs in static, branch-prediction cost from icc in switch.

Summary: gcc differs in static, branch-prediction cost from icc in switch.

Status:	RESOLVED WONTFIX

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	c++ (show other bugs)
Version:	unknown

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2015-06-18 10:16 UTC by Jason McG
Modified:	2015-06-18 17:45 UTC (History)
CC List:	1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2015-06-18 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Jason McG 2015-06-18 10:16:27 UTC

This means that code optimised for icc is sub-optimal for icc and the reverse is true. I feel that this feature should be more clearly documented in the web pages & man pages.

For the following code:

extern void bar1();
extern void bar2();
extern void bar3();

void foo(int i) {
  switch(i) {
  case 1:
    bar1(); // gcc: least likely | icc: most likely
    break;
  case 2:
    bar2(); // gcc: less likely | icc: less likely
    break;
  default:
    bar3(); // gcc: most likely | icc: least likely
  }
}

gcc v4.8.2 & v5.10 for -O2 & -O3 produce:

foo(int):
	cmpl	$1, %edi
	je	.L3
	cmpl	$2, %edi
	jne	.L8
	jmp	bar2()
.L8:
	jmp	bar3()
.L3:
	jmp	bar1()

From which the static probabilities are quoted, above.

Conversely icc produces:

foo(int):
        cmpl      $1, %edi                                      #9.10
        jne       ..B1.3        # Prob 67%                      #9.10
        jmp       bar1()                                      #11.2
..B1.3:                         # Preds ..B1.1
        cmpl      $2, %edi                                      #9.10
        jne       ..B1.5        # Prob 50%                      #9.10
        jmp       bar2()                                      #14.5
..B1.5:                         # Preds ..B1.3
        jmp       bar3()                                      #17.5

From which the static probabilities are quoted, above.

Please not: I feel this is *only* a bug in the documentation!

It would be nice if my optimised (for speed) code would be optimised (for speed) on both platforms in the same way, so that I wouldn't have to optimise my code, but that is only my taste.

Comment 1 Jason McG 2015-06-18 10:40:56 UTC

(In reply to Jason McG from comment #0)

I got my static bp summaries wrong, corrected:

> void foo(int i) {
>   switch(i) {
>   case 1:
    bar1(); // gcc: less likely (same as default) | icc: most likely
>     break;
>   case 2:
    bar2(); // gcc: most likely | icc: less likely
>     break;
>   default:
    bar3(); // gcc: less likely (same as case 1) | icc: least likely
>   }
> }

Comment 2 Eric Botcazou 2015-06-18 14:24:19 UTC

What would you like us to document exactly?  How are we supposed to track the evolution of ICC over time?  Why not simply ask ICC to change its heuristics?

Comment 3 Andrew Pinski 2015-06-18 14:44:47 UTC

More than that, documenting gcc's branch heuristics is just in the code. And you can see it what gcc figures out via debug dumps. It is also harder to document this due to different things. For an example if your case 2 contained a call to call function, it would change gcc's choices too.

Comment 4 Jason McG 2015-06-18 15:23:39 UTC

(In reply to Eric Botcazou from comment #2)
> What would you like us to document exactly?  How are we supposed to track
...

Perhaps I was unclear. I am asking that you point out to me in the gcc documentation where it comments regarding code generation for switch-statements, and that it might make clear that the lexographic "top-down" approach nor the default label are preferred but other, more complex heuristics are used that defeat static branch-prediction analysis. I do not think this is unreasonable. So I disagree entirely that this is invalid.

Comment 5 Jason McG 2015-06-18 15:33:14 UTC

(In reply to Andrew Pinski from comment #3)
> More than that, documenting gcc's branch heuristics is just in the code. And
...

I am sorry, but I do not agree. As per my other reply, in comment 4, I think it is entirely reasonable to request that an additional statement along the lines of "code generation for switch statements may not follow a top-down approach nor prefer the default case (if it exists) in terms of static branch-prediction. The __builtin_expected() intrinsic has no effect" is included in the web-based documentation in a suitable sub-section. If some algorithm were used, then adding a link to the web site to the appropriate paper might be a quick remedy?

This would ensure that developers are not mislead by the argument that __builtin_expected() takes thinking they could use that, nor would they be mislead by lore assuming that the default or a lexographic top-down approach to static branch-prediction were used.

Comment 6 Andrew Pinski 2015-06-18 15:46:42 UTC

If someone cares so much about the static branch predictor, they would be a compiler developer. This is the first time I have seen a non-compiler developer care about documenting gcc heuristics. Note there is no one paper. Also if branches are 50/50, the decision can change based on adding another statement and is hard to document otherwise.

Comment 7 Jason McG 2015-06-18 16:24:55 UTC

(In reply to Andrew Pinski from comment #6)
> If someone cares so much about the static branch predictor, they would be a
...

I am not a compiler developer and I do care about this in the code I work upon.

I occasionally have switch statements that I statically know have a more commonly used case label (e.g. could be made the default) that static branch-prediction would reduce the cost of calling. It would have saved me some time if the documentation I was requesting were present in a more accessible form that writing test code & reviewing the assembler that gcc generates.

Aside: I have read https://ols.fedoraproject.org/GCC/Reprints-2008/sayle-reprint.pdf and no-where does this article describe, apart from in passing, the effect of mis-predicted branches that using static branch-prediction could avoid by clearly implementing a "preferred branch" e.g. the default upon their cost models. If I had time I'd research this, unfortunately I do not.

Comment 8 Jason McG 2015-06-18 16:27:49 UTC

(In reply to Andrew Pinski from comment #6)
...
> compiler developer. This is the first time I have seen a non-compiler
> developer care about documenting gcc heuristics. Note there is no one paper.
...

See comment 5. The documentation I am proposing is pretty trivial, I repeat: "code generation for switch statements may not follow a top-down approach nor prefer the default case (if it exists) in terms of static branch-prediction. The __builtin_expected() intrinsic has no effect". I find it had to understand how that documentation could be considered contentious.

Comment 9 Eric Botcazou 2015-06-18 17:45:13 UTC

> Perhaps I was unclear. I am asking that you point out to me in the gcc
> documentation where it comments regarding code generation for
> switch-statements, and that it might make clear that the lexographic
> "top-down" approach nor the default label are preferred but other, more
> complex heuristics are used that defeat static branch-prediction analysis. I
> do not think this is unreasonable.

I see, but documenting the code generation strategy in the user documentation is not really appropriate, it is subject to change without notice and the doc could be quickly out of sync.  Moreover, it would be a slippery slope: if we start to document this case and not the dozens of others, users would either ask for the others or draw false conclusions from their absence.

So WONTFIX seems to be the best resolution here.