Bug 59521 - __builtin_expect not effective in switch
Summary: __builtin_expect not effective in switch
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.9.0
: P3 enhancement
Target Milestone: 9.0
Assignee: Martin Liška
URL:
Keywords: deferred
Depends on:
Blocks:
 
Reported: 2013-12-16 02:27 UTC by Ulrich Drepper
Modified: 2018-09-03 07:54 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2013-12-16 00:00:00


Attachments
Draft patch (not working) (2.01 KB, patch)
2017-07-07 05:40 UTC, Yuri Gribov
Details | Diff
Semi-working patch (3.12 KB, patch)
2017-07-11 09:25 UTC, Martin Liška
Details | Diff
New draft patch (4.06 KB, patch)
2017-07-12 16:37 UTC, Yuri Gribov
Details | Diff
Yet another patch (3.72 KB, patch)
2017-07-17 07:43 UTC, Yuri Gribov
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ulrich Drepper 2013-12-16 02:27:28 UTC
When used in switch, __builtin_expect should reorder the comparisons appropriately.  Take this code:

#include <stdio.h>
void
f(int ch) {
  switch (__builtin_expect(ch, 333)) {
    case 3: puts("a"); break;                                                 
    case 42: puts("e"); break;                                                 
    case 333: puts("i"); break;                                                 
    } 
}

Current mainline (and also prior versions, I tested 4.8.2) produce with -O3 code like this:

0000000000000000 <f>:
   0:	83 ff 2a             	cmp    $0x2a,%edi
   3:	74 33                	je     38 <f+0x38>
   5:	81 ff 4d 01 00 00    	cmp    $0x14d,%edi
   b:	74 1b                	je     28 <f+0x28>
   d:	83 ff 03             	cmp    $0x3,%edi
  10:	74 06                	je     18 <f+0x18>
  12:	c3                   	retq   
  13:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  18:	bf 00 00 00 00       	mov    $0x0,%edi
  1d:	e9 00 00 00 00       	jmpq   22 <f+0x22>
  22:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
  28:	bf 00 00 00 00       	mov    $0x0,%edi
  2d:	e9 00 00 00 00       	jmpq   32 <f+0x32>
  32:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
  38:	bf 00 00 00 00       	mov    $0x0,%edi
  3d:	e9 00 00 00 00       	jmpq   42 <f+0x42>

Instead the test for 333/$0x14d should have been moved to the front.
Comment 1 Marek Polacek 2013-12-16 16:04:41 UTC
Confirmed.
Comment 2 Yuri Gribov 2017-07-07 05:40:50 UTC
Created attachment 41698 [details]
Draft patch (not working)

This can be added easily but I ran into a block in combine_predictions_for_bb which resets predictions whenever block has more than one outgoing edge and predictor is not strong enough (like e.g. noreturn predictor).
Comment 3 Yuri Gribov 2017-07-07 05:47:22 UTC
Cc Martin to comment, as he added the problematic part in combine_predictions_for_bb.
Comment 4 Martin Liška 2017-07-10 07:20:23 UTC
I'll take a look.
Comment 5 Martin Liška 2017-07-11 09:24:35 UTC
Ok, your patch is definitely needed to properly propagate the builtin probability to a proper edge. Apart from that I added code that preserves that probability incombine_predictions_for_bb. Having that we're quite close:

$ ./xgcc -B. ~/Programming/testcases/pr59521.c -O2 -c -S -fdump-tree-optimized=/dev/stdout

;; Function f (f, funcdef_no=11, decl_uid=2306, cgraph_uid=11, symbol_order=11)

Removing basic block 7
f (int ch)
{
  <bb 2> [100.00%] [count: INV]:
  switch (ch_4(D)) <default: <L3> [3.33%] [count: INV], case 3: <L0> [3.33%] [count: INV], case 42: <L1> [3.33%] [count: INV], case 333: <L2> [90.00%] [count: INV]>

But still there's a missing piece that will rearrange switch statement so that case 333 will be first:

f:
.LFB11:
        .cfi_startproc
        cmpl    $42, %edi
        je      .L3
        cmpl    $333, %edi
        jne     .L8
        movl    $.LC2, %edi
        jmp     puts

Can you please Yuri take a look?
Comment 6 Martin Liška 2017-07-11 09:25:51 UTC
Created attachment 41713 [details]
Semi-working patch
Comment 7 Yuri Gribov 2017-07-11 09:38:10 UTC
(In reply to Martin Liška from comment #5)
> Apart from that I added code that preserves
> that probability in combine_predictions_for_bb.

Yes, I did pretty much the same locally)

> But still there's a missing piece that will rearrange switch statement so
> that case 333 will be first:

Right, I'll take it from there then.
Comment 8 Yuri Gribov 2017-07-12 16:37:52 UTC
Created attachment 41737 [details]
New draft patch

How about this? I added edge attribute for builtin_expect and later used it in emit_case_decision_tree to emit expected comparison prior to decision tree for other case values.

With this patch GCC optimizes original example. If it's not too ugly I'll do proper testing and send to gcc-patches.
Comment 9 Martin Liška 2017-07-13 10:15:40 UTC
(In reply to Yuri Gribov from comment #8)
> Created attachment 41737 [details]
> New draft patch
> 
> How about this? I added edge attribute for builtin_expect and later used it
> in emit_case_decision_tree to emit expected comparison prior to decision
> tree for other case values.
> 
> With this patch GCC optimizes original example. If it's not too ugly I'll do
> proper testing and send to gcc-patches.

The patch works for me for the described case, but does not for PGO, which should do the same based on real numbers:

$ cat pr59521-profile.c
#include <stdio.h>
void
f(int ch) {
  switch (ch) {
    case 100: puts("100"); break;                                                 
    case 10: puts("10"); break;                                                 
    case 1: puts("1"); break;                                                 
    } 
}

int main()
{
  for (int i = 0; i < 10000; i++)
  {
    int v;
    if (i % 100 == 0)
      v = 100;
    else if(i % 10 == 0)
      v = 10;
    else
      v = 1;
    f(v);
  }
}

$ gcc pr59521-profile.c -fprofile-generate && ./a.out && gcc pr59521-profile.c -fprofile-use -S -fdump-tree-optimized=/dev/stdout

...
f (int ch)
{
  <bb 2> [100.00%] [count: 10000]:
  switch (ch_2(D)) <default: <L3> [0.00%] [count: 0], case 1: <L2> [90.00%] [count: 9000], case 10: <L1> [9.00%] [count: 900], case 100: <L0> [1.00%] [count: 100]>
...

But assembly looks as follows:

	cmpl	$10, %eax
	je	.L3
	cmpl	$100, %eax
	je	.L4
	cmpl	$1, %eax
	je	.L5
	jmp	.L6

Just a small note, Honza's planning to rewrite switch expansion to happen on tree level. Maybe (hopefully) such transformations
will be easier on tree level?
Comment 10 Yuri Gribov 2017-07-14 12:13:36 UTC
(In reply to Martin Liška from comment #9)
> The patch works for me for the described case, but does not for PGO, which
> should do the same based on real numbers:

Problem here is that we optimize only very_likely edges.  They requires 99.95% probability whereas in your testcase we only get 90%.  Changing mods to "% 10000" and "% 1000000" results in desired asm:

        cmpl    $1, %eax
        je      .L3
        cmpl    $10, %eax
        je      .L4
        cmpl    $100, %eax
        je      .L5

> Just a small note, Honza's planning to rewrite switch expansion to happen on
> tree level. Maybe (hopefully) such transformations
> will be easier on tree level?

Thanks, that's important to consider.  I'll send patch for review and Cc him to maybe comment.  Probly I'll just rebase when his work is in.
Comment 11 Martin Liška 2017-07-14 12:17:44 UTC
(In reply to Yuri Gribov from comment #10)
> (In reply to Martin Liška from comment #9)
> > The patch works for me for the described case, but does not for PGO, which
> > should do the same based on real numbers:
> 
> Problem here is that we optimize only very_likely edges.  They requires
> 99.95% probability whereas in your testcase we only get 90%.  Changing mods
> to "% 10000" and "% 1000000" results in desired asm:
> 
>         cmpl    $1, %eax
>         je      .L3
>         cmpl    $10, %eax
>         je      .L4
>         cmpl    $100, %eax
>         je      .L5

Maybe I miss something, but I would expect to sort all branches in emit_case_decision_tree as either predictors can sort branches, or one have a profile feedback. Having a chain of equal comparisons, that should be always beneficial, or?

> 
> > Just a small note, Honza's planning to rewrite switch expansion to happen on
> > tree level. Maybe (hopefully) such transformations
> > will be easier on tree level?
> 
> Thanks, that's important to consider.  I'll send patch for review and Cc him
> to maybe comment.  Probly I'll just rebase when his work is in.

Actually he was convincing me to rewrite it, but I still have more unfinished tasks from history which I should start with ;)
Comment 12 Ulrich Drepper 2017-07-14 13:58:57 UTC
On Fri, Jul 14, 2017 at 2:17 PM, marxin at gcc dot gnu.org
<gcc-bugzilla@gcc.gnu.org> wrote:
> Maybe I miss something, but I would expect to sort all branches in
> emit_case_decision_tree as either predictors can sort branches, or one have a
> profile feedback. Having a chain of equal comparisons, that should be always
> beneficial, or?

I agree.  There seems to be no negative effect.  If you use a stable
sort algorithm the programmer can have influence when needed since the
program's order is preserved.  If the compiler has probability
information it should use it.  Note, I just mean the order of the
tests.  Deciding about placing code in cold sections is a different
story but this isn't what we're talking about here, right?
Comment 13 Yuri Gribov 2017-07-17 07:43:26 UTC
Created attachment 41770 [details]
Yet another patch

Ok, this one works both for __builtin_expect and -fprofile-generate and survives bootstrap on x64. I've modified switch decision tree creation to select pivot value based on probabilities rather than number of values on left and right (somewhat similar to ID3 algorithm for decision trees).

If patch looks fine in general, I'll add tests and submit for review.
Comment 14 Yuri Gribov 2017-07-18 07:05:16 UTC
Patch posted in https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01016.html
Comment 15 Daniel Fruzynski 2017-12-14 20:14:47 UTC
+1 for this, I wanted to request this today too. I see that some patch is ready, how is review going?
Comment 16 Yury Gribov 2017-12-15 09:36:00 UTC
(In reply to Daniel Fruzynski from comment #15)
> +1 for this, I wanted to request this today too. I see that some patch is
> ready, how is review going?

This work was picked up by Martin, I'm not sure if it's already in trunk (see his presentation in https://slideslive.com/38902416/switch-lowering-improvements).
Comment 17 Martin Liška 2017-12-18 13:53:09 UTC
Unfortunately I decided to postpone it to GCC 9.x.
Comment 18 Eric Gallager 2018-06-22 04:39:15 UTC
(In reply to Martin Liška from comment #17)
> Unfortunately I decided to postpone it to GCC 9.x.

GCC 9.x development is ongoing now.
Comment 19 Martin Liška 2018-06-22 08:32:41 UTC
(In reply to Eric Gallager from comment #18)
> (In reply to Martin Liška from comment #17)
> > Unfortunately I decided to postpone it to GCC 9.x.
> 
> GCC 9.x development is ongoing now.

Yes, I've just installed first part of switch enhancement patches. I'm definitely planning to do this PR in GCC 9.1.
Comment 20 Martin Liška 2018-09-03 07:52:28 UTC
Author: marxin
Date: Mon Sep  3 07:51:56 2018
New Revision: 264050

URL: https://gcc.gnu.org/viewcvs?rev=264050&root=gcc&view=rev
Log:
Make __builtin_expect effective in switch statements (PR middle-end/PR59521).

2018-09-03  Martin Liska  <mliska@suse.cz>

  PR middle-end/59521
	* predict.c (set_even_probabilities): Add likely_edges
        argument and handle cases where we have precisely one
        likely edge.
	(combine_predictions_for_bb): Catch also likely_edges.
	(tree_predict_by_opcode): Handle gswitch statements.
	* tree-cfg.h (find_case_label_for_value): New declaration.
	(find_taken_edge_switch_expr): Likewise.
	* tree-switch-conversion.c (switch_decision_tree::balance_case_nodes):
        Find pivot in decision tree based on probabily, not by number of
        nodes.
2018-09-03  Martin Liska  <mliska@suse.cz>

  PR middle-end/59521
	* c-c++-common/pr59521-1.c: New test.
	* c-c++-common/pr59521-2.c: New test.
	* gcc.dg/tree-prof/pr59521-3.c: New test.

Added:
    trunk/gcc/testsuite/c-c++-common/pr59521-1.c
    trunk/gcc/testsuite/c-c++-common/pr59521-2.c
    trunk/gcc/testsuite/gcc.dg/tree-prof/pr59521-3.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/predict.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-cfg.c
    trunk/gcc/tree-cfg.h
    trunk/gcc/tree-switch-conversion.c
Comment 21 Martin Liška 2018-09-03 07:54:04 UTC
Done.