The following c/++ code generates assembly that favors the unlikely case with gcc 4.3 and 4.4: extern void likely(); extern void unlikely(); void test_expect(char * a, char *b, char *c, char *d) { if (__builtin_expect(!!(a == b && c == d), 1)) { likely(); } else { unlikely(); } } Compiled with g++44 -O2 -S -fverbose-asm -m32. (or -m64). A few notes: gcc 4.1 seems to generate better assembly. If this code is code is compiled as c code and not C++ with gcc 4.4, the likely case is properly favored. Changing the if statement into "if (__builtin_expect(!!(a == b),1) && __builtin_expect(!!(c == d), 1))" fixes the problem as well. With this code, gcc generates better code for the likely case.
I can see this problem with 4.3.2 (Debian 4.3.2-1.1), 4.2.4 (Debian 4.2.4-6), 4.4.0 20090514 (Red Hat 4.4.0-6). I do not see it with 4.1.2 20080704 (Red Hat 4.1.2-46) and 4.1.3 20080704 (prerelease) (Debian 4.1.2-25).
With 4.5 we seem to be all fine here: jh@gcc17:~/trunk/build/gcc$ ./g++ -B ./ -O2 tt.c -fdump-tree-all-details -S -fdump-rtl-all-details-blocks jh@gcc17:~/trunk/build/gcc$ more tt.s .file "tt.c" .text .p2align 4,,15 .globl _Z11test_expectPcS_S_S_ .type _Z11test_expectPcS_S_S_, @function _Z11test_expectPcS_S_S_: .LFB0: cmpq %rcx, %rdx sete %dl cmpq %rsi, %rdi sete %al testb %al, %dl je .L2 jmp _Z6likelyv .L2: jmp _Z8unlikelyv .LFE0: and the conditional jump is predicted in the proper direction in greg dumps as having 1% probability to reach unlikely. Of course we miss here the conditional tailcall case. Honza
The trunk g++ output looks good. As I said, there is a simple workaround. But since this is a regression from 4.1, a fix in 4.4 would be nice. Additionally, it would be great if you could document the full scope of the bug (like if all conditions can trigger this etc) and if there is any simpler workaround.
My original tests were done on x86/x86_64. I observed the same problem with gcc 4.3.2 when compiling for PowerPC on AIX. For some reason, the output is properly optimized with the same version (different build though) when compiling for Sparc on Solaris.
I can't reproduce this with 4.4 either, since http://gcc.gnu.org/ml/gcc-patches/2008-08/msg01924.html gcc output looks like what Jan pasted here.
Are you compiling with g++ and -O2? On the only 4.4 version I have access to right now (gcc version 4.4.0 20090514 (Red Hat 4.4.0-6), I get: $ g++44 -O2 -S test_expect.c $ cat test_expect.s (snip) .globl _Z11test_expectPcS_S_S_ .type _Z11test_expectPcS_S_S_, @function _Z11test_expectPcS_S_S_: .LFB0: cmpq %rcx, %rdx je .L5 .L2: jmp _Z8unlikelyv .p2align 4,,10 .p2align 3 .L5: cmpq %rsi, %rdi jne .L2 .p2align 4,,6 jmp _Z6likelyv I get a similar output for the versions I listed in the report.
On the 4.4 branch actually cc1 behaves differently from cc1plus. In any case, I believe the two setne's plus testb code is actually the worst of the options, and e.g. with -Os it is even much larger. Just compile the testcase on 4.5 with -Os and -Os -D'__builtin_expect(a,b)=(a)' In the former case (i.e. with __builtin_expect) if generates: cmpq %rcx, %rdx sete %dl xorl %eax, %eax cmpq %rsi, %rdi movzbl %dl, %edx sete %al testl %eax, %edx je .L2 xorl %eax, %eax jmp likely .L2: xorl %eax, %eax jmp unlikely while in the latter case generates: cmpq %rcx, %rdx jne .L2 cmpq %rsi, %rdi jne .L2 xorl %eax, %eax jmp likely .L2: xorl %eax, %eax jmp unlikely (what would be actually the best code for -O2 when __builtin_expect is used and tells that likely () is likely).
Created attachment 19906 [details] gcc45-pr42233.patch The reason why it behaves differently with __builtin_expect from normal condition is that in the latter case gimple_boolify changes the types of the comparisons etc. to _Bool, while for __builtin_expect it doesn't recurse into its first argument. The attached patch fixes that. Then we are back to the 4.3 generated code, i.e. the same as is generated without __builtin_expect. At this point the problem is described in a comment above add_reg_br_prob_note: /* Verify that there is exactly single jump instruction since last and attach REG_BR_PROB note specifying probability. ??? We really ought to pass the probability down to RTL expanders and let it re-distribute it when the conditional expands into multiple conditionals. This is however difficult to do. */ Since here there are two jumps, not one, no REG_BR_PROB note is added.
I'm relieved that cond-optab is in no way related to this. :-) I'm not expert of the gimplifier but the patch makes sense and looks good.
Created attachment 19909 [details] gcc45-pr42233-2.patch Trunk patch that seems to fix this together with the previous patch. Going to bootstrap/regtest both now.
Subject: Bug 42233 Author: jakub Date: Fri Feb 19 09:50:30 2010 New Revision: 156888 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=156888 Log: PR middle-end/42233 * gimplify.c (gimple_boolify): For __builtin_expect call gimple_boolify also on its first argument. Modified: trunk/gcc/ChangeLog trunk/gcc/gimplify.c
Subject: Bug 42233 Author: jakub Date: Fri Feb 19 09:53:51 2010 New Revision: 156889 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=156889 Log: PR middle-end/42233 * expr.h (jumpifnot, jumpifnot_1, jumpif, jumpif_1, do_jump, do_jump_1, do_compare_rtx_and_jump): Add PROB argument. * dojump.c: Include output.h. (inv): New inline function. (jumpifnot, jumpifnot_1, jumpif, jumpif_1, do_jump_1, do_jump, do_jump_by_parts_greater_rtx, do_jump_by_parts_greater, do_jump_by_parts_zero_rtx, do_jump_by_parts_equality_rtx, do_jump_by_parts_equality, do_compare_and_jump): Add PROB argument, pass it down to other calls. (do_compare_rtx_and_jump): Likewise. If PROB is not -1, add REG_BR_PROB note to the conditional jump. * cfgexpand.c (add_reg_br_prob_note): Removed. (expand_gimple_cond): Don't call it, add the probability as last argument to jumpif_1/jumpifnot_1. * Makefile.in (dojump.o): Depend on output.h. * builtins.c (expand_errno_check): Adjust do_compare_rtx_and_jump callers. * expmed.c (emit_store_flag_force, do_cmp_and_jump): Likewise. * stmt.c (do_jump_if_equal): Likewise. * cfgrtl.c (rtl_lv_add_condition_to_bb): Likewise. * loop-unswitch.c (compare_and_jump_seq): Likewise. * config/rs6000/rs6000.c (rs6000_aix_emit_builtin_unwind_init): Likewise. * optabs.c (expand_doubleword_shift, expand_abs): Likewise. * expr.c (expand_expr_real_1): Adjust do_jump, jumpifnot and jumpifnot_1 callers. (expand_expr_real_2): Adjust jumpifnot_1 and do_compare_rtx_and_jump callers. (store_expr): Adjust jumpifnot caller. (store_constructor): Adjust jumpif caller. Modified: trunk/gcc/ChangeLog trunk/gcc/Makefile.in trunk/gcc/builtins.c trunk/gcc/cfgexpand.c trunk/gcc/cfgrtl.c trunk/gcc/config/rs6000/rs6000.c trunk/gcc/dojump.c trunk/gcc/expmed.c trunk/gcc/expr.c trunk/gcc/expr.h trunk/gcc/loop-unswitch.c trunk/gcc/optabs.c trunk/gcc/stmt.c
Fixed on the trunk so far. I might try to do a backport in a few days to 4.4 branch, though the second patch won't be trivial, as dojump.c changed massively with SSA expansion changes.
Subject: Bug 42233 Author: jakub Date: Fri Feb 19 12:47:18 2010 New Revision: 156893 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=156893 Log: PR middle-end/42233 * loop-doloop.c (add_test): Adjust do_compare_rtx_and_jump caller. Modified: trunk/gcc/ChangeLog trunk/gcc/loop-doloop.c
Subject: Bug 42233 Author: jakub Date: Fri Feb 19 18:19:06 2010 New Revision: 156903 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=156903 Log: PR middle-end/42233 * dojump.c (do_jump) <case TRUTH_NOT_EXPR>: Invert priority. Modified: trunk/gcc/ChangeLog trunk/gcc/dojump.c
Created attachment 19924 [details] gcc44-pr42233.patch Backported patch for 4.4 branch, so far only compile tested.
Subject: Bug 42233 Author: jakub Date: Mon Mar 8 11:46:28 2010 New Revision: 157274 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157274 Log: PR middle-end/42233 * dojump.c (do_jump) <case TRUTH_NOT_EXPR>: Invert priority. PR bootstrap/43121 * except.c (sjlj_emit_function_enter): Don't call add_reg_br_prob_note, instead add REG_BR_PROB note to the last insn directly. * rtl.h (add_reg_br_prob_note): Remove prototype. PR middle-end/42233 * loop-doloop.c (add_test): Adjust do_compare_rtx_and_jump caller. PR middle-end/42233 * expr.h (jumpifnot, jumpifnot_1, jumpif, jumpif_1, do_jump, do_jump_1, do_compare_rtx_and_jump): Add PROB argument. * dojump.c: Include output.h. (inv): New inline function. (jumpifnot, jumpifnot_1, jumpif, jumpif_1, do_jump_1, do_jump, do_jump_by_parts_greater_rtx, do_jump_by_parts_greater, do_jump_by_parts_zero_rtx, do_jump_by_parts_equality_rtx, do_jump_by_parts_equality, do_compare_and_jump): Add PROB argument, pass it down to other calls. (do_compare_rtx_and_jump): Likewise. If PROB is not -1, add REG_BR_PROB note to the conditional jump. * cfgexpand.c (add_reg_br_prob_note): Removed. (expand_gimple_cond): Don't call it, add the probability as last argument to jumpif_1/jumpifnot_1. * Makefile.in (dojump.o): Depend on output.h. * builtins.c (expand_errno_check): Adjust do_compare_rtx_and_jump callers. * expmed.c (emit_store_flag_force, do_cmp_and_jump): Likewise. * stmt.c (do_jump_if_equal): Likewise. * cfgrtl.c (rtl_lv_add_condition_to_bb): Likewise. * loop-unswitch.c (compare_and_jump_seq): Likewise. * config/rs6000/rs6000.c (rs6000_aix_emit_builtin_unwind_init): Likewise. * optabs.c (expand_doubleword_shift, expand_abs): Likewise. * expr.c (expand_expr_real_1): Adjust do_jump, jumpifnot and jumpifnot_1 callers. (expand_expr_real_2): Adjust jumpifnot_1 and do_compare_rtx_and_jump callers. (store_expr): Adjust jumpifnot caller. (store_constructor): Adjust jumpif caller. PR middle-end/42233 * gimplify.c (gimple_boolify): For __builtin_expect call gimple_boolify also on its first argument. Modified: branches/gcc-4_4-branch/gcc/ChangeLog branches/gcc-4_4-branch/gcc/Makefile.in branches/gcc-4_4-branch/gcc/cfgexpand.c branches/gcc-4_4-branch/gcc/cfgrtl.c branches/gcc-4_4-branch/gcc/config/rs6000/rs6000.c branches/gcc-4_4-branch/gcc/dojump.c branches/gcc-4_4-branch/gcc/except.c branches/gcc-4_4-branch/gcc/expmed.c branches/gcc-4_4-branch/gcc/expr.c branches/gcc-4_4-branch/gcc/expr.h branches/gcc-4_4-branch/gcc/gimplify.c branches/gcc-4_4-branch/gcc/loop-doloop.c branches/gcc-4_4-branch/gcc/loop-unswitch.c branches/gcc-4_4-branch/gcc/optabs.c branches/gcc-4_4-branch/gcc/rtl.h branches/gcc-4_4-branch/gcc/stmt.c
Fixed for 4.4.3 too. Don't plan to backport this to 4.3.