Compile the following code with options -march=armv5te -O2:

    extern void *memcpy(void *dst, const void *src, int n);

    void *memmove(void *dst, const void *src, int n)
    {
      const char *p = src;
      char *q = dst;
      if (__builtin_expect(q < p, 1)) {
        return memcpy(dst, src, n);
      } else {
        int i=0;
        for (; i<n; i++)
          q[i] = p[i];
      }
      return dst;
    }

gcc generates:

    memmove:
            cmp     r1, r0
            str     r4, [sp, #-4]!
            mov     r3, r0
            mov     ip, r1
            mov     r4, r2
            bls     .L8
            ldmfd   sp!, {r4}
            b       memcpy
    .L8:
            cmp     r2, #0
            movgt   r2, #0
            ble     .L4
    .L5:
            ldrb    r1, [ip, r2]    @ zero_extendqisi2
            strb    r1, [r3, r2]
            add     r2, r2, #1
            cmp     r2, r4
            bne     .L5
    .L4:
            mov     r0, r3
            ldmfd   sp!, {r4}
            bx      lr

The if block is expected to be more frequent than the else block, but the generated code is not very efficient. Better code could be:

            cmp     r1, r0
            bhi     memcpy
            str     r4, [sp, #-4]!
            mov     r3, r0
            mov     ip, r1
            mov     r4, r2
    L8:
            ...
The problem here appears to be that GCC can't generate conditional tail-calls (or, in this case, conditional calls for that matter) with -fno-optimize-sibling-calls. I don't read this as a problem with __builtin_expect per se, but as one of GCC not being able to generate a conditional tail-call / call. A simpler test is the following:

    void foo (int x)
    {
      if (x)
        bar ();
      else
        baz ();
    }

This is also not just a target problem, but probably one for the RTL optimizers rather than anywhere else.
What do you expect with -fno-optimize-sibling-calls ...
This bug is still present with gcc 5.2 -O3 (which does include -foptimize-sibling-calls).

    void fire_special_event(void);

    void conditional_call(int cond)
    {
      if(cond)
        fire_special_event();
    }

The above code compiles to (x86-64 gcc 5.2 -O3):

            testl   %edi, %edi
            jne     .L4
            rep ret
    .L4:
            jmp     fire_special_event

This sequence would be better:

            testl   %edi, %edi
            jne     fire_special_event
            ret

Godbolt link: https://goo.gl/0K6EZx
Later functions in that listing are related to http://stackoverflow.com/questions/97987/advantage-of-switch-over-if-else-statement

Is there a linker limitation on relocations for conditional-branch targets that aren't part of the current compilation unit? Neither clang 3.7 nor icc 13 does any better than gcc.

It seems to work for me when modifying the asm by hand to jnz _Z18fire_special_eventv, and linking to a separately-compiled definition.
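
For anyone wanting to reproduce both reduced cases from this thread in one file, here is a minimal sketch. The call counters and the noinline attributes are my additions, not part of the original test cases: they only keep the callees out of line so the branch to each call stays visible in the assembly. Because the callees are defined in the same translation unit here, the output may differ from the extern-declaration versions above; moving the definitions to a separate file would match the original reports more closely.

```c
/* Hypothetical stand-ins for the external callees in the thread's test
   cases; the counters exist only so the behaviour can be checked. */
int bar_calls;
int baz_calls;
int event_calls;

__attribute__((noinline)) void bar(void) { bar_calls++; }
__attribute__((noinline)) void baz(void) { baz_calls++; }
__attribute__((noinline)) void fire_special_event(void) { event_calls++; }

/* Both arms end in a call in tail position. */
__attribute__((noinline)) void foo(int x)
{
    if (x)
        bar();
    else
        baz();
}

/* One arm is a tail call, the other a plain return. */
__attribute__((noinline)) void conditional_call(int cond)
{
    if (cond)
        fire_special_event();
}
```

Compiling this with something like gcc -O2 -S and reading the bodies of foo and conditional_call should show whether the compiler branches to a local label that then jumps to the callee, or emits the conditional branch straight to the callee. The exact output depends on the compiler version and target.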