int min(int a, int b) { if (a < b) return a; else return b; } should produce rlwinm r2,r3,1,31,31 subfc r0,r4,r3 rlwinm r0,r4,1,31,31 subfe r0,r2,r0 andc r2,r4,r0 and r0,r3,r0 or r3,r0,r2 blr but instead produces a branch sequence mr r0,r3 mr r3,r4 cmpw cr7,r0,r4 bgelr- cr7 mr r3,r0 blr with unnecessary dependencies.
Confirmed.
XLC chooses the straight-line code sequence versus compare and branch based on a cost model. This should not be a uniform change in behavior for PowerPC.
Also, do not enable when optimizing for size.
I'm not sure that subfc/subfe is going to be cheaper than a compare and a branch, even if the branch is mispredicted half the time. Do you have timing results?
I should note when I was doing SPEC work, using subfc/subfe did not help SPEC at all (I tried to change the source and also rs6000.md).
On the mainline, we now produce: cmpw cr7,r3,r4 blelr- cr7 mr r3,r4 blr
Current trunk as well as all supported GCC versions before it still emits the same code (see below). XLC 12 on gcc111.fsffrance.org also emits a branch (see below). Ditto for Clang. David, in light of this and in light of comments #4 and #5, do you still believe that GCC should change as you suggested in the Description? .min: # 0x00000000 (H.10.NO_SYMBOL) cmp 0,r3,r4 bc BO_IF,CR0_LT,__L10 oril r3,r4,0x0000 bcr BO_ALWAYS,CR0_LT __L10: # 0x00000010 (H.10.NO_SYMBOL+0x10) bcr BO_ALWAYS,CR0_LT $ cat ~/tmp/t.c && /build/gcc-trunk/gcc/xgcc -B /build/gcc-trunk/gcc -O2 -S -Wall -Wextra -Wpedantic -o/dev/stdout ~/tmp/t.c int min(int a, int b) { if (a < b) return a; else return b; } .file "t.c" .machine power8 .abiversion 2 .section ".toc","aw" .section ".text" .align 2 .p2align 4,,15 .globl min .type min, @function min: cmpw 7,3,4 ble 7,.L2 mr 3,4 .L2: extsw 3,3 blr .long 0 .byte 0,0,0,0,0,0,0,0 .size min,.-min .ident "GCC: (GNU) 6.0.0 20160125 (experimental)" .section .note.GNU-stack,"",@progbits
Branchless code generally is better.
I noticed while looking at an unrelated bug that when targeting power7 or power8 Clang makes use of the isel instruction and emits the following: min: # @min cmpw 3, 4 isel 3, 3, 4, 0 blr Gcc also has the capability of using isel but it's disabled by default even when targeting power8 and must be explicitly enabled via -misel. With it, GCC emits the following branchless code: min: cmpw 7,3,4 isel 3,3,4,28 extsw 3,3 blr Since the instruction exists for just this purpose (eliminating branches), would it make sense to enable it by default? (I suppose one concern with it might be that it's not being very extensively tested.)
isel is not generally performance win for Power using GCC. It is enabled for LLVM because LLVM has a simplistic basic block scheduler and isel allows LLVM to form larger basic blocks to provide the scheduler with more freedom of movement.
The signed version can be done in four insns: 1: subfc r5,r3,r4 subfe r6,r6,r6 and r7,r6,r5 addc r8,r7,r3 (superopt finds 16 versions, all similar). The unsigned version can be done in six: 33: subfc r5,r3,r4 srwi r6,r4,31 srwi r7,r3,31 subfe r8,r6,r7 and r9,r8,r5 addc r10,r9,r3 (superopt finds 240 versions, many with one or two xoris ,,0x8000 which doesn't work for 64 bit, and many with srawi as well, which can be more expensive than srwi; all remaining are similar). For 32-bit min/max on a 64-bit cpu, we can use only "cheap", non-carry instructions: extsw r3,r3 extsw r4,r4 subf r5,r4,r3 srdi r6,r5,32 and r7,r6,r5 add r8,r7,r4 (and unsigned exts for unsigned). Those extends often disappear into surrounding insns, or because the ABI requires the regs to be extended already, etc.
(Never mind those last "addc" insn, they can just as well be plain "add", I pasted the wrong ones).
Trunk now generates isel for power9.