Compile this code with a recent trunk gcc: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ int zero; int one; int two; extern unsigned compute_mod(unsigned); void cnt(unsigned n) { #ifdef HIDE n = compute_mod(n); #else n %= 3; #endif switch (n) { case 0: ++zero; break; case 1: ++one; break; case 2: ++two; break; #ifdef OPT default: __builtin_unreachable(); #endif } } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ On x86-64, without HIDE defined, the code compiles with -O3 to: 0: 89 f8 mov %edi,%eax 2: ba ab aa aa aa mov $0xaaaaaaab,%edx 7: f7 e2 mul %edx 9: d1 ea shr %edx b: 8d 04 52 lea (%rdx,%rdx,2),%eax e: 29 c7 sub %eax,%edi 10: 83 ff 01 cmp $0x1,%edi 13: 74 1b je 30 <_Z3cntj+0x30> 15: 83 ff 02 cmp $0x2,%edi 18: 75 0e jne 28 <_Z3cntj+0x28> 1a: 83 05 00 00 00 00 01 addl $0x1,0x0(%rip) # 21 <_Z3cntj+0x21> 21: c3 retq 22: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) 28: 83 05 00 00 00 00 01 addl $0x1,0x0(%rip) # 2f <_Z3cntj+0x2f> 2f: c3 retq 30: 83 05 00 00 00 00 01 addl $0x1,0x0(%rip) # 37 <_Z3cntj+0x37> 37: c3 retq This is good, the compiler knows there are only three possible values and does not emit any code for a default case. If I make sure the compiler doesn't know anything about the arithmetic operation by calling a function but telling the compiler there is no other case by using __builtin_unreachable() then the generated code is even better: 0: 48 83 ec 08 sub $0x8,%rsp 4: e8 00 00 00 00 callq 9 <_Z3cntj+0x9> 9: 83 f8 01 cmp $0x1,%eax c: 74 22 je 30 <_Z3cntj+0x30> e: 72 10 jb 20 <_Z3cntj+0x20> 10: 83 05 00 00 00 00 01 addl $0x1,0x0(%rip) # 17 <_Z3cntj+0x17> 17: 48 83 c4 08 add $0x8,%rsp 1b: c3 retq 1c: 0f 1f 40 00 nopl 0x0(%rax) 20: 83 05 00 00 00 00 01 addl $0x1,0x0(%rip) # 27 <_Z3cntj+0x27> 27: 48 83 c4 08 add $0x8,%rsp 2b: c3 retq 2c: 0f 1f 40 00 nopl 0x0(%rax) 30: 83 05 00 00 00 00 01 addl $0x1,0x0(%rip) # 37 <_Z3cntj+0x37> 37: 48 83 c4 08 add $0x8,%rsp 3b: c3 retq There is only one compare instruction. This is how even the first case should look like. Even more interesting: just defining the OPT macro does not change anything. So, there is currently no way to get the optimal behaviour. We certainly don't want to artificially the function calls.
The point is that with the modulus visible VRP will remove the default case (or rather take a random one -- here the first one -- as new default): <bb 2>: n_9 = n_8(D) % 3; switch (n_9) <default: <L0>, case 1: <L1>, case 2: <L2>> <L0>: zero.0_1 = zero; _2 = zero.0_1 + 1; zero = _2; goto <bb 6>; <L1>: one.1_3 = one; _4 = one.1_3 + 1; one = _4; goto <bb 6>; <L2>: two.2_5 = two; _6 = two.2_5 + 1; two = _6; <bb 6>: return; if we don't have the modulo then the default case will not be optimized away (not the one with the unreachable either) before RTL expansion. So it looks like the four-cases variant where one case later vanishes happens to be expanded better. Which means we should improve expansion for the IL case above (stmt.c:expand_case). Or work around by choosing another value as the "default" if that makes expand_case happy.