At -O3 on x86_64, the optimizer finds opportunities to generate the "adc" (add with carry) instruction to avoid a conditional branch. However, it only does so when the destination of the adc is a register. It misses the chance to use "adc" with a memory destination and instead emits a straightforward conditional branch. In the following program, foo1 uses adc, but foo0 does not. Although I haven't tested this, I'm sure that the other three variants in this family (using sbb and/or adding or subtracting different literal values) will likewise miss the opportunity to update memory with adc.

    extern void consumep(int *);
    extern void consume(int);

    void foo0(unsigned a, unsigned b, int *victim)
    {
        if (a > b) {
            (*victim)++;
        }
        consumep(victim);
    }

    void foo1(unsigned a, unsigned b, int victim)
    {
        if (a > b) {
            victim++;
        }
        consume(victim);
    }
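For comparison, here is a minimal sketch of foo0 rewritten by hand in branchless form. The consumep() stub and the name foo0_branchless are hypothetical stand-ins for illustration only, and the assembly in the comment is just one possible lowering, not verified compiler output. Note that this source-level rewrite makes the store to *victim unconditional, which the original foo0 does not do.

```c
/* Hypothetical stand-in for the external consumep() from the report:
   it just records the value it was handed. */
static int last_seen;
static void consumep(int *p) { last_seen = *p; }

/* Hand-written branchless form of foo0 (illustrative name).
   For unsigned operands, (a > b) is 0 or 1, the same bit an unsigned
   compare leaves in the carry flag, so a compiler could in principle
   lower the update to something like (Intel syntax, SysV ABI):
       cmp  esi, edi              ; CF = (b < a), i.e. (a > b)
       adc  DWORD PTR [rdx], 0
   Unlike foo0, this version stores to *victim on every call. */
static void foo0_branchless(unsigned a, unsigned b, int *victim)
{
    *victim += (a > b);
    consumep(victim);
}
```

Whether a compiler may perform this rewrite on its own depends on whether introducing that unconditional store is allowed.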
There are various scary comments in ifcvt.c, in noce_process_if_block(), regarding memory operands, such as:

    /* Only operate on register destinations, and even then avoid
       extending the lifetime of hard registers on small register
       class machines.  */

and

    /* Don't operate on sources that may trap or are volatile.  */

and

    /* Avoid store speculation: given "if (...) x = a" where x is a
       MEM, we only want to do the store if x is always set
       somewhere in the function.  This avoids cases like
         if (pthread_mutex_trylock(mutex))
           ++global_variable;
       where we only want global_variable to be changed if the mutex
       is held.  FIXME: This should ideally be expressed directly in
       RTL somehow.  */

I don't think it is always safe to simplify global memory operands.
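To make the store-speculation comment concrete, here is a small sketch contrasting the two shapes it distinguishes. Both function names are hypothetical, and this does not claim that GCC actually if-converts the first one; it only shows which case the comment treats as safe.

```c
/* Hypothetical illustration of the case the ifcvt.c comment allows:
   *victim is stored unconditionally before the branch, so making the
   conditional increment branchless adds no store the source lacked. */
static void reset_then_maybe_bump(unsigned a, unsigned b, int *victim)
{
    *victim = 0;            /* x is "always set" in this function */
    if (a > b)
        (*victim)++;        /* candidate for a branchless update */
}

/* By contrast, this is the foo0 / trylock shape: when a <= b the
   source never writes *victim, so an unconditional store would be
   speculative and could race with another thread's access. */
static void maybe_bump(unsigned a, unsigned b, int *victim)
{
    if (a > b)
        (*victim)++;
}
```

foo0 from the report has the second shape, which is why a memory-destination adc there amounts to store speculation unless the instruction is known not to write when the carry is zero.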
Setting Component to Generic RTL optimization.
None of GCC, ICC, clang, or MSVC does this. Someone would need to look at the instruction to see whether this transformation is also valid under the C++11 memory model. Does it write to the memory location even when nothing is added (i.e. when the carry is zero)?