#include <stdbool.h>
#include <stdint.h>

extern bool z, c;

uint8_t f(uint8_t dest, uint8_t src) {
    uint8_t res = dest - src;
    z = !res;
    c = src > dest;
    return res;
}

With -O3, LLVM outputs this:

f(unsigned char, unsigned char):
        mov     eax, edi
        sub     al, sil
        sete    byte ptr [rip + z]
        setb    byte ptr [rip + c]
        ret

GCC outputs this:

f(unsigned char, unsigned char):
        mov     eax, edi
        sub     al, sil
        sete    BYTE PTR z[rip]
        cmp     dil, sil
        setb    BYTE PTR c[rip]
        ret

It seems desirable to eliminate the `cmp`: the `sub` already sets CF exactly when src > dest (an unsigned borrow), which is the flag `setb` reads. Unless there's some flag-stall issue I'm not aware of, the `cmp` is redundant.
Confirmed. PR 3507 is part of it (maybe all of it), as shown by this reduced testcase with the `z` store commented out:

#include <stdbool.h>
#include <stdint.h>

extern bool z, c;

uint8_t f(uint8_t dest, uint8_t src) {
    uint8_t res = dest - src;
    //z = !res;
    c = src > dest;
    return res;
}
Here is a testcase which shows the issue on other targets too, such as aarch64:

#include <stdbool.h>
#include <stdint.h>

extern bool z, c;

uint32_t f(uint32_t dest, uint32_t src) {
    uint32_t res = dest - src;
    z = !res;
    c = src > dest;
    return res;
}
LLVM has a separate module to merge sub and cmp; GCC could do something similar. An alternative would be to canonicalize cmp patterns to be the same as the sub's, with an unused destination (the result of the sub). Would CSE/PRE then be able to do the elimination?