Take:
bool g(bool a, bool b)
{
  return ~a & b;
}
---- CUT ---
Currently we produce:
        and     w1, w1, 255
        and     w0, w0, 255
        bic     w0, w1, w0
        and     w0, w0, 1
---- CUT ---
But we should produce:
        bic     w0, w1, w0
        and     w0, w0, 1

The zero extends are not needed.
This happens because combine does the correct thing until it tries to figure out the cutting point:
Trying 2, 8 -> 16:
    2: r98:SI=zero_extend(x0:QI)
      REG_DEAD x0:QI
    8: r102:SI=~r98:SI&r99:SI
      REG_DEAD r98:SI
      REG_DEAD r99:SI
   16: x0:SI=r102:SI&0x1
      REG_DEAD r102:SI
Failed to match this instruction:
(set (reg:SI 0 x0)
    (and:SI (and:SI (not:SI (reg:SI 0 x0 [ a ]))
            (reg/v:SI 99 [ b ]))
        (const_int 1 [0x1])))
Successfully matched this instruction:
(set (reg:SI 102)
    (not:SI (reg:SI 0 x0 [ a ])))
Failed to match this instruction:
(set (reg:SI 0 x0)
    (and:SI (and:SI (reg:SI 102)
            (reg/v:SI 99 [ b ]))
        (const_int 1 [0x1])))

If we had chosen (and:SI (not:SI (reg:SI 0 x0 [ a ])) (reg/v:SI 99 [ b ])) instead, we would have gotten the correct thing.
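The claim that the zero extends are not needed can be checked by brute force, since the final "& 1" only depends on bit 0 of each input. What follows is only a sketch in C that models the two AArch64 sequences on 32-bit values; it is not GCC code. The function names and the sample upper-bit patterns are invented for illustration, and w0/w1 are assumed to hold a and b as in the assembly above.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the sequence currently produced (w0 = a, w1 = b). */
uint32_t with_zero_extends(uint32_t w0, uint32_t w1)
{
    w1 &= 255;          /* and w1, w1, 255 */
    w0 &= 255;          /* and w0, w0, 255 */
    w0 = w1 & ~w0;      /* bic w0, w1, w0  */
    return w0 & 1;      /* and w0, w0, 1   */
}

/* Model of the sequence we would like to get instead. */
uint32_t without_zero_extends(uint32_t w0, uint32_t w1)
{
    w0 = w1 & ~w0;      /* bic w0, w1, w0  */
    return w0 & 1;      /* and w0, w0, 1   */
}

/* Returns 1 if both models agree for every low byte of a and b,
   combined with a few arbitrary (hypothetical) values for bits 8..31. */
int sequences_agree(void)
{
    static const uint32_t upper[] = {
        0x00000000u, 0xffffff00u, 0xdeadbe00u, 0x00000100u
    };
    const unsigned n = sizeof upper / sizeof upper[0];

    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            for (uint32_t a = 0; a < 256; a++)
                for (uint32_t b = 0; b < 256; b++)
                    if (with_zero_extends(upper[i] | a, upper[j] | b)
                        != without_zero_extends(upper[i] | a, upper[j] | b))
                        return 0;
    return 1;
}

/* Convenience wrapper: aborts if the two models ever disagree. */
void check_sequences(void)
{
    assert(sequences_agree());
}
```

Calling check_sequences() from a test driver is enough to exercise it. The check says nothing about how combine should find the better split; it only confirms that dropping the two "and ..., 255" instructions does not change the result.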
It happens to work on x86-64 (with -march=skylake-avx512) because we get a zero_extend instead of an and there. I still don't understand how x86 is able to figure out the &1 part.
Trying 11, 9 -> 12:
   11: r94:SI=zero_extend(r97:SI#0)
      REG_DEAD r97:SI
    9: r92:SI=zero_extend(r96:SI#0)
      REG_DEAD r96:SI
   12: {r95:SI=~r92:SI&r94:SI;clobber flags:CC;}
      REG_DEAD r92:SI
      REG_UNUSED flags:CC
      REG_DEAD r94:SI
Failed to match this instruction:
(parallel [
        (set (reg:SI 95)
            (zero_extend:SI (and:QI (not:QI (subreg:QI (reg:SI 96) 0))
                    (subreg:QI (reg:SI 97) 0))))
        (clobber (reg:CC 17 flags))
    ])
Failed to match this instruction:
(set (reg:SI 95)
    (zero_extend:SI (and:QI (not:QI (subreg:QI (reg:SI 96) 0))
            (subreg:QI (reg:SI 97) 0))))
Successfully matched this instruction:
(set (reg:QI 94 [ b ])
    (and:QI (not:QI (subreg:QI (reg:SI 96) 0))
        (subreg:QI (reg:SI 97) 0)))
Successfully matched this instruction:
(set (reg:SI 95)
    (zero_extend:SI (reg:QI 94 [ b ])))
I think this will be fixed/improved by https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602089.html .
Even a simple:
unsigned char g(unsigned char a, unsigned char b)
{
  return ((~a) & b)&1;
}

produces the extra zero extend.
But it is OK with:
unsigned char g(unsigned char *a, unsigned char *b)
{
  return ((~*a) & *b)&1;
}

It looks like it is hard register related too ...
*** Bug 109832 has been marked as a duplicate of this bug. ***