In the attached test case, the aarch64 target inserts a uxtb instruction that is not needed:

        ldrb    w1, [x0]
        orr     w1, w1, 32
        uxtb    w1, w1
        cmp     w1, 116

The arm backend handles this just fine:

        ldrb    r3, [r0]        @ zero_extendqisi2
        orr     r3, r3, #32
        cmp     r3, #116

The same also happens with 33 or 34 instead of 32, for whatever reason.
Created attachment 39675 [details] C reproducer
Created attachment 39676 [details] Aarch64 output at -O2
Created attachment 39677 [details] arm (thumb2) output at -O2
Note this testcase needs to be improved, as I have a patch which converts a switch with just one case and a default into a new if statement.
Created attachment 39678 [details]
ppc64le

Hmm, AFAICT the same seems to happen on powerpc64le:

        lbz 9,0(3)          # load byte, zero-extending
        ori 9,9,0x20        # ors in 32
        rlwinm 9,9,0,0xff   # zero extend value AFAICT
        cmpwi 7,9,116

So far tested:
good: mipsel, x86_64, armhf (thumb2)
bad:  aarch64, powerpc64le
(In reply to Andrew Pinski from comment #4)
> Note this testcase needs to be improved as I have a patch which converts a
> switch with just one case and a default into a new if statement.

FWIW, the issue is exactly the same with if statements, as in:

int TrieCase3(const char *string)
{
    if ((string[0] | 32) == 't') {
        if ((string[1] | 32) == 'a') {
            if ((string[2] | 32) == 'g') {
                return 42;
            }
        }
    }
    return -1;
}
Something similar happens on x86_64 with -Os. The OR instruction operates on %eax instead of %al:

0000000000000000 <TrieCase3>:
   0:   8a 07           mov    (%rdi),%al
   2:   83 c8 20        or     $0x20,%eax
   5:   3c 74           cmp    $0x74,%al

(A byte-wise OR would be 0x0c 0x20, one byte shorter.)
This looks like a missing removal of casts.

Note that in C, char_var|32 is really the same as ((int)char_var)|32.
(In reply to Andrew Pinski from comment #8)
> This looks like missing removal of casts.
>
> Note in C, char_var|32 is really the same as ((int)char_var)|32

Well, the loads of the byte are already zero-extending loads. The current code is like saying (int)((int)char_var | 32). :)
This is how we expand it on aarch64:

(insn 10 9 11 (set (reg:QI 81)
        (mem:QI (reg/v/f:DI 80 [ string ]) [0 *string_9(D)+0 S1 A8])) t.c:4 -1
     (nil))
(insn 11 10 12 (set (reg:SI 82)
        (ior:SI (subreg:SI (reg:QI 81) 0)
            (const_int 32 [0x20]))) t.c:4 -1
     (nil))
(insn 12 11 13 (set (reg:SI 83)
        (zero_extend:SI (subreg:QI (reg:SI 82) 0))) t.c:4 -1
     (nil))
(insn 13 12 14 (set (reg:CC 66 cc)
        (compare:CC (reg:SI 83)
            (const_int 116 [0x74]))) t.c:4 -1
     (nil))

-----

And this is what combine produces:

(set (reg:SI 83)
    (ior:SI (and:SI (subreg:SI (mem:QI (reg/v/f:DI 80 [ string ]) [0 *string_9(D)+0 S1 A8]) 0)
            (const_int 223 [0xdf]))
        (const_int 32 [0x20])))

Notice how the and mask there is 223, but really that could still be a zero_extend. Basically combine is going funny.

---- CUT ----

Note that for 33, orr does not accept 33 as an immediate, so combine does not see 33 and does not change the and part around the subreg.
> Basically combine is going funny.

Hey, don't blame combine, it's all simplify-rtx's doing. If in simplify_binary_operation_1 you disable the blocks after

  /* Minimize the number of bits set in C1, i.e. C1 := C1 & ~C2.  */

and after

  /* If we have (ior (and (X C1) C2)), simplify this by making C1 as
     small as possible if C1 actually changes.  */

all works as you want (this is on rs6000, pretty much the same thing):

===
Trying 10, 11 -> 12:
Failed to match this instruction:
(set (reg:SI 131)
    (ior:SI (zero_extend:SI (mem:QI (reg/v/f:DI 128 [ string ]) [0 *string_9(D)+0 S1 A8]))
        (const_int 96 [0x60])))
Successfully matched this instruction:
(set (reg:SI 130)
    (zero_extend:SI (mem:QI (reg/v/f:DI 128 [ string ]) [0 *string_9(D)+0 S1 A8])))
Successfully matched this instruction:
(set (reg:SI 131)
    (ior:SI (reg:SI 130)
        (const_int 96 [0x60])))
allowing combination of insns 10, 11 and 12
original costs 8 + 4 + 4 = 16
replacement costs 8 + 4 = 12
deferring deletion of insn with uid = 10.
modifying insn i2    11: r130:SI=zero_extend([r128:DI])
deferring rescan insn with uid = 11.
modifying insn i3    12: r131:SI=r130:SI|0x60
      REG_DEAD r130:SI
deferring rescan insn with uid = 12.
===

giving

        lbz 9,0(3)
        ori 9,9,0x60
        cmpwi 7,9,116
        bne 7,.L5

etc.
I have a patch.
Author: segher
Date: Tue Oct  3 16:02:38 2017
New Revision: 253384

URL: https://gcc.gnu.org/viewcvs?rev=253384&root=gcc&view=rev
Log:
simplify-rtx: Remove non-simplifying simplification (PR77729)

If we have (X&C1)|C2, simplify_binary_operation_1 makes C1 as small as
possible.  This makes worse code in common cases, like when the AND with
C1 is from a zero-extension.  This patch fixes it by removing this
transformation (twice).

	PR rtl-optimization/77729
	* simplify-rtx.c (simplify_binary_operation_1): Delete the
	(X&C1)|C2 to (X&(C1&~C2))|C2 transformations.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/simplify-rtx.c
Fixed on trunk; no backports planned.