If conversion is causing extraordinary bad code for AVR. This occurs on 4.3 4.4 is no better. Testcase: int z; int foo2(void) { return ( ++z >= 0); } Conversion replaces if using store flag. However, there appears to be no account of relative costs and rather bad interactions with mode narrowing and widening After conversion we have: z++ Extract upper byte of 'z' Sign extend One complement Shift right 15 RTL prior to ce1 pass: ;; Pred edge ENTRY [100.0%] (fallthru) (note 3 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (note 2 3 5 2 NOTE_INSN_FUNCTION_BEG) (insn 5 2 6 2 test1.c:15 (set (reg:HI 43 [ z ]) (mem/c/i:HI (symbol_ref:HI ("z") <var_decl 0x7fe000b0 z>) [3 z+0 S2 A8])) 10 {*movhi} (nil)) (insn 6 5 7 2 test1.c:15 (set (reg:HI 41 [ z.5 ]) (plus:HI (reg:HI 43 [ z ]) (const_int 1 [0x1]))) 25 {addhi3} (expr_list:REG_DEAD (reg:HI 43 [ z ]) (nil))) (insn 7 6 8 2 test1.c:15 (set (mem/c/i:HI (symbol_ref:HI ("z") <var_decl 0x7fe000b0 z>) [3 z+0 S2 A8]) (reg:HI 41 [ z.5 ])) 10 {*movhi} (nil)) (insn 8 7 9 2 test1.c:15 (set (reg:HI 44) (const_int 0 [0x0])) 10 {*movhi} (nil)) (insn 9 8 10 2 test1.c:15 (set (cc0) (reg:HI 41 [ z.5 ])) 95 {tsthi} (expr_list:REG_DEAD (reg:HI 41 [ z.5 ]) (nil))) (jump_insn 10 9 25 2 test1.c:15 (set (pc) (if_then_else (lt (cc0) (const_int 0 [0x0])) (label_ref 12) (pc))) 109 {branch} (expr_list:REG_BR_PROB (const_int 2100 [0x834]) (nil))) ;; End of basic block 2 -> ( 4 3) ;; lr out 28 [r28] 32 [__SP_L__] 34 [argL] 44 ;; live out 28 [r28] 32 [__SP_L__] 34 [argL] 44 RTL after ce1: (note 2 3 5 2 NOTE_INSN_FUNCTION_BEG) (insn 5 2 6 2 test1.c:15 (set (reg:HI 43 [ z ]) (mem/c/i:HI (symbol_ref:HI ("z") <var_decl 0x7fe000b0 z>) [3 z+0 S2 A8])) 10 {*movhi} (nil)) (insn 6 5 7 2 test1.c:15 (set (reg:HI 41 [ z.5 ]) (plus:HI (reg:HI 43 [ z ]) (const_int 1 [0x1]))) 25 {addhi3} (expr_list:REG_DEAD (reg:HI 43 [ z ]) (nil))) (insn 7 6 8 2 test1.c:15 (set (mem/c/i:HI (symbol_ref:HI ("z") <var_decl 0x7fe000b0 z>) [3 z+0 S2 A8]) (reg:HI 41 [ z.5 ])) 10 {*movhi} (nil)) (insn 8 7 9 2 test1.c:15 (set (reg:HI 44) (const_int 0 [0x0])) 10 {*movhi} (nil)) (insn 9 8 35 2 test1.c:15 (set (cc0) (reg:HI 41 [ z.5 ])) 95 {tsthi} (expr_list:REG_DEAD (reg:HI 41 [ z.5 ]) (nil))) (insn 35 9 36 2 test1.c:15 (set (reg:QI 49) (subreg:QI (reg:HI 41 [ z.5 ]) 1)) 4 {*movqi} (nil)) (insn 36 35 37 2 test1.c:15 (set (reg:HI 48) (sign_extend:HI (reg:QI 49))) 84 {extendqihi2} (nil)) (insn 37 36 38 2 test1.c:15 (set (reg:HI 50) (not:HI (reg:HI 48))) 82 {one_cmplhi2} (nil)) (insn 38 37 32 2 test1.c:15 (set (reg:HI 44) (lshiftrt:HI (reg:HI 50) (const_int 15 [0xf]))) 71 {lshrhi3} (nil)) (insn 32 38 33 2 test1.c:16 (set (reg:QI 24 r24) (subreg:QI (reg:HI 44) 0)) 4 {*movqi} (nil)) (insn 33 32 23 2 test1.c:16 (set (reg:QI 25 r25 [+1 ]) (subreg:QI (reg:HI 44) 1)) 4 {*movqi} (expr_list:REG_DEAD (reg:HI 44) (nil))) Final (annotated) code: 22 /* prologue: frame size=0 */ 23 /* prologue end (size=0) */ 24 .LM2: 25 0000 8091 0000 lds r24,z 26 0004 9091 0000 lds r25,(z)+1 27 0008 0196 adiw r24,1 28 000a 9093 0000 sts (z)+1,r25 29 000e 8093 0000 sts z,r24 30 0012 892F mov r24,r25 31 0014 9927 clr r25 ;SIGN EXTEND 32 0016 87FD sbrc r24,7 ;ditto 33 0018 9095 com r25 ; ditto 34 001a 8095 com r24 ; ONE'SCOMPLEMENT 35 001c 9095 com r25 ; ditto 36 .LM3: 37 001e 8827 clr r24 ;LSHIFT 15 38 0020 990F lsl r25 ; ditto 39 0022 881F rol r24 ; ditto 40 0024 9927 clr r25 ; ditto 41 /* epilogue: frame size=0 */ 42 0026 0895 ret
Are previous versions any better? Is this a regression?
4.2.2 has same issue. Can someone benchmark -fno-if-conversion for AVR? Transforms are made independent of target. BRANCH_COST is already 0 for avr. Here is a related example. This time result is a power of two so it needs a more complicated shift. int z; int foo2(void) { if ( ++z >= 0) return 16; else return 0; } produces: 24 .LM2: 25 /*DEBUG: 0x0 0 12 */ 26 0000 8091 0000 lds r24,z 27 0004 9091 0000 lds r25,(z)+1 28 /*DEBUG: 0x4 4 8 */ 29 0008 0196 adiw r24,1 30 /*DEBUG: 0x5 1 12 */ 31 000a 9093 0000 sts (z)+1,r25 32 000e 8093 0000 sts z,r24 33 /*DEBUG: 0x9 4 16 */ 34 0012 292F mov r18,r25 35 0014 220F lsl r18 36 0016 330B sbc r19,r19 37 0018 2795 ror r18 38 /*DEBUG: 0xd 4 12 */ 39 001a 2095 com r18 40 001c 3095 com r19 41 /*DEBUG: 0xf 2 28 */ 42 001e 2227 clr r18 43 0020 330F lsl r19 44 0022 221F rol r18 45 0024 3327 clr r19 46 /*DEBUG: 0x13 4 44 */ 47 0026 2295 swap r18 48 0028 3295 swap r19 49 002a 307F andi r19,0xf0 50 002c 3227 eor r19,r18 51 002e 207F andi r18,0xf0 52 0030 3227 eor r19,r18 53 .LM3: 54 /*DEBUG: 0x19 6 4 */ 55 0032 C901 movw r24,r18 56 /* epilogue start */ 57 /*DEBUG: 0x1a 1 4 */ 58 0034 0895 ret If I played around with mode of operands I'm sure I could do worse.
I see same output like Andy in 4.6.1 and trunk. Also played around with avr_rtx_costs in trunk, but that does not help; only fix is -fno-if-conversion.
Created attachment 24691 [details] C Testcase pr36884.c This testcase shows the dis-optimization in 4.7. trunk ====================================================== Compiled with -O2 -S: swap: mov r18,r25 rol r18 clr r18 rol r18 ldi r19,lo8(0) clr __tmp_reg__ lsr r19 ror r18 ror __tmp_reg__ lsr r19 ror r18 ror __tmp_reg__ mov r19,r18 mov r18,__tmp_reg__ sbrc r25,6 ori r19,hi8(8192) .L3: sbrc r25,5 ori r19,hi8(16384) .L4: sbrc r25,4 ori r19,hi8(-32768) .L5: mov r24,r18 mov r25,r19 ret ====================================================== Compiled with -O2 -S -fno-if-conversion: swap: sbrc r25,7 rjmp .L6 ldi r18,lo8(0) ldi r19,hi8(0) .L2: sbrc r25,6 ori r19,hi8(8192) .L3: sbrc r25,5 ori r19,hi8(16384) .L4: sbrc r25,4 ori r19,hi8(-32768) .L5: mov r24,r18 mov r25,r19 ret .L6: ldi r18,lo8(64) ldi r19,hi8(64) rjmp .L2 ====================================================== The compiler does a lengthy extract and reuse of the MSB.
I had a look in ifcvt.c. Obviously, noce_try_sign_mask does it's work unconditionally and does not take into account costs. It appears that it assumes that cheap barrel shifter is available as on most 32-bit machines.
GCC 4.7.0 is being released, adjusting target milestone.
(In reply to comment #5) > Obviously, noce_try_sign_mask does it's work unconditionally and does > not take into account costs. It appears that it assumes that cheap > barrel shifter is available as on most 32-bit machines. This can probably be fixed using lshift_cheap_p. You should move lshift_cheap_p to optabs.c and use it in ifcvt.c to see if lshifts are cheap.
I think the biggest issue now is: ``` ;; _6 = _2 >= 0; (insn 10 7 11 (set (reg:HI 48) (not:HI (reg:HI 44 [ _2 ]))) "/app/example.cpp":6:3 -1 (nil)) (insn 11 10 12 (set (reg:HI 49) (lshiftrt:HI (reg:HI 48) (const_int 15 [0xf]))) "/app/example.cpp":6:3 -1 (nil)) (insn 12 11 0 (set (reg:QI 45 [ _6 ]) (subreg:QI (reg:HI 49) 0)) "/app/example.cpp":6:3 -1 (nil)) ``` If this was expanded instead using the bit select. I had noticed a similar thing in PR 109907 too.