36884 – ifcvt poor optimization

Bug 36884 - ifcvt poor optimization

Summary: ifcvt poor optimization

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization (show other bugs)
Version:	4.3.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2008-07-20 19:44 UTC by Andy Hutchinson
Modified:	2023-05-26 01:18 UTC (History)
CC List:	4 users (show)

See Also:	109907
Host:
Target:	avr-unknown-none
Build:
Known to work:
Known to fail:	4.6.1, 4.7.0
Last reconfirmed:	2011-07-05 19:13:14

Attachments
C Testcase pr36884.c (111 bytes, text/plain) 2011-07-05 13:20 UTC, Georg-Johann Lay	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Andy Hutchinson 2008-07-20 19:44:48 UTC

If conversion is causing extraordinary bad code for AVR.
This occurs on 4.3 4.4 is no better.

Testcase:

int z;

int foo2(void)
{
  return ( ++z >= 0);
}

Conversion replaces if using store flag. However, there appears to be no account of relative costs and rather bad interactions with mode narrowing and widening

After conversion we have:

z++
Extract upper byte of 'z'
Sign extend
One complement
Shift right 15


RTL prior to ce1 pass:

;; Pred edge  ENTRY [100.0%]  (fallthru)
(note 3 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(note 2 3 5 2 NOTE_INSN_FUNCTION_BEG)

(insn 5 2 6 2 test1.c:15 (set (reg:HI 43 [ z ])
        (mem/c/i:HI (symbol_ref:HI ("z") <var_decl 0x7fe000b0 z>) [3 z+0 S2 A8])) 10 {*movhi} (nil))

(insn 6 5 7 2 test1.c:15 (set (reg:HI 41 [ z.5 ])
        (plus:HI (reg:HI 43 [ z ])
            (const_int 1 [0x1]))) 25 {addhi3} (expr_list:REG_DEAD (reg:HI 43 [ z ])
        (nil)))

(insn 7 6 8 2 test1.c:15 (set (mem/c/i:HI (symbol_ref:HI ("z") <var_decl 0x7fe000b0 z>) [3 z+0 S2 A8])
        (reg:HI 41 [ z.5 ])) 10 {*movhi} (nil))

(insn 8 7 9 2 test1.c:15 (set (reg:HI 44)
        (const_int 0 [0x0])) 10 {*movhi} (nil))

(insn 9 8 10 2 test1.c:15 (set (cc0)
        (reg:HI 41 [ z.5 ])) 95 {tsthi} (expr_list:REG_DEAD (reg:HI 41 [ z.5 ])
        (nil)))

(jump_insn 10 9 25 2 test1.c:15 (set (pc)
        (if_then_else (lt (cc0)
                (const_int 0 [0x0]))
            (label_ref 12)
            (pc))) 109 {branch} (expr_list:REG_BR_PROB (const_int 2100 [0x834])
        (nil)))
;; End of basic block 2 -> ( 4 3)
;; lr  out 	 28 [r28] 32 [__SP_L__] 34 [argL] 44
;; live  out 	 28 [r28] 32 [__SP_L__] 34 [argL] 44





RTL after ce1:

(note 2 3 5 2 NOTE_INSN_FUNCTION_BEG)

(insn 5 2 6 2 test1.c:15 (set (reg:HI 43 [ z ])
        (mem/c/i:HI (symbol_ref:HI ("z") <var_decl 0x7fe000b0 z>) [3 z+0 S2 A8])) 10 {*movhi} (nil))

(insn 6 5 7 2 test1.c:15 (set (reg:HI 41 [ z.5 ])
        (plus:HI (reg:HI 43 [ z ])
            (const_int 1 [0x1]))) 25 {addhi3} (expr_list:REG_DEAD (reg:HI 43 [ z ])
        (nil)))

(insn 7 6 8 2 test1.c:15 (set (mem/c/i:HI (symbol_ref:HI ("z") <var_decl 0x7fe000b0 z>) [3 z+0 S2 A8])
        (reg:HI 41 [ z.5 ])) 10 {*movhi} (nil))

(insn 8 7 9 2 test1.c:15 (set (reg:HI 44)
        (const_int 0 [0x0])) 10 {*movhi} (nil))

(insn 9 8 35 2 test1.c:15 (set (cc0)
        (reg:HI 41 [ z.5 ])) 95 {tsthi} (expr_list:REG_DEAD (reg:HI 41 [ z.5 ])
        (nil)))

(insn 35 9 36 2 test1.c:15 (set (reg:QI 49)
        (subreg:QI (reg:HI 41 [ z.5 ]) 1)) 4 {*movqi} (nil))

(insn 36 35 37 2 test1.c:15 (set (reg:HI 48)
        (sign_extend:HI (reg:QI 49))) 84 {extendqihi2} (nil))

(insn 37 36 38 2 test1.c:15 (set (reg:HI 50)
        (not:HI (reg:HI 48))) 82 {one_cmplhi2} (nil))

(insn 38 37 32 2 test1.c:15 (set (reg:HI 44)
        (lshiftrt:HI (reg:HI 50)
            (const_int 15 [0xf]))) 71 {lshrhi3} (nil))

(insn 32 38 33 2 test1.c:16 (set (reg:QI 24 r24)
        (subreg:QI (reg:HI 44) 0)) 4 {*movqi} (nil))

(insn 33 32 23 2 test1.c:16 (set (reg:QI 25 r25 [+1 ])
        (subreg:QI (reg:HI 44) 1)) 4 {*movqi} (expr_list:REG_DEAD (reg:HI 44)
        (nil)))



Final (annotated) code:


  22               	/* prologue: frame size=0 */
  23               	/* prologue end (size=0) */
  24               	.LM2:
  25 0000 8091 0000 		lds r24,z
  26 0004 9091 0000 		lds r25,(z)+1
  27 0008 0196      		adiw r24,1
  28 000a 9093 0000 		sts (z)+1,r25
  29 000e 8093 0000 		sts z,r24
  30 0012 892F      		mov r24,r25
  31 0014 9927      		clr r25  ;SIGN EXTEND
  32 0016 87FD      		sbrc r24,7 ;ditto
  33 0018 9095      		com r25 ; ditto
  34 001a 8095      		com r24 ; ONE'SCOMPLEMENT
  35 001c 9095      		com r25 ; ditto
  36               	.LM3:
  37 001e 8827      		clr r24 ;LSHIFT 15
  38 0020 990F      		lsl r25 ; ditto
  39 0022 881F      		rol r24 ; ditto
  40 0024 9927      		clr r25 ; ditto
  41               	/* epilogue: frame size=0 */
  42 0026 0895      		ret

Comment 1 Eric Weddington 2008-07-20 20:01:44 UTC

Are previous versions any better? Is this a regression?

Comment 2 Andy Hutchinson 2008-07-22 02:37:40 UTC

4.2.2 has same issue.

Can someone benchmark -fno-if-conversion for AVR?

Transforms are made independent of target. BRANCH_COST is already 0 for avr.

Here is a related example. This time result  is a power of two so it needs a more complicated shift.

int z;

int foo2(void)
{
  if ( ++z >= 0)
  return 16;
  else
  return 0;
  
}

produces:
  24               	.LM2:
  25               	/*DEBUG: 0x0		0	12 */
  26 0000 8091 0000 		lds r24,z
  27 0004 9091 0000 		lds r25,(z)+1
  28               	/*DEBUG: 0x4		4	8 */
  29 0008 0196      		adiw r24,1
  30               	/*DEBUG: 0x5		1	12 */
  31 000a 9093 0000 		sts (z)+1,r25
  32 000e 8093 0000 		sts z,r24
  33               	/*DEBUG: 0x9		4	16 */
  34 0012 292F      		mov r18,r25
  35 0014 220F      		lsl r18
  36 0016 330B      		sbc r19,r19
  37 0018 2795      		ror r18
  38               	/*DEBUG: 0xd		4	12 */
  39 001a 2095      		com r18
  40 001c 3095      		com r19
  41               	/*DEBUG: 0xf		2	28 */
  42 001e 2227      		clr r18
  43 0020 330F      		lsl r19
  44 0022 221F      		rol r18
  45 0024 3327      		clr r19
  46               	/*DEBUG: 0x13		4	44 */
  47 0026 2295      		swap r18
  48 0028 3295      		swap r19
  49 002a 307F      		andi r19,0xf0
  50 002c 3227      		eor r19,r18
  51 002e 207F      		andi r18,0xf0
  52 0030 3227      		eor r19,r18
  53               	.LM3:
  54               	/*DEBUG: 0x19		6	4 */
  55 0032 C901      		movw r24,r18
  56               	/* epilogue start */
  57               	/*DEBUG: 0x1a		1	4 */
  58 0034 0895      		ret

If I played around with mode of operands I'm sure I could do worse.

Comment 3 Georg-Johann Lay 2011-06-29 19:13:14 UTC

I see same output like Andy in 4.6.1 and trunk.

Also played around with avr_rtx_costs in trunk, but that does not help; only fix is -fno-if-conversion.

Comment 4 Georg-Johann Lay 2011-07-05 13:20:15 UTC

Created attachment 24691 [details]
C Testcase pr36884.c

This testcase shows the dis-optimization in 4.7. trunk

======================================================
Compiled with -O2 -S:

swap:
	mov r18,r25
	rol r18
	clr r18
	rol r18
	ldi r19,lo8(0)
	clr __tmp_reg__
	lsr r19
	ror r18
	ror __tmp_reg__
	lsr r19
	ror r18
	ror __tmp_reg__
	mov r19,r18
	mov r18,__tmp_reg__
	sbrc r25,6
	ori r19,hi8(8192)
.L3:
	sbrc r25,5
	ori r19,hi8(16384)
.L4:
	sbrc r25,4
	ori r19,hi8(-32768)
.L5:
	mov r24,r18
	mov r25,r19
	ret

======================================================
Compiled with -O2 -S -fno-if-conversion:

swap:
	sbrc r25,7
	rjmp .L6
	ldi r18,lo8(0)
	ldi r19,hi8(0)
.L2:
	sbrc r25,6
	ori r19,hi8(8192)
.L3:
	sbrc r25,5
	ori r19,hi8(16384)
.L4:
	sbrc r25,4
	ori r19,hi8(-32768)
.L5:
	mov r24,r18
	mov r25,r19
	ret
.L6:
	ldi r18,lo8(64)
	ldi r19,hi8(64)
	rjmp .L2

======================================================

The compiler does a lengthy extract and reuse of the MSB.

Comment 5 Georg-Johann Lay 2011-07-05 13:22:53 UTC

I had a look in ifcvt.c.

Obviously, noce_try_sign_mask does it's work unconditionally and does not take into account costs.  It appears that it assumes that cheap barrel shifter is available as on most 32-bit machines.

Comment 6 Richard Biener 2012-03-22 08:26:19 UTC

GCC 4.7.0 is being released, adjusting target milestone.

Comment 7 Steven Bosscher 2012-10-15 07:47:27 UTC

(In reply to comment #5)
> Obviously, noce_try_sign_mask does it's work unconditionally and does
> not take into account costs.  It appears that it assumes that cheap 
> barrel shifter is available as on most 32-bit machines.

This can probably be fixed using lshift_cheap_p.  You should move 
lshift_cheap_p to optabs.c and use it in ifcvt.c to see if lshifts
are cheap.

Comment 8 Andrew Pinski 2023-05-26 01:18:38 UTC

I think the biggest issue now is:
```
;; _6 = _2 >= 0;

(insn 10 7 11 (set (reg:HI 48)
        (not:HI (reg:HI 44 [ _2 ]))) "/app/example.cpp":6:3 -1
     (nil))

(insn 11 10 12 (set (reg:HI 49)
        (lshiftrt:HI (reg:HI 48)
            (const_int 15 [0xf]))) "/app/example.cpp":6:3 -1
     (nil))

(insn 12 11 0 (set (reg:QI 45 [ _6 ])
        (subreg:QI (reg:HI 49) 0)) "/app/example.cpp":6:3 -1
     (nil))

```

If this was expanded instead using the bit select. 
I had noticed a similar thing in PR 109907 too.