[Bug tree-optimization/81611] [8 Regression] gcc un-learned loop / post-increment optimization
gjl at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Thu Jan 25 11:39:00 GMT 2018
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81611
--- Comment #19 from Georg-Johann Lay <gjl at gcc dot gnu.org> ---
Hi, thanks for all the work and effort.
I tried that patch for the following small test:
extern void foo (void);
extern char volatile vv;

void func2 (const int *p)
{
    while (1)
    {
        int var = *p++;
        if (var == 10)
            return foo();
        if (var == 0)
            break;
    }
}

void func3 (const int *p, const __flash char *f)
{
    while (1)
    {
        int var = *p++;
        if (var == 10)
            return foo();
        vv = *f++;
        if (!vv)
            break;
    }
}

$ avr-gcc -Os -mmcu=avr5 inc.c -S -dp
Unfortunately, the code is still quite sub-optimal, in particular due to
reg-reg moves all over the place, apart from missing post-inc opportunities.
For example, func3 compiles as follows:
func3:
.L7:
    movw r20,r24            ; 37 [c=4 l=1] *movhi/0
    subi r20,-2             ; 9 [c=4 l=2] addhi3_clobber/1
    sbci r21,-1
    movw r30,r24            ; 38 [c=4 l=1] *movhi/0
    ld r24,Z                ; 10 [c=8 l=2] *movhi/2
    ldd r25,Z+1
    sbiw r24,10             ; 11 [c=12 l=1] cmphi3/5
    brne .L6                ; 12 [c=16 l=1] branch
    jmp foo                 ; 14 [c=0 l=2] call_insn/3
.L8:
    movw r22,r26            ; 5 [c=4 l=1] *movhi/0
    rjmp .L7                ; 46 [c=4 l=1] jump
.L6:
    movw r26,r22            ; 39 [c=4 l=1] *movhi/0
    adiw r26,1              ; 18 [c=4 l=1] addhi3_clobber/0
    movw r30,r22            ; 40 [c=4 l=1] *movhi/0
    lpm r24,Z               ; 19 [c=4 l=1] movqi_insn/3
    sts vv,r24              ; 20 [c=4 l=2] movqi_insn/2
    lds r18,vv              ; 21 [c=4 l=2] movqi_insn/3
    movw r24,r20            ; 22 [c=4 l=1] *movhi/0
    cpse r18,__zero_reg__   ; 24 [c=0 l=1] enable_interrupt-3
    rjmp .L8
    ret                     ; 43 [c=0 l=1] return

In particular, moving values back and forth and bad register selection are
common and well-known annoyances (insns 37, 38, 5, 39, 40, 22).
Just to give an impression, optimal code would read something like:
func3:
    ;; Use Z=r30/31 for F.  LPM can only use indirect and
    ;; post-inc with Z.
    movw r30, r22
    ;; Use X=r26/27 for P.  X register can only use indirect and
    ;; post-inc addressing, which is fine for that purpose.
    movw r26, r24
.L7:
    ;; var = *p++
    ld r24,X+
    ld r25,X+
    ;; var == 10 ?
    sbiw r24,10
    brne .L6
    jmp foo
.L6:
    ;; vv = *f++
    lpm r24,Z+
    sts vv,r24
    ;; if (!vv) break
    lds r24,vv
    cpse r24,__zero_reg__
    rjmp .L7
    ret

It uses 13 instructions instead of the 21 in gcc's output above, operates
faster (though the focus is usually on code size) and has a register
footprint of 6 whereas gcc needs 12.
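
Any hand-written or improved version of func3 still has to preserve the loop's semantics, and those can be checked on a host compiler. The following is only a minimal sketch under stated assumptions: `__flash` is dropped because the named address space exists only for AVR targets, `foo` is replaced by a call-counting stub, and `func3_host` and `check_func3_host` are hypothetical names that are not part of the test case. Unlike the original, the function returns the number of ints read from `p` so that the pointer walk becomes observable.

```c
#include <assert.h>

/* Host-side stand-ins (assumptions, not part of the test case):
   foo() only counts its calls; vv is defined here instead of extern.  */
static int foo_calls;
static void foo (void) { foo_calls++; }
static char volatile vv;

/* Same control flow as func3, minus __flash.  Returns how many ints
   were read from p (hypothetical addition for checking).  */
static int func3_host (const int *p, const char *f)
{
    const int *start = p;

    while (1)
    {
        int var = *p++;          /* var = *p++ */
        if (var == 10)
        {
            foo ();              /* original: return foo(); */
            break;
        }
        vv = *f++;               /* vv = *f++ */
        if (!vv)                 /* volatile re-read, like lds after sts */
            break;
    }
    return (int) (p - start);
}

/* Two small cases: leaving via foo(), and leaving via vv == 0.  */
void check_func3_host (void)
{
    const int  a[] = { 1, 2, 10 };
    const char s[] = { 'a', 'b', 0 };

    foo_calls = 0;
    assert (func3_host (a, s) == 3);   /* third int is 10 */
    assert (foo_calls == 1);

    const int  b[] = { 1, 2, 3 };
    const char t[] = { 'x', 0 };

    foo_calls = 0;
    assert (func3_host (b, t) == 2);   /* second byte is 0 */
    assert (foo_calls == 0);
}
```

Calling `check_func3_host()` from a host `main` exercises both exits of the loop. It says nothing about the code avr-gcc generates; it only pins down the behaviour that both assembly listings above are meant to implement.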