This is GCC Bugzilla
This is GCC Bugzilla Version 2.20+
View Bug Activity | Format For Printing | Clone This Bug
Consider the following: char *x; volatile int y; void foo(char *p) { y += *p; } void main(void) { char *p1 = x; foo(p1++); foo(p1++); foo(p1++); foo(p1++); foo(p1++); foo(p1++); foo(p1++); foo(p1++); foo(p1++); foo(p1++); } For the AVR target this will generate ugly code. Having a double saved variable etc. /* prologue: frame size=0 */ push r14 push r15 push r16 push r17 /* prologue end (size=4) */ lds r24,x lds r25,(x)+1 movw r16,r24 subi r16,lo8(-(1)) sbci r17,hi8(-(1)) call foo movw r14,r16 sec adc r14,__zero_reg__ adc r15,__zero_reg__ movw r24,r16 call foo movw r16,r14 subi r16,lo8(-(1)) sbci r17,hi8(-(1)) movw r24,r14 call foo etc.. The results gets much better when writing it like "foo(p); p++;" /* prologue: frame size=0 */ push r16 push r17 /* prologue end (size=2) */ movw r16,r24 call foo subi r16,lo8(-(1)) sbci r17,hi8(-(1)) movw r24,r16 call foo subi r16,lo8(-(1)) sbci r17,hi8(-(1)) And the results get near optimal when using larger increments then the target can add immediately ( >64). The compiler then adds the cumulative offset. Which would be the most optimal case if also done for lower increments. movw r16,r24 call foo movw r24,r16 subi r24,lo8(-(65)) sbci r25,hi8(-(65)) call foo movw r24,r16 subi r24,lo8(-(130)) sbci r25,hi8(-(130)) This worst behaviour is shown for 4.1.2, 4.2.2, 4.3.0 Better results (still non-optimal) are with 3.4.6 and 3.3.6. But 4.0.4 is producing the most optimal code for the original foo(p++) Ugly code is also being seen for arm/thumb and pdp-11. But good code for arm/arm So it's a multi-target problem, not just the avr!
Created an attachment (id=14920) [edit] Test case showing the three cases Compile using -fno-line. For the AVR I used: avr-gcc -Wall -Os -fno-inline -mmcu=avr5 --save-temps main.c
Confirmed. void foo(char *p); void test1(char * p) { foo(p++); foo(p++); foo(p++); foo(p++); } void test2(char * p) { foo(p); p++; foo(p); p++; foo(p); p++; foo(p); p++; } The problem is with the first variant we have two registers life over each function call, while with the second variant only one. This can be seen from the optimized tree-dump already: test1 (p) { <bb 2>: p_3 = p_1(D) + 1; foo (p_1(D)); p_5 = p_3 + 1; foo (p_3); p_7 = p_5 + 1; foo (p_5); foo (p_7) [tail call]; return; } test2 (p) { <bb 2>: foo (p_1(D)); p_2 = p_1(D) + 1; foo (p_2); p_3 = p_2 + 1; foo (p_3); p_4 = p_3 + 1; foo (p_4) [tail call]; return; } and is initially caused by gimplification which produces p.0 = p; p = p + 1; foo (p.0); from foo (p++ ); no further pass undos this transformation. With GCC 4.0 TER produced foo (p); foo (p + 1B); foo (p + 2B); ... where we can generate good code from. From 4.1 on this is no longer done.
No what happened with 4.0 is rather DOM would prop x+1 for each x. Really this comes down to scheduling of instructions and moving them closer to their usage.
Couldn't this be fixed also by changing the initial gimplification from: p.0 = p; p = p + 1; foo (p.0); to: p.0 = p; foo (p.0); p = p + 1; ?
Subject: Re: Scheduling of post-modified function arguments is not good On Wed, 24 Jun 2009, steven at gcc dot gnu dot org wrote: > ------- Comment #4 from steven at gcc dot gnu dot org 2009-06-24 07:42 ------- > Couldn't this be fixed also by changing the initial gimplification from: > > p.0 = p; > p = p + 1; > foo (p.0); > > to: > > p.0 = p; > foo (p.0); > p = p + 1; Probably yes. Richard.