Bug#: 34737
Status: NEW
Assigned To: Not yet assigned to anyone <unassigned@gcc.gnu.org>
Reporter: Wouter van Gulik <wvangulik@xs4all.nl>

Attachment: main.c (Test case showing the three cases), text/plain, created 2008-01-11 08:17, 168 bytes

Description:   Last confirmed: 2008-01-11 09:42 Opened: 2008-01-11 08:14
Consider the following:

char *x;
volatile int y;

void foo(char *p)
{
    y += *p;
}

void main(void)
{
    char *p1 = x;
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
}

For the AVR target this generates ugly code, with the pointer kept in two
call-saved register pairs (r14/r15 and r16/r17), etc.

/* prologue: frame size=0 */
    push r14
    push r15
    push r16
    push r17
/* prologue end (size=4) */
    lds r24,x
    lds r25,(x)+1
    movw r16,r24
    subi r16,lo8(-(1))
    sbci r17,hi8(-(1))
    call foo
    movw r14,r16
    sec
    adc r14,__zero_reg__
    adc r15,__zero_reg__
    movw r24,r16
    call foo
    movw r16,r14
    subi r16,lo8(-(1))
    sbci r17,hi8(-(1))
    movw r24,r14
    call foo
etc.

The result gets much better when it is written as "foo(p); p++;":

/* prologue: frame size=0 */
        push r16
        push r17
/* prologue end (size=2) */
        movw r16,r24
        call foo
        subi r16,lo8(-(1))
        sbci r17,hi8(-(1))
        movw r24,r16
        call foo
        subi r16,lo8(-(1))
        sbci r17,hi8(-(1))

And the results get near optimal when using increments larger than the target
can add immediately (> 64). The compiler then adds the cumulative offset, which
would be the optimal approach if it were also done for smaller increments.

        movw r16,r24
        call foo
        movw r24,r16
        subi r24,lo8(-(65))
        sbci r25,hi8(-(65))
        call foo
        movw r24,r16
        subi r24,lo8(-(130))
        sbci r25,hi8(-(130))

The worst behaviour is shown by 4.1.2, 4.2.2 and 4.3.0.
Better (but still non-optimal) results come from 3.4.6 and 3.3.6.
But 4.0.4 produces the most optimal code for the original foo(p++).

Ugly code is also seen for arm/thumb and pdp-11,
but good code for arm/arm.

So it's a multi-target problem, not just the avr!
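
For reference, the three source forms compared above can be put side by side in one file. This is only a minimal sketch, not the attached main.c: the recording foo(), the helper names and the buffer are hypothetical, and the third form uses unit offsets from an unchanged base as a stand-in for the cumulative-offset code the compiler emits for large increments. It shows that all three forms pass the same sequence of addresses, so the difference in generated code comes purely from how the increment is ordered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical recorder standing in for the report's foo(): logs each argument. */
static const char *seen[3][4];
static size_t variant, count;

static void foo(const char *p)
{
    seen[variant][count++] = p;
}

/* Form 1: post-increment inside the argument list (the problematic form). */
static void post_increment(const char *p)
{
    variant = 0; count = 0;
    foo(p++);
    foo(p++);
    foo(p++);
    foo(p++);
}

/* Form 2: increment as a separate statement after each call. */
static void increment_after(const char *p)
{
    variant = 1; count = 0;
    foo(p); p++;
    foo(p); p++;
    foo(p); p++;
    foo(p); p++;
}

/* Form 3 (assumption): constant offsets from a base that is never modified. */
static void cumulative_offset(const char *p)
{
    variant = 2; count = 0;
    foo(p);
    foo(p + 1);
    foo(p + 2);
    foo(p + 3);
}

/* Returns 1 when all three forms passed identical argument sequences. */
int variants_agree(void)
{
    static const char buf[8] = "abcdefg";

    post_increment(buf);
    increment_after(buf);
    cumulative_offset(buf);

    for (size_t i = 0; i < 4; i++) {
        if (seen[0][i] != buf + i || seen[1][i] != buf + i || seen[2][i] != buf + i)
            return 0;
    }
    return 1;
}

/* Convenience self-check for use from a test driver. */
void check_variants(void)
{
    assert(variants_agree());
}
```

Compiling each form separately with the options from comment #1 should make the register-pressure difference visible in the assembly output.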

------- Comment #1 From Wouter van Gulik 2008-01-11 08:17 -------
Created an attachment (id=14920)
Test case showing the three cases

Compile using -fno-inline.
For the AVR I used: avr-gcc -Wall -Os -fno-inline -mmcu=avr5 --save-temps main.c

------- Comment #2 From Richard Guenther 2008-01-11 09:42 -------
Confirmed.

void foo(char *p);

void test1(char * p)
{
    foo(p++);
    foo(p++);
    foo(p++);
    foo(p++);
}

void test2(char * p)
{
    foo(p); p++;
    foo(p); p++;
    foo(p); p++;
    foo(p); p++;
}

The problem is that with the first variant we have two registers live across
each function call, while with the second variant only one.  This can already
be seen in the optimized tree dump:

test1 (p)
{
<bb 2>: 
  p_3 = p_1(D) + 1;
  foo (p_1(D));
  p_5 = p_3 + 1;
  foo (p_3);
  p_7 = p_5 + 1;
  foo (p_5);
  foo (p_7) [tail call];
  return;

}

test2 (p)
{
<bb 2>:
  foo (p_1(D));
  p_2 = p_1(D) + 1;
  foo (p_2);
  p_3 = p_2 + 1;
  foo (p_3);
  p_4 = p_3 + 1;
  foo (p_4) [tail call];
  return;

}

and is initially caused by gimplification which produces

  p.0 = p;
  p = p + 1;
  foo (p.0);

from

  foo (p++ );

No further pass undoes this transformation.

With GCC 4.0 TER produced

  foo (p);
  foo (p + 1B);
  foo (p + 2B);
...

from which we can generate good code.  From 4.1 on this is no longer done.

------- Comment #3 From Andrew Pinski 2008-01-11 11:33 -------
No, what happened with 4.0 is rather that DOM would propagate x+1 for each x.

Really this comes down to scheduling of instructions and moving them closer to
their usage.

------- Comment #4 From Steven Bosscher 2009-06-24 07:42 -------
Couldn't this be fixed also by changing the initial gimplification from:

  p.0 = p;
  p = p + 1;
  foo (p.0);

to:

  p.0 = p;
  foo (p.0);
  p = p + 1;

?

------- Comment #5 From rguenther@suse.de 2009-06-24 09:07 -------
Subject: Re: Scheduling of post-modified function arguments is not good

On Wed, 24 Jun 2009, steven at gcc dot gnu dot org wrote:

> ------- Comment #4 from steven at gcc dot gnu dot org  2009-06-24 07:42 -------
> Couldn't this be fixed also by changing the initial gimplification from:
> 
>   p.0 = p;
>   p = p + 1;
>   foo (p.0);
> 
> to:
> 
>   p.0 = p;
>   foo (p.0);
>   p = p + 1;

Probably yes.

Richard.
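
One caveat on the reordering discussed in comments #4 and #5: C places a sequence point after the arguments are evaluated and before the call, so the side effect of p++ must be complete when the callee starts running. Moving the increment after the call is therefore only safe when the callee cannot observe the pointer, as with the local p in the test case. The following is a minimal sketch of that distinction; the global cursor, the buffer and all function names are hypothetical and not taken from the report.

```c
#include <assert.h>

/* Hypothetical globals: here the pointer IS visible to the callee,
 * unlike the local p in the report's test case. */
static char buffer[4];
static char *cursor = buffer;
static char *cursor_seen_in_callee;

static void foo(char *arg)
{
    (void)arg;
    cursor_seen_in_callee = cursor;   /* callee observes the global pointer */
}

/* Ordering that matches foo(cursor++): the increment completes before the call. */
static void call_with_increment_first(void)
{
    char *tmp = cursor;
    cursor = cursor + 1;
    foo(tmp);
}

/* Reordering proposed in comment #4: increment sunk below the call. */
static void call_with_increment_last(void)
{
    char *tmp = cursor;
    foo(tmp);
    cursor = cursor + 1;
}

/* Returns 1 when the callee sees different pointer values under the two orderings. */
int orderings_differ_for_visible_pointer(void)
{
    cursor = buffer;
    call_with_increment_first();
    char *seen_first = cursor_seen_in_callee;   /* already incremented */

    cursor = buffer;
    call_with_increment_last();
    char *seen_last = cursor_seen_in_callee;    /* not yet incremented */

    assert(seen_first == buffer + 1);
    assert(seen_last == buffer);
    return seen_first != seen_last;
}
```

For a non-escaping local such as p in test1, both orderings are indistinguishable to the callee, which is the case the proposed change would target.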
