Bug 36770 - PowerPC missed autoincrement opportunity
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.3.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2008-07-09 14:56 UTC by Gunnar von Boehn
Modified: 2013-12-24 23:13 UTC

See Also:
Host: powerpc64-unknown-linux-gnu
Target: powerpc64-unknown-linux-gnu
Build: powerpc64-unknown-linux-gnu
Known to work:
Known to fail:
Last reconfirmed: 2008-07-09 18:22:19


Description Gunnar von Boehn 2008-07-09 14:56:27 UTC
GCC fails to generate efficient code for basic pointer operations.

Please have a look at this example:
***
test.c:
register int * src asm("r15");

int test( ){
  src[1]=src[0];
  src++;
}

main(){
}

***

Compiling the above with gcc -S -O3 test.c gives the following assembly output:

test:
        mr 9,15
        addi 15,15,4
        lwz 0,0(9)
        stw 0,4(9)
        blr

Compiling with gcc -S -Os test.c gives this output:
test:
        mr 9,15
        addi 15,15,4
        lwz 0,0(9)
        stw 0,4(9)
        blr


As you can see, both -O3 and -Os produce the same output.
The generated output is far from optimal.

For this simple pointer operation, GCC generates:
        mr 9,15
        addi 15,15,4
        lwz 0,0(9)
        stw 0,4(9)

But GCC should rather generate this:
        lwz 0,0(15)
        stwu 0,4(15)


Two of the four instructions are unneeded.
Our code contains literally thousands of unneeded instructions generated this way.
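
To make the claimed equivalence concrete, here is a minimal C sketch of both sequences. It is a model, not compiler output: model_stwu, four_insn, two_insn and sequences_agree are hypothetical names, and the displacement is counted in words rather than bytes for readability. It assumes the usual semantics of stwu (store at base plus displacement, then write that effective address back into the base register).

```c
#include <stdint.h>

/* Hypothetical model of "stwu rS,d(rA)": store the word at rA+d,
   then update rA with that effective address.  The displacement is
   given in words here (an assumption made for readability). */
static int32_t *model_stwu(int32_t value, int32_t *base, int disp_words)
{
    int32_t *ea = base + disp_words;  /* effective address */
    *ea = value;                      /* the store */
    return ea;                        /* caller writes this back to the base */
}

/* The four-instruction sequence from the report. */
static int32_t *four_insn(int32_t *src)
{
    int32_t *tmp = src;   /* mr   9,15    */
    src = src + 1;        /* addi 15,15,4 */
    int32_t w = tmp[0];   /* lwz  0,0(9)  */
    tmp[1] = w;           /* stw  0,4(9)  */
    return src;
}

/* The two-instruction sequence the report asks for. */
static int32_t *two_insn(int32_t *src)
{
    int32_t w = src[0];            /* lwz  0,0(15) */
    return model_stwu(w, src, 1);  /* stwu 0,4(15) */
}

/* Returns 1 if both sequences leave memory and the pointer identical. */
int sequences_agree(void)
{
    int32_t a[3] = {7, 8, 9};
    int32_t b[3] = {7, 8, 9};
    int32_t *pa = four_insn(a);
    int32_t *pb = two_insn(b);
    return pa == a + 1 && pb == b + 1
        && a[1] == 7 && b[1] == 7
        && a[2] == 9 && b[2] == 9;
}
```

In this model both versions copy the first word into the second slot and leave the pointer advanced by one word, which is the sense in which two of the four instructions are redundant.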


I very much hope that this information is helpful to you and that you can fix this.

Many thanks in advance

Gunnar von Boehn
Comment 1 Andrew Pinski 2008-07-09 18:22:19 UTC
forward-propagate is causing some of the issues as shown by:
int *test(int *a ){
  a[1]=a[0];
  a++;
  return a;
}
But using the register extension is causing the rest :).  I would recommend against using global register variables in real code; they are not that useful and cause too many issues in general.
Comment 2 Gunnar von Boehn 2008-07-10 09:18:51 UTC
(In reply to comment #1)
> forward-propagate is causing some of the issues as shown by:
> int *test2(int *a ){
>   a[1]=a[0];
>   a++;
>   return a;
> }

Your example creates the following ASM code:
test2:
        mr 9,3
        addi 3,3,4
        lwz 0,0(9)
        stw 0,4(9)
        blr

Correct would be:
test2:
        lwz 0,0(3)
        stwu 0,4(3)
        blr

As you can see, the bad code generated is just the same.
This is independent of the register pinning.

Can I take your comment as confirmation that forward propagation is broken in GCC/PPC?


Kind regards

Gunnar von Boehn
Comment 3 Paolo Bonzini 2008-07-11 11:56:07 UTC
Yes, the code produced shows that something (probably fwprop, I trust Andrew though I'd like to see dumps) is turning the GIMPLE code

  temp = a[0];
  a[1] = temp;
  a++;

into something harder to optimize.  It might also be a pass-ordering problem.
Comment 4 Paolo Bonzini 2008-07-18 13:41:47 UTC
auto-inc-dec should be taught about transforming

   a <- b + c
   ...
   *(b + c)

into

   a <- b
   ...
   *(a += c) pre
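
As a rough illustration, the two shapes above can be written out in plain C. This is only a source-level sketch of the RTL pattern, not what auto-inc-dec actually operates on; the function names before, after and forms_agree are hypothetical, and the access is assumed to be a store.

```c
#include <stddef.h>

/* Before: a is computed up front and the access recomputes b + c. */
static int *before(int *b, ptrdiff_t c, int value)
{
    int *a = b + c;      /* a <- b + c */
    *(b + c) = value;    /* *(b + c)   */
    return a;
}

/* After: a starts as a copy of b and the access pre-modifies it,
   which is the shape a pre-increment addressing mode can match. */
static int *after(int *b, ptrdiff_t c, int value)
{
    int *a = b;          /* a <- b           */
    *(a += c) = value;   /* *(a += c), "pre" */
    return a;
}

/* Returns 1 if both forms store to the same slot and yield the same a. */
int forms_agree(void)
{
    int x[4] = {0, 0, 0, 0};
    int y[4] = {0, 0, 0, 0};
    int *ax = before(x, 2, 42);
    int *ay = after(y, 2, 42);
    return ax == x + 2 && ay == y + 2 && x[2] == 42 && y[2] == 42;
}
```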
Comment 5 Paolo Bonzini 2008-07-18 14:13:26 UTC
Hmm, even that wouldn't restore the optimization.  The problem here is that there is another access via b, like

   a <- b + c
   *b
   *(b + c)

The bad placement of the first assignment (bad because doing it the other way round would have lower register pressure) in turn happens as early as gimplification.
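
One way to read the ordering problem, sketched in C with c fixed to one element. The names as_gimplified, reordered and orders_agree are hypothetical, and "the other way round" is interpreted here as doing the plain access via b first and the pointer update last; that interpretation is an assumption.

```c
/* Order described in the comment: a is assigned first, so a and b
   are both live across the two memory accesses. */
static int *as_gimplified(int *b)
{
    int *a = b + 1;   /* a <- b + c            */
    int t = *b;       /* *b                    */
    *(b + 1) = t;     /* *(b + c)              */
    return a;
}

/* Other way round: access via b first, then fold the increment into
   the second access, so only one pointer needs to stay live. */
static int *reordered(int *b)
{
    int t = *b;       /* *b                    */
    int *a = b;       /* a <- b                */
    *(a += 1) = t;    /* *(a += c), pre-modify */
    return a;
}

/* Returns 1 if both orders produce the same memory and pointer. */
int orders_agree(void)
{
    int x[2] = {5, 0};
    int y[2] = {5, 0};
    return as_gimplified(x) == x + 1 && reordered(y) == y + 1
        && x[1] == 5 && y[1] == 5;
}
```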
Comment 6 Steven Bosscher 2013-12-24 23:13:41 UTC
(In reply to Gunnar von Boehn from comment #2)
> Correct would be:
> test2:
>         lwz 0,0(3)
>         stwu 0,4(3)
>         blr
> 
> Is you can see the created bad code is just the same.
> This is independent of the register pinning.

At least with gcc 4.7.1 and gcc 4.9.0 (r206195) the register pinning
makes all the difference.

$ cat t.c
register int * src asm("r15");

int test1( ){
    src[1]=src[0];
    src++;
}

int *test2(int *a ){
    a[1]=a[0];
    a++;
    return a;
}

$ ./cc1 -quiet -O2 t.c
$ cat t.s
...
.L.test1:
        lwz 10,0(15)
        mr 9,15
        addi 15,15,4
        stw 10,4(9)
        blr
...
.L.test2:
        lwz 9,0(3)
        stwu 9,4(3)
        blr


This is basically the same as bug 44281.