This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug rtl-optimization/46854] PowerPC optimization regression


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46854

--- Comment #4 from joakim.tjernlund at transmode dot se <joakim.tjernlund at transmode dot se> 2010-12-09 18:23:59 UTC ---
Here is a copy of an earlier mail I sent to the list in November:

Using gcc 4.4.4 -Os on
void loop(long *to, long *from, long len)
{
    for (; len; --len)
        *++to = *++from;
}
I get
/* gcc 4.4.4 -Os
loop:
        addi 5,5,1
        li 9,0
        mtctr 5
        b .L2
.L3:
        lwzx 0,4,9
        stwx 0,3,9
.L2:
        addi 9,9,4
        bdnz .L3
        blr
 */

gcc 3.4.6 has:
/* gcc 3.4.6 -Os
loop:
        mr. 0,5
        mtctr 0
        beqlr- 0
.L8:
        lwzu 0,4(4)
        stwu 0,4(3)
        bdnz .L8
        blr
 */
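
As a side note on why the 3.4.6 output is so compact: each `*++p` access
maps onto one update-form instruction (lwzu/stwu with displacement 4), which
computes the address, performs the access, and writes the new address back to
the base register. The sketch below restates the loop in indexed form to make
those semantics explicit. The names loop_indexed and loops_agree are
hypothetical helpers introduced only for this illustration, and the explicit
void return type is an addition to the test case.

```c
#include <stddef.h>

/* The loop from the report, with an explicit return type added. */
void loop(long *to, long *from, long len)
{
    for (; len; --len)
        *++to = *++from;
}

/* Hypothetical equivalent for len >= 0: the pointers are incremented
 * before each access, so elements 1..len are copied and element 0 is
 * never touched. */
void loop_indexed(long *to, const long *from, long len)
{
    for (long i = 1; i <= len; ++i)
        to[i] = from[i];
}

/* Hypothetical self-check: returns 1 if both forms produce the same
 * result on a small buffer, 0 otherwise. */
int loops_agree(void)
{
    long src[5] = {10, 11, 12, 13, 14};
    long a[5] = {0};
    long b[5] = {0};

    loop(a, src, 4);
    loop_indexed(b, src, 4);

    for (size_t i = 0; i < 5; ++i)
        if (a[i] != b[i])
            return 0;

    /* Element 0 is left alone; element 4 receives src[4]. */
    return a[0] == 0 && a[4] == 14;
}
```

Both forms are the same computation at the source level, so the difference
between the two listings comes from the compiler's choice of addressing mode
rather than from the test case itself.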

It doesn't matter which cpu type I use. It seems impossible
to make newer gcc produce this smaller/faster code: 4.4.4 emits
9 instructions with 4 in the loop body, against 7 and 3 for 3.4.6.

Perhaps lwzx/stwx is faster on bigger Power cpus, but this
can't be true for all cpus, can it?
That should not matter though, because I asked gcc to produce smaller
code with -Os.

