Bug 36693 - missed optimization for pointer access with offset on powerpc
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 4.2.3
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2008-07-02 10:20 UTC by René Bürgel
Modified: 2015-11-22 17:37 UTC

See Also:
Host:
Target: powerpc-linux-uclibc
Build:
Known to work:
Known to fail: 4.1.2, 4.2.3, 4.3.1
Last reconfirmed:


Description René Bürgel 2008-07-02 10:20:56 UTC
The following two equivalent functions are compiled into different assembly code. Unfortunately, the more readable function (get_and_increment1) produces worse code: it is bigger and slower, because it uses one more register (r0) than get_and_increment2 and needs an extra mr to move the result into r3.


struct IntPtr
{
        int* m_ReadPtr;
};

int get_and_increment1(struct IntPtr* i)
{
        return *(i->m_ReadPtr++);
}

int get_and_increment2(struct IntPtr* i)
{
        i->m_ReadPtr++;
        return *(i->m_ReadPtr - 1);
}
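
A minimal sketch that checks the claim that the two functions are equivalent, by walking the same buffer with both and comparing the returned values and the updated pointers. The helpers same_behaviour and self_check and the sample data are illustrative assumptions, not part of the original test case.

```c
#include <assert.h>
#include <stddef.h>

struct IntPtr
{
        int* m_ReadPtr;
};

/* Post-increment form from the report. */
int get_and_increment1(struct IntPtr* i)
{
        return *(i->m_ReadPtr++);
}

/* Increment-then-read-back form from the report. */
int get_and_increment2(struct IntPtr* i)
{
        i->m_ReadPtr++;
        return *(i->m_ReadPtr - 1);
}

/* Hypothetical helper: returns 1 if both functions return the same
   values and leave the same pointer behind for n consecutive reads. */
int same_behaviour(int* data, size_t n)
{
        struct IntPtr a = { data };
        struct IntPtr b = { data };
        size_t k;

        for (k = 0; k < n; k++) {
                if (get_and_increment1(&a) != get_and_increment2(&b))
                        return 0;
                if (a.m_ReadPtr != b.m_ReadPtr)
                        return 0;
        }
        return 1;
}

/* Illustrative self-check with made-up sample data. */
void self_check(void)
{
        int data[4] = { 10, 20, 30, 40 };
        assert(same_behaviour(data, 4));
}
```

Calling self_check() from a test program is enough to exercise both variants. Any difference between them is therefore purely a code-generation matter.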


00000000 <get_and_increment1>:
   0:   81 23 00 00     lwz     r9,0(r3)
   4:   80 09 00 00     lwz     r0,0(r9)
   8:   39 29 00 04     addi    r9,r9,4
   c:   91 23 00 00     stw     r9,0(r3)
  10:   7c 03 03 78     mr      r3,r0
  14:   4e 80 00 20     blr

00000018 <get_and_increment2>:
  18:   81 23 00 00     lwz     r9,0(r3)
  1c:   39 29 00 04     addi    r9,r9,4
  20:   91 23 00 00     stw     r9,0(r3)
  24:   80 69 ff fc     lwz     r3,-4(r9)
  28:   4e 80 00 20     blr
Comment 1 Andrew Pinski 2008-08-11 01:31:23 UTC
Hmm, this works on the trunk with -O2 -mtune=cell, which means this is a scheduling issue:
[apinski@dhcp-10-98-10-216 ~]$ ~/gcc-mainline/bin/gcc -O2 -o - -S t.c -mtune=cell
        .file   "t.c"
        .section        ".text"
        .align 2
        .p2align 3,,7
        .globl get_and_increment1
        .type   get_and_increment1, @function
get_and_increment1:
        lwz 9,0(3)
        addi 0,9,4
        stw 0,0(3)
        lwz 3,0(9)
        blr
        .size   get_and_increment1,.-get_and_increment1
        .align 2
        .p2align 3,,7
        .globl get_and_increment2
        .type   get_and_increment2, @function
get_and_increment2:
        lwz 9,0(3)
        addi 0,9,4
        stw 0,0(3)
        lwz 3,0(9)
        blr
        .size   get_and_increment2,.-get_and_increment2
        .ident  "GCC: (GNU) 4.4.0 20080810 (experimental) [trunk revision 138922]"
        .section        .note.GNU-stack,"",@progbits
Comment 2 Segher Boessenkool 2015-11-22 17:37:53 UTC
GCC now generates the same code for both, no matter what tuning.

get_and_increment1:
        lwz 9,0(3)
        addi 10,9,4
        stw 10,0(3)
        lwz 3,0(9)
        blr

Closing as fixed.
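
For reference, the five-instruction sequence above reads roughly like the following explicit-temporary form in C. This is only a sketch: get_and_increment3 and self_check3 are hypothetical names that do not appear in the report, and the report does not verify which instructions any particular GCC version emits for this variant.

```c
#include <assert.h>

struct IntPtr
{
        int* m_ReadPtr;
};

/* Hypothetical third variant: the old pointer is kept in a local, so
   the final load does not depend on the incremented value. */
int get_and_increment3(struct IntPtr* i)
{
        int* old = i->m_ReadPtr;   /* lwz  9,0(3)               */
        i->m_ReadPtr = old + 1;    /* addi 10,9,4 ; stw 10,0(3) */
        return *old;               /* lwz  3,0(9)               */
}

/* Illustrative self-check with made-up sample data. */
void self_check3(void)
{
        int data[2] = { 7, 9 };
        struct IntPtr p = { data };

        assert(get_and_increment3(&p) == 7);
        assert(p.m_ReadPtr == data + 1);
}
```

Because the incremented pointer lives in its own register, the result can be loaded straight into r3, which is why no trailing mr is needed.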