The following two equivalent functions are compiled into different assembly code. The bad thing is that the more compact function (get_and_increment1) produces worse code. Per the listing below, it is bigger (six instructions instead of five) and slower, because it uses one more register (r0) than get_and_increment2 and needs an extra mr to move the result into r3.

struct IntPtr
{
    int* m_ReadPtr;
};

int get_and_increment1(struct IntPtr* i)
{
    return *(i->m_ReadPtr++);
}

int get_and_increment2(struct IntPtr* i)
{
    i->m_ReadPtr++;
    return *(i->m_ReadPtr - 1);
}

00000000 <get_and_increment1>:
   0:   81 23 00 00     lwz     r9,0(r3)
   4:   80 09 00 00     lwz     r0,0(r9)
   8:   39 29 00 04     addi    r9,r9,4
   c:   91 23 00 00     stw     r9,0(r3)
  10:   7c 03 03 78     mr      r3,r0
  14:   4e 80 00 20     blr

00000018 <get_and_increment2>:
  18:   81 23 00 00     lwz     r9,0(r3)
  1c:   39 29 00 04     addi    r9,r9,4
  20:   91 23 00 00     stw     r9,0(r3)
  24:   80 69 ff fc     lwz     r3,-4(r9)
  28:   4e 80 00 20     blr
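To back up the claim that the two functions are equivalent, here is a minimal sketch that walks the same buffer with both and compares results. The helper same_behaviour and the sample data are hypothetical and not part of the report; the two functions are copied from it unchanged.

```c
#include <assert.h>
#include <stddef.h>

struct IntPtr { int* m_ReadPtr; };

int get_and_increment1(struct IntPtr* i)
{
    return *(i->m_ReadPtr++);
}

int get_and_increment2(struct IntPtr* i)
{
    i->m_ReadPtr++;
    return *(i->m_ReadPtr - 1);
}

/* Hypothetical helper (not from the report): returns 1 if both
   functions return the same value and leave the read pointer in the
   same place at every step, 0 otherwise. */
int same_behaviour(void)
{
    int data[4] = { 10, 20, 30, 40 };
    struct IntPtr a = { data };
    struct IntPtr b = { data };
    size_t n;

    for (n = 0; n < 4; n++) {
        int x = get_and_increment1(&a);
        int y = get_and_increment2(&b);
        if (x != y || x != data[n] || a.m_ReadPtr != b.m_ReadPtr)
            return 0;
    }
    /* Both pointers should now sit one past the last element. */
    return a.m_ReadPtr == data + 4 && b.m_ReadPtr == data + 4;
}
```

Calling assert(same_behaviour()) from a test driver should pass on any conforming compiler, so any difference between the two functions is confined to the generated code.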
Hmm, this works on the trunk with -O2 -mtune=cell, which means this is a scheduling issue:

[apinski@dhcp-10-98-10-216 ~]$ ~/gcc-mainline/bin/gcc -O2 -o - -S t.c -mtune=cell
        .file   "t.c"
        .section        ".text"
        .align 2
        .p2align 3,,7
        .globl get_and_increment1
        .type   get_and_increment1, @function
get_and_increment1:
        lwz 9,0(3)
        addi 0,9,4
        stw 0,0(3)
        lwz 3,0(9)
        blr
        .size   get_and_increment1,.-get_and_increment1
        .align 2
        .p2align 3,,7
        .globl get_and_increment2
        .type   get_and_increment2, @function
get_and_increment2:
        lwz 9,0(3)
        addi 0,9,4
        stw 0,0(3)
        lwz 3,0(9)
        blr
        .size   get_and_increment2,.-get_and_increment2
        .ident  "GCC: (GNU) 4.4.0 20080810 (experimental) [trunk revision 138922]"
        .section        .note.GNU-stack,"",@progbits
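For readers who prefer C to assembly, the four-instruction sequence in the trunk output can be read roughly as the sketch below. The name get_and_increment3 is hypothetical and does not appear in the report. The mapping in the comments assumes a 4-byte int, which matches the addi ...,4 in the listing. It only illustrates the order of operations and does not claim anything about how GCC gets there internally.

```c
#include <assert.h>

struct IntPtr { int* m_ReadPtr; };

/* Hypothetical third variant (not from the report) that spells out
   the instruction order seen in the trunk output. */
int get_and_increment3(struct IntPtr* i)
{
    int* old = i->m_ReadPtr;   /* lwz 9,0(3): load the current pointer   */
    i->m_ReadPtr = old + 1;    /* addi 0,9,4 and stw 0,0(3): store +4    */
    return *old;               /* lwz 3,0(9): load result directly to r3 */
}
```

Keeping the old pointer in its own register lets the result be loaded straight into r3, so neither the extra mr of the first listing nor the -4 offset of the second is needed.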
GCC now generates the same code for both, no matter which tuning is selected.

get_and_increment1:
        lwz 9,0(3)
        addi 10,9,4
        stw 10,0(3)
        lwz 3,0(9)
        blr

Closing as fixed.