This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.
MIPS GCC always generates outline memcpy when optimizing for size?
- From: Anders Montonen <Anders dot Montonen at iki dot fi>
- To: gcc-help at gcc dot gnu dot org
- Date: Sun, 2 Feb 2014 14:19:52 +0200
Hi,
It seems that GCC configured for MIPS always generates a call to memcpy when optimizing for size. Is this expected behaviour? I have encountered this both with a self-built GCC 4.8.1 (configured for mipsel-sde-elf) and with Microchip's XC32 compiler, which is based on GCC 4.5.2.
Here's what I get when building the following code with GCC 4.8.1 and -march=m4k:
#include <stdint.h>

uint32_t foo(const uint8_t *pA)
{
    uint32_t sum = 0;
    uint32_t tmp, ii;
    for (ii = 0; ii < 256; ii++)
    {
        __builtin_memcpy(&tmp, &pA[ii*sizeof(tmp)], sizeof(tmp));
        sum += tmp;
    }
    return sum;
}
With -O1, the following is generated:
00000000 <foo>:
0: 27bdfff8 addiu sp,sp,-8
4: 00002821 move a1,zero
8: 00001021 move v0,zero
c: 24070400 li a3,1024
10: 00851821 addu v1,a0,a1
14: 88660003 lwl a2,3(v1)
18: 98660000 lwr a2,0(v1)
1c: afa60000 sw a2,0(sp)
20: 24a50004 addiu a1,a1,4
24: 14a7fffa bne a1,a3,10 <foo+0x10>
28: 00461021 addu v0,v0,a2
2c: 03e00008 jr ra
30: 27bd0008 addiu sp,sp,8
But with -Os, I get this:
00000000 <foo>:
0: 27bdffd0 addiu sp,sp,-48
4: afb30028 sw s3,40(sp)
8: afb20024 sw s2,36(sp)
c: afb10020 sw s1,32(sp)
10: afb0001c sw s0,28(sp)
14: afbf002c sw ra,44(sp)
18: 00809821 move s3,a0
1c: 00008021 move s0,zero
20: 00008821 move s1,zero
24: 24120400 li s2,1024
28: 02702821 addu a1,s3,s0
2c: 27a40010 addiu a0,sp,16
30: 0c000000 jal 0 <foo>
34: 24060004 li a2,4
38: 8fa20010 lw v0,16(sp)
3c: 26100004 addiu s0,s0,4
40: 1612fff9 bne s0,s2,28 <foo+0x28>
44: 02228821 addu s1,s1,v0
48: 8fbf002c lw ra,44(sp)
4c: 02201021 move v0,s1
50: 8fb30028 lw s3,40(sp)
54: 8fb20024 lw s2,36(sp)
58: 8fb10020 lw s1,32(sp)
5c: 8fb0001c lw s0,28(sp)
60: 03e00008 jr ra
64: 27bd0030 addiu sp,sp,48
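One possible workaround, not taken from the original report, is to avoid __builtin_memcpy entirely and build each word from explicit byte loads and shifts. The sketch below uses a hypothetical name, foo_bytes, and assumes a little-endian target such as mipsel-sde-elf, where it computes the same sum as foo; on a big-endian target the results would differ. What GCC 4.8.1 actually emits for it at -Os has not been checked here; it may produce four byte loads per word rather than an lwl/lwr pair.

```c
#include <stdint.h>

/* Hypothetical memcpy-free variant of foo(). Each 32-bit word is built
   from four bytes in little-endian order, which matches what the memcpy
   in foo() yields on a little-endian target such as mipsel. */
uint32_t foo_bytes(const uint8_t *pA)
{
    uint32_t sum = 0;
    uint32_t ii;
    for (ii = 0; ii < 256; ii++)
    {
        const uint8_t *p = &pA[ii * sizeof(uint32_t)];
        /* Byte loads have no alignment requirement, so pA may be unaligned. */
        sum += (uint32_t)p[0]
             | ((uint32_t)p[1] << 8)
             | ((uint32_t)p[2] << 16)
             | ((uint32_t)p[3] << 24);
    }
    return sum;
}
```

The shift-and-or form fixes the byte order explicitly, which is why the little-endian assumption matters; the memcpy form in the post instead yields whatever the target's native order is.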
-O2 produces the same code as -O1, apart from register allocation and instruction scheduling. As a side note, the store of tmp to the stack is unnecessary and could be optimized away.
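Another possible workaround, again not from the original post, is to read each word through a packed struct. The packed attribute is a GCC extension: it gives the struct an alignment of 1, so the compiler has to assume the pointer may be unaligned and emit an unaligned-safe load straight into a register, with no addressable tmp object. The sketch assumes GCC (or a compiler accepting the same attributes); may_alias is added so that reading a uint8_t buffer through the struct type does not run afoul of strict aliasing. The names unaligned_u32 and foo_packed are hypothetical, and whether GCC 4.8.1 at -Os really emits an lwl/lwr pair here, avoiding both the memcpy call and the stack store, has not been verified.

```c
#include <stdint.h>

/* Hypothetical one-member wrapper type. "packed" drops its alignment to 1,
   so accesses through it must work at any address; "may_alias" lets it
   legally alias the uint8_t input buffer (both are GCC extensions). */
struct __attribute__((packed, may_alias)) unaligned_u32
{
    uint32_t v;
};

/* Hypothetical variant of foo() that loads each word through the packed
   struct instead of calling __builtin_memcpy into a local variable. */
uint32_t foo_packed(const uint8_t *pA)
{
    uint32_t sum = 0;
    uint32_t ii;
    for (ii = 0; ii < 256; ii++)
    {
        const struct unaligned_u32 *w =
            (const struct unaligned_u32 *)&pA[ii * sizeof(uint32_t)];
        sum += w->v; /* native-endian, like the memcpy in foo() */
    }
    return sum;
}
```

Unlike the byte-shift approach, this keeps the native byte order, so it should match foo on both little- and big-endian targets.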
Regards,
Anders Montonen
(I am not subscribed to the list, so please cc me)