This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/49865] New: Unnecessary reload causes small size regression from 4.6.1
- From: "sgunderson at bigfoot dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 27 Jul 2011 11:24:56 +0000
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49865
Summary: Unnecessary reload causes small size regression from 4.6.1
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: sgunderson@bigfoot.com
Target: i?86-*-*
Comparing GCC 4.6.1 with the gcc-snapshot package from Debian:
gcc version 4.7.0 20110709 (experimental) [trunk revision 176106] (Debian 20110709-1)
Given this code:
fugl:~> cat test.cpp
#include <string.h>

class MyClass {
    void func();
    float f[1024];
    int i;
};

void MyClass::func()
{
    memset(f, 0, sizeof(f));
    i = 0;
}
and compiling with
fugl:~> /usr/lib/gcc-snapshot/bin/g++ -Os -c test.cpp
g++ produces, according to objdump:
00000000 <_ZN7MyClass4funcEv>:
   0:   55                      push   %ebp
   1:   31 c0                   xor    %eax,%eax
   3:   89 e5                   mov    %esp,%ebp
   5:   b9 00 04 00 00          mov    $0x400,%ecx
   a:   57                      push   %edi
   b:   8b 7d 08                mov    0x8(%ebp),%edi
   e:   f3 ab                   rep stos %eax,%es:(%edi)
  10:   8b 45 08                mov    0x8(%ebp),%eax
  13:   c7 80 00 10 00 00 00    movl   $0x0,0x1000(%eax)
  1a:   00 00 00
  1d:   5f                      pop    %edi
  1e:   5d                      pop    %ebp
  1f:   c3                      ret
while 4.6.1 produces a sequence that is one byte shorter:
00000000 <_ZN7MyClass4funcEv>:
   0:   55                      push   %ebp
   1:   b9 00 04 00 00          mov    $0x400,%ecx
   6:   89 e5                   mov    %esp,%ebp
   8:   31 c0                   xor    %eax,%eax
   a:   8b 55 08                mov    0x8(%ebp),%edx
   d:   57                      push   %edi
   e:   89 d7                   mov    %edx,%edi
  10:   f3 ab                   rep stos %eax,%es:(%edi)
  12:   c7 82 00 10 00 00 00    movl   $0x0,0x1000(%edx)
  19:   00 00 00
  1c:   5f                      pop    %edi
  1d:   5d                      pop    %ebp
  1e:   c3                      ret
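It seems 4.6 is able to keep a copy of the "this" pointer in a register across the "rep stos": it loads the pointer from the stack once into %edx and copies it into %edi with a 2-byte register move. Because "rep stos" advances %edi (and counts %ecx down to zero), %edi no longer holds "this" afterwards, so 4.7 reloads "this" from the stack with a second 3-byte mov when it needs to clear "i". That makes the 4.7 version one byte larger (32 bytes versus 31).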
Of course, the _most_ efficient code sequence here would do the i = 0 before the memset, but I'm not sure that reordering is legal. Even without it, %eax should still contain zero after the "rep stos" (which reads %eax but does not modify it), so the store to "i" could use %eax as its source instead of a 32-bit immediate constant.