This is the mail archive of the mailing list for the GCC project.

Missed optimization opportunity


I've come across an issue while working on a smart pointer implementation. GCC does not seem to propagate constants aggressively enough, and as a result misses some optimization opportunities. I don't think this issue is specific to smart pointers, so there may be other cases where GCC generates suboptimal code.

Attached is a simple test case. The smart pointer here is a unique pointer: at any time, only a single instance holds the raw pointer to the resource. Deletion can be customized through a policy class. In main(), I allocate an int, then pass it through several smart pointers. At the end, the last smart pointer holds the raw pointer to the allocated memory.

Compiled as:
g++ -g -O3 -o gccoptbug.o -c gccoptbug.cpp
g++ -o gccoptbug gccoptbug.o

The generated code for main() on AMD64 looks like this:
0x00000000004004d0 <+0>: sub $0x8,%rsp
0x00000000004004d4 <+4>: mov $0x4,%edi
0x00000000004004d9 <+9>: callq 0x4004c0 <_Znwm@plt> ; operator new
0x00000000004004de <+14>: mov %rax,%rdi
0x00000000004004e1 <+17>: callq 0x4004a0 <_ZdlPv@plt> ; operator delete
0x00000000004004e6 <+22>: xor %edi,%edi
0x00000000004004e8 <+24>: callq 0x4004a0 <_ZdlPv@plt>
0x00000000004004ed <+29>: xor %edi,%edi
0x00000000004004ef <+31>: callq 0x4004a0 <_ZdlPv@plt>
0x00000000004004f4 <+36>: xor %edi,%edi
0x00000000004004f6 <+38>: callq 0x4004a0 <_ZdlPv@plt>
0x00000000004004fb <+43>: xor %edi,%edi
0x00000000004004fd <+45>: callq 0x4004a0 <_ZdlPv@plt>
0x0000000000400502 <+50>: xor %eax,%eax
0x0000000000400504 <+52>: add $0x8,%rsp
0x0000000000400508 <+56>: retq

The allocated memory is freed, then operator delete is called four more times with a null pointer. The destructor and the deleter function it calls were inlined. So far, so good.
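To see where the four null-pointer calls come from, here is a minimal sketch that replays the same p0 to p4 chain with a hypothetical CountingDeleter policy (not part of the original test case). The policy logs, in call order, whether Delete() received a non-null pointer. Ptr is a reduced copy of the test case's template; its destructor body, D::Delete(m_ptr), is an assumption inferred from the ~Ptr() calls in the disassembly.

```cpp
#include <vector>

// Hypothetical deletion policy (not part of the original test case): it logs,
// in call order, whether Delete() received a non-null pointer, then deletes it.
template<typename T>
struct CountingDeleter
{
    static std::vector<bool>& Log()
    {
        static std::vector<bool> log;  // one log per instantiated T
        return log;
    }

    static void Delete(T* p_)
    {
        Log().push_back(p_ != 0);
        delete p_;  // deleting a null pointer has no effect
    }
};

// Reduced copy of Ptr from the test case. Copying transfers ownership through
// Forget(); the destructor body is assumed (inferred from the disassembly).
template<typename T, class D>
class Ptr
{
public:
    Ptr(T* p_) : m_ptr(p_) {}
    Ptr(const Ptr& p_) : m_ptr(p_.Forget()) {}
    ~Ptr() { D::Delete(m_ptr); }

    T* Forget() const
    {
        T* s = m_ptr;
        m_ptr = 0;
        return s;
    }

private:
    mutable T* m_ptr;
};

// Replays the p0 -> p4 chain from main() and returns the Delete() log.
inline std::vector<bool> RunChain()
{
    CountingDeleter<int>::Log().clear();
    {
        typedef Ptr<int, CountingDeleter<int> > MyPtr;

        MyPtr p0(new int);  // direct-init: no temporary Ptr is created
        MyPtr p1 = p0;      // each copy takes ownership and nulls the source
        MyPtr p2 = p1;
        MyPtr p3 = p2;
        MyPtr p4 = p3;
    }   // destroyed in reverse order: p4 (the owner) first, then p3 ... p0
    return CountingDeleter<int>::Log();
}
```

Because every copy nulls its source, only p4 still owns the int when the scope ends. It is destroyed first, so RunChain() returns {true, false, false, false, false}: one real delete followed by four deletes of a null pointer, the same call sequence as in the listing above.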

If I modify the deleter policy to call operator delete only when the pointer is non-null (#if 1 at line 6), the generated code changes to:
0x00000000004004d0 <+0>: sub $0x58,%rsp
0x00000000004004d4 <+4>: mov $0x4,%edi
0x00000000004004d9 <+9>: callq 0x4004c0 <_Znwm@plt>
0x00000000004004de <+14>: lea 0x40(%rsp),%rdi
0x00000000004004e3 <+19>: mov %rax,0x40(%rsp)
0x00000000004004e8 <+24>: movq $0x0,(%rsp)
0x00000000004004f0 <+32>: movq $0x0,0x10(%rsp)
0x00000000004004f9 <+41>: movq $0x0,0x20(%rsp)
0x0000000000400502 <+50>: movq $0x0,0x30(%rsp)
0x000000000040050b <+59>: callq 0x400630 <Ptr<int, Deleter<int> >::~Ptr()>
0x0000000000400510 <+64>: lea 0x30(%rsp),%rdi
0x0000000000400515 <+69>: callq 0x400630 <Ptr<int, Deleter<int> >::~Ptr()>
0x000000000040051a <+74>: lea 0x20(%rsp),%rdi
0x000000000040051f <+79>: callq 0x400630 <Ptr<int, Deleter<int> >::~Ptr()>
0x0000000000400524 <+84>: lea 0x10(%rsp),%rdi
0x0000000000400529 <+89>: callq 0x400630 <Ptr<int, Deleter<int> >::~Ptr()>
0x000000000040052e <+94>: mov %rsp,%rdi
0x0000000000400531 <+97>: callq 0x400630 <Ptr<int, Deleter<int> >::~Ptr()>
0x0000000000400536 <+102>: xor %eax,%eax
0x0000000000400538 <+104>: add $0x58,%rsp
0x000000000040053c <+108>: retq

Instead of eliminating the calls to operator delete, GCC materializes the smart pointer objects on the stack and no longer inlines the destructor.

GCC 4.4 and 4.5 optimize this as expected:
   0x0000000000400640 <+0>:    sub    $0x8,%rsp
   0x0000000000400644 <+4>:    mov    $0x4,%edi
   0x0000000000400649 <+9>:    callq  0x400540 <_Znwm@plt>
   0x000000000040064e <+14>:    test   %rax,%rax
   0x0000000000400651 <+17>:    je     0x40065b <main()+27>
   0x0000000000400653 <+19>:    mov    %rax,%rdi
   0x0000000000400656 <+22>:    callq  0x400510 <_ZdlPv@plt>
   0x000000000040065b <+27>:    xor    %eax,%eax
   0x000000000040065d <+29>:    add    $0x8,%rsp
   0x0000000000400661 <+33>:    retq

GCC 4.6 and 4.7 (r182889) generate the suboptimal code shown above.

I've checked Bugzilla, and bug #46076 seems related. There, Jan Hubicka writes (2010-10-19 03:20:48 UTC) that main() is optimized for size. To check this, I added a function foo() to the test case, and it is optimized correctly with 4.6 and 4.7. Moreover, -Os produces the same code for foo() and main(). However, the size-optimized version is more than three times as large as the speed-optimized one. Is this normal?
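To make the foo() experiment concrete, here is a sketch of what such a function might look like. The message does not show foo(), so its body is an assumption: it simply repeats the p0 to p4 chain from main(), using the guarded Delete() variant that triggers the problem. Ptr is again a reduced copy of the test case's template, with the destructor body inferred from the disassembly.

```cpp
// Deleter from the test case, in its guarded (#if 1) form.
template<typename T>
struct Deleter
{
    static void Delete(T* p_)
    {
        if (p_)          // only call operator delete for a non-null pointer
            delete p_;
    }
};

// Reduced copy of Ptr from the test case; the ~Ptr() body is assumed.
template<typename T, class D = Deleter<T> >
class Ptr
{
public:
    Ptr(T* p_) : m_ptr(p_) {}
    Ptr(const Ptr& p_) : m_ptr(p_.Forget()) {}
    ~Ptr() { D::Delete(m_ptr); }

    T* Forget() const
    {
        T* s = m_ptr;
        m_ptr = 0;
        return s;
    }

private:
    mutable T* m_ptr;
};

// Hypothetical foo(): assumed to hold the same body as main() in the test case,
// so it can be compared with main(), which the bug report says GCC optimizes
// for size.
void foo()
{
    typedef Ptr<int> MyPtr;

    MyPtr p0(new int);
    MyPtr p1 = p0;
    MyPtr p2 = p1;
    MyPtr p3 = p2;
    MyPtr p4 = p3;
}
```

Placed next to the original main(), both functions contain identical code, so any difference between their -O3 disassembly would come from how GCC treats main() itself rather than from the smart pointer code.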

Regards, Peter

template<typename T>
struct Deleter
{
	static void Delete(T* p_)
	{
#if 0 // if enabled, Delete() is not inlined
		if (p_)
#endif
			delete p_;
	}
};

template<typename T, class D = Deleter<T> >
class Ptr
{
public:
	Ptr() : m_ptr(0)
	{
	}

	Ptr(T* p_) : m_ptr(p_)
	{
	}

	Ptr(const Ptr& p_) : m_ptr(p_.Forget())
	{
	}

	// hands the owned pointer (possibly null) to the deletion policy
	~Ptr()
	{
		D::Delete(m_ptr);
	}

	T* Forget() const
	{
		T* s = m_ptr;
		m_ptr = 0;
		return s;
	}

private:
	mutable T*	m_ptr;
};

int main()
{
	typedef Ptr<int> MyPtr;

	MyPtr p0 = new int;
	MyPtr p1 = p0;
	MyPtr p2 = p1;
	MyPtr p3 = p2;
	MyPtr p4 = p3;
}
