Empty destructor definition disables optimization.
Marc Glisse
marc.glisse@inria.fr
Wed Dec 21 07:46:00 GMT 2016
On Tue, 20 Dec 2016, Juan Cabrera wrote:
> I was reading the assembly output for a specific usecase of a custom
> optional<T> implementation I'm writing to see if the `optional<T>` abstraction
> would incur in more overhead than what I was expecting.
>
> This is the samallest and simplest sample code I came up with that illustrates
> the problem I'm having:
>
> #include <exception>
>
> class M
> {
> public:
> int value;
> bool valid;
>
> ~M() { }
>
> int get() const
> {
> if (valid)
> return value;
> std::terminate();
> }
> };
>
> M func_1();
> void func_2(int a);
>
> int main(int, char**)
> {
> const auto i = func_1();
> if (i.valid)
> {
> func_2(i.get());
> func_2(i.get());
> func_2(i.get());
> }
>
> return 0;
> }
>
> Now the issue is that the compiler checks `i.valid` to be true on each call to
> `i.get()` even though the enclosing `if` statement has already done so.
>
> This is the assembly output from g++ 6.2.1 (compiled with -O3)
>
> main:
> sub rsp, 24
> mov rdi, rsp
> call func_1()
> cmp BYTE PTR [rsp+4], 0
> jne .L7
> .L2:
> xor eax, eax
> add rsp, 24
> ret
> .L7:
> mov edi, DWORD PTR [rsp]
> call func_2(int)
> cmp BYTE PTR [rsp+4], 0
> mov edi, DWORD PTR [rsp]
> je .L3
> call func_2(int)
> cmp BYTE PTR [rsp+4], 0
> mov edi, DWORD PTR [rsp]
> je .L3
> call func_2(int)
> jmp .L2
> .L3:
> call std::terminate()
>
> The following slight modification produces better code:
>
> // class M ....
>
> M func_1();
> void func_2(...);
>
> int main(int, char**)
> {
> const auto i = func_1();
> if (i.valid)
> {
> func_2(i.get(), i.get(), i.get());
> }
>
> return 0;
> }
>
> --> (compiled with -O3)
>
> main:
> sub rsp, 24
> mov rdi, rsp
> call func_1()
> cmp BYTE PTR [rsp+4], 0
> je .L2
> mov edi, DWORD PTR [rsp]
> xor eax, eax
> mov edx, edi
> mov esi, edi
> call func_2(...)
> .L2:
> xor eax, eax
> add rsp, 24
> ret
>
> After doing a bunch of tests I found out that by removing the destructor
> definition from `M` the generated code for the first code example becomes:
>
> main:
> push rbx
> call func_1()
> mov rbx, rax
> shr rax, 32
> test al, al
> je .L2
> mov edi, ebx
> call func_2(int)
> mov edi, ebx
> call func_2(int)
> mov edi, ebx
> call func_2(int)
> .L2:
> xor eax, eax
> pop rbx
> ret
>
> That's basically what I was expecting.
> Of course removing the destructor definition is not an option for the the real
> code I'm working on (the `value` is inside a union, etc).
>
> Why is the compiling not able to optimize those comparison away?
> clang++ also generates basically the same code as g++ in all cases which makes
> me think I must be overlooking something important there.
Hello,
the hard part for the compiler is making sure that calling func2 cannot
modify i.valid. When you have a destructor, the calling convention uses
the return slot optimization, and gcc considers that the variable i
escapes at that point. Without the destructor, the class is returned in a
register and copied into a purely local i, which func2 has no way of
modifying.
M* bad;
M func1(){
M ret;
bad = &ret; // magic: this is the address of i!
return ret;
}
void func2(int){
bad->valid = false;
}
I don't know if gcc could consider such code illegal (in the middle-end,
so also in all languages, not just C++), you could search in bugzilla if
there were previous requests, and if not file one to get a definitive
answer.
--
Marc Glisse
More information about the Gcc-help
mailing list