Missed optimization wrt. constructor clobbers?
Avi Kivity
avi@scylladb.com
Wed Dec 7 11:02:00 GMT 2016
On 12/07/2016 12:47 AM, Marc Glisse wrote:
> On Tue, 6 Dec 2016, Avi Kivity wrote:
>
>> Consider the following code
>>
>>
>> === begin code ===
>>
>> #include <experimental/optional>
>>
>> using namespace std::experimental;
>>
>> struct array_of_optional {
>>     optional<int> v[100];
>> };
>>
>> array_of_optional
>> f(const array_of_optional& a) {
>>     return a;
>> }
>>
>> === end code ===
>>
>>
>> Compiling with -O3 (gcc 6.2.1), I get:
>>
>>
>> 0000000000000000 <f(array_of_optional const&)>:
>>    0:   48 8d 8f 20 03 00 00    lea    0x320(%rdi),%rcx
>>    7:   48 89 f8                mov    %rdi,%rax
>>    a:   48 89 fa                mov    %rdi,%rdx
>>    d:   0f 1f 00                nopl   (%rax)
>>   10:   c6 42 04 00             movb   $0x0,0x4(%rdx)
>>   14:   80 7e 04 00             cmpb   $0x0,0x4(%rsi)
>>   18:   74 0a                   je     24 <f(array_of_optional const&)+0x24>
>>   1a:   44 8b 06                mov    (%rsi),%r8d
>>   1d:   c6 42 04 01             movb   $0x1,0x4(%rdx)
>>   21:   44 89 02                mov    %r8d,(%rdx)
>>   24:   48 83 c2 08             add    $0x8,%rdx
>>   28:   48 83 c6 08             add    $0x8,%rsi
>>   2c:   48 39 ca                cmp    %rcx,%rdx
>>   2f:   75 df                   jne    10 <f(array_of_optional const&)+0x10>
>>   31:   f3 c3                   repz retq
>
> For high-level optimizations, I find it better to look at the file
> created by compiling with -fdump-tree-optimized.
>
I guess you have to read a few of those dumps to get a feel for them.
>> However, because we're constructing into the return value, we're
>> under no obligation to leave the memory untouched, so this can be
>> optimized into a memcpy, which can be significantly faster if the
>> optionals are randomly engaged; but gcc doesn't notice that.
>
> Feel free to file an enhancement PR in gcc's bugzilla. The easiest is
> probably to handle it in libstdc++ in the copy constructor, under some
> conditions (trivially copy constructible and not too large). But some
> tools might complain about the read from uninitialized memory, even if
> it is safe.
I think this is too fragile. For example optional<optional<int>> would
not benefit from the optimization.
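
To illustrate the fragility, here is a minimal sketch. It assumes the
optional's copy constructor is user-provided, as the branch in the
disassembly suggests. toy_optional and would_use_fast_path are
hypothetical names, not libstdc++ code, and the 16-byte limit is an
arbitrary stand-in for "not too large".

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for optional<T>; NOT the libstdc++ implementation.
// The copy constructor branches on the engaged flag, so it is user-provided.
template <typename T>
struct toy_optional {
    T value{};
    bool engaged = false;

    toy_optional() = default;
    toy_optional(const toy_optional& other) {
        if (other.engaged) {
            value = other.value;
            engaged = true;
        }
    }
};

// Hypothetical gate a library might apply to the payload type T of
// optional<T> before choosing an unconditional copy. The size limit
// of 16 bytes is an assumption made for this sketch.
template <typename T>
constexpr bool would_use_fast_path() {
    return std::is_trivially_copy_constructible<T>::value && sizeof(T) <= 16;
}

// Payload int: the gate passes, so toy_optional<int> could take the fast path.
static_assert(would_use_fast_path<int>(), "int payload qualifies");

// Payload toy_optional<int>: its copy constructor is user-provided, hence
// not trivial, so toy_optional<toy_optional<int>> falls back to the branch.
static_assert(!would_use_fast_path<toy_optional<int>>(),
              "nested optional payload does not qualify");
```

Unless the library also makes the inner optional trivially copyable, a
gate like this stops applying at the first level of nesting.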
>
> Optimizers could turn
>
> out.engaged=0
> if(in.engaged)
> out.engaged=1
>
> into out.engaged=in.engaged
>
> but the condition would still be there, and I don't see the optimizers
> introducing the extra reads/writes; that seems unlikely to be added.
>
That's a pity, because the extra writes would make it much faster.
The optimizers do feel free to write to padding holes, no? Clobbered
memory could be treated as a padding hole.
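
For concreteness, a hand-written sketch of the two forms under
discussion. It assumes a flat layout similar to what the disassembly
shows (4-byte payload, flag at offset 4, 8-byte stride). opt_int,
copy_branchy, copy_unconditional, same_observable_state and demo are
hypothetical names for illustration; this shows what the transformed
code could look like, not something gcc does today.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical flat layout resembling optional<int>: int payload followed
// by a one-byte flag and tail padding.
struct opt_int {
    int value;
    bool engaged;
};

constexpr std::size_t kCount = 100;

// Mirrors the loop in the disassembly: clear the flag, then copy the
// payload and set the flag only if the source element is engaged.
void copy_branchy(opt_int* out, const opt_int* in) {
    for (std::size_t i = 0; i < kCount; ++i) {
        out[i].engaged = false;
        if (in[i].engaged) {
            out[i].value = in[i].value;
            out[i].engaged = true;
        }
    }
}

// Unconditional form: also copies the payload bytes of disengaged
// elements, which nobody may observe afterwards.
void copy_unconditional(opt_int* out, const opt_int* in) {
    std::memcpy(out, in, kCount * sizeof(opt_int));
}

// Two arrays are equivalent if the flags match and the payloads of
// engaged elements match; payloads of disengaged elements are ignored.
bool same_observable_state(const opt_int* a, const opt_int* b) {
    for (std::size_t i = 0; i < kCount; ++i) {
        if (a[i].engaged != b[i].engaged) return false;
        if (a[i].engaged && a[i].value != b[i].value) return false;
    }
    return true;
}

bool demo() {
    // Value-initialized so this sketch never reads indeterminate bytes.
    opt_int in[kCount] = {};
    for (std::size_t i = 0; i < kCount; i += 3) {
        in[i].value = static_cast<int>(i);
        in[i].engaged = true;
    }
    opt_int a[kCount] = {};
    opt_int b[kCount] = {};
    copy_branchy(a, in);
    copy_unconditional(b, in);
    assert(same_observable_state(a, in));
    return same_observable_state(a, b);
}
```

Both functions leave the destination in the same observable state; the
second just has no data-dependent branch per element.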