[Bug libstdc++/80335] New: perf of copying std::optional<trivial>

glisse at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Wed Apr 5 23:42:00 GMT 2017


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80335

            Bug ID: 80335
           Summary: perf of copying std::optional<trivial>
           Product: gcc
           Version: 7.0.1
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

While profiling an application recently, I was surprised to see
boost::optional::assign relatively high (about 4%, even though I use it in
only one place). After checking, std::optional in libstdc++ appears to have
the same issue. Copy-assigning an optional<int> tests two conditions (is the
lhs initialized, is the rhs initialized), whereas a plain memcpy/memmove would
be perfectly safe for a trivial type like int (although sanitizers and
valgrind might occasionally complain about reading uninitialized memory).

#include <optional>
typedef std::optional<int> O;
void f1(O&a, O const&b){ a=b; }
void f2(O&a, O const&b){ __builtin_memmove(&a,&b,sizeof(O)); }

f1 compiles to

        cmpb    $0, 4(%rdi)
        movzbl  4(%rsi), %eax
        je      .L2
        testb   %al, %al
        jne     .L8
        movb    $0, 4(%rdi)
        ret
        .p2align 4,,10
        .p2align 3
.L2:
        testb   %al, %al
        je      .L9
        movl    (%rsi), %eax
        movb    $1, 4(%rdi)
        movl    %eax, (%rdi)
        ret
        .p2align 4,,10
        .p2align 3
.L9:
        ret
        .p2align 4,,10
        .p2align 3
.L8:
        movl    (%rsi), %eax
        movl    %eax, (%rdi)
        ret

whereas f2 compiles to

        movq    (%rsi), %rax
        movq    %rax, (%rdi)
        ret
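
The branches in the longer listing correspond to the four combinations of
"lhs initialized" and "rhs initialized". As a rough illustration only, here is
a simplified model of the two strategies. OptInt, assign_branchy and
assign_bytes are hypothetical names made up for this sketch; this is not
libstdc++'s actual implementation.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical simplified model of optional<int>; not libstdc++'s real layout.
struct OptInt {
    int value;     // payload, meaningful only when engaged is true
    bool engaged;  // plays the role of the flag byte at offset 4 in the listings
};

// Branching assignment: tests both flags before touching the payload.
inline void assign_branchy(OptInt& a, const OptInt& b) {
    if (a.engaged) {
        if (b.engaged) a.value = b.value;  // both initialized: assign payload
        else           a.engaged = false;  // only lhs initialized: reset it
    } else if (b.engaged) {
        a.value = b.value;                 // only rhs initialized: construct
        a.engaged = true;
    }                                      // neither initialized: nothing to do
}

// Bytewise assignment: copies payload and flag together, without any test.
inline void assign_bytes(OptInt& a, const OptInt& b) {
    std::memmove(&a, &b, sizeof(OptInt));
}
```

Both functions leave the destination in the same observable state. The second
one may also copy a payload that is not meaningful, which is harmless for int.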

I am wondering under what conditions we could implement copying this way:
- the type is small enough: we don't want an expensive copy for an empty
optional
- the type is "trivial": no need to run the destructor, copy assignment, etc
- no sanitizer or valgrind is in use: that cannot really be tested...
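
The first two conditions can be checked at compile time. A minimal sketch,
assuming C++17: Slot, kSmallLimit, can_copy_bytes, copy_slot, store_value and
load_value are all hypothetical names, and the 16-byte cutoff is an arbitrary
placeholder that would need measuring. The sanitizer/valgrind condition is not
addressed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Assumed cutoff for "small enough"; purely illustrative.
constexpr std::size_t kSmallLimit = 16;

// True when T is trivially copyable and small enough to copy unconditionally.
template <class T>
constexpr bool can_copy_bytes =
    std::is_trivially_copyable_v<T> && sizeof(T) <= kSmallLimit;

// Hypothetical optional-like holder, restricted to trivially copyable T.
template <class T>
struct Slot {
    static_assert(std::is_trivially_copyable_v<T>, "sketch covers trivial T only");
    alignas(T) unsigned char storage[sizeof(T)];
    bool engaged = false;
};

template <class T>
void store_value(Slot<T>& s, const T& v) {
    std::memcpy(s.storage, &v, sizeof(T));
    s.engaged = true;
}

template <class T>
T load_value(const Slot<T>& s) {
    T v;
    std::memcpy(&v, s.storage, sizeof(T));
    return v;
}

template <class T>
void copy_slot(Slot<T>& dst, const Slot<T>& src) {
    if constexpr (can_copy_bytes<T>) {
        // Small trivial type: copy payload and flag together, no branches.
        std::memmove(&dst, &src, sizeof(Slot<T>));
    } else {
        // Larger type: skip the payload copy when the source is empty.
        dst.engaged = src.engaged;
        if (src.engaged) std::memmove(dst.storage, src.storage, sizeof(T));
    }
}
```

With these definitions, copy_slot on a Slot<int> takes the branch-free path,
while a Slot<char[64]> would take the path that tests the source flag first.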

