Bug 60621 - std::vector::emplace_back generates massively more code than push_back
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++
Version: 4.7.2
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-03-22 23:32 UTC by Marc Mutz
Modified: 2016-07-12 11:19 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
Source of the programme used to generate the mentioned numbers (212 bytes, text/x-c++src)
2014-03-22 23:32 UTC, Marc Mutz
New version of the test programme. (250 bytes, text/x-c++src)
2015-02-11 11:13 UTC, Marc Mutz
New version of marc's code (323 bytes, text/x-csrc)
2015-07-22 13:53 UTC, julien.blanc

Description Marc Mutz 2014-03-22 23:32:26 UTC
Created attachment 32429
Source of the programme used to generate the mentioned numbers

Compiling the attached program on Linux AMD64 with the following command lines:

    $ g++ -O2 -std=c++11 -o emplace-vs-push_back{.pb,.cpp}
    $ g++ -O2 -std=c++11 -o emplace-vs-push_back{.eb,.cpp} -DEMPLACE_BACK

and stripping the resulting executables:

    $ strip emplace-vs-push_back.*

I get the following sizes:

    $ size emplace-vs-push_back.*
       text    data     bss     dec     hex filename
       5570     696      40    6306    18a2 emplace-vs-push_back.eb
       4338     672      40    5050    13ba emplace-vs-push_back.pb

In other words, the emplace_back version generates roughly 1.2K more text (code).

This is surprising, since functionally, emplace_back is the same as push_back(S&&), except that it saves one move ctor and one dtor call due to in-place construction. This should result in _less_ code generated, not more.
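
The expected difference can be made visible with a small instrumented type. The sketch below is not the attached programme; `Tracked`, `Counts` and the two `measure_*` helpers are hypothetical names introduced here for illustration only. It counts the special member calls made while appending one element to a vector whose capacity was reserved beforehand, so no reallocation interferes.

```cpp
#include <utility>
#include <vector>

// Hypothetical call counters (an assumption, not part of the bug report).
struct Counts {
    int ctor = 0;
    int move_ctor = 0;
    int dtor = 0;
};

// Hypothetical move-only type that records which special members run.
struct Tracked {
    static Counts counts;
    int value;
    explicit Tracked(int v) : value(v) { ++counts.ctor; }
    Tracked(Tracked&& other) noexcept : value(other.value) { ++counts.move_ctor; }
    Tracked(const Tracked&) = delete;
    ~Tracked() { ++counts.dtor; }
};
Counts Tracked::counts;

// Appends one element via push_back(T&&) and returns the counts
// observed before the vector itself is destroyed.
Counts measure_push_back() {
    Tracked::counts = Counts();
    Counts snapshot;
    {
        std::vector<Tracked> v;
        v.reserve(1);              // avoid reallocation during the append
        v.push_back(Tracked(1));   // temporary, then move into storage
        snapshot = Tracked::counts;
    }
    return snapshot;
}

// Same as above, but constructs the element in place.
Counts measure_emplace_back() {
    Tracked::counts = Counts();
    Counts snapshot;
    {
        std::vector<Tracked> v;
        v.reserve(1);
        v.emplace_back(1);         // constructed directly in the vector's storage
        snapshot = Tracked::counts;
    }
    return snapshot;
}
```

Under these assumptions, the push_back path costs one constructor, one move constructor and one destructor call, while the emplace_back path costs a single constructor call.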

   $ g++ -v
   Using built-in specs.
   COLLECT_GCC=g++
   COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.7/lto-wrapper
   Target: x86_64-linux-gnu
   Configured with: ../src/configure -v --with-pkgversion='Debian 4.7.2-5' --with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs --enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.7 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --with-arch-32=i586 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
   Thread model: posix
   gcc version 4.7.2 (Debian 4.7.2-5)
Comment 1 Marc Glisse 2014-03-23 08:24:07 UTC
Some things that help:
-fabi-version=0
-fwhole-program (so it knows emplace_back won't be used anywhere else, and it can inline it and remove the unneeded paths)
Comment 2 Marc Mutz 2014-03-23 11:26:46 UTC
Yes, that helps a bit, but emplace_back still generates larger code than the corresponding rvalue-push_back. Considering that the latter also needs to generate the implicitly defined move ctor for S, this is still somewhat surprising and runs counter to the motivation to have emplace_back in the first place.
Comment 3 Marc Mutz 2015-02-11 11:12:46 UTC
Now, what is _really_ weird is that push_back(T&&) _calls_ emplace_back(). I also tried the magic incantation

   g++ --param large-unit-insns=100000000 \
       --param inline-unit-growth=100000000 \
       --param max-inline-insns-single=100000000 \
       --param large-function-growth=100000000 \
       --param large-function-insns=100000000 -O2

to no avail. I can get the two versions to within 80 bytes of text of each other by adding -fno-exceptions, so it's probably related to exception handling. The (implicit) move ctor of S cannot throw, but the std::string(const char*) ctor can. I.e., in the rvalue-push_back case, emplace_back only deals with noexcept operations, whereas in the 3x const char* case it needs to deal with three throwing ctors.
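
The noexcept asymmetry described above can be checked at compile time with type traits. This is a minimal sketch: the definition of `S` is an assumption (three std::string members with a by-value constructor), since the attachment is not reproduced here, and the helper function names are hypothetical.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the test type (an assumption).
struct S {
    std::string a, b, c;
    S(std::string a_, std::string b_, std::string c_)
        : a(std::move(a_)), b(std::move(b_)), c(std::move(c_)) {}
};

// The implicitly defined move ctor of S only moves std::strings.
constexpr bool move_is_nothrow() {
    return std::is_nothrow_move_constructible<S>::value;
}

// Building S from three const char* converts each one to std::string first.
constexpr bool from_literals_is_nothrow() {
    return std::is_nothrow_constructible<S, const char*, const char*, const char*>::value;
}

// The std::string(const char*) ctor allocates and is not declared noexcept.
constexpr bool string_from_literal_is_nothrow() {
    return std::is_nothrow_constructible<std::string, const char*>::value;
}

static_assert(move_is_nothrow(), "moving S is expected to be noexcept");
static_assert(!from_literals_is_nothrow(), "constructing S from literals may throw");
static_assert(!string_from_literal_is_nothrow(), "std::string(const char*) may throw");
```

If these assertions hold, the emplace_back instantiation taking three const char* arguments has to carry exception-handling code that the rvalue push_back path does not need.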

I can reduce the text size to within a few hundred bytes by marking both emplace_back and _M_emplace_back_aux as __attribute__((always_inline)), so something prevents gcc from inlining even when turning the inlining parameters all the way up.
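
For readers unfamiliar with the attribute, the following sketch shows its syntax on a user-level function. It does not reproduce the experiment above, which marked the library's own emplace_back and _M_emplace_back_aux; `append_inlined`, `append_three_literals` and the definition of `S` are hypothetical and only illustrate how the GCC-specific attribute is spelled.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the test type (an assumption).
struct S {
    std::string a, b, c;
    S(std::string a_, std::string b_, std::string c_)
        : a(std::move(a_)), b(std::move(b_)), c(std::move(c_)) {}
};

// GCC-specific attribute: asks the compiler to inline this wrapper
// regardless of its inlining heuristics. Only the wrapper is affected,
// not the library functions it calls.
template <typename... Args>
__attribute__((always_inline)) inline void append_inlined(std::vector<S>& v, Args&&... args) {
    v.emplace_back(std::forward<Args>(args)...);
}

// Appends one element through the wrapper and reports the vector size.
std::size_t append_three_literals() {
    std::vector<S> v;
    append_inlined(v, "a", "b", "c");
    return v.size();
}
```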

I can also reduce the text size by passing std::strings instead of const char*s:

   text    data     bss     dec     hex filename
   5628     672      40    6340    18c4 emplace-vs-push_back.eb
   4991     672      40    5703    1647 emplace-vs-push_back.nt
   4516     648      40    5204    1454 emplace-vs-push_back.pb

(where .nt is EMPLACE_BACK_NOTHROW). Still a large gap...
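
The three variants compared in the table can be pictured as one source file with the call selected by a macro. This is a sketch under assumptions, not the attached programme: the definition of `S`, the one-character literals and the `append_one` helper are hypothetical, and whether the original passed temporaries or named strings in the EMPLACE_BACK_NOTHROW case is not known from the report.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the test type (an assumption).
struct S {
    std::string a, b, c;
    S(std::string a_, std::string b_, std::string c_)
        : a(std::move(a_)), b(std::move(b_)), c(std::move(c_)) {}
};

// Appends one element using the variant chosen at compile time.
std::size_t append_one() {
    std::vector<S> v;
#if defined(EMPLACE_BACK)
    // .eb: the std::strings are built from const char* inside emplace_back
    v.emplace_back("a", "b", "c");
#elif defined(EMPLACE_BACK_NOTHROW)
    // .nt: the std::strings already exist when emplace_back is entered
    v.emplace_back(std::string("a"), std::string("b"), std::string("c"));
#else
    // .pb: a temporary S is built first, then moved into the vector
    v.push_back(S("a", "b", "c"));
#endif
    return v.size();
}
```

Calling `append_one()` from `main` and compiling once per macro would give three executables analogous to the .eb, .nt and .pb files above.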

Have we accepted another auto_ptr into the standard? :)
Comment 4 Marc Mutz 2015-02-11 11:13:54 UTC
Created attachment 34723
New version of the test programme.
Comment 5 julien.blanc 2015-07-22 13:52:40 UTC
Testing a bit, it really looks like the issue resides in how and where the temporary string objects are created.

Changing Marc's code to have

struct S {
   S(const char* a, const char * b, const char *c);
};

makes it revert to only 0.3k more text (which can be explained by the fact that two emplace_back function instantiations are needed instead of one), with better insertion performance (insertion time is worse otherwise, which defeats the purpose of emplace_back).

The same goes if the strings are constructed before being passed to the S constructor.
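
The two variations described in this comment can be sketched as follows. This is not the attachment; `SFromPointers`, `SFromStrings` and both helper functions are hypothetical names, and the member layout is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Variation 1 (assumed layout): the constructor takes const char*,
// so the std::strings are created inside S's constructor.
struct SFromPointers {
    std::string a, b, c;
    SFromPointers(const char* a_, const char* b_, const char* c_)
        : a(a_), b(b_), c(c_) {}
};

// Variation 2 (assumed layout): the constructor takes std::string,
// and the caller builds the strings before calling emplace_back.
struct SFromStrings {
    std::string a, b, c;
    SFromStrings(std::string a_, std::string b_, std::string c_)
        : a(std::move(a_)), b(std::move(b_)), c(std::move(c_)) {}
};

// Literals are forwarded as they are; no temporary std::string is created
// at the call site.
std::size_t emplace_via_pointer_ctor() {
    std::vector<SFromPointers> v;
    v.emplace_back("a", "b", "c");
    return v.size();
}

// The strings exist before the call; emplace_back only forwards them.
std::size_t emplace_prebuilt_strings() {
    const std::string a("a"), b("b"), c("c");
    std::vector<SFromStrings> v;
    v.emplace_back(a, b, c);
    return v.size();
}
```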

(see new attachment).

Looks like an optimizer issue to me.

(Note: tested with GCC 4.9.2.)
Comment 6 julien.blanc 2015-07-22 13:53:36 UTC
Created attachment 36032
New version of marc's code
Comment 7 Marc Mutz 2016-07-07 08:04:05 UTC
Retesting with GCC 6.1, it looks better now:

  $ g++ -O2 -o emplace-vs-push_back{.pb,.cpp}
  $ g++ -O2 -o emplace-vs-push_back{.eb,.cpp} -DEMPLACE_BACK
  $ strip emplace-vs-push_back.*
  $ size emplace-vs-push_back.*
   text    data     bss     dec     hex filename
   4474     680       8    5162    142a emplace-vs-push_back.eb
   4830     680       8    5518    158e emplace-vs-push_back.nt
   5083     656       8    5747    1673 emplace-vs-push_back.pb

somewhat at the expense of pessimising push_back(), which used to be about 500 bytes smaller in Comment 3, but at least the relations between emplace_back and push_back, and between emplace_back(char[2], char[2], char[2]) and emplace_back(std::string, std::string, std::string), are now as expected.
Comment 8 Jonathan Wakely 2016-07-12 11:19:14 UTC
Using the code in comment 6, with 4.9.3, 5.3.0, 6.1.0 and recent 7.0 trunk:

   text    data     bss     dec     hex filename
   5606     696      40    6342    18c6 493.eb
   4943     696      40    5679    162f 493.nt
   4476     672      40    5188    1444 493.pb
   4609     676       4    5289    14a9 530.eb
   4881     704       8    5593    15d9 530.nt
   4996     652       4    5652    1614 530.pb
   4527     704       8    5239    1477 610.eb
   4729     704       8    5441    1541 610.nt
   4974     680       8    5662    161e 610.pb
   4960     704       8    5672    1628 700.eb
   5037     696       8    5741    166d 700.nt
   5234     672       8    5914    171a 700.pb