Created attachment 32429
Source of the programme used to generate the mentioned numbers

Compiling the attached program on Linux AMD64 with the following command lines:

  $ g++ -O2 -std=c++11 -o emplace-vs-push_back{.pb,.cpp}
  $ g++ -O2 -std=c++11 -o emplace-vs-push_back{.eb,.cpp} -DEMPLACE_BACK

and stripping the resulting executables:

  $ strip emplace-vs-push_back.*

I get the following sizes:

  $ size emplace-vs-push_back.*
     text    data     bss     dec     hex filename
     5570     696      40    6306    18a2 emplace-vs-push_back.eb
     4338     672      40    5050    13ba emplace-vs-push_back.pb

IOW: the emplace_back version generates roughly 1K more text (code).

This is surprising, since functionally, emplace_back is the same as push_back(S&&), except that it saves one move ctor and one dtor call due to in-place construction. This should result in _less_ code generated, not more.

  $ g++ -v
  Using built-in specs.
  COLLECT_GCC=g++
  COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.7/lto-wrapper
  Target: x86_64-linux-gnu
  Configured with: ../src/configure -v --with-pkgversion='Debian 4.7.2-5' --with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs --enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.7 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --with-arch-32=i586 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
  Thread model: posix
  gcc version 4.7.2 (Debian 4.7.2-5)
Some things that help:
  -fabi-version=0
  -fwhole-program (so it knows emplace_back won't be used anywhere else, and it can inline it and remove the unneeded paths)
Yes, that helps a bit, but emplace_back still generates larger code than the corresponding rvalue-push_back. Considering that the latter also needs to generate the implicitly defined move ctor for S, this is still somewhat surprising and runs counter to the motivation to have emplace_back in the first place.
Now, what is _really_ weird is that push_back(T&&) _calls_ emplace_back().

I also tried the magic incantation

  g++ --param large-unit-insns=100000000 \
      --param inline-unit-growth=100000000 \
      --param max-inline-insns-single=100000000 \
      --param large-function-growth=100000000 \
      --param large-function-insns=100000000 \
      -O2

to no avail.

I can get the two versions to within 80 bytes of text of each other by adding -fno-exceptions, so it's probably related to that. The (implicit) move ctor of S cannot throw, but the std::string(const char*) ctor can. I.e., in the rvalue-push_back case, emplace_back only deals with noexcept operations, while in the 3x const char* case, it needs to deal with three throwing ctors.

I can reduce the text size to within a few hundred bytes by marking both emplace_back and _M_emplace_back_aux as __attribute__((always_inline)), so something prevents gcc from inlining even when turning the inlining parameters all the way up.

I can also reduce the text size by passing std::strings instead of const char*s:

     text    data     bss     dec     hex filename
     5628     672      40    6340    18c4 emplace-vs-push_back.eb
     4991     672      40    5703    1647 emplace-vs-push_back.nt
     4516     648      40    5204    1454 emplace-vs-push_back.pb

(where .nt is EMPLACE_BACK_NOTHROW). Still a large gap...

Have we accepted another auto_ptr into the standard? :)
Created attachment 34723
New version of the test programme.
Testing a bit, it really looks like the issue lies in how and where the temporary string objects are created. Changing Marc's code to have

  struct S { S(const char* a, const char * b, const char *c); };

brings it back to only 0.3k more text (which can be explained by two emplace_back function instantiations being needed vs. one), with better insertion performance (insertion time is worse otherwise, which defeats the purpose of emplace_back). The same goes if the strings are constructed before being passed to the S constructor (see new attachment).

Looks like an optimizer issue to me.

(Note: tested with gcc 4.9.2.)
Created attachment 36032
New version of marc's code
Retesting with GCC 6.1, it looks better now:

  $ g++ -O2 -o emplace-vs-push_back{.pb,.cpp}
  $ g++ -O2 -o emplace-vs-push_back{.eb,.cpp} -DEMPLACE_BACK
  $ strip emplace-vs-push_back.*
  $ size emplace-vs-push_back.*
     text    data     bss     dec     hex filename
     4474     680       8    5162    142a emplace-vs-push_back.eb
     4830     680       8    5518    158e emplace-vs-push_back.nt
     5083     656       8    5747    1673 emplace-vs-push_back.pb

somewhat at the expense of pessimising push_back(), which used to be about 500 bytes smaller in Comment 3, but at least the relation between emplace_back and push_back, and between emplace_back(char[2], char[2], char[2]) and emplace_back(std::string, std::string, std::string), are now as expected.
Using the code in comment 6, with 4.9.3, 5.3.0, 6.1.0 and recent 7.0 trunk:

     text    data     bss     dec     hex filename
     5606     696      40    6342    18c6 493.eb
     4943     696      40    5679    162f 493.nt
     4476     672      40    5188    1444 493.pb
     4609     676       4    5289    14a9 530.eb
     4881     704       8    5593    15d9 530.nt
     4996     652       4    5652    1614 530.pb
     4527     704       8    5239    1477 610.eb
     4729     704       8    5441    1541 610.nt
     4974     680       8    5662    161e 610.pb
     4960     704       8    5672    1628 700.eb
     5037     696       8    5741    166d 700.nt
     5234     672       8    5914    171a 700.pb