This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug c++/77456] New: Suboptimal code when returning a string generated with a constexpr fn at compile time
- From: "petschy at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 02 Sep 2016 14:19:36 +0000
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77456
Bug ID: 77456
Summary: Suboptimal code when returning a string generated with
a constexpr fn at compile time
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: petschy at gmail dot com
Target Milestone: ---
Created attachment 39541
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39541&action=edit
C++ source
I ran into this when converting expression trees to strings at compile time.
Though it's surely a rare application, the fix might have positive impact on a
wider range of scenarios.
The attached code converts the integers [0..N] to a string at compile time.
There are several conversions with differing N's. Also, some conversions
calculate the exact size of the resulting strings, others just use a large
enough buffer.
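For readers without the attachment, here is a minimal sketch of the kind of
code being described. It is not the attached source: the names FixBuf,
append_uint, fixbuf_sketch and foo_sketch are made up for illustration, and the
space-separated output format is inferred from the byte values in the
disassembly further down.

```cpp
#include <cstddef>

// Hypothetical stand-in for the attachment's buffer type (an assumption).
template <std::size_t Cap>
struct FixBuf {
    char data[Cap] = {};   // zero-initialized: this is the "initial zero fill"
    std::size_t len = 0;

    constexpr void push(char c) { data[len++] = c; }
};

// Append the decimal digits of value to buf.
template <std::size_t Cap>
constexpr void append_uint(FixBuf<Cap>& buf, unsigned value) {
    char tmp[10] = {};
    int n = 0;
    do {                               // collect digits, least significant first
        tmp[n++] = char('0' + value % 10);
        value /= 10;
    } while (value != 0);
    while (n > 0) buf.push(tmp[--n]);  // emit them in the right order
}

// Convert the integers [0..N] to "0 1 2 ... N" in a Cap-byte buffer.
template <unsigned N, std::size_t Cap>
constexpr FixBuf<Cap> fixbuf_sketch() {
    FixBuf<Cap> buf{};
    for (unsigned i = 0; i <= N; ++i) {
        if (i != 0) buf.push(' ');
        append_uint(buf, i);
    }
    buf.data[buf.len] = '\0';          // explicit terminator after the last digit
    return buf;
}

// The shape of foo(): evaluate at compile time, return the buffer by value.
FixBuf<200> foo_sketch() {
    constexpr auto x = fixbuf_sketch<13, 200>();
    return x;
}
```

With N = 13 the string is 31 characters long, so only the first 32 bytes of the
200 byte buffer carry data and the rest stays zero.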
Platform is Debian Jessie, x86-64. Tested w/ 6.x and 7.0. To compile:
g++ -std=c++14 -Wall -Wextra -O3 20160831-constexpr.cpp
Please be patient, this takes almost 30 secs on my machine (AMD FX 8150 @
4GHz), due to lots of compile-time constexpr work.
foo(): [0..13] w/ a 200 byte buffer. It seems that the initial zero fill of the
buffer is not considered in dead-store elimination, so the 200 bytes are rep
stos'd, then the actual characters are copied via xmm0 and bytewise literal
stores:
Dump of assembler code for function _Z3foov:
0x0000000000400620 <+0>: mov %rdi,%rdx
0x0000000000400623 <+3>: movq $0x0,0xc0(%rdi)
0x000000000040062e <+14>: lea 0x8(%rdi),%rdi
0x0000000000400632 <+18>: mov %rdx,%rcx
0x0000000000400635 <+21>: movdqa 0x27033(%rip),%xmm0 # 0x427670
0x000000000040063d <+29>: and $0xfffffffffffffff8,%rdi
0x0000000000400641 <+33>: xor %eax,%eax
0x0000000000400643 <+35>: sub %rdi,%rcx
0x0000000000400646 <+38>: add $0xc8,%ecx
0x000000000040064c <+44>: shr $0x3,%ecx
0x000000000040064f <+47>: rep stos %rax,%es:(%rdi)
0x0000000000400652 <+50>: movups %xmm0,(%rdx)
0x0000000000400655 <+53>: movb $0x38,0x10(%rdx)
0x0000000000400659 <+57>: movb $0x20,0x11(%rdx)
0x000000000040065d <+61>: mov %rdx,%rax
0x0000000000400660 <+64>: movb $0x39,0x12(%rdx)
0x0000000000400664 <+68>: movb $0x20,0x13(%rdx)
0x0000000000400668 <+72>: movb $0x31,0x14(%rdx)
0x000000000040066c <+76>: movb $0x30,0x15(%rdx)
0x0000000000400670 <+80>: movb $0x20,0x16(%rdx)
0x0000000000400674 <+84>: movb $0x31,0x17(%rdx)
0x0000000000400678 <+88>: movb $0x31,0x18(%rdx)
0x000000000040067c <+92>: movb $0x20,0x19(%rdx)
0x0000000000400680 <+96>: movb $0x31,0x1a(%rdx)
0x0000000000400684 <+100>: movb $0x32,0x1b(%rdx)
0x0000000000400688 <+104>: movb $0x20,0x1c(%rdx)
0x000000000040068c <+108>: movb $0x31,0x1d(%rdx)
0x0000000000400690 <+112>: movb $0x33,0x1e(%rdx)
0x0000000000400694 <+116>: retq
Since the buffer is larger than the string, all the movb's could have been
replaced by a second xmm0 load+store. An explicit zero byte is written in the
C++ code after the last digit, but there is no "movb $0x00,0x1f(%rdx)" in the
disassembly above, so the compiler did eliminate that one store. What it did
not do is merge the 16 single-byte stores (the 15 movb's plus the terminating
zero) into a single xmm0 operation and skip the first 32 bytes in the rep stos.
foo_sized() generates the same string, but first it calculates the needed size.
There is no zero fill here in the asm, so it was successfully eliminated, and
the characters are initialized via two xmm0 loads/stores, as expected:
Dump of assembler code for function _Z9foo_sizedv:
0x00000000004006a0 <+0>: movdqa 0x26fc8(%rip),%xmm0 # 0x427670
0x00000000004006a8 <+8>: mov %rdi,%rax
0x00000000004006ab <+11>: movups %xmm0,(%rdi)
0x00000000004006ae <+14>: movdqa 0x26fca(%rip),%xmm0 # 0x427680
0x00000000004006b6 <+22>: movups %xmm0,0x10(%rdi)
0x00000000004006ba <+26>: retq
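The two 16 byte stores match the exact size of the string. A rough sketch of
how such a size could be computed at compile time follows; digit_count and
needed_size are hypothetical names, not taken from the attachment, and the
"one space between numbers" layout is an assumption based on the byte values
in the foo() disassembly.

```cpp
#include <cstddef>

// Number of decimal digits in v (hypothetical helper).
constexpr std::size_t digit_count(unsigned v) {
    std::size_t n = 1;
    while (v >= 10) {
        v /= 10;
        ++n;
    }
    return n;
}

// Bytes needed for "0 1 2 ... N" including the terminating zero.
template <unsigned N>
constexpr std::size_t needed_size() {
    std::size_t total = 0;
    for (unsigned i = 0; i <= N; ++i) total += digit_count(i);
    return total + N + 1;  // N separating spaces plus one terminator
}

// For [0..13]: 18 digits + 13 spaces + 1 terminator = 32 bytes,
// i.e. exactly two 16 byte xmm stores.
static_assert(needed_size<13>() == 32, "size of the [0..13] string");
```

A buffer declared with this size has no trailing bytes left over, which may be
why no zero fill shows up in the foo_sized() code.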
bar/bar_sized/bar_static/bar_sized_static(): the same as foo, but the range is
[0..42], and the static versions use a static constexpr, and return the buffer
pointer, not the buffer by value.
bar() zero fills and then copies over with xmm0 and byte literals. bar_sized()
lacks the zero fill, but initializes the characters the same way. The static
versions just return a pointer as expected.
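To make the by-value vs. static distinction concrete, here is a small sketch
with made-up names (SmallBuf, make_digits, digits_by_value, digits_static); it
only illustrates the two return styles and is not the attachment's code.

```cpp
#include <cstddef>

struct SmallBuf { char data[8]; };

// Fill the buffer at compile time with "0123456" plus a terminator.
constexpr SmallBuf make_digits() {
    SmallBuf b{};                       // value-initialized, all zero
    for (std::size_t i = 0; i < 7; ++i) b.data[i] = char('0' + i);
    return b;                           // data[7] stays '\0'
}

// By-value style (like foo/bar): the caller's buffer has to be filled
// at run time, so the compiler must emit stores or a copy.
SmallBuf digits_by_value() {
    constexpr SmallBuf x = make_digits();
    return x;
}

// Static style (like the *_static variants): the object has static storage,
// so only its address needs to be returned.
const char* digits_static() {
    static constexpr SmallBuf x = make_digits();
    return x.data;
}
```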
baz_sized() works as expected: since the memory to copy is large, it calls
memcpy instead of doing the above xmm0 + literal bytes stuff.
The problem is with baz(). The range is much larger, [0..4200]. There is no DSE
here either, so the buffer is first zeroed with memset, but then ALL the
characters are initialized via bytewise literal stores, resulting in a very
large function, around 138K. Why didn't the logic kick in that replaces the
in-place init with memcpy, as it did for baz_sized()? Or, at least, much of the
copy could have been done with xmm0, copying 16 bytes at once.
One more thing: if you disable the return in fixbuf() via setting the #if 1 to
0 at line 76, interesting things will happen:
6.2.1:
g++-6.2.1 -std=c++14 -Wall -Wextra -O3 20160831-constexpr.cpp
‘
20160831-constexpr.cpp:83: confused by earlier errors, bailing out
In a terminal window with black bg and gray font, the single quote is gray,
then the error message on the next line is bold white, and it stays so, so
anything I type after this will be bold white.
7.0.0: an earlier version did the same, but two days ago I built a fresh
version and now it crashes:
g++-7.0.0 -std=c++14 -Wall -Wextra -O3 20160831-constexpr.cpp
‘
In function ‘auto foo()’:
Segmentation fault
constexpr auto x = fixbuf<13, 200>();
^
Please submit a full bug report,
The exact gcc versions used:
$ g++-6.2.1 -v
Using built-in specs.
COLLECT_GCC=g++-6.2.1
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/6.2.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++ --disable-multilib
--program-suffix=-6.2.1 --disable-bootstrap CFLAGS='-O2 -march=native'
CXXFLAGS='-O2 -march=native'
Thread model: posix
gcc version 6.2.1 20160831 (GCC)
git b823cdd4ccc1499a674e3863ce875c7459207727
$ g++-7.0.0 -v
Using built-in specs.
COLLECT_GCC=g++-7.0.0
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/7.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++ --disable-multilib
--program-suffix=-7.0.0 --disable-bootstrap CFLAGS='-O2 -march=native'
CXXFLAGS='-O2 -march=native'
Thread model: posix
gcc version 7.0.0 20160831 (experimental) (GCC)
git 14c36b15d931bf299bbc214707b903d0af124449