This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug c++/77456] New: Suboptimal code when returning a string generated with a constexpr fn at compile time


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77456

            Bug ID: 77456
           Summary: Suboptimal code when returning a string generated with
                    a constexpr fn at compile time
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: petschy at gmail dot com
  Target Milestone: ---

Created attachment 39541
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39541&action=edit
C++ source

I ran into this when converting expression trees to strings at compile time.
Though it's surely a rare application, the fix might have positive impact on a
wider range of scenarios.

The attached code converts the integers [0..N] to a string at compile time.
There are several conversions with differing N's. Also, some conversions
calculate the exact size of the resulting strings, others just use a large
enough buffer.
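
The attachment is not reproduced in this message, so what follows is only a
minimal sketch of the kind of C++14 constexpr conversion described above, not
the attached source. The names Buf, append_uint, fixbuf_sketch and foo_sketch
are hypothetical. The space-separated layout is an assumption inferred from the
bytes visible in the foo() disassembly ('8' at offset 0x10, ' ' at 0x11, and so
on).

```cpp
#include <cstddef>

// Hypothetical fixed-capacity buffer. The default member initializers
// zero-fill the whole array, which is the "initial zero fill" the
// optimizer later has to see through.
template <std::size_t Size>
struct Buf {
    char data[Size] = {};
    std::size_t len = 0;
};

// Append the decimal digits of v to the buffer (C++14 relaxed constexpr).
template <std::size_t Size>
constexpr void append_uint(Buf<Size>& b, unsigned v) {
    char tmp[16] = {};
    std::size_t n = 0;
    do {                                   // collect digits, least significant first
        tmp[n++] = static_cast<char>('0' + v % 10);
        v /= 10;
    } while (v != 0);
    while (n != 0) b.data[b.len++] = tmp[--n];   // emit them in reverse
}

// Assumed shape of the conversion: "0 1 2 ... N" in a Size-byte buffer.
template <unsigned N, std::size_t Size>
constexpr Buf<Size> fixbuf_sketch() {
    Buf<Size> b{};
    for (unsigned i = 0; i <= N; ++i) {
        if (i != 0) b.data[b.len++] = ' ';
        append_uint(b, i);
    }
    b.data[b.len] = '\0';                  // explicit terminator after the last digit
    return b;
}

// Returns the precomputed buffer by value, as foo() is described to do.
Buf<200> foo_sketch() {
    constexpr auto x = fixbuf_sketch<13, 200>();
    return x;
}
```

For N = 13 this produces 31 characters plus the terminator, i.e. exactly 32
meaningful bytes at the start of the 200 byte buffer.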

Platform is Debian Jessie, x86-64. Tested w/ 6.x and 7.0. To compile:
g++ -std=c++14 -Wall -Wextra -O3 20160831-constexpr.cpp

Please be patient, this takes almost 30 secs on my machine (AMD FX 8150 @
4GHz), due to lots of compile-time constexpr work.

foo(): [0..13] w/ a 200 byte buffer. It seems that the initial zero fill of the
buffer is not considered in dead-store elimination, so the 200 bytes are rep
stos'd, then the actual characters are copied via xmm0 and bytewise literal
stores:

Dump of assembler code for function _Z3foov:
   0x0000000000400620 <+0>:     mov    %rdi,%rdx
   0x0000000000400623 <+3>:     movq   $0x0,0xc0(%rdi)
   0x000000000040062e <+14>:    lea    0x8(%rdi),%rdi
   0x0000000000400632 <+18>:    mov    %rdx,%rcx
   0x0000000000400635 <+21>:    movdqa 0x27033(%rip),%xmm0        # 0x427670
   0x000000000040063d <+29>:    and    $0xfffffffffffffff8,%rdi
   0x0000000000400641 <+33>:    xor    %eax,%eax
   0x0000000000400643 <+35>:    sub    %rdi,%rcx
   0x0000000000400646 <+38>:    add    $0xc8,%ecx
   0x000000000040064c <+44>:    shr    $0x3,%ecx
   0x000000000040064f <+47>:    rep stos %rax,%es:(%rdi)
   0x0000000000400652 <+50>:    movups %xmm0,(%rdx)
   0x0000000000400655 <+53>:    movb   $0x38,0x10(%rdx)
   0x0000000000400659 <+57>:    movb   $0x20,0x11(%rdx)
   0x000000000040065d <+61>:    mov    %rdx,%rax
   0x0000000000400660 <+64>:    movb   $0x39,0x12(%rdx)
   0x0000000000400664 <+68>:    movb   $0x20,0x13(%rdx)
   0x0000000000400668 <+72>:    movb   $0x31,0x14(%rdx)
   0x000000000040066c <+76>:    movb   $0x30,0x15(%rdx)
   0x0000000000400670 <+80>:    movb   $0x20,0x16(%rdx)
   0x0000000000400674 <+84>:    movb   $0x31,0x17(%rdx)
   0x0000000000400678 <+88>:    movb   $0x31,0x18(%rdx)
   0x000000000040067c <+92>:    movb   $0x20,0x19(%rdx)
   0x0000000000400680 <+96>:    movb   $0x31,0x1a(%rdx)
   0x0000000000400684 <+100>:   movb   $0x32,0x1b(%rdx)
   0x0000000000400688 <+104>:   movb   $0x20,0x1c(%rdx)
   0x000000000040068c <+108>:   movb   $0x31,0x1d(%rdx)
   0x0000000000400690 <+112>:   movb   $0x33,0x1e(%rdx)
   0x0000000000400694 <+116>:   retq   

Since the buffer is larger than the string, all the movb's could have been
merged into another xmm0 load+store. An explicit zero byte is written in the
C++ code after the last digit, but it is missing from the disassembly above:
there is no "movb $0x00, 0x1f(%rdx)" at the end. So the compiler eliminated
only this one store, presumably as redundant with the zero fill, instead of
merging all 16 bytes (the 15 movb's plus the terminator) into a single xmm0
operation and skipping the first 32 bytes in the rep stos.

foo_sized() generates the same string, but first it calculates the needed size.
There is no zero fill here in the asm, so it was successfully eliminated, and
the characters are initialized via two xmm0 loads/stores, as expected:

Dump of assembler code for function _Z9foo_sizedv:
   0x00000000004006a0 <+0>:     movdqa 0x26fc8(%rip),%xmm0        # 0x427670
   0x00000000004006a8 <+8>:     mov    %rdi,%rax
   0x00000000004006ab <+11>:    movups %xmm0,(%rdi)
   0x00000000004006ae <+14>:    movdqa 0x26fca(%rip),%xmm0        # 0x427680
   0x00000000004006b6 <+22>:    movups %xmm0,0x10(%rdi)
   0x00000000004006ba <+26>:    retq   
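
A sized variant like foo_sized() could be sketched as follows. This is an
assumption about the approach, not the attached code: digits, needed_size,
SizedBuf, sized_sketch and foo_sized_sketch are hypothetical names. The size is
computed in a first constexpr pass and then used as the array bound, so there
are no unused trailing bytes that need a zero fill.

```cpp
#include <cstddef>

// Number of decimal digits in v.
constexpr std::size_t digits(unsigned v) {
    std::size_t n = 1;
    while (v >= 10) { v /= 10; ++n; }
    return n;
}

// Exact byte count for "0 1 2 ... n" including the terminating zero.
constexpr std::size_t needed_size(unsigned n) {
    std::size_t total = 1;                       // terminator
    for (unsigned i = 0; i <= n; ++i)
        total += digits(i) + (i != 0 ? 1 : 0);   // digits plus a separating space
    return total;
}

template <std::size_t Size>
struct SizedBuf {
    char data[Size] = {};
};

// Second pass: fill a buffer whose size is exactly needed_size(N).
template <unsigned N>
constexpr SizedBuf<needed_size(N)> sized_sketch() {
    SizedBuf<needed_size(N)> b{};
    std::size_t pos = 0;
    for (unsigned i = 0; i <= N; ++i) {
        if (i != 0) b.data[pos++] = ' ';
        const std::size_t d = digits(i);
        unsigned v = i;
        for (std::size_t k = d; k != 0; --k) {   // write digits right to left
            b.data[pos + k - 1] = static_cast<char>('0' + v % 10);
            v /= 10;
        }
        pos += d;
    }
    b.data[pos] = '\0';
    return b;
}

// Returns the exactly-sized buffer by value.
SizedBuf<needed_size(13)> foo_sized_sketch() {
    constexpr auto x = sized_sketch<13>();
    return x;
}
```

With N = 13 the computed size is 32 bytes, which is consistent with the two
16 byte xmm0 stores in the foo_sized() disassembly.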

bar/bar_sized/bar_static/bar_sized_static(): the same as foo, but the range is
[0..42], and the static versions use a static constexpr, and return the buffer
pointer, not the buffer by value. 

bar() zero fills and then copies over with xmm0 and byte literals. bar_sized()
lacks the zero fill, but initializes the characters the same way. The static
versions just return a pointer as expected.
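
The difference between the by-value and the static variants can be illustrated
with a small sketch. Text, make_text, by_value_sketch and by_pointer_sketch are
hypothetical names and the string is shortened for brevity. A function-local
constexpr object that is returned by value has to be copied into the caller's
return slot, while a static constexpr object has static storage duration, so
the function only has to return its address.

```cpp
#include <cstddef>

struct Text {
    char data[8];
};

// Builds the string at compile time (C++14 relaxed constexpr).
constexpr Text make_text() {
    Text t{};                                // zero-initialized
    const char src[] = "0 1 2";
    for (std::size_t i = 0; i < sizeof(src); ++i)
        t.data[i] = src[i];                  // copies the terminator too
    return t;
}

// By value: the characters must be stored into the returned object.
Text by_value_sketch() {
    constexpr Text x = make_text();
    return x;
}

// Static: the object has static storage; only a pointer is returned.
const char* by_pointer_sketch() {
    static constexpr Text x = make_text();
    return x.data;
}
```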

baz_sized() works as expected: since the memory to copy is large, it calls
memcpy instead of doing the above xmm0 + literal bytes stuff.

The problem is with baz(). The range is much larger, [0..4200]. There is no DSE
here either, so the buffer is first zeroed with memset, but then ALL the
characters are initialized via bytewise literal stores, resulting in a very
large function, around 138K. Why didn't the logic kick in that replaces the
in-place init with memcpy, as it did for baz_sized()? Or, at least, much of the
copy could have been done with xmm0, copying 16 bytes at once.

One more thing: if you disable the return in fixbuf() via setting the #if 1 to
0 at line 76, interesting things will happen:

6.2.1:
g++-6.2.1 -std=c++14 -Wall -Wextra -O3 20160831-constexpr.cpp 
‘
20160831-constexpr.cpp:83: confused by earlier errors, bailing out

In a terminal window with black bg and gray font, the single quote is gray,
then the error message on the next line is bold white, and the terminal stays
in that state, so anything I type after this is also bold white.

7.0.0: an earlier version did the same, but two days ago I built a fresh
version and now it crashes:
g++-7.0.0 -std=c++14 -Wall -Wextra -O3 20160831-constexpr.cpp 
‘
In function ‘auto foo()’:
Segmentation fault
  constexpr auto x = fixbuf<13, 200>();
                                     ^
Please submit a full bug report,

The exact gcc versions used:

$ g++-6.2.1 -v
Using built-in specs.
COLLECT_GCC=g++-6.2.1
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/6.2.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++ --disable-multilib
--program-suffix=-6.2.1 --disable-bootstrap CFLAGS='-O2 -march=native'
CXXFLAGS='-O2 -march=native'
Thread model: posix
gcc version 6.2.1 20160831 (GCC) 
git b823cdd4ccc1499a674e3863ce875c7459207727

g++-7.0.0 -v 
Using built-in specs.
COLLECT_GCC=g++-7.0.0
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/7.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++ --disable-multilib
--program-suffix=-7.0.0 --disable-bootstrap CFLAGS='-O2 -march=native'
CXXFLAGS='-O2 -march=native'
Thread model: posix
gcc version 7.0.0 20160831 (experimental) (GCC)
git 14c36b15d931bf299bbc214707b903d0af124449
