[Bug c++/85637] New: Unneeded store of member variables in inner loop

petschy at gmail dot com gcc-bugzilla@gcc.gnu.org
Thu May 3 20:02:00 GMT 2018


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85637

            Bug ID: 85637
           Summary: Unneeded store of member variables in inner loop
           Product: gcc
           Version: 7.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: petschy at gmail dot com
  Target Milestone: ---

Created attachment 44058
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44058&action=edit
source

Attached a simple Adler32 checksum class. When updating with an array of bytes,
the inner loop just accumulates the two sums, then the modulo is done in the
outer loop. This way the cost of the two modulos is amortized.

At the start the two member variables are loaded into registers, however, they
are stored back to memory in each inner loop iteration. Then, also at the end
after the modulo, but before the end of the outer loop. There is only one exit
from the function.

Why not store the registers back just once right before the ret?

Dump of assembler code for function Adler32::Update(void const*, unsigned int):
   0x0000000000400500 <+0>:     test   %edx,%edx
   0x0000000000400502 <+2>:     je     0x400578 <Adler32::Update(void const*,
unsigned int)+120>
   0x0000000000400504 <+4>:     mov    (%rdi),%ecx      ; ecx is m_s1
   0x0000000000400506 <+6>:     mov    0x4(%rdi),%r8d   ; r8d is m_s2
   0x000000000040050a <+10>:    mov    $0x80078071,%r10d
   0x0000000000400510 <+16>:    xor    %r9d,%r9d
   0x0000000000400513 <+19>:    cmp    $0x15af,%edx
   0x0000000000400519 <+25>:    jbe    0x400527 <Adler32::Update(void const*,
unsigned int)+39>
   0x000000000040051b <+27>:    lea    -0x15b0(%rdx),%r9d
   0x0000000000400522 <+34>:    mov    $0x15b0,%edx
   0x0000000000400527 <+39>:    lea    -0x1(%rdx),%eax
   0x000000000040052a <+42>:    lea    0x1(%rsi,%rax,1),%rdx
   0x000000000040052f <+47>:    nop
   0x0000000000400530 <+48>:    add    $0x1,%rsi
   0x0000000000400534 <+52>:    movzbl -0x1(%rsi),%eax
   0x0000000000400538 <+56>:    add    %eax,%ecx        ; m_s1 += *buf
   0x000000000040053a <+58>:    add    %ecx,%r8d        ; m_s2 += m_s1
   0x000000000040053d <+61>:    cmp    %rdx,%rsi
   0x0000000000400540 <+64>:    mov    %ecx,(%rdi)      ; !!! unneeded store
   0x0000000000400542 <+66>:    mov    %r8d,0x4(%rdi)   ; !!! ditto
   0x0000000000400546 <+70>:    jne    0x400530 <Adler32::Update(void const*,
unsigned int)+48>
   0x0000000000400548 <+72>:    mov    %ecx,%eax
   0x000000000040054a <+74>:    mul    %r10d
   0x000000000040054d <+77>:    mov    %r8d,%eax
   0x0000000000400550 <+80>:    shr    $0xf,%edx
   0x0000000000400553 <+83>:    imul   $0xfff1,%edx,%edx
   0x0000000000400559 <+89>:    sub    %edx,%ecx
   0x000000000040055b <+91>:    mul    %r10d
   0x000000000040055e <+94>:    mov    %ecx,(%rdi)      ; !!! this could be
done after the jne at +118
   0x0000000000400560 <+96>:    shr    $0xf,%edx
   0x0000000000400563 <+99>:    imul   $0xfff1,%edx,%edx
   0x0000000000400569 <+105>:   sub    %edx,%r8d
   0x000000000040056c <+108>:   test   %r9d,%r9d
   0x000000000040056f <+111>:   mov    %r9d,%edx
   0x0000000000400572 <+114>:   mov    %r8d,0x4(%rdi)   ; !!! ditto
   0x0000000000400576 <+118>:   jne    0x400510 <Adler32::Update(void const*,
unsigned int)+16>
   0x0000000000400578 <+120>:   repz retq 

The above code is generated w/ 7.3.1, 6.3.1 generates the exact same code. 

8.0.1  and 8.1.1 generates somewhat different code, longer by 32 bytes, but the
placing of the stores are the same. The size difference is odd, but I'll open
another bug for that.

Platform: AMD64 (FX-8150), Debian 9.4

$ g++-6.3.1 -v
Using built-in specs.
COLLECT_GCC=g++-6.3.1
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/6.3.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++ --disable-multilib
--program-suffix=-6.3.1 --disable-bootstrap --enable-checking=release
CFLAGS='-O2 -march=native' CXXFLAGS='-O2 -march=native'
Thread model: posix
gcc version 6.3.1 20170120 (GCC)

$ g++-7.3.1 -v
Using built-in specs.
COLLECT_GCC=g++-7.3.1
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/7.3.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++ --disable-multilib
--program-suffix=-7.3.1 --disable-bootstrap CFLAGS='-O2 -march=native
-mtune=native' CXXFLAGS='-O2 -march=native -mtune=native'
Thread model: posix
gcc version 7.3.1 20180429 (GCC)

$ g++-8.0.1 -v
Using built-in specs.
COLLECT_GCC=g++-8.0.1
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/8.0.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++ --disable-multilib
--program-suffix=-8.0.1 --disable-bootstrap CFLAGS='-O2 -march=native'
CXXFLAGS='-O2 -march=native'
Thread model: posix
gcc version 8.0.1 20180214 (experimental) (GCC)

$ g++-8.1.1 -v
Using built-in specs.
COLLECT_GCC=g++-8.1.1
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/8.1.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++ --disable-multilib
--program-suffix=-8.1.1 --disable-bootstrap CFLAGS='-O2 -march=native
-mtune=native' CXXFLAGS='-O2 -march=native -mtune=native'
Thread model: posix
gcc version 8.1.1 20180502 (GCC)


More information about the Gcc-bugs mailing list