[Bug c++/85637] New: Unneeded store of member variables in inner loop
petschy at gmail dot com
gcc-bugzilla@gcc.gnu.org
Thu May 3 20:02:00 GMT 2018
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85637
Bug ID: 85637
Summary: Unneeded store of member variables in inner loop
Product: gcc
Version: 7.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: petschy at gmail dot com
Target Milestone: ---
Created attachment 44058
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44058&action=edit
source
Attached a simple Adler32 checksum class. When updating with an array of bytes,
the inner loop just accumulates the two sums, then the modulo is done in the
outer loop. This way the cost of the two modulos is amortized.
At the start the two member variables are loaded into registers, however, they
are stored back to memory in each inner loop iteration. Then, also at the end
after the modulo, but before the end of the outer loop. There is only one exit
from the function.
Why not store the registers back just once right before the ret?
Dump of assembler code for function Adler32::Update(void const*, unsigned int):
0x0000000000400500 <+0>: test %edx,%edx
0x0000000000400502 <+2>: je 0x400578 <Adler32::Update(void const*,
unsigned int)+120>
0x0000000000400504 <+4>: mov (%rdi),%ecx ; ecx is m_s1
0x0000000000400506 <+6>: mov 0x4(%rdi),%r8d ; r8d is m_s2
0x000000000040050a <+10>: mov $0x80078071,%r10d
0x0000000000400510 <+16>: xor %r9d,%r9d
0x0000000000400513 <+19>: cmp $0x15af,%edx
0x0000000000400519 <+25>: jbe 0x400527 <Adler32::Update(void const*,
unsigned int)+39>
0x000000000040051b <+27>: lea -0x15b0(%rdx),%r9d
0x0000000000400522 <+34>: mov $0x15b0,%edx
0x0000000000400527 <+39>: lea -0x1(%rdx),%eax
0x000000000040052a <+42>: lea 0x1(%rsi,%rax,1),%rdx
0x000000000040052f <+47>: nop
0x0000000000400530 <+48>: add $0x1,%rsi
0x0000000000400534 <+52>: movzbl -0x1(%rsi),%eax
0x0000000000400538 <+56>: add %eax,%ecx ; m_s1 += *buf
0x000000000040053a <+58>: add %ecx,%r8d ; m_s2 += m_s1
0x000000000040053d <+61>: cmp %rdx,%rsi
0x0000000000400540 <+64>: mov %ecx,(%rdi) ; !!! unneeded store
0x0000000000400542 <+66>: mov %r8d,0x4(%rdi) ; !!! ditto
0x0000000000400546 <+70>: jne 0x400530 <Adler32::Update(void const*,
unsigned int)+48>
0x0000000000400548 <+72>: mov %ecx,%eax
0x000000000040054a <+74>: mul %r10d
0x000000000040054d <+77>: mov %r8d,%eax
0x0000000000400550 <+80>: shr $0xf,%edx
0x0000000000400553 <+83>: imul $0xfff1,%edx,%edx
0x0000000000400559 <+89>: sub %edx,%ecx
0x000000000040055b <+91>: mul %r10d
0x000000000040055e <+94>: mov %ecx,(%rdi) ; !!! this could be
done after the jne at +118
0x0000000000400560 <+96>: shr $0xf,%edx
0x0000000000400563 <+99>: imul $0xfff1,%edx,%edx
0x0000000000400569 <+105>: sub %edx,%r8d
0x000000000040056c <+108>: test %r9d,%r9d
0x000000000040056f <+111>: mov %r9d,%edx
0x0000000000400572 <+114>: mov %r8d,0x4(%rdi) ; !!! ditto
0x0000000000400576 <+118>: jne 0x400510 <Adler32::Update(void const*,
unsigned int)+16>
0x0000000000400578 <+120>: repz retq
The above code is generated w/ 7.3.1, 6.3.1 generates the exact same code.
8.0.1 and 8.1.1 generates somewhat different code, longer by 32 bytes, but the
placing of the stores are the same. The size difference is odd, but I'll open
another bug for that.
Platform: AMD64 (FX-8150), Debian 9.4
$ g++-6.3.1 -v
Using built-in specs.
COLLECT_GCC=g++-6.3.1
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/6.3.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++ --disable-multilib
--program-suffix=-6.3.1 --disable-bootstrap --enable-checking=release
CFLAGS='-O2 -march=native' CXXFLAGS='-O2 -march=native'
Thread model: posix
gcc version 6.3.1 20170120 (GCC)
$ g++-7.3.1 -v
Using built-in specs.
COLLECT_GCC=g++-7.3.1
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/7.3.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++ --disable-multilib
--program-suffix=-7.3.1 --disable-bootstrap CFLAGS='-O2 -march=native
-mtune=native' CXXFLAGS='-O2 -march=native -mtune=native'
Thread model: posix
gcc version 7.3.1 20180429 (GCC)
$ g++-8.0.1 -v
Using built-in specs.
COLLECT_GCC=g++-8.0.1
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/8.0.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++ --disable-multilib
--program-suffix=-8.0.1 --disable-bootstrap CFLAGS='-O2 -march=native'
CXXFLAGS='-O2 -march=native'
Thread model: posix
gcc version 8.0.1 20180214 (experimental) (GCC)
$ g++-8.1.1 -v
Using built-in specs.
COLLECT_GCC=g++-8.1.1
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/8.1.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++ --disable-multilib
--program-suffix=-8.1.1 --disable-bootstrap CFLAGS='-O2 -march=native
-mtune=native' CXXFLAGS='-O2 -march=native -mtune=native'
Thread model: posix
gcc version 8.1.1 20180502 (GCC)
More information about the Gcc-bugs
mailing list