This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug c++/81602] New: Unnecessary zero-extension after 16 bit popcnt
- From: "christoph.diegelmann at gmx dot de" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 28 Jul 2017 13:04:32 +0000
- Subject: [Bug c++/81602] New: Unnecessary zero-extension after 16 bit popcnt
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81602
Bug ID: 81602
Summary: Unnecessary zero-extension after 16 bit popcnt
Product: gcc
Version: 7.1.1
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: christoph.diegelmann at gmx dot de
Target Milestone: ---
GCC misses an optimization on this:
#include <cstdint>
#include "immintrin.h"

void test(std::uint16_t* mask, std::uint16_t* data) {
    for (int i = 0; i < 1024; ++i) {
        *data = 0;
        unsigned tmp = *mask++;
        unsigned step = _mm_popcnt_u32(tmp);
        data += step;
    }
}
g++ -O3 -Wall -std=c++14 -march=skylake generates:
test(unsigned short*, unsigned short*):
        leaq    2048(%rdi), %rdx
.L2:
        xorl    %eax, %eax
        addq    $2, %rdi
        movw    %ax, (%rsi)
        popcntw -2(%rdi), %ax
        movzwl  %ax, %eax
        leaq    (%rsi,%rax,2), %rsi
        cmpq    %rdx, %rdi
        jne     .L2
        ret
The rax register is known to be zero at the time of `popcntw -2(%rdi), %ax`
because of the preceding `xorl %eax, %eax`, and the 16 bit popcnt writes only
%ax. Nevertheless, gcc still clears the upper bits using `movzwl %ax, %eax`
afterwards. Clang instead zero-extends on the load with
`movzwl (%rdi,%rax,2), %ecx` and uses a 32 bit popcnt, and it correctly
recognises that there's no need to clear the upper bits again.
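
As a value-level illustration of why the extra zero-extension cannot change
anything, here is a minimal sketch. It assumes a compiler that provides
`__builtin_popcount` (gcc and clang do). The helper names `popcount16`,
`zero_extend16` and `zero_extension_is_redundant` are hypothetical and not
part of the test case. It only models the arithmetic: the bit count of a
16 bit value is at most 16, so masking it to 16 bits is a no-op. The
register-level argument in the gcc output additionally relies on the
`xorl %eax, %eax` that precedes the popcnt.

```cpp
#include <cstdint>

// Hypothetical helper: bit count of a 16 bit value, computed in 32 bits,
// mirroring `unsigned tmp = *mask++; _mm_popcnt_u32(tmp)` from the test case.
inline unsigned popcount16(std::uint16_t v) {
    unsigned widened = v;  // the zero-extension happens here, on the load
    return static_cast<unsigned>(__builtin_popcount(widened));
}

// Models the effect of `movzwl %ax, %eax`: keep the low 16 bits, clear the rest.
inline unsigned zero_extend16(unsigned x) {
    return x & 0xFFFFu;
}

// Checks every 16 bit input: returns true if re-extending the popcount
// result never changes it and the result never exceeds 16.
inline bool zero_extension_is_redundant() {
    for (unsigned v = 0; v <= 0xFFFFu; ++v) {
        unsigned count = popcount16(static_cast<std::uint16_t>(v));
        if (count > 16u || zero_extend16(count) != count) {
            return false;
        }
    }
    return true;
}
```

Calling `zero_extension_is_redundant()` walks all 65536 possible mask values,
so it is cheap enough to run as a sanity check.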
clang -O3 -Wall -std=c++14 -march=skylake -fno-unroll-loops generates:
test(unsigned short*, unsigned short*):
        xorl    %eax, %eax
.LBB0_1:
        movw    $0, (%rsi)
        movzwl  (%rdi,%rax,2), %ecx
        popcntl %ecx, %ecx
        leaq    (%rsi,%rcx,2), %rsi
        addq    $1, %rax
        cmpl    $1024, %eax             # imm = 0x400
        jne     .LBB0_1
        retq
See https://godbolt.org/g/kgQ7VS