This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug middle-end/82438] New: Memory access not optimized for loops with known bounds
- From: "antoshkka at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 05 Oct 2017 12:08:12 +0000
- Subject: [Bug middle-end/82438] New: Memory access not optimized for loops with known bounds
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82438
Bug ID: 82438
Summary: Memory access not optimized for loops with known bounds
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: antoshkka at gmail dot com
Target Milestone: ---
With -O2, loops with known bounds that read or write small data are compiled
to byte-sized accesses such as "movzx ecx, BYTE PTR [rdi+rdx]" and
"mov BYTE PTR [rax], 15" instead of reading and writing in words/dwords:
Code
unsigned loop_read(unsigned char* a) {
    const unsigned size = 128;
    unsigned sum = 0;
    for (unsigned i = 0; i < size; ++i) {
        sum += a[i];
    }
    return sum;
}
generates assembly
loop_read(unsigned char*):
        lea rcx, [rdi+128]
        xor eax, eax
.L7:
        movzx edx, BYTE PTR [rdi]
        add rdi, 1
        add eax, edx
        cmp rdi, rcx
        jne .L7
        rep ret
Reading words/dwords significantly reduces the iteration count. Clang reads
using dwords:
loop_read(unsigned char*): # @loop_read(unsigned char*)
        pxor xmm1, xmm1
        mov rax, -128
        pxor xmm0, xmm0
.LBB1_1: # =>This Inner Loop Header: Depth=1
        movd xmm2, dword ptr [rdi + rax + 128] # xmm2 = mem[0],zero,zero,zero
        punpcklbw xmm2, xmm1 # xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3],xmm2[4],xmm1[4],xmm2[5],xmm1[5],xmm2[6],xmm1[6],xmm2[7],xmm1[7]
        punpcklwd xmm2, xmm1 # xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3]
        paddd xmm0, xmm2
        add rax, 4
        jne .LBB1_1
        pshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1]
        paddd xmm1, xmm0
        pshufd xmm0, xmm1, 229 # xmm0 = xmm1[1,1,2,3]
        paddd xmm0, xmm1
        movd eax, xmm0
        ret
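
For illustration only, here is a hand-written sketch of what a dword-wise read
looks like at the source level. It is not taken from the report and is not
what either compiler literally emits. The name loop_read_dwords and the const
qualifier on the parameter are assumptions added for this example. Each
iteration loads four bytes at once, so the loop runs 32 times instead of 128:

```cpp
#include <cstdint>
#include <cstring>

// Byte-at-a-time loop from the report (const added to the parameter here).
unsigned loop_read(const unsigned char* a) {
    const unsigned size = 128;
    unsigned sum = 0;
    for (unsigned i = 0; i < size; ++i) {
        sum += a[i];
    }
    return sum;
}

// Hypothetical manual variant: one 32-bit load per four input bytes.
unsigned loop_read_dwords(const unsigned char* a) {
    const unsigned size = 128;  // a multiple of 4, so no tail loop is needed
    unsigned sum = 0;
    for (unsigned i = 0; i < size; i += 4) {
        std::uint32_t chunk;
        // memcpy avoids alignment and strict-aliasing problems; compilers
        // typically turn it into a single 32-bit load.
        std::memcpy(&chunk, a + i, sizeof chunk);
        // Add the four bytes of the dword. The order of the bytes does not
        // affect the sum, so the result is independent of endianness.
        sum += (chunk & 0xFF) + ((chunk >> 8) & 0xFF) +
               ((chunk >> 16) & 0xFF) + (chunk >> 24);
    }
    return sum;
}
```

Both functions return the same value for any 128-byte input; the second only
changes how the memory is accessed.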