This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug middle-end/82438] New: Memory access not optimized for loops with known bounds

From: "antoshkka at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Thu, 05 Oct 2017 12:08:12 +0000
Subject: [Bug middle-end/82438] New: Memory access not optimized for loops with known bounds
Auto-submitted: auto-generated

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82438

            Bug ID: 82438
           Summary: Memory access not optimized for loops with known
                    bounds
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: antoshkka at gmail dot com
  Target Milestone: ---

With -O2, loops that read or write small (byte-sized) data are compiled into
byte-at-a-time accesses such as "movzx ecx, BYTE PTR [rdi+rdx]" and
"mov BYTE PTR [rax], 15", instead of reading and writing whole words/dwords:

For example, the following code

unsigned loop_read(unsigned char* a) {
    const unsigned size = 128;

    unsigned sum = 0;
    for (unsigned i = 0; i < size; ++i) {
        sum += a[i];
    }

    return sum;
}

generates the following assembly with -O2:

loop_read(unsigned char*):
  lea rcx, [rdi+128]
  xor eax, eax
.L7:
  movzx edx, BYTE PTR [rdi]
  add rdi, 1
  add eax, edx
  cmp rdi, rcx
  jne .L7
  rep ret
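For comparison, a dword-at-a-time version of loop_read can be written by hand. This is only a sketch and is not taken from the report: the name loop_read_dwords is hypothetical, and it assumes the same fixed size of 128 bytes. Each iteration loads four bytes with memcpy (so an unaligned pointer is fine) and adds the four bytes of the word, giving 32 iterations instead of 128:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical hand-written variant of loop_read that reads a dword
   (4 bytes) per iteration: 128 / 4 = 32 iterations. */
unsigned loop_read_dwords(const unsigned char *a) {
    const unsigned size = 128;

    unsigned sum = 0;
    for (unsigned i = 0; i < size; i += 4) {
        uint32_t w;
        /* Fixed-size memcpy: well-defined for any alignment and normally
           lowered to a single 32-bit load at -O2. */
        memcpy(&w, a + i, sizeof w);
        /* Add the four bytes of the word. The byte order inside w does not
           matter because the bytes are simply summed. */
        sum += (w & 0xFFu) + ((w >> 8) & 0xFFu) + ((w >> 16) & 0xFFu) + (w >> 24);
    }

    return sum;
}
```

The result is identical to loop_read for every input, since each byte is still zero-extended and added exactly once.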


Reading words/dwords would significantly reduce the iteration count. Clang, for
example, reads one dword (four bytes) per iteration, so its loop runs 32 times
instead of 128:

loop_read(unsigned char*): # @loop_read(unsigned char*)
  pxor xmm1, xmm1
  mov rax, -128
  pxor xmm0, xmm0
.LBB1_1: # =>This Inner Loop Header: Depth=1
  movd xmm2, dword ptr [rdi + rax + 128] # xmm2 = mem[0],zero,zero,zero
  punpcklbw xmm2, xmm1 # xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3],xmm2[4],xmm1[4],xmm2[5],xmm1[5],xmm2[6],xmm1[6],xmm2[7],xmm1[7]
  punpcklwd xmm2, xmm1 # xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3]
  paddd xmm0, xmm2
  add rax, 4
  jne .LBB1_1
  pshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1]
  paddd xmm1, xmm0
  pshufd xmm0, xmm1, 229 # xmm0 = xmm1[1,1,2,3]
  paddd xmm0, xmm1
  movd eax, xmm0
  ret
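The Clang loop above can be read back as SSE2 intrinsics. The sketch below is an illustration, not part of the report: the name loop_read_sse2 is hypothetical, it assumes an x86/x86-64 target (where SSE2 is available), and it uses memcpy for the 4-byte load instead of a dedicated intrinsic. Each step is annotated with the Clang instruction it corresponds to: four bytes are zero-extended to four 32-bit lanes and accumulated, then the lanes are reduced horizontally.

```c
#include <emmintrin.h> /* SSE2 intrinsics (x86/x86-64 only) */
#include <stdint.h>
#include <string.h>

/* Hypothetical intrinsic version of Clang's loop_read output. */
unsigned loop_read_sse2(const unsigned char *a) {
    const __m128i zero = _mm_setzero_si128(); /* pxor xmm1, xmm1 */
    __m128i acc = _mm_setzero_si128();        /* pxor xmm0, xmm0: four 32-bit partial sums */

    for (unsigned i = 0; i < 128; i += 4) {   /* 32 iterations */
        int32_t chunk;
        memcpy(&chunk, a + i, sizeof chunk);
        __m128i v = _mm_cvtsi32_si128(chunk); /* movd: 4 bytes into the low lane */
        v = _mm_unpacklo_epi8(v, zero);       /* punpcklbw: bytes -> 16-bit */
        v = _mm_unpacklo_epi16(v, zero);      /* punpcklwd: 16-bit -> 32-bit */
        acc = _mm_add_epi32(acc, v);          /* paddd */
    }

    /* Horizontal reduction of the four lanes. */
    __m128i t = _mm_shuffle_epi32(acc, 78);   /* pshufd 78: lanes [2,3,0,1] */
    acc = _mm_add_epi32(acc, t);              /* paddd */
    t = _mm_shuffle_epi32(acc, 229);          /* pshufd 229: lanes [1,1,2,3] */
    acc = _mm_add_epi32(acc, t);              /* paddd */
    return (unsigned)_mm_cvtsi128_si32(acc);  /* movd eax, xmm0 */
}
```

Lane k accumulates the bytes a[i + k], so after the reduction the low lane holds the same sum as loop_read. The maximum total (128 * 255) fits easily in 32 bits.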

Follow-Ups:
- [Bug middle-end/82438] Memory access not optimized for loops with known bounds
  - From: jakub at gcc dot gnu.org
- [Bug middle-end/82438] Memory access not optimized for loops with known bounds
  - From: rguenth at gcc dot gnu.org
