This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug tree-optimization/82450] New: Consider optimizing multidimensional arrays access without -ftree-vectorize


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82450

            Bug ID: 82450
           Summary: Consider optimizing multidimensional arrays access
                    without -ftree-vectorize
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Iterating over a multidimensional array uses a separate counter for each
dimension. For the code:

using array_t = unsigned[10][10];
void multidim_array_fill_1(array_t& data) {
    for (unsigned i = 0; i < 10; ++i) {
        for (unsigned j = 0; j < 10; ++j) {
            data[i][j] = 1;
        }
    }
}


The following assembly is generated with -O2:

multidim_array_fill_1(unsigned int (&) [10][10]):
  lea rdx, [rdi+40]
  lea rcx, [rdi+440] <=== This could be avoided
.L3:
  lea rax, [rdx-40] <=== This could be avoided
.L2:
  mov DWORD PTR [rax], 1
  add rax, 4
  cmp rax, rdx
  jne .L2
  lea rdx, [rax+40] <=== This could be avoided
  cmp rdx, rcx      <=== This could be avoided
  jne .L3           <=== This could be avoided
  rep ret


The optimal assembly would be:

multidim_array_fill_1_opt(unsigned int (&) [10][10]):
  lea rax, [rdi+400]
.L2:
  mov DWORD PTR [rdi], 1
  add rdi, 4
  cmp rdi, rax
  jne .L2
  rep ret


as if the initial C++ code had been rewritten as:

void multidim_array_fill_1_opt(array_t& data_md) {
    unsigned* data = &data_md[0][0];
    for (unsigned i = 0; i < 100; ++i) {
        data[i] = 1;
    }
}
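
To check that the flattened loop is a drop-in replacement, a minimal sketch
along these lines can compare the two fills. fills_match is a hypothetical
helper, not part of the report; the sketch assumes, as the rewrite above does,
that all 100 elements can be walked through a single unsigned*.

```cpp
#include <cassert>
#include <cstring>

using array_t = unsigned[10][10];

// Nested-loop fill, as in the report.
void multidim_array_fill_1(array_t& data) {
    for (unsigned i = 0; i < 10; ++i) {
        for (unsigned j = 0; j < 10; ++j) {
            data[i][j] = 1;
        }
    }
}

// Flattened fill, as in the report.
void multidim_array_fill_1_opt(array_t& data_md) {
    unsigned* data = &data_md[0][0];
    for (unsigned i = 0; i < 100; ++i) {
        data[i] = 1;
    }
}

// Hypothetical helper (not from the report): returns true when both
// fills leave byte-identical contents in zero-initialized arrays.
bool fills_match() {
    array_t a = {};
    array_t b = {};
    multidim_array_fill_1(a);
    multidim_array_fill_1_opt(b);
    // Sanity check: the last element was reached by both versions.
    assert(a[9][9] == 1 && b[9][9] == 1);
    return std::memcmp(a, b, sizeof(array_t)) == 0;
}
```

Calling fills_match() from a test or from main should return true if the two
versions are equivalent for this array type.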


It seems that treating the array as single-dimensional, without vectorizing,
could be enabled at -O2 because it is always better: fewer registers are used,
the code is smaller, and the loop has fewer comparisons and instructions.
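
As a rough illustration of the instruction-count part of that claim, the two
-O2 listings above can be tallied by hand. The sketch below only counts
executed instructions for the 10x10 case, read off the listings; it says
nothing about cycles or actual run time, and the constant names are made up
for illustration.

```cpp
// Executed-instruction tally for the 10x10 case, read off the two listings.
constexpr unsigned rows = 10;
constexpr unsigned cols = 10;
constexpr unsigned inner_body = 4;  // mov, add, cmp, jne

// Nested version: 2 setup lea, then per row
// (1 lea + inner loop + lea/cmp/jne), then ret.
constexpr unsigned nested_executed =
    2 + rows * (1 + cols * inner_body + 3) + 1;

// Flat version: 1 setup lea, 100 iterations of the same body, then ret.
constexpr unsigned flat_executed = 1 + rows * cols * inner_body + 1;

static_assert(nested_executed == 443, "nested loop tally");
static_assert(flat_executed == 402, "flat loop tally");
```

By this count the flat form executes 41 fewer instructions here, and the
static code shrinks from 11 instructions to 6.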


P.S.: With -ftree-vectorize the array is treated as a single-dimensional array,
but the memory access is vectorized, which increases code size:
.L2:
  mov DWORD PTR [rdi+32], 1
  mov DWORD PTR [rdi+36], 1
  add rdi, 40
  movups XMMWORD PTR [rdi-40], xmm0
  movups XMMWORD PTR [rdi-24], xmm0
  cmp rax, rdi
  jne .L2
