This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/82450] New: Consider optimizing multidimensional arrays access without -ftree-vectorize
- From: "antoshkka at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 06 Oct 2017 12:01:25 +0000
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82450
Bug ID: 82450
Summary: Consider optimizing multidimensional arrays access without -ftree-vectorize
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: antoshkka at gmail dot com
Target Milestone: ---
Iterating over a multidimensional array uses a separate counter for each
dimension. For the code
using array_t = unsigned[10][10];

void multidim_array_fill_1(array_t& data) {
    for (unsigned i = 0; i < 10; ++i) {
        for (unsigned j = 0; j < 10; ++j) {
            data[i][j] = 1;
        }
    }
}
the following assembly is generated with -O2:

multidim_array_fill_1(unsigned int (&) [10][10]):
  lea rdx, [rdi+40]
  lea rcx, [rdi+440]        <=== This could be avoided
.L3:
  lea rax, [rdx-40]         <=== This could be avoided
.L2:
  mov DWORD PTR [rax], 1
  add rax, 4
  cmp rax, rdx
  jne .L2
  lea rdx, [rax+40]         <=== This could be avoided
  cmp rdx, rcx              <=== This could be avoided
  jne .L3                   <=== This could be avoided
  rep ret
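
For readers who prefer C++ to assembly, the control flow of the -O2 output above
can be modelled roughly as below. This is only a sketch: the names
fill_as_compiled, count_ones and self_check are made up for illustration, byte
offsets from the assembly are expressed as element indices (40 bytes = 10
elements), and the flat indexing through &data[0][0] follows the same
assumption the rewritten C++ in this report makes.

```cpp
#include <cassert>
#include <cstddef>

using array_t = unsigned[10][10];

// Hypothetical model of the -O2 loop structure: two live bounds (row_end and
// stop) in addition to the running index, matching rdx, rcx and rax.
void fill_as_compiled(array_t& data) {
    unsigned* base = &data[0][0];          // rdi
    std::size_t row_end = 10;              // rdx = rdi + 40 bytes
    const std::size_t stop = 110;          // rcx = rdi + 440 bytes
    do {
        std::size_t k = row_end - 10;      // rax = rdx - 40 bytes (row start)
        do {
            base[k] = 1;                   // mov DWORD PTR [rax], 1
            ++k;                           // add rax, 4
        } while (k != row_end);            // cmp rax, rdx / jne .L2
        row_end = k + 10;                  // lea rdx, [rax+40]
    } while (row_end != stop);             // cmp rdx, rcx / jne .L3
}

// Hypothetical helper: counts the elements that are equal to 1.
unsigned count_ones(const array_t& data) {
    unsigned n = 0;
    for (const auto& row : data)
        for (unsigned x : row)
            if (x == 1) ++n;
    return n;
}

// Hypothetical self-check: all 100 elements must have been written.
void self_check() {
    array_t a = {};
    fill_as_compiled(a);
    assert(count_ones(a) == 100);
}
```

The model makes the extra work visible: row_end and stop are recomputed or
compared once per row, although a single end bound would be enough.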
Optimal assembly would be
multidim_array_fill_1_opt(unsigned int (&) [10][10]):
  lea rax, [rdi+400]
.L2:
  mov DWORD PTR [rdi], 1
  add rdi, 4
  cmp rdi, rax
  jne .L2
  rep ret
as if rewriting the initial C++ code as:
void multidim_array_fill_1_opt(array_t& data_md) {
    unsigned* data = &data_md[0][0];
    for (unsigned i = 0; i < 100; ++i) {
        data[i] = 1;
    }
}
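
A minimal way to check that the two versions store the same values is sketched
below. The two fill functions are copied from this report; all_equal,
fills_match and run_check are hypothetical helpers added only for the check.

```cpp
#include <cassert>

using array_t = unsigned[10][10];

// Nested-loop version from the report.
void multidim_array_fill_1(array_t& data) {
    for (unsigned i = 0; i < 10; ++i) {
        for (unsigned j = 0; j < 10; ++j) {
            data[i][j] = 1;
        }
    }
}

// Flat version from the report: one counter over all 100 elements.
void multidim_array_fill_1_opt(array_t& data_md) {
    unsigned* data = &data_md[0][0];
    for (unsigned i = 0; i < 100; ++i) {
        data[i] = 1;
    }
}

// Hypothetical helper: true when every element equals `value`.
bool all_equal(const array_t& data, unsigned value) {
    for (const auto& row : data)
        for (unsigned x : row)
            if (x != value) return false;
    return true;
}

// Hypothetical check: both versions must set all 100 elements to 1.
bool fills_match() {
    array_t a = {};
    array_t b = {};
    multidim_array_fill_1(a);
    multidim_array_fill_1_opt(b);
    return all_equal(a, 1) && all_equal(b, 1);
}

// Hypothetical entry point for the check.
void run_check() {
    assert(fills_match());
}
```

Both functions can also be compiled with -O2 and their assembly compared, which
is how the listings in this report were obtained.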
It seems that treating the array as single-dimensional, without vectorizing,
could be enabled at -O2, because it is always better: fewer registers are used,
the code is smaller, and the loop contains fewer comparisons and instructions.
P.S.: With -ftree-vectorize the array is treated as a single-dimensional array,
but the memory accesses are vectorized, which increases code size:
.L2:
  mov DWORD PTR [rdi+32], 1
  mov DWORD PTR [rdi+36], 1
  add rdi, 40
  movups XMMWORD PTR [rdi-40], xmm0
  movups XMMWORD PTR [rdi-24], xmm0
  cmp rax, rdi
  jne .L2