[Bug c/83202] Try joining operations on consecutive array elements during tree vectorization
bugzilla@poradnik-webmastera.com
gcc-bugzilla@gcc.gnu.org
Tue Nov 28 21:27:00 GMT 2017
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83202
--- Comment #1 from Daniel Fruzynski <bugzilla@poradnik-webmastera.com> ---
This was compiled with -O3 -mavx -ftree-vectorize.
After sending this I noticed that I wrote the inner loop incorrectly; I meant the
one below. Anyway, it is also not optimized:
for (int j = 0; j < i; j+=4)
I also checked code which could be optimized using operations on YMM registers:
void test(double data[8][8])
{
    for (int i = 0; i < 8; i++)
    {
        for (int j = 0; j < i; j+=4)
        {
            data[i][j] *= data[i][j];
            data[i][j+1] *= data[i][j+1];
            data[i][j+2] *= data[i][j+2];
            data[i][j+3] *= data[i][j+3];
        }
    }
}
gcc output is, hmm, interesting:
test(double (*) [8]):
        vmovupd xmm0, XMMWORD PTR [rdi+64]
        vinsertf128 ymm0, ymm0, XMMWORD PTR [rdi+80], 0x1
        vmulpd ymm0, ymm0, ymm0
        vmovups XMMWORD PTR [rdi+64], xmm0
        vextractf128 XMMWORD PTR [rdi+80], ymm0, 0x1
        vmovupd xmm0, XMMWORD PTR [rdi+128]
        vinsertf128 ymm0, ymm0, XMMWORD PTR [rdi+144], 0x1
        vmulpd ymm0, ymm0, ymm0
        vmovups XMMWORD PTR [rdi+128], xmm0
        vextractf128 XMMWORD PTR [rdi+144], ymm0, 0x1
        vmovupd xmm0, XMMWORD PTR [rdi+192]
        vinsertf128 ymm0, ymm0, XMMWORD PTR [rdi+208], 0x1
        vmulpd ymm0, ymm0, ymm0
        vmovups XMMWORD PTR [rdi+192], xmm0
        vextractf128 XMMWORD PTR [rdi+208], ymm0, 0x1
        vmovupd xmm0, XMMWORD PTR [rdi+256]
        vinsertf128 ymm0, ymm0, XMMWORD PTR [rdi+272], 0x1
        vmulpd ymm0, ymm0, ymm0
        vmovups XMMWORD PTR [rdi+256], xmm0
        vextractf128 XMMWORD PTR [rdi+272], ymm0, 0x1
        vmovupd xmm0, XMMWORD PTR [rdi+320]
        vinsertf128 ymm0, ymm0, XMMWORD PTR [rdi+336], 0x1
        vmulpd ymm0, ymm0, ymm0
        vmovups XMMWORD PTR [rdi+320], xmm0
        vextractf128 XMMWORD PTR [rdi+336], ymm0, 0x1
        vmovupd xmm0, XMMWORD PTR [rdi+352]
        vinsertf128 ymm0, ymm0, XMMWORD PTR [rdi+368], 0x1
        vmulpd ymm0, ymm0, ymm0
        vmovups XMMWORD PTR [rdi+352], xmm0
        vextractf128 XMMWORD PTR [rdi+368], ymm0, 0x1
        vmovupd xmm0, XMMWORD PTR [rdi+384]
        vinsertf128 ymm0, ymm0, XMMWORD PTR [rdi+400], 0x1
        vmulpd ymm0, ymm0, ymm0
        vmovups XMMWORD PTR [rdi+384], xmm0
        vextractf128 XMMWORD PTR [rdi+400], ymm0, 0x1
        vmovupd xmm0, XMMWORD PTR [rdi+416]
        vinsertf128 ymm0, ymm0, XMMWORD PTR [rdi+432], 0x1
        vmulpd ymm0, ymm0, ymm0
        vmovups XMMWORD PTR [rdi+416], xmm0
        vextractf128 XMMWORD PTR [rdi+432], ymm0, 0x1
        vmovupd xmm0, XMMWORD PTR [rdi+448]
        vinsertf128 ymm0, ymm0, XMMWORD PTR [rdi+464], 0x1
        vmulpd ymm0, ymm0, ymm0
        vmovups XMMWORD PTR [rdi+448], xmm0
        vextractf128 XMMWORD PTR [rdi+464], ymm0, 0x1
        vmovupd xmm0, XMMWORD PTR [rdi+480]
        vinsertf128 ymm0, ymm0, XMMWORD PTR [rdi+496], 0x1
        vmulpd ymm0, ymm0, ymm0
        vmovups XMMWORD PTR [rdi+480], xmm0
        vextractf128 XMMWORD PTR [rdi+496], ymm0, 0x1
        vzeroupper
        ret
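
To cross-check the listing against the source, here is a small sketch (not part
of the original report) that enumerates the 4-element blocks the two loops in
test() touch and computes the byte offset of each block from the start of data.
The helper name touched_block_offsets and the constants N and STEP are made up
for illustration, and the arithmetic assumes sizeof(double) == 8, as on x86-64.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants: row length of data[8][8] and the inner-loop step. */
enum { N = 8, STEP = 4 };

/* Stores in out[] the byte offset (relative to data) of the first double of
   every 4-element block touched by the loops in test(), in iteration order,
   and returns the number of blocks. out needs room for N * (N / STEP)
   entries. Hypothetical helper, not taken from GCC or the bug report. */
static size_t touched_block_offsets(size_t out[])
{
    size_t count = 0;
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < i; j += STEP) {
            /* Each block j..j+3 stays inside row i. */
            assert(j + STEP - 1 < N);
            out[count++] = ((size_t)i * N + (size_t)j) * sizeof(double);
        }
    }
    return count;
}
```

Under those assumptions the helper produces ten offsets: 64, 128, 192, 256,
320, 352, 384, 416, 448 and 480. These are the addresses used by the first
vmovupd of each group in the listing above, so the loops were fully unrolled
and every 32-byte block is loaded and stored as two 16-byte halves around a
single vmulpd on a YMM register.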