This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Vectorizer/alignment
- From: Hendrik Greving <hendrik dot greving dot intel at gmail dot com>
- To: GCC Development <gcc at gcc dot gnu dot org>
- Date: Fri, 8 Nov 2013 09:19:25 -0800
- Subject: Vectorizer/alignment
The code for a simple loop like
for (i = 0; i < LENGTH-1; i++) {
  g_c[i] = g_a[i] + g_b[i];
}
looks good for g++ (4.9.0 20131028 (experimental)) (-O3 core-avx2)
.L2:
vmovdqa g_a(%rax), %ymm0 # 26 *movv8si_internal/2 [length = 8]
vpaddd g_b(%rax), %ymm0, %ymm0 # 27 *addv8si3/2 [length = 8]
addq $32, %rax # 29 *adddi_1/1 [length = 4]
vmovaps %ymm0, g_c-32(%rax) # 28 *movv8si_internal/3 [length = 8]
cmpq $39968, %rax # 31 *cmpdi_1/1 [length = 6]
jne .L2 # 32 *jcc_1 [length = 2]
but for gcc, I'm getting
.L4:
vmovdqu (%rsi,%rax), %xmm0 # 156 sse2_loaddquv16qi [length = 5]
vinserti128 $0x1, 16(%rsi,%rax), %ymm0, %ymm0 # 157 avx_vec_concatv32qi/1 [length = 8]
addl $1, %edx # 161 *addsi_1/1 [length = 3]
vpaddd (%rdi,%rax), %ymm0, %ymm0 # 158 *addv8si3/2 [length = 5]
vmovups %xmm0, (%rcx,%rax) # 412 *movv16qi_internal/3 [length = 5]
vextracti128 $0x1, %ymm0, 16(%rcx,%rax) # 160 vec_extract_hi_v32qi/2 [length = 8]
addq $32, %rax # 162 *adddi_1/1 [length = 4]
cmpl $1248, %edx # 164 *cmpsi_1/1 [length = 6]
jbe .L4 # 165 *jcc_1 [length = 2]
unless I add "__attribute__ ((aligned (64)))" to the declarations of g_a, g_b, and g_c.
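For reference, a minimal sketch of the aligned variant described above. Several details are assumptions rather than part of the original test case: LENGTH is not given in the post and is inferred here from the loop bound in the g++ listing, the element type is taken to be int because of the v8si patterns, and the function names add_arrays and is_aligned_64 are hypothetical.

```c
#include <stdint.h>

/* Assumption: LENGTH is not stated in the post. 9993 is inferred from the
   g++ loop bound: 39968 bytes / 4 bytes per int = 9992 = LENGTH - 1. */
#define LENGTH 9993

/* Assumption: int elements, matching the v8si patterns in the listings.
   The aligned attribute requests a 64-byte boundary for each array. */
int g_a[LENGTH] __attribute__ ((aligned (64)));
int g_b[LENGTH] __attribute__ ((aligned (64)));
int g_c[LENGTH] __attribute__ ((aligned (64)));

/* The loop from the post, wrapped in a hypothetical function. */
void add_arrays (void)
{
  int i;
  for (i = 0; i < LENGTH - 1; i++) {
    g_c[i] = g_a[i] + g_b[i];
  }
}

/* Hypothetical helper: returns nonzero if p sits on a 64-byte boundary. */
int is_aligned_64 (const void *p)
{
  return ((uintptr_t) p % 64) == 0;
}
```

The helper only confirms at run time that the arrays ended up on the requested boundary; whether the vectorizer then emits aligned ymm moves still has to be checked in the generated assembly.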
Two questions: Does C have different alignment requirements or specs than
C++ (I don't think so)? If it does, why doesn't gcc just align the
arrays itself (they are in the same module in my example)? And leaving the
alignment question aside, why not just use AVX2 (ymm) moves as g++ does?
I guess my question is: is this a bug or a feature?
Thanks,
Regards,
Hendrik