This is the mail archive of the mailing list for the GCC project.



The code for a simple loop like

for (i = 0; i < LENGTH-1; i++) {
        g_c[i] = g_a[i] + g_b[i];
}

looks good with g++ 4.9.0 20131028 (experimental) at -O3 with core-avx2:

vmovdqa g_a(%rax), %ymm0 # 26 *movv8si_internal/2 [length = 8]
vpaddd g_b(%rax), %ymm0, %ymm0 # 27 *addv8si3/2 [length = 8]
addq $32, %rax # 29 *adddi_1/1 [length = 4]
vmovaps %ymm0, g_c-32(%rax) # 28 *movv8si_internal/3 [length = 8]
cmpq $39968, %rax # 31 *cmpdi_1/1 [length = 6]
jne .L2 # 32 *jcc_1 [length = 2]

but for gcc, I'm getting

vmovdqu (%rsi,%rax), %xmm0 # 156 sse2_loaddquv16qi [length = 5]
vinserti128 $0x1, 16(%rsi,%rax), %ymm0, %ymm0 # 157 avx_vec_concatv32qi/1 [length = 8]
addl $1, %edx # 161 *addsi_1/1 [length = 3]
vpaddd (%rdi,%rax), %ymm0, %ymm0 # 158 *addv8si3/2 [length = 5]
vmovups %xmm0, (%rcx,%rax) # 412 *movv16qi_internal/3 [length = 5]
vextracti128 $0x1, %ymm0, 16(%rcx,%rax) # 160 vec_extract_hi_v32qi/2 [length = 8]
addq $32, %rax # 162 *adddi_1/1 [length = 4]
cmpl $1248, %edx # 164 *cmpsi_1/1 [length = 6]
jbe .L4 # 165 *jcc_1 [length = 2]

unless I add "__attribute__ ((aligned (64)))" to g_a, g_b and g_c.
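A self-contained version of the test case might look like the sketch below. LENGTH is not given in the post; 10000 is an assumption that fits the trip counts in both listings (1249 iterations of 8 ints), and int elements are inferred from the vpaddd/addv8si3 patterns. ALIGN64 and add_arrays are hypothetical names used only to switch between the two variants.

```c
/* Assumption: LENGTH is not stated in the post; 10000 fits both listings. */
#define LENGTH 10000

/* Hypothetical switch: build with -DALIGN64 for the "aligned (64)" variant. */
#ifdef ALIGN64
#define ARRAY_ALIGN __attribute__ ((aligned (64)))
#else
#define ARRAY_ALIGN
#endif

/* File-scope arrays with no initializer, as in the original test case. */
int g_a[LENGTH] ARRAY_ALIGN;
int g_b[LENGTH] ARRAY_ALIGN;
int g_c[LENGTH] ARRAY_ALIGN;

/* The loop from the post; at -O3 with AVX2 enabled GCC vectorizes it. */
void add_arrays(void)
{
    int i;
    for (i = 0; i < LENGTH-1; i++) {
        g_c[i] = g_a[i] + g_b[i];
    }
}
```

Compiling it with something like gcc -O3 -march=core-avx2 -S, with and without -DALIGN64, is one way to compare the two listings (the exact flags used in the post are assumed here). A hypothesis worth testing, not something established in this message: in C, a file-scope "int g_a[LENGTH];" with no initializer is a tentative definition, which GCC emitted as a common symbol by default before GCC 10 (-fcommon), and a compiler cannot safely raise the alignment of a symbol that another translation unit might end up defining. C++ has no tentative definitions, so rebuilding with -fno-common, or giving the arrays an initializer, should show whether that is the difference.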

Two questions: Does C have different alignment requirements/specs than
C++ (I don't think so)? Either way, why does gcc not simply align the
arrays (they are defined in the same module in my example)? Leaving the
alignment question aside, why not just use AVX2 (ymm) moves as g++ does?
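On the first question: C11 _Alignof and C++11 alignof both report the alignment the target ABI requires for a type, and for an int array that is just the element alignment (4 bytes on x86-64). Anything beyond that, such as the 16 bytes the x86-64 psABI asks for global arrays of at least 16 bytes, or the 32 bytes that aligned ymm accesses need, depends on where the compiler places the object. The sketch below checks the actual placement; addr_alignment, plain_arr and aligned_arr are hypothetical names, not part of the original test case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example objects: default placement vs. forced 64-byte alignment. */
int plain_arr[10000];
int aligned_arr[10000] __attribute__ ((aligned (64)));

/* Returns the largest power of two (capped at 4096) that divides
   the address p, i.e. the alignment the object actually ended up with. */
static size_t addr_alignment(const void *p)
{
    uintptr_t a = (uintptr_t) p;
    size_t align = 1;
    /* Double while the next-larger power of two still divides the address. */
    while (align < 4096 && (a & (2 * align - 1)) == 0)
        align *= 2;
    return align;
}
```

Printing addr_alignment(g_a) for the arrays from the original test case, once built as C and once as C++, is a quick way to see whether the two front ends really place them differently; the helper itself compiles as either language.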

I guess my question is: is this a bug or a feature?

