Hello,

For the reduced code below, its second loop is redundant. There seems to be a missed optimization.

https://godbolt.org/z/McqeYdnfY

int a[1024];
int b[1024];

void func()
{
    for (int i = 0; i < 1024; i+=1) {
        a[i] = b[i] * 2;
    }
    for (int i = 0; i < 1024; i+=1) {
        a[i] = b[i] * 2;
    }
}

GCC -O3:

func:
        xor     eax, eax
.L2:
        movdqa  xmm0, XMMWORD PTR b[rax]
        add     rax, 16
        pslld   xmm0, 1
        movaps  XMMWORD PTR a[rax-16], xmm0
        cmp     rax, 4096
        jne     .L2
        xor     eax, eax
.L3:
        movdqa  xmm0, XMMWORD PTR b[rax]
        add     rax, 16
        pslld   xmm0, 1
        movaps  XMMWORD PTR a[rax-16], xmm0
        cmp     rax, 4096
        jne     .L3
        ret

Expected code:

func:
        xor     eax, eax
.L2:
        movdqa  xmm0, XMMWORD PTR b[rax]
        add     rax, 16
        pslld   xmm0, 1
        movaps  XMMWORD PTR a[rax-16], xmm0
        cmp     rax, 4096
        jne     .L2
        ret

Thank you very much for your time and effort! We look forward to hearing from you.
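Confirmed. Loop fusion would expose the redundancy: once the two loops are merged into one, the second store to a[i] is an exact duplicate of the first and can be removed.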
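For illustration, here is a source-level sketch of the transformation described above. It is not GCC's implementation, only a hand-written picture of what fusion followed by redundant-store removal would amount to. The names func_fused, func_expected and same_result are hypothetical and do not appear in the report; the fusion is assumed legal here because a and b are distinct global arrays and the loops have identical bounds.

```c
#include <string.h>

#define N 1024

int a[N];
int b[N];

/* Original shape from the report: two identical loops back to back. */
void func(void)
{
    for (int i = 0; i < N; i += 1) {
        a[i] = b[i] * 2;
    }
    for (int i = 0; i < N; i += 1) {
        a[i] = b[i] * 2;
    }
}

/* Hypothetical intermediate step: both bodies fused into one loop.
   The two statements are now adjacent and identical, so the
   redundancy is visible within a single iteration. */
void func_fused(void)
{
    for (int i = 0; i < N; i += 1) {
        a[i] = b[i] * 2;
        a[i] = b[i] * 2; /* duplicate store, removable */
    }
}

/* Hypothetical end result: matches the "Expected code" in the report. */
void func_expected(void)
{
    for (int i = 0; i < N; i += 1) {
        a[i] = b[i] * 2;
    }
}

/* Hypothetical check: all three variants leave the same contents in a.
   Returns 1 if they agree, 0 otherwise. */
int same_result(void)
{
    int ref[N];

    for (int i = 0; i < N; i += 1) {
        b[i] = i - 512; /* small values, so b[i] * 2 cannot overflow */
    }

    func();
    memcpy(ref, a, sizeof a);

    memset(a, 0, sizeof a);
    func_fused();
    if (memcmp(ref, a, sizeof a) != 0) {
        return 0;
    }

    memset(a, 0, sizeof a);
    func_expected();
    return memcmp(ref, a, sizeof a) == 0;
}
```

Calling same_result() is only a sanity check that the three variants agree on one input; it says nothing about which pass in GCC would perform the transformation.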