The following testcase from PR target/33329 shows a problem where gcc doesn't fold vector arithmetic operations with constant arguments into a load of a vector constant. For clarity, sse4 is used here, but the same problem is present with sse2.

--cut here--
extern void g (int *);

void f (void)
{
  int tabs[8], tabcount;

  for (tabcount = 1; tabcount <= 8; tabcount += 7)
    {
      int i;
      for (i = 0; i < 8; i++)
        tabs[i] = 2 * i;
      g (tabs);
    }
}
--cut here--

This produces (gcc -O2 -msse4 -ftree-vectorize):

.LCFI2:
        movdqa  .LC0(%rip), %xmm1
        leaq    16(%rsp), %rbp
        movdqa  .LC1(%rip), %xmm0
        paddd   .LC2(%rip), %xmm1
        pmulld  %xmm1, %xmm0    # 19    *sse4_1_mulv4si3        [length = 4]
        movdqa  %xmm0, (%rsp)
.L2:
        movdqa  .LC3(%rip), %xmm0       # 54
        movq    %rbp, %rdi
        addl    $1, %ebx
        movdqa  (%rsp), %xmm2   # 55
        movdqa  %xmm0, (%rbp)
        movdqa  %xmm2, 16(%rbp)
        call    g
        cmpl    $2, %ebx
        jne     .L2

All instructions above the loop have constant arguments. This is evident from the combine RTL dump, where insn 19 is represented using the following RTX:

(insn 19 17 25 2 pr33329.c:13 (set (reg:V4SI 78)
        (mult:V4SI (reg:V4SI 77)
            (reg:V4SI 73))) 1136 {*sse4_1_mulv4si3} (expr_list:REG_DEAD (reg:V4SI 73)
        (expr_list:REG_EQUAL (const_vector:V4SI [
                    (const_int 8 [0x8])
                    (const_int 10 [0xa])
                    (const_int 12 [0xc])
                    (const_int 14 [0xe])
                ])
            (nil))))

So gcc has already calculated the correct const_vector value (it is recorded in the REG_EQUAL note), but it looks like it doesn't know what to do with it. For optimal code, insn #55 should load the vector constant from the constant pool in the same way as insn #54.
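To make the missed fold concrete, here is a minimal scalar model of the pre-loop arithmetic (paddd followed by pmulld). The report does not show the contents of .LC0, .LC1 and .LC2, so the values below are assumptions chosen only because they are consistent with the REG_EQUAL note. The v4si struct and the helper names are hypothetical and are not part of gcc.

```c
#include <string.h>

/* Scalar stand-in for one V4SI register: four 32-bit lanes. */
typedef struct { int lane[4]; } v4si;

/* Lane-wise add, modelling paddd. */
v4si v4si_add(v4si a, v4si b)
{
    v4si r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

/* Lane-wise multiply, modelling pmulld. */
v4si v4si_mul(v4si a, v4si b)
{
    v4si r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

/* ASSUMED constant pool contents; the report does not list them. */
const v4si LC0 = {{0, 1, 2, 3}};   /* assumed: lane indices     */
const v4si LC1 = {{2, 2, 2, 2}};   /* assumed: the multiplier 2 */
const v4si LC2 = {{4, 4, 4, 4}};   /* assumed: step of 4 lanes  */

/* What the instructions above the loop compute at run time. */
v4si computed_at_runtime(void)
{
    return v4si_mul(LC1, v4si_add(LC0, LC2));
}

/* The value gcc already recorded in the REG_EQUAL note of insn 19. */
const v4si folded = {{8, 10, 12, 14}};

/* Returns nonzero when all four lanes match. */
int v4si_equal(v4si a, v4si b)
{
    return memcmp(a.lane, b.lane, sizeof a.lane) == 0;
}
```

Under these assumptions, v4si_equal (computed_at_runtime (), folded) holds, which is the point of the report: the add and multiply could be replaced by a single load of the folded constant.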
This is fixed at least from gcc version 4.6.2 20110827 (prerelease) onward:

f:
.LFB0:
        .cfi_startproc
        subq    $40, %rsp
        .cfi_def_cfa_offset 48
        movq    %rsp, %rdi
        movl    $0, (%rsp)
        movl    $2, 4(%rsp)
        movl    $4, 8(%rsp)
        movl    $6, 12(%rsp)
        movl    $8, 16(%rsp)
        movl    $10, 20(%rsp)
        movl    $12, 24(%rsp)
        movl    $14, 28(%rsp)
        call    g
        movq    %rsp, %rdi
        movl    $0, (%rsp)
        movl    $2, 4(%rsp)
        movl    $4, 8(%rsp)
        movl    $6, 12(%rsp)
        movl    $8, 16(%rsp)
        movl    $10, 20(%rsp)
        movl    $12, 24(%rsp)
        movl    $14, 28(%rsp)
        call    g
        addq    $40, %rsp
        .cfi_def_cfa_offset 8
        ret
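Read back as C, the 4.6.2 output appears to correspond to the outer loop fully unrolled into two iterations, each storing eight immediate values and then calling g. The sketch below is a hand-written illustration of that reading, not compiler output. The recording g and the name f_as_compiled are hypothetical stand-ins added so the two versions can be compared.

```c
/* Hypothetical stand-in for the external g() of the testcase:
   it counts its calls and sums the elements it is given. */
int g_calls = 0;
int g_sum = 0;

void g(int *tabs)
{
    g_calls++;
    for (int i = 0; i < 8; i++)
        g_sum += tabs[i];
}

/* The testcase as written in the report. */
void f(void)
{
    int tabs[8], tabcount;

    for (tabcount = 1; tabcount <= 8; tabcount += 7) {
        int i;
        for (i = 0; i < 8; i++)
            tabs[i] = 2 * i;
        g(tabs);
    }
}

/* Hand-written C mirroring the assembly above: the outer loop runs
   for tabcount = 1 and tabcount = 8, so there are two store+call groups. */
void f_as_compiled(void)
{
    int tabs[8];

    tabs[0] = 0;  tabs[1] = 2;  tabs[2] = 4;  tabs[3] = 6;
    tabs[4] = 8;  tabs[5] = 10; tabs[6] = 12; tabs[7] = 14;
    g(tabs);

    tabs[0] = 0;  tabs[1] = 2;  tabs[2] = 4;  tabs[3] = 6;
    tabs[4] = 8;  tabs[5] = 10; tabs[6] = 12; tabs[7] = 14;
    g(tabs);
}
```

With this g, both f and f_as_compiled make two calls and pass the same eight values each time, so no vector arithmetic remains to be folded.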