This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Hi,

I was looking at some loops that LLVM can vectorize but GCC cannot. One such case is a loop whose store has a negative step:

    void test1(short * __restrict__ x, short * __restrict__ y, short * __restrict__ z)
    {
      int i;
      for (i=127; i>=0; i--) {
        x[i] = y[127-i] + z[127-i];
      }
    }

I don't know why GCC implements negative step only for loads and not for stores. I implemented a patch, very similar to the code in vectorizable_load.

    ~/scratch/install-x86/bin/gcc ghs-dec.c -ftree-vectorize -S -O2 -mavx

Without patch:

    test1:
    .LFB0:
            addq    $254, %rdi
            xorl    %eax, %eax
            .p2align 4,,10
            .p2align 3
    .L2:
            movzwl  (%rsi,%rax), %ecx
            subq    $2, %rdi
            addw    (%rdx,%rax), %cx
            addq    $2, %rax
            movw    %cx, 2(%rdi)
            cmpq    $256, %rax
            jne     .L2
            rep; ret

With patch:

    test1:
    .LFB0:
            vmovdqa .LC0(%rip), %xmm1
            xorl    %eax, %eax
            .p2align 4,,10
            .p2align 3
    .L2:
            vmovdqu (%rsi,%rax), %xmm0
            movq    %rax, %rcx
            negq    %rcx
            vpaddw  (%rdx,%rax), %xmm0, %xmm0
            vpshufb %xmm1, %xmm0, %xmm0
            addq    $16, %rax
            cmpq    $256, %rax
            vmovups %xmm0, 240(%rdi,%rcx)
            jne     .L2
            rep; ret

Performance is definitely improved here. The patch bootstraps on x86_64-unknown-linux-gnu and shows no additional regressions on my machine.

For reference, LLVM seems to use different instructions and generates slightly worse code, though I am not very familiar with x86 assembly. The patch was originally written for our private port.

    test1:                                  # @test1
            .cfi_startproc
    # BB#0:                                 # %entry
            addq    $240, %rdi
            xorl    %eax, %eax
            .align  16, 0x90
    .LBB0_1:                                # %vector.body
                                            # =>This Inner Loop Header: Depth=1
            movdqu  (%rsi,%rax,2), %xmm0
            movdqu  (%rdx,%rax,2), %xmm1
            paddw   %xmm0, %xmm1
            shufpd  $1, %xmm1, %xmm1        # xmm1 = xmm1[1,0]
            pshuflw $27, %xmm1, %xmm0       # xmm0 = xmm1[3,2,1,0,4,5,6,7]
            pshufhw $27, %xmm0, %xmm0       # xmm0 = xmm0[0,1,2,3,7,6,5,4]
            movdqu  %xmm0, (%rdi)
            addq    $8, %rax
            addq    $-16, %rdi
            cmpq    $128, %rax
            jne     .LBB0_1
    # BB#2:                                 # %for.end
            ret

Any comments?

Bingfeng Mei
Broadcom UK
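For readers following the assembly, here is a minimal scalar sketch in plain C of what the patched loop appears to do on each iteration: load eight shorts from y and z, add them, reverse the lane order (the role of vpshufb above), and write the block at a descending address in x. This is only an illustrative model, not the patch itself. The names test1_scalar, test1_blocked, check_equivalence, N and VF are hypothetical, and VF = 8 assumes 128-bit vectors of 16-bit elements.

```c
#include <assert.h>
#include <string.h>

#define N  128  /* trip count of the loop in the message */
#define VF 8    /* assumed vectorization factor: 8 shorts per 128-bit vector */

/* Scalar reference: the loop from the message, with a negative-step store. */
static void test1_scalar(short *restrict x, const short *restrict y,
                         const short *restrict z)
{
    for (int i = N - 1; i >= 0; i--)
        x[i] = (short)(y[N - 1 - i] + z[N - 1 - i]);
}

/* Hypothetical model of the vectorized loop: forward loads, a lane
   reversal, and one contiguous store per block at a descending address. */
static void test1_blocked(short *restrict x, const short *restrict y,
                          const short *restrict z)
{
    for (int j = 0; j < N; j += VF) {
        short sum[VF], rev[VF];

        /* Models the vector load and add (vmovdqu + vpaddw). */
        for (int k = 0; k < VF; k++)
            sum[k] = (short)(y[j + k] + z[j + k]);

        /* Models the reversing permutation (vpshufb with a constant mask). */
        for (int k = 0; k < VF; k++)
            rev[k] = sum[VF - 1 - k];

        /* Models the store: the block for j = 0 lands at x[120..127],
           matching the 240-byte displacement in the patched assembly. */
        memcpy(&x[N - VF - j], rev, sizeof rev);
    }
}

/* Returns 1 if both versions produce identical output, 0 otherwise. */
static int check_equivalence(void)
{
    short y[N], z[N], x_ref[N], x_blk[N];

    for (int i = 0; i < N; i++) {
        y[i] = (short)i;
        z[i] = (short)(2 * i + 1);
    }
    memset(x_ref, 0, sizeof x_ref);
    memset(x_blk, 0, sizeof x_blk);

    test1_scalar(x_ref, y, z);
    test1_blocked(x_blk, y, z);

    return memcmp(x_ref, x_blk, sizeof x_ref) == 0;
}
```

Calling check_equivalence() from a small main() and asserting that it returns 1 confirms that reversing each block before a contiguous store gives the same result as the element-by-element negative-step store.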
Attachment:
patch
Description: patch