bash-3.1$ cat pmaxsw.c
typedef short vec_t;

extern __attribute__((aligned(16))) vec_t x [64];
extern __attribute__((aligned(16))) vec_t y [64];
extern __attribute__((aligned(16))) vec_t m [64];

void
foo ()
{
  int i;

  for (i = 0; i < 20; i++)
#if 0
    m [i] = (x [i] < y [i]) ? y [i] : x[i];
#else
    if (x [i] < y [i])
      m [i] = y [i];
    else
      m [i] = x[i];
#endif
}
bash-3.1$ make
/usr/gcc-4.3/bin/gcc -O2 -mssse3 -ftree-vectorize -ftree-vectorizer-verbose=1 -S pmaxsw.c

pmaxsw.c:9: note:
for a in pmaxsw.s; do \
  insn=`basename $a .s`; \
  echo Check vectorizer on $insn.c:; \
  grep $insn $a | grep xmm; \
done
Check vectorizer on pmaxsw.c:
make: *** [all] Error 1
bash-3.1$

bash-3.1$ cat pmaxsw.c
typedef short vec_t;

extern __attribute__((aligned(16))) vec_t x [64];
extern __attribute__((aligned(16))) vec_t y [64];
extern __attribute__((aligned(16))) vec_t m [64];

void
foo ()
{
  int i;

  for (i = 0; i < 20; i++)
#if 1
    m [i] = (x [i] < y [i]) ? y [i] : x[i];
#else
    if (x [i] < y [i])
      m [i] = y [i];
    else
      m [i] = x[i];
#endif
}
bash-3.1$ make
/usr/gcc-4.3/bin/gcc -O2 -mssse3 -ftree-vectorize -ftree-vectorizer-verbose=1 -S pmaxsw.c

pmaxsw.c:12: note: LOOP VECTORIZED.
pmaxsw.c:9: note: vectorized 1 loops in function.
for a in pmaxsw.s; do \
  insn=`basename $a .s`; \
  echo Check vectorizer on $insn.c:; \
  grep $insn $a | grep xmm; \
done
Check vectorizer on pmaxsw.c:
        pmaxsw  x(%rip), %xmm0
        pmaxsw  x+16(%rip), %xmm0
bash-3.1$

Why can't the vectorizer recognize that

    if (x [i] < y [i])
      m [i] = y [i];
    else
      m [i] = x[i];

is the same as

    m [i] = (x [i] < y [i]) ? y [i] : x[i];

?
Maybe you can show the assembly output you're expecting? tree level if-conversion for the vectorizer _should_ be able to recognize this case. It may be that it's just not set up to deal with x86* insns of this kind.
For code:

typedef short vec_t;

extern __attribute__((aligned(16))) vec_t x [64];
extern __attribute__((aligned(16))) vec_t y [64];
extern __attribute__((aligned(16))) vec_t m [64];

void
foo ()
{
  int i;

  for (i = 0; i < 64; i++)
    if (x [i] < y [i])
      m [i] = y [i];
    else
      m [i] = x [i];
}

I am expecting:

        .globl foo
        .type   foo, @function
foo:
.LFB2:
        movdqa  y(%rip), %xmm0
        movl    $16, %eax
        pmaxsw  x(%rip), %xmm0
        movdqa  %xmm0, m(%rip)
        .p2align 4,,7
.L2:
        movdqa  y(%rax), %xmm0
        pmaxsw  x(%rax), %xmm0
        movdqa  %xmm0, m(%rax)
        addq    $16, %rax
        cmpq    $128, %rax
        jne     .L2
        rep ; ret
.LFE2:
        .size   foo, .-foo

But I got

        .globl foo
        .type   foo, @function
foo:
.LFB2:
        xorl    %ecx, %ecx
        .p2align 4,,7
.L2:
        movzwl  x(%rcx,%rcx), %edx
        movzwl  y(%rcx,%rcx), %eax
        cmpw    %ax, %dx
        cmovge  %edx, %eax
        movw    %ax, m(%rcx,%rcx)
        addq    $1, %rcx
        cmpq    $64, %rcx
        jne     .L2
        rep ; ret
.LFE2:
        .size   foo, .-foo
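For readers less used to SSE assembly, the expected loop body (load, pmaxsw, store, eight shorts per iteration) can be written by hand with intrinsics. This is only an illustrative sketch, not compiler output: the helper name max_by_hand is hypothetical, and unaligned loads and stores are assumed so that it works on any buffer, whereas the expected assembly uses movdqa because the arrays are 16-byte aligned. On targets without SSE2 it falls back to the scalar loop.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2: _mm_max_epi16 corresponds to pmaxsw */
#endif

typedef short vec_t;

/* Hypothetical helper: m[i] = max(x[i], y[i]) for n elements. */
void max_by_hand(vec_t *m, const vec_t *x, const vec_t *y, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Eight 16-bit lanes per 128-bit register.  Unaligned accesses are
       an assumption of this sketch; aligned arrays could use
       _mm_load_si128/_mm_store_si128 (movdqa) instead. */
    for (; i + 8 <= n; i += 8) {
        __m128i vx = _mm_loadu_si128((const __m128i *)(x + i));
        __m128i vy = _mm_loadu_si128((const __m128i *)(y + i));
        _mm_storeu_si128((__m128i *)(m + i), _mm_max_epi16(vx, vy));
    }
#endif
    /* Scalar tail, and the whole loop on targets without SSE2. */
    for (; i < n; i++)
        m[i] = (x[i] < y[i]) ? y[i] : x[i];
}
```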
This is not the vectorizer's fault.

> tree level if-conversion for the vectorizer _should_ be able to recognize this
> case. It may be that it's just not set up to deal with x86* insns of this kind.

It cannot because there is a store there and no other pass sinks the store.
So basically

    if (a)
      a[i] = xxx;
    else
      a[i] = yyy;

is not converted to

    if (a)
      ddd = xxx;
    else
      ddd = yyy;
    a[i] = ddd;

Which means this is really a dup of another bug.
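Applied by hand to the testcase, the store sinking described here would look roughly like the sketch below. The function names and the temporary t are hypothetical, and the arrays are passed as parameters only to keep the example self-contained. Both functions compute the same result; in the second, the branch only selects a value and is followed by a single store, which is the shape the conditional-expression version already has.

```c
#include <stddef.h>

enum { N = 64 };
typedef short vec_t;

/* Original form: one store in each arm of the branch. */
void max_two_stores(vec_t *m, const vec_t *x, const vec_t *y)
{
    for (size_t i = 0; i < N; i++) {
        if (x[i] < y[i])
            m[i] = y[i];
        else
            m[i] = x[i];
    }
}

/* Hand-sunk form: the branch picks a value, then one store follows. */
void max_one_store(vec_t *m, const vec_t *x, const vec_t *y)
{
    for (size_t i = 0; i < N; i++) {
        vec_t t;              /* plays the role of "ddd" above */
        if (x[i] < y[i])
            t = y[i];
        else
            t = x[i];
        m[i] = t;             /* the single, sunk store */
    }
}
```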
yes, this is indeed a known problem (I don't know if there's a PR open for it). It is one of the tree-ifcvt enhancements that Victor was going to tackle for 4.3 (item (2.3) in http://gcc.gnu.org/wiki/AutovectBranchOptimizations?).
(In reply to comment #4)
> yes, this is indeed a known problem (I don't know if there's a PR open for it).
> It is one of the tree-ifcvt enhancements that Victor was going to tackle for
> 4.3 (item (2.3) in http://gcc.gnu.org/wiki/AutovectBranchOptimizations?).

Plus it was mentioned before this does not belong in tree-ifcvt either. See
http://gcc.gnu.org/ml/gcc-patches/2006-05/msg00494.html

And there is already a bug about this, see PR25553

*** This bug has been marked as a duplicate of 25553 ***