Whole failure list.

g++: g++.target/i386/pr100637-1b.C -std=gnu++14 scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr100637-1b.C -std=gnu++17 scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr100637-1b.C -std=gnu++20 scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr100637-1b.C -std=gnu++98 scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr100637-1w.C -std=gnu++14 scan-assembler-times pcmpeqw 2
g++: g++.target/i386/pr100637-1w.C -std=gnu++17 scan-assembler-times pcmpeqw 2
g++: g++.target/i386/pr100637-1w.C -std=gnu++20 scan-assembler-times pcmpeqw 2
g++: g++.target/i386/pr100637-1w.C -std=gnu++98 scan-assembler-times pcmpeqw 2
g++: g++.target/i386/pr103861-1.C -std=gnu++14 scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr103861-1.C -std=gnu++17 scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr103861-1.C -std=gnu++20 scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr103861-1.C -std=gnu++98 scan-assembler-times pcmpeqb 2
gcc: gcc.target/i386/pr88540.c scan-assembler minpd

There is 1 extra pcmpeq instruction generated in the 3 testcases below for the GTU comparison. x86 doesn't support a native GTU comparison and uses psubusw + pcmpeq + pcmpeq instead; the second pcmpeq is used to negate the mask, and the negation can be eliminated in the vcond{,u,eq} expander by just swapping if_true and if_else.

g++: g++.target/i386/pr100637-1b.C
g++: g++.target/i386/pr100637-1w.C
g++: g++.target/i386/pr103861-1.C

This one may be a little bit difficult: it's the x86-specific floating-point min/max{ps,pd}, which is an exact match of a > b ? a : b and is not IEEE-conformant.

gcc: gcc.target/i386/pr88540.c scan-assembler minpd
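To illustrate why the min/max{ps,pd} match has to be exact, here is a rough scalar sketch of one lane. It assumes the usual description of the instructions, where the second operand is returned whenever the comparison is false (including when either input is a NaN or both are zeros). The names x86_min_lane and x86_max_lane are made up for this sketch; they are not GCC or intrinsic names.

```cpp
#include <cmath>
#include <limits>

// Hypothetical scalar model of one lane of minpd: the second operand
// is returned whenever "a < b" is false, so NaNs and signed zeros are
// not treated symmetrically.
double x86_min_lane(double a, double b) { return a < b ? a : b; }

// Hypothetical scalar model of one lane of maxpd, same idea.
double x86_max_lane(double a, double b) { return a > b ? a : b; }
```

Under this model x86_min_lane(NaN, 1.0) gives 1.0 but x86_min_lane(1.0, NaN) gives NaN, whereas std::fmin returns 1.0 in both cases. That asymmetry is why only the exact a < b ? a : b (or a > b ? a : b) form can be mapped to the instruction without extra flags.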
For the a > b ? a : b and a < b ? a : b (or is that a <= b ? a : b for min{ps,pd}?) I wonder if we want new optabs like cond_fmax (that name is already taken though - maybe fmax_gt and fmin_le?). I also wonder if there are other archs with similar instructions.

The unsigned compare looks like a general trick we could use in vectorizer pattern recognition, or alternatively in vectorizable_comparison, which needs adjustments anyway in case vec_cmp expanders start to reject some compare operators. Of course the testcases use generic vectors, so the same applies to vector lowering (code sharing between vectorization and lowering as far as "tricks" go would be nice, though this one looks difficult to generalize).
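For reference, the unsigned greater-than trick and the arm swap can be sketched with a scalar model of 16 byte lanes. This is only an illustration, not the expander code: Vec, psubusb, pcmpeqb, blend and the two select_* functions are hypothetical helpers that mimic the instructions of the same name lane by lane.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec = std::array<std::uint8_t, 16>;

// Lane-wise unsigned saturating subtract: 0 whenever a <= b.
Vec psubusb(const Vec& a, const Vec& b) {
    Vec r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = a[i] > b[i] ? static_cast<std::uint8_t>(a[i] - b[i]) : 0;
    return r;
}

// Lane-wise equality compare producing an all-ones or all-zeros mask.
Vec pcmpeqb(const Vec& a, const Vec& b) {
    Vec r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = a[i] == b[i] ? 0xFF : 0x00;
    return r;
}

// Lane-wise select: mask ? t : f, assuming mask lanes are 0x00 or 0xFF.
Vec blend(const Vec& mask, const Vec& t, const Vec& f) {
    Vec r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = static_cast<std::uint8_t>((mask[i] & t[i]) | (~mask[i] & f[i]));
    return r;
}

// a >u b ? x : y with an explicit negation: psubus + pcmpeq + pcmpeq.
Vec select_gtu_negated(const Vec& a, const Vec& b, const Vec& x, const Vec& y) {
    const Vec zero{};
    Vec leu = pcmpeqb(psubusb(a, b), zero);  // mask for a <=u b
    Vec gtu = pcmpeqb(leu, zero);            // second pcmpeq negates the mask
    return blend(gtu, x, y);
}

// Same result without the negation: keep the LEU mask and swap the arms.
Vec select_gtu_swapped(const Vec& a, const Vec& b, const Vec& x, const Vec& y) {
    const Vec zero{};
    Vec leu = pcmpeqb(psubusb(a, b), zero);  // mask for a <=u b
    return blend(leu, y, x);                 // arms swapped, one pcmpeq saved
}
```

Both functions compute a[i] > b[i] ? x[i] : y[i] for every lane; the second one needs only a single pcmpeqb, which is the count the scan-assembler-times tests expect.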