void bar (double *);

void
foo (double *a, double *b, double *c, double *d)
{
  double e[1024], f;
  int i;
  for (i = 0; i < 1024; i++)
    e[i] = (f = a[i] + b[i] + c[i] * d[i]) < 0.0 ? 3.0 : f;
  bar (e);
}

is not vectorized at -O3 (it is at -O3 -ffast-math, though). One problem is in ifcvt, which creates bogus dead stmts with bool type, but even with that fixed the vectorizer gives up. I'll attach an ifcvt fix. Is there something that can be done about this on the vectorizer side?
Created attachment 22031
gcc46-pr46008.patch

The ifcvt fix (untested). The vectorizer still gives up, because the floating point comparison is first computed into a _Bool/bool variable, which has a single immediate use: the COND_EXPR that directly follows it.
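As a rough source-level analogy of the two shapes described above (not the actual GIMPLE, and not taken from the patch), the sketch below writes the select once with the comparison stored in a bool temporary and once with the comparison embedded in the conditional. The function names select_via_bool and select_direct are hypothetical. Whether a given compiler version keeps these two forms distinct internally is an assumption; the point is only that both compute the same value.

```c
#include <stdbool.h>

/* Shape A: the comparison result is materialised in a bool temporary,
   and that temporary has exactly one use, in the conditional below.
   This mirrors the form the comment above says the vectorizer rejects. */
double select_via_bool(double f)
{
    bool neg = f < 0.0;
    return neg ? 3.0 : f;
}

/* Shape B: the comparison is the first operand of the conditional itself. */
double select_direct(double f)
{
    return f < 0.0 ? 3.0 : f;
}
```

Both functions return 3.0 for a negative input and return the input unchanged otherwise, so folding the temporary into the conditional does not change the result.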
Author: jakub
Date: Thu Oct 14 19:34:16 2010
New Revision: 165476

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=165476
Log:
PR tree-optimization/46008
* tree-if-conv.c (predicate_bbs): Try to canonicalize c2 if possible.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-if-conv.c
I think there was some talk about changing COND_EXPR not to have a comparison as the first operand.
Fixed:

.L2:
        ldr     q0, [x1, x4]
        ldr     q3, [x5, x4]
        ldr     q1, [x2, x4]
        ldr     q2, [x3, x4]
        fadd    v0.2d, v0.2d, v3.2d
        fmla    v0.2d, v2.2d, v1.2d
        fcmlt   v1.2d, v0.2d, 0
        bit     v0.16b, v4.16b, v1.16b
        str     q0, [x0, x4]
        add     x4, x4, 16
        cmp     x4, 8192
        bne     .L2

5.4.0 produces something slightly worse but still vectorized:

.L2:
        ldr     q0, [x1, x4]
        ldr     q3, [x5, x4]
        ldr     q1, [x2, x4]
        ldr     q2, [x3, x4]
        fadd    v0.2d, v0.2d, v3.2d
        fmla    v0.2d, v2.2d, v1.2d
        fcmlt   v1.2d, v0.2d, 0
        bit     v0.16b, v4.16b, v1.16b
        str     q0, [x4, x0]
        add     x4, x4, 16
        cmp     x4, 8192
        bne     .L2
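For readers less familiar with AArch64, here is a minimal scalar sketch of what one 64-bit lane of the loop body above appears to do. It assumes that v4 holds the constant 3.0 in every lane (the instruction that sets it up is outside the quoted loop), and the helper names lane_fcmlt_zero, lane_bit and lane_body are hypothetical. The fmla instruction is fused, so its rounding can differ from the separate multiply and add written here.

```c
#include <stdint.h>
#include <string.h>

/* fcmlt vd, vn, 0: all-ones mask for a lane that is < 0.0, else all-zeros. */
uint64_t lane_fcmlt_zero(double x)
{
    return x < 0.0 ? UINT64_MAX : 0;
}

/* bit vd, vn, vm: take the bits of src where mask is 1, keep dst elsewhere. */
double lane_bit(double dst, double src, uint64_t mask)
{
    uint64_t d, s;
    memcpy(&d, &dst, sizeof d);
    memcpy(&s, &src, sizeof s);
    d = (d & ~mask) | (s & mask);
    memcpy(&dst, &d, sizeof d);
    return dst;
}

/* One lane of the loop body: fadd, fmla, fcmlt, bit. */
double lane_body(double a, double b, double c, double d)
{
    double f = a + b;                      /* fadd */
    f = f + c * d;                         /* fmla (fused in hardware) */
    uint64_t mask = lane_fcmlt_zero(f);    /* fcmlt against zero */
    return lane_bit(f, 3.0, mask);         /* bit, assuming v4 == 3.0 */
}
```

For example, lane_body(1.0, 2.0, 3.0, 4.0) yields 15.0, while lane_body(1.0, 2.0, -3.0, 4.0) computes -9.0 and therefore selects 3.0, matching the scalar expression in the original test case.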