Bug 46008 - Floating point condexpr not vectorized
Summary: Floating point condexpr not vectorized
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.6.0
Importance: P3 enhancement
Target Milestone: 5.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2010-10-13 16:06 UTC by Jakub Jelinek
Modified: 2016-08-27 23:14 UTC
CC List: 1 user

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-03-13 00:00:00


Attachments
gcc46-pr46008.patch (477 bytes, patch)
2010-10-13 16:19 UTC, Jakub Jelinek

Description Jakub Jelinek 2010-10-13 16:06:59 UTC
void bar (double *);
void foo (double *a, double *b, double *c, double *d)
{
  double e[1024], f;
  int i;

  for (i = 0; i < 1024; i++)
    e[i] = (f = a[i] + b[i] + c[i] * d[i]) < 0.0 ? 3.0 : f;
  bar (e);
}

is not vectorized at -O3 (it is at -O3 -ffast-math, though).
One problem is in ifcvt, which creates bogus dead stmts of bool type; but even with that fixed, the vectorizer gives up.
I'll attach the ifcvt fix. Is there something that can be done about this on the vectorizer side?
Comment 1 Jakub Jelinek 2010-10-13 16:19:27 UTC
Created attachment 22031
gcc46-pr46008.patch

The ifcvt fix (untested).  The vectorizer still gives up, though, because the floating-point comparison is first computed into a _Bool/bool variable, whose single immediate use is the COND_EXPR that immediately follows.
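To make that shape concrete, here is a scalar C sketch of the loop body before and after if-conversion. It is not GCC's actual GIMPLE, and the function names are hypothetical. In the if-converted form, f is computed unconditionally, the comparison is stored in a bool, and that bool's only use is the select, which is the _Bool-then-COND_EXPR pattern described above.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Loop body of the test case, written with an explicit branch. */
double body_branchy(double a, double b, double c, double d)
{
  double f = a + b + c * d;
  if (f < 0.0)
    return 3.0;
  return f;
}

/* Sketch of the if-converted body: f is computed unconditionally, the
   floating-point comparison is materialized into a bool, and the bool's
   single use is the select (a COND_EXPR in GCC's IR). */
double body_ifcvt(double a, double b, double c, double d)
{
  double f = a + b + c * d;
  bool pred = f < 0.0;    /* comparison -> _Bool */
  return pred ? 3.0 : f;  /* single immediate use: the select */
}

/* Asserts that both forms agree on one input (two NaN results count as equal). */
void check_equivalent(double a, double b, double c, double d)
{
  double x = body_branchy(a, b, c, d);
  double y = body_ifcvt(a, b, c, d);
  assert((isnan(x) && isnan(y)) || x == y);
}
```

Both forms compute the same values; the difference that matters to the vectorizer is only how the predicate is represented.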
Comment 2 Jakub Jelinek 2010-10-14 19:34:20 UTC
Author: jakub
Date: Thu Oct 14 19:34:16 2010
New Revision: 165476

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=165476
Log:
	PR tree-optimization/46008
	* tree-if-conv.c (predicate_bbs): Try to canonicalize c2
	if possible.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-if-conv.c
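The log says c2 (the predicate for the other edge) is canonicalized only "if possible". One plausible reason, assumed here rather than taken from the patch, is that for floating point the negation of a comparison is not the opposite ordered comparison once NaNs are involved: !(f < 0.0) is true for a NaN, while f >= 0.0 is false. The hypothetical helpers below illustrate this; the NaN-exact inverse is "unordered or greater-or-equal", the relation GCC represents with its UNGE_EXPR tree code. This would possibly also be consistent with the loop vectorizing under -ffast-math, which lets the compiler assume NaNs never occur. For the same reason, compile this sketch without -ffast-math.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical else-edge predicates for the then-edge condition f < 0.0. */

/* Logical negation: always the exact complement of f < 0.0. */
bool else_negated(double f) { return !(f < 0.0); }

/* Swapped ordered comparison: agrees with the negation only when f is
   not a NaN, so it is a valid inverse only if NaNs can be ignored. */
bool else_ordered_ge(double f) { return f >= 0.0; }

/* "Unordered or greater-or-equal": the NaN-exact inverse, i.e. the
   relation of GCC's UNGE_EXPR tree code. */
bool else_unordered_ge(double f) { return isnan(f) || f >= 0.0; }

/* Demonstrates where the candidate inverses differ. */
void check_inverses(void)
{
  const double q = NAN;
  assert(else_negated(q));        /* !(NaN < 0.0) is true */
  assert(!else_ordered_ge(q));    /* NaN >= 0.0 is false */
  assert(else_unordered_ge(q));   /* matches the negation */

  /* For ordinary values all three agree. */
  assert(else_negated(-1.0) == else_ordered_ge(-1.0));
  assert(else_negated(2.0) == else_unordered_ge(2.0));
}
```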
Comment 3 Andrew Pinski 2012-03-13 23:16:29 UTC
I think there was some talk about changing COND_EXPR so that its first operand is no longer a comparison.
Comment 4 Andrew Pinski 2016-08-27 23:14:56 UTC
Fixed:
.L2:
        ldr     q0, [x1, x4]
        ldr     q3, [x5, x4]
        ldr     q1, [x2, x4]
        ldr     q2, [x3, x4]
        fadd    v0.2d, v0.2d, v3.2d
        fmla    v0.2d, v2.2d, v1.2d
        fcmlt   v1.2d, v0.2d, 0
        bit     v0.16b, v4.16b, v1.16b
        str     q0, [x0, x4]
        add     x4, x4, 16
        cmp     x4, 8192
        bne     .L2

5.4.0 produces something slightly worse, but still vectorized:
.L2:
        ldr     q0, [x1, x4]
        ldr     q3, [x5, x4]
        ldr     q1, [x2, x4]
        ldr     q2, [x3, x4]
        fadd    v0.2d, v0.2d, v3.2d
        fmla    v0.2d, v2.2d, v1.2d
        fcmlt   v1.2d, v0.2d, 0
        bit     v0.16b, v4.16b, v1.16b
        str     q0, [x4, x0]
        add     x4, x4, 16
        cmp     x4, 8192
        bne     .L2
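The core of both listings is fcmlt followed by bit: the comparison produces a per-lane mask of all-ones or all-zeros bits instead of a bool, and bit inserts the bits of v4 into v0 wherever the mask is set. That v4 holds 3.0 in both lanes is an inference, since its setup is outside the listing. Below is a sketch of the same select written with GCC's generic vector extensions in C, assuming a GCC- or Clang-compatible compiler; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* GCC vector extensions: two doubles per vector, like v0.2d above. */
typedef double  v2df __attribute__((vector_size(16)));
typedef int64_t v2di __attribute__((vector_size(16)));

/* Scalar reference: one element of the original loop. */
double body_scalar(double a, double b, double c, double d)
{
  double f = a + b + c * d;
  return f < 0.0 ? 3.0 : f;
}

/* Two elements at once, mirroring fadd/fmla, fcmlt and bit. */
v2df body_v2df(v2df a, v2df b, v2df c, v2df d)
{
  const v2df zero = {0.0, 0.0};
  const v2df three = {3.0, 3.0};       /* the role v4 appears to play */
  v2df f = a + b + c * d;              /* fadd + fmla */
  v2di mask = (v2di)(f < zero);        /* fcmlt: all-ones lanes where f < 0 */
  v2di bits = ((v2di)three & mask)     /* bit: take 3.0 where mask is set, */
              | ((v2di)f & ~mask);     /* keep f's bits elsewhere */
  return (v2df)bits;                   /* reinterpret the bits as doubles */
}

/* Asserts that both lanes match the scalar reference (non-NaN inputs). */
void check_lanes(v2df a, v2df b, v2df c, v2df d)
{
  v2df r = body_v2df(a, b, c, d);
  for (int i = 0; i < 2; i++)
    assert(r[i] == body_scalar(a[i], b[i], c[i], d[i]));
}
```

The vector casts here are bit reinterpretations, not value conversions, which is what lets the integer mask drive a select on doubles. A full-width lane mask, rather than a scalar bool, is the form the comparison result has to take before a select like this can be vectorized.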