Bug 50328 - reduction with constant or invariant not vectorized
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.7.0
Importance: P3 normal
Target Milestone: 4.7.0
Assignee: Not yet assigned to anyone
Keywords: missed-optimization
Reported: 2011-09-08 13:27 UTC by Richard Biener
Modified: 2011-09-09 12:36 UTC (History)
Last reconfirmed: 2011-09-08 00:00:00


Attachments
preliminary patch (800 bytes, patch)
2011-09-08 13:55 UTC, Richard Biener

Description Richard Biener 2011-09-08 13:27:25 UTC
For

double dvec[256];

void test (void)
{
  long i, j;
  for (j = 0; j < 131072; ++j)
    for (i = 0; i < 256; ++i)
      dvec[i] *= 1.0000001;
}

the loops are interchanged with -Ofast -floop-interchange, but the vectorizer
is confused by the extra IV that lim inserts:

<bb 4>:
  # graphite_IV.6_21 = PHI <0(3), graphite_IV.6_22(5)>
  # dvec_I_lsm.7_26 = PHI <dvec_I_lsm.7_10(3), D.2732_25(5)>
  # ivtmp.9_19 = PHI <131072(3), ivtmp.9_29(5)>
  D.2732_25 = dvec_I_lsm.7_26 * 1.0000001000000000583867176828789524734020233154296875e+0;
  graphite_IV.6_22 = graphite_IV.6_21 + 1;
  ivtmp.9_29 = ivtmp.9_19 - 1;
  if (ivtmp.9_29 != 0)
    goto <bb 5>;
  else
    goto <bb 6>;

<bb 5>:
  goto <bb 4>;

This isn't detected as a reduction for some reason.
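
For readers who prefer source over GIMPLE, the nest after interchange and store motion corresponds roughly to the C below. This is only a sketch: the names scale_original, scale_interchanged, check_interchange, N_ELEMS and N_ITERS are made up for illustration, and the scalar lsm stands in for dvec_I_lsm from the dump. The check assumes double arithmetic is evaluated in double precision (as with SSE2 on x86-64), so both forms apply the same sequence of multiplications to each element.

```c
#include <assert.h>
#include <string.h>

#define N_ELEMS 256
#define N_ITERS 131072

/* Original nest from the report: j is the outer loop, i the inner one.  */
static void scale_original(double *v)
{
  long i, j;
  for (j = 0; j < N_ITERS; ++j)
    for (i = 0; i < N_ELEMS; ++i)
      v[i] *= 1.0000001;
}

/* Hand-written equivalent of the interchanged nest: i is now outer and
   v[i] is kept in a scalar across the inner loop, like dvec_I_lsm.  */
static void scale_interchanged(double *v)
{
  long i, j;
  for (i = 0; i < N_ELEMS; ++i)
    {
      double lsm = v[i];
      for (j = 0; j < N_ITERS; ++j)
        lsm *= 1.0000001;   /* reduction whose other operand is a constant */
      v[i] = lsm;
    }
}

/* Returns 1 when both forms produce bit-identical arrays.  */
int check_interchange(void)
{
  static double a[N_ELEMS], b[N_ELEMS];
  long i;
  int same;

  for (i = 0; i < N_ELEMS; ++i)
    a[i] = b[i] = (double) (i + 1);

  scale_original(a);
  scale_interchanged(b);

  same = memcmp(a, b, sizeof a) == 0;
  assert(same);
  return same;
}
```

The inner loop of scale_interchanged has the shape of bb 4 above: a PHI-carried value multiplied by a constant on every iteration.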
Comment 1 Richard Biener 2011-09-08 13:30:10 UTC
Doesn't seem to handle reduction with one operand being a constant.
Comment 2 Richard Biener 2011-09-08 13:36:23 UTC
With the following untested patch we apply outer loop vectorization.

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c        (revision 178687)
+++ gcc/tree-vect-loop.c        (working copy)
@@ -2149,7 +2149,7 @@ vect_is_simple_reduction_1 (loop_vec_inf
       op1 = gimple_assign_rhs1 (def_stmt);
       op2 = gimple_assign_rhs2 (def_stmt);
 
-      if (TREE_CODE (op1) != SSA_NAME || TREE_CODE (op2) != SSA_NAME)
+      if (TREE_CODE (op1) != SSA_NAME && TREE_CODE (op2) != SSA_NAME)
         {
           if (vect_print_dump_info (REPORT_DETAILS))
            report_vect_op (def_stmt, "reduction: uses not ssa_names: ");
@@ -2255,7 +2255,7 @@ vect_is_simple_reduction_1 (loop_vec_inf
     def2 = SSA_NAME_DEF_STMT (op2);
 
   if (code != COND_EXPR
-      && (!def1 || !def2 || gimple_nop_p (def1) || gimple_nop_p (def2)))
+      && ((!def1 && !def2) || (gimple_nop_p (def1) && gimple_nop_p (def2))))
     {
       if (vect_print_dump_info (REPORT_DETAILS))
        report_vect_op (def_stmt, "reduction: no defs for operands: ");
@@ -2268,6 +2268,7 @@ vect_is_simple_reduction_1 (loop_vec_inf
 
   if (def2 && def2 == phi
       && (code == COND_EXPR
+         || !def1
           || (def1 && flow_bb_inside_loop_p (loop, gimple_bb (def1))
               && (is_gimple_assign (def1)
                  || is_gimple_call (def1)
@@ -2285,6 +2286,7 @@ vect_is_simple_reduction_1 (loop_vec_inf
 
   if (def1 && def1 == phi
       && (code == COND_EXPR
+         || !def2
           || (def2 && flow_bb_inside_loop_p (loop, gimple_bb (def2))
              && (is_gimple_assign (def2)
                  || is_gimple_call (def2)
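
The effect of the first hunk (|| becoming &&) can be seen in a toy model. The sketch below is not GCC code: operand_kind, rejected_before_patch, rejected_after_patch, count_accepted and check_operand_filter are hypothetical names, and the enum only mimics the SSA_NAME versus non-SSA_NAME distinction that the TREE_CODE test makes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for "TREE_CODE (op) == SSA_NAME" versus anything else.  */
enum operand_kind { OPERAND_SSA_NAME, OPERAND_CONSTANT };

/* Old test: bail out if either operand is not an SSA name.  */
static bool rejected_before_patch(enum operand_kind op1, enum operand_kind op2)
{
  return op1 != OPERAND_SSA_NAME || op2 != OPERAND_SSA_NAME;
}

/* Patched test: bail out only if neither operand is an SSA name.  */
static bool rejected_after_patch(enum operand_kind op1, enum operand_kind op2)
{
  return op1 != OPERAND_SSA_NAME && op2 != OPERAND_SSA_NAME;
}

/* Count how many of the four operand combinations pass a given filter.  */
static int count_accepted(bool (*rejects)(enum operand_kind, enum operand_kind))
{
  int accepted = 0;
  int a, b;
  for (a = OPERAND_SSA_NAME; a <= OPERAND_CONSTANT; ++a)
    for (b = OPERAND_SSA_NAME; b <= OPERAND_CONSTANT; ++b)
      if (!rejects((enum operand_kind) a, (enum operand_kind) b))
        ++accepted;
  return accepted;
}

void check_operand_filter(void)
{
  /* x * 1.0000001: one SSA name, one constant.  */
  assert(rejected_before_patch(OPERAND_SSA_NAME, OPERAND_CONSTANT));
  assert(!rejected_after_patch(OPERAND_SSA_NAME, OPERAND_CONSTANT));

  /* Two constants are still rejected by both versions.  */
  assert(rejected_after_patch(OPERAND_CONSTANT, OPERAND_CONSTANT));

  assert(count_accepted(rejected_before_patch) == 1);
  assert(count_accepted(rejected_after_patch) == 3);
}
```

The remaining hunks relax the later def1/def2 checks in the same spirit, so that an operand without a defining statement no longer disqualifies the reduction.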
Comment 3 Richard Biener 2011-09-08 13:40:30 UTC
Triggered by report http://gcc.gnu.org/ml/gcc/2011-09/msg00052.html
OpenCC then unrolls the outer loop to get

.LBB16_double_array_mults_by_const:
 #<loop> Loop body line 62, nesting depth: 2, iterations: 16384
 #<loop> unrolled 4 times
        mulpd %xmm6,%xmm0               # [0]
        movaps %xmm0,%xmm1              # [4]
        mulpd %xmm6,%xmm1               # [6]
        mulpd %xmm6,%xmm1               # [10]
        addq $8,%rax                    # [14]
        mulpd %xmm6,%xmm1               # [14]
        cmpq $131071,%rax               # [15]
        setle %dil                      # [16]
        testb %dil,%dil                 # [17]
        movaps %xmm1,%xmm0              # [18]
        jne .LBB16_double_array_mults_by_const  # [18]

instead of what we get with the patch

.L2:
        subl    $1, %eax
        mulpd   %xmm1, %xmm0
        jne     .L2

we don't have outer loop unrolling either.
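
As a rough picture of what the three-instruction loop computes, here is a hand-written C sketch of outer loop vectorization with two double lanes, one mulpd worth of data. It is an illustration under assumptions, not compiler output: scale_scalar, scale_two_lanes, check_two_lanes, N_ELEMS and N_ITERS are invented names, and the bit-for-bit comparison assumes double arithmetic is evaluated in double precision.

```c
#include <assert.h>
#include <string.h>

#define N_ELEMS 256
#define N_ITERS 131072

/* Interchanged scalar form: one element per outer iteration.  */
static void scale_scalar(double *v)
{
  long i, j;
  for (i = 0; i < N_ELEMS; ++i)
    {
      double x = v[i];
      for (j = 0; j < N_ITERS; ++j)
        x *= 1.0000001;
      v[i] = x;
    }
}

/* Two adjacent elements ride through the inner loop together, the way
   the two halves of an xmm register do under mulpd.  */
static void scale_two_lanes(double *v)
{
  long i, j;
  for (i = 0; i < N_ELEMS; i += 2)
    {
      double lane0 = v[i];
      double lane1 = v[i + 1];
      for (j = N_ITERS; j != 0; --j)   /* counts down, like subl/jne */
        {
          lane0 *= 1.0000001;
          lane1 *= 1.0000001;
        }
      v[i] = lane0;
      v[i + 1] = lane1;
    }
}

/* Returns 1 when the scalar and two-lane forms agree bit for bit.  */
int check_two_lanes(void)
{
  static double a[N_ELEMS], b[N_ELEMS];
  long i;
  int same;

  for (i = 0; i < N_ELEMS; ++i)
    a[i] = b[i] = 1.0 + (double) i / N_ELEMS;

  scale_scalar(a);
  scale_two_lanes(b);

  same = memcmp(a, b, sizeof a) == 0;
  assert(same);
  return same;
}
```

Each lane still carries a serial chain of multiplications, which is why unrolling the outer loop on top of this would be needed to hide the multiply latency.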
Comment 4 Richard Biener 2011-09-08 13:55:54 UTC
Created attachment 25228 [details]
preliminary patch

Patch that still ICEs on its own testcase.  It works for testcases that first
require loop interchange, though.
Comment 5 Richard Biener 2011-09-08 14:06:58 UTC
Hmm, we fail to set the vectype for the PHI - I'm somewhat lost in the
reduction code.
Comment 6 Richard Biener 2011-09-09 09:05:39 UTC
Ah, we fail outer loop vectorization because of a bug (versioning for alias
required) and then drop into vectorizable_reduction with a statement whose
operand def is a PHI with vect_unused_in_scope, which of course does not have
its vector type set.  It's also not marked live for some reason.  Doh, seems
to be a genuine swap_tree_operands bug.
Comment 7 Richard Biener 2011-09-09 12:35:16 UTC
Author: rguenth
Date: Fri Sep  9 12:35:11 2011
New Revision: 178728

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=178728
Log:
2011-09-09  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/50328
	* tree-vect-loop.c (vect_is_simple_reduction_1): Allow one
	constant or default-def operand.

	* gcc.dg/vect/fast-math-vect-outer-7.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/fast-math-vect-outer-7.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-loop.c
Comment 8 Richard Biener 2011-09-09 12:36:29 UTC
Fixed for 4.7.