For

double dvec[256];

void test (void)
{
  long i, j;
  for (j = 0; j < 131072; ++j)
    for (i = 0; i < 256; ++i)
      dvec[i] *= 1.0000001;
}

the loops are interchanged with -Ofast -floop-interchange but the vectorizer is confused by the extra IV lim inserts:

<bb 4>:
  # graphite_IV.6_21 = PHI <0(3), graphite_IV.6_22(5)>
  # dvec_I_lsm.7_26 = PHI <dvec_I_lsm.7_10(3), D.2732_25(5)>
  # ivtmp.9_19 = PHI <131072(3), ivtmp.9_29(5)>
  D.2732_25 = dvec_I_lsm.7_26 * 1.0000001000000000583867176828789524734020233154296875e+0;
  graphite_IV.6_22 = graphite_IV.6_21 + 1;
  ivtmp.9_29 = ivtmp.9_19 - 1;
  if (ivtmp.9_29 != 0)
    goto <bb 5>;
  else
    goto <bb 6>;

<bb 5>:
  goto <bb 4>;

this isn't detected as reduction for some reason.
The vectorizer doesn't seem to handle a reduction with one operand being a constant.
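For illustration, here is a hand-written C sketch of the shape the loop nest takes after interchange and store motion. It is an approximation for reading purposes, not compiler output; the function names and the (v, n, reps) parameters are hypothetical and were added so the two forms can be compared on small inputs.

```c
/* Original nest from the testcase: repetition loop outside, element loop inside. */
void scale_original(double *v, long n, long reps)
{
    for (long j = 0; j < reps; ++j)
        for (long i = 0; i < n; ++i)
            v[i] *= 1.0000001;
}

/* Assumed shape after interchange plus store motion: each element is
   loaded once, multiplied reps times in a scalar, then stored back.
   The inner loop is a product reduction whose operands are the running
   value (the PHI in the dump) and a constant rather than a second
   SSA name. */
void scale_interchanged(double *v, long n, long reps)
{
    for (long i = 0; i < n; ++i) {
        double acc = v[i];
        for (long j = 0; j < reps; ++j)
            acc *= 1.0000001;
        v[i] = acc;
    }
}
```

Calling scale_original (dvec, 256, 131072) corresponds to the testcase. Both functions apply the same sequence of multiplications to each element, so they should agree without relying on reassociation.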
With the following untested patch we apply outer loop vectorization.

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c        (revision 178687)
+++ gcc/tree-vect-loop.c        (working copy)
@@ -2149,7 +2149,7 @@ vect_is_simple_reduction_1 (loop_vec_inf
   op1 = gimple_assign_rhs1 (def_stmt);
   op2 = gimple_assign_rhs2 (def_stmt);
 
-  if (TREE_CODE (op1) != SSA_NAME || TREE_CODE (op2) != SSA_NAME)
+  if (TREE_CODE (op1) != SSA_NAME && TREE_CODE (op2) != SSA_NAME)
     {
       if (vect_print_dump_info (REPORT_DETAILS))
         report_vect_op (def_stmt, "reduction: uses not ssa_names: ");
@@ -2255,7 +2255,7 @@ vect_is_simple_reduction_1 (loop_vec_inf
     def2 = SSA_NAME_DEF_STMT (op2);
 
   if (code != COND_EXPR
-      && (!def1 || !def2 || gimple_nop_p (def1) || gimple_nop_p (def2)))
+      && ((!def1 && !def2) || (gimple_nop_p (def1) && gimple_nop_p (def2))))
     {
       if (vect_print_dump_info (REPORT_DETAILS))
         report_vect_op (def_stmt, "reduction: no defs for operands: ");
@@ -2268,6 +2268,7 @@ vect_is_simple_reduction_1 (loop_vec_inf
 
   if (def2 && def2 == phi
       && (code == COND_EXPR
+          || !def1
           || (def1 && flow_bb_inside_loop_p (loop, gimple_bb (def1))
               && (is_gimple_assign (def1)
                   || is_gimple_call (def1)
@@ -2285,6 +2286,7 @@ vect_is_simple_reduction_1 (loop_vec_inf
 
   if (def1 && def1 == phi
       && (code == COND_EXPR
+          || !def2
           || (def2 && flow_bb_inside_loop_p (loop, gimple_bb (def2))
               && (is_gimple_assign (def2)
                   || is_gimple_call (def2)
Triggered by report http://gcc.gnu.org/ml/gcc/2011-09/msg00052.html

OpenCC then unrolls the outer loop to get

.LBB16_double_array_mults_by_const:
#<loop> Loop body line 62, nesting depth: 2, iterations: 16384
#<loop> unrolled 4 times
        mulpd %xmm6,%xmm0               # [0]
        movaps %xmm0,%xmm1              # [4]
        mulpd %xmm6,%xmm1               # [6]
        mulpd %xmm6,%xmm1               # [10]
        addq $8,%rax                    # [14]
        mulpd %xmm6,%xmm1               # [14]
        cmpq $131071,%rax               # [15]
        setle %dil                      # [16]
        testb %dil,%dil                 # [17]
        movaps %xmm1,%xmm0              # [18]
        jne .LBB16_double_array_mults_by_const  # [18]

instead of what we get with the patch

.L2:
        subl    $1, %eax
        mulpd   %xmm1, %xmm0
        jne     .L2

we don't have outer loop unrolling either.
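To make the difference between the two listings concrete, here is a scalar C sketch of the same reduction written once with one multiply per trip and once unrolled by four. It is only an illustration: the function names and parameters are hypothetical, and the real code works on pairs of doubles with mulpd rather than on scalars.

```c
/* One multiply per loop trip, like the .L2 loop above. */
void scale_rolled(double *v, long n, long reps)
{
    for (long i = 0; i < n; ++i) {
        double acc = v[i];
        for (long j = 0; j < reps; ++j)
            acc *= 1.0000001;
        v[i] = acc;
    }
}

/* Four multiplies per loop trip, like the OpenCC loop above.
   A remainder loop handles reps that are not a multiple of 4. */
void scale_unrolled4(double *v, long n, long reps)
{
    for (long i = 0; i < n; ++i) {
        double acc = v[i];
        long j = 0;
        for (; j + 4 <= reps; j += 4) {
            acc *= 1.0000001;
            acc *= 1.0000001;
            acc *= 1.0000001;
            acc *= 1.0000001;
        }
        for (; j < reps; ++j)
            acc *= 1.0000001;
        v[i] = acc;
    }
}
```

The four multiplies still form a single dependency chain, so in this sketch unrolling only reduces the counter and branch overhead per multiply; it does not make the multiplies themselves independent.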
Created attachment 25228
preliminary patch

Patch that still ICEs on its testcase. It works for testcases that first require loop interchange, though.
Hmm, we fail to set the vectype for the PHI - I'm somewhat lost in the reduction code.
Ah, we fail outer loop vectorization because of a bug (versioning for alias required) and then drop into vectorizable_reduction with a statement with an operand def that is a PHI with vect_unused_in_scope, which of course does not have its vector type set. It's also not marked live for some reason. Doh, seems to be a genuine swap_tree_operands bug.
Author: rguenth
Date: Fri Sep  9 12:35:11 2011
New Revision: 178728

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=178728
Log:
2011-09-09  Richard Guenther  <rguenther@suse.de>

        PR tree-optimization/50328
        * tree-vect-loop.c (vect_is_simple_reduction_1): Allow one
        constant or default-def operand.

        * gcc.dg/vect/fast-math-vect-outer-7.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/fast-math-vect-outer-7.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-loop.c
Fixed for 4.7.