This testcase:

#define N 32

int a[N], b[N];

int foo ()
{
  for (int i = 0; i < N ; i++)
    {
      int m = (a[i] & i) ? 5 : 4;
      b[i] = a[i] * m;
    }
}

does not vectorize at -O3 on x86_64 or other platforms. Following dom1, jump threading partially peels the loop to give:

  <bb 2>:
  goto <bb 8>;

  <bb 3>:
  # i_11 = PHI <i_9(6)>
  _5 = a[i_11];
  _6 = i_11 & _5;
  if (_6 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:

  <bb 5>:
  # m_14 = PHI <5(4), 4(3)>

  <bb 6>:
  # m_2 = PHI <m_14(5), 4(8)>
  # _15 = PHI <_5(5), _10(8)>
  # i_16 = PHI <i_11(5), i_1(8)>
  _7 = m_2 * _15;
  b[i_16] = _7;
  i_9 = i_16 + 1;
  if (i_9 != 32)
    goto <bb 3>;
  else
    goto <bb 7>;

  <bb 7>:
  return;

  <bb 8>:
  # i_1 = PHI <0(2)>
  _10 = a[i_1];
  _3 = i_1 & _10;
  goto <bb 6>;

which form cannot be if-converted (tree-if-conv.c):

  /* If one of the loop header's edge is an exit edge then do not
     apply if-conversion.  */
  FOR_EACH_EDGE (e, ei, loop->header->succs)
    if (loop_exit_edge_p (loop, e))
      return false;

and even if it were, the PHI nodes at loop entry cannot be handled by the vectorizer.
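For readers less used to GIMPLE dumps, the control flow above can be hand-translated back into C. This is only an illustrative sketch, not compiler output: the names foo_original, foo_peeled and peeled_matches_original are made up here, the arrays are passed as parameters rather than being globals, and the return type is changed to void. It shows why the loop is entered in the middle: for i == 0 the test (a[0] & 0) is always false, so m is known to be 4 and the first iteration skips straight to the store in bb 6, which thereby becomes the loop header and carries the exit test.

```c
#include <string.h>

#define N 32

/* The loop as written in the testcase (hypothetical name, arrays passed in).  */
void foo_original (const int *a, int *b)
{
  for (int i = 0; i < N; i++)
    {
      int m = (a[i] & i) ? 5 : 4;
      b[i] = a[i] * m;
    }
}

/* Hand translation of the dump after jump threading.  Labels mirror the
   basic block numbers in the dump; this is an assumption-laden sketch.  */
void foo_peeled (const int *a, int *b)
{
  /* bb 8: first iteration, (a[0] & 0) == 0 so m is known to be 4.  */
  int i = 0, v = a[0], m = 4;
  goto bb6;

bb3:
  /* bb 3, bb 4, bb 5: load and select m for iterations i >= 1.  */
  v = a[i];
  m = (i & v) ? 5 : 4;

bb6:
  /* bb 6: store, increment and exit test; this block is the loop header.  */
  b[i] = m * v;
  i = i + 1;
  if (i != N)
    goto bb3;
  /* bb 7: return.  */
}

/* Helper for checking the sketch: returns 1 if both versions agree.  */
int peeled_matches_original (void)
{
  int a[N], b1[N], b2[N];
  for (int i = 0; i < N; i++)
    a[i] = 7 * i - 50;   /* arbitrary data exercising both values of m */
  foo_original (a, b1);
  foo_peeled (a, b2);
  return memcmp (b1, b2, sizeof b1) == 0;
}
```

Calling peeled_matches_original () is expected to return 1, i.e. the threaded form computes the same b[] as the source loop; only the shape of the loop differs.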
Discussion here: https://gcc.gnu.org/ml/gcc/2015-04/msg00351.html

The suggestion is to use loop-header-copying to rotate the loop to a form that both if-conversion and the vectorizer can handle.
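As a rough illustration of what rotating the loop would mean at source level, the sketch below copies the old header (the store, increment and exit test) in front of the loop, so that what remains is a do-while whose only exit test sits at the bottom. This is a hand-written approximation under the assumption that N > 1, not the output of any GCC pass; foo_reference, foo_rotated and rotated_matches_reference are hypothetical names, and the arrays are passed as parameters.

```c
#include <string.h>

#define N 32

/* Reference: the loop from the testcase (hypothetical name).  */
void foo_reference (const int *a, int *b)
{
  for (int i = 0; i < N; i++)
    {
      int m = (a[i] & i) ? 5 : 4;
      b[i] = a[i] * m;
    }
}

/* Sketch of the rotated shape, assuming N > 1.  */
void foo_rotated (const int *a, int *b)
{
  /* Copied header for the first iteration: (a[0] & 0) == 0, so m is 4.
     The copied exit test (1 != N) is constant and folds away.  */
  b[0] = a[0] * 4;
  int i = 1;

  /* Remaining iterations: the header no longer has an exit edge, and the
     body contains a single diamond that if-conversion can flatten.  */
  do
    {
      int m = (a[i] & i) ? 5 : 4;
      b[i] = a[i] * m;
      i++;
    }
  while (i != N);
}

/* Helper for checking the sketch: returns 1 if both versions agree.  */
int rotated_matches_reference (void)
{
  int a[N], b1[N], b2[N];
  for (int i = 0; i < N; i++)
    a[i] = 7 * i - 50;   /* arbitrary data exercising both values of m */
  foo_reference (a, b1);
  foo_rotated (a, b2);
  return memcmp (b1, b2, sizeof b1) == 0;
}
```

The point of the sketch is only the shape: one peeled store ahead of the loop, then a loop with a bottom test, which is the form the discussion suggests the later passes can cope with.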
Author: alalaw01
Date: Thu Jul 2 12:47:31 2015
New Revision: 225311

URL: https://gcc.gnu.org/viewcvs?rev=225311&root=gcc&view=rev
Log:
gcc/:

	* tree-pass.h (make_pass_ch_vect): New.
	* passes.def: Add pass_ch_vect just before pass_if_conversion.
	* tree-ssa-loop-ch.c (ch_base, pass_ch_vect, pass_data_ch_vect,
	pass_ch::process_loop_p, pass_ch_vect::process_loop_p,
	make_pass_ch_vect): New.
	(pass_ch): Extend ch_base.
	(pass_ch::execute): Move all but loop_optimizer_init/finalize to...
	(ch_base::copy_headers): ...here.

gcc/testsuite/:

	* gcc.dg/vect/vect-strided-a-u16-i4.c (main1): Narrow scope of x,y,z,w.
	* gcc.dg/vect/vect-ifcvt-11.c: New testcase.