If we compile bb-slp-pattern-1.c from the gcc.dg/vect suite with -mavx, pattern vectorization does not happen, because AVX has very poor support for 256-bit integer arithmetic. In particular, the widen-mult pattern is recognized but is not supported for 256-bit vectors. The test fails for a native compiler build on an AVX machine. The simplest solution is to use the same scheme as for loop vectorization: decrease the vector size from 256-bit to 128-bit.
Hmm, vectorization _does_ happen - it just happens in an awkward way (we just vectorize the store). We vectorize all of it with -mprefer-avx128.

Note that the vectorizer thinks vectorizing it in the awkward way is profitable:

  1: note: Cost model analysis:
    Vector inside of basic block cost: 1
    Vector prologue cost: 5
    Vector epilogue cost: 0
    Scalar cost of basic block: 8

If it weren't, it would try vectorizing with a smaller vector size. I think it under-estimates vector construction cost here (prologue cost). From i386.c:

      case vec_construct:
        elements = TYPE_VECTOR_SUBPARTS (vectype);
        return ix86_cost->vec_stmt_cost * (elements / 2 + 1);

But in the assembler I see 8 vector instructions plus the store. vec_construct is supposed to handle the case of building up a vector from element registers. Note the same is used for simple splats... A detailed analysis is possible in the ix86_add_stmt_cost hook - but it might be "somewhat" awkward to extract enough info from the stmt_info the vectorizer passes down... (which stmt_info is passed down might also be somewhat random, not sure).

Note the cost model is disabled in the vect.exp testsuite.
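To make the arithmetic concrete, here is a minimal sketch of the quoted vec_construct formula as a standalone C function. The name vec_construct_cost and the unit statement cost of 1 are assumptions for illustration only; they are not taken from i386.c. Under those assumptions an 8-element vector is costed at 5, which would be consistent with the prologue cost in the dump, against the 8 vector instructions seen in the assembler.

```c
#include <assert.h>

/* Hypothetical stand-in for the vec_construct case quoted from i386.c.
   vec_stmt_cost plays the role of ix86_cost->vec_stmt_cost and
   elements the role of TYPE_VECTOR_SUBPARTS (vectype).  */
static int
vec_construct_cost (int vec_stmt_cost, int elements)
{
  assert (elements > 0);
  /* Integer division, exactly as in the quoted formula.  */
  return vec_stmt_cost * (elements / 2 + 1);
}
```

With an assumed unit cost of 1, vec_construct_cost (1, 8) gives 5 and vec_construct_cost (1, 4) gives 3, so the estimate grows at roughly half the rate of the element count.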
Richard,

The problem is in pattern matching:

  /* Pattern detected.  */
  if (dump_enabled_p ())
    dump_printf_loc (MSG_NOTE, vect_location,
                     "vect_recog_widen_mult_pattern: detected:\n");

  /* Check target support  */
  vectype = get_vectype_for_scalar_type (half_type0);
  vecitype = get_vectype_for_scalar_type (itype);
  if (!vectype
      || !vecitype
      || !supportable_widening_operation (WIDEN_MULT_EXPR, last_stmt,
                                          vecitype, vectype,
                                          &dummy_code, &dummy_code,
                                          &dummy_int, &dummy_vec))
    return NULL;

We found the pattern, but it is not supported for the 256-bit vectype, so we need to retry with 128-bit.
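The retry idea can be sketched as a toy model in plain C, independent of the vectorizer's real data structures. Everything named here (pick_vector_size, widen_mult_supported, and the rule that widening multiply works only up to 128 bits) is a hypothetical illustration of the fallback scheme, not GCC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical target predicate: in this toy model the widening
   multiply is supported only for vectors of at most 128 bits.  */
static bool
widen_mult_supported (unsigned vector_bits)
{
  return vector_bits <= 128;
}

/* Try vector sizes from max_bits down to min_bits, halving each time.
   Return the first size for which SUPPORTED holds, or 0 if none does.  */
static unsigned
pick_vector_size (unsigned max_bits, unsigned min_bits,
                  bool (*supported) (unsigned))
{
  assert (supported != NULL);
  for (unsigned bits = max_bits; bits > 0 && bits >= min_bits; bits /= 2)
    if (supported (bits))
      return bits;
  return 0;
}
```

In this model pick_vector_size (256, 128, widen_mult_supported) returns 128, mirroring the proposed behaviour: the 256-bit attempt is rejected and the 128-bit one is taken instead of giving up on the pattern.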