70482 – Optimization opportunity to vectorize basic block for -mavx target.

Summary: Optimization opportunity to vectorize basic block for -mavx target.

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	6.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:	vectorizer

Reported:	2016-03-31 16:15 UTC by Yuri Rumyantsev
Modified:	2016-04-01 17:07 UTC
CC List:	0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2016-04-01 00:00:00

Description Yuri Rumyantsev 2016-03-31 16:15:49 UTC

If we compile bb-slp-pattern-1.c from the gcc.dg/vect suite with -mavx, pattern vectorization does not happen, since AVX has very poor support for 256-bit integer arithmetic. In particular, the widen-mult pattern is recognized, but it is not supported for 256-bit vectors.
The test fails for a native compiler build on an AVX machine. The simplest solution is to use the same scheme as loop vectorization: decrease the vector size from 256 bits to 128 bits.
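
For readers without the testsuite at hand, here is a minimal sketch of the kind of straight-line code involved. It is not the contents of bb-slp-pattern-1.c. The function name widen_mult_block and the two-input layout are assumptions made for illustration. Each statement multiplies two 16-bit values into a 32-bit result, which is the shape the widen-mult pattern looks for, and the eight 32-bit stores together span 256 bits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the testcase, not the real file.
   Eight independent 16-bit x 16-bit -> 32-bit multiplies in one
   basic block (no loop), stored to consecutive elements.  */
void widen_mult_block(const unsigned short *a,
                      const unsigned short *b,
                      unsigned int *out)
{
    assert(a != NULL && b != NULL && out != NULL);

    /* The casts make the widening explicit; the products cannot
       overflow a 32-bit unsigned int.  */
    out[0] = (unsigned int)a[0] * (unsigned int)b[0];
    out[1] = (unsigned int)a[1] * (unsigned int)b[1];
    out[2] = (unsigned int)a[2] * (unsigned int)b[2];
    out[3] = (unsigned int)a[3] * (unsigned int)b[3];
    out[4] = (unsigned int)a[4] * (unsigned int)b[4];
    out[5] = (unsigned int)a[5] * (unsigned int)b[5];
    out[6] = (unsigned int)a[6] * (unsigned int)b[6];
    out[7] = (unsigned int)a[7] * (unsigned int)b[7];
}
```

Compiling such a function with -O2 -ftree-slp-vectorize -mavx, and then again with -mprefer-avx128 added, is one way to compare the generated code at the two vector sizes. Whether this particular sketch reproduces the reported behaviour has not been verified.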

Comment 1 Richard Biener 2016-04-01 08:54:35 UTC

Hmm, vectorization _does_ happen - it just happens in an awkward way
(we just vectorize the store).  We vectorize all of it with -mprefer-avx128.

Note that the vectorizer thinks vectorizing it in the awkward way is profitable:

1: note: Cost model analysis:
  Vector inside of basic block cost: 1
  Vector prologue cost: 5
  Vector epilogue cost: 0
  Scalar cost of basic block: 8

If it weren't, it would try vectorizing with a smaller vector size.  I think
it underestimates the vector construction cost here (the prologue cost).  From i386.c:

      case vec_construct:
        elements = TYPE_VECTOR_SUBPARTS (vectype);
        return ix86_cost->vec_stmt_cost * (elements / 2 + 1);

But in the assembler I see 8 vector instructions plus the store.  vec_construct
is supposed to handle the case of building up a vector from element registers.
Note the same is used for simple splats...  detailed analysis is possible
in the ix86_add_stmt_cost hook - but it might be "somewhat" awkward to
extract enough info from the stmt_info the vectorizer passes down... (which
stmt_info is passed down might also be somewhat random, not sure).
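
The arithmetic behind these numbers can be sketched as below. This is a simplified model, not GCC code: the function names are hypothetical, and the inputs (a vec_stmt_cost of 1 and a single 8-element construct) are inferred from the reported prologue cost of 5 rather than stated in the dump. The profitability test is reduced to "total vector cost is lower than scalar cost".

```c
#include <assert.h>

/* Same formula as the vec_construct case quoted from i386.c.  */
int vec_construct_cost(int vec_stmt_cost, int elements)
{
    assert(vec_stmt_cost >= 0 && elements > 0);
    return vec_stmt_cost * (elements / 2 + 1);
}

/* Sum of the three vector figures printed by the cost model dump.  */
int vector_total_cost(int inside, int prologue, int epilogue)
{
    return inside + prologue + epilogue;
}

/* Simplified profitability check (assumption): vector wins only if
   its total cost is strictly below the scalar cost.  */
int vectorization_is_profitable(int vector_total, int scalar_cost)
{
    return vector_total < scalar_cost;
}
```

Under those assumptions, vec_construct_cost(1, 8) gives 5, the total vector cost is 1 + 5 + 0 = 6, and 6 < 8 makes the 256-bit variant look profitable. If the prologue were instead charged for the 8 vector instructions seen in the assembler, the total would be 1 + 8 + 0 = 9, which exceeds the scalar cost of 8.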

Note the cost model is disabled in the vect.exp testsuite.

Comment 2 Yuri Rumyantsev 2016-04-01 17:07:28 UTC

Richard, 
The problem is in pattern matching:

  /* Pattern detected.  */
  if (dump_enabled_p ())
    dump_printf_loc (MSG_NOTE, vect_location,
                     "vect_recog_widen_mult_pattern: detected:\n");

  /* Check target support  */
  vectype = get_vectype_for_scalar_type (half_type0);
  vecitype = get_vectype_for_scalar_type (itype);
  if (!vectype
      || !vecitype
      || !supportable_widening_operation (WIDEN_MULT_EXPR, last_stmt,
					  vecitype, vectype,
					  &dummy_code, &dummy_code,
					  &dummy_int, &dummy_vec))
    return NULL;
We found the pattern, but it is not supported for the 256-bit vectype, so we need to try again with a 128-bit vectype.
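
The proposed fallback can be illustrated with a small standalone model. Nothing here is GCC's API: choose_vector_bits, avx_model_supports_widen_mult and the callback type are hypothetical names, and the support predicate simply hard-codes the situation described in this report (widening multiply unavailable at 256 bits, available at 128 bits).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical query: is widening multiply supported at this width?  */
typedef bool (*widen_mult_supported_fn)(int vector_bits);

/* Toy model of the -mavx case from this report (assumption):
   no 256-bit integer widening multiply, but 128-bit works.  */
bool avx_model_supports_widen_mult(int vector_bits)
{
    return vector_bits <= 128;
}

/* Try the widest size first and halve on failure.
   Returns the chosen width in bits, or 0 if none is supported.  */
int choose_vector_bits(int widest_bits, int narrowest_bits,
                       widen_mult_supported_fn supported)
{
    assert(narrowest_bits > 0 && widest_bits >= narrowest_bits);

    for (int bits = widest_bits; bits >= narrowest_bits; bits /= 2)
        if (supported(bits))
            return bits;
    return 0;
}
```

With this model, choose_vector_bits(256, 128, avx_model_supports_widen_mult) settles on 128 instead of giving up after the 256-bit check fails, which is the behaviour requested above.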