[Bug tree-optimization/88873] missing vectorization for decomposed operations on a vector type

Wed Jan 16 10:51:00 GMT 2019

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88873

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-01-16
             Blocks|                            |53947
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  bar is not vectorized because its body looks like

  <bb 2> [local count: 1073741824]:
  _1 = BIT_FIELD_REF <c_10(D), 64, 0>;
  _2 = BIT_FIELD_REF <b_11(D), 64, 0>;
  _3 = BIT_FIELD_REF <a_12(D), 64, 0>;
  _4 = fma (_3, _2, _1);
  r_14 = BIT_INSERT_EXPR <r_13(D), _4, 0 (64 bits)>;
  _5 = BIT_FIELD_REF <c_10(D), 64, 64>;
  _6 = BIT_FIELD_REF <b_11(D), 64, 64>;
  _7 = BIT_FIELD_REF <a_12(D), 64, 64>;
  _8 = fma (_7, _6, _5);
  r_15 = BIT_INSERT_EXPR <r_14, _8, 64 (64 bits)>;
  return r_15;

and there are no loads/stores BB vectorization can work with.  There's
an enhancement request for BB vectorization to key off
vector constructors and this one is similar.  Eventually

  r_14 = BIT_INSERT_EXPR <r_13(D), _4, 0 (64 bits)>;
  r_15 = BIT_INSERT_EXPR <r_14, _8, 64 (64 bits)>;

should be combined to

  r_15 = { _4, _8 };

but that still depends on BB SLP being able to start from vector
CONSTRUCTORs.  There are also still no loads, but perhaps the
BIT_FIELD_REFs are enough here.  Apparently not:

v2df r;
v2df bar (v2df a, v2df b, v2df c)
{
  r[0] = fma (a[0], b[0], c[0]);
  r[1] = fma (a[1], b[1], c[1]);
  return r;
}

results in

  <bb 2> [local count: 1073741824]:
  _1 = BIT_FIELD_REF <c_13(D), 64, 0>;
  _2 = BIT_FIELD_REF <b_14(D), 64, 0>;
  _3 = BIT_FIELD_REF <a_15(D), 64, 0>;
  _4 = fma (_3, _2, _1);
  _5 = BIT_FIELD_REF <c_13(D), 64, 64>;
  _6 = BIT_FIELD_REF <b_14(D), 64, 64>;
  _7 = BIT_FIELD_REF <a_15(D), 64, 64>;
  _8 = fma (_7, _6, _5);
  _16 = {_4, _8};
  vect_cst__17 = _16;
  MEM[(vector(2) double *)&r] = vect_cst__17;
  _12 = r;
  return _12;

so we only vectorize the store:

t.c:18:10: missed:   Build SLP failed: not grouped load _3 = BIT_FIELD_REF <a_15(D), 64, 0>;

but that should be possible to fix as well.
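
For illustration, the BIT_INSERT_EXPR chain and the CONSTRUCTOR form
discussed above correspond roughly to the two GNU C spellings below.
This is only a sketch: the v2df typedef and the helper names
build_by_inserts and build_by_constructor are assumptions made for the
example and are not taken from the testcase, and x and y stand in for
the two fma results _4 and _8.

```c
/* Assumed typedef: a vector of two doubles (GNU C vector extension).  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Element-wise inserts into an existing vector value.  This is the
   source-level shape that is expected to show up as a chain like
     r_14 = BIT_INSERT_EXPR <r_13(D), x, 0 (64 bits)>;
     r_15 = BIT_INSERT_EXPR <r_14, y, 64 (64 bits)>;
   Both lanes of r are overwritten, so the incoming value is dead.  */
v2df
build_by_inserts (v2df r, double x, double y)
{
  r[0] = x;
  r[1] = y;
  return r;
}

/* The same value written as a vector constructor, i.e. the
   r_15 = { x, y } form the inserts should be combined to.  */
v2df
build_by_constructor (double x, double y)
{
  return (v2df) { x, y };
}
```

Both functions return the same vector for any x and y, which is why
folding the full insert chain into a single CONSTRUCTOR is valid when
every lane is written.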

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations