[Bug tree-optimization/92819] [10 Regression] Worse code generated on avx2 due to simplify_vector_constructor

rguenth at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Thu Dec 5 11:46:00 GMT 2019


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92819

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Richard - when we have

_7 = { _2, _2 };
VEC_PERM <x_3(D), _7, { 0, 3 }>

then we somehow run into

        /* See if the permutation is performing a single element
           insert from a CONSTRUCTOR or constant and use a BIT_INSERT_EXPR
           in that case.  But only if the vector mode is supported,
           otherwise this is invalid GIMPLE.  */
        if (TYPE_MODE (type) != BLKmode
            && (TREE_CODE (cop0) == VECTOR_CST
                || TREE_CODE (cop0) == CONSTRUCTOR
                || TREE_CODE (cop1) == VECTOR_CST
                || TREE_CODE (cop1) == CONSTRUCTOR))
          {
            if (sel.series_p (1, 1, nelts + 1, 1))
              {
                /* After canonicalizing the first elt to come from the
                   first vector we only can insert the first elt from
                   the first vector.  */
                at = 0;
                if ((ins = fold_read_from_vector (cop0, sel[0])))
                  op0 = op1;

but of course cop0 isn't something fold_read_from_vector can simplify
here.  So why can we only insert the first elt from the first vector?
Is series_p falsely triggering because both vectors are "series_p"?
That said, the insert can be expressed both ways, but only one way
succeeds in the end.
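
To make the selector concrete: VEC_PERM indexes the concatenation of its
two inputs, so { 0, 3 } keeps x[0] and takes the second element of _7,
which amounts to a single-element insert of _2 at index 1.  A minimal
sketch in plain C, where vec_perm_model and perm_example are made-up
names and arrays of double stand in for the vectors:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of VEC_PERM <a, b, sel> for NELTS-element vectors:
   selector values 0..NELTS-1 pick from A, NELTS..2*NELTS-1 pick from B.  */
static void
vec_perm_model (const double *a, const double *b, const unsigned *sel,
                size_t nelts, double *out)
{
  assert (nelts > 0);
  for (size_t i = 0; i < nelts; ++i)
    {
      /* Selector elements are interpreted modulo 2 * NELTS.  */
      size_t idx = sel[i] % (2 * nelts);
      out[i] = idx < nelts ? a[idx] : b[idx - nelts];
    }
}

/* The case from this comment: x = { x0, x1 }, _7 = { v, v },
   selector { 0, 3 }.  */
static void
perm_example (double x0, double x1, double v, double out[2])
{
  double x[2] = { x0, x1 };
  double c[2] = { v, v };
  unsigned sel[2] = { 0, 3 };
  vec_perm_model (x, c, sel, 2, out);
}
```

Calling perm_example (1.0, 2.0, 9.0, out) leaves lane 0 of x untouched
and replaces lane 1 with the inserted value.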

The code also looks like it would fail for V1m vectors, since then
the base element number passed to series_p (1) is out of range.  That
is, does the above even make sense for nelts <= 2?

The testcase is

v2df
qux (v2df x, double *p)
{
  return (v2df) { x[0], *p };
}

It works for

v2df
qux (v2df x, double *p)
{
  return (v2df) { *p, x[1] };
}
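
For anyone wanting to compare the two variants side by side, here is a
self-contained sketch.  The v2df typedef and the names qux_insert_hi,
qux_insert_lo and check_qux are my own additions, not part of the
testcase; compiling with something like -O2 -mavx2 and comparing the
assembly of the two functions should show the difference.

```c
#include <assert.h>

/* Two-lane double vector via the GNU vector extension (assumed typedef).  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Keeps lane 0 of X and loads lane 1 from P: the problematic variant.  */
v2df
qux_insert_hi (v2df x, double *p)
{
  return (v2df) { x[0], *p };
}

/* Loads lane 0 from P and keeps lane 1 of X: the variant that works.  */
v2df
qux_insert_lo (v2df x, double *p)
{
  return (v2df) { *p, x[1] };
}

/* Both variants are single-element inserts; only the lane differs.  */
void
check_qux (void)
{
  v2df x = { 1.0, 2.0 };
  double d = 9.0;
  v2df hi = qux_insert_hi (x, &d);
  v2df lo = qux_insert_lo (x, &d);
  assert (hi[0] == 1.0 && hi[1] == 9.0);
  assert (lo[0] == 9.0 && lo[1] == 2.0);
}
```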

When guarding the special case with
TREE_CODE (cop0) == VECTOR_CST || TREE_CODE (cop0) == CONSTRUCTOR
the later code runs into

bool
vec_perm_indices::series_p (unsigned int out_base, unsigned int out_step,
                            element_type in_base, element_type in_step) const
{
  /* Check the base value.  */
  if (maybe_ne (clamp (m_encoding.elt (out_base)), clamp (in_base)))
    return false;

and thus doesn't handle insertion at the very last element?  I can
"fix" that by doing

@@ -6047,9 +6049,11 @@ (define_operator_list COND_TERNARY
                for (at = 0; at < encoded_nelts; ++at)
                  if (maybe_ne (sel[at], at))
                    break;
-               if (at < encoded_nelts && sel.series_p (at + 1, 1, at + 1, 1))
+               if (at < encoded_nelts
+                   && (known_eq (at + 1, nelts)
+                       || sel.series_p (at + 1, 1, at + 1, 1)))
                  {
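
The last-element case this hunk is about can be modelled roughly as
follows.  This is only a sketch under assumptions: series_model is a
simplified stand-in for vec_perm_indices::series_p that works on a fully
expanded selector (so nelts and encoded_nelts coincide) and treats an
out-of-range base as a mismatch, which is what I suspect happens rather
than something the model proves.  single_insert_p and allow_last are
made-up names.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical stand-in for series_p on an expanded selector
   SEL of NELTS elements.  The base element itself has to match, so
   OUT_BASE == NELTS (nothing after the last lane) yields false.  */
static bool
series_model (const unsigned *sel, unsigned nelts, unsigned out_base,
              unsigned out_step, unsigned in_base, unsigned in_step)
{
  assert (out_step > 0);
  if (out_base >= nelts)
    return false;
  for (unsigned i = out_base, v = in_base; i < nelts;
       i += out_step, v += in_step)
    if (sel[i] != v % (2 * nelts))
      return false;
  return true;
}

/* Finds AT, the first lane whose selector is not the identity, then
   applies either the original test (ALLOW_LAST false) or the test with
   the extra check for AT being the last lane (ALLOW_LAST true).  */
static bool
single_insert_p (const unsigned *sel, unsigned nelts, bool allow_last)
{
  unsigned at;
  for (at = 0; at < nelts; ++at)
    if (sel[at] != at)
      break;
  if (at >= nelts)
    return false;
  if (allow_last && at + 1 == nelts)
    return true;
  return series_model (sel, nelts, at + 1, 1, at + 1, 1);
}
```

In this model the selector { 0, 3 } is only recognised as a single
insert with the extra check, while { 2, 1 } is recognised either way.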

Maybe the earlier series_p query needs to be adjusted similarly?  Or do you
think that

@@ -6032,7 +6032,9 @@ (define_operator_list COND_TERNARY
                || TREE_CODE (cop1) == VECTOR_CST
                || TREE_CODE (cop1) == CONSTRUCTOR))
           {
-           if (sel.series_p (1, 1, nelts + 1, 1))
+           if (sel.series_p (1, 1, nelts + 1, 1)
+               && (TREE_CODE (cop0) == VECTOR_CST
+                   || TREE_CODE (cop0) == CONSTRUCTOR))
              {
                /* After canonicalizing the first elt to come from the
                   first vector we only can insert the first elt from

is fine?

