[Bug tree-optimization/63677] Failure to constant fold with vectorization.

jakub at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Wed Oct 29 18:21:00 GMT 2014


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63677

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-10-29
                 CC|                            |jakub at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The problem is that the loop is vectorized first, and then, several passes later, SLP
vectorizes the initialization, so after some cleanups we have, e.g., in cddce2:
  MEM[(int *)&a] = { 0, 1, 2, 3 };
  MEM[(int *)&a + 16B] = { 4, 5, 6, 7 };
  vect__13.6_20 = MEM[(int *)&a];
  vect__13.6_17 = MEM[(int *)&a + 16B];
But there is no further FRE pass that would optimize the loads into
  vect__13.6_20 = { 0, 1, 2, 3 };
  vect__13.6_17 = { 4, 5, 6, 7 };
(presumably that would need to be done before forwprop4, which could in theory
refold all the stmts into a constant).

Richard, how expensive would it be to schedule another FRE pass if anything has
been vectorized in the current function (by either the vect pass or SLP)?  Or are
there other passes that handle this?  Looking at e.g.
typedef int V __attribute__((vector_size (4 * sizeof (int))));
struct S { int a[4]; };
V __attribute__ ((noinline)) foo (struct S *p)
{
  *(V *) p = (V) { 1, 2, 3, 4 };
  return *(V *) p;
}
with -O2 -fno-tree-fre, it seems DOM is able to do that, but unfortunately at
dom2 time the values have not yet been forward-propagated far enough for dom2 to
optimize this.
