This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/81038] [8 regression] test case g++.dg/vect/slp-pr56812.cc fails starting with r248678


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81038

--- Comment #8 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
The commentary for r248678 reads in part: "Compute costs for doing no peeling
at all, compare to the best peeling costs so far and avoid peeling if cheaper."
Indeed, if you look at the vect dump for r248677, you see that the vectorizer
decides to force alignment using peeling, even though the target processor has
efficient unaligned memory access.  Peeling proved to be barely unprofitable:

/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/g++.dg/vect/slp-pr56812.cc:16:18: note: Cost model analysis:
  Vector inside of loop cost: 1
  Vector prologue cost: 7
  Vector epilogue cost: 6
  Scalar iteration cost: 1
  Scalar outside cost: 0
  Vector outside cost: 13
  prologue iterations: 2
  epilogue iterations: 2
  Calculated minimum iters for profitability: 17
/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/g++.dg/vect/slp-pr56812.cc:16:18: note:   Runtime profitability threshold = 16
/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/g++.dg/vect/slp-pr56812.cc:16:18: note:   Static estimate profitability threshold = 16
/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/g++.dg/vect/slp-pr56812.cc:16:18: note: not vectorized: vectorization not profitable.

In the vect dump for r248678, the vectorizer isn't overly focused on peeling,
and determines that it can use the efficient unaligned storage accesses.  This
leads to a more reasonable cost calculation:

/home/wschmidt/gcc/gcc-mainline-test/gcc/testsuite/g++.dg/vect/slp-pr56812.cc:16:18: note: Cost model analysis:
  Vector inside of loop cost: 1
  Vector prologue cost: 1
  Vector epilogue cost: 0
  Scalar iteration cost: 1
  Scalar outside cost: 0
  Vector outside cost: 1
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 2
/home/wschmidt/gcc/gcc-mainline-test/gcc/testsuite/g++.dg/vect/slp-pr56812.cc:16:18: note:   Runtime profitability threshold = 3
/home/wschmidt/gcc/gcc-mainline-test/gcc/testsuite/g++.dg/vect/slp-pr56812.cc:16:18: note:   Static estimate profitability threshold = 3
/home/wschmidt/gcc/gcc-mainline-test/gcc/testsuite/g++.dg/vect/slp-pr56812.cc:16:18: note: loop vectorized
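
For reference, both sets of thresholds can be reproduced from the dumped
costs.  Here is a rough sketch of the arithmetic, paraphrasing what
vect_estimate_min_profitable_iters does; I am assuming a vectorization
factor of 4 here, and the exact rounding steps are my reconstruction
rather than a quote of the source, though they match both dumps above:

```python
def min_profitable_iters(vic, voc, soc, sic, prologue, epilogue, vf=4):
    """Smallest iteration count at which the vector loop wins.

    vic/voc: vector inside-of-loop / outside cost, soc: scalar outside
    cost, sic: scalar iteration cost, prologue/epilogue: peeled
    iteration counts.  Sketch of vect_estimate_min_profitable_iters."""
    num = (voc - soc) * vf - vic * prologue - vic * epilogue
    if num <= 0:
        return 0
    mpi = num // (sic * vf - vic)
    # Bump by one if the scalar loop is not yet strictly more expensive.
    if sic * vf * mpi <= vic * mpi + (voc - soc) * vf:
        mpi += 1
    return mpi

def runtime_threshold(mpi, vf=4):
    # The loop must run at least vf iterations to enter the vector body.
    return max(mpi - 1, vf - 1)

# r248677 dump: VIC=1, VOC=13, SOC=0, SIC=1, prologue=2, epilogue=2
print(min_profitable_iters(1, 13, 0, 1, 2, 2))  # 17, as in the dump
print(runtime_threshold(17))                    # 16
# r248678 dump: VIC=1, VOC=1, SOC=0, SIC=1, no peeling
print(min_profitable_iters(1, 1, 0, 1, 0, 0))   # 2
print(runtime_threshold(2))                     # 3
```

With the large prologue/epilogue costs gone, the break-even point drops
from 17 iterations to 2, which is why the second dump vectorizes.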

For this processor, we vectorized the code in "vect" rather than in "slp".  For
other processors, the choice could be different because of cost model
differences.  But I think in general we should always vectorize.  In both cases
the "optimized" dump produces:

void mydata::Set(float) (struct mydata * const this, float x)
{
  vector(4) float vect_cst__10;

  <bb 2> [11.11%]:
  vect_cst__10 = {x_5(D), x_5(D), x_5(D), x_5(D)};
  MEM[(float *)this_4(D)] = vect_cst__10;
  MEM[(float *)this_4(D) + 16B] = vect_cst__10;
  return;

}

So I think perhaps it would be better to change the test to examine the
"optimized" dump for one definition and two uses of a vect_cst__*.  The point
of the original complaint in PR56812 was that this test case was not vectorized
(by SLP at the time), but so long as it is vectorized, that should be good
enough for everyone.
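
If we go that route, the directive might look something like the following
(an untested sketch; note that in the dump above "vect_cst" actually matches
four times, since the local declaration counts along with the definition and
two uses, so the regexp and count would need checking):

  // { dg-final { scan-tree-dump-times "vect_cst" 4 "optimized" } }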
