This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/71992] Missed BB SLP vectorization in GCC
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Mon, 25 Jul 2016 12:08:48 +0000
- Subject: [Bug tree-optimization/71992] Missed BB SLP vectorization in GCC
- Auto-submitted: auto-generated
- References: <bug-71992-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71992
Richard Biener <rguenth at gcc dot gnu.org> changed:
            What|Removed      |Added
----------------------------------------------------------------------------
        Keywords|             |missed-optimization
          Status|UNCONFIRMED  |NEW
Last reconfirmed|             |2016-07-25
         Version|tree-ssa     |7.0
          Blocks|             |53947
  Ever confirmed|0            |1
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed. I think doing it as
[a, b, b, b] * [a, b, 3., 3.] + [3., c, a, a]
would be "optimal" (not factoring in vector construction cost of course).
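
To make the lane layout above concrete, here is a minimal sketch in C. It is
not the testcase from the bug report; it only reads the three vectors
literally, lane by lane. The function names lanes_scalar and lanes_packed
are hypothetical, and a, b, c are assumed to be doubles.

```c
#include <stddef.h>

/* Scalar form: the four results that the lanes of
   [a, b, b, b] * [a, b, 3., 3.] + [3., c, a, a] stand for.
   Hypothetical helper, not taken from the bug's testcase. */
static void lanes_scalar(double a, double b, double c, double out[4])
{
    out[0] = a * a + 3.0;
    out[1] = b * b + c;
    out[2] = b * 3.0 + a;
    out[3] = b * 3.0 + a;
}

/* Packed form: build the three operand vectors first (this is the
   vector construction cost the comment sets aside), then do one
   lane-wise multiply and one lane-wise add. */
static void lanes_packed(double a, double b, double c, double out[4])
{
    const double x[4] = { a, b, b, b };
    const double y[4] = { a, b, 3.0, 3.0 };
    const double z[4] = { 3.0, c, a, a };

    for (size_t i = 0; i < 4; ++i)
        out[i] = x[i] * y[i] + z[i];
}
```

Both functions compute the same four values; the packed form just shows
which scalars would have to be gathered into each operand vector.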
The issue is how SLP construction works and how many swaps / builds
from scalars it does.
One issue is that we even try with a group size of 5. Fixing that
doesn't fix the bug, though, as we do not consider building a vector from
scalars until we have tried to swap the parent op (and if that fails we
don't go back to building children from scalars). Trying only with a group
size of 4 would also regress the case where we'd have split after the first
element.
That said, the whole SLP discovery needs a different algorithmic approach
to fix cases like this.
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations