This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/77902] New: Auto-vectorizes epilogue loops or manually vectorized functions
- From: "linux at carewolf dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sat, 08 Oct 2016 11:51:19 +0000
- Subject: [Bug tree-optimization/77902] New: Auto-vectorizes epilogue loops or manually vectorized functions
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77902
Bug ID: 77902
Summary: Auto-vectorizes epilogue loops or manually vectorized
functions
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
Target Milestone: ---
Created attachment 39774
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39774&action=edit
Example that triggers the pointless auto-vectorization
A common pattern when manually vectorizing an inner function is to have a small
epilogue that handles the remainder of the input vector that cannot be handled
by the vectorized stepping.
For instance:
int i = 0;
for (; i < (count - 3); i += 4)
    // do 4 at a time
for (; i < count; ++i)
    // do 1 at a time
When compiled with -O3 or -ftree-loop-vectorize, that final epilogue loop may
be auto-vectorized by GCC even though it can run at most 3 times, so the
auto-vectorized code path can never be taken.
Rewriting it as
int i = 0;
for (; i < (count - 3); i += 4)
    // do 4 at a time
for (int _i = 0; _i < 3 && i < count; ++_i, ++i)
    // do 1 at a time
fixes the issue, because the extra counter makes the limit of 3 iterations
explicit in the loop condition.
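For reference, here is a minimal self-contained sketch of both forms. The summing kernel and the function names sum_plain_epilogue and sum_bounded_epilogue are made up for illustration and are not taken from the attachment; the "4 at a time" step is written as plain C rather than with intrinsics so it builds anywhere.

```c
/* Hypothetical kernel: sum an int array four elements per step, then
   finish the 0..3 leftover elements in a scalar epilogue. */
int sum_plain_epilogue(const int *data, int count)
{
    int total = 0;
    int i = 0;
    for (; i < (count - 3); i += 4)        /* do 4 at a time */
        total += data[i] + data[i + 1] + data[i + 2] + data[i + 3];
    for (; i < count; ++i)                 /* do 1 at a time */
        total += data[i];
    return total;
}

/* Same kernel, but the epilogue carries its own counter so the
   3-iteration limit is visible in the loop condition. */
int sum_bounded_epilogue(const int *data, int count)
{
    int total = 0;
    int i = 0;
    for (; i < (count - 3); i += 4)        /* do 4 at a time */
        total += data[i] + data[i + 1] + data[i + 2] + data[i + 3];
    for (int _i = 0; _i < 3 && i < count; ++_i, ++i)   /* at most 3 times */
        total += data[i];
    return total;
}
```

Both functions compute the same result for every count; they differ only in how much the compiler can see about the epilogue's trip count.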
I am guessing GCC would do well to learn a range from the main loop so that it
can figure out on its own that the epilogue cannot be run more than 3 times.
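Until the compiler derives that range itself, one way to state it by hand is a __builtin_unreachable() hint between the two loops. This is only a sketch: the kernel and the name sum_hinted_epilogue are hypothetical, the builtin is a GCC/Clang extension, and I have not verified that any particular GCC version uses the hint to skip vectorizing the epilogue.

```c
/* Hypothetical kernel with the epilogue range stated explicitly. */
int sum_hinted_epilogue(const int *data, int count)
{
    int total = 0;
    int i = 0;
    for (; i < (count - 3); i += 4)        /* do 4 at a time */
        total += data[i] + data[i + 1] + data[i + 2] + data[i + 3];

    /* The main loop exits only once i >= count - 3, so at most 3
       elements remain. This branch is never taken; it exists only to
       tell the optimizer about that bound (assumption: the optimizer
       may or may not make use of it). */
    if (count - i > 3)
        __builtin_unreachable();

    for (; i < count; ++i)                 /* do 1 at a time */
        total += data[i];
    return total;
}
```

Unlike the extra counter in the rewritten loop, this leaves the epilogue's shape unchanged and only adds information for the optimizer.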