[Bug tree-optimization/98339] GCC could not vectorize loop with conditional reduced add and store
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Mon Jan 4 15:57:51 GMT 2021
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98339
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
             Target|                            |x86_64-*-*
     Ever confirmed|0                           |1
             Blocks|                            |53947
   Last reconfirmed|                            |2021-01-04
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is that we need to vectorize this as a reduction, and since there
is no "masked scalar store" on GIMPLE, LIM by itself doesn't help. The reason
LIM doesn't apply store-motion here is the _load_, which can trap. LIM would
like to do
  ret0 = ret[0];
  bool stored = false;
  for (int i = 0; i < n; i++)
    {
      int pos = start + i;
      if (pos <= m)
        {
          ret0 += x[i];
          stored = true;
        }
    }
  if (stored)
    ret[0] = ret0;
but as you can see, the load of ret[0] is now unconditional, whereas the
original loop only loads it when the condition holds, and that breaks this.
LIM would need to be changed to handle the whole load-update-store sequence,
delaying the load as well (thereby re-associating the reduction).
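
As a rough illustration of the delayed-load form described above, the sketch
below accumulates into a local that starts at zero and touches ret[0] only
after the loop, and only if some iteration matched. The function name and
signature are hypothetical (the original test case is not quoted in this
comment), and unsigned elements are assumed so that re-associating the sum is
well-defined.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signature, assumed for illustration only.  */
void
cond_sum_delayed (unsigned *ret, const unsigned *x, int n, int start, int m)
{
  unsigned acc = 0;      /* partial sum starts at 0, not at ret[0] */
  bool stored = false;   /* did any iteration take the branch? */

  for (int i = 0; i < n; i++)
    {
      int pos = start + i;
      if (pos <= m)
        {
          acc += x[i];
          stored = true;
        }
    }

  /* The load, update, and store of ret[0] all happen here, so ret is
     never dereferenced when no iteration matched.  */
  if (stored)
    ret[0] += acc;
}
```

Because ret[0] is read only under the same condition as in the original loop,
no new trap is introduced; the price is that the additions are summed in a
different order.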
An alternative would be to split the loop and apply store-motion to the tail.
  int i;
  for (i = 0; i < n; i++)
    {
      int pos = start + i;
      if (pos <= m)
        break;
    }
  if (i < n)
    {
      ret0 = ret[0];
      /* Iterations before i never satisfy the condition, so continue from i.  */
      for (; i < n; i++)
        {
          int pos = start + i;
          if (pos <= m)
            ret0 += x[i];
        }
      ret[0] = ret0;
    }
We can then vectorize the second loop.
At the source level the fix is to make sure the load from ret[0] doesn't trap.
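
One possible way to do that, sketched below, is to load ret[0]
unconditionally in the source before the loop, so the access is known not to
trap, and to keep the store conditional. This is only a sketch: the function
name and signature are hypothetical, and it assumes ret always points to a
readable object, which is a stronger requirement than the original loop has.

```c
#include <stdbool.h>

/* Hypothetical signature, assumed for illustration only.
   Assumes ret[0] is always valid to read.  */
void
cond_sum_hoisted (int *ret, const int *x, int n, int start, int m)
{
  int ret0 = ret[0];     /* unconditional load, written by hand */
  bool stored = false;   /* did any iteration take the branch? */

  for (int i = 0; i < n; i++)
    {
      int pos = start + i;
      if (pos <= m)
        {
          ret0 += x[i];
          stored = true;
        }
    }

  /* The store stays conditional, as in the original loop.  */
  if (stored)
    ret[0] = ret0;
}
```

The accumulation order is the same as in the original loop, so no
re-association is involved. Whether a given GCC version then vectorizes the
loop is not something this sketch guarantees.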
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations