[Bug tree-optimization/96053] Miss optimization:Finding SLP sequences from reductions sometimes is better than finding from reduction chains
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Mon Jul 6 07:13:41 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96053
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-07-06
             Blocks|                            |53947
             Status|UNCONFIRMED                 |NEW
                 CC|                            |avieira at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
           Keywords|                            |missed-optimization
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
In the end it is indeed a costing issue (also, finding SLP sequences from
reductions is quite ad-hoc: either all reductions form an SLP sequence or
none do). There is the epilogue cost, which is usually lower for SLP
reductions than for reduction chains, and then there is the cost of the
participating loads and required permutations, which depends very much on
the actual case.
For the immediate benefit I think giving more control to the user sometimes
makes sense, and if we do that I'd go a route like
  #pragma GCC vect [no-]reduc-chain
and document those as hints.
But as you say, basing the decision on costing would be way better.
Note ILP for the reduction chain is probably higher since both reductions
can execute in parallel, so for the simple testcase I'd expect the reduction
chain variant to be faster.
Note that for some reason your testcase vectorizes as an SLP reduction and
not as reduction chains for me on x86_64; the association seems to be off
from what the vectorizer expects.
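For readers unfamiliar with the two terms, the following is a minimal sketch
of the two source-level shapes being discussed. It is not the testcase from
this bug; the function names sum_chain and sum_pair are hypothetical. A
reduction chain is a single accumulator updated by several statements per
iteration, while an SLP reduction groups several independent accumulators.
Which strategy GCC actually picks depends on the version, target and costing,
as noted above.

```c
#include <assert.h>

/* Reduction chain shape: one accumulator, updated by a chain of
   statements in each iteration.  (Hypothetical example.)  */
int sum_chain(const int *a, int n)
{
    assert(n >= 0);
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += a[2 * i];
        sum += a[2 * i + 1];
    }
    return sum;
}

/* SLP reduction shape: two independent accumulators over adjacent
   elements, which the vectorizer may group into one SLP sequence.
   (Hypothetical example.)  */
void sum_pair(const int *a, int n, int *even, int *odd)
{
    assert(n >= 0);
    int s0 = 0, s1 = 0;
    for (int i = 0; i < n; i++) {
        s0 += a[2 * i];
        s1 += a[2 * i + 1];
    }
    *even = s0;
    *odd = s1;
}
```

Both functions read the same 2*n elements; only the accumulation structure
differs, which is what changes the epilogue and permutation costs.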
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations