[Bug tree-optimization/96053] Miss optimization:Finding SLP sequences from reductions sometimes is better than finding from reduction chains

Mon Jul 6 07:13:41 GMT 2020

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96053

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-07-06
             Blocks|                            |53947
             Status|UNCONFIRMED                 |NEW
                 CC|                            |avieira at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
In the end it is indeed a costing issue (also, finding SLP sequences from
reductions is quite ad hoc: either all reductions form an SLP sequence or
none do).  There's the epilogue cost, which is usually cheaper for SLP
reductions than for reduction chains, and then there's the cost of the
participating loads and required permutations, which depends very much on
the actual case ...
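
To make the two terms concrete, here is a minimal sketch of the two loop
shapes involved.  It is not the testcase from this bug; the function names
and the interleaved layout of the array are assumptions made only for
illustration.  Which strategy GCC actually picks for either loop depends on
the version, target and flags.

```c
#include <stddef.h>

/* Hypothetical example: a reduction chain.  A single accumulator is
   updated more than once per iteration, so the updates form a chain.  */
int sum_chain(const int *a, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[2 * i];
        sum += a[2 * i + 1];
    }
    return sum;
}

/* Hypothetical example: two independent reductions in one loop.  The
   vectorizer may treat the pair as one SLP group (an SLP reduction).  */
void sum_pair(const int *a, size_t n, int *even, int *odd)
{
    int s0 = 0, s1 = 0;
    for (size_t i = 0; i < n; i++) {
        s0 += a[2 * i];      /* accumulates even-indexed elements */
        s1 += a[2 * i + 1];  /* accumulates odd-indexed elements */
    }
    *even = s0;
    *odd = s1;
}
```

Both functions read 2*n elements of a.  Comparing the vectorizer dumps
(-fdump-tree-vect-details) for the two loops is one way to see which
strategy was chosen and what it cost.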

For the immediate benefit I think giving more control to the user sometimes
makes sense, and if we do that I'd go a route like

#pragma GCC vect [no-]reduc-chain

and document those as hints.

But as you say, basing the decision on costing would be way better.

Note that ILP (instruction-level parallelism) for the reduction chain is
probably higher since both reductions can execute in parallel, so for the
simple testcase I'd expect the reduction chain variant to be faster.

Note that for some reason your testcase vectorizes as an SLP reduction and
not as reduction chains for me on x86_64; the association seems to be off
from the vectorizer's expectation.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations