[Bug tree-optimization/88760] GCC unrolling is suboptimal

Fri Oct 11 10:29:00 GMT 2019

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760

--- Comment #30 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 11 Oct 2019, wilco at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760
> 
> --- Comment #29 from Wilco <wilco at gcc dot gnu.org> ---
> (In reply to Jiu Fu Guo from comment #28)
> > For this kind of small loop, it would be acceptable to unroll in GIMPLE,
> > because register pressure and instruction cost may not be major concerns;
> > just like the "cunroll" and "cunrolli" passes (complete unroll), which are
> > also done at -O2.
> 
> Absolutely, unrolling is a high-level optimization like vectorization.

To expose ILP?  I'd call that low-level though ;)
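
As a rough illustration of unrolling purely for ILP, here is a hand-written
sketch (not output of any GCC pass; the function names are made up for this
example). The rolled reduction is one serial dependency chain, while the
version unrolled by 4 keeps four independent accumulators. Whether that is
a win depends on how many adds the target can keep in flight, which is a
pipeline detail.

```c
#include <stddef.h>

/* Rolled reduction: every add depends on the result of the previous one,
   so the loop is limited by the latency of a single add chain. */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical hand-unrolled variant: four independent accumulators give
   the CPU four add chains it can overlap.  Integer addition is used so
   that reassociating the sum does not change the result. */
long sum_unrolled4(const int *a, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    long s = (s0 + s1) + (s2 + s3);

    /* Epilogue for the remaining 0..3 elements. */
    for (; i < n; i++)
        s += a[i];
    return s;
}
```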

If it exposes data reuse then I'd call it high-level - and at that level
we already have passes like predictive commoning or unroll-and-jam doing
exactly that.  Or vectorization.
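
A minimal sketch of the data-reuse case, written by hand to show the idea
behind predictive commoning rather than what the pass actually emits (the
function names are hypothetical). The value loaded as b[i + 1] in one
iteration is the b[i] of the next, so it can be carried in a register
instead of being loaded twice.

```c
#include <stddef.h>

/* Straightforward form: each iteration loads both b[i] and b[i + 1].
   b must have at least n + 1 elements. */
void add_neighbors(int *a, const int *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + b[i + 1];
}

/* Reuse form: the value loaded as b[i + 1] is kept in 'cur' and serves
   as b[i] in the following iteration, so each element of b is loaded
   only once.  Same precondition: b has at least n + 1 elements. */
void add_neighbors_reuse(int *a, const int *b, size_t n)
{
    if (n == 0)
        return;

    int cur = b[0];
    for (size_t i = 0; i < n; i++) {
        int next = b[i + 1];
        a[i] = cur + next;
        cur = next;   /* carry the loaded value into the next iteration */
    }
}
```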

We've shown through data that unrolling without a good idea of CPU
pipeline details is a loss on x86_64.  This further hints at it
being low-level.