[Bug target/82862] [8 Regression] SPEC CPU2006 465.tonto performance regression with r253975 (up to 40% drop for particular loop)

Wed Nov 22 15:41:00 GMT 2017

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82862

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
I don't have any good ideas here.  Fortran with allocated arrays tends to use
quite a few integer registers for all the IV setup and computation.

One can experiment with less peeling of vector epilogues
(--param max-completely-peel-times=1) as well as maybe adding another code
sinking pass.  In the end, what needs to be done is intelligent
rematerialization of expressions (during RA), as I fully expect there are not
enough integer registers to compute everything and keep it live.

There seems to be missed invariant motion on the GIMPLE side and also
stack allocation in an inner loop which we might be able to hoist.  Maybe
that (__builtin_stack_save/restore) confuses RA.

Those builtins confuse LIM at least (a memcpy that is present does as well,
and we expand that to a libcall).  -fno-tree-loop-distribute-patterns helps
with that.
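
As a rough illustration of the shape of code being described, here is a
minimal C sketch.  It is a hypothetical kernel, not taken from 465.tonto
(which is Fortran), and the function name and parameters are made up for the
example.  The variable-length array scoped to the outer loop body is the kind
of construct for which GCC emits __builtin_stack_save/restore and
__builtin_alloca_with_align, and the copy loop is the kind of pattern that
-ftree-loop-distribute-patterns may turn into a memcpy call.

```c
#include <stddef.h>

/* Hypothetical kernel for illustration only.  Requires cols > 0,
   since a zero-length VLA is undefined behaviour.  */
double
sum_scaled_rows (const double *a, size_t rows, size_t cols, double scale)
{
  double total = 0.0;

  for (size_t i = 0; i < rows; i++)
    {
      /* VLA scoped to the loop body: stack space is allocated and
         released on every outer iteration.  */
      double tmp[cols];

      /* Plain copy loop; pattern recognition may replace it by memcpy.  */
      for (size_t j = 0; j < cols; j++)
        tmp[j] = a[i * cols + j];

      /* 'scale' and the row offset are loop invariant in this loop.  */
      for (size_t j = 0; j < cols; j++)
        total += tmp[j] * scale;
    }

  return total;
}
```

Compiling something like this with -fdump-tree-gimple should show the stack
builtins around the loop body, assuming the allocation is not optimized away.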

But even then we still spill a lot.  Thus, try
-fno-tree-loop-distribute-patterns plus

Index: gcc/tree-ssa-loop-im.c
===================================================================
--- gcc/tree-ssa-loop-im.c      (revision 255051)
+++ gcc/tree-ssa-loop-im.c      (working copy)
@@ -1432,7 +1432,10 @@ gather_mem_refs_stmt (struct loop *loop,
   bool is_stored;
   unsigned id;

-  if (!gimple_vuse (stmt))
+  if (!gimple_vuse (stmt)
+      || gimple_call_builtin_p (stmt, BUILT_IN_STACK_SAVE)
+      || gimple_call_builtin_p (stmt, BUILT_IN_STACK_RESTORE)
+      || gimple_call_builtin_p (stmt, BUILT_IN_ALLOCA_WITH_ALIGN))
     return;

   mem = simple_mem_ref_in_stmt (stmt, &is_stored);