[Bug middle-end/47735] [4.7/4.8/4.9 Regression] Unnecessary adjustments to stack pointer

jakub at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Thu Jan 2 09:32:00 GMT 2014


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47735

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I wonder what the point is of even looking at the alignment of VAR_DECLs that
are the SSA_NAME_VAR of SSA_NAMEs if we're not putting those on the stack.

So perhaps something like:
--- gcc/cfgexpand.c.jj    2013-12-16 09:08:17.000000000 +0100
+++ gcc/cfgexpand.c    2014-01-02 10:04:39.525480578 +0100
@@ -1215,8 +1215,11 @@ expand_one_var (tree var, bool toplevel,
      we conservatively assume it will be on stack even if VAR is
      eventually put into register after RA pass.  For non-automatic
      variables, which won't be on stack, we collect alignment of
-     type and ignore user specified alignment.  */
-      if (TREE_STATIC (var) || DECL_EXTERNAL (var))
+     type and ignore user specified alignment.  Similarly for
+     SSA_NAMEs for which use_register_for_decl returns true.  */
+      if (TREE_STATIC (var)
+      || DECL_EXTERNAL (var)
+      || (TREE_CODE (origvar) == SSA_NAME && use_register_for_decl (var)))
     align = MINIMUM_ALIGNMENT (TREE_TYPE (var),
                    TYPE_MODE (TREE_TYPE (var)),
                    TYPE_ALIGN (TREE_TYPE (var)));

That said, I really wonder whether, besides the estimated stack alignment, we
shouldn't also track what we really need, i.e. record stack alignment
requirements without any pessimistic assumptions and only bump the value when
we actually allocate something on the stack that needs bigger alignment (when
we create a MEM DECL_RTL, when say assign_stack_temp* creates a stack slot that
needs bigger alignment, when RA spills something that needs bigger alignment,
etc.).  RA etc. would work as is, but ix86_finalize_stack_realign_flags would
look at the actual value instead.

Consider say:
typedef double m256 __attribute__((vector_size (32)));
m256 bar (m256 x, m256 y);
m256 foo (m256 x, m256 y, m256 z)
{
  return bar (x + z, y - z) + (m256) { 1.0, 2.0, 3.0, 4.0 };
}

        vaddpd  %ymm2, %ymm0, %ymm0
        pushq   %rbp
        vsubpd  %ymm2, %ymm1, %ymm1
        movq    %rsp, %rbp
        andq    $-32, %rsp
        call    bar
        vaddpd  .LC0(%rip), %ymm0, %ymm0
        leave
        ret

pushq %rbp; movq %rsp, %rbp; andq $-32, %rsp; leave all seem to be completely
unnecessary to me (well, some push/pop or rsp -= 8/+= 8 would still be needed
to maintain 128-bit stack alignment at the call): bar doesn't take any argument
on the stack, there is no V4DFmode spilling, etc.  For leaf functions,
ix86_finalize_stack_realign_flags already manages to avoid that if the stack
pointer is never touched and the frame pointer isn't needed.
I guess by adding another integer to x_rtl and tracking this carefully we could
get rid of the dynamic stack realignment here, though frame_pointer_needed
would likely still be set.  I wonder if we couldn't optimize that away too in
some cases (unless the user requested a frame pointer), namely when the frame
pointer register is unused or only used to look at arguments before the stack
pointer is first decremented.


