[Bug middle-end/47735] [4.7/4.8/4.9 Regression] Unnecessary adjustments to stack pointer
jakub at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Thu Jan 2 09:32:00 GMT 2014
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47735
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I wonder what the point is of even looking at the alignment of VAR_DECLs that
are the SSA_NAME_VAR of SSA_NAMEs if we're not putting those on the stack.
So perhaps something like:
--- gcc/cfgexpand.c.jj 2013-12-16 09:08:17.000000000 +0100
+++ gcc/cfgexpand.c 2014-01-02 10:04:39.525480578 +0100
@@ -1215,8 +1215,11 @@ expand_one_var (tree var, bool toplevel,
          we conservatively assume it will be on stack even if VAR is
          eventually put into register after RA pass. For non-automatic
          variables, which won't be on stack, we collect alignment of
-         type and ignore user specified alignment. */
-      if (TREE_STATIC (var) || DECL_EXTERNAL (var))
+         type and ignore user specified alignment. Similarly for
+         SSA_NAMEs for which use_register_for_decl returns true. */
+      if (TREE_STATIC (var)
+          || DECL_EXTERNAL (var)
+          || (TREE_CODE (origvar) == SSA_NAME && use_register_for_decl (var)))
         align = MINIMUM_ALIGNMENT (TREE_TYPE (var),
                                    TYPE_MODE (TREE_TYPE (var)),
                                    TYPE_ALIGN (TREE_TYPE (var)));
That said, I really wonder whether, besides the estimated stack alignment, we
shouldn't also track what we really need, i.e. record stack alignment
requirements without any pessimistic assumptions and only bump the value when
we actually allocate something on the stack that needs bigger alignment (when
we create a MEM DECL_RTL, when say assign_stack_temp* creates a stack slot that
needs bigger alignment, when RA spills something that needs bigger alignment,
etc.). RA etc. would work as is, but ix86_finalize_stack_realign_flags would
look at the actual value instead.
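A minimal sketch of that bookkeeping, as a standalone model rather than GCC
code. The struct and all function names below are hypothetical; in GCC the
second counter would presumably live next to the existing estimate in x_rtl.

```c
#include <assert.h>

/* Hypothetical model of the two counters; not GCC's real data structures.
   Alignments are in bits and must be powers of two.  */
struct stack_align_info
{
  unsigned int estimated; /* Pessimistic: bumped for every expanded variable.  */
  unsigned int needed;    /* Bumped only when a real stack slot is created.  */
};

static void
init_stack_align (struct stack_align_info *info, unsigned int incoming)
{
  info->estimated = incoming;
  info->needed = incoming;
}

/* A variable was expanded; it may still end up in a register, so only the
   pessimistic estimate grows.  */
static void
note_variable (struct stack_align_info *info, unsigned int align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  if (align > info->estimated)
    info->estimated = align;
}

/* Something really landed on the stack (a MEM DECL_RTL, a stack temp,
   a spill), so both counters grow.  */
static void
note_stack_slot (struct stack_align_info *info, unsigned int align)
{
  note_variable (info, align);
  if (align > info->needed)
    info->needed = align;
}

/* The realignment decision looks only at what was actually needed.  */
static int
needs_realign (const struct stack_align_info *info, unsigned int incoming)
{
  return info->needed > incoming;
}
```

In this model, a function that only ever sees 256-bit values in registers keeps
needed at the incoming 128 bits, so needs_realign returns 0 even though
estimated has grown to 256.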
Consider say:
typedef double m256 __attribute__((vector_size (32)));
m256 bar (m256 x, m256 y);
m256 foo (m256 x, m256 y, m256 z)
{
  return bar (x + z, y - z) + (m256) { 1.0, 2.0, 3.0, 4.0 };
}
        vaddpd %ymm2, %ymm0, %ymm0
        pushq %rbp
        vsubpd %ymm2, %ymm1, %ymm1
        movq %rsp, %rbp
        andq $-32, %rsp
        call bar
        vaddpd .LC0(%rip), %ymm0, %ymm0
        leave
        ret
The pushq %rbp; movq %rsp, %rbp; andq $-32, %rsp; leave all seem completely
unnecessary to me (well, some push/pop or rsp -= 8/+= 8 would be needed to
maintain 128-bit stack alignment): bar doesn't take any argument on the stack,
there is no V4DFmode spilling, etc. For leaf functions,
ix86_finalize_stack_realign_flags already manages to avoid that if the stack
pointer is never touched and the frame pointer isn't needed.
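The alignment arithmetic behind that remark can be sketched as follows,
assuming the x86-64 SysV convention that the stack pointer is 16-byte aligned
at the call instruction, so it is 8 bytes off on entry because of the pushed
return address. The helper names and the sample address are made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes that must be subtracted from SP to make it a multiple of ALIGN.
   ALIGN is in bytes and must be a power of two.  */
static uint64_t
padding_to_align (uint64_t sp, uint64_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return sp & (align - 1);
}

/* What "andq $-ALIGN, %rsp" computes: round SP down to a multiple of
   ALIGN.  */
static uint64_t
realign_down (uint64_t sp, uint64_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return sp & ~(align - 1);
}
```

For a hypothetical entry value of 0x7fffffffe018, padding_to_align (sp, 16) is
8, so a single 8-byte adjustment (or one push) is enough to keep 128-bit
alignment for the call to bar; realign_down (sp - 8, 32) additionally rounds
down to 0x7fffffffe000, which is only needed if something 32-byte aligned
actually lives on the stack.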
I guess by adding another integer to x_rtl and tracking this carefully we could
get rid of the dynamic stack realignment here, though frame_pointer_needed
would still likely be set. I wonder if we couldn't optimize that away too
(unless the user requested a frame pointer) in some cases, if the frame pointer
register is unused or only used to look at arguments before the stack is first
decremented.