This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Half-baked i386 stack alignment thoughts
- To: gcc at gcc dot gnu dot org, rth at cygnus dot com, tm2 at best dot com, john at feith dot com
- Subject: Re: Half-baked i386 stack alignment thoughts
- From: Jan Hubicka <jh at suse dot cz>
- Date: Fri, 3 Nov 2000 12:38:14 +0100
> >On Thu, Nov 02, 2000 at 07:00:39PM -0500, John Wehle wrote:
> >> What I propose is making -maccumulate-outgoing-args the default
> >> and use push to place the arguments on the stack. I.e.:
> >>
> >> ...
> >> addl $8, %esp
> >> pushl %eax
> >> pushl %eax
> >> call subr
> >> movl %eax, %ebx
Concerning the pushl and modern CPU issues, I've run a few tests and
-maccumulate-outgoing-args seems to be a slight win on PPro and a slight loss
on Athlon. Even though push requires 2 cycles to execute and update ESP on
Athlon, it seems to be a win due to better code density. My current plan is to
find time to verify these results using spec2000 and possibly send a patch
setting -maccumulate-outgoing-args depending on the target CPU.
This holds even for -mpreferred-stack-boundary=2 mode, where the overhead
of multiple stack adjustments is missing.
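
A minimal sketch of the arithmetic behind that remark, assuming 4-byte
argument slots and a stack pointer that is already aligned before the
arguments are pushed. The helper name pad_for_call is hypothetical and is
not taken from GCC; it only shows why a boundary of 2 (4 bytes) never needs
an extra adjustment while a boundary of 4 (16 bytes) often does.

```c
#include <assert.h>

/* Hypothetical helper, not GCC code.
   Returns the number of padding bytes needed so that arg_bytes of outgoing
   arguments end on a (1 << boundary_log2)-byte boundary, assuming the stack
   pointer was aligned to that boundary before the arguments were pushed. */
unsigned pad_for_call(unsigned arg_bytes, unsigned boundary_log2)
{
    unsigned align;

    /* i386 pushes are 4 bytes wide, so a boundary below 2 is not modelled. */
    assert(boundary_log2 >= 2 && boundary_log2 < 31);

    align = 1u << boundary_log2;
    return (align - arg_bytes % align) % align;
}
```

With two 4-byte arguments, pad_for_call(8, 4) asks for 8 bytes of padding,
while pad_for_call(8, 2) asks for none, since every 4-byte push keeps a
4-byte boundary intact.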
Assuming that we stay with -mno-accumulate-outgoing-args mode,
I would rather concentrate on improving the combine_stack_adjustments pass
by making it work across basic blocks (I even have some code, but it
got stuck in the queue, since my combine_stack_adjustments fix is still
waiting for review - I would like to update this patch soon and try to get
it through first).
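
To illustrate the kind of transformation involved, here is a toy model of
merging stack adjustments inside a single block. It is only a sketch under
loud assumptions: the insn struct, the three instruction kinds and the
function combine_adjustments are all hypothetical, and the real pass works
on RTL and handles many cases this ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Toy instruction kinds (hypothetical, not GCC's RTL):
   ADJ      adds `value` bytes to the stack pointer,
   OTHER    neither reads nor writes the stack pointer or stack memory,
   USES_SP  depends on the stack pointer (push, call, stack load/store). */
enum kind { ADJ, OTHER, USES_SP };

struct insn {
    enum kind kind;
    int value;          /* only meaningful for ADJ */
};

/* Folds each ADJ into the nearest earlier ADJ when no USES_SP instruction
   lies between them. Works in place and returns the new instruction count.
   An adjustment that sums to zero is kept, for simplicity. */
size_t combine_adjustments(struct insn *code, size_t n)
{
    const size_t none = (size_t)-1;
    size_t out = 0;         /* next free slot in the compacted output */
    size_t pending = none;  /* output index of an ADJ that can still absorb */

    for (size_t i = 0; i < n; i++) {
        struct insn cur = code[i];

        if (cur.kind == ADJ && pending != none) {
            code[pending].value += cur.value;   /* merge and drop this one */
            continue;
        }
        if (cur.kind == ADJ)
            pending = out;                      /* start a new mergeable run */
        else if (cur.kind == USES_SP)
            pending = none;                     /* cannot merge across this */

        code[out++] = cur;
    }
    return out;
}
```

In this model, the sequence ADJ 16, OTHER, ADJ -8, USES_SP, ADJ 4, ADJ 4
shrinks from six instructions to four: ADJ 8, OTHER, USES_SP, ADJ 8.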
I've spent quite a bit of time trying to clean up and understand calls.c and I
tend to believe that making -maccumulate-outgoing-args work the way you
suggest would be quite hard, since the assumption that the stack pointer is
constant is made in a number of places in ACCUMULATE_OUTGOING_ARGS mode.
Overall the idea seems good, but you need to keep in mind that
-maccumulate-outgoing-args has some drawbacks too - you will lose the
defer-pop optimization and the ability to do nested calls (which is good for
reducing register pressure).
My guess is that with a sane -maccumulate-outgoing-args and a few tweaks
to calls.c we would get better code by using the current method than by
implementing a brand new mode.
Honza