I've noticed every now and then while working on optimizations that when
code generation is only slightly changed, the size of the stack frame
can vary by more than one would expect. I decided to find out why, and
found the problem in reload.

In reload's main loop, we align the stack frame in every iteration, so
something like the following sequence can happen:
  1. spill a register to a new 4-byte stack slot
  2. align the stack to 16 bytes
  3. spill another 4-byte register
  4. align the stack to 16 bytes again
which ends up wasting a lot of stack space; here two 4-byte spills can
grow the frame by 32 bytes.
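
To make the arithmetic of that sequence concrete, here is a small
standalone model. It is only a sketch and not code from reload: the
function names are made up for illustration, and it assumes the frame
starts out empty and aligned.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align (must be a power of two). */
size_t round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Hypothetical model of the problem: every iteration adds one spill
   slot and then aligns the whole frame, so each slot drags its own
   padding along. */
size_t frame_size_align_each_iteration(size_t nspills, size_t slot_size,
                                       size_t align)
{
    size_t frame = 0;
    for (size_t i = 0; i < nspills; i++) {
        frame += slot_size;              /* spill to a new slot */
        frame = round_up(frame, align);  /* align the frame again */
    }
    return frame;
}

/* For comparison: pack all the slots first and align only once. */
size_t frame_size_align_once(size_t nspills, size_t slot_size, size_t align)
{
    return round_up(nspills * slot_size, align);
}
```

With two 4-byte spills and 16-byte alignment, the first function gives
a 32-byte frame and the second a 16-byte one.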

The following patch fixes it; assign_stack_local now keeps track of
unused areas in the stack frame and tries to use them first. In many
cases on i686, this can reduce stack frames by 16 bytes or sometimes
more. Testing on ARM also showed some improvement.
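
The idea of remembering unused areas and trying them first can be
sketched roughly as below. This is a toy model, not the patch itself:
struct frame, frame_align and frame_alloc are hypothetical names,
offsets simply grow upward from zero (FRAME_GROWS_DOWNWARD is ignored),
and the hole list is a small fixed array.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HOLES 16

/* An unused byte range [start, start + size) inside the frame. */
struct hole { size_t start, size; };

struct frame {
    size_t size;                    /* current frame size in bytes */
    struct hole holes[MAX_HOLES];   /* padding we could still reuse */
    size_t nholes;
};

/* Round n up to the next multiple of align (must be a power of two). */
size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Align the frame and remember the padding as a reusable hole. */
void frame_align(struct frame *f, size_t align)
{
    size_t aligned = align_up(f->size, align);
    if (aligned > f->size && f->nholes < MAX_HOLES) {
        f->holes[f->nholes].start = f->size;
        f->holes[f->nholes].size = aligned - f->size;
        f->nholes++;
    }
    f->size = aligned;
}

/* Return the offset of a new slot, trying recorded holes first. */
size_t frame_alloc(struct frame *f, size_t size, size_t align)
{
    for (size_t i = 0; i < f->nholes; i++) {
        struct hole *h = &f->holes[i];
        size_t hole_end = h->start + h->size;
        size_t start = align_up(h->start, align);
        if (start <= hole_end && size <= hole_end - start) {
            size_t slot_end = start + size;
            if (slot_end == hole_end) {
                /* Hole used up: replace it with the last entry. */
                f->nholes--;
                f->holes[i] = f->holes[f->nholes];
            } else {
                /* Keep the tail; padding before the slot is dropped. */
                h->start = slot_end;
                h->size = hole_end - slot_end;
            }
            return start;
        }
    }
    /* No hole fits: extend the frame (this padding is not recorded). */
    size_t start = align_up(f->size, align);
    f->size = start + size;
    return start;
}
```

In this model, replaying the spill/align/spill/align sequence from
above puts the second 4-byte slot at offset 4, inside the padding left
by the first alignment, so the frame stays at 16 bytes.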

Bootstrapped and regression tested on i686-linux, also bootstrapped on
ia64-linux (as a !FRAME_GROWS_DOWNWARD target). OK?