This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[Committed] PR15422: x87 "double swap" problem in reg-stack
- From: Roger Sayle <roger at eyesopen dot com>
- To: gcc-patches at gcc dot gnu dot org
- Date: Mon, 30 May 2005 12:02:00 -0600 (MDT)
- Subject: [Committed] PR15422: x87 "double swap" problem in reg-stack
Now that reg-stack.c has been reorganized into two passes, the following
small patch is sufficient to fix PR rtl-optimization/15422 which is the
infamous double fxch problem.
The issue is that the algorithms used to assign the initial register
stack permutations of basic blocks, do so without considering the
requirements of the actual instructions in that block. Hence,
we may assign an arbitrary register stack order only to later discover
that the first instruction in the basic block has a strict requirement
for which register is at the top of the stack.
Currently, this means that at the start of a basic block we may need to
insert code to shuffle the assigned permutation into what we actually
want/need. But additionally on incoming edges, we may have to permute
the stack to the assigned layout, even if the stack was "correct" to
begin with. Hence the origin of the "double swap", on an incoming edge
we swap the top two elements on the top of stack to match the layout
assigned at the start of the block, them immediately swap them back
because the following code really preferred them the way they were.
In a concrete example, from PR15422:
double foo(double x, double y) { return fmod(x,y); }
is currently generated as:
foo: fldl 4(%esp)
fldl 12(%esp)
jmp .L2
.p2align 2,,3
.L7: fxch %st(1)
.L2: fxch %st(1)
fprem
fnstsw %ax
testb $4, %ah
jne .L7
fstp %st(1)
ret
You'll notice the back edge of the loop to L7 contains two consecutive
fcxh %st(1) instructions, that inefficiently undo the effects of each
other, but often on the critical-path loop edges. fxch is normally
almost free as it may be combined with most other FP instructions.
One notable exception is another fxch.
The solution implemented below is to consider the initial stack assignment
for the start of a basic block to be "soft". So instead of inserting any
instructions before the first reg-stack insn in a basic block, we simply
revise the initial stack order that should have been assigned. In the
example, above the "fprem" instruction places a strong constrain on where
"x" and "y" should be on the stack. The advantage of back propagating
this to "stack_in", is that is allows the later edge compensation code
to a better job. Currently it generates (near?) optimal code to permute
the stack into the specified order, so it makes sense for this order to
be what we want.
The net effect of this patch is that we now generate:
foo:
fldl 4(%esp)
fldl 12(%esp)
fxch %st(1)
.L2: fprem
fnstsw %ax
testb $4, %ah
jne .L2
fstp %st(1)
ret
where the remaining fxch has been hoisted out of the loop. This still
isn't ideal as the initial basic block would have been better to load
12(%esp) before 4(%esp) rather than just swapping them afterwards, but
that's the topic of a follow-up optimization.
Unfortunately, I was a little disappointed that this change was size
neutral with -Os on CSiBE, but this improvement really requires floating
point intensive code and/or -ffast-math to show a real benefit.
The following patch has been tested on i686-pc-linux-gnu with a full
"make bootstrap", all default languages, and regression tested with a
top-level "make -k check" with no new failures.
Committed to mainline CVS.
2005-05-30 Roger Sayle <roger@eyesopen.com>
PR rtl-optimization/15422
* reg-stack.c (starting_stack_p): New static global.
(straighten_stack): Delete prototype. Change to update the stack
before the current insn.
(subst_stack_regs): Update call to straighten stack.
(emit_swap_insn): Delete prototype. For the first insn in a
basic block, update stack_in instead of emitting a real swap.
(change_stack): When changing the stack before the first insn
in a basic block, update stack_in instead of emitting real code.
(compensate_edges): Clear starting_stack_p during compensation.
(convert_regs_1): Keep track of starting_stack_p whilst processing
a basic block.
Index: reg-stack.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/reg-stack.c,v
retrieving revision 1.181
diff -c -3 -p -r1.181 reg-stack.c
*** reg-stack.c 29 May 2005 15:37:44 -0000 1.181
--- reg-stack.c 30 May 2005 14:05:56 -0000
*************** enum emit_where
*** 224,229 ****
--- 224,234 ----
/* The block we're currently working on. */
static basic_block current_block;
+ /* In the current_block, whether we're processing the first register
+ stack or call instruction, i.e. the the regstack is currently the
+ same as BLOCK_INFO(current_block)->stack_in. */
+ static bool starting_stack_p;
+
/* This is the register file for all register after conversion. */
static rtx
FP_mode_reg[LAST_STACK_REG+1-FIRST_STACK_REG][(int) MAX_MACHINE_MODE];
*************** static rtx not_a_num;
*** 237,243 ****
/* Forward declarations */
static int stack_regs_mentioned_p (rtx pat);
- static void straighten_stack (rtx, stack);
static void pop_stack (stack, int);
static rtx *get_true_reg (rtx *);
--- 242,247 ----
*************** static void replace_reg (rtx *, int);
*** 248,254 ****
static void remove_regno_note (rtx, enum reg_note, unsigned int);
static int get_hard_regnum (stack, rtx);
static rtx emit_pop_insn (rtx, stack, rtx, enum emit_where);
- static void emit_swap_insn (rtx, stack, rtx);
static void swap_to_top(rtx, stack, rtx, rtx);
static bool move_for_stack_reg (rtx, stack, rtx);
static bool move_nan_for_stack_reg (rtx, stack, rtx);
--- 252,257 ----
*************** next_flags_user (rtx insn)
*** 344,351 ****
return NULL_RTX;
}
! /* Reorganize the stack into ascending numbers,
! after this insn. */
static void
straighten_stack (rtx insn, stack regstack)
--- 347,353 ----
return NULL_RTX;
}
! /* Reorganize the stack into ascending numbers, before this insn. */
static void
straighten_stack (rtx insn, stack regstack)
*************** straighten_stack (rtx insn, stack regsta
*** 365,371 ****
for (top = temp_stack.top = regstack->top; top >= 0; top--)
temp_stack.reg[top] = FIRST_STACK_REG + temp_stack.top - top;
! change_stack (insn, regstack, &temp_stack, EMIT_AFTER);
}
/* Pop a register from the stack. */
--- 367,373 ----
for (top = temp_stack.top = regstack->top; top >= 0; top--)
temp_stack.reg[top] = FIRST_STACK_REG + temp_stack.top - top;
! change_stack (insn, regstack, &temp_stack, EMIT_BEFORE);
}
/* Pop a register from the stack. */
*************** emit_swap_insn (rtx insn, stack regstack
*** 864,869 ****
--- 866,881 ----
return;
}
+ /* Avoid emitting the swap if this is the first register stack insn
+ of the current_block. Instead update the current_block's stack_in
+ and let compensate edges take care of this for us. */
+ if (current_block && starting_stack_p)
+ {
+ BLOCK_INFO (current_block)->stack_in = *regstack;
+ starting_stack_p = false;
+ return;
+ }
+
swap_rtx = gen_swapxf (FP_MODE_REG (hard_regno, XFmode),
FP_MODE_REG (FIRST_STACK_REG, XFmode));
*************** subst_stack_regs (rtx insn, stack regsta
*** 2205,2211 ****
if (top >= 0)
{
! straighten_stack (PREV_INSN (insn), regstack);
/* Now mark the arguments as dead after the call. */
--- 2217,2223 ----
if (top >= 0)
{
! straighten_stack (insn, regstack);
/* Now mark the arguments as dead after the call. */
*************** change_stack (rtx insn, stack old, stack
*** 2296,2301 ****
--- 2308,2326 ----
int reg;
int update_end = 0;
+ /* Stack adjustments for the first insn in a block update the
+ current_block's stack_in instead of inserting insns directly.
+ compensate_edges will add the necessary code later. */
+ if (current_block
+ && starting_stack_p
+ && where == EMIT_BEFORE)
+ {
+ BLOCK_INFO (current_block)->stack_in = *new;
+ starting_stack_p = false;
+ *old = *new;
+ return;
+ }
+
/* We will be inserting new insns "backwards". If we are to insert
after INSN, find the next insn, and insert before it. */
*************** compensate_edges (FILE *file)
*** 2720,2725 ****
--- 2745,2752 ----
bool inserted = false;
basic_block bb;
+ starting_stack_p = false;
+
FOR_EACH_BB (bb)
if (bb != ENTRY_BLOCK_PTR)
{
*************** convert_regs_1 (FILE *file, basic_block
*** 2800,2807 ****
}
}
- current_block = block;
-
if (file)
{
fprintf (file, "\nBasic block %d\nInput stack: ", block->index);
--- 2827,2832 ----
*************** convert_regs_1 (FILE *file, basic_block
*** 2810,2817 ****
--- 2835,2845 ----
/* Process all insns in this block. Keep track of NEXT so that we
don't process insns emitted while substituting in INSN. */
+ current_block = block;
next = BB_HEAD (block);
regstack = bi->stack_in;
+ starting_stack_p = true;
+
do
{
insn = next;
*************** convert_regs_1 (FILE *file, basic_block
*** 2834,2839 ****
--- 2862,2868 ----
print_stack (file, ®stack);
}
control_flow_insn_deleted |= subst_stack_regs (insn, ®stack);
+ starting_stack_p = false;
}
}
while (next);
Roger
--