This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH] cse: Lookup for both PLUS operands in an address
- From: Andreas Krebbel <krebbel1 at de dot ibm dot com>
- To: gcc-patches at gcc dot gnu dot org
- Date: Mon, 19 Sep 2005 16:53:36 +0200
- Subject: [PATCH] cse: Lookup for both PLUS operands in an address
Hello,
the attached patch fixes a performance regression on S/390 against gcc 4.0.
In the testcase we have an array accessed with an index resulting in the
following gcc 4.0 pseudo code:
r1 = &x
r2 = r3 + 4
r4 = [r2 + r1]
In find_best_addr there is a code part optimizing addresses like REG + x looking
for a replacement for REG. Afterwards we have:
r1 = &x
(r2 = r3 + 4) can potentially removed later
r4 = [r3 + r1 + 4]
With gcc 4.1 the address is generated like this:
r1 = &x
r2 = r3 + 4
r4 = [r1 + r2]
which unfortunately does not trigger that optimization anymore. My guess is that
the order of the operands has changed due to the TARGET_MEM_REF usage in gcc 4.1.
The attached patch changes find_best_addr to perform the cse lookup for both
operands of the sum what brings us back to the performance of the gcc 4.0 code.
Bootstrapped on i686, s390 and s390x. No testsuite regressions.
OK for mainline?
:ADDPATCH rtl (cse):
Bye,
-Andreas-
2005-09-19 Andreas Krebbel <krebbel1@de.ibm.com>
* cse.c (find_best_addr): Perform lookup for both REGs in a
REG + REG address.
Index: gcc/cse.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cse.c,v
retrieving revision 1.360
diff -p -c -r1.360 cse.c
*** gcc/cse.c 6 Sep 2005 08:52:51 -0000 1.360
--- gcc/cse.c 19 Sep 2005 09:20:53 -0000
*************** find_best_addr (rtx insn, rtx *loc, enum
*** 2954,3036 ****
have REG+const and the register is another REG+const. We can often merge
the constants and eliminate one insn and one register. It may also be
that a machine has a cheap REG+REG+const. Finally, this improves the
! code on the Alpha for unaligned byte stores. */
if (flag_expensive_optimizations
&& ARITHMETIC_P (*loc)
&& REG_P (XEXP (*loc, 0)))
{
! rtx op1 = XEXP (*loc, 1);
do_not_record = 0;
- hash = HASH (XEXP (*loc, 0), Pmode);
do_not_record = save_do_not_record;
hash_arg_in_memory = save_hash_arg_in_memory;
! elt = lookup (XEXP (*loc, 0), hash, Pmode);
! if (elt == 0)
return;
! /* We need to find the best (under the criteria documented above) entry
! in the class that is valid. We use the `flag' field to indicate
! choices that were invalid and iterate until we can't find a better
! one that hasn't already been tried. */
!
! for (p = elt->first_same_value; p; p = p->next_same_value)
! p->flag = 0;
!
! while (found_better)
! {
! int best_addr_cost = address_cost (*loc, mode);
! int best_rtx_cost = (COST (*loc) + 1) >> 1;
! struct table_elt *best_elt = elt;
! rtx best_rtx = *loc;
! int count;
!
! /* This is at worst case an O(n^2) algorithm, so limit our search
! to the first 32 elements on the list. This avoids trouble
! compiling code with very long basic blocks that can easily
! call simplify_gen_binary so many times that we run out of
! memory. */
!
! found_better = 0;
! for (p = elt->first_same_value, count = 0;
! p && count < 32;
! p = p->next_same_value, count++)
! if (! p->flag
! && (REG_P (p->exp)
! || exp_equiv_p (p->exp, p->exp, 1, false)))
! {
! rtx new = simplify_gen_binary (GET_CODE (*loc), Pmode,
! p->exp, op1);
! int new_cost;
!
! /* Get the canonical version of the address so we can accept
! more. */
! new = canon_for_address (new);
!
! new_cost = address_cost (new, mode);
!
! if (new_cost < best_addr_cost
! || (new_cost == best_addr_cost
! && (COST (new) + 1) >> 1 > best_rtx_cost))
{
! found_better = 1;
! best_addr_cost = new_cost;
! best_rtx_cost = (COST (new) + 1) >> 1;
! best_elt = p;
! best_rtx = new;
}
! }
!
! if (found_better)
! {
! if (validate_change (insn, loc,
! canon_reg (copy_rtx (best_rtx),
! NULL_RTX), 0))
! return;
! else
! best_elt->flag = 1;
}
}
}
--- 2954,3052 ----
have REG+const and the register is another REG+const. We can often merge
the constants and eliminate one insn and one register. It may also be
that a machine has a cheap REG+REG+const. Finally, this improves the
! code on the Alpha for unaligned byte stores.
! If we have an address like REG + REG we try to find replacements for
! both registers. */
if (flag_expensive_optimizations
&& ARITHMETIC_P (*loc)
&& REG_P (XEXP (*loc, 0)))
{
! rtx ops[2];
! struct table_elt * elts[2] = {0, 0};
! int i;
do_not_record = 0;
do_not_record = save_do_not_record;
hash_arg_in_memory = save_hash_arg_in_memory;
! ops[0] = XEXP (*loc, 0);
! ops[1] = XEXP (*loc, 1);
!
! elts[0] = lookup (ops[0], HASH (ops[0], Pmode), Pmode);
! elts[1] = lookup (ops[1], HASH (ops[1], Pmode), Pmode);
!
! if (elts[0] == 0 && elts[1] == 0)
return;
! for (i = 0; i < 2; i++)
! {
! elt = elts[i];
!
! if (!elt)
! continue;
!
! /* We need to find the best (under the criteria documented above) entry
! in the class that is valid. We use the `flag' field to indicate
! choices that were invalid and iterate until we can't find a better
! one that hasn't already been tried. */
!
! for (p = elt->first_same_value; p; p = p->next_same_value)
! p->flag = 0;
!
! while (found_better)
! {
! int best_addr_cost = address_cost (*loc, mode);
! int best_rtx_cost = (COST (*loc) + 1) >> 1;
! struct table_elt *best_elt = elt;
! rtx best_rtx = *loc;
! int count;
!
! /* This is at worst case an O(n^2) algorithm, so limit our search
! to the first 32 elements on the list. This avoids trouble
! compiling code with very long basic blocks that can easily
! call simplify_gen_binary so many times that we run out of
! memory. */
!
! found_better = 0;
! for (p = elt->first_same_value, count = 0;
! p && count < 32;
! p = p->next_same_value, count++)
! if (! p->flag
! && (REG_P (p->exp)
! || exp_equiv_p (p->exp, p->exp, 1, false)))
{
! rtx new = simplify_gen_binary (GET_CODE (*loc), Pmode,
! p->exp, ops[1 - i]);
! int new_cost;
!
! /* Get the canonical version of the address so we can accept
! more. */
! new = canon_for_address (new);
!
! new_cost = address_cost (new, mode);
!
! if (new_cost < best_addr_cost
! || (new_cost == best_addr_cost
! && (COST (new) + 1) >> 1 > best_rtx_cost))
! {
! found_better = 1;
! best_addr_cost = new_cost;
! best_rtx_cost = (COST (new) + 1) >> 1;
! best_elt = p;
! best_rtx = new;
! }
}
!
! if (found_better)
! {
! if (validate_change (insn, loc,
! canon_reg (copy_rtx (best_rtx),
! NULL_RTX), 0))
! return;
! else
! best_elt->flag = 1;
! }
}
}
}