This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[Committed] PR17959: Use paradoxical SUBREGs in emit_group_store
- From: Roger Sayle <roger at eyesopen dot com>
- To: gcc-patches at gcc dot gnu dot org
- Date: Thu, 30 Mar 2006 10:28:31 -0700 (MST)
- Subject: [Committed] PR17959: Use paradoxical SUBREGs in emit_group_store
The following patch resolves PR target/17959, which is a poor code generation
issue on ppc with -mpowerpc64. In that PR it's observed that using
-mpowerpc64 generates inferior code compared to restricting GCC to the use of
32-bit instructions. It turns out the problem is not so much with the
64-bit arithmetic operations themselves, but with the overhead of
constructing a 64-bit argument when the ABI has passed this value in
two 32-bit registers.
The current RTL expansion code in emit_group_store constructs this as:
li 0,0
rldimi 0,3,32,0
rldimi 0,4,0,32
or in terms that I can understand:
;; Initialize the destination to zero
(set (reg:DI 123) (const_int 0))
;; Set the low part to register 121
(set (zero_extract:DI (reg:DI 123) (const_int 32) (const_int 0))
     (subreg:DI (reg:SI 121) 0))
;; Set the high part to register 122
(set (zero_extract:DI (reg:DI 123) (const_int 32) (const_int 32))
     (subreg:DI (reg:SI 122) 0))
Unfortunately, this type of sequence is particularly difficult to
optimize in combine, and a peephole2 would only benefit a single
target. Instead, it's best to tackle this problem at its source
during RTL expansion.
Instead of initializing the destination to zero, and subsequently
filling the upper and lower parts, the patch below tweaks expr.c's
emit_group_store to identify which incoming register will form
the lowpart of the result, and then uses a paradoxical SUBREG to
initialize the destination, and then fills the upper parts as before.
Hence we now generate:
;; Initialize as register 121 with the high bits undefined
(set (reg:DI 123) (subreg:DI (reg:SI 121) 0))
;; Set the high part to register 122 as before
(set (zero_extract:DI (reg:DI 123) (const_int 32) (const_int 32))
     (subreg:DI (reg:SI 122) 0))
And through the miracle of register allocation, -mpowerpc64 now
generates a single instruction instead of three, and uses one
less register!
rldimi 3,4,0,32
Even on targets that require either a move, a sign extension or a
zero extension to implement the paradoxical SUBREG, this sequence
should now require only two instructions instead of three.
The following patch has been tested on powerpc-unknown-linux-gnu
with a full "make bootstrap", all default languages, and regression
tested with a top-level "make -k check" with no new failures.
Committed to mainline as revision 112543.
2006-03-29 Roger Sayle <roger@eyesopen.com>
PR target/17959
* expr.c (emit_group_store): Optimize group stores into a pseudo
register by using a paradoxical subreg to initialize the destination
if the first or last member of the group specifies a "low part".
Index: expr.c
===================================================================
--- expr.c (revision 112470)
+++ expr.c (working copy)
@@ -1857,7 +1857,7 @@
 emit_group_store (rtx orig_dst, rtx src, tree type ATTRIBUTE_UNUSED, int ssize)
 {
   rtx *tmps, dst;
-  int start, i;
+  int start, finish, i;
   enum machine_mode m = GET_MODE (orig_dst);

   gcc_assert (GET_CODE (src) == PARALLEL);
@@ -1883,11 +1883,12 @@
     start = 0;
   else
     start = 1;
+  finish = XVECLEN (src, 0);

-  tmps = alloca (sizeof (rtx) * XVECLEN (src, 0));
+  tmps = alloca (sizeof (rtx) * finish);

   /* Copy the (probable) hard regs into pseudos. */
-  for (i = start; i < XVECLEN (src, 0); i++)
+  for (i = start; i < finish; i++)
     {
       rtx reg = XEXP (XVECEXP (src, 0, i), 0);
       if (!REG_P (reg) || REGNO (reg) < FIRST_PSEUDO_REGISTER)
@@ -1923,14 +1924,56 @@
     }
   else if (!MEM_P (dst) && GET_CODE (dst) != CONCAT)
     {
+      enum machine_mode outer = GET_MODE (dst);
+      enum machine_mode inner;
+      unsigned int bytepos;
+      bool done = false;
+      rtx temp;
+
       if (!REG_P (dst) || REGNO (dst) < FIRST_PSEUDO_REGISTER)
-        dst = gen_reg_rtx (GET_MODE (orig_dst));
+        dst = gen_reg_rtx (outer);
+
       /* Make life a bit easier for combine. */
-      emit_move_insn (dst, CONST0_RTX (GET_MODE (orig_dst)));
+      /* If the first element of the vector is the low part
+         of the destination mode, use a paradoxical subreg to
+         initialize the destination. */
+      if (start < finish)
+        {
+          inner = GET_MODE (tmps[start]);
+          bytepos = subreg_lowpart_offset (outer, inner);
+          if (INTVAL (XEXP (XVECEXP (src, 0, start), 1)) == bytepos)
+            {
+              temp = simplify_gen_subreg (outer, tmps[start],
+                                          inner, bytepos);
+              emit_move_insn (dst, temp);
+              done = true;
+              start++;
+            }
+        }
+
+      /* If the first element wasn't the low part, try the last. */
+      if (!done
+          && start < finish - 1)
+        {
+          inner = GET_MODE (tmps[finish - 1]);
+          bytepos = subreg_lowpart_offset (outer, inner);
+          if (INTVAL (XEXP (XVECEXP (src, 0, finish - 1), 1)) == bytepos)
+            {
+              temp = simplify_gen_subreg (outer, tmps[finish - 1],
+                                          inner, bytepos);
+              emit_move_insn (dst, temp);
+              done = true;
+              finish--;
+            }
+        }
+
+      /* Otherwise, simply initialize the result to zero. */
+      if (!done)
+        emit_move_insn (dst, CONST0_RTX (outer));
     }

   /* Process the pieces. */
-  for (i = start; i < XVECLEN (src, 0); i++)
+  for (i = start; i < finish; i++)
     {
       HOST_WIDE_INT bytepos = INTVAL (XEXP (XVECEXP (src, 0, i), 1));
       enum machine_mode mode = GET_MODE (tmps[i]);
Roger
--