This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH] Improve ix86 machine reorg (PR target/39942)
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Jan Hubicka <jh at suse dot cz>, "H.J. Lu" <hjl dot tools at gmail dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Thu, 30 Apr 2009 13:46:39 +0200
- Subject: [PATCH] Improve ix86 machine reorg (PR target/39942)
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
Hi!
This patch removes some (IMHO completely unnecessary) padding and
reduces other padding that is added for the TARGET_FOUR_JUMP_LIMIT
optimization during ix86 machine reorg.
The patch has two parts. The first, the x86-64.h/linux.h change,
fixes the ASM_OUTPUT_MAX_SKIP_ALIGN macros so that they never
skip more than MAX_SKIP bytes. Normally for labels max_skip is hardcoded
in the backend based on -mtune, so the align, max_skip pairs can be
4,,7 or 4,,10 or 4,,15, but for the paddings added by the
TARGET_FOUR_JUMP_LIMIT optimization max_skip can be anything from 1 through
10 or so.
If we emit
.p2align 4,,2
.p2align 3
into the assembly because we want to skip at most 2 bytes, then
the second directive can still skip up to 7 bytes. In fact, for the "skip"
instruction we don't ever want to emit the second .p2align, but I think
4 jump paddings with max_skip >= 7 are rare enough that it is not worth
introducing new macros for them.
The other change takes into account the label .p2align directives
we are going to emit (and also the "align" instructions added earlier
in the pass). If, say,
.p2align 4,,15
is going to be emitted, we know we no longer have to worry about any
instructions before the label, because all jumps after it are in a different
16 byte page. So we can pretend the label has minimal size 16. For
.p2align 4,,10
we know either that anything after the label also is in a new 16 byte page,
or nothing was added because >= 11 bytes would need to be skipped. But
in that case the current group could only contain at most 5 bytes.
If we pretend the label has size 11, the algorithm will not consider
anything but the last 5 bytes before it. Similarly, for say
.p2align 2
we know that either there were only at most 12 bytes in the current 16 byte
page before the alignment, or it aligned to a 16 byte boundary, so
pretending the label has size 4 works as well.
What the patch doesn't solve (and I've mentioned this in the PR) is that in
many cases min_insn_size is too conservative: plenty of instructions are
longer than 1 byte even when not counting the displacement, and for those
we could assume a larger minimal size.
I've bootstrapped and regtested it on x86_64-linux; the cc1 .text section
shrank by 184KB (2.3%), which is IMHO significant.
I'll try to write an awk script to detect more than 3 jump/ret/call insns
in a 16 byte page from objdump -d dumps and try it on cc1/cc1plus/f951.
2009-04-30 Jakub Jelinek <jakub@redhat.com>
PR target/39942
* final.c (label_to_max_skip): New function.
(label_to_alignment): Only use LABEL_TO_ALIGNMENT if
CODE_LABEL_NUMBER <= max_labelno.
* output.h (label_to_max_skip): New prototype.
* config/i386/x86-64.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Don't emit second
.p2align 3 if MAX_SKIP is smaller than 7.
* config/i386/linux.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise.
* config/i386/i386.c (min_insn_size): Don't define if
ASM_OUTPUT_MAX_SKIP_ALIGN isn't defined. Handle CODE_LABELs
and UNSPECV_ALIGN.
(ix86_avoid_jump_misspredicts): Renamed to...
(ix86_avoid_jump_mispredicts): ... this. Don't define if
ASM_OUTPUT_MAX_SKIP_ALIGN isn't defined. Update comment.
Add min_insn_size of newly created align insn to nbytes.
(ix86_reorg): Don't call ix86_avoid_jump_mispredicts if
ASM_OUTPUT_MAX_SKIP_ALIGN isn't defined.
--- gcc/config/i386/x86-64.h.jj 2009-04-14 16:33:49.000000000 +0200
+++ gcc/config/i386/x86-64.h 2009-04-30 09:03:35.000000000 +0200
@@ -74,7 +74,9 @@ see the files COPYING3 and COPYING.RUNTI
fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP)); \
/* Make sure that we have at least 8 byte alignment if > 8 byte \
alignment is preferred. */ \
- if ((LOG) > 3 && (1 << (LOG)) > ((MAX_SKIP) + 1)) \
+ if ((LOG) > 3 \
+ && (1 << (LOG)) > ((MAX_SKIP) + 1) \
+ && (MAX_SKIP) >= 7) \
fprintf ((FILE), "\t.p2align 3\n"); \
} \
} \
--- gcc/config/i386/i386.c.jj 2009-04-17 10:29:09.000000000 +0200
+++ gcc/config/i386/i386.c 2009-04-30 10:24:28.000000000 +0200
@@ -26852,6 +26852,7 @@ x86_function_profiler (FILE *file, int l
}
}
+#ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
/* We don't have exact information about the insn sizes, but we may assume
quite safely that we are informed about all 1 byte insns and memory
address sizes. This is enough to eliminate unnecessary padding in
@@ -26862,13 +26863,36 @@ min_insn_size (rtx insn)
{
int l = 0;
+ if (GET_CODE (insn) == CODE_LABEL)
+ {
+ int align = label_to_alignment (insn);
+ int max_skip = label_to_max_skip (insn);
+
+ if (max_skip > 15)
+ max_skip = 15;
+ if (align >= 3)
+ /* Only up to 16 - max_skip - 1 bytes can be already
+ in the current 16 byte page, because otherwise
+ ASM_OUTPUT_MAX_SKIP_ALIGN could skip max_skip
+ or fewer bytes to reach 16 byte boundary. So
+ pretend the code label is max_skip + 1 bytes long. */
+ return max_skip + 1;
+ if (align > 0 && max_skip == (1 << align) - 1)
+ return 1 << align;
+ }
+
if (!INSN_P (insn) || !active_insn_p (insn))
return 0;
- /* Discard alignments we've emit and jump instructions. */
+ /* Alignments we've emit. */
if (GET_CODE (PATTERN (insn)) == UNSPEC_VOLATILE
&& XINT (PATTERN (insn), 1) == UNSPECV_ALIGN)
- return 0;
+ {
+ rtx align = XVECEXP (PATTERN (insn), 0, 0);
+
+ return INTVAL (align) + 1;
+ }
+
if (JUMP_P (insn)
&& (GET_CODE (PATTERN (insn)) == ADDR_VEC
|| GET_CODE (PATTERN (insn)) == ADDR_DIFF_VEC))
@@ -26902,7 +26926,7 @@ min_insn_size (rtx insn)
window. */
static void
-ix86_avoid_jump_misspredicts (void)
+ix86_avoid_jump_mispredicts (void)
{
rtx insn, start = get_insns ();
int nbytes = 0, njumps = 0;
@@ -26916,15 +26940,15 @@ ix86_avoid_jump_misspredicts (void)
The smallest offset in the page INSN can start is the case where START
ends on the offset 0. Offset of INSN is then NBYTES - sizeof (INSN).
- We add p2align to 16byte window with maxskip 17 - NBYTES + sizeof (INSN).
+ We add p2align to 16byte window with maxskip 15 - NBYTES + sizeof (INSN).
*/
- for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+ for (insn = start; insn; insn = NEXT_INSN (insn))
{
-
- nbytes += min_insn_size (insn);
+ int min_size = min_insn_size (insn);
+ nbytes += min_size;
if (dump_file)
- fprintf(dump_file, "Insn %i estimated to %i bytes\n",
- INSN_UID (insn), min_insn_size (insn));
+ fprintf (dump_file, "Insn %i estimated to %i bytes\n",
+ INSN_UID (insn), min_size);
if ((JUMP_P (insn)
&& GET_CODE (PATTERN (insn)) != ADDR_VEC
&& GET_CODE (PATTERN (insn)) != ADDR_DIFF_VEC)
@@ -26958,9 +26982,11 @@ ix86_avoid_jump_misspredicts (void)
fprintf (dump_file, "Padding insn %i by %i bytes!\n",
INSN_UID (insn), padsize);
emit_insn_before (gen_align (GEN_INT (padsize)), insn);
+ nbytes += min_insn_size (PREV_INSN (insn));
}
}
}
+#endif
/* AMD Athlon works faster
when RET is not destination of conditional jump or directly preceded
@@ -27023,9 +27049,14 @@ ix86_reorg (void)
if (TARGET_PAD_RETURNS && optimize
&& optimize_function_for_speed_p (cfun))
ix86_pad_returns ();
+#ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
+ /* `align' insn expands to nothing if ASM_OUTPUT_MAX_SKIP_ALIGN
+ is not defined, so it makes no sense to do this optimization
+ in that case. */
if (TARGET_FOUR_JUMP_LIMIT && optimize
&& optimize_function_for_speed_p (cfun))
- ix86_avoid_jump_misspredicts ();
+ ix86_avoid_jump_mispredicts ();
+#endif
}
/* Return nonzero when QImode register that must be represented via REX prefix
--- gcc/config/i386/linux.h.jj 2009-02-20 15:13:51.000000000 +0100
+++ gcc/config/i386/linux.h 2009-04-30 09:03:01.000000000 +0200
@@ -1,6 +1,6 @@
/* Definitions for Intel 386 running Linux-based GNU systems with ELF format.
Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2001, 2002, 2004, 2005,
- 2006, 2007, 2008 Free Software Foundation, Inc.
+ 2006, 2007, 2008, 2009 Free Software Foundation, Inc.
Contributed by Eric Youngdale.
Modified for stabs-in-ELF by H.J. Lu.
@@ -153,7 +153,9 @@ along with GCC; see the file COPYING3.
fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP)); \
/* Make sure that we have at least 8 byte alignment if > 8 byte \
alignment is preferred. */ \
- if ((LOG) > 3 && (1 << (LOG)) > ((MAX_SKIP) + 1)) \
+ if ((LOG) > 3 \
+ && (1 << (LOG)) > ((MAX_SKIP) + 1) \
+ && (MAX_SKIP) >= 7) \
fprintf ((FILE), "\t.p2align 3\n"); \
} \
} \
--- gcc/final.c.jj 2009-04-17 10:29:15.000000000 +0200
+++ gcc/final.c 2009-04-30 08:48:53.000000000 +0200
@@ -553,7 +553,17 @@ static int min_labelno, max_labelno;
int
label_to_alignment (rtx label)
{
- return LABEL_TO_ALIGNMENT (label);
+ if (CODE_LABEL_NUMBER (label) <= max_labelno)
+ return LABEL_TO_ALIGNMENT (label);
+ return 0;
+}
+
+int
+label_to_max_skip (rtx label)
+{
+ if (CODE_LABEL_NUMBER (label) <= max_labelno)
+ return LABEL_TO_MAX_SKIP (label);
+ return 0;
}
#ifdef HAVE_ATTR_length
--- gcc/output.h.jj 2008-11-25 12:17:14.000000000 +0100
+++ gcc/output.h 2009-04-30 08:49:39.000000000 +0200
@@ -1,7 +1,7 @@
/* Declarations for insn-output.c. These functions are defined in recog.c,
final.c, and varasm.c.
Copyright (C) 1987, 1991, 1994, 1997, 1998, 1999, 2000, 2001, 2002,
- 2003, 2004, 2005, 2006, 2007, 2008 Free Software Foundation, Inc.
+ 2003, 2004, 2005, 2006, 2007, 2008, 2009 Free Software Foundation, Inc.
This file is part of GCC.
@@ -94,6 +94,10 @@ extern int insn_current_reference_addres
Defined in final.c. */
extern int label_to_alignment (rtx);
+/* Find the alignment maximum skip associated with a CODE_LABEL.
+ Defined in final.c. */
+extern int label_to_max_skip (rtx);
+
/* Output a LABEL_REF, or a bare CODE_LABEL, as an assembler symbol. */
extern void output_asm_label (rtx);
Jakub