This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: 64bit aligned SSE va-args save prologues


On Sat, Apr 17, 2010 at 4:50 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> this patch avoids the need to align SSE prologues to 128 bits.  This saves some stack
> when the va arg function is a leaf (or calls only local functions).  This is not
> a terribly common scenario, but it should help in integer-only programs
> (e.g. the Linux kernel).
>
> The catch is that we need to expand the register save area.  When the stack doesn't
> need to be 128-bit aligned, we do not need to save whole registers, since we
> know we will never touch them.  However, this is known only after expansion, while
> the register save code is produced during expansion, so I delay the actual
> expansion of the jump table until after reload.
>
> This produces somewhat better code.
>
> I also noticed that previously this all worked kind of by accident, because va_list
> itself pushed the alignment to 128 bits: local_alignment is a bit too eager about
> bumping alignments up to help SSE instructions.  In the case of va_list this is nonsense,
> so I fixed that too.
>
> Bootstrapped/regtested x86_64-linux, will commit it tomorrow.
>
>        * i386.md (UNSPEC_SSE_PROLOGUE_SAVE_LOW): New.
>        (sse_prologue_save_insn expander): Use new pattern.
>        (sse_prologue_save_insn1): New pattern and splitter.
>        (sse_prologue_save_insn): Update to deal also with 64bit aligned
>        blocks.
>        * i386.c (setup_incoming_varargs_64): Do not compute jump destination here.
>        (ix86_gimplify_va_arg): Update alignment needed.
>        (ix86_local_alignment): Do not align all local arrays
>        to 128bit.
> Index: config/i386/i386.md
> ===================================================================
> --- config/i386/i386.md (revision 158277)
> +++ config/i386/i386.md (working copy)
> @@ -85,6 +85,7 @@
>    (UNSPEC_SET_RIP             16)
>    (UNSPEC_SET_GOT_OFFSET      17)
>    (UNSPEC_MEMORY_BLOCKAGE     18)
> +   (UNSPEC_SSE_PROLOGUE_SAVE_LOW 19)
>
>    ; TLS support
>    (UNSPEC_TP                  20)
> @@ -18471,15 +18472,24 @@
>                                (reg:DI XMM5_REG)
>                                (reg:DI XMM6_REG)
>                                (reg:DI XMM7_REG)] UNSPEC_SSE_PROLOGUE_SAVE))
> -             (use (match_operand:DI 1 "register_operand" ""))
> +             (clobber (match_operand:DI 1 "register_operand" ""))
>              (use (match_operand:DI 2 "immediate_operand" ""))
> -             (use (label_ref:DI (match_operand 3 "" "")))])]
> +             (use (label_ref:DI (match_operand 3 "" "")))
> +             (clobber (match_operand:DI 4 "register_operand" ""))
> +             (use (match_dup 1))])]
>   "TARGET_64BIT"
>   "")
>
> -(define_insn "*sse_prologue_save_insn"
> +;; Pre-reload version of prologue save.  Until after prologue generation we don't know
> +;; what the size of save instruction will be.
> +;; Operand 0+operand 6 is the memory save area
> +;; Operand 1 is number of registers to save (will get overwritten to operand 5)
> +;; Operand 2 is number of non-vaargs SSE arguments
> +;; Operand 3 is label starting the save block
> +;; Operand 4 is used for temporary computation of jump address
> +(define_insn "*sse_prologue_save_insn1"
>   [(set (mem:BLK (plus:DI (match_operand:DI 0 "register_operand" "R")
> -                         (match_operand:DI 4 "const_int_operand" "n")))
> +                         (match_operand:DI 6 "const_int_operand" "n")))
>        (unspec:BLK [(reg:DI XMM0_REG)
>                     (reg:DI XMM1_REG)
>                     (reg:DI XMM2_REG)
> @@ -18488,9 +18498,98 @@
>                     (reg:DI XMM5_REG)
>                     (reg:DI XMM6_REG)
>                     (reg:DI XMM7_REG)] UNSPEC_SSE_PROLOGUE_SAVE))
> +   (clobber (match_operand:DI 1 "register_operand" "=r"))
> +   (use (match_operand:DI 2 "const_int_operand" "i"))
> +   (use (label_ref:DI (match_operand 3 "" "X")))
> +   (clobber (match_operand:DI 4 "register_operand" "=&r"))
> +   (use (match_operand:DI 5 "register_operand" "1"))]
> +  "TARGET_64BIT
> +   && INTVAL (operands[6]) + X86_64_SSE_REGPARM_MAX * 16 - 16 < 128
> +   && INTVAL (operands[6]) + INTVAL (operands[2]) * 16 >= -128"
> +  "#"
> +  [(set_attr "type" "other")
> +   (set_attr "memory" "store")
> +   (set_attr "mode" "DI")])
> +
> +;; We know size of save instruction; expand the computation of jump address
> +;; in the jumptable.
> +(define_split
> +  [(parallel [(set (match_operand:BLK 0 "" "")
> +                   (unspec:BLK [(reg:DI XMM0_REG)
> +                                (reg:DI XMM1_REG)
> +                                (reg:DI XMM2_REG)
> +                                (reg:DI XMM3_REG)
> +                                (reg:DI XMM4_REG)
> +                                (reg:DI XMM5_REG)
> +                                (reg:DI XMM6_REG)
> +                                (reg:DI XMM7_REG)] UNSPEC_SSE_PROLOGUE_SAVE))
> +              (clobber (match_operand:DI 1 "register_operand" ""))
> +              (use (match_operand:DI 2 "const_int_operand" ""))
> +              (use (match_operand 3 "" ""))
> +              (clobber (match_operand:DI 4 "register_operand" ""))
> +              (use (match_operand:DI 5 "register_operand" ""))])]
> +  "reload_completed"
> +  [(parallel [(set (match_dup 0)
> +                  (unspec:BLK [(reg:DI XMM0_REG)
> +                               (reg:DI XMM1_REG)
> +                               (reg:DI XMM2_REG)
> +                               (reg:DI XMM3_REG)
> +                               (reg:DI XMM4_REG)
> +                               (reg:DI XMM5_REG)
> +                               (reg:DI XMM6_REG)
> +                               (reg:DI XMM7_REG)] UNSPEC_SSE_PROLOGUE_SAVE_LOW))
> +             (use (match_dup 1))
> +             (use (match_dup 2))
> +             (use (match_dup 3))
> +             (use (match_dup 5))])]
> +{
> +  /* Movaps is 4 bytes, AVX and movsd is 5 bytes.  */
> +  int size = 4 + (TARGET_AVX || crtl->stack_alignment_needed < 128);
> +
> +  /* Compute address to jump to:
> +     label - eax*size + nnamed_sse_arguments*size. */
> +  if (size == 5)
> +    emit_insn (gen_rtx_SET (VOIDmode, operands[4],
> +                           gen_rtx_PLUS
> +                             (Pmode,
> +                              gen_rtx_MULT (Pmode, operands[1],
> +                                            GEN_INT (4)),
> +                              operands[1])));
> +  else  if (size == 4)
> +    emit_insn (gen_rtx_SET (VOIDmode, operands[4],
> +                           gen_rtx_MULT (Pmode, operands[1],
> +                                         GEN_INT (4))));
> +  else
> +    gcc_unreachable ();
> +  if (INTVAL (operands[2]))
> +    emit_move_insn
> +      (operands[1],
> +       gen_rtx_CONST (DImode,
> +                     gen_rtx_PLUS (DImode,
> +                                   operands[3],
> +                                   GEN_INT (INTVAL (operands[2])
> +                                            * size))));
> +  else
> +    emit_move_insn (operands[1], operands[3]);
> +  emit_insn (gen_subdi3 (operands[1], operands[1], operands[4]));
> +  operands[5] = GEN_INT (size);
> +})
> +})
> +
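
For anyone following the splitter above, its jump-address arithmetic can be modeled in plain C.  This is only a sketch under stated assumptions: the names save_insn_size and jump_displacement are hypothetical and do not exist in GCC, and the result is a byte displacement relative to the label (operand 3) rather than a real address.  It mirrors the comment "label - eax*size + nnamed_sse_arguments*size" and the two cases the splitter handles (size 4 and size 5).

```c
#include <assert.h>

/* Bytes per save instruction, following the splitter's comment:
   movaps is 4 bytes; the AVX form and movsd are 5 bytes.
   Hypothetical helper, not a GCC function.  */
static int
save_insn_size (int target_avx, int stack_alignment_needed)
{
  return 4 + (target_avx || stack_alignment_needed < 128);
}

/* Displacement, in bytes from the label, of the address to jump to:
     -eax*size + nnamed_sse*size.
   The size == 5 case is computed as eax*4 + eax, matching the
   MULT-by-4-plus-operand form emitted by the splitter.
   Hypothetical helper, not a GCC function.  */
static long
jump_displacement (int eax, int nnamed_sse, int size)
{
  long scaled;

  assert (size == 4 || size == 5);
  if (size == 5)
    scaled = (long) eax * 4 + eax;
  else
    scaled = (long) eax * 4;
  return (long) nnamed_sse * size - scaled;
}
```

With eax = 3, one named SSE argument and 5-byte saves, the model gives a displacement of -10 bytes from the label.  The scaling and the final subtraction are ordinary integer arithmetic, which on x86 typically writes the flags register.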

This pattern clobbers CC and causes:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43799

This patch adds:

(clobber (reg:CC FLAGS_REG))

to those new patterns.  OK to install?

Thanks.

-- 
H.J.
---
gcc/

2010-05-04  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/43799
	* config/i386/i386.md (sse_prologue_save): Add clobber CC register.
	(*sse_prologue_save_insn1): Likewise.
	(SSE prologue save splitter): Likewise.

gcc/testsuite/

2010-05-04  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/43799
	* gcc.target/i386/pr43799.c: New.
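
For context, the kind of function that exercises the SSE register save prologue on x86-64 is a variadic function that receives floating-point arguments.  The sketch below is a hypothetical illustration only; it is not the contents of pr43799.c, and the name sum_doubles is made up for this example.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical example, NOT the pr43799.c testcase.
   Sums COUNT double arguments passed through the variable argument
   list.  On x86-64 the first such arguments arrive in SSE registers,
   so the prologue has to spill them to the register save area before
   va_arg can read them.  */
static double
sum_doubles (int count, ...)
{
  va_list ap;
  double total = 0.0;
  int i;

  assert (count >= 0);
  va_start (ap, count);
  for (i = 0; i < count; i++)
    total += va_arg (ap, double);
  va_end (ap);
  return total;
}
```

A call such as sum_doubles (3, 1.0, 2.0, 3.5) should return 6.5 regardless of how the prologue chooses to save the SSE registers.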

Attachment: gcc-pr43799-1.patch
Description: Text document

