This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
On Sat, Apr 17, 2010 at 4:50 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> this patch avoids the need to align SSE prologues to 128 bits.  This saves some stack
> when the va arg function is a leaf (or calls just local functions).  This is not
> a terribly common scenario, but it should help in integer-only programs
> (e.g. the Linux kernel).
>
> The catch is that we need to expand the register save area.  When the stack doesn't
> need to be 128-bit aligned, we do not need to save whole registers since we
> know we will never touch them.  However, this is known only after expansion while
> register save code is produced during expansion, so I delay the actual
> expansion of the jumptable until after reload.
>
> This produces somewhat better code.
>
> I also noticed that previously this all worked kind of by accident because va_list
> itself pushes alignment to 128 bits because local_alignment is a bit too serious about
> bumping alignments up to help SSE instructions.  In the case of va_list this is nonsense,
> so I fixed that too.
>
> Bootstrapped/regtested x86_64-linux, will commit it tomorrow.
>
>        * i386.md (UNSPEC_SSE_PROLOGUE_SAVE_LOW): New.
>        (sse_prologue_save_insn expander): Use new pattern.
>        (sse_prologue_save_insn1): New pattern and splitter.
>        (sse_prologue_save_insn): Update to deal also with 64bit aligned
>        blocks.
>        * i386.c (setup_incoming_varargs_64): Do not compute jump destination here.
>        (ix86_gimplify_va_arg): Update alignment needed.
>        (ix86_local_alignment): Do not align all local arrays
>        to 128bit.
> Index: config/i386/i386.md
> ===================================================================
> --- config/i386/i386.md (revision 158277)
> +++ config/i386/i386.md (working copy)
> @@ -85,6 +85,7 @@
>     (UNSPEC_SET_RIP             16)
>     (UNSPEC_SET_GOT_OFFSET      17)
>     (UNSPEC_MEMORY_BLOCKAGE     18)
> +   (UNSPEC_SSE_PROLOGUE_SAVE_LOW 19)
>
>     ; TLS support
>     (UNSPEC_TP                  20)
> @@ -18471,15 +18472,24 @@
>                                 (reg:DI XMM5_REG)
>                                 (reg:DI XMM6_REG)
>                                 (reg:DI XMM7_REG)] UNSPEC_SSE_PROLOGUE_SAVE))
> -             (use (match_operand:DI 1 "register_operand" ""))
> +             (clobber (match_operand:DI 1 "register_operand" ""))
>               (use (match_operand:DI 2 "immediate_operand" ""))
> -             (use (label_ref:DI (match_operand 3 "" "")))])]
> +             (use (label_ref:DI (match_operand 3 "" "")))
> +             (clobber (match_operand:DI 4 "register_operand" ""))
> +             (use (match_dup 1))])]
>    "TARGET_64BIT"
>    "")
>
> -(define_insn "*sse_prologue_save_insn"
> +;; Pre-reload version of prologue save.  Until after prologue generation we don't know
> +;; what the size of save instruction will be.
> +;; Operand 0+operand 6 is the memory save area
> +;; Operand 1 is number of registers to save (will get overwritten to operand 5)
> +;; Operand 2 is number of non-vaargs SSE arguments
> +;; Operand 3 is label starting the save block
> +;; Operand 4 is used for temporary computation of jump address
> +(define_insn "*sse_prologue_save_insn1"
>    [(set (mem:BLK (plus:DI (match_operand:DI 0 "register_operand" "R")
> -                         (match_operand:DI 4 "const_int_operand" "n")))
> +                         (match_operand:DI 6 "const_int_operand" "n")))
>         (unspec:BLK [(reg:DI XMM0_REG)
>                      (reg:DI XMM1_REG)
>                      (reg:DI XMM2_REG)
> @@ -18488,9 +18498,98 @@
>                      (reg:DI XMM5_REG)
>                      (reg:DI XMM6_REG)
>                      (reg:DI XMM7_REG)] UNSPEC_SSE_PROLOGUE_SAVE))
> +   (clobber (match_operand:DI 1 "register_operand" "=r"))
> +   (use (match_operand:DI 2 "const_int_operand" "i"))
> +   (use (label_ref:DI (match_operand 3 "" "X")))
> +   (clobber (match_operand:DI 4 "register_operand" "=&r"))
> +   (use (match_operand:DI 5 "register_operand" "1"))]
> +  "TARGET_64BIT
> +   && INTVAL (operands[6]) + X86_64_SSE_REGPARM_MAX * 16 - 16 < 128
> +   && INTVAL (operands[6]) + INTVAL (operands[2]) * 16 >= -128"
> +  "#"
> +  [(set_attr "type" "other")
> +   (set_attr "memory" "store")
> +   (set_attr "mode" "DI")])
> +
> +;; We know size of save instruction; expand the computation of jump address
> +;; in the jumptable.
> +(define_split
> +  [(parallel [(set (match_operand:BLK 0 "" "")
> +                   (unspec:BLK [(reg:DI XMM0_REG)
> +                                (reg:DI XMM1_REG)
> +                                (reg:DI XMM2_REG)
> +                                (reg:DI XMM3_REG)
> +                                (reg:DI XMM4_REG)
> +                                (reg:DI XMM5_REG)
> +                                (reg:DI XMM6_REG)
> +                                (reg:DI XMM7_REG)] UNSPEC_SSE_PROLOGUE_SAVE))
> +              (clobber (match_operand:DI 1 "register_operand" ""))
> +              (use (match_operand:DI 2 "const_int_operand" ""))
> +              (use (match_operand 3 "" ""))
> +              (clobber (match_operand:DI 4 "register_operand" ""))
> +              (use (match_operand:DI 5 "register_operand" ""))])]
> +  "reload_completed"
> +  [(parallel [(set (match_dup 0)
> +                  (unspec:BLK [(reg:DI XMM0_REG)
> +                               (reg:DI XMM1_REG)
> +                               (reg:DI XMM2_REG)
> +                               (reg:DI XMM3_REG)
> +                               (reg:DI XMM4_REG)
> +                               (reg:DI XMM5_REG)
> +                               (reg:DI XMM6_REG)
> +                               (reg:DI XMM7_REG)] UNSPEC_SSE_PROLOGUE_SAVE_LOW))
> +             (use (match_dup 1))
> +             (use (match_dup 2))
> +             (use (match_dup 3))
> +             (use (match_dup 5))])]
> +{
> +  /* Movaps is 4 bytes, AVX and movsd is 5 bytes.  */
> +  int size = 4 + (TARGET_AVX || crtl->stack_alignment_needed < 128);
> +
> +  /* Compute address to jump to:
> +     label - eax*size + nnamed_sse_arguments*size. */
> +  if (size == 5)
> +    emit_insn (gen_rtx_SET (VOIDmode, operands[4],
> +                           gen_rtx_PLUS
> +                             (Pmode,
> +                              gen_rtx_MULT (Pmode, operands[1],
> +                                            GEN_INT (4)),
> +                              operands[1])));
> +  else  if (size == 4)
> +    emit_insn (gen_rtx_SET (VOIDmode, operands[4],
> +                           gen_rtx_MULT (Pmode, operands[1],
> +                                         GEN_INT (4))));
> +  else
> +    gcc_unreachable ();
> +  if (INTVAL (operands[2]))
> +    emit_move_insn
> +      (operands[1],
> +       gen_rtx_CONST (DImode,
> +                     gen_rtx_PLUS (DImode,
> +                                   operands[3],
> +                                   GEN_INT (INTVAL (operands[2])
> +                                            * size))));
> +  else
> +    emit_move_insn (operands[1], operands[3]);
> +  emit_insn (gen_subdi3 (operands[1], operands[1], operands[4]));
> +  operands[5] = GEN_INT (size);
> +})
> +

This pattern clobbers CC and causes:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43799

This patch adds:

(clobber (reg:CC FLAGS_REG))

to those new patterns.  OK to install?

Thanks.

--
H.J.
---
gcc/

2010-05-04  H.J. Lu  <hongjiu.lu@intel.com>

        PR target/43799
        * config/i386/i386.md (sse_prologue_save): Add clobber CC register.
        (*sse_prologue_save_insn1): Likewise.
        (SSE prologue save splitter): Likewise.

gcc/testsuite/

2010-05-04  H.J. Lu  <hongjiu.lu@intel.com>

        PR target/43799
        * gcc.target/i386/pr43799.c: New.
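
As a rough illustration of the address arithmetic in the quoted splitter (this is not part of either patch), the computation "label - eax*size + nnamed_sse_arguments*size" can be sketched in plain C. The function names sse_save_insn_size and sse_save_jump_offset are hypothetical, and the parameters target_avx and stack_alignment_bits merely stand in for TARGET_AVX and crtl->stack_alignment_needed. The sketch returns a byte offset relative to the label instead of emitting RTL.

```c
#include <assert.h>

/* Hypothetical helper: byte size of one register-save instruction.
   Per the quoted comment, movaps is 4 bytes; the AVX and movsd forms
   are 5 bytes.  */
int
sse_save_insn_size (int target_avx, int stack_alignment_bits)
{
  return 4 + (target_avx || stack_alignment_bits < 128);
}

/* Hypothetical helper: offset of the jump target from the save-block
   label, i.e. nnamed_sse_arguments*size - eax*size.  The scaling of eax
   mirrors the splitter: eax*4 for size 4, eax*4 + eax for size 5.  */
long
sse_save_jump_offset (int size, long eax, long nnamed_sse_arguments)
{
  assert (size == 4 || size == 5);  /* the splitter handles only these */
  long scaled = (size == 5) ? eax * 4 + eax : eax * 4;
  return nnamed_sse_arguments * size - scaled;
}
```

For example, with size 4, eax = 3 and one named SSE argument, the offset is 1*4 - 3*4 = -8 bytes from the label.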
Attachment:
gcc-pr43799-1.patch
Description: Text document