This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
Hi Uros, Thanks a lot forr your comments. I prepared new patch and ChangeLog. Testing of x32 is in progress. It it OK for trunk? 2012-08-14 Yuri Rumyantsev <ysrumyan@gmail.com> * config/i386/i386-protos.h (ix86_split_lea_for_addr) : Add additional argument. * config/i386/i386.md (ix86_split_lea_for_addr) : Add additional argument curr_insn. * config/i386/i386.c (ix86_split_lea_for_addr): Do instructions reodering to get opportunities for better scheduling. (ix86_lea_outperforms): Do more aggressive lea splitting. (find_nearest_reg-def): New function. Find nearest register definition used in address. 2012/8/14 Uros Bizjak <ubizjak@gmail.com>: > Hello! > >> It is known that LEA splitting is one of the most critical problems >> for Atom processors and changes try to improve it through: >> 1. More aggressive Lea splitting – do not perform splitting if >> only split cost exceeds AGU stall . >> 2. Reordering splitting instructions to get better scheduling – >> use the farthest defined register for SET instruction, then add >> constant offset if any and finally generate add instruction.This gives >> +0.5% speedup in geomean for eembc2.0 suite on Atom. >> All required testing was done – bootstraps for Atom & Core2, make check. >> Note that this fix affects only on Atom processors. > > IMO, you should test LEA handling changes on x32 atom, too. With > recent changes, you will get lots of zero_extended addresses through > these functions, so I think it is worth to benchmark on x32 target. > >> ChangeLog: >> 2012-08-08 Yuri Rumyantsev Yuri.S.Rumyantsev@intel.com >> >> * config/i386/i386-protos.h (ix86_split_lea_for_addr) : Add additional argument. >> * config/i386/i386.md (ix86_splitt_lea_for_addr) : Add additional >> argument curr_insn. >> * config/i386/i386.c (find_nearest_reg-def): New function. Find >> nearest register definition used in address. > > (find_nearest_reg_def) > >> (ix86_split_lea_for_addr) : Do more aggressive lea splitting and >> instructions reodering to get opportunities for better scheduling. > > Please merge entries for ix86_split_lea_for_addr. > > @@ -16954,7 +16954,7 @@ ix86_lea_outperforms (rtx insn, unsigned int > regno0, unsigned int regno1, > /* If there is no use in memory addess then we just check > that split cost exceeds AGU stall. */ > if (dist_use < 0) > - return dist_define >= LEA_MAX_STALL; > + return dist_define > LEA_MAX_STALL; > > You didn't described this change in ChangeLog. Does this affect also > affect benchmark speedup? > > +/* Return 0 if regno1 def is nearest to insn and 1 otherwise. */ > > Watch comment formatting and vertical spaces! > > +static int > +find_nearest_reg_def (rtx insn, int regno1, int regno2) > > IMO, you can return 0, 1 and 2; with 0 when no definitions are found > in the BB, 1 and 2 when either of two regnos are found. Otherwise, > please use bool for function type. > > + if (insn_defines_reg (regno1, regno1, prev)) > + return 0; > + else if (insn_defines_reg (regno2, regno2, prev)) > > Please use INVALID_REGNUM as the second argument in the call to > insn_defines_reg when looking for only one regno definition. > > { > - emit_insn (gen_rtx_SET (VOIDmode, target, parts.base)); > - tmp = parts.index; > + rtx tmp1; > + /* Try to give more opportunities to scheduler - > + choose operand for move instruction with longer > + distance from its definition to insn. */ > > (Hm, I don't think you mean gcc insn scheduler here.) > > + if (find_nearest_reg_def (insn, regno1, regno2) == 0) > + { > + tmp = parts.index; /* choose index for move. */ > + tmp1 = parts.base; > + } > + else > + { > + tmp = parts.base; > + tmp1 = parts.index; > + } > + emit_insn (gen_rtx_SET (VOIDmode, target, tmp)); > + if (parts.disp && parts.disp != const0_rtx) > + ix86_emit_binop (PLUS, mode, target, parts.disp); > + ix86_emit_binop (PLUS, mode, target, tmp1); > + return; > } > > (Please use tabs instead of spaces in the added code.) > > However, this whole new part can be written simply as following (untested): > > { > rtx tmp1; > > if (find_nearest_reg_def (insn, regno1, regno2) == 0) > tmp1 = parts.base, tmp = parts.index; > else > tmp1 = parts.index, tmp = parts.base; > > emit_insn (gen_rtx_SET (VOIDmode, target, tmp1); > } > > Please see how tmp is handled further down in the function. > > - ix86_emit_binop (PLUS, mode, target, tmp); > + ix86_emit_binop (PLUS, mode, target, tmp); > > Please watch accidental whitespace changes. > > Uros.
Attachment:
lea_split_improve.diff
Description: Binary data
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |