This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
| Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
|---|---|---|
| Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
| Other format: | [Raw text] | |
Hi Uros,
Thanks a lot forr your comments.
I prepared new patch and ChangeLog. Testing of x32 is in progress.
It it OK for trunk?
2012-08-14 Yuri Rumyantsev <ysrumyan@gmail.com>
* config/i386/i386-protos.h (ix86_split_lea_for_addr) : Add
additional argument.
* config/i386/i386.md (ix86_split_lea_for_addr) : Add
additional argument curr_insn.
* config/i386/i386.c (ix86_split_lea_for_addr): Do instructions
reodering to get opportunities for better scheduling.
(ix86_lea_outperforms): Do more aggressive lea splitting.
(find_nearest_reg-def): New function. Find nearest register
definition used in address.
2012/8/14 Uros Bizjak <ubizjak@gmail.com>:
> Hello!
>
>> It is known that LEA splitting is one of the most critical problems
>> for Atom processors and changes try to improve it through:
>> 1. More aggressive Lea splitting – do not perform splitting if
>> only split cost exceeds AGU stall .
>> 2. Reordering splitting instructions to get better scheduling –
>> use the farthest defined register for SET instruction, then add
>> constant offset if any and finally generate add instruction.This gives
>> +0.5% speedup in geomean for eembc2.0 suite on Atom.
>> All required testing was done – bootstraps for Atom & Core2, make check.
>> Note that this fix affects only on Atom processors.
>
> IMO, you should test LEA handling changes on x32 atom, too. With
> recent changes, you will get lots of zero_extended addresses through
> these functions, so I think it is worth to benchmark on x32 target.
>
>> ChangeLog:
>> 2012-08-08 Yuri Rumyantsev Yuri.S.Rumyantsev@intel.com
>>
>> * config/i386/i386-protos.h (ix86_split_lea_for_addr) : Add additional argument.
>> * config/i386/i386.md (ix86_splitt_lea_for_addr) : Add additional
>> argument curr_insn.
>> * config/i386/i386.c (find_nearest_reg-def): New function. Find
>> nearest register definition used in address.
>
> (find_nearest_reg_def)
>
>> (ix86_split_lea_for_addr) : Do more aggressive lea splitting and
>> instructions reodering to get opportunities for better scheduling.
>
> Please merge entries for ix86_split_lea_for_addr.
>
> @@ -16954,7 +16954,7 @@ ix86_lea_outperforms (rtx insn, unsigned int
> regno0, unsigned int regno1,
> /* If there is no use in memory addess then we just check
> that split cost exceeds AGU stall. */
> if (dist_use < 0)
> - return dist_define >= LEA_MAX_STALL;
> + return dist_define > LEA_MAX_STALL;
>
> You didn't described this change in ChangeLog. Does this affect also
> affect benchmark speedup?
>
> +/* Return 0 if regno1 def is nearest to insn and 1 otherwise. */
>
> Watch comment formatting and vertical spaces!
>
> +static int
> +find_nearest_reg_def (rtx insn, int regno1, int regno2)
>
> IMO, you can return 0, 1 and 2; with 0 when no definitions are found
> in the BB, 1 and 2 when either of two regnos are found. Otherwise,
> please use bool for function type.
>
> + if (insn_defines_reg (regno1, regno1, prev))
> + return 0;
> + else if (insn_defines_reg (regno2, regno2, prev))
>
> Please use INVALID_REGNUM as the second argument in the call to
> insn_defines_reg when looking for only one regno definition.
>
> {
> - emit_insn (gen_rtx_SET (VOIDmode, target, parts.base));
> - tmp = parts.index;
> + rtx tmp1;
> + /* Try to give more opportunities to scheduler -
> + choose operand for move instruction with longer
> + distance from its definition to insn. */
>
> (Hm, I don't think you mean gcc insn scheduler here.)
>
> + if (find_nearest_reg_def (insn, regno1, regno2) == 0)
> + {
> + tmp = parts.index; /* choose index for move. */
> + tmp1 = parts.base;
> + }
> + else
> + {
> + tmp = parts.base;
> + tmp1 = parts.index;
> + }
> + emit_insn (gen_rtx_SET (VOIDmode, target, tmp));
> + if (parts.disp && parts.disp != const0_rtx)
> + ix86_emit_binop (PLUS, mode, target, parts.disp);
> + ix86_emit_binop (PLUS, mode, target, tmp1);
> + return;
> }
>
> (Please use tabs instead of spaces in the added code.)
>
> However, this whole new part can be written simply as following (untested):
>
> {
> rtx tmp1;
>
> if (find_nearest_reg_def (insn, regno1, regno2) == 0)
> tmp1 = parts.base, tmp = parts.index;
> else
> tmp1 = parts.index, tmp = parts.base;
>
> emit_insn (gen_rtx_SET (VOIDmode, target, tmp1);
> }
>
> Please see how tmp is handled further down in the function.
>
> - ix86_emit_binop (PLUS, mode, target, tmp);
> + ix86_emit_binop (PLUS, mode, target, tmp);
>
> Please watch accidental whitespace changes.
>
> Uros.
Attachment:
lea_split_improve.diff
Description: Binary data
| Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
|---|---|---|
| Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |