LEA-splitting improvement patch.
Yuri Rumyantsev
ysrumyan@gmail.com
Tue Aug 14 13:35:00 GMT 2012
Uros,
Let me try to explain why I used such code duplication.
Here we have the common case of an LEA with 3 different registers - r0
(target), r1 (base), r2 (index) - and a possible offset.
To get better scheduling, we first try to determine which register
is preferable for the initial setting - r1 or r2 - through
find_nearest_reg_def. Then we generate the following sequence of
instructions:
r0 = r_best;
r0 += $const;
r0 += r_worse;
which can save 2 cycles on Atom, since the first 2 instructions can be hoisted up.
I could not find a better way to code it.
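The ordering idea can be illustrated with a small standalone C model. This is only a sketch and not the patch code: struct lea_parts, pick_first, split_lea and the distance arguments are hypothetical names introduced for illustration. It assumes the preferred register is the one whose nearest definition lies farther back from the LEA, so its value is ready earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an address of the form base + index + disp. */
struct lea_parts {
    long base;   /* value held in r1 */
    long index;  /* value held in r2 */
    long disp;   /* constant offset, 0 if absent */
};

/* Assumption: each distance counts instructions back from the LEA to the
   nearest definition of that register.  A larger distance means the value
   is available earlier.  Returns 1 to copy the base first, 2 to copy the
   index first. */
static int pick_first(int dist_base, int dist_index)
{
    return dist_base >= dist_index ? 1 : 2;
}

/* Emulates the split sequence:
     r0 = r_best;  r0 += disp;  r0 += r_worse;
   The first two steps do not depend on r_worse, so a scheduler is free to
   hoist them above the definition of r_worse. */
static long split_lea(const struct lea_parts *p, int dist_base, int dist_index)
{
    long r_best, r_worse, r0;

    assert(p != NULL);

    if (pick_first(dist_base, dist_index) == 1) {
        r_best = p->base;
        r_worse = p->index;
    } else {
        r_best = p->index;
        r_worse = p->base;
    }

    r0 = r_best;            /* r0 = r_best */
    if (p->disp != 0)
        r0 += p->disp;      /* r0 += $const, skipped for a zero offset */
    r0 += r_worse;          /* r0 += r_worse */
    return r0;
}
```

Whichever register is copied first, the computed address is the same; only the dependency of the first two instructions changes.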
Below is the modified ChangeLog.
2012-08-14 Yuri Rumyantsev <ysrumyan@gmail.com>
* config/i386/i386-protos.h (ix86_split_lea_for_addr): Add
additional argument.
* config/i386/i386.md (ix86_split_lea_for_addr): Add
additional argument curr_insn.
* config/i386/i386.c (ix86_split_lea_for_addr): Do instruction
reordering to get opportunities for better scheduling.
(ix86_lea_outperforms): Prefer LEA only if split cost exceeds
AGU stall.
(find_nearest_reg_def): New function. Find nearest register
definition used in address.
2012/8/14 Uros Bizjak <ubizjak@gmail.com>:
> On Tue, Aug 14, 2012 at 2:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>
>> Thanks a lot for your comments.
>>
>> I prepared new patch and ChangeLog. Testing of x32 is in progress.
>>
>> Is it OK for trunk?
>>
>> 2012-08-14 Yuri Rumyantsev <ysrumyan@gmail.com>
>>
>> * config/i386/i386-protos.h (ix86_split_lea_for_addr): Add
>> additional argument.
>> * config/i386/i386.md (ix86_split_lea_for_addr): Add
>> additional argument curr_insn.
>> * config/i386/i386.c (ix86_split_lea_for_addr): Do instruction
>> reordering to get opportunities for better scheduling.
>> (ix86_lea_outperforms): Do more aggressive lea splitting.
>
> You are not doing splitting in ix86_lea_outperforms.
>
>> (find_nearest_reg_def): New function. Find nearest register
>> definition used in address.
>
> Just say:
>
> (find_nearest_reg_def): New function.
>
> + emit_insn (gen_rtx_SET (VOIDmode, target, tmp));
> + if (parts.disp && parts.disp != const0_rtx)
> + ix86_emit_binop (PLUS, mode, target, parts.disp);
> + ix86_emit_binop (PLUS, mode, target, tmp1);
> + return;
>
> Can you explain, why you have to duplicate this code? Here you
> generate the same sequence as in the code below. Use tmp and tmp1 in
> the way that it will fit existing code.
>
> Uros.