[PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P
Kyrill Tkachov
kyrylo.tkachov@arm.com
Fri Jan 9 11:32:00 GMT 2015
Ping.
Thanks,
Kyrill
On 18/12/14 15:55, Kyrill Tkachov wrote:
> Ping.
>
> Thanks,
> Kyrill
>
> On 11/12/14 15:06, Kyrill Tkachov wrote:
>> Ping.
>> https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00340.html
>>
>> Thanks,
>> Kyrill
>>
>> On 04/12/14 09:19, Kyrill Tkachov wrote:
>>> On 02/12/14 22:58, Ramana Radhakrishnan wrote:
>>>> On Tue, Nov 11, 2014 at 11:55 AM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>>>>> Hi all,
>>>>>
>>>>> This is the arm implementation of the macro fusion hook.
>>>>> It tries to fuse movw+movt operations together. It also tries to take lo_sum
>>>>> RTXs into account since those generate movt instructions as well.
>>>>>
>>>>> Bootstrapped and tested on arm-none-linux-gnueabihf.
>>>>>
>>>>> Ok for trunk?
>>>>> if (current_tune->fuseable_ops & ARM_FUSE_MOVW_MOVT)
>>>>> + {
>>>>> + /* We are trying to fuse
>>>>> + movw imm / movt imm
>>>>> + instructions as a group that gets scheduled together. */
>>>>> +
>>>> A comment here about the insn structure would be useful.
>>> Done. It's similar to the aarch64 adrp+add case. It does make it easier
>>> to read, thanks.
>>>
>>> 2014-12-04 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>>
>>> * config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
>>> * config/arm/arm.c (arm_macro_fusion_p): New function.
>>> (arm_macro_fusion_pair_p): Likewise.
>>> (TARGET_SCHED_MACRO_FUSION_P): Define.
>>> (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
>>> (ARM_FUSE_NOTHING): Likewise.
>>> (ARM_FUSE_MOVW_MOVT): Likewise.
>>> (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
>>> arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
>>> arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
>>> arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
>>> arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune,
>>> arm_cortex_a5_tune): Specify fuseable_ops value.
>>>
>>>>> + set_dest = SET_DEST (curr_set);
>>>>> + if (GET_CODE (set_dest) == ZERO_EXTRACT)
>>>>> + {
>>>>> + if (CONST_INT_P (SET_SRC (curr_set))
>>>>> + && CONST_INT_P (SET_SRC (prev_set))
>>>>> + && REG_P (XEXP (set_dest, 0))
>>>>> + && REG_P (SET_DEST (prev_set))
>>>>> + && REGNO (XEXP (set_dest, 0)) == REGNO (SET_DEST (prev_set)))
>>>>> + return true;
>>>>> + }
>>>>> + else if (GET_CODE (SET_SRC (curr_set)) == LO_SUM
>>>>> + && REG_P (SET_DEST (curr_set))
>>>>> + && REG_P (SET_DEST (prev_set))
>>>>> + && GET_CODE (SET_SRC (prev_set)) == HIGH
>>>>> + && REGNO (SET_DEST (curr_set)) == REGNO (SET_DEST (prev_set)))
>>>>> + {
>>>>> + return true;
>>>>> + }
>>>> Can we add a fast path exit to be
>>>>
>>>> if (GET_MODE (set_dest) != SImode)
>>>> return false;
>>> Done, but if/when we extend the function to handle more fusion cases it
>>> will need to be refactored, since we will want to just bail out of this
>>> MOVW+MOVT case rather than the whole function.
>>>
>>>> I did wonder whether we wanted to use reg_overlap_mentioned_p, as that
>>>> may simplify the logic a bit, but that's overkill here as we still
>>>> want to restrict it to the cases above.
>>>>
>>>> Otherwise OK.
>>> Here's the updated patch. I've tested on arm-none-eabi and made sure that
>>> the fusion still happens on the benchmarks I looked at.
>>> Ok?
>>>
>>> Thanks,
>>> Kyrill
>>>
>>>> Ramana
>>>>
>>>>
>>>>
>>>>
>>>>> + }
>>>>> + return false;
>>>>> Thanks,
>>>>> Kyrill
>>>>>
>>>>> 2014-11-11 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>>>>
>>>>> * config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
>>>>> * config/arm/arm.c (arm_macro_fusion_p): New function.
>>>>> (arm_macro_fusion_pair_p): Likewise.
>>>>> (TARGET_SCHED_MACRO_FUSION_P): Define.
>>>>> (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
>>>>> (ARM_FUSE_NOTHING): Likewise.
>>>>> (ARM_FUSE_MOVW_MOVT): Likewise.
>>>>> (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
>>>>> arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
>>>>> arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
>>>>> arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
>>>>> arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune,
>>>>> arm_cortex_a5_tune): Specify fuseable_ops value.
>>
>
>
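
For readers following the thread without the GCC sources to hand, the pair
check quoted above can be approximated by a small standalone model. This is
only a sketch and not the code from the patch: the toy_* types and functions
are hypothetical stand-ins for GCC's RTL accessors (SET_DEST, SET_SRC, REG_P,
REGNO and so on), and the model assumes a zero_extract destination always
wraps a plain register. It also includes the SImode fast-path exit suggested
in review, plus two helpers showing what movw and movt do to a 32-bit register.

```c
#include <assert.h>   /* for the usage checks described below */
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for GCC RTL codes and modes; every name is hypothetical. */
enum toy_code { TOY_REG, TOY_ZERO_EXTRACT, TOY_CONST_INT, TOY_HIGH, TOY_LO_SUM };
enum toy_mode { TOY_SImode, TOY_DImode };

/* A simplified single-set instruction: (set dest src). */
struct toy_set {
  enum toy_code dest_code;   /* TOY_REG or TOY_ZERO_EXTRACT */
  enum toy_mode dest_mode;   /* mode of the destination */
  unsigned dest_regno;       /* register written, or wrapped by the extract */
  enum toy_code src_code;    /* TOY_CONST_INT, TOY_HIGH or TOY_LO_SUM */
};

/* movw: write a 16-bit immediate and zero the top half of the register. */
static uint32_t toy_movw (uint16_t imm)
{
  return imm;
}

/* movt: replace the top 16 bits and keep the bottom 16 bits. */
static uint32_t toy_movt (uint32_t reg, uint16_t imm)
{
  return (reg & 0xFFFFu) | ((uint32_t) imm << 16);
}

/* Return true if PREV followed by CURR looks like a fusible movw+movt pair. */
static bool toy_fusion_pair_p (const struct toy_set *prev,
                               const struct toy_set *curr)
{
  /* Fast-path exit: only 32-bit destinations are of interest. */
  if (curr->dest_mode != TOY_SImode)
    return false;

  /* Both forms need PREV to write the register that CURR updates. */
  if (prev->dest_code != TOY_REG || prev->dest_regno != curr->dest_regno)
    return false;

  /* Form 1: constant move, then a constant written through a zero_extract. */
  if (curr->dest_code == TOY_ZERO_EXTRACT)
    return prev->src_code == TOY_CONST_INT && curr->src_code == TOY_CONST_INT;

  /* Form 2: HIGH followed by LO_SUM on the same register. */
  return curr->dest_code == TOY_REG
         && curr->src_code == TOY_LO_SUM
         && prev->src_code == TOY_HIGH;
}
```

In this model, toy_movt (toy_movw (0x5678), 0x1234) yields 0x12345678, which
is why the two instructions are worth keeping adjacent. A constant move to r3
followed by a zero_extract write to r3 is accepted, while the same pair
targeting different registers, or a non-SImode destination, is rejected.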