This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [AArch64] Add scheduling and cost models for Exynos M1
- From: Andrew Pinski <pinskia at gmail dot com>
- To: James Greenhalgh <james dot greenhalgh at arm dot com>
- Cc: Evandro Menezes <e dot menezes at samsung dot com>, gcc-patches <gcc-patches at gcc dot gnu dot org>, Marcus Shawcroft <Marcus dot Shawcroft at arm dot com>, Kyrill Tkachov <kyrylo dot tkachov at arm dot com>
- Date: Wed, 28 Oct 2015 18:40:51 +0800
- Subject: Re: [AArch64] Add scheduling and cost models for Exynos M1
- Authentication-results: sourceware.org; auth=none
- References: <001b01d1110d$0008f890$001ae9b0$ at samsung dot com> <20151028103602 dot GA33325 at arm dot com>
On Wed, Oct 28, 2015 at 6:36 PM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
> On Tue, Oct 27, 2015 at 06:12:48PM -0500, Evandro Menezes wrote:
>> This patch adds the scheduling and cost models for Exynos M1.
>>
>> Though it?s a rather large patch, much of it is the DFA model for the
>> pipeline.? Still, I?d appreciate any feedback.
>>
>> Please, commit if it?s alright.
>
> Hi Evandro,
>
> Thanks for the patch, I have some comments.
>
> To ease review, could I ask you to turn this in to a patch series? Roughly
> structured as so:
>
> 1/4: Add the Exynos-M1 cost models.
> 2/4: Add the Exynos M1 scheduling model.
> 3/4: Add the infrastructure for TARGET_CASE_VALUES_THRESHOLD.
> 4/4: Add the extra tuning heuristics.
>
> Your support is missing a critical hunk for AArch64, there should be an
>
> (include "../arm/exynos-m1.md")
>
> in aarch64.md to get this working.
>
> This is a fairly large pipeline description (add (automata_option "stats")
> to the .md file):
>
> Automaton `exynos_m1'
> 62320 NDFA states, 489094 NDFA arcs
>
> From experience, you get little benefit from such a complex model, but you
> do slow bootstrap times. It isn't for me to say where the model can be
> trimmed (I don't have access to documentation for the Exynos-M1), but
> you may find it useful to split out the SIMD/FP automaton, and look at whether
> your modelling of long latency instructions is entirely neccesary. Have a
> look at the Cortex-A57 and Cortex-A53 for some examples of what I mean.
>
> For comparison, here are the stats for Cortex-A53 and Cortex-A57:
>
> Automaton `cortex_a53'
> 281 NDFA states, 1158 NDFA arcs
> Automaton `cortex_a53_advsimd'
> 9072 NDFA states, 49572 NDFA arcs
> Automaton `cortex_a57'
> 764 NDFA states, 3600 NDFA arcs
> Automaton `cortex_a57_cx'
> 204 NDFA states, 864 NDFA arcs
>
>> @@ -7672,6 +7737,22 @@ aarch64_override_options_internal (struct gcc_options *opts)
>> opts->x_param_values,
>> global_options_set.x_param_values);
>>
>> + /* Adjust the heuristics for Exynos M1. */
>> + if (selected_cpu->sched_core == exynosm1)
>
> I think it would be preferable to pull these tuning parameters in to
> the target structures somehow, rather than guarding them off by specific
> CPUs.
>
>> + {
>> + /* Increase the maximum peeling limit. */
>> + maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
>> + 400,
>> + opts->x_param_values,
>> + global_options_set.x_param_values);
>> +
>> + /* Set the L1 cache line size. */
>> + maybe_set_param_value (PARAM_L1_CACHE_LINE_SIZE,
>> + 64,
>> + opts->x_param_values,
>> + global_options_set.x_param_values);
>> + }
>> +
>> aarch64_override_options_after_change_1 (opts);
>> }
I have a patch for the L1 cache size (prefetch) infastructure which
sets it via tunning parameters but I have not had time to submit it
yet.
Also Peeling parameter changes helps ThunderX too.
Thanks,
Andrew
>>
>
>
>> @@ -13382,6 +13463,20 @@ aarch64_promoted_type (const_tree t)
>> return float_type_node;
>> return NULL_TREE;
>> }
>> +
>> +/* Implement TARGET_CASE_VALUES_THRESHOLD. */
>> +
>> +static unsigned int
>> +aarch64_case_values_threshold (void)
>> +{
>> + /* For Exynos M1, raise the bar for using jump tables. */
>> + if (selected_cpu->sched_core == exynosm1
>> + && optimize > 2)
>> + return 48;
>
> Likewise, I think this should end up in the per-core tuning structures
> rather than masked off by selected_cpu->sched_core == exynosm1.
>
>> + else
>> + return default_case_values_threshold ();
>> +}
>> +
>> #undef TARGET_ADDRESS_COST
>> #define TARGET_ADDRESS_COST aarch64_address_cost
>>
>
> Thanks,
> James
>