This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [AArch64] Add scheduling and cost models for Exynos M1

From: Andrew Pinski <pinskia at gmail dot com>
To: James Greenhalgh <james dot greenhalgh at arm dot com>
Cc: Evandro Menezes <e dot menezes at samsung dot com>, gcc-patches <gcc-patches at gcc dot gnu dot org>, Marcus Shawcroft <Marcus dot Shawcroft at arm dot com>, Kyrill Tkachov <kyrylo dot tkachov at arm dot com>
Date: Wed, 28 Oct 2015 18:40:51 +0800
Subject: Re: [AArch64] Add scheduling and cost models for Exynos M1
References: <001b01d1110d$0008f890$001ae9b0$ at samsung dot com> <20151028103602 dot GA33325 at arm dot com>

On Wed, Oct 28, 2015 at 6:36 PM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
> On Tue, Oct 27, 2015 at 06:12:48PM -0500, Evandro Menezes wrote:
>> This patch adds the scheduling and cost models for Exynos M1.
>>
>> Though it's a rather large patch, much of it is the DFA model for the
>> pipeline.  Still, I'd appreciate any feedback.
>>
>> Please commit if it's alright.
>
> Hi Evandro,
>
> Thanks for the patch, I have some comments.
>
> To ease review, could I ask you to turn this into a patch series? Roughly
> structured as so:
>
>   1/4: Add the Exynos-M1 cost models.
>   2/4: Add the Exynos M1 scheduling model.
>   3/4: Add the infrastructure for TARGET_CASE_VALUES_THRESHOLD.
>   4/4: Add the extra tuning heuristics.
>
> Your support is missing a critical hunk for AArch64, there should be an
>
>   (include "../arm/exynos-m1.md")
>
> in aarch64.md to get this working.
>
> This is a fairly large pipeline description (add (automata_option "stats")
> to the .md file):
>
>  Automaton `exynos_m1'
>     62320 NDFA states,          489094 NDFA arcs
>
> From experience, you get little benefit from such a complex model, but you
> do slow bootstrap times. It isn't for me to say where the model can be
> trimmed (I don't have access to documentation for the Exynos-M1), but
> you may find it useful to split out the SIMD/FP automaton, and look at whether
> your modelling of long latency instructions is entirely necessary. Have a
> look at the Cortex-A57 and Cortex-A53 for some examples of what I mean.
>
> For comparison, here are the stats for Cortex-A53 and Cortex-A57:
>
>  Automaton `cortex_a53'
>     281 NDFA states,           1158 NDFA arcs
>  Automaton `cortex_a53_advsimd'
>     9072 NDFA states,          49572 NDFA arcs
>  Automaton `cortex_a57'
>     764 NDFA states,           3600 NDFA arcs
>  Automaton `cortex_a57_cx'
>     204 NDFA states,            864 NDFA arcs
>
>> @@ -7672,6 +7737,22 @@ aarch64_override_options_internal (struct gcc_options *opts)
>>                        opts->x_param_values,
>>                        global_options_set.x_param_values);
>>
>> +  /* Adjust the heuristics for Exynos M1.  */
>> +  if (selected_cpu->sched_core == exynosm1)
>
> I think it would be preferable to pull these tuning parameters into
> the target structures somehow, rather than guarding them off by specific
> CPUs.
>
>> +    {
>> +      /* Increase the maximum peeling limit.  */
>> +      maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
>> +                             400,
>> +                             opts->x_param_values,
>> +                             global_options_set.x_param_values);
>> +
>> +      /* Set the L1 cache line size.  */
>> +      maybe_set_param_value (PARAM_L1_CACHE_LINE_SIZE,
>> +                             64,
>> +                             opts->x_param_values,
>> +                             global_options_set.x_param_values);
>> +    }
>> +
>>    aarch64_override_options_after_change_1 (opts);
>>  }

I have a patch for the L1 cache size (prefetch) infrastructure which
sets it via tuning parameters, but I have not had time to submit it
yet.
Also, the peeling parameter change helps ThunderX too.

Thanks,
Andrew

>>
>
>
>> @@ -13382,6 +13463,20 @@ aarch64_promoted_type (const_tree t)
>>      return float_type_node;
>>    return NULL_TREE;
>>  }
>> +
>> +/* Implement TARGET_CASE_VALUES_THRESHOLD.  */
>> +
>> +static unsigned int
>> +aarch64_case_values_threshold (void)
>> +{
>> +  /* For Exynos M1, raise the bar for using jump tables.  */
>> +  if (selected_cpu->sched_core == exynosm1
>> +      && optimize > 2)
>> +    return 48;
>
> Likewise, I think this should end up in the per-core tuning structures
> rather than masked off by selected_cpu->sched_core == exynosm1.
>
>> +  else
>> +    return default_case_values_threshold ();
>> +}
>> +
>>  #undef TARGET_ADDRESS_COST
>>  #define TARGET_ADDRESS_COST aarch64_address_cost
>>
>
> Thanks,
> James
>

Follow-Ups:
- Re: [AArch64] Add scheduling and cost models for Exynos M1
  - From: Evandro Menezes

References:
- [AArch64] Add scheduling and cost models for Exynos M1
  - From: Evandro Menezes
- Re: [AArch64] Add scheduling and cost models for Exynos M1
  - From: James Greenhalgh
