This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [aarch64}: added variable issue rate feature for falkor


Hello Kyrill,

thanks for your comments.

2018-08-14 16:50 GMT+02:00 Kyrill  Tkachov <kyrylo.tkachov@foss.arm.com>:
> Hi Kai,
>
>
> On 13/08/18 17:48, Kai Tietz wrote:
>>
>> I repost updated patch containing ChangeLog entry.
>>
>> Regards,
>> Kai
>
>
> I think I understand what this patch does, please correct me if I'm wrong.
> You model the processors micro-ops and some A64 instructions use multiple
> micro-ops.
> This is what the falkor_variable_issue attribute specifies.
> In TARGET_SCHED_VARIABLE_ISSUE you count the issue slots that the micro-ops
> take and how much "space" is left over, which is stored in leftover_uops
> and you use leftover_uops in TARGET_SCHED_REORDER to tell the scheduler how
> many more micro-ops it can issue on that cycle.

Correct.

> And with that change the issue_rate is no longer the *instruction* issue
> rate, but rather the *micro-op* issue rate.
> Overall this looks very similar to the implementation of this functionality
> in the powerpc port (rs6000).
> Is this correct?

Yes, it is somewhat similar to the rs6000 variant.

> I have a few comments on the implementation inline...
>
>         Jim Wilson<jim.wilson@linaro.org>
>         Kai Tietz<kai.tietz@linaro.org>
>
>         * config/aarch64.c (aarch64_sched_reorder): Implement
>         TARGET_SCHED_REORDER hook.
>         (aarch64_variable_issue): Implement TARGET_SCHED_VARIABLE_ISSUE
>         hook.
>         (TARGET_SCHED_REORDER): Define.
>         (TARGET_SCHED_VARIABLE_ISSUE): Likewise.
>         * config/aarch64/falkor.md (falkor_variable_issue): New.
>
> Index: aarch64/aarch64.c
> ===================================================================
> --- aarch64.orig/aarch64.c
> +++ aarch64/aarch64.c
> @@ -914,7 +914,7 @@ static const struct tune_params qdf24xx_
>    &generic_branch_cost,
>    &generic_approx_modes,
>    4, /* memmov_cost  */
> -  4, /* issue_rate  */
> +  8, /* issue_rate  */
>    (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
>     | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops  */
>    "16",        /* function_align.  */
> @@ -17551,6 +17551,105 @@ aarch64_run_selftests (void)
>   #endif /* #if CHECKING_P */
>  +/* The number of micro ops left over after issuing the last instruction in
> a
> +   cycle.  This is subtracted from the next cycle before we start issuing
> insns.
> +   This is initialized to 0 at the start of every basic block.  */
> +static int leftover_uops = 0;
> +
>
> I believe the scheduler provides hooks specifically for storing
> backend-specific scheduling state so we should
> avoid creating such static variables in aarch64.c. Can you use the
> TARGET_SCHED_*_SCHED_CONTEXT family of hooks here?
> Then it will be up to the scheduler midend to keep track of the state and
> between basic blocks, functions etc.

I think you refer to the ppc implementation. But if you take a closer
look to it, you will see that nevertheless such an implementation will
require global variables.
So I am not really sure it is worth to introduce the ..._SCHED_CONTEXT
API to avoid one global variable by introducing at least two others.
Neverthelss I am admit that making use of SCHED_CONTEXT could be a
general nice to have, but not necessarily a gain in that case.

>  +/* Implement TARGET_SCHED_REORDER.  */
> +
> +static int
> +aarch64_sched_reorder (FILE *file, int verbose,
> +                      rtx_insn **ready ATTRIBUTE_UNUSED,
> +                      int *n_readyp ATTRIBUTE_UNUSED,
> +                      int clock)
> +{
> +  int can_issue_more = aarch64_sched_issue_rate ();
> +
> +  if ((enum attr_tune) aarch64_tune == TUNE_FALKOR)
> +    {
> +      /* The start of a basic block.  */
> +      if (clock == 0)
> +       {
> +         if (leftover_uops && file && (verbose > 3))
> +           fprintf (file, ";;\tLeftover uops ignored at bb start.\n");
> +
> +         leftover_uops = 0;
> +       }
> +
> +      /* Account for issue slots left over from previous cycle.  This value
> +        can be larger than the number of issue slots per cycle, so we need
> +        to check it here before scheduling any instructions.  */
> +      else if (leftover_uops)
> +       {
> +         can_issue_more -= leftover_uops;
> +
> +         if (file && (verbose > 3))
> +           {
> +             fprintf (file, ";;\tUse %d issue slots for leftover uops.\n",
> +                      leftover_uops);
> +             fprintf (file, ";;\t%d issue slots left.\n", can_issue_more);
> +           }
> +
> +         leftover_uops = 0;
> +
> +         if (can_issue_more < 0)
> +           {
> +             leftover_uops = 0 - can_issue_more;
> +             can_issue_more = 0;
> +
> +             if (file && (verbose > 3))
> +               {
> +                 fprintf (file, ";;skipping issue cycle.\n");
> +                 fprintf (file, ";;\t%d uops left over.\n", leftover_uops);
> +               }
> +           }
> +       }
> +    }
> +
> +  return can_issue_more;
> +}
> +
> +/* Implement TARGET_SCHED_VARIABLE_ISSUE.  */
> +
>
> A comment here like you have for TARGET_SCHED_REORDER describing what this
> function accomplishes would be very helpful.

Ok. As it is a simple implementation of a well described hook, it
didn't seemed to me that it would need more comment. but I am open for
improving it.

>  +static int
> +aarch64_variable_issue (FILE *file, int verbose,
> +                       rtx_insn *insn, int more)
> +{
> +  if (GET_CODE (PATTERN (insn)) != USE
> +      && GET_CODE (PATTERN (insn)) != CLOBBER)
> +    {
> +      if ((enum attr_tune) aarch64_tune != TUNE_FALKOR)
> +       more -= 1;
> +      else
> +       {
> +         int issue_slots = get_attr_falkor_variable_issue (insn);
> +         more -= issue_slots;
> +
>
>
> We generally try to avoid having explicit CPU-specific checks like this in
> the aarch64 backend.
> Instead we try to keep all the CPU-specific tuning information in the CPU
> tuning structures.
>
> This particular infrastructure looks like it could be used for other CPUs in
> the future. In order for that to happen we don't want
> to have a check of aarch64_tune, but rather a tuning flag defined in
> aarch64-tuning-flags.def that tells us whether we're scheduling
> considering micro-ops or not. It can be on by default in the falkor tuning
> struct.

Ok, this is fair. I will add such a tuning-flags.def indicating
scheduling for micro-ops/or not.

> Then, we'd want the falkor_variable_issue attribute to be a generic
> attribute that specifies the number of micro-ops per-instruction and
> per-cpu.
Ok, so I will rename hook to a generic name instead.

> Unfortunately I'm don't see how this could be done in the current RTL
> attribute infrastructure without creating a big unwieldy attribute that will
> be hard to keep
> up to date :( Ideally we'd want something that could be specified in each
> CPU's .md file. Any ideas?

Sadly no. I have thought about this too. The approach via such a 'big'
attribute looks long-term hard to handle. As you mentioned the
maintaining for such a beast will be gross.
Therefore I descided for now to address it in an cpu-flavor specific way.

> Thanks,
> Kyrill
>

Thanks,
Kai


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]