[PATCH/AARCH64] Add scheduler for Thunderx2t99

James Greenhalgh james.greenhalgh@arm.com
Fri Jan 13 17:26:00 GMT 2017


On Thu, Jan 12, 2017 at 03:43:52AM +0000, Hurugalawadi, Naveen wrote:
> Hi James,
> 
> The scheduling patch for vulcan was posted at the following link:-
> https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01205.html
> 
> We are working on the patch and addressed the comments for thunderx2t99.

Great, thanks.

> 
> >> I tried lowering the repeat expressions as so:
> Done.
> 
> >>split off the AdvSIMD/FP model from the main pipeline
> Done.
> 
> >> A change like wiring the vulcan_f0 and vulcan_f1 reservations
> >> to be cpu_units of a new define_automaton "vulcan_advsimd"
> Done.

Perfect, the automaton size is much more palatable now.

  Automaton `thunderx2t99'
      184 NDFA states,            838 NDFA arcs
      184 DFA states,             838 DFA arcs
      184 minimal DFA states,     838 minimal DFA arcs
      370 all insns          8 insn equivalence classes
    0 locked states
   1016 transition comb vector els,  1472 trans table els: use simple vect
   1472 min delay table els, compression factor 4

  Automaton `thunderx2t99_advsimd'
    12231 NDFA states,          85833 NDFA arcs
    12231 DFA states,           85833 DFA arcs
     9246 minimal DFA states,   66554 minimal DFA arcs
      370 all insns         11 insn equivalence classes
    0 locked states
  84074 transition comb vector els, 101706 trans table els: use simple vect
  101706 min delay table els, compression factor 2

  Automaton `thunderx2t99_ldst'
       49 NDFA states,            193 NDFA arcs
       49 DFA states,             193 DFA arcs
       16 minimal DFA states,      94 minimal DFA arcs
      370 all insns         13 insn equivalence classes
    0 locked states
     91 transition comb vector els,   208 trans table els: use simple vect
    208 min delay table els, compression factor 2

  Automaton `thunderx2t99_mult'
        2 NDFA states,              5 NDFA arcs
        2 DFA states,               5 DFA arcs
        2 minimal DFA states,       5 minimal DFA arcs
      370 all insns          3 insn equivalence classes
    0 locked states
    6 transition comb vector els,     6 trans table els: use simple vect
    6 min delay table els, compression factor 8

> 
> >> simplifying some of the remaining large expressions
> >> (vulcan_asimd_load*_mult, vulcan_asimd_load*_elts) can bring the size down
> Did not understand much about this comment.
> Can you please let me know about the simplification?

I wonder whether the current modeling of:

  (define_insn_reservation "thunderx2t99_asimd_load4_elts" 6

as:

  "thunderx2t99_ls_both*2,(thunderx2t99_ls0d1+thunderx2t99_ls1d1),\
   (thunderx2t99_ls0d2+thunderx2t99_ls1d2),\
   (thunderx2t99_ls0d3+thunderx2t99_ls1d3),thunderx2t99_f01")

Actually benefits the schedule in a meaningful way, or if it just increases
the size of the automaton. My guess is that given how many approximations
scheduling makes as to the actual working of the machine (e.g. always perfect
alignment, no cache misses, no other reason to restart loads) there is only
a very small benefit to accurately describing the flow of an instruction
through the pipelines in this way.

The size of the automaton is more reasonable now, so I won't insist on
changing it, but even with your changes the thunderx2t99 model is still
the largest automaton we're building.

> Please find attached the modified patch as per your suggestions and comments.
> Please review the patch and let us know if its okay?

I'd like for the size to come down again, or at least for you to comment on
whether the modelling you do for thunderx2t99_asimd_load*_mult and
thunderx2t99_asimd_load*_elts can be justified by an improved schedule.

> diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
> index a7a4b33..4d39673 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -75,7 +75,7 @@ AARCH64_CORE("xgene1",      xgene1,    xgene1,    8A,  AARCH64_FL_FOR_ARCH8, xge
>  
>  /* Broadcom ('B') cores. */
>  AARCH64_CORE("thunderx2t99",  thunderx2t99, cortexa57, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)

You'll want to update this to use your new scheduling model :-).

> -AARCH64_CORE("vulcan",  vulcan, cortexa57, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)
> +AARCH64_CORE("vulcan",  vulcan, vulcan, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)

This line is incorrect, you should be changing vulcan to use the new
thunderx2t99 model. The tune attribute "vulcan" won't match your new model,
so with this change you'd get "generic" (i.e. no) scheduling for -mcpu=vulcan .

Otherwise, I don't have any concerns with this patch but it must be
retested as the new scheduling model can never be used with this patch
version.

Thanks,
James



More information about the Gcc-patches mailing list