This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



RE: [AArch64] A question about Cortex-A57 pipeline description


Indeed, we observed some problems with scheduling which we believe have more
to do with the scheduling algorithm than with the model DFA, as we said in
https://gcc.gnu.org/ml/gcc/2015-09/msg00118.html

Cheers,

-- 
Evandro Menezes                              Austin, TX

> -----Original Message-----
> From: gcc-owner@gcc.gnu.org [mailto:gcc-owner@gcc.gnu.org] On Behalf Of
> Nikolai Bozhenov
> Sent: Monday, September 14, 2015 2:28
> To: James Greenhalgh
> Cc: gcc@gcc.gnu.org
> Subject: Re: [AArch64] A question about Cortex-A57 pipeline description
> 
> Thanks for the reply! I see your point. Indeed, I've also seen cases where
> the load pipeline was overused at the beginning of a basic block, whereas
> at the end the code got stuck with a bunch of stores and no other
> instructions to run in parallel. And indeed, relaxing the restrictions
> makes things even worse in some cases. Anyway, I don't believe it's the
> best we can do. I'm going to have a closer look at the scheduler and see
> what can be done to improve the situation.
> 
> Nikolai
> 
> 
> On 09/11/2015 07:21 PM, James Greenhalgh wrote:
> > On Fri, Sep 11, 2015 at 04:31:37PM +0100, Nikolai Bozhenov wrote:
> >> Hi!
> >>
> >> Recently I got somewhat confused by the Cortex-A57 pipeline description
> >> in GCC and I would be grateful if you could help me understand a few
> >> unclear points.
> > Sure,
> >
> >> In particular, I am interested in how memory operations (loads/stores)
> >> are scheduled. It seems that according to the cortex-a57.md file,
> >> firstly, two memory operations may never be scheduled in the same
> >> cycle and, secondly, two loads may never be scheduled in two
> >> consecutive cycles:
> >>
> >>       ;; 5.  Two pipelines for load and store operations: LS1, LS2. The
> >>       ;;     most valuable thing we can do is force a structural hazard
> >>       ;;     to split up loads/stores.
> >>
> >>       (define_cpu_unit "ca57_ls_issue" "cortex_a57")
> >>       (define_cpu_unit "ca57_ldr, ca57_str" "cortex_a57")
> >>       (define_reservation "ca57_load_model" "ca57_ls_issue,ca57_ldr*2")
> >>       (define_reservation "ca57_store_model" "ca57_ls_issue,ca57_str")
> >>
> >> However, the Cortex-A57 Software Optimization Guide states that the
> >> core is able to execute one load operation and one store operation
> >> every cycle. And that agrees with my experiments. Indeed, a loop
> >> consisting of 10 loads, 10 stores and several arithmetic operations
> >> takes on average about 10 cycles per iteration, provided that the
> >> instructions are intermixed properly.
> >>
> >> So, what is the purpose of the additional restrictions imposed on the
> >> scheduler in the cortex-a57.md file? It doesn't look like an error.
> >> Rather, it looks like a deliberate decision.
> > When designing the model for the Cortex-A57 processor, I was primarily
> > trying to build a model which would increase the blend of utilized
> > pipelines on each cycle across a range of benchmarks, rather than to
> > accurately reflect the constraints listed in the Cortex-A57 Software
> > Optimisation Guide [1].
> >
> > My reasoning here is that the Cortex-A57 is a high-performance
> > processor, and an accurate model would be infeasible to build. Because
> > of this, it is unlikely that the model in GCC will be representative
> > of the true state of the processor, and consequently GCC may make
> > decisions which result in an instruction stream which would bias
> > towards one execution pipeline. In particular, given a less
> > restrictive model, GCC will try to hoist more loads to be earlier in
> > the basic block, which can result in poorer utilization of the other
> > execution pipelines.
> >
> > In my experiments, I found this model to be more beneficial across a
> > range of benchmarks than a model in which the additional restrictions
> > I imposed were relaxed.
> > I'd be happy to consider counter-examples where this modeling produces
> > suboptimal results - and where the changes you suggest are sufficient
> > to resolve the issue.
> >
> > Thanks,
> > James
> >
> > ---
> > [1]: Cortex-A57 Software Optimisation Guide
> > http://infocenter.arm.com/help/topic/com.arm.doc.uan0015a/cortex_a57_software_optimisation_guide_external.pdf
> >
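
The effect of the two reservations quoted above (ca57_load_model and
ca57_store_model) can be checked with a small simulation. The sketch below
is not GCC's scheduler. It is a greedy issue loop written only for
illustration: it ignores data dependencies and every non-memory unit, and
the names issue_cycles, GCC_MODEL and RELAXED_MODEL are hypothetical.
RELAXED_MODEL is an assumed reading of the "one load and one store every
cycle" statement from the optimisation guide, not something taken from
cortex-a57.md.

```python
# Each reservation is a list of unit sets: element i names the units held
# on cycle (issue_cycle + i). In define_reservation syntax, "a,b*2" means
# unit a on the first cycle, then unit b on each of the next two cycles.
GCC_MODEL = {
    "load": [{"ls_issue"}, {"ldr"}, {"ldr"}],  # ca57_ls_issue,ca57_ldr*2
    "store": [{"ls_issue"}, {"str"}],          # ca57_ls_issue,ca57_str
}

# Assumption for comparison only: one load and one store may start per
# cycle, with no shared issue unit.
RELAXED_MODEL = {
    "load": [{"ldr"}],
    "store": [{"str"}],
}


def issue_cycles(n_loads, n_stores, model):
    """Return how many cycles a greedy issuer needs to start every op."""
    remaining = {"load": n_loads, "store": n_stores}
    busy = set()   # (cycle, unit) pairs that are already reserved
    cycle = 0
    last_issue = -1
    while any(remaining.values()):
        progress = True
        # Keep issuing in this cycle until nothing else fits.
        while progress:
            progress = False
            for kind in ("load", "store"):
                if remaining[kind] == 0:
                    continue
                slots = {
                    (cycle + offset, unit)
                    for offset, units in enumerate(model[kind])
                    for unit in units
                }
                if slots & busy:
                    continue  # structural hazard: some unit is taken
                busy |= slots
                remaining[kind] -= 1
                last_issue = cycle
                progress = True
        cycle += 1
    return last_issue + 1


if __name__ == "__main__":
    print("quoted reservations:", issue_cycles(10, 10, GCC_MODEL))
    print("relaxed assumption: ", issue_cycles(10, 10, RELAXED_MODEL))
```

Under these assumptions, 10 loads and 10 stores need 20 issue cycles with
the quoted reservations and 10 with the relaxed ones. That is the gap the
original question is about. It says nothing about which model gives better
code once the rest of the scheduler is involved, which is the point of
James's reply.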

