This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] Atom: Scheduler improvements for better imul placement

From: Andrey Belevantsev <abel at ispras dot ru>
To: Igor Zamyatin <izamyatin at gmail dot com>
Cc: Richard Guenther <richard dot guenther at gmail dot com>, Alexander Monakov <amonakov at ispras dot ru>, Andi Kleen <andi at firstfloor dot org>, gcc-patches at gcc dot gnu dot org
Date: Fri, 13 Apr 2012 16:20:49 +0400
Subject: Re: [PATCH] Atom: Scheduler improvements for better imul placement
References: <CAKdSQZnr8R7BGCE+Fc0aTTZVsODXTr7vHn13PK8FENUxfOjqDA@mail.gmail.com> <m24nsq1j8e.fsf@firstfloor.org> <CAFiYyc1w-+brSp_qCm40V+iawbdjz5PT5=zC=D8LBVm38q8mxQ@mail.gmail.com> <CAKdSQZkfoWY9tZbBtKerv8fY57fw-jp-pz56PK+tEiVkdUwWuA@mail.gmail.com> <CAFiYyc3SdjwU5F+26chDnZgG32Xg8peffDPYBi0q=ao=Nwf0yQ@mail.gmail.com> <alpine.LNX.2.00.1204121556520.28340@monoid.intra.ispras.ru> <CAFiYyc0zUDB9T0042JZ1soiRQ_aqnfztb8CrJ=8idPaJq5=taw@mail.gmail.com> <CAKdSQZ=kgxAS6_a5nqZvXRPB9i34hDo9UZ6rxUpq80-fw0_F+w@mail.gmail.com> <CAFiYyc1oRbeJRg2y4pLFG_oyUKwUUSzu7oKXJixQ4j3-4yhjCw@mail.gmail.com> <4F86D235.70808@ispras.ru> <CAKdSQZkWD7TkLdz1-PVY_xF8ii2bWGUdn6604SqWbq-Nj3g1nA@mail.gmail.com>

On 13.04.2012 14:18, Igor Zamyatin wrote:

On Thu, Apr 12, 2012 at 5:01 PM, Andrey Belevantsev<abel@ispras.ru> wrote:

On 12.04.2012 16:38, Richard Guenther wrote:


On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin<izamyatin@gmail.com>
  wrote:


On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther
<richard.guenther@gmail.com>    wrote:


On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov<amonakov@ispras.ru>
  wrote:

Can atom execute two IMUL in parallel?  Or what exactly is the
pipeline
behavior?

As I understand from Intel's optimization reference manual, the
behavior is as
follows: if the instruction immediately following IMUL has shorter
latency,
execution is stalled for 4 cycles (which is IMUL's latency); otherwise,
a
4-or-more cycles latency instruction can be issued after IMUL without a
stall.
In other words, IMUL is pipelined with respect to other long-latency
instructions, but not to short-latency instructions.

It seems to be modeled in the pipeline description though:

;;; imul insn has 5 cycles latency
(define_reservation "atom-imul-32"
                    "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
                     atom-port-0")

;;; imul instruction excludes other non-FP instructions.
(exclusion_set "atom-eu-0, atom-eu-1"
               "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")

The main idea is quite simple:

If we are going to schedule IMUL instruction (it is on the top of
ready list) we try to find out producer of other (independent) IMUL
instruction that is in ready list too. The goal is try to schedule
such a producer to get another IMUL in ready list and get scheduling
of 2 successive IMUL instructions.

Why does that not happen without your patch?  Does it never happen without
your patch or does it merely not happen for one EEMBC benchmark (can
you provide a testcase?)?

It does not happen because the scheduler by itself does not do such specific
reordering.  That said, it is easy to imagine the cases where this patch
will make things worse rather than better.

Igor, why not try different subtler mechanisms like adjust_priority, which
is get called when an insn is added to the ready list?  E.g. increase the
producer's priority.

The patch as is misses checks for NONDEBUG_INSN_P.  Also, why bail out when
you have more than one imul in the ready list?  Don't you want to bump the
priority of the other imul found?

Could you provide some examples when this patch would harm the performance?

I thought of the cases when the other ready insns can fill up the hole and that would be more beneficial because e.g. they would be on more critical paths than the producer of your second imul. I don't know enough of Atom to give an example -- maybe some long divisions?


Sched_reorder was chosen since it is used in other ports and looks
most suitable for such case, e.g. it provides access to the whole
ready list.
BTW, just increasing producer's priority seems to be more risky in
performance sense - we can incorrectly start delaying some
instructions.

Yes, but exactly because of the above example you can start incorrectly delaying other insns, too, as you force the insn to be the first in the list. While bumping priority still leaves the scheduler sorting heuristics in place and actually lowers that risk.


Thought ready list doesn't contain DEBUG_INSN... Is it so? If it
contains them - this could be added easily

It does, but I'm not sure the sched_reorder hook gets them or they are immediately removed -- I saw similar checks in one of the targets' hooks.

Anyways, my main thought was that it is better to test on more benchmarks to alleviate the above concerns, so as long as the i386 maintainers are happy, I don't see major problems here. A good idea could be to generalize the patch to handle other long latency insns as second consumers, not only imuls, if this is relevant for Atom.

Andrey


The case with more than one imul in the ready list wasn't considered
just because the goal was to catch the particular case when there is a
risk to get the following picture: "imul-producer-imul" which is less
effective than "producer-imul-imul" for Atom

Andrey

And MD allows us only prefer scheduling of successive IMUL instructions,
i.e.
If IMUL was just scheduled and ready list contains another IMUL
instruction then it will be chosen as the best candidate for
scheduling.

at least from my very limited guessing of what the above does.  So, did
you
analyze why the scheduler does not properly schedule things for you?

Richard.


  From reading the patch, I could not understand the link between
pipeline
behavior and what the patch appears to do.

Hope that helps.

Alexander


Thanks,
Igor

Follow-Ups:
- Re: [PATCH] Atom: Scheduler improvements for better imul placement
  - From: Igor Zamyatin

References:
- [PATCH] Atom: Scheduler improvements for better imul placement
  - From: Igor Zamyatin
- Re: [PATCH] Atom: Scheduler improvements for better imul placement
  - From: Andi Kleen
- Re: [PATCH] Atom: Scheduler improvements for better imul placement
  - From: Richard Guenther
- Re: [PATCH] Atom: Scheduler improvements for better imul placement
  - From: Igor Zamyatin
- Re: [PATCH] Atom: Scheduler improvements for better imul placement
  - From: Richard Guenther
- Re: [PATCH] Atom: Scheduler improvements for better imul placement
  - From: Alexander Monakov
- Re: [PATCH] Atom: Scheduler improvements for better imul placement
  - From: Richard Guenther
- Re: [PATCH] Atom: Scheduler improvements for better imul placement
  - From: Igor Zamyatin
- Re: [PATCH] Atom: Scheduler improvements for better imul placement
  - From: Richard Guenther
- Re: [PATCH] Atom: Scheduler improvements for better imul placement
  - From: Andrey Belevantsev
- Re: [PATCH] Atom: Scheduler improvements for better imul placement
  - From: Igor Zamyatin

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]