This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass
- From: Jeff Law <law at redhat dot com>
- To: "Bin.Cheng" <amker dot cheng at gmail dot com>, Mike Stump <mikestump at comcast dot net>
- Cc: Bin Cheng <bin dot cheng at arm dot com>, gcc-patches List <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 30 Oct 2014 13:43:14 -0600
- Authentication-results: sourceware.org; auth=none
- References: <000001cfdc90$1d95c670$58c15350$ at arm dot com> <54384C12 dot 6060401 at redhat dot com> <80EFD85E-49B5-4F71-9401-40F8FA85BD65 at comcast dot net> <CAHFci2__KJST3yCzz4sKge9pN37CZwUZ8BvFL4++SuCJAsAtxw at mail dot gmail dot com>
On 10/10/14 21:32, Bin.Cheng wrote:
Mike already gave great answers, here are just some of my thoughts on
the specific questions. See embedded below.
Thanks to both of you for your answers.
Fundamentally, what I see is this scheme requires us to be able to come
up with a key based solely on information in a particular insn. To get
fusion, another insn has to have the same or a closely related key.
This implies that the two candidates for fusion are related, even if
there isn't a data dependency between them. The canonical example would
be two loads with reg+d addressing modes. If they use the same base
register and the displacements differ by a word, then we don't have a
data dependency between the insns, but the insns are closely related by
their address computations and we can compute a key to ensure those two
related insns end up consecutive. At any given call to the hook, the
only context we can directly see is the current insn.
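A minimal sketch of that idea in C, assuming a simplified description of a memory insn rather than GCC's real RTL. The names `mem_insn`, `fusion_key`, `compute_key` and `sort_for_fusion` are hypothetical and are not part of the patch. The key is derived from one insn alone (access kind, base register, displacement), and sorting on it places reg+d accesses that share a base next to each other in displacement order. A real scheduler would also have to respect dependencies, which this sketch ignores.

```c
#include <stdlib.h>

/* Hypothetical, simplified view of a memory insn; not GCC's rtx.  */
struct mem_insn {
    int is_load;    /* 1 = load, 0 = store */
    int base_reg;   /* base register of the reg+d address */
    long disp;      /* displacement d */
};

/* Hypothetical key: "group" ties related insns together, "order"
   positions an insn within its group.  */
struct fusion_key {
    int group;
    long order;
};

/* Compute the key from the current insn alone; no other context.  */
static struct fusion_key
compute_key (const struct mem_insn *insn)
{
    struct fusion_key k;
    k.group = insn->base_reg * 2 + (insn->is_load ? 1 : 0);
    k.order = insn->disp;
    return k;
}

/* qsort comparator: by group first, then by displacement.  */
static int
compare_by_key (const void *pa, const void *pb)
{
    struct fusion_key a = compute_key (pa);
    struct fusion_key b = compute_key (pb);
    if (a.group != b.group)
        return a.group < b.group ? -1 : 1;
    if (a.order != b.order)
        return a.order < b.order ? -1 : 1;
    return 0;
}

/* Sort so that insns with related keys end up consecutive.  */
static void
sort_for_fusion (struct mem_insn *insns, size_t n)
{
    qsort (insns, n, sizeof insns[0], compare_by_key);
}
```

Two loads off the same base with displacements 0 and 4 receive the same `group` and adjacent `order` values, so they sort next to each other even though neither depends on the other.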
I'm pretty sure if I were to tweak the ARM bits ever-so-slightly it
could easily model the load-load or store-store special case on the
PA7xxx[LC] processors. Normally a pair of loads or stores can't dual
issue. But if the two loads (or two stores) hit the upper and lower
halves of a double-word object, then the instructions can dual issue.
I'd forgotten about that special case scheduling opportunity until I
started looking at some unrelated enhancement for prefetching.
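The pairing condition described above might be checked roughly as follows. This is only a sketch: the 4-byte word size, the assumption that the base register is double-word aligned, and the names `mem_ref` and `hits_same_doubleword` are all illustrative and come from neither the patch nor the PA documentation.

```c
#include <stdbool.h>

#define WORD_BYTES 4   /* assumption: 32-bit words */

/* Hypothetical, simplified reg+d memory reference.  */
struct mem_ref {
    bool is_load;
    int base_reg;
    long disp;
};

/* Return true if A and B are both loads (or both stores) that touch
   the two halves of one double word.  Assumes the base register
   itself holds a double-word aligned address.  */
static bool
hits_same_doubleword (const struct mem_ref *a, const struct mem_ref *b)
{
    if (a->is_load != b->is_load || a->base_reg != b->base_reg)
        return false;

    long lo = a->disp < b->disp ? a->disp : b->disp;
    long hi = a->disp < b->disp ? b->disp : a->disp;

    /* The accesses must be exactly one word apart, and the lower one
       must sit on a double-word boundary.  */
    return hi - lo == WORD_BYTES && lo % (2 * WORD_BYTES) == 0;
}
```

Displacements 0 and 4 off the same base qualify, while 4 and 8 do not, because they straddle two different double words.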
Your code would also appear to allow significant cleanup of the old
caller-save code that had a fair amount of bookkeeping added to issue
double-word memory loads/stores rather than single word operations.
This *greatly* improved performance on the old SPARC processors, which
had no call-saved FP registers.
However, your new code doesn't handle fusing instructions which are
totally independent and of different static types. There just isn't a
good way to compute a key that I can see. And this is OK -- that case,
if we cared to improve it, would be best handled by the SCHED_REORDER hooks.
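To illustrate what a reorder-style hook could do for that case, here is a toy model in C. It does not use the real TARGET_SCHED_REORDER signature or GCC's insn types. The `toy_insn` structure, the `can_pair` rule (a TYPE_A insn followed by a TYPE_B insn) and `toy_reorder` are all hypothetical. The one thing it mirrors from GCC is that the ready list is read in reverse, so the last element is issued next.

```c
#include <stddef.h>

enum insn_type { TYPE_A, TYPE_B, TYPE_OTHER };

/* Hypothetical stand-in for a ready-list entry.  */
struct toy_insn {
    int uid;
    enum insn_type type;
};

/* Hypothetical pairing rule: TYPE_A followed by TYPE_B can pair.  */
static int
can_pair (enum insn_type prev, enum insn_type next)
{
    return prev == TYPE_A && next == TYPE_B;
}

/* READY[N_READY - 1] is the insn that would issue next.  If it cannot
   pair with the insn issued last, look for the highest-priority ready
   insn that can, and move it to the end of the list.  Returns 1 if
   the list was changed, 0 otherwise.  */
static int
toy_reorder (struct toy_insn *ready, size_t n_ready,
             enum insn_type last_issued)
{
    if (n_ready < 2 || can_pair (last_issued, ready[n_ready - 1].type))
        return 0;

    for (size_t i = n_ready - 1; i-- > 0; )
        if (can_pair (last_issued, ready[i].type))
          {
            struct toy_insn chosen = ready[i];
            /* Shift the later entries down one slot so the other
               insns keep their relative order.  */
            for (size_t j = i; j + 1 < n_ready; j++)
                ready[j] = ready[j + 1];
            ready[n_ready - 1] = chosen;
            return 1;
          }
    return 0;
}
```

Unlike the key-based scheme, this decision sees the whole ready list plus the previously issued insn, which is why it can pair insns that share no computable key.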
#2 would be the best solution for the case I was pondering, but I don't
think solving that case is terribly important given the processors for
which it was profitable haven't been made for a very long time.
I guess another way to ask the question: are fusion priorities static, based on the insn/alternative, or can they vary? And if they can vary, can they vary on each tick of the scheduler?
Though this pass works on predefined fusion types and priorities now,
there might be two possible fixes for this specific problem.
1) Introduce another exclusive_pri, so the key becomes "fusion_pri,
priority, exclusive_pri". The first is assigned to mark
instructions belonging to the same fusion type. The second is assigned to
fuse each pair of consecutive instructions together. The last is
assigned to prevent a specific pair of instructions from being fused,
just like the "BC" case mentioned earlier.
2) Extend the idea by using the hook functions
TARGET_SCHED_REORDER/TARGET_SCHED_REORDER2. We can assign
fusion_pri in the first place, making sure instructions of the same fusion
type will be adjacent to each other, then change priority (thus
reordering the ready list) at the back end's wish, even on each tick of the